fix: BexleyCouncil - adapt to async page loading #2001
InertiaUK wants to merge 1 commit into robbrad:master
Conversation
Bexley's WasteWorks site moved from server-side meta-refresh to client-side JS polling. The old approach of requesting ?page_loading=1 no longer returns bin data. The fix uses a session for cookie persistence, hits the main URL first to trigger server-side computation, then polls ?page_loading=1 with the x-requested-with: fetch header that the site's refresh.js expects. Timeout bumped from 30s to 60s per request, with up to 30 poll attempts at 2s intervals.
📝 Walkthrough

The Bexley Council data parser now uses a persistent `requests.Session` and a polling loop to fetch the asynchronously computed bin data.

Changes: Bexley Council page fetch and date parsing.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Script as BexleyCouncil Parser
    participant Session as requests.Session
    participant Server as Bexley Server
    User->>Script: parse_data(page)
    Script->>Session: Set headers (User-Agent, x-requested-with)
    Session->>Server: GET page (initial request)
    Server-->>Session: Response + cookies/state
    activate Script
    loop Polling (up to 30 iterations)
        Script->>Session: GET page?page_loading=1
        Session->>Server: Fetch with x-requested-with: fetch
        Server-->>Session: HTML response
        Script->>Script: Check for waste-service-name
        alt Found
            Script->>Script: Break polling loop
            deactivate Script
        else Not found
            Script->>Script: Sleep 2s, retry
        end
    end
    Script->>Script: Parse collection dates (handle "collected today")
    Script->>User: Return parsed bin collection data
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

The changes introduce a polling mechanism with state management and session handling alongside conditional date-parsing logic. The review requires understanding the intent of the polling loop, verifying the page-readiness check is robust, and confirming the date-parsing fallback logic handles all expected date formats correctly.
🚥 Pre-merge checks: ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master    #2001   +/- ##
=======================================
  Coverage   86.67%   86.67%
=======================================
  Files           9        9
  Lines        1141     1141
=======================================
  Hits          989      989
  Misses        152      152
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/BexleyCouncil.py (1)
73-90: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't swallow unexpected date formats here.

`print(...)` drops the bin entry and returns partial data, which makes Bexley markup changes easy to miss. This parser should raise with the raw date string instead.

Based on learnings: In `uk_bin_collection/**/*.py`, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

Proposed fix

```diff
-            except ValueError as e:
-                print(f"Error parsing date for {bin_type}: {e}")
+            except ValueError as e:
+                raise ValueError(
+                    f"Unexpected Bexley collection date for {bin_type}: {next_collection}"
+                ) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/BexleyCouncil.py` around lines 73 - 90, The code currently swallows date parsing errors (in the try/except around parsing next_collection) and just prints them, which drops the bin entry; instead, replace the print swallow with an explicit raise so failures surface: in the block that uses remove_ordinal_indicator_from_date_string, parsed_date, date_format and data["bins"].append (inside the try/except), remove the print(...) in the except ValueError and raise a new ValueError (or re-raise) that includes the bin_type and the raw next_collection string plus the original exception message so callers see the unexpected format and the raw input that failed.
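As context for the raise-instead-of-print recommendation, here is a minimal, self-contained sketch of ordinal-stripping date parsing that fails loudly. The helper name, regex, and date format are illustrative, not the project's actual `remove_ordinal_indicator_from_date_string`:

```python
import re
from datetime import datetime

def parse_collection_date(raw: str, date_format: str = "%A %d %B %Y") -> datetime:
    """Parse a council collection date like 'Friday 14th June 2024'."""
    # Strip ordinal suffixes: "14th" -> "14", "1st" -> "1", "22nd" -> "22"
    cleaned = re.sub(r"(\d{1,2})(st|nd|rd|th)", r"\1", raw)
    try:
        return datetime.strptime(cleaned, date_format)
    except ValueError as e:
        # Re-raise with the raw string so site markup changes surface loudly
        raise ValueError(f"Unexpected collection date: {raw!r}") from e
```

Raising with the raw input means a Bexley markup change shows up as a clear error with the offending string, rather than as a silently shrinking result set.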
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6a1ce570-1916-4ab3-83ba-b787de63b659
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/BexleyCouncil.py
```python
# Initial request triggers server-side schedule computation and sets session cookie
session.get(page, timeout=60)

# Poll for results — server computes async, JS client polls ?page_loading=1
# with x-requested-with header to get the content fragment
response = None
for attempt in range(30):
    response = session.get(
        f"{page}?page_loading=1",
        headers={"x-requested-with": "fetch"},
        timeout=60,
    )
    response.raise_for_status()
    if "waste-service-name" in response.text:
        found = True
        break
    time.sleep(2)

if not found:
    raise ValueError("Bexley WasteWorks page did not return expected content after 30 attempts")
```
Enforce a real 60-second overall deadline.
timeout=60 is per request, so this can block for far longer than 60 seconds once you include the bootstrap request, 30 polls, and sleeps. Track a deadline and cap each request by the remaining budget.
Proposed fix

```diff
-        # Initial request triggers server-side schedule computation and sets session cookie
-        session.get(page, timeout=60)
+        deadline = time.monotonic() + 60
+
+        # Initial request triggers server-side schedule computation and sets session cookie
+        bootstrap = session.get(page, timeout=10)
+        bootstrap.raise_for_status()

         # Poll for results — server computes async, JS client polls ?page_loading=1
         # with x-requested-with header to get the content fragment
         response = None
-        for attempt in range(30):
+        for _ in range(30):
+            remaining = deadline - time.monotonic()
+            if remaining <= 0:
+                raise TimeoutError("Timed out waiting for Bexley waste data to load")
             response = session.get(
                 f"{page}?page_loading=1",
                 headers={"x-requested-with": "fetch"},
-                timeout=60,
+                timeout=min(10, remaining),
             )
             response.raise_for_status()
             if "waste-service-name" in response.text:
                 break
-            time.sleep(2)
+            time.sleep(min(2, max(0, deadline - time.monotonic())))
```

🧰 Tools

🪛 Ruff (0.15.12)

[warning] 36-36: Loop control variable `attempt` not used within loop body. Rename unused `attempt` to `_attempt`. (B007)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/BexleyCouncil.py` around lines
30 - 45, The current polling in BexleyCouncil.py uses timeout=60 per request
which can exceed a 60s overall deadline when combined with the initial
session.get, multiple polls and sleeps; change this to track a single deadline
(use time.monotonic()) set to now + 60s before the initial request, compute
remaining = deadline - now before each network call, and pass
timeout=min(remaining, 60) to session.get (for both the initial bootstrap
request and the "{page}?page_loading=1" polls), abort/raise a timeout if
remaining <= 0, and also cap the time.sleep() between attempts to not sleep past
the remaining budget. Ensure identifiers referenced are the existing session.get
calls and the polling loop around response =
session.get(f"{page}?page_loading=1", ...).
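The deadline-budget approach described in this comment can be sketched as a standalone helper. The function and parameter names are hypothetical, and `fetch` is a stand-in for the real `session.get` calls:

```python
import time

def poll_until_ready(fetch, budget=60.0, interval=2.0, max_attempts=30):
    """Poll fetch() until it returns truthy, never exceeding budget seconds overall.

    fetch receives a timeout capped by the remaining budget, so no single
    request can blow past the overall deadline.
    """
    deadline = time.monotonic() + budget
    for _ in range(max_attempts):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("overall deadline exceeded")
        # Cap the per-request timeout by whatever budget is left
        if fetch(timeout=min(10.0, remaining)):
            return True
        # Never sleep past the remaining budget either
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
    raise TimeoutError("max attempts exhausted")
```

The key design choice is `time.monotonic()`: unlike `time.time()`, it cannot jump backwards or forwards with wall-clock adjustments, so the deadline arithmetic stays correct.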
```python
response = None
for attempt in range(30):
    response = session.get(
        f"{page}?page_loading=1",
        headers={"x-requested-with": "fetch"},
        timeout=60,
    )
    response.raise_for_status()
    if "waste-service-name" in response.text:
        found = True
        break
    time.sleep(2)

if not found:
    raise ValueError("Bexley WasteWorks page did not return expected content after 30 attempts")
```
Raise when the page never leaves the loading state.
If all 30 polls return the loader fragment, this still falls through to BeautifulSoup(response.text) and returns an empty result instead of surfacing a timeout/site change.
Proposed fix

```diff
         response = None
-        for attempt in range(30):
+        for _ in range(30):
             response = session.get(
                 f"{page}?page_loading=1",
                 headers={"x-requested-with": "fetch"},
                 timeout=60,
             )
             response.raise_for_status()
             if "waste-service-name" in response.text:
                 break
             time.sleep(2)
+        else:
+            raise TimeoutError("Timed out waiting for Bexley waste data to load")
```

🧰 Tools

🪛 Ruff (0.15.12)

[warning] 36-36: Loop control variable `attempt` not used within loop body. Rename unused `attempt` to `_attempt`. (B007)
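The `for`/`else` construct used in the proposed fix is easy to misread, so a minimal illustration may help: the `else` branch runs only when the loop finishes without hitting `break`. The function name, fragment list, and marker below are made up for the demo:

```python
def first_ready(fragments, marker="waste-service-name"):
    """Return the first fragment containing marker; raise if none does."""
    for text in fragments:
        if marker in text:
            result = text
            break
    else:
        # Reached only when no fragment triggered the break above
        raise TimeoutError("page never left the loading state")
    return result

pages = ["<div class='loader'>", "<h3 class='waste-service-name'>Refuse</h3>"]
print(first_ready(pages))  # the second fragment
```

This is why the fix is safe without a separate `found` flag: parsing code after the loop is unreachable unless the marker was actually seen.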
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/BexleyCouncil.py` around lines
35 - 45, The polling loop that fetches the council page (using session.get and
inspecting response.text for "waste-service-name") must raise an explicit error
when all 30 attempts still return the loader fragment instead of falling through
to parsing; modify the logic around the for-attempts loop to check after the
loop if "waste-service-name" is not in response.text and raise a clear
RuntimeError (or a project-standard exception) with context (e.g. last
response.status_code and a brief snippet or note about loader/timeout) so
BeautifulSoup(...) is never invoked on a still-loading page; update the code
paths around the variables response, session.get and the subsequent
BeautifulSoup call to use this raised error.
Bexley's WasteWorks site moved from server-side meta-refresh to client-side JS polling (via `refresh.auto.min.js` with `data-every="2"`). The old approach of requesting `?page_loading=1` without a session or the correct headers no longer returns bin data.

Changes:

- Use `requests.Session` for cookie persistence (the server tracks computation state per session)
- Poll `?page_loading=1` with the `x-requested-with: fetch` header, matching what the site's JS does

Tested against multiple UPRNs on the live site.
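The cookie-persistence point can be demonstrated end to end against a throwaway local server. The cookie name, paths, and readiness logic below are invented for the demo and only mimic the described WasteWorks behaviour; the third-party `requests` package is assumed to be installed:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # third-party: pip install requests

class FakeWasteWorks(BaseHTTPRequestHandler):
    def do_GET(self):
        if "page_loading=1" in self.path:
            # Serve data only if the cookie from the bootstrap request came back
            has_cookie = "sid=abc123" in (self.headers.get("Cookie") or "")
            body = b"waste-service-name: Refuse" if has_cookie else b"loading"
            self.send_response(200)
        else:
            body = b"bootstrap"
            self.send_response(200)
            self.send_header("Set-Cookie", "sid=abc123")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), FakeWasteWorks)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}/waste"

session = requests.Session()
session.get(base, timeout=5)  # bootstrap: server sets sid, Session stores it
resp = session.get(f"{base}?page_loading=1",
                   headers={"x-requested-with": "fetch"}, timeout=5)
plain = requests.get(f"{base}?page_loading=1", timeout=5)  # no session, no cookie
print(resp.text)   # session carried the cookie, so the fragment is returned
print(plain.text)  # cookieless request still sees the loader
server.shutdown()
```

A bare `requests.get` never presents the cookie, which is exactly why the pre-fix scraper stopped receiving bin data.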
Summary by CodeRabbit

- Bug Fixes
- Refactor