fix: ExeterCityCouncil - add postcode lookup for UPRN-miss addresses by InertiaUK · Pull Request #2003 · robbrad/UKBinCollectionData

InertiaUK · 2026-05-03T23:03:15Z

About 55% of Ordnance Survey UPRNs in the Exeter area are not in the council's internal database. The address-finder API returns [] for these, causing IndexError: list index out of range on data[0]["Results"].

Root cause: Exeter's ?qsource=UPRN endpoint only recognises UPRNs with the 100040xxxxxx prefix. Newer prefixes (10094x, 10091x, 100041x) return empty arrays.

Fix: Added a ?qsource=POSTCODE lookup path that works for all addresses. The endpoint returns every property in the postcode with full bin schedule HTML. The scraper matches by leading house number against the label field. Falls back to the original UPRN lookup for backward compatibility.

Also added postcode, house_number, and skip_get_url to input.json.

Tested with multiple Exeter postcodes and house numbers on the live API.

Summary by CodeRabbit

New Features
- Exeter City Council: Bin schedule lookup now supports postcode and house number as input options.
Bug Fixes
- Improved robustness of schedule data parsing to handle edge cases.
- Enhanced lookup fallback mechanisms for improved reliability when retrieving collection schedules.

About 55% of Ordnance Survey UPRNs in the Exeter area are not recognised by the council's address-finder API, which returns an empty array and causes an IndexError on data[0]["Results"]. The fix adds a postcode+house_number lookup path using qsource=POSTCODE, which returns all addresses in the postcode with full bin schedule HTML. The scraper matches by leading house number against the label field. Falls back to UPRN lookup for backward compatibility. Also added postcode, house_number, and skip_get_url to input.json so the scraper receives the address fields it needs.

coderabbitai · 2026-05-03T23:03:27Z

Warning

Rate limit exceeded

@InertiaUK has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 47 minutes and 16 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31d6e816-8b50-4a8c-bd45-c99d2d6a0470

📥 Commits

Reviewing files that changed from the base of the PR and between 3f8f6dc and f0aa34d.

📒 Files selected for processing (1)

uk_bin_collection/tests/input.json

📝 Walkthrough

Walkthrough

ExeterCityCouncil's bin collection lookup is enhanced to support address resolution via postcode and house number (paon). Test configuration is updated to include these new fields. The scraper attempts postcode-based address matching with fallback to UPRN lookup before parsing bin schedules.

Changes

ExeterCityCouncil Address Resolution Enhancement

Layer / File(s)	Summary
Test Configuration `uk_bin_collection/tests/input.json`	ExeterCityCouncil test entry gains `postcode`, `house_number`, and `skip_get_url` fields alongside existing `LAD24CD`.
Address Lookup Logic `uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` (lines 18–64)	`parse_data` now attempts postcode + paon address lookup via address-finder API (qsource=POSTCODE), matches entries by numeric house-number prefix, and falls back to first entry; UPRN lookup only proceeds when `uprn` is provided.
HTML Parsing Robustness `uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` (lines 65–78)	Parsing now guards against missing `<h3>` collection-date tags following `<h2>` blocks; `soup.prettify()` call removed; bin entry construction preserved with validated date extraction.
Module Imports `uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` (line 1)	Import statement swapped from `time` to `re` to support regex-based numeric prefix matching.

Sequence Diagram

sequenceDiagram
    participant Test as Test Input
    participant API as Address Finder API
    participant Parser as HTML Parser
    participant Result as Bin Schedule

    Test->>Parser: postcode + paon provided?
    alt Postcode + paon present
        Parser->>API: Query with qsource=POSTCODE
        API-->>Parser: Address entries list
        Parser->>Parser: Match by numeric paon prefix
        alt Match found
            Parser->>Parser: Use matched entry
        else No match
            Parser->>Parser: Use first entry
        end
    else No postcode/paon
        alt UPRN provided
            Parser->>Parser: Use UPRN lookup path
        else No UPRN
            Parser-->>Result: Return empty bins
        end
    end
    Parser->>Parser: Parse HTML bin schedule<br/>(guard <h3> tags)
    Parser-->>Result: Return populated bin data

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat: add direct mobile API support to North Hertfordshire District Council scraper #1715: Adds postcode/paon → UPRN address-lookup logic with API fallback to other council scrapers, matching the same functional pattern.
May 2026 Release #1996: Implements postcode/house-number address resolution with numeric prefix matching and UPRN fallback across multiple council parsers.

Suggested reviewers

dp247

Poem

🐰 A postcode path now leads the way,
Where houses hide by night and day,
No UPRN to fetch or find,
Just labels matched with rabbit's mind!
The bins appear, all sorted bright,
Through guards and guards, we parse just right!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and specifically summarizes the main change: adding postcode lookup functionality to ExeterCityCouncil to handle UPRN-miss addresses.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 47 minutes and 16 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-03T23:05:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (f0aa34d).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2003   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Around line 916-919: The wiki_note for the Exeter fixture is out of sync with
the new input format: update the wiki_note string (the fixture entry above the
diff) to instruct callers to provide postcode and house_number and set
skip_get_url: true instead of telling them to pass only a UPRN; ensure the note
mentions the keys "postcode", "house_number" and "skip_get_url" so the
documented/manual path matches the parser change.

In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py`:
- Around line 37-63: The code currently uses paon_num/paon_prefix and
label.strip().startswith(paon_prefix) which can erroneously match "5" to "52";
change the matching logic in the loop over data to extract the leading token
from label and compare it exactly (e.g. match the first token with a
word-boundary or exact token equality against paon_prefix) instead of using
startswith, and remove the silent fallback that assigns results_html =
data[0].get("Results"); if no exact match is found leave results_html unset so
the existing UPRN branch (check_uprn and the requests call) runs, and if both
PAON and UPRN branches fail raise a clear exception (or return an explicit
error) rather than returning bindata or silently returning the first result.
- Around line 69-71: The code silently skips a bin section when h3 =
section.find_next("h3") returns None, which can hide parsing breakages; in the
ExeterCityCouncil parser (the loop that inspects each section in class
ExeterCityCouncil / its parse method), replace the silent continue with an
explicit raised exception (e.g., ValueError or a CouncilParsingError) that
includes contextual info (the section HTML or an identifying message) so the
caller fails loudly when a collection date/title is missing; ensure the
exception references the failing symbol (section.find_next("h3")) and provides
clear context for debugging.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2dc40604-5480-4c06-a553-6be590c418ad

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 3f8f6dc.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py

coderabbitai · 2026-05-03T23:08:00Z

+        "LAD24CD": "E07000041",
+        "postcode": "EX2 4NT",
+        "house_number": "5",
+        "skip_get_url": true


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update Exeter's fixture note to match the new input path.

This entry now depends on postcode/house_number plus skip_get_url, but the existing wiki_note above still tells users to pass only a UPRN. That leaves the documented/manual path inconsistent with the parser change and can send Exeter users back into the failing lookup flow.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/tests/input.json` around lines 916 - 919, The wiki_note for the Exeter fixture is out of sync with the new input format: update the wiki_note string (the fixture entry above the diff) to instruct callers to provide postcode and house_number and set skip_get_url: true instead of telling them to pass only a UPRN; ensure the note mentions the keys "postcode", "house_number" and "skip_get_url" so the documented/manual path matches the parser change.

coderabbitai · 2026-05-03T23:08:00Z

+                paon_num = re.match(r"^(\d+)", str(user_paon).strip())
+                paon_prefix = paon_num.group(1) if paon_num else str(user_paon).strip()
+
+                for entry in data:
+                    label = entry.get("label", "")
+                    if label.strip().startswith(paon_prefix):
+                        results_html = entry.get("Results")
+                        break

-        URI = f"https://exeter.gov.uk/repositories/hidden-pages/address-finder/?qsource=UPRN&qtype=bins&term={user_uprn}"
+                # Fallback: first entry if no match
+                if not results_html and data:
+                    results_html = data[0].get("Results")

-        response = requests.get(URI)
-        response.raise_for_status()
+        # Fall back to UPRN lookup (original method)
+        if not results_html and user_uprn:
+            check_uprn(user_uprn)
+            response = requests.get(
+                f"https://exeter.gov.uk/repositories/hidden-pages/address-finder/?qsource=UPRN&qtype=bins&term={user_uprn}",
+                timeout=30,
+            )
+            response.raise_for_status()
+            data = response.json()
+            if data:
+                results_html = data[0].get("Results")

-        data = response.json()
+        if not results_html:
+            return bindata


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't turn postcode misses into another property's schedule.

startswith(paon_prefix) will match "5" against labels like "52 ...", and the data[0]/empty-result fallbacks then hide the miss instead of surfacing it. That can return a valid-looking schedule for the wrong address. Match the leading property token exactly, and if nothing matches, fall through to the UPRN branch or raise a clear exception rather than selecting the first result or returning empty data.

Suggested direction

- paon_num = re.match(r"^(\d+)", str(user_paon).strip()) - paon_prefix = paon_num.group(1) if paon_num else str(user_paon).strip() + normalized_paon = str(user_paon).strip() + paon_num = re.match(r"^(\d+)\b", normalized_paon) + paon_prefix = paon_num.group(1) if paon_num else normalized_paon.casefold() for entry in data: - label = entry.get("label", "") - if label.strip().startswith(paon_prefix): + label = entry.get("label", "").strip() + label_num = re.match(r"^(\d+)\b", label) + label_prefix = label_num.group(1) if label_num else label.casefold() + if label_prefix == paon_prefix: results_html = entry.get("Results") break - # Fallback: first entry if no match - if not results_html and data: - results_html = data[0].get("Results") + if not results_html and not user_uprn: + raise ValueError( + f"No Exeter address matched PAON '{user_paon}' within postcode '{user_postcode}'" + ) - if not results_html: - return bindata + if not results_html: + raise ValueError("Exeter lookup returned no schedule HTML")

Based on learnings: in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` around lines 37 - 63, The code currently uses paon_num/paon_prefix and label.strip().startswith(paon_prefix) which can erroneously match "5" to "52"; change the matching logic in the loop over data to extract the leading token from label and compare it exactly (e.g. match the first token with a word-boundary or exact token equality against paon_prefix) instead of using startswith, and remove the silent fallback that assigns results_html = data[0].get("Results"); if no exact match is found leave results_html unset so the existing UPRN branch (check_uprn and the requests call) runs, and if both PAON and UPRN branches fail raise a clear exception (or return an explicit error) rather than returning bindata or silently returning the first result.

coderabbitai · 2026-05-03T23:08:00Z

+            h3 = section.find_next("h3")
+            if not h3:
+                continue


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail loudly when a bin section has no date.

Skipping the section here silently drops a collection from the response. If Exeter changes the markup, this returns partial data instead of showing that the parser is broken.

Suggested fix

h3 = section.find_next("h3") if not h3: - continue + raise ValueError( + f"Missing collection date for Exeter bin section '{bin_type}'" + ) collection_date = h3.text.strip()

Based on learnings: in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` around lines 69 - 71, The code silently skips a bin section when h3 = section.find_next("h3") returns None, which can hide parsing breakages; in the ExeterCityCouncil parser (the loop that inspects each section in class ExeterCityCouncil / its parse method), replace the silent continue with an explicit raised exception (e.g., ValueError or a CouncilParsingError) that includes contextual info (the section HTML or an identifying message) so the caller fails loudly when a collection date/title is missing; ensure the exception references the failing symbol (section.find_next("h3")) and provides clear context for debugging.

The postcode lookup path works without skip_get_url. The field was added as a workaround for a CAPTCHA false-positive in our API layer, now fixed separately.

coderabbitai Bot reviewed May 3, 2026

View reviewed changes

remove skip_get_url from ExeterCityCouncil — not needed

f0aa34d

The postcode lookup path works without skip_get_url. The field was added as a workaround for a CAPTCHA false-positive in our API layer, now fixed separately.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: ExeterCityCouncil - add postcode lookup for UPRN-miss addresses#2003

fix: ExeterCityCouncil - add postcode lookup for UPRN-miss addresses#2003
InertiaUK wants to merge 2 commits intorobbrad:masterfrom
InertiaUK:fix/ExeterCityCouncil-postcode-lookup

InertiaUK commented May 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 3, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 3, 2026

Uh oh!

coderabbitai Bot May 3, 2026

Uh oh!

coderabbitai Bot May 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

InertiaUK commented May 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

InertiaUK commented May 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 3, 2026 •

edited

Loading

codecov Bot commented May 3, 2026 •

edited

Loading