Skip to content

fix: ExeterCityCouncil - add postcode lookup for UPRN-miss addresses#2003

Open
InertiaUK wants to merge 2 commits intorobbrad:masterfrom
InertiaUK:fix/ExeterCityCouncil-postcode-lookup
Open

fix: ExeterCityCouncil - add postcode lookup for UPRN-miss addresses#2003
InertiaUK wants to merge 2 commits intorobbrad:masterfrom
InertiaUK:fix/ExeterCityCouncil-postcode-lookup

Conversation

@InertiaUK
Copy link
Copy Markdown
Contributor

@InertiaUK InertiaUK commented May 3, 2026

About 55% of Ordnance Survey UPRNs in the Exeter area are not in the council's internal database. The address-finder API returns [] for these, causing IndexError: list index out of range on data[0]["Results"].

Root cause: Exeter's ?qsource=UPRN endpoint only recognises UPRNs with the 100040xxxxxx prefix. Newer prefixes (10094x, 10091x, 100041x) return empty arrays.

Fix: Added a ?qsource=POSTCODE lookup path that works for all addresses. The endpoint returns every property in the postcode with full bin schedule HTML. The scraper matches by leading house number against the label field. Falls back to the original UPRN lookup for backward compatibility.

Also added postcode, house_number, and skip_get_url to input.json.

Tested with multiple Exeter postcodes and house numbers on the live API.

Summary by CodeRabbit

  • New Features

    • Exeter City Council: Bin schedule lookup now supports postcode and house number as input options.
  • Bug Fixes

    • Improved robustness of schedule data parsing to handle edge cases.
    • Enhanced lookup fallback mechanisms for improved reliability when retrieving collection schedules.

About 55% of Ordnance Survey UPRNs in the Exeter area are not recognised
by the council's address-finder API, which returns an empty array and
causes an IndexError on data[0]["Results"].

The fix adds a postcode+house_number lookup path using
qsource=POSTCODE, which returns all addresses in the postcode with
full bin schedule HTML. The scraper matches by leading house number
against the label field. Falls back to UPRN lookup for backward
compatibility.

Also added postcode, house_number, and skip_get_url to input.json so
the scraper receives the address fields it needs.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

Warning

Rate limit exceeded

@InertiaUK has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 47 minutes and 16 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31d6e816-8b50-4a8c-bd45-c99d2d6a0470

📥 Commits

Reviewing files that changed from the base of the PR and between 3f8f6dc and f0aa34d.

📒 Files selected for processing (1)
  • uk_bin_collection/tests/input.json
📝 Walkthrough

Walkthrough

ExeterCityCouncil's bin collection lookup is enhanced to support address resolution via postcode and house number (paon). Test configuration is updated to include these new fields. The scraper attempts postcode-based address matching with fallback to UPRN lookup before parsing bin schedules.

Changes

ExeterCityCouncil Address Resolution Enhancement

Layer / File(s) Summary
Test Configuration
uk_bin_collection/tests/input.json
ExeterCityCouncil test entry gains postcode, house_number, and skip_get_url fields alongside existing LAD24CD.
Address Lookup Logic
uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py (lines 18–64)
parse_data now attempts postcode + paon address lookup via address-finder API (qsource=POSTCODE), matches entries by numeric house-number prefix, and falls back to first entry; UPRN lookup only proceeds when uprn is provided.
HTML Parsing Robustness
uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py (lines 65–78)
Parsing now guards against missing <h3> collection-date tags following <h2> blocks; soup.prettify() call removed; bin entry construction preserved with validated date extraction.
Module Imports
uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py (line 1)
Import statement swapped from time to re to support regex-based numeric prefix matching.

Sequence Diagram

sequenceDiagram
    participant Test as Test Input
    participant API as Address Finder API
    participant Parser as HTML Parser
    participant Result as Bin Schedule

    Test->>Parser: postcode + paon provided?
    alt Postcode + paon present
        Parser->>API: Query with qsource=POSTCODE
        API-->>Parser: Address entries list
        Parser->>Parser: Match by numeric paon prefix
        alt Match found
            Parser->>Parser: Use matched entry
        else No match
            Parser->>Parser: Use first entry
        end
    else No postcode/paon
        alt UPRN provided
            Parser->>Parser: Use UPRN lookup path
        else No UPRN
            Parser-->>Result: Return empty bins
        end
    end
    Parser->>Parser: Parse HTML bin schedule<br/>(guard <h3> tags)
    Parser-->>Result: Return populated bin data
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐰 A postcode path now leads the way,
Where houses hide by night and day,
No UPRN to fetch or find,
Just labels matched with rabbit's mind!
The bins appear, all sorted bright,
Through guards and guards, we parse just right!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically summarizes the main change: adding postcode lookup functionality to ExeterCityCouncil to handle UPRN-miss addresses.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 47 minutes and 16 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (f0aa34d).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2003   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Around line 916-919: The wiki_note for the Exeter fixture is out of sync with
the new input format: update the wiki_note string (the fixture entry above the
diff) to instruct callers to provide postcode and house_number and set
skip_get_url: true instead of telling them to pass only a UPRN; ensure the note
mentions the keys "postcode", "house_number" and "skip_get_url" so the
documented/manual path matches the parser change.

In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py`:
- Around line 37-63: The code currently uses paon_num/paon_prefix and
label.strip().startswith(paon_prefix) which can erroneously match "5" to "52";
change the matching logic in the loop over data to extract the leading token
from label and compare it exactly (e.g. match the first token with a
word-boundary or exact token equality against paon_prefix) instead of using
startswith, and remove the silent fallback that assigns results_html =
data[0].get("Results"); if no exact match is found leave results_html unset so
the existing UPRN branch (check_uprn and the requests call) runs, and if both
PAON and UPRN branches fail raise a clear exception (or return an explicit
error) rather than returning bindata or silently returning the first result.
- Around line 69-71: The code silently skips a bin section when h3 =
section.find_next("h3") returns None, which can hide parsing breakages; in the
ExeterCityCouncil parser (the loop that inspects each section in class
ExeterCityCouncil / its parse method), replace the silent continue with an
explicit raised exception (e.g., ValueError or a CouncilParsingError) that
includes contextual info (the section HTML or an identifying message) so the
caller fails loudly when a collection date/title is missing; ensure the
exception references the failing symbol (section.find_next("h3")) and provides
clear context for debugging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2dc40604-5480-4c06-a553-6be590c418ad

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 3f8f6dc.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py

Comment thread uk_bin_collection/tests/input.json Outdated
Comment on lines +916 to +919
"LAD24CD": "E07000041",
"postcode": "EX2 4NT",
"house_number": "5",
"skip_get_url": true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update Exeter's fixture note to match the new input path.

This entry now depends on postcode/house_number plus skip_get_url, but the existing wiki_note above still tells users to pass only a UPRN. That leaves the documented/manual path inconsistent with the parser change and can send Exeter users back into the failing lookup flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/tests/input.json` around lines 916 - 919, The wiki_note for
the Exeter fixture is out of sync with the new input format: update the
wiki_note string (the fixture entry above the diff) to instruct callers to
provide postcode and house_number and set skip_get_url: true instead of telling
them to pass only a UPRN; ensure the note mentions the keys "postcode",
"house_number" and "skip_get_url" so the documented/manual path matches the
parser change.

Comment on lines +37 to +63
paon_num = re.match(r"^(\d+)", str(user_paon).strip())
paon_prefix = paon_num.group(1) if paon_num else str(user_paon).strip()

for entry in data:
label = entry.get("label", "")
if label.strip().startswith(paon_prefix):
results_html = entry.get("Results")
break

URI = f"https://exeter.gov.uk/repositories/hidden-pages/address-finder/?qsource=UPRN&qtype=bins&term={user_uprn}"
# Fallback: first entry if no match
if not results_html and data:
results_html = data[0].get("Results")

response = requests.get(URI)
response.raise_for_status()
# Fall back to UPRN lookup (original method)
if not results_html and user_uprn:
check_uprn(user_uprn)
response = requests.get(
f"https://exeter.gov.uk/repositories/hidden-pages/address-finder/?qsource=UPRN&qtype=bins&term={user_uprn}",
timeout=30,
)
response.raise_for_status()
data = response.json()
if data:
results_html = data[0].get("Results")

data = response.json()
if not results_html:
return bindata
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't turn postcode misses into another property's schedule.

startswith(paon_prefix) will match "5" against labels like "52 ...", and the data[0]/empty-result fallbacks then hide the miss instead of surfacing it. That can return a valid-looking schedule for the wrong address. Match the leading property token exactly, and if nothing matches, fall through to the UPRN branch or raise a clear exception rather than selecting the first result or returning empty data.

Suggested direction
-                paon_num = re.match(r"^(\d+)", str(user_paon).strip())
-                paon_prefix = paon_num.group(1) if paon_num else str(user_paon).strip()
+                normalized_paon = str(user_paon).strip()
+                paon_num = re.match(r"^(\d+)\b", normalized_paon)
+                paon_prefix = paon_num.group(1) if paon_num else normalized_paon.casefold()

                 for entry in data:
-                    label = entry.get("label", "")
-                    if label.strip().startswith(paon_prefix):
+                    label = entry.get("label", "").strip()
+                    label_num = re.match(r"^(\d+)\b", label)
+                    label_prefix = label_num.group(1) if label_num else label.casefold()
+                    if label_prefix == paon_prefix:
                         results_html = entry.get("Results")
                         break

-                # Fallback: first entry if no match
-                if not results_html and data:
-                    results_html = data[0].get("Results")
+                if not results_html and not user_uprn:
+                    raise ValueError(
+                        f"No Exeter address matched PAON '{user_paon}' within postcode '{user_postcode}'"
+                    )

-        if not results_html:
-            return bindata
+        if not results_html:
+            raise ValueError("Exeter lookup returned no schedule HTML")

Based on learnings: in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` around
lines 37 - 63, The code currently uses paon_num/paon_prefix and
label.strip().startswith(paon_prefix) which can erroneously match "5" to "52";
change the matching logic in the loop over data to extract the leading token
from label and compare it exactly (e.g. match the first token with a
word-boundary or exact token equality against paon_prefix) instead of using
startswith, and remove the silent fallback that assigns results_html =
data[0].get("Results"); if no exact match is found leave results_html unset so
the existing UPRN branch (check_uprn and the requests call) runs, and if both
PAON and UPRN branches fail raise a clear exception (or return an explicit
error) rather than returning bindata or silently returning the first result.

Comment on lines +69 to +71
h3 = section.find_next("h3")
if not h3:
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail loudly when a bin section has no date.

Skipping the section here silently drops a collection from the response. If Exeter changes the markup, this returns partial data instead of showing that the parser is broken.

Suggested fix
             h3 = section.find_next("h3")
             if not h3:
-                continue
+                raise ValueError(
+                    f"Missing collection date for Exeter bin section '{bin_type}'"
+                )
             collection_date = h3.text.strip()

Based on learnings: in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` around
lines 69 - 71, The code silently skips a bin section when h3 =
section.find_next("h3") returns None, which can hide parsing breakages; in the
ExeterCityCouncil parser (the loop that inspects each section in class
ExeterCityCouncil / its parse method), replace the silent continue with an
explicit raised exception (e.g., ValueError or a CouncilParsingError) that
includes contextual info (the section HTML or an identifying message) so the
caller fails loudly when a collection date/title is missing; ensure the
exception references the failing symbol (section.find_next("h3")) and provides
clear context for debugging.

The postcode lookup path works without skip_get_url. The field was
added as a workaround for a CAPTCHA false-positive in our API layer,
now fixed separately.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant