fix: ExeterCityCouncil - add postcode lookup for UPRN-miss addresses#2003
fix: ExeterCityCouncil - add postcode lookup for UPRN-miss addresses#2003InertiaUK wants to merge 2 commits intorobbrad:masterfrom
Conversation
About 55% of Ordnance Survey UPRNs in the Exeter area are not recognised by the council's address-finder API, which returns an empty array and causes an IndexError on data[0]["Results"]. The fix adds a postcode+house_number lookup path using qsource=POSTCODE, which returns all addresses in the postcode with full bin schedule HTML. The scraper matches by leading house number against the label field. Falls back to UPRN lookup for backward compatibility. Also added postcode, house_number, and skip_get_url to input.json so the scraper receives the address fields it needs.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughExeterCityCouncil's bin collection lookup is enhanced to support address resolution via postcode and house number (paon). Test configuration is updated to include these new fields. The scraper attempts postcode-based address matching with fallback to UPRN lookup before parsing bin schedules. ChangesExeterCityCouncil Address Resolution Enhancement
Sequence DiagramsequenceDiagram
participant Test as Test Input
participant API as Address Finder API
participant Parser as HTML Parser
participant Result as Bin Schedule
Test->>Parser: postcode + paon provided?
alt Postcode + paon present
Parser->>API: Query with qsource=POSTCODE
API-->>Parser: Address entries list
Parser->>Parser: Match by numeric paon prefix
alt Match found
Parser->>Parser: Use matched entry
else No match
Parser->>Parser: Use first entry
end
else No postcode/paon
alt UPRN provided
Parser->>Parser: Use UPRN lookup path
else No UPRN
Parser-->>Result: Return empty bins
end
end
Parser->>Parser: Parse HTML bin schedule<br/>(guard <h3> tags)
Parser-->>Result: Return populated bin data
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 47 minutes and 16 seconds.Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2003 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Around line 916-919: The wiki_note for the Exeter fixture is out of sync with
the new input format: update the wiki_note string (the fixture entry above the
diff) to instruct callers to provide postcode and house_number and set
skip_get_url: true instead of telling them to pass only a UPRN; ensure the note
mentions the keys "postcode", "house_number" and "skip_get_url" so the
documented/manual path matches the parser change.
In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py`:
- Around line 37-63: The code currently uses paon_num/paon_prefix and
label.strip().startswith(paon_prefix) which can erroneously match "5" to "52";
change the matching logic in the loop over data to extract the leading token
from label and compare it exactly (e.g. match the first token with a
word-boundary or exact token equality against paon_prefix) instead of using
startswith, and remove the silent fallback that assigns results_html =
data[0].get("Results"); if no exact match is found leave results_html unset so
the existing UPRN branch (check_uprn and the requests call) runs, and if both
PAON and UPRN branches fail raise a clear exception (or return an explicit
error) rather than returning bindata or silently returning the first result.
- Around line 69-71: The code silently skips a bin section when h3 =
section.find_next("h3") returns None, which can hide parsing breakages; in the
ExeterCityCouncil parser (the loop that inspects each section in class
ExeterCityCouncil / its parse method), replace the silent continue with an
explicit raised exception (e.g., ValueError or a CouncilParsingError) that
includes contextual info (the section HTML or an identifying message) so the
caller fails loudly when a collection date/title is missing; ensure the
exception references the failing symbol (section.find_next("h3")) and provides
clear context for debugging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2dc40604-5480-4c06-a553-6be590c418ad
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py
| "LAD24CD": "E07000041", | ||
| "postcode": "EX2 4NT", | ||
| "house_number": "5", | ||
| "skip_get_url": true |
There was a problem hiding this comment.
Update Exeter's fixture note to match the new input path.
This entry now depends on postcode/house_number plus skip_get_url, but the existing wiki_note above still tells users to pass only a UPRN. That leaves the documented/manual path inconsistent with the parser change and can send Exeter users back into the failing lookup flow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/tests/input.json` around lines 916 - 919, The wiki_note for
the Exeter fixture is out of sync with the new input format: update the
wiki_note string (the fixture entry above the diff) to instruct callers to
provide postcode and house_number and set skip_get_url: true instead of telling
them to pass only a UPRN; ensure the note mentions the keys "postcode",
"house_number" and "skip_get_url" so the documented/manual path matches the
parser change.
| paon_num = re.match(r"^(\d+)", str(user_paon).strip()) | ||
| paon_prefix = paon_num.group(1) if paon_num else str(user_paon).strip() | ||
|
|
||
| for entry in data: | ||
| label = entry.get("label", "") | ||
| if label.strip().startswith(paon_prefix): | ||
| results_html = entry.get("Results") | ||
| break | ||
|
|
||
| URI = f"https://exeter.gov.uk/repositories/hidden-pages/address-finder/?qsource=UPRN&qtype=bins&term={user_uprn}" | ||
| # Fallback: first entry if no match | ||
| if not results_html and data: | ||
| results_html = data[0].get("Results") | ||
|
|
||
| response = requests.get(URI) | ||
| response.raise_for_status() | ||
| # Fall back to UPRN lookup (original method) | ||
| if not results_html and user_uprn: | ||
| check_uprn(user_uprn) | ||
| response = requests.get( | ||
| f"https://exeter.gov.uk/repositories/hidden-pages/address-finder/?qsource=UPRN&qtype=bins&term={user_uprn}", | ||
| timeout=30, | ||
| ) | ||
| response.raise_for_status() | ||
| data = response.json() | ||
| if data: | ||
| results_html = data[0].get("Results") | ||
|
|
||
| data = response.json() | ||
| if not results_html: | ||
| return bindata |
There was a problem hiding this comment.
Don't turn postcode misses into another property's schedule.
startswith(paon_prefix) will match "5" against labels like "52 ...", and the data[0]/empty-result fallbacks then hide the miss instead of surfacing it. That can return a valid-looking schedule for the wrong address. Match the leading property token exactly, and if nothing matches, fall through to the UPRN branch or raise a clear exception rather than selecting the first result or returning empty data.
Suggested direction
- paon_num = re.match(r"^(\d+)", str(user_paon).strip())
- paon_prefix = paon_num.group(1) if paon_num else str(user_paon).strip()
+ normalized_paon = str(user_paon).strip()
+ paon_num = re.match(r"^(\d+)\b", normalized_paon)
+ paon_prefix = paon_num.group(1) if paon_num else normalized_paon.casefold()
for entry in data:
- label = entry.get("label", "")
- if label.strip().startswith(paon_prefix):
+ label = entry.get("label", "").strip()
+ label_num = re.match(r"^(\d+)\b", label)
+ label_prefix = label_num.group(1) if label_num else label.casefold()
+ if label_prefix == paon_prefix:
results_html = entry.get("Results")
break
- # Fallback: first entry if no match
- if not results_html and data:
- results_html = data[0].get("Results")
+ if not results_html and not user_uprn:
+ raise ValueError(
+ f"No Exeter address matched PAON '{user_paon}' within postcode '{user_postcode}'"
+ )
- if not results_html:
- return bindata
+ if not results_html:
+ raise ValueError("Exeter lookup returned no schedule HTML")Based on learnings: in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` around
lines 37 - 63, The code currently uses paon_num/paon_prefix and
label.strip().startswith(paon_prefix) which can erroneously match "5" to "52";
change the matching logic in the loop over data to extract the leading token
from label and compare it exactly (e.g. match the first token with a
word-boundary or exact token equality against paon_prefix) instead of using
startswith, and remove the silent fallback that assigns results_html =
data[0].get("Results"); if no exact match is found leave results_html unset so
the existing UPRN branch (check_uprn and the requests call) runs, and if both
PAON and UPRN branches fail raise a clear exception (or return an explicit
error) rather than returning bindata or silently returning the first result.
| h3 = section.find_next("h3") | ||
| if not h3: | ||
| continue |
There was a problem hiding this comment.
Fail loudly when a bin section has no date.
Skipping the section here silently drops a collection from the response. If Exeter changes the markup, this returns partial data instead of showing that the parser is broken.
Suggested fix
h3 = section.find_next("h3")
if not h3:
- continue
+ raise ValueError(
+ f"Missing collection date for Exeter bin section '{bin_type}'"
+ )
collection_date = h3.text.strip()Based on learnings: in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/ExeterCityCouncil.py` around
lines 69 - 71, The code silently skips a bin section when h3 =
section.find_next("h3") returns None, which can hide parsing breakages; in the
ExeterCityCouncil parser (the loop that inspects each section in class
ExeterCityCouncil / its parse method), replace the silent continue with an
explicit raised exception (e.g., ValueError or a CouncilParsingError) that
includes contextual info (the section HTML or an identifying message) so the
caller fails loudly when a collection date/title is missing; ensure the
exception references the failing symbol (section.find_next("h3")) and provides
clear context for debugging.
The postcode lookup path works without skip_get_url. The field was added as a workaround for a CAPTCHA false-positive in our API layer, now fixed separately.
About 55% of Ordnance Survey UPRNs in the Exeter area are not in the council's internal database. The address-finder API returns
[]for these, causingIndexError: list index out of rangeondata[0]["Results"].Root cause: Exeter's
?qsource=UPRNendpoint only recognises UPRNs with the100040xxxxxxprefix. Newer prefixes (10094x,10091x,100041x) return empty arrays.Fix: Added a
?qsource=POSTCODElookup path that works for all addresses. The endpoint returns every property in the postcode with full bin schedule HTML. The scraper matches by leading house number against thelabelfield. Falls back to the original UPRN lookup for backward compatibility.Also added
postcode,house_number, andskip_get_urltoinput.json.Tested with multiple Exeter postcodes and house numbers on the live API.
Summary by CodeRabbit
New Features
Bug Fixes