Add GOV.UK mirror site (port 40015) #32
Open
lamawmouk wants to merge 1 commit into
Adds a Flask mirror of https://www.gov.uk/ as the 16th WebHarbor site, running on port 40015.

## What's mirrored

- 16 top-level topics (Money and tax, Visas and immigration, Driving, ...)
- 44 subtopics
- 15 government departments (HMRC, DfE, Home Office, DVLA, NHS England, ...) with real ministers / permanent secretaries / employee counts
- 73 guidance articles (Self Assessment, Income Tax, Universal Credit, Skilled Worker visa, passport applications, vehicle tax, ...)
- 20 announcements (press releases, news stories, speeches)
- Search across articles / announcements / departments

## Visual fidelity

Uses the official MIT-licensed govuk-frontend v6.1.0 CSS and JS, plus the GDS Transport font and crown SVG. Templates use the canonical Design System component DOM (`govuk-header`, `govuk-breadcrumbs`, `govuk-summary-list`, `govuk-pagination`, `govuk-grid-row`, etc.) so an agent's selectors match the real GOV.UK. Content is licensed under the Open Government Licence v3.0 (synthesized in the spirit of GOV.UK guidance; no upstream copy embedded).
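To spot-check the visual-fidelity claim, a minimal sketch like the one below could parse a rendered page and report which of the Design System component classes named above are missing. It uses only the standard-library `html.parser`; `EXPECTED_COMPONENTS`, `ClassCollector` and `missing_components` are hypothetical names, and fetching the page from port 40015 is left out.

```python
from html.parser import HTMLParser

# Hypothetical set: the component classes named in this PR description.
EXPECTED_COMPONENTS = {
    "govuk-header",
    "govuk-breadcrumbs",
    "govuk-summary-list",
    "govuk-pagination",
    "govuk-grid-row",
}


class ClassCollector(HTMLParser):
    """Collects every CSS class token that appears on any element."""

    def __init__(self) -> None:
        super().__init__()
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())

    # Self-closing tags such as <img ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so they are covered.


def missing_components(html: str, expected=EXPECTED_COMPONENTS) -> set[str]:
    """Return the expected component classes that do not appear in `html`."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return set(expected) - collector.classes
```

An empty result means every listed component class was found somewhere on the page; it does not check nesting or attribute details.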
## Folder layout

Matches the canonical site layout (compare `wolfram_alpha`, `google_search`):

    sites/gov_uk/
    |-- _health.py
    |-- app.py
    |-- seed_data.py
    |-- tasks.jsonl
    |-- instance_seed/   (HF-managed)
    |-- static/{css,js,fonts,icons,images,external_cache}/
    `-- templates/

## Wiring

- `websyn_start.sh`: `gov_uk` appended to `SITES`, counts bumped 15 -> 16
- `control_server.py`: `gov_uk` added to `SITES`
- `Dockerfile`: `EXPOSE 40000-40015`

## Pre-PR verification (passed)

- `docker build` of `webharbor:dev` is clean (5.92 GB)
- 16/16 sites bind in 2s
- All gov_uk routes (`/`, `/browse`, `/browse/<topic>`, `/browse/<t>/<s>`, `/guidance/<slug>`, `/government/organisations[/<dept>]`, `/government/announcements`, `/search`, `/_health`) return 200
- `/reset/gov_uk` -> `{ready: true}`, md5 byte-identical pre/post
- Byte-identical after `docker restart`

## Asset PR

Seed DB (`gov_uk.tar.gz`, 32 KB) uploaded as HF PR: https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/22

`.assets-revision` will be bumped to the HF merge SHA once that PR lands.
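The "md5 byte-identical pre/post" check under Pre-PR verification can be sketched with `hashlib`: hash the site's data file, optionally mutate state, call reset, and compare digests. This is a sketch under assumptions: the data-file path, the `reset`/`mutate` callables (in the real setup, presumably a `POST /reset/gov_uk`), and the helper names are hypothetical, not the repository's actual verification script.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream a file through MD5 so a large database is never read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reset_restores_bytes(
    path: Path,
    reset: Callable[[], object],
    mutate: Optional[Callable[[], object]] = None,
) -> bool:
    """Hypothetical check: True if `reset()` returns `path` to its original MD5."""
    before = md5_of(path)
    if mutate is not None:
        mutate()  # simulate an agent changing site state before the reset
    reset()  # assumption: the real reset would be a POST to /reset/gov_uk
    return md5_of(path) == before
```

Passing a `mutate` step makes the check meaningful: a reset that is a no-op would still pass if nothing had changed beforehand.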
Force-pushed `96f4916` to `63b73a5`.
@Raibows would you be able to review this when you have a chance? Thanks! 🙏
## TL;DR
Adds a Flask mirror of gov.uk as the 16th WebHarbor site (port 40015), with topic browse, guidance article detail, department directory, announcements, and search. Uses the official MIT-licensed govuk-frontend v6.1.0 for canonical Design System DOM.
Companion HuggingFace PR: https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/22
## What's in this PR

- `sites/gov_uk/`: `app.py`, `seed_data.py`, `templates/*.html`, `static/{css,js,fonts,icons}/`, `tasks.jsonl`
- Registration (kept in sync per `AGENTS.md`): `gov_uk` added to `websyn_start.sh` and `control_server.py`; `Dockerfile` `EXPOSE` bumped to `40000-40015`.

## Verification

All checks in `AGENTS.md` § Pre-PR checks pass: the image builds clean, 16/16 sites are alive, every gov_uk route returns 200, `POST /reset/gov_uk` is byte-identical pre/post (md5 `f6931b6c…`), and the state is identical after `docker restart`.

## Notes

- `govuk-frontend.min.css` is patched only with one `sed` that rewrites `url(/assets/...)` → relative paths, so they resolve through Flask's `/static/`.
- `.assets-revision` still points at `main`; it will be bumped to the HF merge SHA after that PR is reviewed.
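The single `sed` patch in the notes rewrites absolute `url(/assets/...)` references into relative ones. A Python equivalent might look like the sketch below. The `../` prefix is an assumption based on the folder layout (stylesheet in `static/css/`, fonts and images in sibling `static/fonts/` and `static/images/`), and `relativize_asset_urls` is a hypothetical name, not the author's actual command.

```python
import re

# Matches url(/assets/...), url('/assets/...') and url("/assets/...").
# Group 1 keeps any opening quote so the rewritten reference stays well-formed.
_ASSET_URL = re.compile(r"""url\((['"]?)/assets/""")


def relativize_asset_urls(css: str, prefix: str = "../") -> str:
    """Rewrite absolute /assets/ URLs to prefix-relative ones.

    Assumes the stylesheet lives in static/css/ next to static/fonts/ and
    static/images/, so "/assets/fonts/x.woff2" becomes "../fonts/x.woff2".
    """
    # A function replacement avoids backslash-escaping issues in `prefix`.
    return _ASSET_URL.sub(lambda m: f"url({m.group(1)}{prefix}", css)
```

Because the browser resolves `../fonts/...` relative to the stylesheet's own URL (`/static/css/...`), the fonts and images end up being requested from Flask's `/static/` tree. URLs outside `/assets/` are left untouched.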