web-archive.txt is an open specification for web archive interoperability, defining a human- and machine-readable plain-text descriptor published via the Well-Known URI pattern (RFC 8615). It gives web archives a consistent way to publish their identity, scope and access interfaces, and lets clients discover and verify their programmatic endpoints.
The web archiving ecosystem is diverse and resilient but appears as a constellation of “siloed nodes”. Identity, scope, governance and access are often opaque and rely on ad hoc knowledge. As systems evolve, downstream aggregators and web archive indexes degrade, concentrating visibility and use in a small number of web archives. We developed web-archive.txt to address this by enabling each web archive to publish a standard descriptor, preserving diversity while making differences explicit in a structured, human- and machine-readable format and supporting interoperability through comparable metadata.
web-archive.txt descriptors are written in TOML, prioritising human readability while remaining fully machine-parseable. Native support for comments allows web archives to include additional implementation context without affecting interoperability.
The following is a complete example web-archive.txt descriptor for the UK Government Web Archive (UKGWA). The same file is located in the repository root:
version = "0.1"
last_updated = "2026-04-23"
[archive]
id = "ukgwa"
name = [ "UK Government Web Archive", { alt = "UKGWA" } ]
established = "1996"
website = "https://www.nationalarchives.gov.uk/webarchive/"
email = "webarchive@nationalarchives.gov.uk"
[archive.organisation]
name = [ "The National Archives", { alt = "TNA" } ]
type = "national_archive"
location = [ "GB" ]
website = "https://www.nationalarchives.gov.uk/"
[archive.scope]
crawl = [ "selective", "event", "thematic" ]
authority = { type = "legal_deposit", documentation = "https://www.legislation.gov.uk/ukpga/Eliz2/6-7/51/contents" }
coverage = "1996-01-01/.."
domains = [ ".gov.uk" ] # All UK government departments and bodies creating records defined as Public Records under the British Public Act
[api]
documentation = "https://webarchive.nationalarchives.gov.uk/ukgwa/help/"
rate_limit = false # IP address whitelisting can be requested for high-volume access
[api.memento]
timemap = { endpoint = "https://webarchive.nationalarchives.gov.uk/ukgwa/timemap/json/{url}", access = "online" }
timegate = { endpoint = "https://webarchive.nationalarchives.gov.uk/ukgwa/{datetime}/{url}", access = "online" }
[api.cdx]
query = { endpoint = "https://webarchive.nationalarchives.gov.uk/ukgwa/cdx?url={url}", access = "online" }
[replay]
rewritten = "https://webarchive.nationalarchives.gov.uk/ukgwa/{datetime}/{url}"
no_toolbar = "https://webarchive.nationalarchives.gov.uk/ukgwa/nobanner/{datetime}/{url}"
raw = "https://webarchive.nationalarchives.gov.uk/ukgwa/{datetime}id_/{url}"
A conforming web archive SHOULD publish its descriptor in accordance with RFC 8615, Well-Known URIs:
/.well-known/web-archive.txt
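Under RFC 8615, well-known URIs sit at the root of an origin's path, so a client can derive the descriptor location from any URL on the archive's host. The sketch below assumes this is how a client would locate the descriptor; the function name `descriptor_url` is hypothetical. It builds the URL only and does not imply that a descriptor is actually served at that location.

```python
from urllib.parse import urlsplit, urlunsplit

# Path defined by this specification.
WELL_KNOWN_PATH = "/.well-known/web-archive.txt"


def descriptor_url(any_archive_url: str) -> str:
    """Return the well-known descriptor URL for the origin of the given URL.

    Hypothetical helper: keeps scheme and host (including any port),
    and replaces the path, query and fragment.
    """
    parts = urlsplit(any_archive_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {any_archive_url!r}")
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


print(descriptor_url("https://webarchive.nationalarchives.gov.uk/ukgwa/help/"))
```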
Web archives MAY additionally advertise the descriptor using an HTTP Link header:
Link: </.well-known/web-archive.txt>; rel="service-desc"
A reference validator is provided to verify conformance of a web-archive.txt descriptor with the specification schema. It validates TOML structure, applies versioned schema rules, reports errors with location information and can verify declared API endpoints.
npm i @overbrowsing/web-archive.txt
Commands for validating:
- Validate the example web-archive.txt descriptor for the UK Government Web Archive (UKGWA), located in the repository root:
  npx validate
- Validate a target web-archive.txt using the latest schema version:
  npx validate path/to/web-archive.txt
- Validate a target web-archive.txt against a specific version of the specification (e.g. v0.1):
  npx validate path/to/web-archive.txt --0.1
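Once a descriptor validates, a consumer typically needs to turn its endpoint templates into concrete request URLs. The following is a sketch only: the `expand` function is hypothetical, and it assumes that `{url}` is substituted verbatim (no percent-encoding) and that `{datetime}` takes a 14-digit `YYYYMMDDhhmmss` timestamp, as is common in Wayback-style URLs. The substitution rules shown are assumptions, not the specification's or the validator's.

```python
from datetime import datetime
from typing import Optional

# Templates copied from the UKGWA example descriptor in this document.
CDX_QUERY = "https://webarchive.nationalarchives.gov.uk/ukgwa/cdx?url={url}"
REPLAY_RAW = "https://webarchive.nationalarchives.gov.uk/ukgwa/{datetime}id_/{url}"


def expand(template: str, url: str, when: Optional[datetime] = None) -> str:
    """Fill the {datetime} and {url} placeholders of an endpoint template.

    Assumptions (not confirmed by the specification): the target URL is
    inserted as-is, and the timestamp uses 14 digits (YYYYMMDDhhmmss).
    """
    if "{datetime}" in template:
        if when is None:
            raise ValueError("template requires a datetime")
        template = template.replace("{datetime}", when.strftime("%Y%m%d%H%M%S"))
    return template.replace("{url}", url)


cdx_request = expand(CDX_QUERY, "https://www.gov.uk/")
raw_replay = expand(REPLAY_RAW, "https://www.gov.uk/", datetime(2020, 1, 2, 3, 4, 5))
print(cdx_request)
print(raw_replay)
```

A real client should confirm the expected timestamp format and any encoding requirements against the archive's own API documentation before issuing requests.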
We are currently developing a bootstrap registry of known web archives, starting with IIPC members. As web archives publish their own web-archive.txt descriptors, this registry will be progressively updated to reflect authoritative, source-published records.
Developed by the Overbrowsing Research Group at the Institute for Design Informatics, The University of Edinburgh.
If you use, implement, or reference this specification, please cite it as 'web-archive.txt' and include clear attribution in publications, software, or documentation where appropriate.
web-archive.txt is licensed under Apache 2.0. For full licensing details, see the LICENSE file.
