RFC 089: Identifiers API #156
Open
kenoir wants to merge 22 commits into
Read-only canonical <-> source identifier translation over the RFC 083 ID Registry, for the IIIF/DDS (RFC 085) and requesting (RFC 088) consumers. Carries the OpenAPI contract alongside the RFC (openapi.yaml + a rendered openapi.md via a small uv project, following the RFC 088 pattern) so the proposal stands alone without the private prototype repository. Covers the contract, AWS architecture, API-key auth + usage-plan metering, the caching topology, and the live-data findings (folio-instance aliases present; folio-item-id absent, so the requesting translation has no data yet). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The concern is the cost of database (Aurora) queries, not policing a per-consumer billing quota. That inverts the caching strategy: an edge (CloudFront) cache that serves hits without touching the database is now preferred, rather than rejected for breaking metering. Recasts the per-consumer story as API keys for identity / cost attribution plus a throttle as a database safety valve, and reorients the caching open question toward hit-ratio, throttle sizing and cost attribution. Also removes the detailed real-data-findings list (kept in the prototype docs), leaving a one-line pointer. Contract unchanged (openapi.yaml/openapi.md untouched). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces em dashes with plain punctuation throughout and tones down a few
flourishes ("single translation membrane", "exactly the win", "evaporates").
Directional notation (the migration and lookup arrows) is kept. No change to the
contract, the decisions, or the meaning.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Removes the decision-log section and its table of contents entry, and renames the contract-summary heading to "API Contract" (the separate "API contract (OpenAPI)" section is unchanged). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Avoids two similarly-named sections after "The contract" became "API Contract". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
agnesgaroux
reviewed
Jun 19, 2026
The reverse-lookup 200 response is oneOf [CanonicalIdRef, IdentifierSet] but render_docs.py only handled single $ref bodies, so the generated table showed 'n/a'. Render oneOf/anyOf as the alternatives joined by an escaped pipe (allOf as an intersection) and regenerate openapi.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
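The rendering rule described in this commit can be sketched as below. This is a minimal illustration, not the code in render_docs.py: the function name `schema_label`, the `" & "` separator for allOf and the `"n/a"` fallback are assumptions made for the example.

```python
def schema_label(schema: dict) -> str:
    """Return a short label for a response schema, safe for a Markdown table cell.

    Hypothetical helper: names and separators are illustrative only.
    """
    if "$ref" in schema:
        # "#/components/schemas/IdentifierSet" -> "IdentifierSet"
        return schema["$ref"].rsplit("/", 1)[-1]
    if "oneOf" in schema or "anyOf" in schema:
        alternatives = schema.get("oneOf") or schema.get("anyOf")
        # An unescaped "|" would be read as a column separator in a Markdown table.
        return " \\| ".join(schema_label(s) for s in alternatives)
    if "allOf" in schema:
        # allOf is an intersection, so join with a different separator (assumed "&").
        return " & ".join(schema_label(s) for s in schema["allOf"])
    return schema.get("type", "n/a")


reverse_lookup_200 = {
    "oneOf": [
        {"$ref": "#/components/schemas/CanonicalIdRef"},
        {"$ref": "#/components/schemas/IdentifierSet"},
    ]
}
print(schema_label(reverse_lookup_200))
```

With a body that is a single $ref the helper behaves as before, so existing rows in the generated table would be unaffected under this sketch.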
A reviewer asked whether '400 ... unsupported enum value' contradicts the rule that an unknown open-set sourceSystem yields 404. It does not: type is the only enum-constrained parameter. Name it explicitly to remove the ambiguity. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… UUID) The Sierra item number is the predecessor that lets the new FOLIO item UUID inherit the existing canonical id, matching the work-level pattern. The text had the direction reversed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…and) Record the decision rather than leaving it open: SourceIdentifier.type is scoped to the three catalogue-entity types the API needs (Work/Image/Item) and the enum is extended on demand, rather than modelling the full registry. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…question Records that digitisation metadata ingestion fetches mostly unique ids while the Items API is more likely to repeat requests, grounding the hit-ratio sub-question in the concrete clients paul-butcher raised in review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…en question Records paul-butcher's specific-sibling include idea (?include=sierra-system-number) as a related projection to settle alongside the bare-value reverse lookup: more cacheable, but only in the immutable new-to-old direction, returns a filtered set given the one-to-many registry, and must return canonical with a 200 when the requested sibling is absent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the Lambda arch row from the decisions table and the two inline ARM64 references; we don't pin Lambda architecture elsewhere in the estate, so it doesn't belong as a decision here. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Records agnesgaroux's review point that a bare value is not only expensive to index but can be genuinely ambiguous: the same SourceId can appear under different source systems and resolve to different canonical ids, so a bare-value query may have to return multiple candidates or force disambiguation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
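The ambiguity can be illustrated with a small in-memory sketch. The rows, the source system names (`system-a`, `system-b`) and the canonical ids are invented for the example and do not come from the registry.

```python
from typing import Optional

# Hypothetical registry rows: (sourceSystem, sourceId, canonicalId).
ROWS = [
    ("system-a", "12345", "canon-one"),
    ("system-b", "12345", "canon-two"),  # same bare value, different system
    ("system-a", "67890", "canon-three"),
]


def lookup_qualified(source_system: str, source_id: str) -> Optional[str]:
    """Lookup keyed on (sourceSystem, sourceId): at most one canonical id."""
    for system, value, canonical in ROWS:
        if system == source_system and value == source_id:
            return canonical
    return None


def lookup_bare(source_id: str) -> list[str]:
    """Lookup on the bare value alone: may return several candidates."""
    return sorted({canonical for _, value, canonical in ROWS if value == source_id})


print(lookup_qualified("system-a", "12345"))
print(lookup_bare("12345"))
```

In this sketch the qualified lookup resolves to a single canonical id, while the bare lookup has to return both candidates or ask the caller to disambiguate.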
Records agnesgaroux's review point: FOLIO records carry both a UUID and an HRID, so confirm which the OAI-PMH feed delivers. The registry can hold both forms per item, so the Minter could record both and this API would serve HRID <-> UUID translation, but okapi resolves the two natively so storing both is an optimisation, not a requirement. Decision sits with the catalogue-pipeline workstream. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lift the two parallel Context concepts (canonical-first principle, the two consumers) to subheadings, and rephrase the remaining standalone bold sentence-starters (service boundary, schema finding, edge caching, freshness) into formal prose. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decide the isAlias vs obsolete question: the API exposes the full sibling set with isAlias and does not model an obsolete flag. Records that the two are different axes (in the Sierra->FOLIO migration the isAlias=false original is the retired id and the isAlias=true alias is the live one), so this is a decision not to model retired-ness rather than a claim that isAlias encodes it. Drops the now settled item from the RFC 085 contract-edges next step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…demand) Reword Q6 from an open question into a recorded decision, matching Q5. Soften the Open-questions intro now that two items are settled, and drop the type enum from the RFC 085 contract-edges next step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Settle Q7: do not hoist a convenience top-level type. A single top-level value would have to pick one row's type for a mixed-type canonical id and could contradict the others, so the per-row representation is kept and the mixed-type ambiguity is left to consumers rather than resolved here. Drop Q7 from the RFC 085 next step, leaving only the bare-value lookup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Narrow Q4 against RFC 085's actual requirement (WorkID-level identity lookup, sourceSystem as an optional qualifier, full sibling-set response, no new shape) and decide not to add the unqualified bare form now: sourceSystem stays a required key component, added only if a consumer explicitly requires it, the cost (secondary SourceId index plus cross-system ambiguity) being the reason. The related specific-sibling include is deferred on the same basis. Drop the now resolved bare-value item from the next steps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reverse open question 7: add a convenience top-level `type` to IdentifierSet, populated from the original (the single isAlias=false) row, so consumers can read a canonical id's type without scanning the set. Exactly one row is isAlias=false, so the source is unambiguous; with cross-type predecessors the top-level value reflects the original and may differ from a later alias. Update the spec (add the property, required, enum), regenerate openapi.md, and add the field to the README example and field docs. Also a formatting pass on the open questions: prefix the resolved items (4-7) with Decided and reword the intro, so it is clear at a glance which questions carry a decision. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
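A minimal sketch of how the top-level `type` could be derived from the sibling set, assuming each row is a dict with `type` and `isAlias` keys. The function name, the example rows and the error handling are assumptions for illustration, not the service's implementation.

```python
def top_level_type(identifiers: list[dict]) -> str:
    """Return the type of the original row (the single isAlias=false row)."""
    originals = [row for row in identifiers if not row["isAlias"]]
    if len(originals) != 1:
        # The commit relies on exactly one original per canonical id;
        # anything else is treated here as a data error (an assumption).
        raise ValueError(f"expected exactly one original row, found {len(originals)}")
    return originals[0]["type"]


# Illustrative sibling set with a cross-type predecessor: the top-level value
# reflects the original and differs from the later alias.
siblings = [
    {"sourceSystem": "system-a", "type": "Work", "isAlias": False},
    {"sourceSystem": "system-b", "type": "Image", "isAlias": True},
]
print(top_level_type(siblings))
```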
Preview
View the rendered RFC on this branch:
What does this change?
The proposal is a small, read-only Identifiers API. Wellcome Collection gives every catalogue thing (a work, an image, an item) a stable public "canonical" id, and keeps a registry recording which underlying source ids that canonical id was built from. This API does one job: given a canonical id it returns the source id(s) behind it, and given a source id it returns the canonical id (optionally with its siblings). It only ever reads that registry; it never creates or changes ids.
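The two read-only lookups can be sketched against an in-memory stand-in for the registry. Everything here is illustrative: the class and function names, the canonical id and the source id values are assumptions, and the real service reads the RFC 083 ID Registry rather than a dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceIdentifier:
    source_system: str
    source_id: str
    type: str       # "Work", "Image" or "Item"
    is_alias: bool  # False for the original, True for an inherited alias


# Hypothetical registry content: canonical id -> the source ids behind it.
REGISTRY = {
    "a2239muq": [
        SourceIdentifier("sierra-system-number", "b1234567", "Work", False),
        SourceIdentifier("folio-instance", "11111111-2222-3333-4444-555555555555", "Work", True),
    ],
}


def sources_for(canonical_id: str) -> Optional[list[SourceIdentifier]]:
    """Forward lookup: canonical id -> source ids, or None if unknown."""
    return REGISTRY.get(canonical_id)


def canonical_for(source_system: str, source_id: str) -> Optional[str]:
    """Reverse lookup: (source system, source id) -> canonical id, or None."""
    for canonical_id, identifiers in REGISTRY.items():
        for ident in identifiers:
            if (ident.source_system, ident.source_id) == (source_system, source_id):
                return canonical_id
    return None


print(canonical_for("sierra-system-number", "b1234567"))
print([s.source_system for s in sources_for("a2239muq")])
```

Neither function writes to the registry, which mirrors the read-only scope described above.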
It exists because of the Sierra/CALM to FOLIO/Axiell migration. As records move between systems a single canonical id accumulates several source ids over time (an original plus inherited "predecessor" aliases), and a couple of internal services sit right at the edges where that translation has to happen: the IIIF viewer needs to turn old b-numbers and CALM refs into the canonical id it presents under, and requesting needs to turn a canonical item id into the FOLIO UUID a hold is placed on, and back again. The guiding principle is that everything public speaks canonical and source ids only appear at those two edges (ingest and the FOLIO boundary). Rather than have each consumer re-derive the mapping or query the catalogue by source id, this API is the single shared place that translation lives. Because the main running cost is database queries, it is also the natural place to cache aggressively (at the edge, to keep requests off the database) and to attribute that database cost to the consumers driving it.
How it relates to the other RFCs:
The RFC is written to stand on its own and carries the API contract alongside it, so it can be reviewed without access to the closed discovery/prototype repository where the working prototype lives.
Files added:
- rfcs/089-identifiers-api/README.md: the RFC document.
- rfcs/README.md: refreshed RFC listing table (RFC 089 row added).
- rfcs/089-identifiers-api/openapi.yaml: the OpenAPI 3.0 spec for the two lookup operations (the source of truth).
- rfcs/089-identifiers-api/openapi.md: generated human-readable rendering of the spec, browsable on GitHub without a Swagger/Redoc renderer.
- render_docs.py, pyproject.toml, .python-version, .gitignore, uv.lock: a self-contained uv project that validates openapi.yaml and regenerates openapi.md.

How to test
- Read rfcs/089-identifiers-api/README.md and review the contract, architecture, caching strategy and open questions.
- Run .scripts/validate_rfc.py.
- Run .scripts/create_table_summary.py --check-readme.
- Run uv run python render_docs.py: this validates openapi.yaml with the OpenAPI spec validator and rewrites openapi.md.

How can we measure success?
No measurable runtime success criteria; this is a documentation RFC. Success is the RFC being reviewed and providing a clear, self-contained contract and architecture that the team can align on, and a decision on whether this service is the access mechanism for identifier translation in RFC 088.
Have we considered potential risks?