AI-readable docs: llms.txt, .md content negotiation, and CloudFront/Lambda delivery #328

@Nonnyjoe

Overview

This issue tracks the full rollout of an AI/LLM readability layer on top of the Cartesi docs site. It covers static file generation at build time, HTTP content negotiation, CDN cache tuning, and the CloudFront + Lambda infrastructure that wires it all together.


.md Content Generation (build time)

  • Every doc page gets a .md file written alongside its HTML at build time, by the postBuild hook in plugins/serve-markdown.js
  • Generated-index pages (section indexes with no markdown source) also get a .md file — postBuild walks outDir, reads <title> from index.html, and writes a minimal .md
  • An llms directive pointing back to llms.txt is prepended to every generated .md file
  • llms.txt built with correct base URL per branch via DOCS_DOMAIN env var
  • llms-full.txt — all docs concatenated into a single file for bulk ingestion
  • All llms.txt links corrected to end with .md
  • Dead links removed from llms.txt
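
As a rough illustration of the generated-index step above, the sketch below derives a minimal .md body from a page's index.html. The helper names and the wording of the llms directive are assumptions made for this example; they are not taken from plugins/serve-markdown.js.

```javascript
// Sketch only: helper names and directive wording are hypothetical,
// not the actual code in plugins/serve-markdown.js.

// Return the text of the first <title> element, with whitespace collapsed.
// HTML entities are left undecoded in this sketch.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].replace(/\s+/g, " ").trim() : null;
}

// Build the minimal markdown for a page that has no markdown source.
function buildIndexMarkdown(html, llmsTxtUrl) {
  const title = extractTitle(html) || "Untitled";
  // Assumed directive format: a blockquote pointing back to llms.txt.
  const directive = `> For the complete documentation index, see ${llmsTxtUrl}`;
  return `${directive}\n\n# ${title}\n`;
}
```

In a real postBuild hook, the HTML string would come from reading each index.html under outDir, and the result would be written next to it as a .md file.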

Accept: text/markdown Content Negotiation

  • Dev server: a GET /page.md request and a GET /page request with an Accept: text/markdown header both return raw markdown
  • Dev server: generated-index pages handled via async HTML fetch + <title> extraction
  • Lambda (docs-markdown-negotiation): rewrites Accept: text/markdown requests to .md paths, proxies to Amplify origin
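
The path rewrite described above can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the code of the docs-markdown-negotiation Lambda: the function names are hypothetical, q-values in the Accept header are ignored, and the site root is assumed to map to /index.md.

```javascript
// Sketch only: names are hypothetical; q-values are ignored.

// True when the Accept header lists text/markdown among its media types.
function wantsMarkdown(acceptHeader) {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .some((part) => part.split(";")[0].trim().toLowerCase() === "text/markdown");
}

// Map a request path to the path that should be fetched from the origin.
function resolveOriginPath(path, acceptHeader) {
  // Paths that already end in a file extension (.md, .js, .css, ...) pass through.
  const hasExtension = /\.[a-z0-9]+$/i.test(path);
  if (hasExtension || !wantsMarkdown(acceptHeader)) return path;
  const trimmed = path.replace(/\/+$/, ""); // drop trailing slashes
  // Assumption: the site root is served from /index.md.
  return trimmed === "" ? "/index.md" : `${trimmed}.md`;
}
```

Keeping the rewrite separate from the proxying code makes it easy to share between the dev server and the Lambda, and to test without a network. A route whose last segment contains a dot (for example a version number) would be treated as a file here, so a real implementation may need a stricter check.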

CDN Cache Tuning

  • HTML pages: public, max-age=0, s-maxage=3500, must-revalidate — deployments propagate within the hour
  • JS / CSS / fonts (assets/**): public, max-age=31536000, immutable; safe to cache for the full year because filenames are content-addressed, so changed content ships under a new URL
  • .md files: public, max-age=0, s-maxage=3500, must-revalidate with Content-Type: text/markdown; charset=utf-8
  • llms.txt / llms-full.txt: public, max-age=0, s-maxage=3500, must-revalidate so index updates are visible promptly
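
The cache rules above can be summarised as a single lookup from request path to response headers. The function below is only an illustration of that mapping; the name cacheHeadersFor is hypothetical, and the real rules may well live in hosting or CDN configuration rather than in code.

```javascript
// Sketch only: cacheHeadersFor is a hypothetical helper mirroring the list above.
const REVALIDATE = "public, max-age=0, s-maxage=3500, must-revalidate";
const IMMUTABLE = "public, max-age=31536000, immutable";

function cacheHeadersFor(path) {
  // Content-addressed build assets: cache for a year, never revalidate.
  if (path.startsWith("/assets/")) {
    return { "Cache-Control": IMMUTABLE };
  }
  // Generated markdown: short CDN TTL plus an explicit markdown content type.
  if (path.endsWith(".md")) {
    return {
      "Cache-Control": REVALIDATE,
      "Content-Type": "text/markdown; charset=utf-8",
    };
  }
  // HTML pages, llms.txt and llms-full.txt share the short CDN TTL.
  return { "Cache-Control": REVALIDATE };
}
```

With s-maxage=3500 the CDN holds a copy for just under an hour, while max-age=0 makes browsers revalidate on every request.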

Infrastructure

  • Amplify Gen 2 backend wires Lambda and CloudFront together via CDK
  • CloudFront distribution (MarkdownNegotiationStack) deployed with Lambda Function URL as origin
  • DOCS_DOMAIN defaults to staging.docs.cartesi.io when unset — staging deploys work out of the box
  • DNS cutover — staging.docs.cartesi.io CNAME → staging CloudFront distribution
  • ACM certificate (us-east-1) issued for docs.cartesi.io
  • DNS cutover — docs.cartesi.io CNAME → production CloudFront distribution (main branch)
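
To make the DOCS_DOMAIN fallback concrete, here is a small sketch of how a build script might resolve the base URL and form llms.txt links that end in .md. The helper names are hypothetical and the https scheme is an assumption; only the variable name and its staging default come from this issue.

```javascript
// Sketch only: helper names are hypothetical; the default comes from this issue.
const DEFAULT_DOCS_DOMAIN = "staging.docs.cartesi.io";

// Resolve the base URL from an environment object (process.env in a real build).
function resolveBaseUrl(env) {
  const domain = (env.DOCS_DOMAIN || "").trim() || DEFAULT_DOCS_DOMAIN;
  return `https://${domain}`; // assumption: docs are always served over https
}

// Build an llms.txt link for a non-root doc route, always ending in .md.
function llmsLink(baseUrl, route) {
  const clean = route.replace(/\/+$/, "").replace(/\.md$/, "");
  return `${baseUrl}${clean}.md`;
}
```

A production build would set DOCS_DOMAIN=docs.cartesi.io so that every link in llms.txt points at the production host.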

Verification

  • curl -H "Accept: text/markdown" https://staging.docs.cartesi.io/cartesi-rollups/overview returns markdown
  • https://staging.docs.cartesi.io/cartesi-rollups/overview.md resolves directly
  • https://staging.docs.cartesi.io/llms.txt shows staging.docs.cartesi.io URLs
  • Same three checks pass on docs.cartesi.io after production cutover
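
The three checks above could be scripted so the same run works against staging and production. The sketch below is one possible shape, assuming Node 18+ for the global fetch. The function names are hypothetical, and treating a text/markdown Content-Type as proof that a response is markdown is a simplification.

```javascript
// Sketch only: buildChecks and runChecks are hypothetical helpers.

// Describe the three checks from the Verification list for one domain.
function buildChecks(domain) {
  const page = `https://${domain}/cartesi-rollups/overview`;
  const isMarkdown = (contentType) =>
    contentType.toLowerCase().startsWith("text/markdown");
  return [
    { name: "Accept negotiation", url: page,
      headers: { Accept: "text/markdown" }, passes: isMarkdown },
    { name: "Direct .md URL", url: `${page}.md`,
      headers: {}, passes: isMarkdown },
    { name: "llms.txt base URL", url: `https://${domain}/llms.txt`,
      headers: {}, passes: (_type, body) => body.includes(`https://${domain}/`) },
  ];
}

// Run every check and report pass or fail; fetchImpl can be swapped in tests.
async function runChecks(domain, fetchImpl = fetch) {
  const results = [];
  for (const check of buildChecks(domain)) {
    const res = await fetchImpl(check.url, { headers: check.headers });
    const body = await res.text();
    const contentType = res.headers.get("content-type") || "";
    results.push({ name: check.name, passed: res.ok && check.passes(contentType, body) });
  }
  return results;
}
```

Calling runChecks("staging.docs.cartesi.io") before the cutover and runChecks("docs.cartesi.io") afterwards would cover both bullets with one script.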
