feat: improve AI agent discoverability #8607

Merged
harsh62 merged 8 commits into main from agent-readiness
Jul 3, 2026

@harsh62 harsh62 commented Jun 29, 2026

Contributor

Summary

Improves how AI agents and crawlers discover and consume the Amplify docs, implemented entirely within what the static export + Amplify Hosting can serve today. These changes came out of running the site through an "agent readiness" scan and fixing every gap that has a legitimate, non-misleading solution.

All additions point at content or capabilities that actually exist (the generated llms.txt/markdown exports, the real awslabs/agent-plugins skill, AWS's public managed MCP server, and read-only browser tools backed by real data) — no stub endpoints or fabricated capabilities.

What's included

Discoverability

  • Content Signals in robots.txt (Content-Signal: search=yes, ai-input=yes, ai-train=yes).
  • Link response headers (customHttp.yml, RFC 8288) advertising the API catalog, agent skills index, MCP server card, and the llms.txt index.
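As a rough illustration of the Link header bullet above, the sketch below formats a list of targets into a single RFC 8288 header value. The helper name formatLinkHeader, the /llms.txt path, and the describedby relation are assumptions made for the example; the PR text does not show the actual customHttp.yml values.

```typescript
// Hypothetical helper: formats link targets into one RFC 8288 Link header value.
interface LinkTarget {
  href: string; // target URI, written between angle brackets
  rel: string; // link relation type
  type?: string; // optional media-type hint
}

function formatLinkHeader(targets: LinkTarget[]): string {
  return targets
    .map(({ href, rel, type }) => {
      const params = [`rel="${rel}"`];
      if (type) params.push(`type="${type}"`);
      return `<${href}>; ${params.join('; ')}`;
    })
    .join(', ');
}

// Example input. "api-catalog" is the relation defined by RFC 9727; the
// llms.txt path and its relation are assumed for illustration only.
const exampleLinkHeader = formatLinkHeader([
  { href: '/.well-known/api-catalog', rel: 'api-catalog', type: 'application/linkset+json' },
  { href: '/llms.txt', rel: 'describedby', type: 'text/plain' }
]);
```

Multiple link-values are comma-separated inside one header, so the four advertised resources can share a single Link entry.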

Agent discovery files (generated at build time, alongside robots.txt/sitemap.xml)

  • /.well-known/api-catalog (RFC 9727 linkset, application/linkset+json) with the required relations mapped to the artifacts the build produces: service-desc → llms-full.txt, service-doc → llms.txt, service-meta → sitemap.xml.
  • /.well-known/agent-skills/index.json (Agent Skills Discovery RFC v0.2.0) advertising the real amplify-workflow skill from awslabs/agent-plugins. Entries point at the docs/marketplace install page, so no sha256 is published (the page is install guidance, not a downloadable artifact).
  • /.well-known/mcp/server-card.json describing the public, no-auth AWS Knowledge MCP Server (https://knowledge-mcp.global.api.aws), which authoritatively indexes Amplify docs. An honest pointer to AWS's managed server — the card explicitly states this site does not host its own MCP server.
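To make the api-catalog relation mapping concrete, here is a minimal sketch of a linkset builder, assuming the linkset JSON layout in which each relation is an array of { href, type } objects. The function name buildApiCatalog, the file paths under the domain root, and the media types are assumptions; the PR's real generator may differ.

```typescript
// One target object inside a linkset relation array.
interface LinksetTarget {
  href: string;
  type: string;
}

interface ApiCatalog {
  linkset: Array<{
    anchor: string;
    'service-desc': LinksetTarget[];
    'service-doc': LinksetTarget[];
    'service-meta': LinksetTarget[];
  }>;
}

// Hypothetical builder. Paths and media types below are illustrative guesses.
function buildApiCatalog(domain: string): ApiCatalog {
  const base = domain.replace(/\/+$/, ''); // avoid a double slash when joining
  return {
    linkset: [
      {
        anchor: `${base}/`,
        'service-desc': [{ href: `${base}/llms-full.txt`, type: 'text/plain' }],
        'service-doc': [{ href: `${base}/llms.txt`, type: 'text/plain' }],
        'service-meta': [{ href: `${base}/sitemap.xml`, type: 'application/xml' }]
      }
    ]
  };
}
```

Serializing the result with JSON.stringify would give the body to serve as application/linkset+json.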

Markdown vending (the per-page .md files are already generated under /ai/pages/**)

  • Per-page autodiscovery — each Gen2 content page's <head> emits <link rel="alternate" type="text/markdown" href="/ai/pages/….md">, reusing MarkdownMenu's getMarkdownUrl mapping and gate (skips gen1/home/overview pages that have no .md twin).
  • Correct media type: /ai/**/*.md served as text/markdown; charset=utf-8.

WebMCP (in-browser agent tools)

  • Registers read-only tools via document.modelContext (with navigator.modelContext fallback) so in-browser AI agents can call the site's key read actions:
    • get_current_page_markdown — returns the current page's generated Markdown twin
    • get_documentation_index — returns the llms.txt index
  • Both are backed by content the build already produces (real data, not stubs), feature-detected (silent no-op without WebMCP), and torn down on unmount via AbortSignal.
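The feature detection described above might look something like the following sketch. It takes the document-like and navigator-like hosts as parameters so it can run outside a browser. The ModelContextLike shape and the findModelContext name are assumptions, since WebMCP is still an evolving proposal and the PR's component code is not shown here.

```typescript
// Assumed minimal shape of a WebMCP model context (illustrative only).
interface ModelContextLike {
  registerTool(tool: unknown, options?: unknown): unknown;
}

type MaybeHost = { modelContext?: unknown } | undefined;

// Prefer the document-level context and fall back to the navigator-level one.
// Returns undefined when neither exists, so the caller can become a silent no-op.
function findModelContext(doc: MaybeHost, nav: MaybeHost): ModelContextLike | undefined {
  for (const host of [doc, nav]) {
    const candidate = host?.modelContext as Partial<ModelContextLike> | undefined;
    if (candidate && typeof candidate.registerTool === 'function') {
      return candidate as ModelContextLike;
    }
  }
  return undefined;
}
```

In a browser the two arguments would be the global document and navigator objects; when the function returns undefined the component simply registers nothing.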

Out of scope (intentionally)

Several scanned standards require live services / DNS / viewer-request edge compute that don't exist behind a static public docs site, and publishing files for them would mislead agents that trust them:

  • OAuth Protected Resource (/.well-known/oauth-protected-resource) and auth.md — both declare that the site's resources are access-controlled and tell agents how to obtain tokens to reach them. docs.amplify.aws is fully public with no protected API and no agent auth. The scanner passes on the metadata file alone, but asserting a protected resource that doesn't exist could make agents refuse to read public docs or attempt pointless token flows. Deliberately skipped.
  • Markdown content negotiation (same-URL Accept: text/markdown) — requires reading a request header at the edge. Amplify Hosting rewrites match on path/query only (confirmed by redirects.json's own validator), and the managed CloudFront distribution exposes no viewer-request function hook to this repo. The per-page <link rel="alternate"> + text/markdown content-type above is the static-friendly equivalent (agents get the markdown at a sibling URL). True same-URL negotiation needs a CloudFront Function on a fronting distribution, owned outside this repo.
  • OAuth/OIDC discovery, WebMCP for write actions, DNS-AID, Web Bot Auth, commerce protocols (x402/ACP/UCP/MPP) — no protected API, no site-side mutating actions, DNS/key infrastructure owned elsewhere, and not an e-commerce site.

Testing

  • New unit tests for the API catalog (incl. RFC 9727 service-desc/service-meta structure), MCP server card, agent skills index, and the WebMCP component (no-op without API, tool registration, real fetch on execute).
  • Full unit suite passes (291 tests); tsc --noEmit clean on changed components.
  • Verified generated output for robots.txt, api-catalog, server-card.json, and agent-skills/index.json end-to-end.

harsh62 added 3 commits June 29, 2026 10:02
Add agent-readiness signals that the static docs build can legitimately serve:

- robots.txt: add Content-Signal directive (search/ai-input/ai-train=yes)
  declaring AI content-usage preferences (contentsignals.org)
- customHttp.yml: add RFC 8288 Link headers advertising the API catalog and
  the existing llms.txt documentation index; set linkset media type
- /.well-known/api-catalog: new RFC 9727 linkset pointing agents at the
  llms.txt / llms-full.txt exports and sitemap (generated at build time,
  mirroring how robots.txt and sitemap.xml are emitted in postBuildTasks)
- add unit tests for the API catalog generator

Add /.well-known/agent-skills/index.json (Agent Skills Discovery RFC v0.2.0)
advertising the real amplify-workflow skill from awslabs/agent-plugins.

- generate-agent-skills.mjs: build-time generator emitting the index, sourcing
  name/description from the upstream SKILL.md frontmatter; url points at the
  agent-plugins docs/marketplace install page, so no sha256 digest is published
  (the page is discovery/install guidance, not a downloadable artifact)
- wire writeAgentSkillsIndex into postBuildTasks (mirrors robots/sitemap/catalog)
- customHttp.yml: advertise the index via an additional Link relation
- add unit tests for the index generator

Surface the real AWS Knowledge MCP server and make the generated per-page
markdown twins discoverable and correctly typed.

MCP server card:
- generate-wellknown.mjs: emit /.well-known/mcp/server-card.json describing the
  public, no-auth AWS Knowledge MCP server (https://knowledge-mcp.global.api.aws,
  HTTP transport, tools) which authoritatively indexes Amplify docs. The card is
  an honest pointer to AWS's managed server, not a claim that docs.amplify.aws is
  itself an MCP endpoint.
- wire writeMcpServerCard into postBuildTasks; advertise via a Link rel

Markdown vending:
- customHttp.yml: serve /ai/**/*.md as text/markdown; charset=utf-8
- Layout: inject <link rel="alternate" type="text/markdown"> into each Gen2
  content page's <head> for automatic per-page discovery, reusing MarkdownMenu's
  getMarkdownUrl mapping (now exported) and mirroring its gate (skip gen1/home/
  overview pages that have no .md twin)

- extend generate-wellknown tests for the server card

@harsh62 harsh62 requested a review from a team as a code owner June 29, 2026 20:32
harsh62 added 4 commits June 29, 2026 17:11
The api-catalog linkset entry was missing the required service-desc
relation, so validators could not recognize a machine-readable service
description. Map the relations per RFC 9727:
- service-desc: llms-full.txt (complete machine-readable export)
- service-doc:  llms.txt (documentation index)
- service-meta: sitemap.xml

Each relation is now an array of { href, type } objects per Appendix A.

Register WebMCP tools via document.modelContext (with navigator.modelContext
fallback) so in-browser AI agents can call the docs site's key read actions:

- get_current_page_markdown: returns the current page's generated Markdown twin
- get_documentation_index: returns the llms.txt documentation index

Both tools are read-only and backed by content the build already produces, so
they return real data rather than stubs. The API is feature-detected and the
component renders nothing, making it a silent no-op in browsers without WebMCP.
Tools are torn down on unmount via an AbortSignal. Mounted from Layout on the
same Gen2 content pages that have a Markdown twin.

With trailingSlash: true, Amplify Hosting 301-redirects the extensionless
path /.well-known/api-catalog to /.well-known/api-catalog/, which has no file
and returns 404 -- so the RFC 9727 catalog was unreachable at its canonical
path. Files with an extension are served directly with a 200.

- Write the catalog as api-catalog.json (extensioned, served as 200)
- Add a 200-rewrite in redirects.json mapping /.well-known/api-catalog to that
  file so the canonical path resolves in place without a redirect
- Set application/linkset+json on both paths in customHttp.yml
- Add a test asserting the 200-rewrite contract

@osama-rizk osama-rizk left a comment

Contributor

Nice, well-scoped change — solid tests and an honest scope section. No crash-level bugs; inline comments below, most-impactful first. (Checked and NOT flagging: redirects.json — the AJV validator accepts status: "200" as a string and the rule sits ahead of the /<*> catch-all, so it resolves 200 today; only nit is the test asserts existence, not ordering.)

await fs.writeFile(catalogPath, generateApiCatalog());
console.log(`api-catalog written to ${catalogPath}`);
} catch (error) {
console.error(`Error writing api-catalog to ${catalogPath}:`, error);

Swallowed write error ships a green build that advertises 404s. This catch logs and returns, so a failed write still passes. Meanwhile customHttp.yml emits a global Link header to /.well-known/api-catalog, whose links point at llms-full.txt/llms.txt. If a generator throws, agents follow a Link header to a missing file. writeSitemap/writeRobots do the same, but they aren't advertised in a response header — the blast radius is new here. Consider failing the build on write error (same applies to generate-agent-skills.mjs).
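One possible shape for the suggested fix is sketched below: the writer is injected so the example stays self-contained, and the catch block rethrows with context instead of returning. The names writeOrFail and Writer are hypothetical; in the build task the writer would presumably be a file-system write.

```typescript
// Hypothetical writer signature, injected so the sketch needs no file-system module.
type Writer = (path: string, contents: string) => Promise<void>;

async function writeOrFail(write: Writer, path: string, contents: string): Promise<void> {
  try {
    await write(path, contents);
  } catch (error) {
    // Add context, then rethrow so the calling build task rejects and the build fails
    // instead of shipping a green build that advertises a missing file.
    throw new Error(`Error writing ${path}: ${String(error)}`);
  }
}
```

Because the promise now rejects, any caller that awaits it in a post-build step propagates the failure rather than logging and continuing.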

}

-function getMarkdownUrl(route: string): string {
+export function getMarkdownUrl(route: string): string {

getMarkdownUrl doesn't strip the query string. usePathWithoutHash splits on # only, so /react/build-a-backend/auth/?foo=bar → /ai/pages/build-a-backend/auth/?foo=bar.md (404). This PR now routes this function into three consumers (<link rel="alternate">, the WebMcp fetch, and the copy/open menu), so one bad URL propagates everywhere.
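A minimal sketch of the stripping step, assuming the route-to-markdown mapping simply drops the leading platform segment and the trailing slash. The names stripQueryAndHash and toMarkdownUrl are hypothetical, and the real getMarkdownUrl mapping may handle more cases.

```typescript
// Cut the route at the first "?" or "#", whichever comes first.
function stripQueryAndHash(route: string): string {
  const cut = route.search(/[?#]/);
  return cut === -1 ? route : route.slice(0, cut);
}

// Assumed mapping for illustration: /<platform>/<path>/ -> /ai/pages/<path>.md
function toMarkdownUrl(route: string): string {
  const path = stripQueryAndHash(route).replace(/\/+$/, ''); // drop trailing slash
  const [, , ...rest] = path.split('/'); // skip the leading "" and the platform segment
  return `/ai/pages/${rest.join('/')}.md`;
}
```

With this, /react/build-a-backend/auth/?foo=bar and /react/build-a-backend/auth/#section both resolve to the same .md URL as the bare route.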

Comment thread tasks/generate-agent-skills.mjs Outdated
type: 'claude-skill',
description:
'Build and deploy full-stack web and mobile apps with AWS Amplify Gen2 (TypeScript code-first). Covers auth (Cognito), data (AppSync/DynamoDB), storage (S3), functions, APIs, and AI (Amplify AI Kit with Bedrock) across React, Next.js, Vue, Angular, React Native, Flutter, Swift, and Android.',
url: `${domain}/react/develop-with-ai/agent-plugins/`

Skill URL hardcodes /react/ for a platform-agnostic page. The discovery index is global; a docs restructure off /react/ silently publishes a 404 to agents with no error anywhere. Use a platform-neutral/canonical path.
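One way to keep the platform segment out of the generator body is a single named constant, roughly as sketched here. The constant's value and the agentPluginsUrl helper are assumptions for illustration; only the /develop-with-ai/agent-plugins/ path comes from the diff above.

```typescript
// Assumed value. The point is that the platform lives in one named place.
const CANONICAL_PLATFORM = 'react';

// Hypothetical helper that builds the skill's install-page URL.
function agentPluginsUrl(domain: string, platform: string = CANONICAL_PLATFORM): string {
  const base = domain.replace(/\/+$/, ''); // tolerate a trailing slash on the domain
  return `${base}/${platform}/develop-with-ai/agent-plugins/`;
}
```

A docs restructure would then require changing one constant, and a unit test on the helper can catch a stale value.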

Comment thread src/components/Layout/Layout.tsx Outdated
@@ -170,6 +171,14 @@ export const Layout = ({
children?.props?.childPageNodes?.length != 'undefined' &&

isOverview guard is inert. children?.props?.childPageNodes?.length != 'undefined' compares a number to the string "undefined" — always true (meant typeof … !== 'undefined'). It works today only because the > 0 clause carries the whole predicate. Pre-existing, but this PR now depends on isOverview to gate markdownUrl, so it re-exposes it.
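The difference between the two comparisons can be seen in a short sketch. Both function names are hypothetical, and the parameter is typed loosely so the inert comparison compiles.

```typescript
// The inert form: a loose comparison against the string "undefined".
// No number, and not even the value undefined, loosely equals that string,
// so this returns true for every realistic input.
function inertLengthCheck(length: unknown): boolean {
  return length != 'undefined';
}

// The intended form: check the type tag, then the count.
function hasChildPages(childPageNodes?: readonly unknown[]): boolean {
  const length = childPageNodes?.length;
  return typeof length !== 'undefined' && length > 0;
}
```

Here inertLengthCheck(undefined) and inertLengthCheck(0) both return true, which is why the > 0 clause was carrying the whole predicate.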

Comment thread src/components/WebMcp/WebMcp.tsx Outdated
* Fetch a markdown document and return its text, guarding against the SPA
* fallback returning an HTML page (e.g. a 404) instead of markdown.
*/
async function fetchMarkdown(url: string): Promise<string> {

fetchMarkdown duplicates MarkdownMenu.handleCopy. Both fetch a /ai/pages/*.md URL and reject the SPA HTML fallback with the identical regex pair (/^\s*<!doctype/i, /^\s*<html/i). Fix the fallback detection in one and the other rots. Extract a shared fetchPageMarkdown next to getMarkdownUrl — you already made that move for getMarkdownUrl.
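A shared helper along these lines would keep the fallback guard in one place. This is a sketch that reuses the regex pair quoted above; the TextFetcher type and the looksLikeHtml name are assumptions, and the fetcher is injected so the sketch does not depend on a global fetch.

```typescript
// The regex pair quoted in the comment above, kept in a single place.
const HTML_FALLBACK_PATTERNS = [/^\s*<!doctype/i, /^\s*<html/i];

function looksLikeHtml(body: string): boolean {
  return HTML_FALLBACK_PATTERNS.some((pattern) => pattern.test(body));
}

// Minimal fetch-like signature (hypothetical); a browser's fetch satisfies it.
type TextFetcher = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>;

async function fetchPageMarkdown(url: string, fetcher: TextFetcher): Promise<string> {
  const response = await fetcher(url);
  if (!response.ok) throw new Error(`Request for ${url} failed`);
  const body = await response.text();
  // Reject the SPA fallback: an HTML page served where markdown was expected.
  if (looksLikeHtml(body)) throw new Error(`Expected markdown at ${url} but received HTML`);
  return body;
}
```

Both the copy menu and the WebMCP tool could then call the same function, so a change to the detection logic lands in both at once.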

Comment thread tasks/generate-wellknown.mjs Outdated

dotenv.config({ path: './.env.custom' });

const DOMAIN = process.env.SITEMAP_DOMAIN

DOMAIN + ROOT_PATH are copy-pasted across three task files (generate-sitemap, generate-wellknown, generate-agent-skills). Change the output dir or default domain and you edit three files in lockstep. Extract a shared tasks/build-constants.mjs.

Comment thread src/components/WebMcp/WebMcp.tsx Outdated

const register = async () => {
try {
await modelContext.registerTool(

Both registerTool calls share one try. If the first rejects (e.g. a transient duplicate-name error during the abort/re-register on fast client-side nav, since both names are route-independent), the second tool never registers and the catch swallows it, leaving the page with one or zero tools. A separate try block per tool isolates the failures.
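The isolation suggested here could be sketched as a loop with one try per tool. The RegisterTool signature and the registerEach name are hypothetical stand-ins for the real modelContext.registerTool call, whose exact options are not shown in this thread.

```typescript
interface ToolDefinition {
  name: string;
}

// Hypothetical stand-in for the real registration call.
type RegisterTool = (tool: ToolDefinition) => Promise<void> | void;

// Registers every tool independently and returns the names that succeeded.
async function registerEach(register: RegisterTool, tools: ToolDefinition[]): Promise<string[]> {
  const registered: string[] = [];
  for (const tool of tools) {
    try {
      await register(tool);
      registered.push(tool.name);
    } catch {
      // A rejection for one tool is contained here; the loop moves on to the next.
    }
  }
  return registered;
}
```

Returning the list of successful names also gives tests something to assert on without inspecting the browser API.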

Comment thread customHttp.yml
# Link headers advertise agent-discovery resources (RFC 8288 / RFC 9727):
# the API catalog, the agent skills index, the MCP server card, and the
# LLM-friendly documentation index.
- key: 'Link'

The Link header sits on the global **/* block, so it rides every response (HTML, images, JSON), not just discovery routes — bytes on every request and a wider blast radius for the missing-file cases above. Worth a conscious choice vs. scoping it to the relevant paths.

- getMarkdownUrl: strip query string and hash before building the .md URL,
  so all three consumers (link rel=alternate, WebMcp, copy menu) get a valid
  URL for routes with ?query or #hash
- generators: rethrow write errors in the api-catalog, MCP server card, and
  agent-skills writers so a failed write fails the build instead of shipping a
  green build whose global Link header advertises a missing file
- WebMcp: register each tool in its own try so one rejected registration can't
  block the others; reuse the shared fetchPageMarkdown helper
- MarkdownMenu: extract shared fetchPageMarkdown (used by copy menu and WebMcp)
  so the SPA-HTML fallback guard lives in one place
- Layout: fix inert isOverview guard (typeof x !== 'undefined', not x != 'undefined')
- tasks: extract shared build-constants.mjs (DOMAIN, ROOT_PATH) and a
  CANONICAL_PLATFORM constant used for the agent-skills URL
- customHttp.yml: document the deliberate choice to keep the Link header on the
  global block (Amplify patterns are positive-match only; trailingSlash makes
  pages extensionless, so an html-only pattern would miss real page loads)
- tests: query/hash stripping, fetchPageMarkdown, isolated tool registration,
  and redirect-ordering coverage

@bobbor bobbor left a comment

Member

LGTM.

Only minor nits, none blocking. We can go forward with this.

@harsh62 harsh62 merged commit f077b6c into main Jul 3, 2026
13 checks passed
@harsh62 harsh62 deleted the agent-readiness branch July 3, 2026 15:14