llms-txt-freshness: how should coverage handle intentionally excluded pages? #46

@mvvmm

The llms-txt-freshness check compares llms.txt URLs against the sitemap to measure coverage, with thresholds at 95% (pass) and 80% (warn).
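
For readers unfamiliar with the check, here is a minimal sketch of the comparison described above. It is not the library's actual code: the function name `coverage_status`, the trailing-slash normalization, the "fail" label below the warn threshold, and the `docs.example.com` URLs are all assumptions made for illustration.

```python
def coverage_status(llms_urls, sitemap_urls, pass_at=0.95, warn_at=0.80):
    """Return (coverage, status) for llms.txt URLs measured against the sitemap.

    Hypothetical helper: the real check may normalize URLs differently.
    """
    # Assumption: URLs are compared after stripping a trailing slash.
    sitemap = {u.rstrip("/") for u in sitemap_urls}
    listed = {u.rstrip("/") for u in llms_urls}
    if not sitemap:
        return 1.0, "pass"  # assumption: an empty sitemap has nothing to cover
    coverage = len(sitemap & listed) / len(sitemap)
    if coverage >= pass_at:
        return coverage, "pass"
    if coverage >= warn_at:
        return coverage, "warn"
    return coverage, "fail"  # assumption: anything below warn is a failure


# Placeholder URLs: one sitemap page is deliberately absent from llms.txt.
sitemap_urls = [
    "https://docs.example.com/guide/",
    "https://docs.example.com/api/",
    "https://docs.example.com/faq/",
    "https://docs.example.com/reference/",
]
llms_urls = sitemap_urls[:3]

print(coverage_status(llms_urls, sitemap_urls))  # (0.75, 'fail')
```

In this sketch every sitemap URL counts toward the denominator, so each intentionally omitted page lowers the score.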

For our Cloudflare docs, I'm intentionally omitting certain pages that exist in our sitemap from llms.txt because they have no value for agents. For example, https://developers.cloudflare.com/workers/reference/ is a directory listing page: it contains a few links to other pages but no substantive content. I believe it belongs in the sitemap (it's a real page on the site) but not in llms.txt, since the only content on the page, links to other pages, is already covered by llms.txt.

In our internal audits, I'm currently working around this by hardcoding lower thresholds (pass ≥ 75%, warn ≥ 60%) in a local patch, but I'm not sure what the right universal solution is. Does it make sense to treat every sitemap page as something that should be in llms.txt? If not, what's a reasonable, universal way to determine which pages should and shouldn't be included?
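
To make the question concrete, here is one possible direction, sketched only for discussion: remove intentionally excluded pages from the denominator instead of lowering the thresholds. Nothing here exists in the library today; `coverage_with_exclusions`, the `excluded_urls` parameter, and the `docs.example.com` URLs are hypothetical.

```python
def coverage_with_exclusions(llms_urls, sitemap_urls, excluded_urls):
    """Coverage of llms.txt over the sitemap, ignoring excluded pages.

    Hypothetical sketch: how exclusions would be declared is an open question.
    """
    def norm(url):
        # Assumption: URLs are compared after stripping a trailing slash.
        return url.rstrip("/")

    excluded = {norm(u) for u in excluded_urls}
    # Pages that are expected in llms.txt: the sitemap minus the exclusions.
    expected = {norm(u) for u in sitemap_urls} - excluded
    if not expected:
        return 1.0  # assumption: nothing left to cover counts as full coverage
    listed = {norm(u) for u in llms_urls}
    return len(expected & listed) / len(expected)


# Placeholder URLs: the directory listing page is omitted from llms.txt on purpose.
site_pages = [
    "https://docs.example.com/guide/",
    "https://docs.example.com/api/",
    "https://docs.example.com/faq/",
    "https://docs.example.com/reference/",
]
llms_pages = site_pages[:3]
listing_pages = ["https://docs.example.com/reference/"]

print(coverage_with_exclusions(llms_pages, site_pages, []))             # 0.75
print(coverage_with_exclusions(llms_pages, site_pages, listing_pages))  # 1.0
```

This keeps the 95% and 80% thresholds meaningful, but it moves the problem to deciding who maintains the exclusion list and how, which is the open question.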

Open to discussion and whatever direction makes sense for the library.
