The `llms-txt-freshness` check compares `llms.txt` URLs against the sitemap to measure coverage, with thresholds at 95% (pass) and 80% (warn).
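To make the discussion concrete, here is a rough sketch of how I understand the coverage calculation. This is not the library's actual code: the function name `coverage_status`, the `"fail"` label, and the treatment of an empty sitemap are my own assumptions. Only the 95% and 80% thresholds come from the check as described.

```python
def coverage_status(sitemap_urls, llms_urls, pass_at=0.95, warn_at=0.80):
    """Return (ratio, status) for the share of sitemap URLs present in llms.txt.

    Hypothetical helper for illustration; names and the "fail" label are assumed.
    """
    sitemap = set(sitemap_urls)
    if not sitemap:
        # Assumption: an empty sitemap leaves nothing to cover, so treat it as a pass.
        return 1.0, "pass"

    covered = sitemap & set(llms_urls)
    ratio = len(covered) / len(sitemap)

    if ratio >= pass_at:
        status = "pass"
    elif ratio >= warn_at:
        status = "warn"
    else:
        status = "fail"
    return ratio, status
```

With this shape, every sitemap URL counts toward the denominator, so any page deliberately left out of `llms.txt` lowers the score.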
For our Cloudflare docs, I'm intentionally omitting certain pages that exist in our sitemap from `llms.txt` because they have no value for agents. For example, https://developers.cloudflare.com/workers/reference/ is a directory listing page: it contains a few links to other pages but no substantive content. I believe it belongs in the sitemap (it's a real page on the site) but not in `llms.txt` (the content of this page, links to other pages, already exists in `llms.txt`).
In our internal audits I'm currently working around this by hardcoding lower thresholds (pass ≥ 75%, warn ≥ 60%) in a local patch, but I'm not sure what the right universal solution is. Does it make sense to treat every sitemap page as something that should be in `llms.txt`? If not, what's a reasonable and universal way to determine which ones should and shouldn't be included?
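One possible direction, offered only as a starting point: let site owners declare exclusion patterns, and drop matching sitemap URLs from the denominator before computing coverage. The sketch below assumes a hypothetical `exclude_patterns` option that does not exist in the library today, and uses shell-style globs from Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase


def coverage_with_exclusions(sitemap_urls, llms_urls, exclude_patterns=()):
    """Coverage ratio over sitemap URLs that are NOT intentionally excluded.

    `exclude_patterns` is a hypothetical option: shell-style globs for pages
    (such as directory listings) that belong in the sitemap but not in llms.txt.
    """
    expected = {
        url
        for url in sitemap_urls
        if not any(fnmatchcase(url, pattern) for pattern in exclude_patterns)
    }
    if not expected:
        # Assumption: nothing left to cover counts as full coverage.
        return 1.0
    return len(expected & set(llms_urls)) / len(expected)


sitemap = [
    "https://developers.cloudflare.com/workers/reference/",
    "https://developers.cloudflare.com/workers/get-started/",
    "https://developers.cloudflare.com/workers/runtime-apis/",
]
llms = sitemap[1:]  # the directory listing page is deliberately omitted

without_exclusions = coverage_with_exclusions(sitemap, llms)
with_exclusions = coverage_with_exclusions(
    sitemap, llms, exclude_patterns=["*/workers/reference/"]
)
```

This keeps the 95% and 80% thresholds meaningful, since they would apply only to pages the site owner actually intends agents to read. It does not answer the harder question of detecting such pages automatically.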
Open to discussion and whatever direction makes sense for the library.