Add JS Asset Auditor plugin with Playwright CLI#633
Add JS Asset Auditor plugin with Playwright CLI#633ChristianPavilonis wants to merge 15 commits intomainfrom
Conversation
Engineering spec for the /audit-js-assets . Covers sweep protocol, Chrome DevTools MCP tooling, heuristic filtering, slug generation, init and diff modes. Closes #606
Fix incorrect MCP tool name prefix, replace misused wait_for with
evaluate_script setTimeout, correct list_network_requests filtering to
use resourceTypes, resolve path derivation contradiction with consistent
/js-assets/{prefix}/{stem}.js formula, pin slug separator and base62
charset, add URL Processing section with normalization rules and
first-party boundary definition, tighten wildcard regex to require mixed
character classes, and move skill location to .claude/commands/.
Implement the /audit-js-assets command that sweeps a publisher page via Chrome DevTools MCP, detects third-party JS assets, and generates js-assets.toml entries. Includes a shared slug generation script (SHA-256 + base62) and adds MCP permission grants for navigate_page, list_network_requests, and close_page.
Move URL normalization, filtering, wildcard detection, slug generation, and TOML formatting into scripts/audit-js-assets.mjs. The skill now collects raw browser data and delegates processing to the script, replacing fragile LLM-side URL manipulation. Expand heuristic filter with Google ad rendering, ad fraud detection, ad verification, and reCAPTCHA categories. Auto-include target URL host as first-party. Add --no-filter flag. Fix semver regex to match alpha suffixes like 1.19.8-hcskhn.
Replace MCP-driven browser automation with a standalone Playwright CLI at tools/js-asset-auditor/audit.mjs. One command sweeps a publisher page, collects script URLs, processes them through the shared pipeline, and writes js-assets.toml. Refactor scripts/audit-js-assets.mjs to export processAssets() so both the stdin-based pipeline and the Playwright CLI share the same processing logic. Simplify the Claude skill from 115 to 59 lines — it now calls the CLI and formats the JSON summary.
Rewrite sweep protocol, implementation, and verification sections to describe the three-component architecture: Playwright CLI, processing library, and Claude Code skill wrapper. Add direct CLI invocation examples, --headed flag, first-party auto-detection verification, and ad-rendering filter verification steps.
Restructure into packages/js-asset-auditor/ as a self-contained Claude Code plugin with .claude-plugin/plugin.json manifest, skills/ directory, bin/ executable, and lib/ processing modules. The plugin provides the audit-js-assets skill and CLI automatically when enabled. Remove tools/js-asset-auditor/, scripts/audit-js-assets.mjs, and .claude/commands/audit-js-assets.md — all replaced by the plugin.
Enables installing the JS Asset Auditor plugin from this repo via /plugin marketplace add <org>/trusted-server followed by /plugin install js-asset-auditor.
Add --domain flag and fall back to inferring from the target URL when trusted-server.toml is not present. Enables using the plugin in any project without project-specific config.
Reflect the plugin layout at packages/js-asset-auditor/, update all file paths, document the domain resolution fallback chain (--domain flag > trusted-server.toml > infer from URL), and update skill invocation to use the namespaced /js-asset-auditor:audit-js-assets format.
New --config [path] flag auto-detects integrations (GPT, GTM, Didomi, DataDome, Lockr, Permutive, Prebid, APS) from swept script URLs and generates a trusted-server.toml with appropriate [integrations.*] sections. Auto-extracts fields like GTM container_id from query params and Permutive org/workspace IDs from URL paths. Fields needing manual input are marked with TODO comments.
Switch from headless-by-default to headed-by-default. Sites with bot protection (DataDome, Cloudflare, etc.) block headless browsers. The --headed flag becomes --headless for CI/automation use cases.
aram356
left a comment
There was a problem hiding this comment.
Summary
Additive Claude Code plugin (packages/js-asset-auditor/) with a Playwright-based CLI for sweeping publisher pages, a processing library, and integration auto-detection. No Rust changes. Review uncovered three blocking issues: the format-docs CI gate is red, the generated trusted-server.toml emits an invalid bidders = "" for Prebid, and the CLI arg parser silently consumes the next flag when a value is omitted. A handful of non-blocking refactor/hardening suggestions follow.
Blocking
🔧 wrench
- format-docs CI failing: prettier wants table column realignment in
docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md(heuristic filter table and detection patterns table). Fix:cd docs && npx prettier --write superpowers/specs/2026-04-01-js-asset-auditor-design.md. (inline at line 100) - Generated
bidders = ""is invalid for Rust Prebid config:PrebidConfig::biddersisVec<String>(crates/trusted-server-core/src/integrations/prebid.rs:69). Users who flipenabled = trueon a generated[integrations.prebid]block get a deserialization failure. (inline atdetect.mjs:266) --first-party/--settle/--output/--domainconsume the next flag: passing a value-taking flag without a value silently swallows the following arg. Reproduced. (inline ataudit.mjs:84)
Non-blocking
♻️ refactor
- Slug algorithm duplicated in
scripts/js-asset-slug.mjs— make it import frompackages/js-asset-auditor/lib/process.mjs. Silent drift here breaks KV lookups against the Rust proxy. (inline atscripts/js-asset-slug.mjs:78) - Stale help block in
audit.mjs— references--headed(not implemented; actual flag is--headless) and omits--config,--force,--domain. (inline ataudit.mjs:18)
🤔 thinking
readPublisherDomainconflates parse errors with missing file — a malformed but presenttrusted-server.tomlsilently falls through to URL-inferred domain, producing wrong slugs (publisher prefix depends on domain). DistinguishENOENTfrom parse failure. (inline ataudit.mjs:142)--configexistence check usesreadFileSync— usefs.existsSync. (inline ataudit.mjs:238)- No unit tests for
process.mjs/detect.mjs— both have nontrivial logic (wildcard regex, first-party boundary, heuristic filter with two match shapes, integration field extraction). Vitest already runs incrates/js/lib; adding fixtures there (or alongside the plugin) would catch drift, especially for the slug hash that must match the Rust proxy.
⛏ nitpick
- GTM match uses
pathname.includes("/gtm.js")—.endsWith("/gtm.js")removes theoretical ambiguity. (inline atdetect.mjs:37) formatTomlValuedoesn't escape"/\— fine today since inputs are static, butJSON.stringifyon the string branch is free escape handling for future patterns. (inline atdetect.mjs:226)
🌱 seedling
- Cross-language slug fixture test — the spec claims the JS slug must produce identical output to the Rust proxy's KV key derivation, but the Rust proxy lives on a separate branch. Once it lands, a shared-fixture test (same
{domain, url}→ same slug in both JS and Rust) is the only reliable guard against silent drift. Worth a follow-up issue.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- rust tests: PASS
- vitest (
crates/js/lib): PASS - format-typescript: PASS
- format-docs: FAIL (see 🔧 above)
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
This plugin is close, but there are three blocking behavior issues in the CLI/config pipeline that can mislead operators or produce unusable output. CI is green, but these cases need to be fixed before the tool is safe to rely on for onboarding and diff workflows.
Blocking
🔧 wrench
- Generated configs enable integrations before required fields are present (packages/js-asset-auditor/lib/detect.mjs:260)
- Diff mode keeps re-appending the same NEW assets on repeated runs (packages/js-asset-auditor/lib/process.mjs:214)
--configconflict only logs an error and still exits successfully (packages/js-asset-auditor/lib/audit.mjs:266)
aram356
left a comment
There was a problem hiding this comment.
Summary
Re-review against main after 66951b9. That commit cleanly addresses every finding from my prior review (format-docs, bidders = "", arg-parser flag-swallowing, slug duplication, stale help block, existsSync, GTM endsWith, TOML escaping, ENOENT-vs-parse distinction), with focused regression tests for each.
However, the 2026-04-22 review from @prk-Jr is not addressed by the latest push — I confirmed all three of their blockers in the worktree — and I found two additional blocking-class issues while re-reading the 18 changed files.
Blocking
🔧 wrench
enabled = trueforpartial/detect_onlyintegrations — generated config boot-fails on missing required fields (server_url,pub_id,app_id).packages/js-asset-auditor/lib/detect.mjs:260(inline).- Diff mode re-appends the same NEW assets on repeated runs — reproduced: same asset appears 6× after 3 runs because
parseExistingTomlignores commented suggestions.packages/js-asset-auditor/lib/process.mjs:214(inline). --configcollision exits 0 with success summary — violates spec, breaks automation.packages/js-asset-auditor/lib/audit.mjs:269(inline).--configdefault path collides with livetrusted-server.toml— with--force, silently overwrites the real publisher config with a TODO skeleton.packages/js-asset-auditor/lib/audit.mjs:136(inline).- Plugin unit tests are not wired into CI —
.github/workflows/test.ymlonly runs vitest incrates/js/lib. The 8 regression tests added in 66951b9 (the sole guard against drift on slug algorithm, integration detection, arg parsing) run locally only, so regressions will land onmainunnoticed. Add a step that runscd packages/js-asset-auditor && npm ci && npm teston PRs touching this directory, or unconditionally.
Non-blocking
🤔 thinking
- No diff-idempotence test —
processAssets(..., { diff: true })run twice against its own output should yieldsummary.new.length === 0. A fixture test inpackages/js-asset-auditor/test/process.test.mjswould have caught the re-append regression and will keep catching it. - SKILL.md still says "headless browser" —
packages/js-asset-auditor/skills/audit-js-assets/SKILL.md:24. The CLI has defaulted to headed since e0c7e0c. Misleading for sandbox users. data:/blob:script URLs produce"null"origin in slugs —new URL("data:…").origin === "null"soapplyWildcardsyieldsnull/<pathname>. Rare for<script src>, but one protocol guard is cheap.
♻️ refactor
readPublisherDomainhand-rolls a TOML scan (lib/audit.mjs:37-61) — rejects single-quoted strings and any non-^domain = "…"shape. A small TOML lib would be more robust. Non-urgent.
⛏ nitpick
- APS pattern uses
pathname.includes("/apstag")(lib/detect.mjs:152) — same class of ambiguity as the GTM case tightened in 66951b9. Consider anchoring.
CI Status
- cargo fmt: PASS
- cargo clippy / Analyze (rust): PASS
- cargo test: PASS
- vitest (
crates/js/lib): PASS - format-typescript: PASS
- format-docs: PASS
- browser integration tests: PASS
- CodeQL: PASS
- plugin tests (
packages/js-asset-auditor/test): NOT WIRED INTO CI — see blocking #5
|
Addressed the remaining review feedback in 00ce787 and replied/resolved the inline threads:
Validation run after the changes:
|
aram356
left a comment
There was a problem hiding this comment.
Summary
Re-review against main after 5b3481a. Every prior blocker from the 2026-04-22 (prk-Jr) and 2026-04-23 (aram356) reviews is cleanly addressed in 00ce787 + 5b3481a, with focused regression tests for each fix and the plugin test job now wired into PR CI. CI is green (14 checks). I verified all 16 plugin tests pass locally and prettier is clean on the new docs files.
Two issues warrant changes before merge:
data:/blob:script URLs produce garbage TOML entries that look like real[[js_assets]]blocks but reference unfetchable origins. Reproduced both cases throughprocessAssets(). Inline below.- Spec example for Lockr now contradicts the implementation — the doc shows
enabled = truefor an integration that the (corrected) generator now emits asenabled = false. Inline below.
Plus two non-blocking suggestions: replace the hand-rolled TOML scan in readPublisherDomain with a real parser (carried over — the new test now locks in the strict-double-quote-only limitation), and tighten the loose pathname.includes(\"/apstag\") APS pattern the same way GTM was tightened in 66951b9.
Blocking
🔧 wrench
data:/blob:URLs produce garbage TOML (packages/js-asset-auditor/lib/process.mjs:321, inline)- Spec example contradicts implementation (
docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md:284, inline)
Non-blocking
♻️ refactor
readPublisherDomainhand-rolled TOML scan (packages/js-asset-auditor/lib/audit.mjs:39-63, inline)
🤔 thinking
- Integration detection runs against unfiltered URLs —
lib/audit.mjs:305callsdetectIntegrations(scriptUrls, …)on the raw network capture, not the post-first-party set. If a publisher self-hosts their prebid bundle atcdn.publisher.com/prebid.js, a[integrations.prebid]block lands in the generated config. Not strictly wrong (publisher does run prebid), but it bypasses the first-party boundary the asset pipeline carefully enforces. Consider passing the third-party-filtered URL set instead.
⛏ nitpick
- APS pattern is loose (
packages/js-asset-auditor/lib/detect.mjs:170, inline) --first-partyaccepts unvalidated values (packages/js-asset-auditor/lib/audit.mjs:163) — splits on,with no normalization, so--first-party https://www.example.comsilently no-ops becausestripWwwdoesn't strip the protocol. Either reject non-hostname values or normalize vianew URL(value).hostname.
CI Status
- cargo fmt: PASS
- cargo clippy / Analyze (rust): PASS
- cargo test: PASS
- vitest (
crates/js/lib): PASS - format-typescript: PASS
- format-docs: PASS
- browser integration tests: PASS
- CodeQL: PASS
- js-asset-auditor tests: PASS (newly wired into CI in this PR)
| const assets = []; | ||
| const seenOrigins = new Set(); | ||
|
|
||
| for (const url of survivingUrls) { |
There was a problem hiding this comment.
🔧 wrench — data: and blob: script URLs produce garbage TOML entries.
Reproduced locally by feeding the URLs through processAssets():
data:text/javascript,console.log(1) → origin_url = "nulltext/javascript,console.log(1)"
blob:https://example.com/abc-123 → origin_url = "https://example.comhttps://example.com/abc-123"
new URL("data:…").origin === "null" and applyWildcards concatenates that with pathname, producing entries the proxy can never fetch. The summary still reports them as surfaced assets, so an operator could commit them without noticing.
Fix: skip non-http(s) URLs early in the loop:
for (const url of survivingUrls) {
let parsed;
try {
parsed = new URL(url);
} catch {
continue;
}
if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
continue;
}
// …existing applyWildcards / slug logic
}A fixture test in test/process.test.mjs covering data: and blob: inputs would lock the contract.
| container_id = "GTM-TRCJMD6" # auto-detected | ||
|
|
||
| [integrations.lockr] | ||
| enabled = true |
There was a problem hiding this comment.
🔧 wrench — Spec example contradicts the implementation after the partial/detect_only fix.
The spec shows:
[integrations.lockr]
enabled = true
sdk_url = "https://aim.loc.kr/identity-lockr-trust-server.js" # auto-detected
# app_id = "" # TODO: set your Lockr Identity app_idBut Lockr's todos always include app_id (lib/detect.mjs:104), so isIntegrationConfigComplete returns false and generateConfig correctly emits enabled = false. I confirmed by running the generator against https://aim.loc.kr/identity-lockr-trust-server.js:
[integrations.lockr]
enabled = false
sdk_url = "https://aim.loc.kr/identity-lockr-trust-server.js" # auto-detected
# app_id = "" # TODO: set your Lockr Identity app_id
Flip the spec example to enabled = false so the documented contract matches what the tool emits — otherwise future readers will treat the spec as the source of truth and re-introduce the prior bug.
| } | ||
| if (inPublisher) { | ||
| const match = line.match(/^domain\s*=\s*"([^"]+)"/); | ||
| if (match) return match[1]; |
There was a problem hiding this comment.
♻️ refactor — readPublisherDomain hand-rolls a strict TOML scan that rejects valid TOML.
The regex ^domain\s*=\s*"([^"]+)" only matches double-quoted single-line strings. Real trusted-server.toml files using domain = 'site.com', multi-line literals, or values containing escaped " will fail with the misleading "Could not find [publisher].domain in trusted-server.toml" — and the new test at test/audit.test.mjs:84 actively locks this strict behavior in.
Fix: use a TOML parser. smol-toml is ~5 KB and dependency-free; @iarna/toml is the broader-compat option. Either lets you replace the whole 25-line scan with:
import { parse } from "smol-toml";
export function readPublisherDomain(repoRoot, configPath = "trusted-server.toml") {
const resolvedPath = resolve(repoRoot, configPath);
const parsed = parse(readFileSync(resolvedPath, "utf-8"));
const domain = parsed.publisher?.domain;
if (typeof domain !== "string" || domain.length === 0) {
throw new Error(`Could not find [publisher].domain in ${configPath}`);
}
return domain;
}Then update the malformed-content test to feed truly malformed TOML rather than valid single-quoted TOML. (Carried over from the 2026-04-20 review; still present.)
| label: "Amazon Publisher Services", | ||
| match: (url) => | ||
| url.hostname === "c.amazon-adsystem.com" && | ||
| url.pathname.includes("/apstag"), |
There was a problem hiding this comment.
⛏ nitpick — APS pattern uses pathname.includes("/apstag"), the same loose match shape that was tightened for GTM in 66951b9.
The spec at line 253 says c.amazon-adsystem.com/aax2/apstag*. Today the matcher would also fire on /foo/apstag/bar, /apstag.js.bak, etc. Anchor it:
match: (url) =>
url.hostname === "c.amazon-adsystem.com" &&
url.pathname.startsWith("/aax2/apstag"),or pathname.endsWith("/apstag.js") if you want to allow other path prefixes.
Summary
packages/js-asset-auditor/with a standalone Playwright CLI that sweeps publisher pages for third-party JS assetstrusted-server.tomlconfig with--configflagtrusted-server.tomloptional with--domainflag for portabilityTry it out
1. Check out the branch
2. Install dependencies
3. Run the CLI directly
4. Use as a Claude Code plugin
Then in Claude Code:
CLI flags
--diffjs-assets.toml--settle <ms>--first-party <hosts>--domain <host>trusted-server.tomlor URL)--no-filter--headless--output <path>js-assets.toml)--config [path]trusted-server.tomlwith detected integrations--forceTest plan
js-assets.tomloutput--configand verify detected integrations in generatedtrusted-server.toml--diffagainst an existingjs-assets.tomland verify confirmed/new/missingtrusted-server.toml(e.g., from/tmp) and verify domain inference--configwithout--forceerrors when file already existsCloses #631