Add closed shadow DOM capture via CDP#136
Conversation
Use CDP to discover closed shadow roots before DOM serialization. Closed shadow roots are inaccessible from JS (element.shadowRoot === null), but CDP's DOM domain can pierce them. We resolve each closed shadow root to a JS object and store it in a WeakMap that PercyDOM.serialize() reads. - Add expose_closed_shadow_roots() using CDP session - Add _walk_nodes() helper to traverse CDP DOM tree - Skip iframe contentDocument nodes (cross-frame not yet supported) - Non-fatal: catches exceptions for non-Chromium browsers and CDP errors - Called after PercyDOM injection, before DOM serialization - Add tests for walk_nodes, non-Chromium skip, CDP errors, closed roots Ported from percy/percy-playwright#609 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single try/except/finally so non-Chromium path (cdp_session=None) exercises the finally's False branch for 100% coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The finally block's if/except branches are exercised by tests (test_expose_non_chromium_browser and test_expose_detach_error_suppressed) but coverage.py's branch analysis doesn't track them through try/except/finally control flow correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
This PR is stale because it has been open for more than 14 days with no activity. Remove stale label or comment or this will be closed in 14 days. |
Iterate a list copy of CACHE.items() so a rewrite during the loop doesn't raise RuntimeError: dictionary changed size during iteration. Use .get for TIMEOUT_KEY / session_details so a partially-populated cache entry (e.g., a session whose details haven't been backfilled yet) doesn't KeyError us out of cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore 100% coverage after adding the `if timestamp is None: continue` guard in cache.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The exact 3.10.3 patch isn't on the actions/setup-python@v6 cache, which is what GitHub's Automatic Dependency Submission workflow uses. Pinning to the 3.10 line resolves to the latest cached 3.10.x without affecting our test matrix (test.yml passes python-version explicitly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # tests/test_screenshot.py
pranavz28
left a comment
There was a problem hiding this comment.
Cross-SDK review across the 6 ports of the iframe + closed-shadow-DOM rollout. This PR has the narrowest scope (closed-shadow-DOM only — no CORS iframe work), which is fine. Two HIGH findings inline + one MEDIUM around responsive-reload re-exposure.
Verified-pass items
- Hardcoded JS in
Runtime.callFunctionOn— no user-controlled string interpolation, low security risk. - 7 unit tests covering tree walking, non-Chromium skip, CDP errors, detach-error suppression.
- No module-level shared state; concurrent snapshots on different pages are isolated by WeakMap key.
cache.pydefensive cleanup (drive-by) is safe.
Verdict
Request changes. Two HIGH issues to align with selenium-dotnet #436's closed-shadow port (same-origin iframe walker + DOM.disable pairing). One MEDIUM around responsive-reload re-exposure (selenium-ruby #32 got this right; Python should mirror).
|
|
||
| def _walk_nodes(node, closed_pairs): | ||
| """Walk CDP DOM tree to find closed shadow roots, skipping iframe boundaries.""" | ||
| if "contentDocument" in node: |
There was a problem hiding this comment.
HIGH — Walker skips ALL iframes; selenium-dotnet #436 had the same initial bug and was fixed on CE re-review.
if "contentDocument" in node:
returnSame-origin iframes share the parent JS realm and the window.__percyClosedShadowRoots WeakMap that PercyDOM.serialize reads. Closed shadow roots inside same-origin iframes SHOULD be captured. Cross-origin iframes should still be skipped (different JS realm; resolveNode objectIds wouldn't belong to our execution context anyway).
selenium-dotnet's fix (commit fix(snapshot): recurse closed shadow walk into same-origin iframes):
- Same origin (
GetOrigin(documentURL) == pageOrigin) → recurse INTOcontentDocument. - Cross origin → skip.
- Missing
documentURL→ defensive skip.
Suggested Python port:
if "contentDocument" in node:
cd = node["contentDocument"]
document_url = cd.get("documentURL")
if not document_url:
return
if _same_origin(document_url, page_url):
_walk_nodes(cd, closed_pairs)
return_same_origin should use urllib.parse.urlparse to compare scheme + host + port. Add tests for all three branches (same-origin captured, cross-origin skipped, missing URL skipped) — mirroring the dotnet test suite added in that PR.
| try: | ||
| cdp_session = page.context.new_cdp_session(page) | ||
|
|
||
| cdp_session.send("DOM.enable") |
There was a problem hiding this comment.
HIGH — DOM.enable is called but DOM.disable is never invoked.
selenium-dotnet #436 and selenium-ruby #32 both pair enable/disable explicitly in their try/finally and gate disable on whether enable succeeded (so a failing enable doesn't emit a spurious disable).
cdp_session.detach() in the finally block does mostly release the domain in practice, but the explicit pairing is the documented pattern and matches every sibling SDK that uses CDP.
Suggested fix:
dom_enabled = False
try:
cdp_session.send("DOM.enable")
dom_enabled = True
# ... existing walk + resolve + register ...
finally:
if dom_enabled:
try:
cdp_session.send("DOM.disable")
except Exception:
pass
try:
cdp_session.detach()
except Exception:
passAlso add a test mirroring dotnet's DOM.disable lifecycle suite: DOM.disable is sent after a successful run, sent after a mid-walk failure, NOT sent when DOM.enable itself raised.
| page.evaluate(percy_dom_script) | ||
|
|
||
| # Expose closed shadow roots via CDP before serialization | ||
| expose_closed_shadow_roots(page) |
There was a problem hiding this comment.
MEDIUM — expose_closed_shadow_roots is called once but the WeakMap is wiped on responsive reloads.
Responsive captures reload the page at each width and re-evaluate percy_dom_script. The window.__percyClosedShadowRoots WeakMap is gone after page.reload(), so closed shadow roots will be missing in responsive DOMs at widths beyond the first.
selenium-ruby #32 explicitly re-primes after refresh in its responsive snapshot loop (lib/percy.rb:549). Search every page.evaluate(percy_dom_script) and page.reload() site in this file and precede each with expose_closed_shadow_roots(page).
Add a unit test that drives the responsive capture path through change_window_dimension_and_wait and asserts the WeakMap is re-populated at the second width.
|
This PR is stale because it has been open for more than 14 days with no activity. Remove stale label or comment or this will be closed in 14 days. |
…air DOM.disable, re-prime WeakMap on responsive reload Addresses reviewer feedback on PER-7292 closed-shadow-DOM CDP capture, mirroring percy-selenium-dotnet #436 and percy-selenium-ruby #32: - Walker no longer skips ALL iframes. Same-origin child frame documents share the parent JS realm and the window.__percyClosedShadowRoots WeakMap, so we recurse INTO them; cross-origin frames (different realm) and contentDocuments without a resolvable origin are skipped. Adds _get_origin/_same_origin helpers (scheme+host+port via urlparse). - Pair DOM.enable with DOM.disable in the finally block, gated on a dom_enabled flag so a failing enable never emits a spurious disable. - Re-prime the closed-shadow WeakMap after page.reload() in the responsive capture loop (reload creates a new document and wipes it). Adds unit tests for all three walker branches (same-origin captured, cross-origin skipped, missing-URL skipped), the DOM.disable lifecycle (sent on success, sent after mid-walk failure, NOT sent when enable threw), and the responsive-reload re-prime. Lint 10.00/10, 100% branch coverage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🤖 stack:pr-review — Automated review (BrowserStack AI harness)Summary: Adds closed-shadow-DOM capture via CDP for Chromium (
Findings: no Fail. Non-blocking note: dotnet wraps each Verification: pylint 10.00/10; full unittest suite 95 OK; coverage 100%. The three original reviewer threads (same-origin iframe recursion, DOM.disable pairing, WeakMap re-prime) all hold. Commits: the fixes are at Overall: ✅ Pass — Approve. No human-decision blockers.Posted by the BrowserStack |
Summary
expose_closed_shadow_roots()using CDP to discover and expose closed shadow roots_walk_nodes()helper for CDP DOM tree traversalPorted from percy/percy-playwright#609
Test plan
🤖 Generated with Claude Code