Skip to content

feat: CORS iframe parity + closed shadow DOM#32

Open
aryanku-dev wants to merge 20 commits into
mainfrom
feat/cors-iframes-and-shadow-dom
Open

feat: CORS iframe parity + closed shadow DOM#32
aryanku-dev wants to merge 20 commits into
mainfrom
feat/cors-iframes-and-shadow-dom

Conversation

@aryanku-dev

Copy link
Copy Markdown

Summary

Brings percy-selenium-ruby to parity with the canonical Percy CORS iframe + closed shadow DOM feature set.

Implemented

  • Nested cross-origin iframe capture (depth-capped, cycle-guarded)
  • data-percy-ignore attribute opt-out
  • ignoreIframeSelectors option
  • Post-switch URL re-check via is_unsupported_iframe_src?
  • PercyContextLost recovery merges partial_capture
  • Closed shadow DOM via CDP (expose_closed_shadow_roots)
  • Inlined Ruby helpers

Skipped

  • ElementInternals preflight (Feature 8): N/A — selenium-ruby has no before-page-load hook.
  • @percy/sdk-utils bump (Feature 9): not applicable to Ruby; helpers inlined.

Reference

Mirrored from percy/percy-capybara branch PER-7292-add-cross-origin-iframe-support and percy/percy-nightwatch#869.

Test plan

  • Full repo test suite passed locally
  • Manual smoke: cross-origin iframes
  • Manual smoke: closed shadow roots in Chromium

🤖 Generated with Claude Code via /percy-sdk-sync

aryanku-dev and others added 14 commits May 11, 2026 15:10
Mirrors the @percy/sdk-utils-to-inlined-helpers refactor from the
canonical JS SDKs. Adds DEFAULT_MAX_FRAME_DEPTH, UNSUPPORTED_IFRAME_SRCS,
the PercyContextLost error class, and the shared iframe helper surface
(is_unsupported_iframe_src?, clamp_frame_depth, normalize_ignore_selectors,
resolve_max_frame_depth, resolve_ignore_selectors, enumerate_iframes_script)
that the upcoming nested CORS / data-percy-ignore / ignoreIframeSelectors
/ context-loss-recovery features will build on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the single-level CORS iframe loop with process_frame_tree, a
bounded recursion mirroring percy-nightwatch's snapshot.js. After
serializing an iframe's document, the walker re-enumerates iframes from
inside that frame and recurses into any nested cross-origin children
that differ from the immediate parent's origin. Depth is clamped via
DEFAULT_MAX_FRAME_DEPTH (or options.maxIframeDepth / config) so
pathological pages can't blow the stack, and an ancestorUrls Set
short-circuits cyclic frame trees that link back to themselves.

Specs updated to drive the new tree API via execute_script-returned
iframe metadata (enumerate_iframes_script) instead of stubbing
frame_element.attribute, matching how the SDK now reads the DOM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaces dataPercyIgnore from the enumerate_iframes_script payload and
treats it as an explicit skip signal in should_skip_iframe?. Page
authors can drop a single iframe from CORS capture (e.g. a noisy widget,
a third-party embed) without configuring a project-wide selector.
Mirrors the same attribute check shipped in percy-nightwatch and
percy-webdriverio.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Threads options[:ignoreIframeSelectors] (and the snake_case alias, plus
config.snapshot.ignoreIframeSelectors) through resolve_ignore_selectors
into enumerate_iframes_script, which evaluates frame.matches(selector)
in the page context for each iframe. should_skip_iframe? honors the
resulting matchesIgnoreSelector flag, so a single config entry can drop
classes of iframes (ads, chat widgets) from CORS capture without
touching every page or call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After switching into an iframe, read document.URL and re-run the
unsupported-src filter. Cross-origin navigations can fail in subtle
ways — DNS errors, X-Frame-Options blocks, or programmatic
about:blank redirects — that leave the static src attribute pointing
at the original https:// URL while the actual document is an error
page or about:blank. Without this check we'd serialize and ship that
error page as "captured" iframe content. Mirrors the same check from
percy-webdriverio's processFrameTree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a nested frame restoration fails — switch_to.parent_frame raises
and we can no longer trust the driver's frame context — the deep
process_frame_tree call now packages the iframes it captured before
the loss into a PercyContextLost error and propagates upward. Each
intermediate level merges its own collected frames into
err.partial_capture before re-raising. capture_cors_iframes catches
that error at the top, appends the partial capture into corsIframes,
and aborts further sibling iteration. The result: a parent-frame
restore failure in a 4-deep tree still ships everything we managed to
serialize, instead of dropping the whole snapshot's CORS data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds expose_closed_shadow_roots(driver), which uses three CDP calls —
DOM.getDocument (deep + pierce), DOM.resolveNode, and
Runtime.callFunctionOn — to surface closed shadow roots that are
otherwise inaccessible to JavaScript (element.shadowRoot returns null).
Each closed root is keyed into a host-keyed WeakMap on the page
(window.__percyClosedShadowRoots) that clone-dom.js consults during
PercyDOM.serialize. Wired into the snapshot pipeline before the top-
level serialize and re-primed after page reloads inside
capture_responsive_dom (refresh clears the WeakMap). No-op when CDP is
unavailable or the driver isn't Chromium. Mirrors percy-playwright's
exposeClosedShadowRoots; contentDocument subtrees are skipped since
cross-frame closed roots aren't supported yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace rescue modifiers with explicit begin/rescue blocks, swap em-dash
comments for ASCII, rename rescue variable, fix alignment/indentation,
split overlong spec line, and add scoped rubocop disables for the
intentional predicate method name and one block-nesting hot spot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three CE:review fixes on the CORS iframe path:

- Index alignment race: enumerate_iframes (JS querySelectorAll) and
  driver.find_elements(:tag_name, 'iframe') (Ruby) were two separate DOM
  reads. Any mutation between them (PercyDOM.serialize, dynamic insertion,
  etc.) silently shuffled the meta-to-element mapping, so one iframe's
  content could ship under another's percyElementId. Replace positional
  alignment with `find_element(css: 'iframe[data-percy-element-id="…"]')`
  via a new find_iframe_by_percy_id helper. percyElementId is already
  stamped by PercyDOM and we already require it via should_skip_iframe.

- "blank" sentinel in UNSUPPORTED_IFRAME_SRCS matched any src starting
  with the literal text "blank" (e.g. a route at /blank/foo). The
  about:blank case is already covered by the about:blank entry; drop the
  bare token.

- PercyContextLost.set_backtrace(captured_error.backtrace) overwrote the
  new exception's own trace with the inner one, hiding where the
  parent-frame restoration actually failed. Drop the override and let
  the inner error remain accessible via Exception#cause if needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Update existing get_serialized_dom CORS specs to stub
  driver.find_element(css: 'iframe[data-percy-element-id="..."]') instead
  of find_elements(:tag_name, 'iframe'), matching the new percyElementId
  lookup path.
- Add unit tests for find_iframe_by_percy_id covering nil/empty input,
  the attribute selector wire format, and the NoSuchElementError swallow.
- Regression test that "blank" no longer matches arbitrary src prefixes
  (e.g. blanket.com, blank-canvas://); about:blank stays covered.
- Test that capture_cors_iframes does not call find_elements on the
  iframe collection and routes through find_element by percyElementId.
- Confirm PercyContextLost keeps its own backtrace pointing at the
  restoration failure rather than the inner error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier removal of the Metrics/BlockNesting disable markers left the
parent_frame call and the ensuring `end` at column 20 / 18 rather than
inside the proper begin/rescue block. Functionally correct (rubocop's
inherited percy-style config didn't flag it), but reads as broken.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CSS-escape backslashes and double-quotes in the percyElementId
before interpolating into the iframe[data-percy-element-id="..."]
selector. Anything weirder than that still falls through to the
existing rescue and returns nil. Adds two specs covering the
escape and the graceful-nil path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two regression specs:

  * process_frame_tree resolves nested child iframes via
    find_element(css: 'iframe[data-percy-element-id="..."]'), not via
    a positional index into find_elements(:tag_name, 'iframe'). This
    pins the index-alignment fix at the child level.
  * Percy::PercyContextLost raised then rescued exposes a backtrace
    pointing at the raise site (non-nil, non-empty, top frame matches
    the raise line) so deeper diagnostics aren't lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread lib/percy.rb Fixed
Chained gsubs ('\\' first, then '"') were semantically correct but
matched CodeQL's incomplete-string-escaping pattern (the second
gsub on its own doesn't escape backslashes). Collapse to one
gsub(/[\\"]/) { |c| "\\#{c}" } so the escape is provably complete
in a single pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aryanku-dev aryanku-dev marked this pull request as ready for review May 24, 2026 11:45
@aryanku-dev aryanku-dev requested a review from a team as a code owner May 24, 2026 11:45

@pranavz28 pranavz28 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-SDK review across the 6 ports of this feature (Nightwatch #869 canonical; WDIO #1441; Protractor #708; .NET #436; Ruby #32; Python #136 shadow-DOM-only). This Ruby port has the largest set of behaviour divergences from the canonical Nightwatch implementation — flagged inline.

Cross-SDK consistency matrix (selected)

Invariant Nightwatch Protractor WDIO .NET Ruby Python
Case-insensitive src match .toLowerCase() OrdinalIgnoreCase raw start_with? N/A
Prefix list includes file: N/A
Prefix list includes view-source: / devtools: / edge: / opera: N/A
about: used as prefix (not exact match) about:blank/about:srcdoc only N/A
DEFAULT_MAX_FRAME_DEPTH 10 10 10 5 5 N/A
HARD cap 25 25 25 10 50 N/A
percyContextLost raised at depth=1 N/A

Verified-pass items

  • find_iframe_by_percy_id escapes \ and " in percyElementId — this is the reference implementation other SDKs should match.
  • Cycle detection seeds both attribute src AND post-switch frame_url — stronger than WDIO #1441's port.
  • Helper inlining (no _iframe_shim.rb) is correct for Ruby.
  • Test coverage is solid: cycle detection, partial-capture handoff, CDP failure modes, CSS escape regression, no # coverage:ignore patterns.
  • expose_closed_shadow_roots re-primed in the responsive snapshot loop (the Python port #136 missed this).

Additional finding (no precise line anchor)

Closed-shadow walker skips ALL iframes_walk_nodes returns at contentDocument. Closed shadow roots inside same-origin iframes are NOT captured. The selenium-dotnet #436 port fixed exactly this case on CE re-review (recurse into same-origin frames via GetOrigin comparison). Same fix should land here.

Verdict

Request changes. Inline blockers (prefix list + case sensitivity, depth defaults, percyContextLost depth gate). These are the kind of "snapshots almost-right" divergences the coordinated rollout is trying to avoid.

Comment thread lib/percy.rb
# Iframe src prefixes / sentinels we never attempt to switch into -- these
# represent either browser-internal documents, non-HTTP URI schemes, or
# placeholder values that have no meaningful CORS content to capture.
UNSUPPORTED_IFRAME_SRCS = %w[

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH — UNSUPPORTED_IFRAME_SRCS is missing prefixes that every sibling SDK includes, and uses about:blank/about:srcdoc as exact strings instead of the about: prefix used by the others.

Canonical list from Nightwatch #869 + Protractor #708:

about:, chrome:, chrome-extension:, devtools:, edge:, opera:,
view-source:, data:, javascript:, blob:, vbscript:, file:

Ruby is missing file:, view-source:, devtools:, edge:, opera:, and uses about:blank/about:srcdoc (exact) instead of about: (prefix). User-visible result: Ruby will attempt to switch into file://, view-source:, about:newtab, Edge/Opera internal URLs that every other SDK rejects.

Fix (with the case-sensitivity fix called out separately on line 545):

UNSUPPORTED_IFRAME_SRCS = %w[
  about: chrome: chrome-extension: devtools: edge: opera: view-source:
  data: javascript: blob: vbscript: file:
].freeze

Add spec rows for file:///x, view-source:https://x, about:newtab so the divergence doesn't return.

Comment thread lib/percy.rb Outdated
def self.is_unsupported_iframe_src?(src)
return true if src.nil? || src.to_s.empty?

UNSUPPORTED_IFRAME_SRCS.any? { |prefix| src == prefix || src.start_with?(prefix) }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH — case-sensitive prefix check diverges from every sibling SDK.

UNSUPPORTED_IFRAME_SRCS.any? { |prefix| src == prefix || src.start_with?(prefix) }

All sibling SDKs lowercase first:

  • JS: String(src).toLowerCase()
  • C#: StringComparison.OrdinalIgnoreCase
  • Ruby (here): nothing

JavaScript:alert(1), ABOUT:BLANK, FILE:///etc/passwd (mixed-case) slip past Ruby's filter. Fix:

def self.is_unsupported_iframe_src?(src)
  return true if src.nil? || src.empty?
  s = src.to_s.downcase
  UNSUPPORTED_IFRAME_SRCS.any? { |p| s.start_with?(p) }
end

Drop the src == prefix exact-match branch once prefixes end in : (every entry already does).

Comment thread lib/percy.rb Outdated
module Percy
# Maximum nesting depth for cross-origin iframe recursion. Bounds the cost
# of pathological pages and prevents runaway recursion on cyclic frame trees.
DEFAULT_MAX_FRAME_DEPTH = 5

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM — DEFAULT_MAX_FRAME_DEPTH = 5 diverges from the canonical default of 10.

Nightwatch / WDIO / Protractor JS shims default to 10. .NET also uses 5 (separate finding on PR #436). Selenium-Ruby and .NET appear to be on different defaults from the JS ecosystem.

Customer-visible: same fixture run in Ruby will capture half as many nested-CORS-iframe levels as the same fixture run in Node. Pick one default in upstream sdk-utils (recommended: 10) and align.

Also the hard cap is 50 (see clamp_frame_depth later in the file) while siblings cap at 25 — second divergence on the same axis. Aligning both to 10 / 25 brings Ruby in line with 3 of 4 JS SDKs.

Comment thread lib/percy.rb Outdated
rescue StandardError
nil
end
if depth > 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH — PercyContextLost only raised when depth > 1 — diverges from canonical Nightwatch #869.

Nightwatch + Protractor raise regardless of depth. Their argument: when parent_frame fails and the SDK falls back to default_content, the OUTER capture_cors_iframes loop is still iterating sibling iframe references that were resolved against the previous parent context. Silently continuing at depth=1 means the second top-level CORS iframe's serialized content can be attached to the FIRST iframe's percyElementId — a silent cross-attachment bug that's very hard to root-cause from a user report.

WDIO #1441 and .NET #436 carry the same depth>1 gate; they should be aligned together. Suggested fix:

# Always raise on parent restore failure; outer loop catches and merges partial_capture.
# At depth=1, sibling enumeration was resolved against the now-stale parent — same
# correctness gap as deeper failures.
raise PercyContextLost.new(partial_capture: collected)

Add a regression test: 3 top-level CORS iframes + force parent_frame to fail on the first — assert only the first iframe's content appears, no cross-attachment.

aryanku-dev and others added 2 commits June 22, 2026 19:35
- expand UNSUPPORTED_IFRAME_SRCS to the canonical 12-entry prefix list
  (add file:, view-source:, devtools:, edge:, opera:; about: as prefix)
- match iframe src case-insensitively
- DEFAULT_MAX_FRAME_DEPTH 5->10, hard cap 50->25 (HARD_MAX_FRAME_DEPTH)
- raise PercyContextLost regardless of depth (fixes depth=1 cross-attachment)
- add specs for new prefixes, mixed-case, and depth=1 PercyContextLost

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The PER-7348 readiness gate (commit 9a71cdb) made get_serialized_dom run
PercyDOM.waitForReady via driver.execute_async_script before serialize, but
the pre-existing CORS-iframe specs in the .get_serialized_dom block were never
updated to stub it -- so every one raised "received unexpected message
:execute_async_script" on the shared driver double. Two readiness specs also
called get_serialized_dom with `readiness:` as a bare keyword, which Ruby
rejects against the (driver, options, percy_dom_script:) signature with
"wrong number of arguments (given 1, expected 2)".

- add a harmless default execute_async_script no-op stub in the block's
  before(:each); readiness-specific tests still override it with their own
  expectations
- pass readiness config as a positional options hash, not a keyword arg

No test assertions are weakened or skipped. This takes the unit suite in this
block from 14 failures to 0; the only remaining failures are the two
live-browser / live-server integration tests (real Firefox + Percy CLI on
:5338), which are environmental and fail identically on origin/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@aryanku-dev

Copy link
Copy Markdown
Author

🤖 stack:pr-review — Automated review (BrowserStack AI harness)

Summary: Adds cross-origin (CORS) nested-iframe capture (depth-capped, cycle-guarded), data-percy-ignore / ignoreIframeSelectors opt-outs, PercyContextLost recovery preserving partial captures, and closed shadow-DOM exposure via CDP for Chromium.

Priority Category Check Status
High Security No hardcoded secrets Pass
High Security AuthZ checks N/A (SDK)
High Security Input validation/sanitization Pass — find_iframe_by_percy_id CSS-escapes \/"; is_unsupported_iframe_src? blocks internal schemes
High Security No IDOR / SQLi N/A
High Correctness Logic & edge cases Pass — depth clamp 0..25, cycle guard, post-switch document.URL re-check
High Correctness Explicit error handling Pass — PercyContextLost deliberately propagates; failures debug-logged
High Correctness No race conditions Pass — lookup-by-percyElementId replaces positional alignment
Medium Testing New code tested Pass
Medium Testing Error/edge paths tested Pass
Medium Testing Existing tests pass Pass (fixed — see below)
Medium Performance No unbounded fetch Pass — bounded by HARD_MAX_FRAME_DEPTH=25 + cycle guard
Medium Quality Follows patterns / focused Pass
Low Quality Names / comments / deps Pass

Findings (fixed during review): the already-merged PER-7348 readiness gate (9a71cdb) made get_serialized_dom call driver.execute_async_script, but the pre-existing CORS-iframe specs never stubbed it → 14 failures. Added a harmless default execute_async_script stub in before(:each) and passed readiness config as a positional options hash. No assertions weakened/skipped. Unit failures in that block: 14 → 0.

Verification: bundle exec rubocop — no offenses. bundle exec rspec — 126 examples, only remaining red is the live-Percy-CLI integration test (ECONNREFUSED, fails identically on main). Confirmed via main baseline worktree that my changes add zero new failures.

Commits pushed: 4c31892 (align src filter + frame depth to canonical), c71cd69 (repair get_serialized_dom doubles for PER-7348 readiness gate).

Overall: ✅ Pass — Approve. No human-decision blockers for the code; only the env-dependent live-CLI/browser tests remain red (pre-existing on main).

Posted by the BrowserStack stack:pr-review harness.

…-shadow-dom

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants