Skip to content

Cap remote recipient fetches per incoming activity#3094

Merged
pfefferle merged 21 commits intotrunkfrom
fix/cap-remote-recipient-fetches
Apr 28, 2026
Merged

Cap remote recipient fetches per incoming activity#3094
pfefferle merged 21 commits intotrunkfrom
fix/cap-remote-recipient-fetches

Conversation

@pfefferle
Copy link
Copy Markdown
Member

@pfefferle pfefferle commented Mar 25, 2026

Summary

  • Limits the number of outbound HTTP requests triggered by remote recipient URLs in to/cc/bcc fields of incoming activities.
  • Defaults to 10 remote fetches per activity, filterable via activitypub_max_remote_recipient_fetches.
  • Skips the cap entirely for recipients we already know:
    • The actor's followers collection URL (read from the cached actor profile).
    • Any URL already cached as a Remote_Actors post (it's a known actor, not a collection — no local recipients to resolve).
  • Normalizes actor and followers through object_to_uri() so peers that serialize either field as an inline object are still matched correctly.
  • Prevents abuse where a crafted activity with many unknown remote recipients could trigger unbounded outbound requests from the server.

Test plan

  • Send an activity with fewer than 10 remote recipients in to/cc/bcc — all should be resolved as before.
  • Send an activity with more than 10 remote recipients — only the first 10 unknown URLs should trigger outbound fetches, the rest should be skipped.
  • Verify local (same-domain) recipients are unaffected by the cap.
  • Send an activity addressed to many recipients your instance has already cached — none of them should consume a fetch slot.
  • Send an activity with the actor's followers collection URL in cc — followers should be resolved without an HTTP fetch, even if the cap is already reached.
  • Verify cross-domain followers URLs (e.g., WordPress.com where actor and collection domains differ) are still detected from cached metadata.
  • Verify peers that serialize actor or followers as an inline object (rather than a string IRI) are handled correctly.

Limit the number of outbound HTTP requests triggered by remote
recipient URLs in to/cc/bcc fields to prevent abuse. Defaults to 5,
filterable via `activitypub_max_remote_recipient_fetches`.
Copilot AI review requested due to automatic review settings March 25, 2026 10:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR mitigates abuse of the shared inbox endpoint by limiting how many outbound HTTP fetches can be triggered from remote recipient URLs embedded in incoming ActivityPub activities.

Changes:

  • Add a per-activity cap (default: 5) on remote recipient lookups, filterable via activitypub_max_remote_recipient_fetches.
  • Skip additional remote recipient fetches once the cap is reached while continuing to process same-domain recipients.
  • Add a patch-level changelog entry documenting the fix.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
includes/rest/class-inbox-controller.php Adds the remote-recipient fetch cap (and introduces a new filter) inside get_local_recipients().
.github/changelog/fix-cap-remote-recipient-fetches Documents the change as a patch “fixed” item.

Comment thread includes/rest/class-inbox-controller.php Outdated
Comment thread includes/rest/class-inbox-controller.php Outdated
Instead of always fetching remote recipient URLs to check if they are
collections, compare them against the actor's known followers URL from
the cached Remote_Actors post. This avoids consuming a remote fetch
slot for the most common collection URL and ensures followers are
resolved even when the fetch cap is reached.
AS2 permits both `actor` and `followers` as either an IRI string or
an inline object. Pass these through `object_to_uri()` before using
them so the cached-followers detection still fires when peers serialize
either field as an object, and `Remote_Actors::get_by_uri()` doesn't
get an array passed into a `%s` placeholder.
Consult the persistent Remote_Actors cache for each remote recipient
before counting against the fetch cap. A recipient that's already
known to be a remote actor isn't a collection, so it contributes no
local users and there's no reason to spend an HTTP slot rediscovering
that. The cap now only bites genuinely-unknown URLs in a single
activity, which is the abuse shape it's meant to block.

Also extract the cached followers-URL lookup into a small helper to
keep get_local_recipients readable.
With cache-first short-circuiting, the cap only counts genuinely
unknown URIs in a single activity. 10 leaves headroom for thread
replies that newly introduce several remote participants while still
clipping unbounded audience floods.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread includes/rest/class-inbox-controller.php
Comment thread tests/phpunit/tests/includes/rest/class-test-inbox-controller.php Outdated
Comment thread tests/phpunit/tests/includes/rest/class-test-inbox-controller.php Outdated
Comment thread tests/phpunit/tests/includes/rest/class-test-inbox-controller.php Outdated
The cached-actor short-circuit in get_local_recipients() previously
ran Remote_Actors::get_by_uri() once per recipient, so a flood of
unknown URIs in to/cc/bcc forced an O(N) burst of indexed SELECTs
even though the HTTP-fetch cap stopped outbound traffic. Concurrent
requests of that shape could saturate DB connections.

Batch the lookup instead: pre-compute the set of candidate URIs once,
ask Remote_Actors for all matches in a single chunked WHERE-IN query,
then check membership in O(1) inside the loop.

For normal activity sizes (1-5 recipients) this is equal or slightly
faster than the per-recipient form. For adversarial payloads it caps
the DB cost at one batched query (chunked at 200 placeholders), so a
10k-URI flood produces ~50 fast queries instead of 10k sequential
ones. Cached recipients still don't burn the HTTP-fetch cap; no
recipient ever gets dropped artificially.

Also wraps three get_local_recipients tests in try/finally so a
mid-test assertion failure doesn't leak filters/options into other
tests, and adds direct unit coverage for the new
Remote_Actors::get_existing_uris() helper.

Changelog entry retyped from `fixed` to `security` to match the
nature of the change.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread includes/rest/class-inbox-controller.php Outdated
Remote_Actors::get_actor() reads actor JSON from post_content first
and falls back to the _activitypub_actor_json postmeta when
post_content is empty (legacy storage path). The cached followers-URL
lookup in get_local_recipients() only read post_content, so for any
actor still stored in the legacy form the optimisation silently
no-op'd, falling through to the HTTP fetch path. Mirror the same
fallback so cached followers detection works on legacy and current
actor records alike.
Copy link
Copy Markdown
Member

@jeherve jeherve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good.

One question that came up was whether it’s too risky to drop incoming Activities because of large audience lists. From what I can tell, this patch doesn’t actually drop the Activity itself; it only stops doing additional remote recipient fetches once the cap is reached. Same-domain recipients still work, cached remote actors still short-circuit, and cached followers URLs can still resolve locally. That feels like a reasonable tradeoff to me.

That said, there are a few nitpicks that I would highlight:

  • In includes/rest/class-inbox-controller.php, the new inline-object normalization path for actor / followers still isn’t covered by tests. The new PHPUnit cases all use string IRIs, so the main interoperability improvement described in the PR isn’t actually verified yet. Maybe worth adding some tests?
  • In that same file, I’d also like a test for the legacy _activitypub_actor_json fallback in get_cached_followers_url(). That fallback is now important for the cached followers shortcut, we don't want that to start failing in the future.
  • There’s also a small perf regression here: we now look up the same remote actor multiple times in one request (get_cached_followers_url(), public follower resolution, and again in the followers-URL branch). I’m wondering if we should pass the actor post through, or memoize it in-request, so we only do that lookup once.
  • Small doc cleanup: the new activitypub_max_remote_recipient_fetches filter should get an @since unreleased tag.

Maybe something else worth considering: should we log a message when the cap is reached? That may help folks (and ourselves) with some debugging.

…ests

- Tag the new activitypub_max_remote_recipient_fetches filter with @SInCE unreleased.
- Add tests for inline-object actor and followers fields, and for the legacy _activitypub_actor_json postmeta fallback in get_cached_followers_url().
Fires activitypub_remote_recipient_fetch_cap_reached once per activity on
the first over-cap recipient, so site owners can route cap hits to their
logging system of choice without polluting production logs by default.
@pfefferle pfefferle requested a review from jeherve April 28, 2026 07:14
Copy link
Copy Markdown
Member

@jeherve jeherve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. 🚢

@pfefferle pfefferle merged commit 82796fd into trunk Apr 28, 2026
11 checks passed
@pfefferle pfefferle deleted the fix/cap-remote-recipient-fetches branch April 28, 2026 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants