Cap remote recipient fetches per incoming activity#3094
Conversation
Limit the number of outbound HTTP requests triggered by remote recipient URLs in to/cc/bcc fields to prevent abuse. Defaults to 5, filterable via `activitypub_max_remote_recipient_fetches`.
There was a problem hiding this comment.
Pull request overview
This PR mitigates abuse of the shared inbox endpoint by limiting how many outbound HTTP fetches can be triggered from remote recipient URLs embedded in incoming ActivityPub activities.
Changes:
- Add a per-activity cap (default: 5) on remote recipient lookups, filterable via
activitypub_max_remote_recipient_fetches. - Skip additional remote recipient fetches once the cap is reached while continuing to process same-domain recipients.
- Add a patch-level changelog entry documenting the fix.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| includes/rest/class-inbox-controller.php | Adds the remote-recipient fetch cap (and introduces a new filter) inside get_local_recipients(). |
| .github/changelog/fix-cap-remote-recipient-fetches | Documents the change as a patch “fixed” item. |
Instead of always fetching remote recipient URLs to check if they are collections, compare them against the actor's known followers URL from the cached Remote_Actors post. This avoids consuming a remote fetch slot for the most common collection URL and ensures followers are resolved even when the fetch cap is reached.
AS2 permits both `actor` and `followers` as either an IRI string or an inline object. Pass these through `object_to_uri()` before using them so the cached-followers detection still fires when peers serialize either field as an object, and `Remote_Actors::get_by_uri()` doesn't get an array passed into a `%s` placeholder.
Consult the persistent Remote_Actors cache for each remote recipient before counting against the fetch cap. A recipient that's already known to be a remote actor isn't a collection, so it contributes no local users and there's no reason to spend an HTTP slot rediscovering that. The cap now only bites genuinely-unknown URLs in a single activity, which is the abuse shape it's meant to block. Also extract the cached followers-URL lookup into a small helper to keep get_local_recipients readable.
With cache-first short-circuiting, the cap only counts genuinely unknown URIs in a single activity. 10 leaves headroom for thread replies that newly introduce several remote participants while still clipping unbounded audience floods.
The cached-actor short-circuit in get_local_recipients() previously ran Remote_Actors::get_by_uri() once per recipient, so a flood of unknown URIs in to/cc/bcc forced an O(N) burst of indexed SELECTs even though the HTTP-fetch cap stopped outbound traffic. Concurrent requests of that shape could saturate DB connections. Batch the lookup instead: pre-compute the set of candidate URIs once, ask Remote_Actors for all matches in a single chunked WHERE-IN query, then check membership in O(1) inside the loop. For normal activity sizes (1-5 recipients) this is equal or slightly faster than the per-recipient form. For adversarial payloads it caps the DB cost at one batched query (chunked at 200 placeholders), so a 10k-URI flood produces ~50 fast queries instead of 10k sequential ones. Cached recipients still don't burn the HTTP-fetch cap; no recipient ever gets dropped artificially. Also wraps three get_local_recipients tests in try/finally so a mid-test assertion failure doesn't leak filters/options into other tests, and adds direct unit coverage for the new Remote_Actors::get_existing_uris() helper. Changelog entry retyped from `fixed` to `security` to match the nature of the change.
Remote_Actors::get_actor() reads actor JSON from post_content first and falls back to the _activitypub_actor_json postmeta when post_content is empty (legacy storage path). The cached followers-URL lookup in get_local_recipients() only read post_content, so for any actor still stored in the legacy form the optimisation silently no-op'd, falling through to the HTTP fetch path. Mirror the same fallback so cached followers detection works on legacy and current actor records alike.
There was a problem hiding this comment.
This is looking good.
One question that came up was whether it’s too risky to drop incoming Activities because of large audience lists. From what I can tell, this patch doesn’t actually drop the Activity itself; it only stops doing additional remote recipient fetches once the cap is reached. Same-domain recipients still work, cached remote actors still short-circuit, and cached followers URLs can still resolve locally. That feels like a reasonable tradeoff to me.
That said, there are a few nitpicks that I would highlight:
- In
includes/rest/class-inbox-controller.php, the new inline-object normalization path foractor/followersstill isn’t covered by tests. The new PHPUnit cases all use string IRIs, so the main interoperability improvement described in the PR isn’t actually verified yet. Maybe worth adding some tests? - In that same file, I’d also like a test for the legacy
_activitypub_actor_jsonfallback inget_cached_followers_url(). That fallback is now important for the cached followers shortcut, we don't want that to start failing in the future. - There’s also a small perf regression here: we now look up the same remote actor multiple times in one request (
get_cached_followers_url(), public follower resolution, and again in the followers-URL branch). I’m wondering if we should pass the actor post through, or memoize it in-request, so we only do that lookup once. - Small doc cleanup: the new
activitypub_max_remote_recipient_fetchesfilter should get an@since unreleasedtag.
Maybe something else worth considering: should we log a message when the cap is reached? That may help folks (and ourselves) with some debugging.
…ests - Tag the new activitypub_max_remote_recipient_fetches filter with @SInCE unreleased. - Add tests for inline-object actor and followers fields, and for the legacy _activitypub_actor_json postmeta fallback in get_cached_followers_url().
Fires activitypub_remote_recipient_fetch_cap_reached once per activity on the first over-cap recipient, so site owners can route cap hits to their logging system of choice without polluting production logs by default.
Summary
to/cc/bccfields of incoming activities.activitypub_max_remote_recipient_fetches.Remote_Actorspost (it's a known actor, not a collection — no local recipients to resolve).actorandfollowersthroughobject_to_uri()so peers that serialize either field as an inline object are still matched correctly.Test plan
to/cc/bcc— all should be resolved as before.cc— followers should be resolved without an HTTP fetch, even if the cap is already reached.actororfollowersas an inline object (rather than a string IRI) are handled correctly.