reddit: document blanket 403 walls + JSON-via-browser fallback #429
Open
ardasisbot wants to merge 1 commit into
…1.5) Anonymous .json requests can be 403-blocked wholesale at the CDN level (observed June 2026, datacenter IP, browser UA made no difference). Document the recovery: navigate .json URLs in the user's Chrome and parse document.body.innerText. Also adds the subreddit search.json endpoint, thread-comments params, stale-session retry, and -c shell-quoting trap. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="agent-workspace/domain-skills/reddit/scraping.md">
<violation number="1" location="agent-workspace/domain-skills/reddit/scraping.md:51">
P2: Path 1.5 documentation for `/comments/<id>.json` omits `kind: "more"` entries in `data[1]["data"]["children"]`, giving an incorrect data-shape guarantee that could cause KeyErrors in agent-generated code. The existing Path 1 section already correctly documents `kind: "more"` for the same endpoint.</violation>
</file>
+Useful JSON endpoints beyond single posts:
+
+- **Subreddit search:** `/r/<sub>/search.json?q=<query>&restrict_sr=on&sort=top&t=month&limit=25&raw_json=1` — `q` supports quoted phrases and `OR` (`q=tax efficient OR "tax loss harvesting"`, URL-encoded). `t` ∈ hour/day/week/month/year/all.
+- **Thread + top comments:** `/r/<sub>/comments/<id>.json?limit=10&sort=top&depth=1&raw_json=1` — `data[1]["data"]["children"]` are top-level comments (`body`, `score`, `author`); filter out `stickied`.
P2: Path 1.5 documentation for `/comments/<id>.json` omits `kind: "more"` entries in `data[1]["data"]["children"]`, giving an incorrect data-shape guarantee that could cause KeyErrors in agent-generated code. The existing Path 1 section already correctly documents `kind: "more"` for the same endpoint.
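A minimal sketch of the defensive parsing this comment points at, assuming the listing shape quoted in the diff (`data[1]["data"]["children"]`, each child carrying `kind` and `data`). The helper name `top_level_comments` and the sample payload are illustrative and not part of the skill file; `t1` is Reddit's kind prefix for comments.

```python
def top_level_comments(thread_json):
    """Return (author, score, body) tuples for real top-level comments.

    thread_json is the parsed /comments/<id>.json response: a two-element
    list where element 1 is the comment listing.
    """
    children = thread_json[1]["data"]["children"]
    comments = []
    for child in children:
        # kind "more" placeholders hold only child IDs, with no body,
        # score, or author, so indexing those keys would raise KeyError.
        if child.get("kind") != "t1":
            continue
        data = child.get("data", {})
        # Skip pinned moderator comments, as the skill text recommends.
        if data.get("stickied"):
            continue
        comments.append((data.get("author"), data.get("score"), data.get("body")))
    return comments


# Hand-written sample payload (not captured from Reddit).
sample = [
    {"kind": "Listing", "data": {"children": []}},
    {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"author": "a", "score": 5, "body": "hi", "stickied": False}},
        {"kind": "t1", "data": {"author": "mod", "score": 1, "body": "rules", "stickied": True}},
        {"kind": "more", "data": {"count": 12, "children": ["abc"]}},
    ]}},
]

print(top_level_comments(sample))  # [('a', 5, 'hi')]
```

Filtering on `kind` first, rather than catching `KeyError` afterwards, keeps the `more` placeholder from ever reaching code that expects comment fields.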
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At agent-workspace/domain-skills/reddit/scraping.md, line 51:
<comment>Path 1.5 documentation for `/comments/<id>.json` omits `kind: "more"` entries in `data[1]["data"]["children"]`, giving an incorrect data-shape guarantee that could cause KeyErrors in agent-generated code. The existing Path 1 section already correctly documents `kind: "more"` for the same endpoint.</comment>
<file context>
@@ -29,6 +29,32 @@ Fails on:
+Useful JSON endpoints beyond single posts:
+
+- **Subreddit search:** `/r/<sub>/search.json?q=<query>&restrict_sr=on&sort=top&t=month&limit=25&raw_json=1` — `q` supports quoted phrases and `OR` (`q=tax efficient OR "tax loss harvesting"`, URL-encoded). `t` ∈ hour/day/week/month/year/all.
+- **Thread + top comments:** `/r/<sub>/comments/<id>.json?limit=10&sort=top&depth=1&raw_json=1` — `data[1]["data"]["children"]` are top-level comments (`body`, `score`, `author`); filter out `stickied`.
+- `raw_json=1` stops Reddit HTML-escaping `&`, `<`, `>` in text fields.
+
</file context>
What
Adds a Path 1.5 to the reddit scraping skill: fetching reddit's `.json` endpoints by navigating them in the user's Chrome and parsing `document.body.innerText`.

Why
Field-tested 2026-06-11: reddit.com returned 403 for every anonymous `.json` request from a datacenter IP. The browser User-Agent made no difference, and the response is an HTML challenge page, not JSON. The existing Path 1 (`http_get`) only documents 401/429 failure modes, so an agent hitting the blanket 403 wall has no documented recovery. Navigating the same URLs in the user's real browser session passes cleanly.

Also adds:
- subreddit search endpoint (`q` supports `OR` and quoted phrases, `t=month` etc.) for topic sweeps
- thread-comments params (`?limit=10&sort=top&depth=1&raw_json=1`)
- stale-session retry (`Runtime.evaluate timed out` mid-loop → `ensure_real_tab()` + retry)
- shell-quoting trap for `browser-harness -c '...'`

🤖 Generated with Claude Code
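For illustration, the two endpoint shapes named in this description could be assembled as below. This is a sketch, not code from the PR: the helper names `search_url` and `thread_url` and the `https://www.reddit.com` host are assumptions, and only the query parameters come from the documented URLs.

```python
from urllib.parse import quote, urlencode

BASE = "https://www.reddit.com"  # assumed host; the PR only shows paths


def search_url(sub, query, t="month", limit=25):
    """Build a subreddit search.json URL with the documented parameters."""
    params = {
        "q": query,            # urlencode escapes spaces and double quotes
        "restrict_sr": "on",
        "sort": "top",
        "t": t,                # hour/day/week/month/year/all
        "limit": limit,
        "raw_json": 1,
    }
    return f"{BASE}/r/{quote(sub)}/search.json?{urlencode(params)}"


def thread_url(sub, post_id, limit=10):
    """Build a thread .json URL that returns top-level comments only."""
    params = {"limit": limit, "sort": "top", "depth": 1, "raw_json": 1}
    return f"{BASE}/r/{quote(sub)}/comments/{quote(post_id)}.json?{urlencode(params)}"


print(search_url("personalfinance", 'tax efficient OR "tax loss harvesting"'))
print(thread_url("personalfinance", "abc123"))
```

Letting `urlencode` handle the query avoids hand-escaping the quoted phrase; it encodes spaces as `+` and double quotes as `%22`.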
Summary by cubic
Documented Reddit’s blanket 403 on anonymous `.json` requests and added a JSON-via-browser fallback (Path 1.5) that uses the user’s Chrome to fetch `.json` and parse `document.body.innerText`. This gives the scraping skill a reliable path when `http_get` is blocked.

- Navigate `.json` in a real tab and parse `document.body.innerText` to bypass CDN 403s.
- Subreddit search (`/r/<sub>/search.json` with `q`, `t`, `limit`, `raw_json=1`) and thread + top comments (`/comments/<id>.json?...`).
- On a stale session (`Runtime.evaluate timed out`), call `ensure_real_tab()` and retry.
- Shell-quoting trap in `browser-harness -c`; use double quotes or `.format()`.

Written for commit 39bfbb0. Summary will update on new commits.
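The fallback and retry described in this summary might look roughly like the sketch below. Everything harness-specific is injected and hypothetical: `fetch_text` stands in for "navigate to the URL and return `document.body.innerText`", `recover` stands in for the harness's `ensure_real_tab()`, and raising `TimeoutError` for `Runtime.evaluate timed out` is an assumption about how the harness reports it.

```python
import json


def parse_json_text(text):
    """Parse text captured from document.body.innerText.

    Returns None when the page is not JSON, for example an HTML
    challenge page served in place of the .json payload.
    """
    stripped = text.strip()
    if not stripped or stripped[0] not in "[{":
        return None
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        return None


def fetch_with_retry(fetch_text, recover, url, attempts=2):
    """Fetch and parse a .json URL, recovering once from a stale session.

    fetch_text and recover are caller-supplied callables (hypothetical
    stand-ins for the browser harness); TimeoutError is an assumed
    exception type.
    """
    for attempt in range(attempts):
        try:
            return parse_json_text(fetch_text(url))
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            recover()  # e.g. re-attach to a real tab before retrying
```

Passing the harness calls in as arguments keeps the sketch runnable without the harness and makes the retry logic testable with fakes.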