From e66ded38d9ad4ee631a5af7bc136d6433eccba8e Mon Sep 17 00:00:00 2001
From: Siddharth Nashikkar
Date: Mon, 22 Jun 2026 22:16:18 -0400
Subject: [PATCH 1/5] docs: rewrite the README for end users

Lead with what the tool decides and why, then install, the CLI with
sample output, the library, and the agent skill, with an honest
requirements and scope section.
---
 README.md | 114 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 74 insertions(+), 40 deletions(-)

diff --git a/README.md b/README.md
index d64ee92..7297713 100644
--- a/README.md
+++ b/README.md
@@ -1,28 +1,31 @@
 # LocalContextRouter
 
-> Stop paying the vision-token tax. Decide locally — text, OCR, or vision — *before* a document ever reaches a multimodal LLM.
+Decide locally how each page of a document should reach a multimodal model:
+as extracted text, on-device OCR, or a rendered image. That keeps you from
+paying for vision tokens on pages that are only text.
 
-LocalContextRouter is a **preflight layer** for document-heavy LLM workflows. Given a
-PDF or image, it inspects the content on your machine and decides the cheapest path
-that still preserves accuracy:
+A multimodal model reads a PDF by pulling its text *and* rendering every page to
+an image, then billing for both. On a text page that image runs roughly
+1,300 to 4,800 tokens while the same page as plain text is 400 to 800. For a
+text-dominant document that is several times the cost for nothing extra.
+LocalContextRouter does the cheap work on your machine first and tells you what
+each page actually needs.
 
-- **Text-layer PDF** → extract text locally (near-free).
-- **Scanned / image-only page** → OCR on-device with Apple Vision.
-- **Chart / table / diagram / layout-heavy page** → keep the page as an image for the vision model, where the pixels actually carry meaning.
+It does not call a model. It returns a per-page decision and the text to send;
+your application still makes the call.
 
-It never calls an LLM itself. It prepares the cheapest faithful context and hands you
-back a routing decision plus a token-savings estimate. Your application still owns the
-model call.
+## How it decides
 
-## Why
+For each page:
 
-Multimodal models read a PDF by extracting its text *and* rendering every page to an
-image, then billing for both. A text-heavy page sent as an image can cost
-**1,300–4,800 tokens**; the same page as extracted text costs **400–800**. For
-text-dominant documents that is a 2–10× tax for zero added signal.
+- A usable text layer that is mostly prose: use the extracted text.
+- A text layer dominated by a table, chart, or diagram: send the page as an
+  image, where the layout carries the meaning.
+- No usable text, such as a scan or a photo: recognize it on-device with
+  Apple's Vision framework.
 
-LocalContextRouter spends cheap local compute to avoid that tax — and only escalates
-to vision when the page genuinely needs it.
+The result also reports how many tokens you saved against sending every page as
+an image.
 
 ## Install
 
@@ -30,48 +33,79 @@ to vision when the page genuinely needs it.
 pip install localcontextrouter
 ```
 
-The macOS wheel bundles the on-device OCR binary (`lcr-ocr`, a universal2 build),
-so OCR works out of the box — no extra setup. To override it (e.g. a locally built
-binary), set `LCR_OCR_BIN` to its path.
+macOS only. The wheel bundles a universal (Apple Silicon and Intel) OCR binary,
+so text recognition works with no extra setup.
 
-## Use
+## Command line
 
-There is no server and no background process — everything runs on demand and exits.
+```sh
+localctx invoice.pdf
+localctx invoice.pdf --json
+localctx scan.png
+```
 
-### Command line
+`localctx invoice.pdf` prints each page, the source chosen for it, and the
+tokens saved:
 
-```sh
-localctx report.pdf  # human summary + tokens saved
-localctx report.pdf --json  # machine-readable
-localctx report.pdf --vision-dir ./out  # render visual pages to ./out
+```
+Document: invoice.pdf (3 pages)
+Tokens saved vs sending every page as an image: 3085
+
+Page 1 [text]
+ACME Corp, Invoice #4471 ...
+
+Page 2 [vision]
+Quarterly results by segment ...
+
+Page 3 [ocr]
+SCANNED RECEIPT TOTAL 42.00
 ```
 
-### Library
+Add `--vision-dir DIR` to render the pages that should go to the model as images
+into `DIR`; their paths are then listed in the output and the JSON.
+
+## In code
 
 ```python
 from localcontextrouter import route_pdf, Source
 
-result = route_pdf("report.pdf")
+result = route_pdf("invoice.pdf")
 for page in result.pages:
     if page.source is Source.VISION:
-        ...  # send the rendered page image to the model
+        send_image(page.index)  # the page's meaning is visual
     else:
-        ...  # use page.text (extracted or OCR'd)
+        send_text(page.text)  # extracted or recognized text
 
-print(result.text)  # all text-routable pages joined
-print(result.tokens_saved)  # tokens avoided vs sending every page as an image
+print(result.tokens_saved)
 ```
 
-### Agent Skill
+Every page also carries an estimate of its cost both ways, as
+`page.tokens.text_tokens` and `page.tokens.image_tokens`.
+
+## As an agent skill
+
+`local-context-router` is an Agent Skill in the open `SKILL.md` format, so it
+works in Claude Code and other compatible agents. It lives in this repository
+under `.claude/skills/local-context-router`; copy that folder into your agent's
+skills directory:
+
+```sh
+cp -r .claude/skills/local-context-router ~/.claude/skills/
+```
 
-The `local-context-router` skill (in `.claude/skills/`) runs the same preflight
-inside Claude Code or Codex — copy it into your `.claude/skills/` (or `~/.claude/skills/`).
+With the package installed, the agent runs the preflight on any PDF or image you
+share, then uses the text for the cheap pages and attaches images only for the
+visual ones.
 
-## Requirements
+## Requirements and scope
 
-- macOS 10.15+ (on-device OCR uses the Apple Vision framework)
-- Python 3.10+
+- macOS 11 or newer. Recognition uses the Apple Vision framework and needs a
+  normal macOS graphics environment; it will not run inside a headless sandbox
+  that lacks one.
+- Python 3.10 or newer.
+- The scope is per-page routing, on-device OCR, and a token estimate. Retrieval
+  over very large documents is out of scope.
 
 ## License
 
-[MIT](LICENSE) © 2026 Siddharth Nashikkar
+MIT. See [LICENSE](LICENSE).

From 69619494d104e447b8f96c56111a6970fb0610af Mon Sep 17 00:00:00 2001
From: Siddharth Nashikkar
Date: Mon, 22 Jun 2026 22:16:18 -0400
Subject: [PATCH 2/5] docs(skill): make the skill tool-agnostic

Invoke the installed localctx command (with the bundled script as a
fallback by relative path) instead of relying on a Claude-specific path
variable, so the skill works across compatible agents. Describe the flat
JSON fields.
---
 .claude/skills/local-context-router/SKILL.md | 66 ++++++++++----------
 1 file changed, 34 insertions(+), 32 deletions(-)

diff --git a/.claude/skills/local-context-router/SKILL.md b/.claude/skills/local-context-router/SKILL.md
index e7d47b8..8845632 100644
--- a/.claude/skills/local-context-router/SKILL.md
+++ b/.claude/skills/local-context-router/SKILL.md
@@ -2,55 +2,57 @@
 name: local-context-router
 description: >-
   Preflight a PDF, scan, or screenshot locally before sending it to the model.
-  Extracts the embedded text layer for free, OCRs image-only pages on-device
-  with Apple Vision, and flags only genuinely visual pages (tables, charts,
-  diagrams) for the vision model — cutting vision-token cost. Use whenever the
-  user shares a PDF or image to read, summarize, or extract from.
+  Extracts the embedded text layer, OCRs image-only pages on-device with Apple
+  Vision, and flags genuinely visual pages (tables, charts, diagrams) for the
+  vision model, which cuts vision-token cost. Use whenever the user shares a PDF
+  or image to read, summarize, or extract from.
 ---
 
 # Local Context Router
 
-Multimodal models read a PDF by extracting its text *and* rendering every page
-to an image, billing for both. For text-heavy pages that is a 2–10× token tax
-for no added signal. This skill spends cheap local compute first and only pays
-for vision when a page's meaning actually lives in its pixels.
+A multimodal model reads a PDF by extracting its text and rendering every page to
+an image, then paying for both. On a page that is mostly prose, the image is
+wasted spend. Run this preflight first and send the model only what each page
+needs.
 
 ## When to use
 
-Use this **before** attaching a PDF, scan, or screenshot to the conversation —
-whenever the user wants you to read, summarize, or extract from a document.
+Before reading, summarizing, or extracting from a PDF, scan, or screenshot the
+user has shared.
 
-## How to run
+## Requirements
 
-Run the preflight script on the file. It picks the cheapest faithful source per
-page and prints the result as JSON:
+The `localcontextrouter` package must be installed (`pip install localcontextrouter`,
+macOS). It provides the `localctx` command used below.
+
+## Run
+
+Route the document and read the JSON, rendering any visual pages into a folder:
 
 ```sh
-python "${CLAUDE_SKILL_DIR}/scripts/preflight.py" --json --vision-dir "${CLAUDE_SKILL_DIR}/.cache"
+localctx --json --vision-dir ./lcr-pages
 ```
 
-- `` is the PDF or image to analyze.
-- `--vision-dir` is where rendered images of visual pages are written.
+If `localctx` is not on the PATH, run the bundled script by its path inside this
+skill folder instead:
+
+```sh
+python scripts/preflight.py --json --vision-dir ./lcr-pages
+```
 
-## How to use the result
+## Use the result
 
-The JSON has a `pages` array and a `tokens_saved` total. For each page:
+The JSON has `tokens_saved` and a `pages` array. Each page carries `source`,
+`text`, `text_tokens`, `image_tokens`, and `image`:
 
-- **`source: "text"`** — use the page's `text` directly. Do **not** attach the
-  image; it adds cost without information.
-- **`source: "ocr"`** — the page was image-only and has been OCR'd on-device;
-  use the returned `text`.
-- **`source: "vision"`** — the page is a table, chart, or diagram whose meaning
-  is visual. Attach the rendered image at `image` to the conversation so the
-  vision model can read it. The `text` is a rough fallback only.
+- `source: "text"`: use `text` directly; do not attach the image.
+- `source: "ocr"`: the page was image-only and has been OCR'd on-device; use `text`.
+- `source: "vision"`: the page is a table, chart, or diagram; attach the image at
+  `image` so the model can read it. The `text` is a rough fallback only.
 
-Assemble the per-page text in order for the parts you can read as text, and
-attach images only for the `vision` pages. Mention `tokens_saved` if the user
-cares about cost.
+Assemble the text and OCR pages in reading order, attach images only for the
+vision pages, and mention `tokens_saved` if the user cares about cost.
 
 ## Notes
 
-- Everything runs locally and offline; no document leaves the machine during
-  preflight.
-- Requires macOS (on-device OCR uses Apple Vision) and the `localcontextrouter`
-  package importable by the Python interpreter.
+Everything runs locally and offline; the document does not leave the machine.

From 0e3f064208b40b66dee6a5d21bd44ff217ea81e7 Mon Sep 17 00:00:00 2001
From: Siddharth Nashikkar
Date: Mon, 22 Jun 2026 22:16:18 -0400
Subject: [PATCH 3/5] docs: note the OCR environment requirement

OCR needs a normal macOS graphics environment and will not run in a
headless sandbox that lacks one.
---
 CHANGELOG.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 30ecd0b..bf048fb 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,15 +21,17 @@ First release.
   estimate, following each provider's documented tokenization.
 - `route_pdf`, which routes each page to text, OCR, or vision and reports the
   tokens saved versus sending every page as an image.
-- Routed text is normalized — stray control characters (e.g. PDF discretionary
-  hyphens) are stripped and line endings collapsed — while classification still
+- Routed text is normalized: stray control characters (such as PDF discretionary
+  hyphens) are stripped and line endings collapsed, while classification still
   runs on the raw text layer.
 - `localctx` command-line interface.
 - A `local-context-router` Agent Skill for Claude Code and Codex.
 
 ### Notes
 
-- macOS only; OCR uses the Apple Vision framework.
+- macOS only; OCR uses the Apple Vision framework and needs a normal macOS
+  graphics environment, so it will not run inside a headless sandbox that lacks
+  one.
 - The macOS wheel is a `universal2` platform wheel that bundles the `lcr-ocr`
   binary, so OCR works out of the box. `LCR_OCR_BIN` overrides the bundled copy.
 

From 056c348f4f2e9b61cd8acabd53f872251a94e703 Mon Sep 17 00:00:00 2001
From: Siddharth Nashikkar
Date: Mon, 22 Jun 2026 22:16:18 -0400
Subject: [PATCH 4/5] style: remove em dashes from prose and metadata

---
 ocr/README.md                         | 8 ++++----
 ocr/Sources/LCROCR/ImageLoading.swift | 2 +-
 pyproject.toml                        | 2 +-
 src/localcontextrouter/__init__.py    | 2 +-
 src/localcontextrouter/classify.py    | 2 +-
 src/localcontextrouter/cli.py         | 2 +-
 src/localcontextrouter/detect.py      | 2 +-
 src/localcontextrouter/models.py      | 8 ++++----
 src/localcontextrouter/ocr.py         | 2 +-
 src/localcontextrouter/router.py      | 2 +-
 src/localcontextrouter/text.py        | 2 +-
 11 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/ocr/README.md b/ocr/README.md
index 6a5804f..ad1312b 100644
--- a/ocr/README.md
+++ b/ocr/README.md
@@ -1,7 +1,7 @@
 # lcr-ocr
 
 On-device OCR binary used by LocalContextRouter. Wraps the Apple Vision
-framework — fully offline, no network, no entitlements, and no Screen Recording
+framework, fully offline, no network, no entitlements, and no Screen Recording
 permission (it reads image files you pass in, it does not capture the screen).
 
 ## Build
@@ -59,9 +59,9 @@ Follows the `sysexits.h` convention so callers can branch on failure mode:
 
 ## Layout
 
-- `Sources/LCROCR` — reusable library: image loading, the Vision engine, and the result models.
-- `Sources/lcr-ocr` — thin CLI over the library.
-- `Tests/LCROCRTests` — engine tests that render text in-process (no binary fixtures).
+- `Sources/LCROCR`, reusable library: image loading, the Vision engine, and the result models.
+- `Sources/lcr-ocr`, thin CLI over the library.
+- `Tests/LCROCRTests`, engine tests that render text in-process (no binary fixtures).
 
 ## Requirements
 
diff --git a/ocr/Sources/LCROCR/ImageLoading.swift b/ocr/Sources/LCROCR/ImageLoading.swift
index c8f0cee..df8ab60 100644
--- a/ocr/Sources/LCROCR/ImageLoading.swift
+++ b/ocr/Sources/LCROCR/ImageLoading.swift
@@ -20,7 +20,7 @@ public enum ImageLoadError: Error, CustomStringConvertible {
 /// Loads bitmaps from disk into `CGImage` using ImageIO.
 ///
 /// ImageIO is used instead of AppKit so the binary runs headless (no window
-/// server) — important for CI and for invocation from a CLI.
+/// server), important for CI and for invocation from a CLI.
 public enum ImageLoader {
     /// Decode the first image in the file at `path`.
     public static func loadCGImage(atPath path: String) throws -> CGImage {
diff --git a/pyproject.toml b/pyproject.toml
index 7aaf2c0..61b12ac 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "localcontextrouter"
-description = "Preflight router that picks the cheapest faithful path — text, OCR, or vision — before a document reaches a multimodal LLM."
+description = "Preflight router that picks the cheapest faithful path (text, OCR, or vision) before a document reaches a multimodal LLM."
 readme = "README.md"
 requires-python = ">=3.10"
 license = "MIT"
diff --git a/src/localcontextrouter/__init__.py b/src/localcontextrouter/__init__.py
index 0e1d024..091284e 100644
--- a/src/localcontextrouter/__init__.py
+++ b/src/localcontextrouter/__init__.py
@@ -1,4 +1,4 @@
-"""LocalContextRouter — cheapest faithful path for documents bound for a multimodal LLM."""
+"""LocalContextRouter, cheapest faithful path for documents bound for a multimodal LLM."""
 
 from .classify import classify_text, compute_signals
 from .detect import is_vision_worthy
diff --git a/src/localcontextrouter/classify.py b/src/localcontextrouter/classify.py
index b66be3a..63f3cfc 100644
--- a/src/localcontextrouter/classify.py
+++ b/src/localcontextrouter/classify.py
@@ -5,7 +5,7 @@
 absent (:class:`PageClass.SCANNED`), or present but broken
 (:class:`PageClass.GARBLED`). The two latter cases route to OCR downstream.
 
-Thresholds are deliberately conservative — when in doubt the page is sent to
+Thresholds are deliberately conservative, when in doubt the page is sent to
 OCR, since a wrong "digital" verdict silently feeds garbage to the model.
 """
 
diff --git a/src/localcontextrouter/cli.py b/src/localcontextrouter/cli.py
index 3744142..6e0d6d0 100644
--- a/src/localcontextrouter/cli.py
+++ b/src/localcontextrouter/cli.py
@@ -1,4 +1,4 @@
-"""``localctx`` — route a document and report the cheapest faithful source per page."""
+"""``localctx``, route a document and report the cheapest faithful source per page."""
 
 from __future__ import annotations
 
diff --git a/src/localcontextrouter/detect.py b/src/localcontextrouter/detect.py
index 76014c9..85ac8f4 100644
--- a/src/localcontextrouter/detect.py
+++ b/src/localcontextrouter/detect.py
@@ -3,7 +3,7 @@
 Some pages carry a perfectly good text layer yet still lose their meaning when
 flattened to text: tables, charts, diagrams, and figure-heavy layouts. Those are
 worth the vision-token cost. This module decides that from cheap layout features
-(:class:`~.models.PageFeatures`) — no rendering and no ML.
+(:class:`~.models.PageFeatures`), no rendering and no ML.
 """
 
 from __future__ import annotations
diff --git a/src/localcontextrouter/models.py b/src/localcontextrouter/models.py
index daa9dd2..718383a 100644
--- a/src/localcontextrouter/models.py
+++ b/src/localcontextrouter/models.py
@@ -10,13 +10,13 @@ class PageClass(str, Enum):
     """How a PDF page should be sourced before it reaches an LLM."""
 
     DIGITAL = "digital"
-    """A usable embedded text layer is present — extract the text directly."""
+    """A usable embedded text layer is present, extract the text directly."""
 
     SCANNED = "scanned"
-    """Little or no text layer — the page is image-only and needs OCR."""
+    """Little or no text layer, the page is image-only and needs OCR."""
 
     GARBLED = "garbled"
-    """A text layer exists but is broken (unmapped glyphs) — OCR is safer."""
+    """A text layer exists but is broken (unmapped glyphs), OCR is safer."""
 
 
 @dataclass(frozen=True)
@@ -74,7 +74,7 @@ class Source(str, Enum):
     """Produced by on-device OCR after rendering the page."""
 
     VISION = "vision"
-    """Send the page to a vision model — its meaning lives in the visuals."""
+    """Send the page to a vision model, its meaning lives in the visuals."""
 
 
 @dataclass(frozen=True)
diff --git a/src/localcontextrouter/ocr.py b/src/localcontextrouter/ocr.py
index 45ffdc6..445d21d 100644
--- a/src/localcontextrouter/ocr.py
+++ b/src/localcontextrouter/ocr.py
@@ -120,7 +120,7 @@ def ocr_png_text(
 ) -> str:
     """OCR a PNG given as bytes; return the recognized lines joined by newlines.
 
-    Lines below ``min_confidence`` are dropped — useful for filtering the
+    Lines below ``min_confidence`` are dropped, useful for filtering the
     low-confidence glyphs that icons and logos tend to produce.
     """
     with tempfile.NamedTemporaryFile(suffix=".png") as tmp:
diff --git a/src/localcontextrouter/router.py b/src/localcontextrouter/router.py
index 1869bdf..4761eda 100644
--- a/src/localcontextrouter/router.py
+++ b/src/localcontextrouter/router.py
@@ -1,7 +1,7 @@
 """Route each PDF page to the cheapest faithful source: text, OCR, or vision.
 
 - Digital pages keep their extracted text, unless their meaning lives in visuals
-  (tables, charts, diagrams) — those go to a vision model.
+  (tables, charts, diagrams), those go to a vision model.
 - Scanned or garbled pages are rendered and sent to OCR.
 
 Every page carries a token estimate so the savings of avoiding the image path
diff --git a/src/localcontextrouter/text.py b/src/localcontextrouter/text.py
index ebf8a95..fc8295c 100644
--- a/src/localcontextrouter/text.py
+++ b/src/localcontextrouter/text.py
@@ -1,6 +1,6 @@
 """Text normalization for routed output.
 
-Applied to the text a page contributes to the model — not before
+Applied to the text a page contributes to the model, not before
 classification, which relies on seeing control and replacement characters to
 spot a broken text layer.
 """

From 6324c3242ba79e23b2c54f3c724e1231deeb845d Mon Sep 17 00:00:00 2001
From: Siddharth Nashikkar
Date: Mon, 22 Jun 2026 22:16:18 -0400
Subject: [PATCH 5/5] chore: generalize the local-settings ignore entry

---
 .gitignore | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 8e80295..f35ccae 100644
--- a/.gitignore
+++ b/.gitignore
@@ -38,5 +38,5 @@ src/localcontextrouter/_bin/
 /tmp/
 *.log
 
-# Claude Code local (user-specific) settings
+# Local agent settings (user-specific)
 .claude/settings.local.json