Merged
66 changes: 34 additions & 32 deletions .claude/skills/local-context-router/SKILL.md
@@ -2,55 +2,57 @@
name: local-context-router
description: >-
Preflight a PDF, scan, or screenshot locally before sending it to the model.
Extracts the embedded text layer for free, OCRs image-only pages on-device
with Apple Vision, and flags only genuinely visual pages (tables, charts,
diagrams) for the vision model — cutting vision-token cost. Use whenever the
user shares a PDF or image to read, summarize, or extract from.
Extracts the embedded text layer, OCRs image-only pages on-device with Apple
Vision, and flags genuinely visual pages (tables, charts, diagrams) for the
vision model, which cuts vision-token cost. Use whenever the user shares a PDF
or image to read, summarize, or extract from.
---

# Local Context Router

Multimodal models read a PDF by extracting its text *and* rendering every page
to an image, billing for both. For text-heavy pages that is a 2–10× token tax
for no added signal. This skill spends cheap local compute first and only pays
for vision when a page's meaning actually lives in its pixels.
A multimodal model reads a PDF by extracting its text and rendering every page to
an image, then paying for both. On a page that is mostly prose, the image is
wasted spend. Run this preflight first and send the model only what each page
needs.

## When to use

Use this **before** attaching a PDF, scan, or screenshot to the conversation —
whenever the user wants you to read, summarize, or extract from a document.
Before reading, summarizing, or extracting from a PDF, scan, or screenshot the
user has shared.

## How to run
## Requirements

Run the preflight script on the file. It picks the cheapest faithful source per
page and prints the result as JSON:
The `localcontextrouter` package must be installed (`pip install localcontextrouter`,
macOS). It provides the `localctx` command used below.

## Run

Route the document and read the JSON, rendering any visual pages into a folder:

```sh
python "${CLAUDE_SKILL_DIR}/scripts/preflight.py" <path-to-document> --json --vision-dir "${CLAUDE_SKILL_DIR}/.cache"
localctx <path-to-document> --json --vision-dir ./lcr-pages
```

- `<path-to-document>` is the PDF or image to analyze.
- `--vision-dir` is where rendered images of visual pages are written.
If `localctx` is not on the PATH, run the bundled script by its path inside this
skill folder instead:

```sh
python scripts/preflight.py <path-to-document> --json --vision-dir ./lcr-pages
```

## How to use the result
## Use the result

The JSON has a `pages` array and a `tokens_saved` total. For each page:
The JSON has `tokens_saved` and a `pages` array. Each page carries `source`,
`text`, `text_tokens`, `image_tokens`, and `image`:

- **`source: "text"`** — use the page's `text` directly. Do **not** attach the
image; it adds cost without information.
- **`source: "ocr"`** — the page was image-only and has been OCR'd on-device;
use the returned `text`.
- **`source: "vision"`** — the page is a table, chart, or diagram whose meaning
is visual. Attach the rendered image at `image` to the conversation so the
vision model can read it. The `text` is a rough fallback only.
- `source: "text"`: use `text` directly; do not attach the image.
- `source: "ocr"`: the page was image-only and has been OCR'd on-device; use `text`.
- `source: "vision"`: the page is a table, chart, or diagram; attach the image at
`image` so the model can read it. The `text` is a rough fallback only.

Assemble the per-page text in order for the parts you can read as text, and
attach images only for the `vision` pages. Mention `tokens_saved` if the user
cares about cost.
Assemble the text and OCR pages in reading order, attach images only for the
vision pages, and mention `tokens_saved` if the user cares about cost.

## Notes

- Everything runs locally and offline; no document leaves the machine during
preflight.
- Requires macOS (on-device OCR uses Apple Vision) and the `localcontextrouter`
package importable by the Python interpreter.
Everything runs locally and offline; the document does not leave the machine.
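A minimal sketch of how an agent or script might consume that JSON, assuming the shape the skill describes (`tokens_saved` plus a `pages` array whose entries carry `source`, `text`, and `image`). The sample payload, the image path in it, and the `split_routed_pages` helper are hypothetical and invented for illustration; they are not output from, or part of, `localctx`.

```python
import json

# Hypothetical payload shaped like the JSON described above; all values are invented.
SAMPLE = """
{
  "tokens_saved": 3085,
  "pages": [
    {"source": "text", "text": "ACME Corp, Invoice #4471", "image": null},
    {"source": "vision", "text": "Quarterly results by segment", "image": "lcr-pages/page-2.png"},
    {"source": "ocr", "text": "SCANNED RECEIPT TOTAL 42.00", "image": null}
  ]
}
"""


def split_routed_pages(payload: dict) -> tuple[str, list[str]]:
    """Return (text of text/ocr pages joined in order, image paths of vision pages)."""
    texts: list[str] = []
    images: list[str] = []
    for page in payload["pages"]:
        if page["source"] == "vision":
            # Visual page: attach the rendered image; its text is only a rough fallback.
            if page.get("image"):
                images.append(page["image"])
        else:
            # "text" and "ocr" pages are both consumed as plain text.
            texts.append(page["text"])
    return "\n\n".join(texts), images


payload = json.loads(SAMPLE)
text, images = split_routed_pages(payload)
print(text)
print(images)
print(payload["tokens_saved"])
```

Keeping the text and the image list separate mirrors the skill's instruction: text and OCR pages go in as text, and only vision pages are attached as images.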
2 changes: 1 addition & 1 deletion .gitignore
@@ -38,5 +38,5 @@ src/localcontextrouter/_bin/
/tmp/
*.log

# Claude Code local (user-specific) settings
# Local agent settings (user-specific)
.claude/settings.local.json
8 changes: 5 additions & 3 deletions CHANGELOG.md
@@ -21,15 +21,17 @@ First release.
estimate, following each provider's documented tokenization.
- `route_pdf`, which routes each page to text, OCR, or vision and reports the
tokens saved versus sending every page as an image.
- Routed text is normalized: stray control characters (e.g. PDF discretionary
hyphens) are stripped and line endings collapsed while classification still
- Routed text is normalized: stray control characters (such as PDF discretionary
hyphens) are stripped and line endings collapsed, while classification still
runs on the raw text layer.
- `localctx` command-line interface.
- A `local-context-router` Agent Skill for Claude Code and Codex.
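The text-normalization item in the list above could look roughly like the sketch below. This is not the code in `text.py`, which this diff does not show. `normalize_sketch` is a hypothetical name, and reading "line endings collapsed" as "unify to `\n` and squeeze runs of blank lines" is an assumption.

```python
import re
import unicodedata


def normalize_sketch(text: str) -> str:
    """Strip stray control/format characters and tidy line endings (assumed behavior)."""
    # Unify line endings first so "\r" is not treated as a stray control character.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control (Cc) and format (Cf) characters such as the soft hyphen U+00AD,
    # but keep newlines and tabs.
    kept = [
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in {"Cc", "Cf"}
    ]
    text = "".join(kept)
    # Collapse runs of three or more newlines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


print(repr(normalize_sketch("dis\u00adcretionary\r\n\r\n\r\n\r\nhyphen\x00s")))
```

As the changelog entry notes, classification still runs on the raw text layer, so a step like this would only be applied to the text that is handed to the model.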

### Notes

- macOS only; OCR uses the Apple Vision framework.
- macOS only; OCR uses the Apple Vision framework and needs a normal macOS
graphics environment, so it will not run inside a headless sandbox that lacks
one.
- The macOS wheel is a `universal2` platform wheel that bundles the `lcr-ocr`
binary, so OCR works out of the box. `LCR_OCR_BIN` overrides the bundled copy.

114 changes: 74 additions & 40 deletions README.md
@@ -1,77 +1,111 @@
# LocalContextRouter

> Stop paying the vision-token tax. Decide locally — text, OCR, or vision — *before* a document ever reaches a multimodal LLM.
Decide locally how each page of a document should reach a multimodal model:
as extracted text, on-device OCR, or a rendered image. That keeps you from
paying for vision tokens on pages that are only text.

LocalContextRouter is a **preflight layer** for document-heavy LLM workflows. Given a
PDF or image, it inspects the content on your machine and decides the cheapest path
that still preserves accuracy:
A multimodal model reads a PDF by pulling its text *and* rendering every page to
an image, then billing for both. On a text page that image runs roughly
1,300 to 4,800 tokens while the same page as plain text is 400 to 800. For a
text-dominant document that is several times the cost for nothing extra.
LocalContextRouter does the cheap work on your machine first and tells you what
each page actually needs.

- **Text-layer PDF** → extract text locally (near-free).
- **Scanned / image-only page** → OCR on-device with Apple Vision.
- **Chart / table / diagram / layout-heavy page** → keep the page as an image for the vision model, where the pixels actually carry meaning.
It does not call a model. It returns a per-page decision and the text to send;
your application still makes the call.

It never calls an LLM itself. It prepares the cheapest faithful context and hands you
back a routing decision plus a token-savings estimate. Your application still owns the
model call.
## How it decides

## Why
For each page:

Multimodal models read a PDF by extracting its text *and* rendering every page to an
image, then billing for both. A text-heavy page sent as an image can cost
**1,300–4,800 tokens**; the same page as extracted text costs **400–800**. For
text-dominant documents that is a 2–10× tax for zero added signal.
- A usable text layer that is mostly prose: use the extracted text.
- A text layer dominated by a table, chart, or diagram: send the page as an
image, where the layout carries the meaning.
- No usable text, such as a scan or a photo: recognize it on-device with
Apple's Vision framework.

LocalContextRouter spends cheap local compute to avoid that tax — and only escalates
to vision when the page genuinely needs it.
The result also reports how many tokens you saved against sending every page as
an image.

## Install

```sh
pip install localcontextrouter
```

The macOS wheel bundles the on-device OCR binary (`lcr-ocr`, a universal2 build),
so OCR works out of the box — no extra setup. To override it (e.g. a locally built
binary), set `LCR_OCR_BIN` to its path.
macOS only. The wheel bundles a universal (Apple Silicon and Intel) OCR binary,
so text recognition works with no extra setup.

## Use
## Command line

There is no server and no background process — everything runs on demand and exits.
```sh
localctx invoice.pdf
localctx invoice.pdf --json
localctx scan.png
```

### Command line
`localctx invoice.pdf` prints each page, the source chosen for it, and the
tokens saved:

```sh
localctx report.pdf # human summary + tokens saved
localctx report.pdf --json # machine-readable
localctx report.pdf --vision-dir ./out # render visual pages to ./out
```
Document: invoice.pdf (3 pages)
Tokens saved vs sending every page as an image: 3085

Page 1 [text]
ACME Corp, Invoice #4471 ...

Page 2 [vision]
Quarterly results by segment ...

Page 3 [ocr]
SCANNED RECEIPT TOTAL 42.00
```

### Library
Add `--vision-dir DIR` to render the pages that should go to the model as images
into `DIR`; their paths are then listed in the output and the JSON.

## In code

```python
from localcontextrouter import route_pdf, Source

result = route_pdf("report.pdf")
result = route_pdf("invoice.pdf")
for page in result.pages:
    if page.source is Source.VISION:
        ... # send the rendered page image to the model
        send_image(page.index) # the page's meaning is visual
    else:
        ... # use page.text (extracted or OCR'd)
        send_text(page.text) # extracted or recognized text

print(result.text) # all text-routable pages joined
print(result.tokens_saved) # tokens avoided vs sending every page as an image
print(result.tokens_saved)
```

### Agent Skill
Every page also carries an estimate of its cost both ways, as
`page.tokens.text_tokens` and `page.tokens.image_tokens`.

## As an agent skill

`local-context-router` is an Agent Skill in the open `SKILL.md` format, so it
works in Claude Code and other compatible agents. It lives in this repository
under `.claude/skills/local-context-router`; copy that folder into your agent's
skills directory:

```sh
cp -r .claude/skills/local-context-router ~/.claude/skills/
```

The `local-context-router` skill (in `.claude/skills/`) runs the same preflight
inside Claude Code or Codex — copy it into your `.claude/skills/` (or `~/.claude/skills/`).
With the package installed, the agent runs the preflight on any PDF or image you
share, then uses the text for the cheap pages and attaches images only for the
visual ones.

## Requirements
## Requirements and scope

- macOS 10.15+ (on-device OCR uses the Apple Vision framework)
- Python 3.10+
- macOS 11 or newer. Recognition uses the Apple Vision framework and needs a
normal macOS graphics environment; it will not run inside a headless sandbox
that lacks one.
- Python 3.10 or newer.
- The scope is per-page routing, on-device OCR, and a token estimate. Retrieval
over very large documents is out of scope.

## License

[MIT](LICENSE) © 2026 Siddharth Nashikkar
MIT. See [LICENSE](LICENSE).
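The per-page decision in the README's "How it decides" section, and the savings figure it reports, can be sketched as follows. The names `Route`, `PageInfo`, `choose_source`, and `tokens_saved` are hypothetical stand-ins, the token counts are invented, and the savings formula (all-image baseline minus routed cost) is an assumption about how such a total could be computed, not the package's documented method.

```python
from dataclasses import dataclass
from enum import Enum


class Route(str, Enum):
    """Stand-in for the three sources named in the README."""
    TEXT = "text"
    OCR = "ocr"
    VISION = "vision"


@dataclass(frozen=True)
class PageInfo:
    """Hypothetical per-page summary used only in this sketch."""
    has_usable_text: bool  # a readable embedded text layer exists
    is_visual: bool        # dominated by a table, chart, or diagram
    text_tokens: int       # estimated cost as text
    image_tokens: int      # estimated cost as an image


def choose_source(page: PageInfo) -> Route:
    if not page.has_usable_text:
        return Route.OCR      # scan or photo: recognize on-device
    if page.is_visual:
        return Route.VISION   # the layout carries the meaning
    return Route.TEXT         # mostly prose: extracted text is enough


def tokens_saved(pages: list[PageInfo]) -> int:
    """Assumed definition: cost of sending every page as an image, minus routed cost."""
    baseline = sum(p.image_tokens for p in pages)
    routed = sum(
        p.image_tokens if choose_source(p) is Route.VISION else p.text_tokens
        for p in pages
    )
    return baseline - routed


pages = [
    PageInfo(has_usable_text=True, is_visual=False, text_tokens=600, image_tokens=1300),
    PageInfo(has_usable_text=True, is_visual=True, text_tokens=500, image_tokens=1300),
    PageInfo(has_usable_text=False, is_visual=False, text_tokens=400, image_tokens=1300),
]
print([choose_source(p).value for p in pages])
print(tokens_saved(pages))
```

Under this definition a vision page contributes nothing to the savings, since it is sent as an image either way.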
8 changes: 4 additions & 4 deletions ocr/README.md
@@ -1,7 +1,7 @@
# lcr-ocr

On-device OCR binary used by LocalContextRouter. Wraps the Apple Vision
framework fully offline, no network, no entitlements, and no Screen Recording
framework, fully offline, no network, no entitlements, and no Screen Recording
permission (it reads image files you pass in, it does not capture the screen).

## Build
@@ -59,9 +59,9 @@ Follows the `sysexits.h` convention so callers can branch on failure mode:

## Layout

- `Sources/LCROCR` reusable library: image loading, the Vision engine, and the result models.
- `Sources/lcr-ocr` thin CLI over the library.
- `Tests/LCROCRTests` engine tests that render text in-process (no binary fixtures).
- `Sources/LCROCR`, reusable library: image loading, the Vision engine, and the result models.
- `Sources/lcr-ocr`, thin CLI over the library.
- `Tests/LCROCRTests`, engine tests that render text in-process (no binary fixtures).

## Requirements

2 changes: 1 addition & 1 deletion ocr/Sources/LCROCR/ImageLoading.swift
@@ -20,7 +20,7 @@ public enum ImageLoadError: Error, CustomStringConvertible {
/// Loads bitmaps from disk into `CGImage` using ImageIO.
///
/// ImageIO is used instead of AppKit so the binary runs headless (no window
/// server) important for CI and for invocation from a CLI.
/// server), important for CI and for invocation from a CLI.
public enum ImageLoader {
/// Decode the first image in the file at `path`.
public static func loadCGImage(atPath path: String) throws -> CGImage {
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "localcontextrouter"
description = "Preflight router that picks the cheapest faithful path text, OCR, or vision before a document reaches a multimodal LLM."
description = "Preflight router that picks the cheapest faithful path (text, OCR, or vision) before a document reaches a multimodal LLM."
readme = "README.md"
requires-python = ">=3.10"
license = "MIT"
2 changes: 1 addition & 1 deletion src/localcontextrouter/__init__.py
@@ -1,4 +1,4 @@
"""LocalContextRouter cheapest faithful path for documents bound for a multimodal LLM."""
"""LocalContextRouter, cheapest faithful path for documents bound for a multimodal LLM."""

from .classify import classify_text, compute_signals
from .detect import is_vision_worthy
2 changes: 1 addition & 1 deletion src/localcontextrouter/classify.py
@@ -5,7 +5,7 @@
absent (:class:`PageClass.SCANNED`), or present but broken
(:class:`PageClass.GARBLED`). The two latter cases route to OCR downstream.

Thresholds are deliberately conservative when in doubt the page is sent to
Thresholds are deliberately conservative: when in doubt the page is sent to
OCR, since a wrong "digital" verdict silently feeds garbage to the model.
"""

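A rough sketch of the three-way classification the docstring above describes. The `PageClass` values mirror the enum shown in `models.py` in this diff. `classify_sketch` and both thresholds are hypothetical, since the package's real signals and cut-offs are not visible here.

```python
from enum import Enum


class PageClass(str, Enum):
    DIGITAL = "digital"
    SCANNED = "scanned"
    GARBLED = "garbled"


# Hypothetical thresholds for illustration only.
MIN_CHARS = 50        # fewer visible characters than this counts as "no text layer"
MAX_BAD_RATIO = 0.05  # above this share of broken characters, distrust the layer


def classify_sketch(raw_text: str) -> PageClass:
    """Classify a page's raw text layer as digital, scanned, or garbled."""
    visible = [ch for ch in raw_text if not ch.isspace()]
    if len(visible) < MIN_CHARS:
        return PageClass.SCANNED
    # U+FFFD and non-printable characters suggest unmapped glyphs.
    bad = sum(1 for ch in visible if ch == "\ufffd" or not ch.isprintable())
    if bad / len(visible) > MAX_BAD_RATIO:
        return PageClass.GARBLED
    return PageClass.DIGITAL


print(classify_sketch("").value)
print(classify_sketch("The quick brown fox jumps over the lazy dog. " * 3).value)
print(classify_sketch("\ufffd" * 30 + "a" * 70).value)
```

Both non-digital outcomes lead to OCR, which is the conservative bias the docstring calls for: a page wrongly judged digital would feed broken text to the model.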
2 changes: 1 addition & 1 deletion src/localcontextrouter/cli.py
@@ -1,4 +1,4 @@
"""``localctx`` route a document and report the cheapest faithful source per page."""
"""``localctx``, route a document and report the cheapest faithful source per page."""

from __future__ import annotations

2 changes: 1 addition & 1 deletion src/localcontextrouter/detect.py
@@ -3,7 +3,7 @@
Some pages carry a perfectly good text layer yet still lose their meaning when
flattened to text: tables, charts, diagrams, and figure-heavy layouts. Those are
worth the vision-token cost. This module decides that from cheap layout features
(:class:`~.models.PageFeatures`) no rendering and no ML.
(:class:`~.models.PageFeatures`), no rendering and no ML.
"""

from __future__ import annotations
8 changes: 4 additions & 4 deletions src/localcontextrouter/models.py
@@ -10,13 +10,13 @@ class PageClass(str, Enum):
"""How a PDF page should be sourced before it reaches an LLM."""

DIGITAL = "digital"
"""A usable embedded text layer is present extract the text directly."""
"""A usable embedded text layer is present, extract the text directly."""

SCANNED = "scanned"
"""Little or no text layer the page is image-only and needs OCR."""
"""Little or no text layer, the page is image-only and needs OCR."""

GARBLED = "garbled"
"""A text layer exists but is broken (unmapped glyphs) OCR is safer."""
"""A text layer exists but is broken (unmapped glyphs), OCR is safer."""


@dataclass(frozen=True)
@@ -74,7 +74,7 @@ class Source(str, Enum):
"""Produced by on-device OCR after rendering the page."""

VISION = "vision"
"""Send the page to a vision model its meaning lives in the visuals."""
"""Send the page to a vision model, its meaning lives in the visuals."""


@dataclass(frozen=True)
2 changes: 1 addition & 1 deletion src/localcontextrouter/ocr.py
@@ -120,7 +120,7 @@ def ocr_png_text(
) -> str:
"""OCR a PNG given as bytes; return the recognized lines joined by newlines.

Lines below ``min_confidence`` are dropped useful for filtering the
Lines below ``min_confidence`` are dropped, useful for filtering the
low-confidence glyphs that icons and logos tend to produce.
"""
with tempfile.NamedTemporaryFile(suffix=".png") as tmp:
2 changes: 1 addition & 1 deletion src/localcontextrouter/router.py
@@ -1,7 +1,7 @@
"""Route each PDF page to the cheapest faithful source: text, OCR, or vision.

- Digital pages keep their extracted text, unless their meaning lives in visuals
(tables, charts, diagrams) those go to a vision model.
(tables, charts, diagrams); those go to a vision model.
- Scanned or garbled pages are rendered and sent to OCR.

Every page carries a token estimate so the savings of avoiding the image path
2 changes: 1 addition & 1 deletion src/localcontextrouter/text.py
@@ -1,6 +1,6 @@
"""Text normalization for routed output.

Applied to the text a page contributes to the model not before
Applied to the text a page contributes to the model, not before
classification, which relies on seeing control and replacement characters to
spot a broken text layer.
"""