feat: vision-worthy detection and token-savings estimate #6

Merged: sid732 merged 6 commits into main from feat/vision-routing on Jun 19, 2026

sid732 (Owner) commented Jun 19, 2026

Completes the routing logic. Pages with a clean text layer can still lose their meaning when flattened to text (tables, charts, diagrams), so those pages go to a vision model. Every page now carries a token estimate, which makes the avoided cost explicit.

What

  • Vision-worthy detection (detect.py): is_vision_worthy(features) routes a page to vision when raster images cover >= 40% of it, vector paths cover >= 30%, or there are >= 25 vector paths (ruled tables, charts, diagrams).
  • Layout features (Pdf.page_features): counts raster image and vector path objects and their coverage via pypdfium2 — no rendering, no ML, and no AGPL PyMuPDF dependency.
  • Token estimator (tokens.py): claude_image_tokens (28px patches, 1568/4784 caps), openai_image_tokens (tile counting), estimate_text_tokens.
  • Router: adds the Source.VISION branch and attaches a TokenEstimate to each page; RouteResult.tokens_saved totals the tokens avoided versus sending every page as an image.
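
The three detection rules above can be sketched as follows. This is a minimal illustration, not the code in detect.py: the thresholds come from the description, while the `PageFeatures` field names and the constant names are assumptions made for the example.

```python
from dataclasses import dataclass

# Thresholds as stated in the PR description.
IMAGE_COVERAGE_MIN = 0.40  # raster images cover >= 40% of the page
PATH_COVERAGE_MIN = 0.30   # vector paths cover >= 30% of the page
PATH_COUNT_MIN = 25        # >= 25 vector paths (ruled tables, charts, diagrams)


@dataclass(frozen=True)
class PageFeatures:
    # Field names are hypothetical; the real PageFeatures may differ.
    image_coverage: float  # fraction of page area under raster images, 0.0-1.0
    path_coverage: float   # fraction of page area under vector paths, 0.0-1.0
    path_count: int        # number of vector path objects on the page


def is_vision_worthy(features: PageFeatures) -> bool:
    """Return True when any single rule fires; the rules are OR-ed together."""
    return (
        features.image_coverage >= IMAGE_COVERAGE_MIN
        or features.path_coverage >= PATH_COVERAGE_MIN
        or features.path_count >= PATH_COUNT_MIN
    )


prose_page = PageFeatures(image_coverage=0.0, path_coverage=0.02, path_count=3)
table_page = PageFeatures(image_coverage=0.0, path_coverage=0.10, path_count=40)
```

Under these assumptions `is_vision_worthy(table_page)` is true because of the path count alone, while `prose_page` stays on the text route.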

Why pypdfium2 over PyMuPDF

PyMuPDF (get_drawings/find_tables) is AGPL-licensed, and depending on it would impose AGPL terms on this MIT package. pypdfium2 page objects give the same image/vector signals under a permissive license.
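
Once the bounding boxes of image or path objects are known, a coverage signal reduces to simple arithmetic. The sketch below is library-independent and only illustrates one plausible definition: it assumes boxes are given as (left, bottom, right, top) in page units, sums their clipped areas, and caps the result at 1.0. The PR does not say whether overlapping objects are merged, so that choice is an assumption.

```python
from typing import Iterable, Tuple

# (left, bottom, right, top) in page units; this layout is an assumption.
Box = Tuple[float, float, float, float]


def coverage(boxes: Iterable[Box], page_width: float, page_height: float) -> float:
    """Fraction of the page covered by the boxes.

    Overlapping boxes are counted more than once, so the sum is capped at 1.0.
    """
    page_area = page_width * page_height
    if page_area <= 0:
        return 0.0
    covered = 0.0
    for left, bottom, right, top in boxes:
        # Clip each box to the page so off-page objects contribute nothing.
        width = max(0.0, min(right, page_width) - max(left, 0.0))
        height = max(0.0, min(top, page_height) - max(bottom, 0.0))
        covered += width * height
    return min(1.0, covered / page_area)


# A US Letter page (612 x 792 points) with one image filling its lower half.
half_page = coverage([(0.0, 0.0, 612.0, 396.0)], 612.0, 792.0)
```

Here `half_page` is 0.5, which would clear the 40% image-coverage rule described earlier.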

Tests

  • Detector: synthetic features for each rule, plus a real table PDF (>= 25 paths) versus prose.
  • Tokens: formulas asserted against documented provider examples (1296, 1521, 765, 1105, 3888 tokens).
  • Router: a table page routes to vision; a text page reports positive savings.
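
For reference, the 765 and 1105 figures match OpenAI's published tile-counting examples for high-detail images (1024x1024 and 2048x4096). The sketch below follows that documented procedure as I understand it; it is not the code in tokens.py, and the `detail` parameter and the exact signature are assumptions.

```python
import math


def openai_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image tokens using the documented tile-counting scheme.

    Low detail is a flat 85 tokens. High detail fits the image inside
    2048x2048, scales the shortest side down to 768, then charges
    170 tokens per 512px tile plus an 85-token base.
    """
    if detail == "low":
        return 85
    # Step 1: fit within a 2048 x 2048 square (downscale only).
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: scale so the shortest side is at most 768 (downscale only).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count 512px tiles, rounding partial tiles up.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


square = openai_image_tokens(1024, 1024)  # 4 tiles
tall = openai_image_tokens(2048, 4096)    # 6 tiles
```

With these inputs `square` is 765 and `tall` is 1105.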

On a mixed 3-page document (prose / table / scan) the router saves ~3077 tokens versus sending every page as an image.
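
The savings figure is a sum of per-page differences. A minimal sketch of that bookkeeping follows, assuming a `TokenEstimate` that stores the tokens for the chosen route and the tokens the page would cost as an image. The field names, the free function, and the numbers are illustrative and are not taken from the PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TokenEstimate:
    # Field names are hypothetical; only the `saved` property is named in the PR.
    routed: int    # tokens for the route actually chosen (text or vision)
    as_image: int  # tokens if the page were sent as an image

    @property
    def saved(self) -> int:
        """Tokens avoided versus the image baseline, never negative."""
        return max(0, self.as_image - self.routed)


def tokens_saved(estimates: List[TokenEstimate]) -> int:
    """Total savings across all pages of a document."""
    return sum(estimate.saved for estimate in estimates)


# Illustrative numbers only.
pages = [
    TokenEstimate(routed=400, as_image=1500),   # text page: cheaper as text
    TokenEstimate(routed=1500, as_image=1500),  # vision page: no saving
]
total = tokens_saved(pages)
```

In this toy document only the text page contributes, so `total` is 1100. Pages routed to vision save nothing by construction.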

Verified locally: ruff, ruff format, mypy (strict), pytest (37) all pass.

sid732 added 6 commits June 19, 2026 13:03
  • Add PageFeatures (image/path counts and coverage), TokenEstimate with a saved property, the Source.VISION case, the tokens field on PageRoute, and RouteResult.tokens_saved.
  • Add Pdf.page_features, which counts raster image and vector path objects and their page coverage via pypdfium2: the signals that flag charts, tables, and diagrams without rendering.
  • Add is_vision_worthy: route a page to a vision model when images cover much of it, vectors cover a large area, or many vector paths suggest a table or chart.
  • Add token estimators following each provider's documented tokenization: Claude 28px patches with resolution caps, OpenAI tile counting, and a text estimate.
  • route_pdf now sends visually-dominant pages to vision and attaches a token estimate to every page, so RouteResult.tokens_saved shows the cost avoided.
  • Test the detector on synthetic and real page features, the token formulas against documented provider examples, and routing of a table page to vision.
sid732 merged commit a8e4b6e into main Jun 19, 2026
6 checks passed
sid732 deleted the feat/vision-routing branch June 23, 2026 03:12