Mite is a Windows-first OCR overlay for reading Japanese text in games, visual novels, manga readers, videos, and other visual content. It is aimed at English speakers learning Japanese who want a Yomichan-style point-and-define workflow outside the browser.
Mite captures a target window, runs PP-OCRv5 detector and recognizer ONNX models
through ONNX Runtime, segments recognized Japanese with Lindera and JMdict, then
draws a transparent click-through Win32 overlay with hover definitions and
furigana. The primary command is watch; the rest of the CLI supports setup,
diagnostics, evals, and developer tooling.
Note: Mite is focused on native Windows apps, games, and visual media. For ordinary webpages, browser-native tools such as Yomitan, Yomichan, or 10ten are usually the better fit.
Most people should use the Mite desktop app. Download the installer from the releases page and run it: the app walks you through a one-time setup (it installs the engine and downloads the recognition models, and on NVIDIA GPUs it guides you through installing NVIDIA's GPU runtime for the fast path), then lets you pick a window from a live preview grid and start reading. No terminal required, and it keeps the engine up to date for you. See app/README.md for what the app manages.
The rest of this document covers building from source, the path for developers and contributors.
Measured against several hundred hand-checked 4K game screenshots, about 94 of every 100 text lines are read perfectly, and roughly 1 character in 80 is wrong or missing. Mistakes concentrate in tiny, faint, or heavily stylized text; ordinary dialogue and menus are read very reliably. When a misread does happen, it usually shows up as an "unknown" word rather than a wrong definition. What the mistakes look like in practice, and how much to trust a popup, is covered in docs/accuracy.md.
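To make the two headline figures concrete: 94 perfect lines out of 100 is a perfect-line rate of 0.94, and 1 character in 80 is a character error rate of 1/80 = 0.0125, or 1.25%. The sketch below shows one common way such numbers are computed from pairs of hand-checked and recognized lines. It assumes the character figure is an edit-distance-based error rate; this document does not state the exact metric, and `edit_distance`, `Summary`, and `summarize` are illustrative names, not part of Mite.

```rust
/// Levenshtein edit distance over Unicode scalar values (not bytes), so that
/// each kana or kanji counts as one character.
pub fn edit_distance(expected: &str, actual: &str) -> usize {
    let a: Vec<char> = expected.chars().collect();
    let b: Vec<char> = actual.chars().collect();
    // prev[j] is the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr.push(substitute.min(delete).min(insert));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Aggregate metrics over a set of (expected, actual) line pairs.
pub struct Summary {
    /// Fraction of lines recognized with no difference at all.
    pub perfect_line_rate: f64,
    /// Total edit distance divided by total expected characters.
    pub char_error_rate: f64,
}

/// Assumed metric definitions; the real eval tooling may differ.
pub fn summarize(pairs: &[(&str, &str)]) -> Summary {
    let mut perfect = 0usize;
    let mut errors = 0usize;
    let mut chars = 0usize;
    for (expected, actual) in pairs {
        if expected == actual {
            perfect += 1;
        }
        errors += edit_distance(expected, actual);
        chars += expected.chars().count();
    }
    Summary {
        perfect_line_rate: perfect as f64 / pairs.len().max(1) as f64,
        char_error_rate: errors as f64 / chars.max(1) as f64,
    }
}
```

Under this definition a wrong character counts as one substitution and a missing character as one deletion, which matches the "wrong or missing" wording above.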
On the reference NVIDIA setup, a fresh full-screen read of a 4K game frame takes about a fifth of a second, so definitions feel immediate.
Mite is local-first and optimized for Windows/NVIDIA systems. The reference path uses a TensorRT -> CUDA -> CPU fallback chain and targets low-latency 4K OCR. The lookup core and eval tooling are designed to remain testable without a live game window.
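The TensorRT -> CUDA -> CPU chain can be pictured as "try each tier in order and keep the first one that initializes". The following is a minimal sketch of that idea only. `Tier`, `FALLBACK_ORDER`, and `select_tier` are hypothetical names, and the `try_init` closure stands in for whatever session setup Mite actually performs through ONNX Runtime.

```rust
/// Execution tiers in the order named above: TensorRT, then CUDA, then CPU.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    TensorRt,
    Cuda,
    Cpu,
}

/// Preference order for the fallback chain.
pub const FALLBACK_ORDER: [Tier; 3] = [Tier::TensorRt, Tier::Cuda, Tier::Cpu];

/// Returns the first tier for which `try_init` succeeds, together with the
/// failure reasons collected from the tiers tried before it.
/// `try_init` is a hypothetical stand-in for creating an inference session.
pub fn select_tier<F>(mut try_init: F) -> (Option<Tier>, Vec<(Tier, String)>)
where
    F: FnMut(Tier) -> Result<(), String>,
{
    let mut failures = Vec::new();
    for tier in FALLBACK_ORDER {
        match try_init(tier) {
            Ok(()) => return (Some(tier), failures),
            Err(reason) => failures.push((tier, reason)),
        }
    }
    (None, failures)
}
```

Returning the collected failures alongside the chosen tier makes it possible to report why a faster tier was skipped instead of falling back silently.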
Important details:
- Rust 2024 project using cargo.
- Windows Graphics Capture is the preferred capture backend for games.
- Default OCR assets are PP-OCRv5 mobile detector/recognizer ONNX files.
- Runtime model, dictionary, frequency, GPU DLL, cache, and eval data files are not committed to the source repository.
- Real-image eval data lives in the private eval/ submodule.
Run the consolidated developer setup script from PowerShell:
.\scripts\bootstrap-dev.ps1
The script checks for Git, Rust, and Cargo; downloads OCR models, JMdict, and
JPDB frequency data; installs local Git hooks; creates mite.toml when missing;
builds Mite; and runs doctor.
It does not install the NVIDIA GPU runtime. Mite never downloads, hosts, bundles,
or installs NVIDIA binaries, and that applies to this developer tooling too. For
GPU acceleration, install the runtime yourself from NVIDIA (the CUDA Toolkit,
cuDNN, and TensorRT 10.x) or from the pinned pip wheels, make it discoverable on
PATH, and confirm the tier with cargo run -- doctor. See
docs/local-windows.md.
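"Discoverable on PATH" means that some directory listed in PATH contains the runtime library file. As a rough illustration, and not a description of what doctor does, the sketch below searches a PATH value for a file name using only the standard library. `find_on_path` and `find_on_current_path` are hypothetical helpers, and the file name to look for is left to the caller because this document does not list the exact DLL names.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Returns the full path of `file_name` inside the first directory listed in
/// `path_var` that contains it as a regular file.
pub fn find_on_path(file_name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}

/// Convenience wrapper that reads the current process's PATH.
pub fn find_on_current_path(file_name: &str) -> Option<PathBuf> {
    let path_var = env::var_os("PATH")?;
    find_on_path(file_name, &path_var)
}
```

A `None` result for a library you expected to find usually means the runtime's directory was never added to PATH, or that the shell was not restarted after it was added. The doctor command remains the authoritative check of which tier is active.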
Useful setup modes:
.\scripts\bootstrap-dev.ps1 -ModelsOnly
.\scripts\bootstrap-dev.ps1 -ModelsOnly -IncludeServerModels
.\scripts\bootstrap-dev.ps1 -HooksOnly
.\scripts\bootstrap-dev.ps1 -EvalDataOnly
Then find a target window and start the overlay:
cargo run -- list-windows
cargo run -- watch
cargo run -- watch --title "Target Game" --auto
cargo run -- watch --hud
cargo run -- watch --metrics-interval-secs 5
Use --auto for games that consume the Shift key, and pin the target with
--title, --window-id, or --pid.
- Window OCR overlay for Japanese text in native Windows apps and games, with POS-coloured per-word underlines (overlay.word_underlines, on by default) and optional always-on furigana (overlay.furigana, off by default). Turn underlines off for an invisible overlay that only shows a popup on hover.
- Hover popups with dictionary forms, a plain-language grammar pill, glosses, inflection notes, and furigana.
- Click-through layered Win32 overlay that keeps game input uninterrupted.
- TensorRT/CUDA acceleration with CPU fallback.
- Temporal smoothing so stable text regions can be reused instead of re-OCR'd every frame.
- Manual real-image eval workflow for OCR, lookup, bounds, and popup metadata.
- Browser-based eval label UI for private eval corpora.
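The temporal smoothing item in the list above amounts to caching OCR results per text region and skipping recognition while the region looks unchanged. The sketch below illustrates that idea with an exact pixel-hash comparison. This is a deliberate simplification: this document does not describe how Mite decides a region is stable, and `Region`, `RegionCache`, and `read` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical region key: a bounding box in frame pixels.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Region {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

/// Last OCR result for a region, tagged with the pixels it was read from.
struct Cached {
    fingerprint: u64,
    text: String,
}

/// Per-region cache that reruns OCR only when a region's pixels change.
#[derive(Default)]
pub struct RegionCache {
    entries: HashMap<Region, Cached>,
    /// Number of times `run_ocr` was actually invoked.
    pub ocr_calls: usize,
}

/// Hashes the raw pixel bytes of a region crop.
fn fingerprint(pixels: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    pixels.hash(&mut hasher);
    hasher.finish()
}

impl RegionCache {
    /// Returns the cached text when the region's pixels are unchanged;
    /// otherwise calls `run_ocr` and stores the new result.
    pub fn read<F>(&mut self, region: Region, pixels: &[u8], run_ocr: F) -> String
    where
        F: FnOnce(&[u8]) -> String,
    {
        let fp = fingerprint(pixels);
        if let Some(cached) = self.entries.get(&region) {
            if cached.fingerprint == fp {
                return cached.text.clone();
            }
        }
        self.ocr_calls += 1;
        let text = run_ocr(pixels);
        self.entries.insert(region, Cached { fingerprint: fp, text: text.clone() });
        text
    }
}
```

An exact hash treats any single-pixel change as new content. Captured game frames often carry noise or animation, so a practical implementation would more likely compare regions with some tolerance.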
cargo run -- init-config [--force]
cargo run -- doctor
cargo run -- list-windows [--json] [--thumbnails] [--thumbnail-max-width PX]
cargo run -- watch [--title T | --window-id N | --pid P] [--auto] [--hud]
cargo run -- eval --image path\to\underlying.png --labels path\to\eval.json
cargo run -- eval-corpus --root eval --out target\eval\corpus-summary.json --allow-failures
cargo run --bin eval-ui
cargo run -- clean-images [--dry-run]
- Local Windows usage: setup, running the overlay, capture troubleshooting.
- Character accuracy: how accurate the OCR is, why, and where the limits are. Start here if you want to know whether to trust a popup.
- Architecture: runtime boundaries, GPU pipeline, and latency.
- Model setup and provenance: the OCR models and their trade-offs.
- Performance evidence guide: how latency claims are measured, with current reference numbers.
- Eval metadata policy: which dictionary interpretation Mite teaches when several are valid.
- Pure-GPU exploration notes: exploratory, not scheduled.
- Third-party notices and model manifest
- Agent guidance
Core checks:
cargo fmt --check
cargo test
cargo clippy --all-targets -- -D warnings
The local Git hook runs .\scripts\precommit.ps1. Install or refresh it with:
.\scripts\bootstrap-dev.ps1 -HooksOnly
Run private real-image evals when OCR, dictionary, detection, recognition, eval, or popup metadata behavior changes:
.\scripts\bootstrap-dev.ps1 -EvalDataOnly
cargo run -- eval-corpus --root eval --out target\eval\corpus-summary.json --allow-failures
.\scripts\precommit.ps1 -IncludeEval
The private eval submodule contains corpus-specific annotation instructions and
the eval annotation skill under eval\.agents\.
The following paths are local artifacts and intentionally ignored:
models\
cache\
target\
mite.toml
.gpu-runtime\
.venv-models\
.env
OCR models, JMdict, JPDB frequency data, NVIDIA runtime DLLs, ONNX Runtime components, and eval captures remain under their own upstream terms. See THIRD_PARTY_NOTICES.md and model-manifest.json before redistributing any runtime assets or generated bundles.
Mite is licensed under the GNU Affero General Public License v3.0. See LICENSE.