Mite

Mite is a Windows-first OCR overlay for reading Japanese text in games, visual novels, manga readers, videos, and other visual content. It is aimed at English speakers learning Japanese who want a Yomichan-style point-and-define workflow outside the browser.

Mite captures a target window, runs PP-OCRv5 detector and recognizer ONNX models through ONNX Runtime, segments recognized Japanese with Lindera and JMdict, then draws a transparent click-through Win32 overlay with hover definitions and furigana. The primary command is watch; the rest of the CLI supports setup, diagnostics, evals, and developer tooling.
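The stage order above (capture, detect, recognize, segment, draw) can be pictured as a chain of small interfaces. This is a hypothetical sketch. `Frame`, `TextBox`, `ReadLine`, `Detector`, `Recognizer`, `Segmenter`, and `read_frame` are illustrative names, not Mite's actual types. The stubs stand in for the PP-OCRv5 ONNX models and the Lindera/JMdict segmenter. The per-character segmenter is only a placeholder and does not reflect how Lindera splits text.

```rust
/// A captured window image (pixel data omitted in this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Frame {
    width: u32,
    height: u32,
}

/// Axis-aligned text box in frame coordinates.
#[derive(Debug, Clone, PartialEq)]
struct TextBox {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

/// One recognized line plus its segmented words.
#[derive(Debug, Clone, PartialEq)]
struct ReadLine {
    bbox: TextBox,
    text: String,
    words: Vec<String>,
}

trait Detector {
    fn detect(&self, frame: &Frame) -> Vec<TextBox>;
}
trait Recognizer {
    fn recognize(&self, frame: &Frame, bbox: &TextBox) -> String;
}
trait Segmenter {
    fn segment(&self, text: &str) -> Vec<String>;
}

/// Detect boxes, recognize the text in each box, then segment it into words.
fn read_frame(
    frame: &Frame,
    det: &impl Detector,
    rec: &impl Recognizer,
    seg: &impl Segmenter,
) -> Vec<ReadLine> {
    det.detect(frame)
        .into_iter()
        .map(|bbox| {
            let text = rec.recognize(frame, &bbox);
            let words = seg.segment(&text);
            ReadLine { bbox, text, words }
        })
        .collect()
}

// Stub stages so the sketch runs without any models.
struct WholeFrame;
impl Detector for WholeFrame {
    fn detect(&self, f: &Frame) -> Vec<TextBox> {
        vec![TextBox { x: 0, y: 0, w: f.width, h: f.height }]
    }
}
struct Fixed(&'static str);
impl Recognizer for Fixed {
    fn recognize(&self, _: &Frame, _: &TextBox) -> String {
        self.0.to_string()
    }
}
struct PerChar;
impl Segmenter for PerChar {
    fn segment(&self, text: &str) -> Vec<String> {
        text.chars().map(String::from).collect()
    }
}

fn main() {
    let frame = Frame { width: 3840, height: 2160 };
    for line in read_frame(&frame, &WholeFrame, &Fixed("日本語を読む"), &PerChar) {
        println!("{:?} {} -> {:?}", line.bbox, line.text, line.words);
    }
}
```

Keeping each stage behind a trait is one way to let lookup and eval code run against fixed inputs without a live game window.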

Note

Mite is focused on native Windows apps, games, and visual media. For ordinary webpages, browser-native tools such as Yomitan, Yomichan, or 10ten are usually the better fit.

Install

Most people should use the Mite desktop app. Download the installer from the releases page and run it. The app walks you through a one-time setup: it installs the engine, downloads the recognition models, and on NVIDIA GPUs guides you through installing NVIDIA's GPU runtime for the fast path. It then lets you pick a window from a live preview grid and start reading. No terminal is required, and the app keeps the engine up to date for you. See app/README.md for what the app manages.

The rest of this document covers building from source, which is the path for developers and contributors.

How well does it read?

Measured against several hundred hand-checked 4K game screenshots, about 94 of every 100 text lines are read perfectly, and roughly 1 character in 80 is wrong or missing. Mistakes concentrate in tiny, faint, or heavily stylized text; ordinary dialogue and menus are read very reliably. When a misread does happen, it usually shows up as an "unknown" word rather than a wrong definition. What the mistakes look like in practice, and how much to trust a popup, is covered in docs/accuracy.md.
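The two headline numbers correspond to a line exact-match rate of about 0.94 and a character error rate (CER) of roughly 1/80, or about 1.25%. The sketch below shows how such figures are commonly computed. It assumes CER means Levenshtein edit distance divided by reference length. `edit_distance` and `score` are hypothetical helpers, and docs/accuracy.md remains the authority on how Mite's eval actually scores.

```rust
/// Levenshtein distance over chars (insertion, deletion, substitution each cost 1).
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // At the start of row i, prev[j] is the distance between the
    // first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        cur[0] = i;
        for j in 1..=b.len() {
            let substitute = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            cur[j] = substitute.min(prev[j] + 1).min(cur[j - 1] + 1);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

/// Returns (fraction of lines read exactly, character error rate)
/// over (reference, prediction) pairs.
fn score(pairs: &[(&str, &str)]) -> (f64, f64) {
    if pairs.is_empty() {
        return (0.0, 0.0);
    }
    let exact = pairs.iter().filter(|(r, p)| r == p).count();
    let errors: usize = pairs.iter().map(|(r, p)| edit_distance(r, p)).sum();
    let ref_chars: usize = pairs.iter().map(|(r, _)| r.chars().count()).sum();
    (
        exact as f64 / pairs.len() as f64,
        errors as f64 / ref_chars.max(1) as f64,
    )
}

fn main() {
    let (line_acc, cer) = score(&[("あいう", "あいう"), ("かきく", "かきけ")]);
    println!("line accuracy {line_acc:.3}, CER {cer:.3}");
}
```

Counting over chars rather than bytes matters here, because each Japanese character is several bytes in UTF-8.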

On the reference NVIDIA setup, a fresh full-screen read of a 4K game frame takes about a fifth of a second, so definitions feel immediate.
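A fifth of a second is about 200 ms. Latency figures like this are usually reported as percentiles over many timed runs rather than as a single measurement. The sketch below shows one common way to do that with `std::time::Instant` and a nearest-rank percentile. `time_runs` and `percentile` are hypothetical helpers, and the stand-in workload is not an OCR pass. The Performance evidence guide describes how Mite's own numbers are actually measured.

```rust
use std::time::{Duration, Instant};

/// Times `runs` calls of `f`, returning one wall-clock duration per call.
fn time_runs(runs: usize, mut f: impl FnMut()) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Nearest-rank percentile (p in 0..=100). Sorts `samples` in place; panics if empty.
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_unstable();
    let n = samples.len();
    let rank = ((p / 100.0) * n as f64).ceil() as usize;
    samples[rank.clamp(1, n) - 1]
}

fn main() {
    // Stand-in workload; a real harness would time a full capture + OCR pass.
    let mut samples = time_runs(20, || {
        std::hint::black_box((0..100_000u64).sum::<u64>());
    });
    let p50 = percentile(&mut samples, 50.0);
    let p95 = percentile(&mut samples, 95.0);
    println!("p50 = {p50:?}, p95 = {p95:?}");
}
```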

Status

Mite is local-first and optimized for Windows/NVIDIA systems. The reference path uses a TensorRT -> CUDA -> CPU fallback chain and targets low-latency 4K OCR. The lookup core and eval tooling are designed to remain testable without a live game window.
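The TensorRT -> CUDA -> CPU chain can be read as "try the fastest provider, and drop to the next one if it cannot be initialized." The sketch below shows that selection logic. It assumes each GPU tier can be probed independently and that CPU always works. `Provider`, `select_provider`, and the probe closure are hypothetical; they are not ONNX Runtime's or Mite's API.

```rust
/// Execution tiers in preference order (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    TensorRt,
    Cuda,
    Cpu,
}

/// Picks the first GPU tier whose probe succeeds, falling back to CPU.
/// Returns the chosen tier plus the reason each skipped tier failed.
fn select_provider(
    probe: impl Fn(Provider) -> Result<(), String>,
) -> (Provider, Vec<(Provider, String)>) {
    let mut skipped = Vec::new();
    for tier in [Provider::TensorRt, Provider::Cuda] {
        match probe(tier) {
            Ok(()) => return (tier, skipped),
            Err(reason) => skipped.push((tier, reason)),
        }
    }
    // CPU is assumed to always be available.
    (Provider::Cpu, skipped)
}

fn main() {
    // Pretend TensorRT is missing but CUDA initializes.
    let (chosen, skipped) = select_provider(|p| match p {
        Provider::TensorRt => Err("TensorRT DLLs not found on PATH".into()),
        _ => Ok(()),
    });
    println!("using {chosen:?}; skipped {skipped:?}");
}
```

Keeping the skip reasons next to the result makes a silent downgrade to CPU visible in diagnostics instead of just slower.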

Important details:

Rust 2024 edition project built with Cargo.
Windows Graphics Capture is the preferred capture backend for games.
Default OCR assets are PP-OCRv5 mobile detector/recognizer ONNX files.
Runtime model, dictionary, frequency, GPU DLL, cache, and eval data files are not committed to the source repository.
Real-image eval data lives in the private eval/ submodule.

Quick start (build from source)

Run the consolidated developer setup script from PowerShell:

.\scripts\bootstrap-dev.ps1

The script checks for Git, Rust, and Cargo; downloads OCR models, JMdict, and JPDB frequency data; installs local Git hooks; creates mite.toml when missing; builds Mite; and runs doctor.

It does not install the NVIDIA GPU runtime. Mite never downloads, hosts, bundles, or installs NVIDIA binaries, and that applies to this developer tooling too. For GPU acceleration, install the runtime yourself from NVIDIA (the CUDA Toolkit, cuDNN, and TensorRT 10.x) or from the pinned pip wheels, make it discoverable on PATH, and confirm the tier with cargo run -- doctor. See docs/local-windows.md.
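"Discoverable on PATH" means the Windows DLL loader can find the runtime files in one of the directories listed in the PATH environment variable, which is part of its standard search order. `cargo run -- doctor` is the supported way to confirm the tier; the sketch below only illustrates the PATH lookup itself. `find_on_path` is a hypothetical helper. The DLL file names in `main` are assumptions based on common cuDNN 9 and TensorRT 10 naming and may not match your install.

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Returns the full path of the first `file_name` found in a PATH-style directory list.
fn find_on_path(path_var: &OsStr, file_name: &str) -> Option<PathBuf> {
    std::env::split_paths(path_var)
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}

fn main() {
    let path = std::env::var_os("PATH").unwrap_or_default();
    // Assumed example names; check the files your NVIDIA install actually ships.
    for dll in ["cudnn64_9.dll", "nvinfer_10.dll"] {
        match find_on_path(&path, dll) {
            Some(p) => println!("found {dll} at {}", p.display()),
            None => println!("{dll} not found on PATH"),
        }
    }
}
```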

Useful setup modes:

.\scripts\bootstrap-dev.ps1 -ModelsOnly
.\scripts\bootstrap-dev.ps1 -ModelsOnly -IncludeServerModels
.\scripts\bootstrap-dev.ps1 -HooksOnly
.\scripts\bootstrap-dev.ps1 -EvalDataOnly

Then find a target window and start the overlay:

cargo run -- list-windows
cargo run -- watch
cargo run -- watch --title "Target Game" --auto
cargo run -- watch --hud
cargo run -- watch --metrics-interval-secs 5

Use --auto for games that consume the Shift key, and pin the target with --title, --window-id, or --pid.
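The `watch` usage line lists `--title`, `--window-id`, and `--pid` as alternatives, so they map naturally onto an enum. The sketch below resolves such a selector against a window list. `WindowInfo`, `Target`, and `pick` are illustrative names. The record fields are assumptions, and so is treating `--title` as a substring match; that is not documented behavior.

```rust
/// Minimal window record (fields are assumptions for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct WindowInfo {
    id: u64,
    pid: u32,
    title: String,
}

/// One of the mutually exclusive `--title`, `--window-id`, `--pid` selectors.
#[derive(Debug, Clone, PartialEq)]
enum Target {
    Title(String),
    WindowId(u64),
    Pid(u32),
}

/// Returns the first window matching the selector, if any.
fn pick<'a>(target: &Target, windows: &'a [WindowInfo]) -> Option<&'a WindowInfo> {
    windows.iter().find(|w| match target {
        // Assumption: titles match by substring.
        Target::Title(t) => w.title.contains(t.as_str()),
        Target::WindowId(id) => w.id == *id,
        Target::Pid(pid) => w.pid == *pid,
    })
}

fn main() {
    let windows = vec![
        WindowInfo { id: 0x1A2B, pid: 4242, title: "Target Game v1.0".into() },
        WindowInfo { id: 0x3C4D, pid: 1337, title: "Notepad".into() },
    ];
    let hit = pick(&Target::Title("Target Game".into()), &windows);
    println!("{hit:?}");
}
```

Modeling the flags as one enum makes "more than one selector given" unrepresentable after argument parsing.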

Features

Window OCR overlay for Japanese text in native Windows apps and games, with POS-coloured per-word underlines (overlay.word_underlines, on by default) and optional always-on furigana (overlay.furigana, off by default). Turn underlines off for an invisible overlay that only shows a popup on hover.
Hover popups with dictionary forms, a plain-language grammar pill, glosses, inflection notes, and furigana.
Click-through layered Win32 overlay that keeps game input uninterrupted.
TensorRT/CUDA acceleration with CPU fallback.
Temporal smoothing so stable text regions can be reused instead of re-OCR'd every frame.
Manual real-image eval workflow for OCR, lookup, bounds, and popup metadata.
Browser-based eval label UI for private eval corpora.
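Temporal smoothing avoids re-running recognition on regions that have not changed. This README does not describe Mite's actual stability criteria. The sketch below uses the simplest stand-in: reuse the cached text when a region's pixels hash identically to the previous frame. `RegionCache`, `RegionKey`, and the exact-hash rule are hypothetical. A practical smoother would likely need to tolerate capture noise and small shifts, which an exact hash does not.

```rust
use std::collections::HashMap;
use std::hash::{DefaultHasher, Hash, Hasher};

/// Hypothetical region key: a detected box's (x, y, width, height).
type RegionKey = (u32, u32, u32, u32);

/// Remembers, per region, the pixel hash and the text recognized last time.
struct RegionCache {
    entries: HashMap<RegionKey, (u64, String)>,
}

impl RegionCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns (text, reused). Cached text is reused when the pixels hash the same
    /// as last time; otherwise `ocr` runs and its result replaces the cache entry.
    fn read(&mut self, key: RegionKey, pixels: &[u8], ocr: impl FnOnce(&[u8]) -> String) -> (String, bool) {
        let mut hasher = DefaultHasher::new();
        pixels.hash(&mut hasher);
        let digest = hasher.finish();
        if let Some((prev, text)) = self.entries.get(&key) {
            if *prev == digest {
                return (text.clone(), true);
            }
        }
        let text = ocr(pixels);
        self.entries.insert(key, (digest, text.clone()));
        (text, false)
    }
}

fn main() {
    let mut cache = RegionCache::new();
    let region = (120, 860, 1600, 90);
    let frame_pixels = vec![7u8; 64];
    let (text, reused) = cache.read(region, &frame_pixels, |_| "こんにちは".to_string());
    println!("{text} (reused: {reused})");
    // Identical pixels: the closure is never called.
    let (text, reused) = cache.read(region, &frame_pixels, |_| unreachable!("should hit cache"));
    println!("{text} (reused: {reused})");
}
```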

Commands

cargo run -- init-config [--force]
cargo run -- doctor
cargo run -- list-windows [--json] [--thumbnails] [--thumbnail-max-width PX]
cargo run -- watch [--title T | --window-id N | --pid P] [--auto] [--hud]
cargo run -- eval --image path\to\underlying.png --labels path\to\eval.json
cargo run -- eval-corpus --root eval --out target\eval\corpus-summary.json --allow-failures
cargo run --bin eval-ui
cargo run -- clean-images [--dry-run]

Documentation

Local Windows usage: setup, running the overlay, capture troubleshooting.
Character accuracy: how accurate the OCR is, why, and where the limits are. Start here if you want to know whether to trust a popup.
Architecture: runtime boundaries, GPU pipeline, and latency.
Model setup and provenance: the OCR models and their trade-offs.
Performance evidence guide: how latency claims are measured, with current reference numbers.
Eval metadata policy: which dictionary interpretation Mite teaches when several are valid.
Pure-GPU exploration notes: exploratory, not scheduled.
Third-party notices and model manifest
Agent guidance

Development

Core checks:

cargo fmt --check
cargo test
cargo clippy --all-targets -- -D warnings

The local Git hook runs .\scripts\precommit.ps1. Install or refresh it with:

.\scripts\bootstrap-dev.ps1 -HooksOnly

Run private real-image evals when OCR, dictionary, detection, recognition, eval, or popup metadata behavior changes:

.\scripts\bootstrap-dev.ps1 -EvalDataOnly
cargo run -- eval-corpus --root eval --out target\eval\corpus-summary.json --allow-failures
.\scripts\precommit.ps1 -IncludeEval

The private eval submodule contains corpus-specific annotation instructions and the eval annotation skill under eval\.agents\.

Runtime Assets And Data

The following paths are local artifacts and intentionally ignored:

models\
cache\
target\
mite.toml
.gpu-runtime\
.venv-models\
.env

OCR models, JMdict, JPDB frequency data, NVIDIA runtime DLLs, ONNX Runtime components, and eval captures remain under their own upstream terms. See THIRD_PARTY_NOTICES.md and model-manifest.json before redistributing any runtime assets or generated bundles.

License

Mite is licensed under the GNU Affero General Public License v3.0. See LICENSE.
