Privacy-first speech-to-text for macOS.
- Works entirely offline with Parakeet TDT 0.6B v2 on MLX.
- Hold a hotkey, speak, release, and text appears in the focused app.
- Handles quiet speech and noisy rooms with optional denoising.
- Pastes most short transcriptions in under a second after release.
- Removes filler words and common transcription mistakes.
- Supports voice commands like `scratch that` and `delete last 3 words`.
- Shows a visual recording indicator while you speak.
- Cancels active recording automatically when macOS goes to sleep.
- macOS on Apple Silicon (M1 / M2 / M3 / …). MLX is arm64-only.
- Recommended: 16 GB RAM. Running the model uses approximately 1.5 GB of RAM.
- Homebrew Python, not Anaconda. Anaconda can break macOS notifications and microphone permissions. Use `brew install python@3.14` (or 3.12 / 3.13).
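The requirements above can be checked before installing anything. The following is a minimal sketch, not part of the project: `preflight_problems` is a hypothetical helper, and the `conda-meta` directory test is a common heuristic for spotting conda-managed interpreters rather than a guaranteed detector.

```python
import os
import platform
import sys


def preflight_problems(system=None, machine=None, base_prefix=None):
    """Return a list of human-readable problems; an empty list means OK.

    The arguments default to the running interpreter's values and exist
    only so the checks can be exercised with other values.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    base_prefix = base_prefix or sys.base_prefix

    problems = []
    if system != "Darwin":
        problems.append(f"expected macOS (Darwin), found {system}")
    if machine != "arm64":
        problems.append(f"expected Apple Silicon (arm64), found {machine}")
    # Conda-managed interpreters keep a conda-meta directory in their prefix.
    # sys.base_prefix points at the underlying interpreter even inside a venv.
    if os.path.isdir(os.path.join(base_prefix, "conda-meta")):
        problems.append("interpreter appears to come from Anaconda/conda")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("warning:", problem)
```

Running this with the interpreter you plan to use for the venv prints one warning per unmet requirement and nothing when all checks pass.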
# 1. Install Homebrew Python (skip if you already have it)
brew install python@3.14
# 2. Clone and enter the project
git clone <repo-url> stt
cd stt
# 3. Create a virtual environment with Python's built-in venv module
python3 -m venv venv
# 4. Install dependencies into the virtual environment
venv/bin/python -m pip install -r requirements.txt

The first run downloads the Parakeet model from Hugging Face and caches it locally.
The script needs two permissions. macOS should prompt for each one the first time it triggers, but you can also grant them manually in System Settings → Privacy & Security:
- Microphone: for your terminal app (Terminal.app, iTerm2, etc.). Without this, recordings are silently empty.
- Accessibility: for your terminal app. Needed so the global hotkeys and the simulated `Cmd+V` paste work.
After granting either permission, fully quit and reopen the terminal (Cmd+Q, not just close the window) for it to take effect.
./stt.py

Click where you want the text to land, then use either:
- Push-to-talk: hold Right Option, speak, release.
- Toggle: press Option + Command to start, press again to stop.
The transcription is pasted into the focused input and appended to `transcriptions.md`. Your existing plain-text clipboard contents are preserved, and the paste is hidden from clipboard history managers (Raycast, Maccy, Alfred, Pastebot, etc.).
Pause before and after a command so it is its own utterance:
- `scratch that`: delete the previous utterance
- `delete last 3 words`: delete the previous 3 words
Commands apply only within the current recording before paste.
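To make the command semantics concrete, here is a minimal sketch of how a list of utterances from one recording could be reduced to the final text. It is an illustration only: `apply_commands`, `normalize`, and the regular expression are hypothetical names, the sketch assumes the recording has already been split into utterances at pauses, and it only recognizes a count written as digits.

```python
import re

# Matches "delete last 3 words" and also the singular "delete last 1 word".
DELETE_WORDS = re.compile(r"^delete last (\d+) words?$")


def normalize(utterance):
    """Lowercase and drop surrounding whitespace and trailing punctuation."""
    return utterance.strip().lower().rstrip(".!?,")


def apply_commands(utterances):
    """Apply voice commands to the utterances of a single recording."""
    kept = []  # utterances that survive, in spoken order
    for utterance in utterances:
        command = normalize(utterance)
        if command == "scratch that":
            if kept:
                kept.pop()  # drop the previous utterance
            continue
        match = DELETE_WORDS.match(command)
        if match:
            count = int(match.group(1))
            words = " ".join(kept).split()
            remaining = words[:-count] if count else words
            # Collapse whatever is left into a single utterance.
            kept = [" ".join(remaining)] if remaining else []
            continue
        kept.append(utterance.strip())
    return " ".join(kept)
```

For example, `apply_commands(["I will arrive at nine thirty", "delete last 2 words", "ten"])` returns `"I will arrive at ten"`. Because the list is rebuilt for each recording, a command cannot reach text that was already pasted.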
Configure runtime behavior with environment variables:
- `STT_INPUT_DEVICE=<index|name> ./stt.py`: choose the input device by index or name (default: the macOS default input).
- `STT_DENOISE=<auto|0|1> ./stt.py`: `auto` detects noisy clips, `0` disables denoising, `1` forces it (default: `auto`).
- `STT_SOUNDS=<0|1> ./stt.py`: `0` disables the paste sound, `1` enables it (default: `1`).
- `STT_UTTERANCE_GAP=<seconds> ./stt.py`: pause length that separates utterances for voice commands (default: `0.7`).
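As a rough illustration of how these variables could be interpreted, the sketch below parses them with the documented defaults. The `read_config` helper and its dictionary keys are hypothetical and are not taken from `stt.py`; the validation rules are assumptions based on the value ranges listed above.

```python
import os


def read_config(env=None):
    """Parse the STT_* variables into a dict, applying the documented defaults."""
    env = os.environ if env is None else env

    # None means "use the macOS default input".
    device = env.get("STT_INPUT_DEVICE")
    if device is not None and device.strip().isdigit():
        device = int(device)  # numeric values are treated as a device index

    denoise = env.get("STT_DENOISE", "auto")
    if denoise not in ("auto", "0", "1"):
        raise ValueError(f"STT_DENOISE must be auto, 0, or 1, got {denoise!r}")

    sounds = env.get("STT_SOUNDS", "1")
    if sounds not in ("0", "1"):
        raise ValueError(f"STT_SOUNDS must be 0 or 1, got {sounds!r}")

    gap = float(env.get("STT_UTTERANCE_GAP", "0.7"))
    if gap <= 0:
        raise ValueError("STT_UTTERANCE_GAP must be a positive number of seconds")

    return {
        "input_device": device,
        "denoise": denoise,
        "sounds": sounds == "1",
        "utterance_gap": gap,
    }
```

Passing a plain dict instead of `os.environ` makes the parsing easy to test, for example `read_config({"STT_DENOISE": "0"})`.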
List available devices:
venv/bin/python -c "import sounddevice as sd; print(sd.query_devices())"

Common transcription fixes live in `text_cleanup.py`. Filler-word cleanup is always on. The wpm and word count logged in `transcriptions.md` reflect what you actually spoke, fillers included; only the saved text is cleaned.
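The distinction between spoken statistics and saved text can be sketched as follows. This is an assumption-laden example: the `FILLERS` pattern and the `summarize` helper are hypothetical, and the project's real cleanup rules in `text_cleanup.py` may differ.

```python
import re

# Hypothetical filler pattern; the project's real rules may differ.
FILLERS = re.compile(r"\b(?:um+|uh+|hmm+)\b[,.]?\s*", re.IGNORECASE)


def summarize(raw_text, duration_seconds):
    """Count words and wpm on the raw transcript, then clean the text."""
    # Statistics use the raw transcript, so fillers count as spoken words.
    word_count = len(raw_text.split())
    wpm = round(word_count * 60 / duration_seconds) if duration_seconds > 0 else 0

    # Only the text that gets saved is cleaned.
    cleaned = FILLERS.sub("", raw_text).strip()
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    return {"text": cleaned, "words": word_count, "wpm": wpm}
```

With this sketch, `summarize("um so I think uh we should ship it", 6.0)` reports 9 words at 90 wpm while the cleaned text is `"so I think we should ship it"`.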
venv/bin/python -m pip install -r requirements-dev.txt
venv/bin/python -m pytest

- `AttributeError: 'NoneType' object has no attribute 'removeAllDeliveredNotifications'`: you're running under Anaconda Python. Recreate the venv using Homebrew Python, as described in the setup steps above.
- No audio captured / silent recordings: the terminal app doesn't have microphone permission, or Anaconda Python failed to trigger the TCC prompt. Grant permission manually in System Settings → Privacy & Security → Microphone, then fully quit and reopen the terminal.
- Hotkey does nothing: the terminal app doesn't have Accessibility permission. Grant it in System Settings → Privacy & Security → Accessibility and fully restart the terminal.
- Transcribed text appears in the terminal instead of where you wanted: the terminal was the focused window when you stopped recording. Click into your target app before pressing the stop hotkey.