stt

Privacy-first speech-to-text for macOS.

Works entirely offline with Parakeet TDT 0.6B v2 on MLX.
Hold a hotkey, speak, release, and text appears in the focused app.
Handles quiet speech and noisy rooms with optional denoising.
Pastes most short transcriptions in under a second after release.
Removes filler words and common transcription mistakes.
Supports voice commands like "scratch that" and "delete last 3 words".
Shows a visual recording indicator while you speak.
Cancels active recording automatically when macOS goes to sleep.

Requirements

macOS on Apple Silicon (M1 / M2 / M3 / …). MLX is arm64-only.
Recommended: 16 GB RAM. Running the model uses approximately 1.5 GB of RAM.
Homebrew Python, not Anaconda. Anaconda can break macOS notifications and microphone permissions. Use brew install python@3.14 (or 3.12 / 3.13).
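
The requirements above can be checked before you start the setup. The following is a minimal sketch and not part of the project: `check_environment` is a hypothetical helper, and looking for a `conda-meta` directory is only a heuristic for spotting a conda-based interpreter.

```python
import os
import platform
import sys


def check_environment(system=None, machine=None, prefixes=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of stt. Arguments default to the
    values of the running interpreter and can be overridden for testing.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    # sys.base_prefix points at the interpreter a venv was created from.
    prefixes = (sys.prefix, sys.base_prefix) if prefixes is None else prefixes

    problems = []
    if system != "Darwin":
        problems.append(f"expected macOS, found {system}")
    if machine != "arm64":
        problems.append(f"expected Apple Silicon (arm64), found {machine}")
    # Heuristic: conda installs and environments contain a conda-meta directory.
    if any(os.path.isdir(os.path.join(p, "conda-meta")) for p in prefixes):
        problems.append("interpreter looks like Anaconda/conda; use Homebrew Python")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Run it with the interpreter you plan to build the venv from. No output means none of the three checks failed.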

Setup

# 1. Install Homebrew Python (skip if you already have it)
brew install python@3.14

# 2. Clone and enter the project
git clone <repo-url> stt
cd stt

# 3. Create a virtual environment with Python's built-in venv module
python3 -m venv venv

# 4. Install dependencies into the virtual environment
venv/bin/python -m pip install -r requirements.txt

The first run downloads the Parakeet model from Hugging Face and caches it locally.

macOS permissions (required)

The script needs two permissions. macOS should prompt for each one the first time the script needs it, but you can also grant them manually in System Settings → Privacy & Security:

Microphone — for your terminal app (Terminal.app, iTerm2, etc.). Without this, recordings are silently empty.
Accessibility — for your terminal app. Needed so the global hotkeys and the simulated Cmd+V paste work.

After granting either permission, fully quit and reopen the terminal (Cmd+Q, not just close the window) for it to take effect.

Usage

./stt.py

Click where you want the text to land, then use either:

Push-to-talk: hold Right Option, speak, release.
Toggle: press Option + Command to start, press again to stop.

The transcription is pasted into the focused input and appended to transcriptions.md. Your existing plain-text clipboard contents are preserved, and the paste is hidden from clipboard history managers (Raycast, Maccy, Alfred, Pastebot, etc.).

Voice commands

Pause before and after a command so it is its own utterance:

scratch that — delete the previous utterance
delete last 3 words — delete the previous 3 words

Commands only affect the current recording, before its text is pasted.
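
To illustrate how the two commands could act on a recording, here is a rough sketch. It is not the code in voice_commands.py. It assumes the recording has already been split into utterances at pauses, that the recognizer writes the number as digits, and that `normalize` and `apply_voice_commands` are hypothetical names.

```python
import re

# Matches "delete last 3 words" and "delete last 1 word" after normalization.
DELETE_WORDS = re.compile(r"^delete last (\d+) words?$")


def normalize(utterance):
    """Lowercase and strip punctuation so 'Scratch that.' still matches."""
    return re.sub(r"[^\w\s]", "", utterance).strip().lower()


def apply_voice_commands(utterances):
    """Fold a list of utterances into final text, applying commands in order."""
    kept = []
    for utterance in utterances:
        command = normalize(utterance)
        if command == "scratch that":
            # Drop the previous utterance, if there is one.
            if kept:
                kept.pop()
            continue
        match = DELETE_WORDS.match(command)
        if match:
            count = int(match.group(1))
            # Remove words from the end, crossing utterance boundaries if needed.
            while count > 0 and kept:
                words = kept.pop().split()
                if len(words) > count:
                    kept.append(" ".join(words[:-count]))
                    count = 0
                else:
                    count -= len(words)
            continue
        text = utterance.strip()
        if text:
            kept.append(text)
    return " ".join(kept)
```

For example, `apply_voice_commands(["send the report today", "scratch that", "send it tomorrow"])` yields `"send it tomorrow"`. A command only matches when the whole utterance is the command, which is why the pause before and after it matters.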

Settings

Configure runtime behavior with environment variables:

STT_INPUT_DEVICE=<index|name> ./stt.py — choose input device by index or name; default: macOS default input.
STT_DENOISE=<auto|0|1> ./stt.py — auto detects noisy clips, 0 disables, 1 forces; default: auto.
STT_SOUNDS=<0|1> ./stt.py — 0 disables paste sound, 1 enables it; default: 1.
STT_UTTERANCE_GAP=<seconds> ./stt.py — pause length for voice-command boundaries; default: 0.7.
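
As an illustration of how these variables and their documented defaults fit together, the sketch below reads them into a dictionary. It is an assumption about how such parsing could look, not the code in stt.py. `load_settings` and the dictionary keys are hypothetical names.

```python
import os


def load_settings(env=None):
    """Read the STT_* variables, applying the defaults listed above."""
    env = os.environ if env is None else env

    # An all-digit value is treated as a device index, anything else as a name.
    device = env.get("STT_INPUT_DEVICE")
    if device is not None and device.isdigit():
        device = int(device)

    denoise = env.get("STT_DENOISE", "auto")
    if denoise not in ("auto", "0", "1"):
        raise ValueError(f"STT_DENOISE must be auto, 0 or 1, got {denoise!r}")

    sounds = env.get("STT_SOUNDS", "1")
    if sounds not in ("0", "1"):
        raise ValueError(f"STT_SOUNDS must be 0 or 1, got {sounds!r}")

    gap = float(env.get("STT_UTTERANCE_GAP", "0.7"))
    if gap <= 0:
        raise ValueError("STT_UTTERANCE_GAP must be a positive number of seconds")

    return {
        "input_device": device,  # None means the macOS default input
        "denoise": denoise,
        "sounds": sounds == "1",
        "utterance_gap": gap,
    }
```

Variables can be combined on one command line, so a mapping such as `{"STT_DENOISE": "1", "STT_SOUNDS": "0"}` corresponds to forcing denoising and muting the paste sound in a single run.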

List available devices:

python3 -c "import sounddevice as sd; print(sd.query_devices())"
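
The list printed by that command includes output-only devices too. The sketch below shows one way to narrow it to inputs and to turn an index or name fragment into a device index. It works on the list of dictionaries that `sd.query_devices()` returns and relies only on the `name` and `max_input_channels` fields. `input_devices` and `resolve_input_device` are hypothetical helpers, and stt.py may resolve `STT_INPUT_DEVICE` differently.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can record audio."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device["max_input_channels"] > 0
    ]


def resolve_input_device(spec, devices):
    """Map an index or a case-insensitive name fragment to a device index."""
    candidates = input_devices(devices)
    if spec.isdigit():
        index = int(spec)
        if any(index == i for i, _ in candidates):
            return index
        raise ValueError(f"device {index} is not an input device")
    matches = [i for i, name in candidates if spec.lower() in name.lower()]
    if len(matches) != 1:
        raise ValueError(f"{spec!r} matched {len(matches)} input devices")
    return matches[0]
```

With sounddevice installed in the venv, `input_devices(sd.query_devices())` gives the indices that are valid values for STT_INPUT_DEVICE.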

Common transcription fixes live in text_cleanup.py. Filler-word cleanup is always on. The words-per-minute (wpm) figure and word count logged in transcriptions.md reflect what you actually spoke, fillers included; only the saved text is cleaned.
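
The split between raw statistics and cleaned text can be sketched as follows. This is an illustration only: the filler list, `remove_fillers`, and `speaking_stats` are assumptions and do not reproduce text_cleanup.py.

```python
import re

# A deliberately small, hypothetical filler list: um, uh, er, erm.
FILLERS = re.compile(r"\b(?:um+|uh+|erm?)\b[,.]?\s*", re.IGNORECASE)


def remove_fillers(text):
    """Drop filler words and collapse any double spaces left behind."""
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def speaking_stats(raw_text, duration_seconds):
    """Return (word_count, wpm) computed from the raw text, fillers included."""
    words = len(raw_text.split())
    wpm = round(words * 60 / duration_seconds) if duration_seconds > 0 else 0
    return words, wpm


raw = "Um, I think uh we should go"
saved_text = remove_fillers(raw)            # cleaned text that gets saved
word_count, wpm = speaking_stats(raw, 3.0)  # statistics from what was spoken
```

Here the saved text is "I think we should go" (5 words), while the logged statistics still count all 7 spoken words.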

Tests

venv/bin/python -m pip install -r requirements-dev.txt
venv/bin/python -m pytest

Troubleshooting

AttributeError: 'NoneType' object has no attribute 'removeAllDeliveredNotifications' — you're running under Anaconda Python. Recreate the venv using Homebrew Python (see Setup).
No audio captured / silent recordings — the terminal app doesn't have microphone permission, or Anaconda Python failed to trigger the TCC prompt. Grant permission manually in System Settings → Privacy & Security → Microphone, then fully quit and reopen the terminal.
Hotkey does nothing — the terminal app doesn't have Accessibility permission. Grant it in System Settings → Privacy & Security → Accessibility and fully restart the terminal.
Transcribed text appears in the terminal instead of where you wanted — the terminal was the focused window when you stopped recording. Click into your target app before pressing the stop hotkey.
