Speech Recognition

Voice Dictation

LilySpeech: works better than Windows Speech Recognition, which is buggy
- Voice Commands: https://lilyspeech.com/dictation-voice-commands/
- If it stops working, restart the app (kill the process first if needed)
- Add/modify Custom Word Replacements
  - c:\Users\Admin\AppData\Local\LilySpeechApp\LilySpeechUser\replacements.txt
WhisperFlow
- $12/mo, but uses AI to automatically format punctuation and more
Whisper.CPP Server - NOT an effective solution
- transcription delay was too long
- runs locally (offline), for any Windows or WSL app that accepts keyboard input
- Hotkey: Alt+Space — press once to start recording, press again to stop; transcription is pasted at cursor
- Installation
  - Script location: D:\FSS\Software\Utils\whisper-hotkey\whisper-hotkey.py
  - Silent launcher: D:\FSS\Software\Utils\whisper-hotkey\whisper-hotkey-launch.vbs
  - Whisper server (runs in WSL, auto-starts): installed via uvx voice-mode. To reinstall, run uvx voice-mode whisper install; no backup is needed
  - Windows Python dependencies (one-time): run cd D:\FSS\Software\Utils\whisper-hotkey, then uv sync, which installs all dependencies into .venv
  - Microphone privacy check: Settings → Privacy → Microphone → ensure “Allow apps to access your microphone” is ON. After changing, restart WSL: run wsl --shutdown in PowerShell.
  - Auto-start at Windows login: A shortcut already exists in the Startup folder (whisper-hotkey.lnk) pointing to the launcher — nothing to copy. Verify: Win+R → shell:startup
- Usage
  - Press Alt+Space to start recording (script must be running)
  - Speak naturally — punctuation is added automatically
  - Press Alt+Space again to stop — text is pasted at cursor via Ctrl+V
  - Works in: Windows Terminal, Obsidian, browsers, Word, Claude Code — any app accepting keyboard input
  - There is a short delay after stopping before text appears (transcription time)
- Notes
  - Output uses clipboard paste (Ctrl+V), not keystroke injection — this ensures compatibility with Electron apps (Obsidian, VS Code)
  - The Whisper server runs in WSL at localhost:2022. If it’s not reachable, open WSL and run uvx voice-mode whisper start
  - The base Whisper model is used by default (fast, ~223MB RAM). For better accuracy at the cost of speed, change to small or medium in ~/.voicemode/voicemode.env → VOICEMODE_WHISPER_MODEL=small
  - Also used by VoiceMode MCP in Claude Code for two-way voice conversations (same local server)
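
Quick reachability check for the note above about the server at localhost:2022. This is a minimal sketch, not part of whisper-hotkey.py: the function name whisper_server_reachable is made up for illustration. It only tests whether something accepts TCP connections on that port, which is an assumption about what "reachable" means here; it does not confirm that the listener is actually the Whisper server.

```python
import socket


def whisper_server_reachable(host="localhost", port=2022, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: host and port default to the values in the notes.
    """
    try:
        # create_connection raises OSError (refused, timed out, unresolvable host)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if whisper_server_reachable():
        print("Something is accepting connections on localhost:2022")
    else:
        print("Not reachable; open WSL and run: uvx voice-mode whisper start")
```

Run it with the Windows-side Python from the .venv. If it reports "not reachable" even though the server is running inside WSL, the problem is more likely WSL localhost forwarding than the server itself.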