Voice Dictation: Improvement Attempts
Date: 2026-03-30
Context
Explored improvements to the voice dictation workflow. Three tools in use: LilySpeech, Whisper push-to-talk (custom implementation), and CC /voice mode.
Findings
LilySpeech
Best tool for Windows apps (Obsidian, Ecco Pro, browsers). Two issues emerged recently:
- “and” dropped after spoken comma: A language-model post-processing artifact introduced by a LilySpeech update. When “comma” is spoken, the “,” character creates a clause boundary, and modern grammar correctors filter “and” at clause boundaries as a connector filler. Workaround: say “comma and” quickly without pausing. Permanent fix: disable Smart Punctuation / Post-processing in LilySpeech Advanced Settings.
- Spaces eliminated in WSL Ubuntu terminal: Windows SendInput APIs don’t translate correctly into WSL terminal windows. Not fixable; LilySpeech is the wrong tool for that context.
Whisper Push-to-Talk (custom)
- 10-20 second delay with no word-by-word feedback, inherent to the batch transcription architecture
- Unreliable text insertion in WSL Ubuntu terminal (clipboard/xdotool paste doesn’t reliably target WSL windows)
- Works acceptably for Obsidian and Windows apps, but the delay degrades the experience vs LilySpeech
- Decision: removed entirely. Freed ~1.4 GB from ~/.voicemode/ (Whisper model + Kokoro TTS). Startup shortcut deleted, services disabled.
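The batch-transcription delay noted above can be illustrated with a rough latency model. This is a simplified sketch, not a measurement of the removed setup: the function names, the real-time factor (`rtf`, processing time divided by audio duration), and the example numbers are all hypothetical.

```python
def batch_time_to_first_text(utterance_s: float, rtf: float) -> float:
    """Batch mode: no text appears until the whole recording has been
    captured and then transcribed in one pass."""
    return utterance_s + utterance_s * rtf


def streaming_time_to_first_text(chunk_s: float, rtf: float) -> float:
    """Streaming mode: the first text appears after only the first short
    chunk has been captured and transcribed."""
    return chunk_s + chunk_s * rtf


if __name__ == "__main__":
    # Hypothetical values: a 12-second utterance, 1-second streaming chunks,
    # and a transcriber that needs half the audio duration to process it.
    print("batch:", batch_time_to_first_text(12.0, 0.5), "s")
    print("streaming:", streaming_time_to_first_text(1.0, 0.5), "s")
```

Under this model the wait in batch mode grows with the length of the utterance, which is why a faster model alone would shorten the delay but not restore word-by-word feedback.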
CC /voice Mode
- Whisper (STT) and Kokoro (TTS) were running locally and integrated with CC’s voice mode
- Hold-Space → inject text into CC input: failed for the same root cause as LilySpeech. WSL terminal text injection is not reliable from Windows-side processes
- Conversational /voice mode (speak → Claude hears via Whisper → Claude responds with Kokoro TTS): functionally worked end-to-end, confirmed by log analysis
- Not suitable for dictating long multi-sentence inputs; changes the interaction model to back-and-forth conversation
Outcome
Whisper + Kokoro removed from system. /voice mode disabled in Claude Code settings (voiceEnabled: false).
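For reference, the settings change could be scripted roughly as follows. This is a minimal sketch that assumes the setting lives in a JSON file; only the `voiceEnabled` key comes from these notes, while the helper name `set_voice_enabled` is hypothetical and the file location is left to the caller because it is not recorded here.

```python
import json
from pathlib import Path


def set_voice_enabled(settings_path: Path, enabled: bool) -> dict:
    """Set the voiceEnabled key in a JSON settings file, keeping other keys.

    The path is supplied by the caller; where Claude Code actually stores
    this setting is an assumption to verify before use.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)
    settings["voiceEnabled"] = enabled
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling `set_voice_enabled(path, True)` later would undo the change, which matches the note that the conversational mode stays available if re-enabled.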
Current dictation strategy:
- Claude Code: LilySpeech → dictate into Windows Notepad → paste into CC terminal. Not elegant but reliable.
- Obsidian / Windows apps: LilySpeech directly — best-in-class for these contexts.
- Claude Code conversational: /voice mode remains available if re-enabled, useful for short back-and-forth.
Root Cause: Why WSL Terminal Blocks All Dictation Tools
All external text injection approaches (LilySpeech, Whisper hotkey, CC Hold-Space) fail in the WSL Ubuntu terminal because they rely on Windows SendInput or clipboard paste, neither of which reliably targets WSL terminal windows. This is a fundamental architectural constraint, not a bug in any individual tool.