Date: 2026-03-30


Explored improvements to the voice dictation workflow. Three tools were in use: LilySpeech, Whisper push-to-talk (custom implementation), and Claude Code (CC) /voice mode.


LilySpeech: the best tool for Windows apps (Obsidian, Ecco Pro, browsers). Two issues emerged recently:

  • “and” dropped after a spoken comma: a language-model post-processing artifact introduced by a LilySpeech update. When “comma” is spoken, the resulting , character creates a clause boundary, and modern grammar correctors filter “and” at clause boundaries as a filler connector. Workaround: say “comma and” quickly without pausing. Permanent fix: disable Smart Punctuation / Post-processing in LilySpeech Advanced Settings.
  • Spaces eliminated in the WSL Ubuntu terminal: Windows SendInput APIs don’t translate correctly into WSL terminal windows. Not fixable; LilySpeech is the wrong tool for that context.

Whisper push-to-talk (custom implementation):

  • 10-20 second delay with no word-by-word feedback, inherent to the batch transcription architecture
  • Unreliable text insertion in the WSL Ubuntu terminal (clipboard/xdotool paste doesn’t reliably target WSL windows)
  • Works acceptably for Obsidian and Windows apps, but the delay degrades the experience compared with LilySpeech
  • Decision: removed entirely. Freed ~1.4 GB from ~/.voicemode/ (whisper model + Kokoro TTS). Startup shortcut deleted, services disabled.

Claude Code /voice mode:

  • Whisper (STT) and Kokoro (TTS) ran locally and were integrated with CC’s voice mode
  • Hold-Space → inject text into CC input: failed for the same root cause as LilySpeech; WSL terminal text injection from Windows-side processes is not reliable
  • Conversational /voice mode (speak → Claude hears via Whisper → Claude responds with Kokoro TTS): worked end-to-end, confirmed by log analysis
  • Not suitable for dictating long multi-sentence inputs; it changes the interaction model to back-and-forth conversation
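
The “and”-after-comma artifact described for LilySpeech can be reproduced with a toy model. This is a hypothetical sketch of the mechanism as the note describes it: spoken “comma” becomes a , character, then a cleanup pass treats ", " as a clause boundary and strips a leading “and”. It is not LilySpeech’s code; the function names and regular expressions are illustrative assumptions.

```python
import re

# Hypothetical spoken-punctuation table for the toy model.
SPOKEN_PUNCTUATION = {"comma": ",", "period": "."}


def apply_spoken_punctuation(raw: str) -> str:
    """Replace spoken punctuation words with symbols: 'a comma b' -> 'a, b'."""
    out = raw
    for word, symbol in SPOKEN_PUNCTUATION.items():
        # Remove the space before the spoken word and substitute the symbol.
        out = re.sub(rf"\s+{word}\b", symbol, out)
    return out


def naive_clause_cleanup(text: str) -> str:
    """Toy grammar pass: treat ', ' as a clause boundary and drop a leading 'and'."""
    return re.sub(r",\s+and\b\s*", ", ", text)


def transcribe_pipeline(raw: str, post_processing: bool) -> str:
    """Punctuation is always applied; the cleanup pass only when post-processing is on."""
    text = apply_spoken_punctuation(raw)
    return naive_clause_cleanup(text) if post_processing else text


if __name__ == "__main__":
    spoken = "bring the notes comma and the laptop"
    print(transcribe_pipeline(spoken, post_processing=True))
    print(transcribe_pipeline(spoken, post_processing=False))
```

Turning `post_processing` off corresponds to the permanent fix (disabling Smart Punctuation / Post-processing): the comma is still inserted, but the “and” survives. The toy model does not capture the timing-based workaround of saying “comma and” quickly.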
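
Simple arithmetic shows why the Whisper push-to-talk delay is inherent to batch transcription. In this sketch, the real-time factor (processing seconds per second of audio) and the streaming chunk length are hypothetical parameters, not measurements from this setup.

```python
from dataclasses import dataclass


@dataclass
class DictationTiming:
    speech_s: float       # how long the user speaks, in seconds
    rtf: float            # hypothetical real-time factor: processing s per s of audio
    chunk_s: float = 2.0  # hypothetical streaming chunk length, in seconds


def batch_wait_after_release_s(t: DictationTiming) -> float:
    # Batch: all processing happens after the key is released.
    return t.speech_s * t.rtf


def batch_first_word_s(t: DictationTiming) -> float:
    # Measured from the start of speech: the whole clip is recorded, then transcribed.
    return t.speech_s + batch_wait_after_release_s(t)


def streaming_first_word_s(t: DictationTiming) -> float:
    # Streaming: the first chunk is transcribed while the user keeps talking.
    first_chunk = min(t.chunk_s, t.speech_s)
    return first_chunk + first_chunk * t.rtf


if __name__ == "__main__":
    t = DictationTiming(speech_s=30.0, rtf=0.5)
    print(batch_first_word_s(t), batch_wait_after_release_s(t), streaming_first_word_s(t))
```

With these made-up numbers, a 30-second dictation shows nothing for 45 s in batch mode (15 s of that after the key is released). A streaming recognizer would show its first words after about 3 s. The batch delay grows with the length of the utterance; the streaming first-word delay does not.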

Whisper + Kokoro removed from system. /voice mode disabled in Claude Code settings (voiceEnabled: false).
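
The setting can also be toggled with a script. This is a minimal sketch that assumes `voiceEnabled` is a top-level key in Claude Code’s user settings file at `~/.claude/settings.json`. The note does not say which settings file was edited, so check the path before using it.

```python
import json
from pathlib import Path

# Assumption: user-level Claude Code settings file; the note does not name the file.
DEFAULT_SETTINGS = Path.home() / ".claude" / "settings.json"


def with_voice_enabled(settings_text: str, enabled: bool) -> str:
    """Return the settings JSON with voiceEnabled set, keeping every other key."""
    settings = json.loads(settings_text) if settings_text.strip() else {}
    settings["voiceEnabled"] = enabled
    return json.dumps(settings, indent=2) + "\n"


def set_voice_enabled(enabled: bool, path: Path = DEFAULT_SETTINGS) -> None:
    """Rewrite the settings file in place, creating it if it is missing."""
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(with_voice_enabled(current, enabled), encoding="utf-8")
```

Calling `set_voice_enabled(True)` would flip the flag back on. `with_voice_enabled` is kept separate so that the JSON edit can be checked without touching the real file.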

Current dictation strategy:

  • Claude Code: LilySpeech → dictate into Windows Notepad → paste into CC terminal. Not elegant but reliable.
  • Obsidian / Windows apps: LilySpeech directly — best-in-class for these contexts.
  • Claude Code conversational: /voice mode remains available if re-enabled, useful for short back-and-forth.

Root Cause: Why WSL Terminal Blocks All Dictation Tools

All external text injection approaches (LilySpeech, Whisper hotkey, CC Hold-Space) fail in the WSL Ubuntu terminal because they rely on Windows SendInput or clipboard paste, neither of which reliably targets WSL terminal windows. This is a fundamental architectural constraint — not a bug in any individual tool.
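
A SendInput-based tool does not hand text to the target window. Instead, it replays the text as synthetic keystrokes. The sketch below shows the usual Win32 encoding with the KEYEVENTF_UNICODE flag: wVk = 0, plus one key-down and one key-up per UTF-16 code unit, which Windows delivers as VK_PACKET keystrokes. The code only builds the (wScan, dwFlags) pairs and does not call SendInput, so it runs on any OS. It is a generic illustration, not the code of any tool named above.

```python
# Flag values from the Win32 KEYBDINPUT documentation.
KEYEVENTF_KEYUP = 0x0002
KEYEVENTF_UNICODE = 0x0004


def unicode_key_events(text: str) -> list[tuple[int, int]]:
    """Encode text as (wScan, dwFlags) pairs for KEYEVENTF_UNICODE keyboard input.

    Each UTF-16 code unit becomes a key-down and a key-up event, so characters
    outside the BMP (e.g. emoji) produce two code units and four events.
    """
    events: list[tuple[int, int]] = []
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    for i in range(0, len(data), 2):
        code_unit = int.from_bytes(data[i:i + 2], "little")
        events.append((code_unit, KEYEVENTF_UNICODE))                    # key down
        events.append((code_unit, KEYEVENTF_UNICODE | KEYEVENTF_KEYUP))  # key up
    return events


if __name__ == "__main__":
    for scan, flags in unicode_key_events("a b"):
        print(hex(scan), hex(flags))
```

Every character, including each space, arrives as an independent keystroke that the receiving terminal has to translate back into text. How faithfully that translation happens depends on the window receiving the keystrokes, not the injecting tool. This is consistent with the conclusion above that no individual dictation tool can fix the problem.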