# claude-voice

Voice-enabled wrapper for Claude Code CLI. Adds Alt+V push-to-talk dictation powered by Whissle ASR streaming.

Press Alt+V to start recording, speak your prompt, press Alt+V again to stop. Your speech is transcribed in real-time and injected into the Claude Code input — along with voice metadata (emotion, intent, speech rate) that Claude can use as context.

## Quick start

One-liner install (macOS):

```sh
git clone https://github.com/WhissleAI/claude_code_voice.git && cd claude_code_voice && ./install.sh
```

Then run:

```sh
./claude-voice --token <your-whissle-token>
```

## How it works

```text
┌─────────────────────────────────────────────────┐
│  claude-voice (PTY wrapper)                     │
│                                                 │
│  stdin ──┬──→ claude (spawned in pseudo-TTY)    │
│          │                                      │
│          └─ Alt+V toggles voice ──→ sox (mic)   │
│                                     │           │
│                  Whissle ASR  ←─────┘           │
│                  (WebSocket)                    │
│                     │                           │
│                     ▼                           │
│              transcript + metadata              │
│              injected into prompt               │
└─────────────────────────────────────────────────┘
```

- Spawns `claude` inside a pseudo-terminal, passing all I/O through transparently
- Intercepts Alt+V (`ESC v`) to toggle microphone recording via `sox`/`rec`
- Streams 16kHz PCM audio to Whissle ASR over WebSocket
- Final transcripts are typed into the Claude prompt automatically
- Voice metadata (emotion, intent) is appended as inline HTML comments that Claude can read
- A running metadata summary is written to `.claude-voice/voice-metadata.md` for Claude to reference
- Terminal title bar shows live voice status and metadata while recording
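The intercept in the bullets above relies on the fact that terminals in raw mode deliver Alt+V as the two-byte sequence ESC (`0x1b`) followed by `v`. A minimal sketch of the detection and toggle logic (hypothetical helpers, not the actual `src/index.ts` code):

```typescript
// Alt+V arrives from a raw-mode terminal as ESC (0x1b) followed by "v" (0x76).
const ESC = 0x1b;
const KEY_V = 0x76;

function isAltV(chunk: Uint8Array): boolean {
  return chunk.length === 2 && chunk[0] === ESC && chunk[1] === KEY_V;
}

// Returns a handler that flips the recording state on Alt+V and reports
// whether the chunk was swallowed (true) or should be forwarded to the
// claude PTY untouched (false).
function makeAltVToggle(
  onToggle: (recording: boolean) => void,
): (chunk: Uint8Array) => boolean {
  let recording = false;
  return (chunk) => {
    if (!isAltV(chunk)) return false; // everything else passes through
    recording = !recording;
    onToggle(recording);
    return true;
  };
}
```

Forwarding every non-Alt+V chunk verbatim is what keeps the wrapper transparent to Claude Code.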

## Prerequisites

| Requirement | Version | Notes |
| --- | --- | --- |
| Node.js | 22+ | Native WebSocket support required |
| Claude Code CLI | latest | Must be installed and in PATH (`claude` command) |
| sox | any | Audio capture (`rec`/`sox` command) |
| Whissle token | n/a | Get one from whissle.ai |

The `install.sh` script checks for all prerequisites and offers to install any that are missing.
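A prerequisite check of this kind boils down to a PATH lookup per command; roughly (illustrative sketch, not the actual `install.sh` logic):

```typescript
import { existsSync } from "node:fs";
import { join, delimiter } from "node:path";

// Return the full path of `cmd` in the first PATH directory that
// contains it, or null when the command is not installed.
// (A real check would also verify the executable bit.)
function findInPath(cmd: string, pathVar: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue;
    const candidate = join(dir, cmd);
    if (existsSync(candidate)) return candidate;
  }
  return null;
}

// claude-voice needs all of these before it can start.
const missing = ["claude", "sox", "node"].filter((cmd) => findInPath(cmd) === null);
if (missing.length > 0) console.error(`missing prerequisites: ${missing.join(", ")}`);
```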

## Install

### Option 1: One-liner

```sh
git clone https://github.com/WhissleAI/claude_code_voice.git && cd claude_code_voice && ./install.sh
```

### Option 2: Manual

```sh
# Install prerequisites
brew install sox                          # macOS
# sudo apt install sox                    # Linux
npm install -g @anthropic-ai/claude-code  # if not already installed

# Clone and install
git clone https://github.com/WhissleAI/claude_code_voice.git
cd claude_code_voice
npm install
```

## Usage

```sh
# With --token flag
./claude-voice --token wh_your_token_here

# Or via environment variable
export WHISSLE_AUTH_TOKEN="wh_your_token_here"
./claude-voice

# Pass any Claude Code arguments through
./claude-voice --token wh_... --model sonnet
./claude-voice --token wh_... -p "explain this codebase"
./claude-voice --token wh_... --continue
```

## Keyboard shortcuts

| Key | Action |
| --- | --- |
| Alt+V | Toggle voice recording on/off |
| All other keys | Passed through to Claude Code as normal |

### During recording

- Terminal title bar updates with live metadata (emotion, intent, WPM)
- Status messages appear on stderr
- Transcribed text is injected into the prompt as you speak
- Press Alt+V again to stop recording

## Configuration

Configuration via CLI flags or environment variables:

| CLI flag | Env variable | Default | Description |
| --- | --- | --- | --- |
| `--token <token>` | `WHISSLE_AUTH_TOKEN` | (required) | Whissle API auth token |
| `--asr-url <url>` | `WHISSLE_ASR_URL` | `wss://api.whissle.ai/asr/stream` | ASR WebSocket endpoint |
| `--language <code>` | `WHISSLE_ASR_LANGUAGE` | `en` | Speech recognition language |

CLI flags take precedence over environment variables. All other flags are passed through to Claude Code.
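The precedence rule (flag beats environment variable beats default) can be sketched as a small resolver; the names here are illustrative, not the project's actual types:

```typescript
interface VoiceConfig {
  token?: string; // required at runtime, but has no default
  asrUrl: string;
  language: string;
}

// Precedence per setting: CLI flag, then environment variable, then default.
function resolve(flag: string | undefined, env: string | undefined, fallback: string): string {
  return flag ?? env ?? fallback;
}

function loadConfig(
  flags: Record<string, string | undefined>,
  env: Record<string, string | undefined>,
): VoiceConfig {
  return {
    token: flags["token"] ?? env["WHISSLE_AUTH_TOKEN"],
    asrUrl: resolve(flags["asr-url"], env["WHISSLE_ASR_URL"], "wss://api.whissle.ai/asr/stream"),
    language: resolve(flags["language"], env["WHISSLE_ASR_LANGUAGE"], "en"),
  };
}
```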

## Voice metadata

While recording, claude-voice tracks:

- Emotion — detected emotional tone (e.g. happy, frustrated, neutral)
- Intent — classified speech intent
- Speech rate — words per minute, filler word count, pause count
- Emotion trend — shift detection across segments
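Of these, speech rate is the most mechanical: words per minute is just the segment's word count scaled to its duration. A hypothetical helper (not the actual `src/metadata.ts` code):

```typescript
// Words per minute for one transcript segment.
// durationMs is the elapsed recording time for the segment.
function wordsPerMinute(transcript: string, durationMs: number): number {
  const words = transcript.trim().split(/\s+/).filter(Boolean).length;
  if (durationMs <= 0) return 0;
  return Math.round(words / (durationMs / 60_000));
}
```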

This metadata is:

1. Shown in the terminal title bar in real time
2. Written to `.claude-voice/voice-metadata.md` after each segment (Claude can read this file)
3. Appended as `<!-- voice: emotion:X, intent:Y -->` comments after transcribed text
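The comment in step 3 could be produced by a formatter along these lines (the field layout mirrors the example above; the helper itself is illustrative):

```typescript
// Append the voice-metadata HTML comment to a transcribed segment.
// Claude sees the comment in the raw prompt text, but it is invisible
// when the markdown is rendered.
function annotateTranscript(transcript: string, emotion: string, intent: string): string {
  return `${transcript.trimEnd()} <!-- voice: emotion:${emotion}, intent:${intent} -->`;
}
```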

## Project structure

```text
src/
├── index.ts         # Main entry — PTY wrapper, Alt+V intercept, orchestration
├── mic.ts           # Microphone capture via sox/rec subprocess
├── asr-client.ts    # WebSocket client for Whissle ASR streaming
└── metadata.ts      # Voice metadata accumulator + markdown writer
```

## Troubleshooting

**`claude` not found in PATH** — install Claude Code: `npm install -g @anthropic-ai/claude-code`

**`sox` not found** — install sox: `brew install sox` (macOS) or `sudo apt install sox` (Linux)

**Voice server connection failed** — check that your `--token` value is valid and that you have internet connectivity

**Mic error** — ensure your microphone is connected and your terminal has microphone permission (macOS: System Settings > Privacy & Security > Microphone)

**Module errors after a Node.js upgrade** — delete `node_modules` and reinstall: `rm -rf node_modules && npm install`
