Voice AI Agent
Tap to speak with AI
Speech Recognition Mode
Server-based (Modal Whisper)
In-browser WASM (Whisper.cpp)
Tiny English (31 MB, fastest)
Base English (57 MB, recommended)
Base Multilingual (142 MB)
Enable Text-to-Speech
Uncheck to skip speech generation for faster responses
Ready to record
🎤
Tap to Record
Test with Static Audio
Conversation
Mode: Chat
State:
You said:
AI response:
Test TTS Service
Generate & Play Audio
Test Full Pipeline (LLM + TTS)
Send to AI & Generate Response