Getting Started
VoiceUse is a local desktop voice agent that controls your computer hands-free. Speak commands, and VoiceUse will transcribe, plan, execute, and respond — all running natively on your machine.
What VoiceUse Can Do
- Control windows — open, focus, minimize, resize, and move applications
- Type text — dictation into any text field
- Click UI elements — describe what you want to click in natural language
- Take screenshots — capture full screens or specific windows
- Execute system commands — run shell commands through an allow-list
- Browse the web — open URLs and navigate pages
Quick Start
1. Install
pipx (Recommended)
pipx install "voice-computer-use-agent[all]"
uv
uv tool install "voice-computer-use-agent[all]"
pip
pip install "voice-computer-use-agent[all]"
2. Set API Keys
export GROQ_API_KEY="gsk_..."
export OPENAI_API_KEY="sk-..." # optional fallback
export ANTHROPIC_API_KEY="sk-ant-..." # optional vision
3. Run
voiceuse
Hold right-ctrl and speak, then release to submit. Or say "Computer" if wake word is enabled.
System Requirements
| Requirement | Details |
|---|---|
| Python | 3.10 or higher |
| OS | Windows (primary), Linux, macOS (best-effort) |
| Microphone | Required for voice input |
| API Keys | Groq required; OpenAI/Anthropic optional |
Architecture Overview
VoiceUse follows a modular pipeline architecture:
flowchart LR
User[User speaks] --> STT[STT
Groq Whisper] STT --> Brain[Brain
LLM Orchestrator] Brain --> Tools[Tool Registry] Tools --> OS[OS Controller] Tools --> Vision[Vision Bridge] Brain --> TTS[TTS
edge-tts/pyttsx3] TTS --> User
Groq Whisper] STT --> Brain[Brain
LLM Orchestrator] Brain --> Tools[Tool Registry] Tools --> OS[OS Controller] Tools --> Vision[Vision Bridge] Brain --> TTS[TTS
edge-tts/pyttsx3] TTS --> User
Learn more in the Architecture section.
Next Steps
- Installation Guide — Detailed platform-specific setup
- Configuration — Customize behavior with
config.yaml - Usage — Learn how to use VoiceUse effectively
- Plugins — Extend with custom providers