Getting Started

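VoiceUse is a desktop voice agent that runs on your machine and controls your computer hands-free. Speak a command, and VoiceUse transcribes it, plans the steps, executes them, and responds aloud. Transcription uses Groq's hosted Whisper API, so an internet connection and a Groq API key are required.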

What VoiceUse Can Do

Quick Start

1. Install

pipx install "voice-computer-use-agent[all]"

Or with uv:

uv tool install "voice-computer-use-agent[all]"

2. Set API Keys

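export GROQ_API_KEY="gsk_..."          # required
export OPENAI_API_KEY="sk-..."         # optional: fallback
export ANTHROPIC_API_KEY="sk-ant-..."  # optional: vision

On Windows PowerShell, set each key with $env:GROQ_API_KEY = "gsk_..." (and so on) instead of export.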

3. Run

voiceuse

Hold Right Ctrl while you speak, then release it to submit the command. If the wake word is enabled, you can instead say "Computer" to start.
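The hold-to-talk behavior can be pictured as a small state machine: pressing the key starts a capture, audio is kept only while the key is held, and releasing it hands the utterance on for transcription. The sketch below is a hypothetical illustration, not VoiceUse's implementation; the `PushToTalk` class, its method names, and the use of strings as stand-in audio frames are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PushToTalk:
    """Hypothetical hold-to-talk recorder: press starts capture, release submits it.

    Audio frames are stand-in strings here; a real app would read microphone buffers.
    """
    recording: bool = False
    frames: List[str] = field(default_factory=list)

    def key_down(self) -> None:
        # Holding a key can produce repeated key-down events (auto-repeat);
        # only the first one starts a new capture.
        if not self.recording:
            self.recording = True
            self.frames = []

    def feed(self, frame: str) -> None:
        # Frames are kept only while the key is held; anything else is ignored.
        if self.recording:
            self.frames.append(frame)

    def key_up(self) -> Optional[List[str]]:
        # Releasing the key ends the utterance and returns the captured frames
        # (the point where they would be sent to STT). None if nothing was recording.
        if not self.recording:
            return None
        self.recording = False
        utterance, self.frames = self.frames, []
        return utterance
```

Ignoring repeated key-down events matters because a held key typically keeps firing them; without the guard, each repeat would discard the audio captured so far.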

System Requirements

| Requirement | Details |
|---|---|
| Python | 3.10 or higher |
| OS | Windows (primary), Linux, macOS (best-effort) |
| Microphone | Required for voice input |
| API keys | Groq required; OpenAI and Anthropic optional |
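The table can be turned into a quick preflight check before launching. This is a minimal sketch under stated assumptions: the `preflight` function and its messages are hypothetical (not part of VoiceUse), the thresholds and key names are copied from the table and the Set API Keys step, and microphone detection is left out because it needs a third-party audio library.

```python
import os
import sys

# Values taken from the requirements table; this checker itself is hypothetical.
MIN_PYTHON = (3, 10)
REQUIRED_KEYS = ("GROQ_API_KEY",)
OPTIONAL_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
# sys.platform values mapped to the support levels listed in the table.
OS_SUPPORT = {"win32": "primary", "linux": "supported", "darwin": "best-effort"}


def preflight(version=None, platform=None, env=None):
    """Return (errors, warnings, os_support) for the given or current environment."""
    version = tuple(sys.version_info[:2] if version is None else version)
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env

    errors, warnings = [], []
    if version < MIN_PYTHON:
        errors.append(f"Python {version[0]}.{version[1]} found; 3.10 or higher required")
    # An empty value counts as missing, same as an unset variable.
    errors += [f"{k} is not set (required)" for k in REQUIRED_KEYS if not env.get(k)]
    warnings += [f"{k} is not set (optional)" for k in OPTIONAL_KEYS if not env.get(k)]
    return errors, warnings, OS_SUPPORT.get(platform, "unsupported")


if __name__ == "__main__":
    errors, warnings, support = preflight()
    for line in errors + warnings:
        print(line)
    print(f"OS support level: {support}")
```

Accepting the version, platform, and environment as optional arguments keeps the function testable without changing the real interpreter or shell.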

Architecture Overview

VoiceUse follows a modular pipeline architecture:

flowchart LR
    User[User speaks] --> STT[STT<br/>Groq Whisper]
    STT --> Brain[Brain<br/>LLM Orchestrator]
    Brain --> Tools[Tool Registry]
    Tools --> OS[OS Controller]
    Tools --> Vision[Vision Bridge]
    Brain --> TTS[TTS<br/>edge-tts/pyttsx3]
    TTS --> User
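One pass through this pipeline can be sketched as a function that takes each stage as a plug-in. This is an illustrative sketch only: `run_turn`, the plan format (a list of `{"tool": name, "args": {...}}` dicts), and the tool names are hypothetical, and the real Brain, Tool Registry, and TTS components are not shown here.

```python
from typing import Callable, Dict, List


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    brain: Callable[[str], List[Dict]],
    tools: Dict[str, Callable[..., str]],
    tts: Callable[[str], None],
) -> List[str]:
    """One hypothetical voice turn: STT -> Brain -> Tool Registry -> TTS."""
    text = stt(audio)                 # speech -> text (Groq Whisper in VoiceUse)
    plan = brain(text)                # text -> list of tool calls (assumed format)
    results = []
    for step in plan:                 # Tool Registry: look up each tool by name
        tool = tools[step["tool"]]    # unknown tool names raise KeyError
        results.append(tool(**step.get("args", {})))
    tts("; ".join(results))           # speak a summary back to the user
    return results


if __name__ == "__main__":
    # Stub stages stand in for the real STT, LLM, OS controller, and TTS.
    spoken: List[str] = []
    run_turn(
        b"<audio>",
        stt=lambda audio: "open notepad",
        brain=lambda text: [{"tool": "launch_app", "args": {"name": "notepad"}}],
        tools={"launch_app": lambda name: f"launched {name}"},
        tts=spoken.append,
    )
    print(spoken)
```

Injecting every stage mirrors the modular design in the diagram: any component (for example, the TTS backend) can be swapped or stubbed without touching the others.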
Tip

Start with --dry-run to validate your setup without making API calls:

voiceuse --dry-run

Next Steps