
Getting Started

VoiceUse is a local desktop voice agent that controls your computer hands-free. Speak commands, and VoiceUse will transcribe, plan, execute, and respond — all running natively on your machine.

What VoiceUse Can Do

Quick Start

1. Install

pipx

pipx install "voice-computer-use-agent[all]"

uv

uv tool install "voice-computer-use-agent[all]"

pip

pip install "voice-computer-use-agent[all]"

2. Set API Keys

export GROQ_API_KEY="gsk_..."
export OPENAI_API_KEY="sk-..."      # optional fallback
export ANTHROPIC_API_KEY="sk-ant-..." # optional vision
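Before launching, it can help to confirm that the variables are visible to the process. The following is a minimal sketch, not part of VoiceUse: the function name `check_api_keys` is hypothetical, and the required/optional split simply mirrors the comments above (Groq required, OpenAI and Anthropic optional).

```python
import os

# Variable names come from the export lines above.
# check_api_keys is a hypothetical helper, not a VoiceUse API.
REQUIRED_KEYS = ("GROQ_API_KEY",)
OPTIONAL_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def check_api_keys(env=None):
    """Return (missing_required, missing_optional) as lists of variable names."""
    env = os.environ if env is None else env
    # An empty string counts as missing, same as an unset variable.
    missing_required = [name for name in REQUIRED_KEYS if not env.get(name)]
    missing_optional = [name for name in OPTIONAL_KEYS if not env.get(name)]
    return missing_required, missing_optional
```

Calling `check_api_keys()` with no argument inspects the current environment; passing a plain dict makes the check easy to try without touching real keys.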

3. Run

voiceuse

Hold Right Ctrl while you speak, then release the key to submit. If the wake word is enabled, you can say "Computer" instead.
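The hold-to-talk behaviour can be pictured as a small state machine: audio is collected only while the key is down and handed off when it is released. This is an illustrative sketch only; the `PushToTalk` class and its method names are hypothetical and do not describe how VoiceUse is actually implemented.

```python
class PushToTalk:
    """Collects audio chunks only while the talk key is held (hypothetical sketch)."""

    def __init__(self):
        self.recording = False
        self._chunks = []

    def key_down(self):
        # Start a fresh recording; repeated key-down events (OS key repeat) are ignored.
        if not self.recording:
            self.recording = True
            self._chunks = []

    def feed(self, chunk: bytes):
        # Audio that arrives while the key is up is dropped.
        if self.recording:
            self._chunks.append(chunk)

    def key_up(self):
        # Stop recording and return the utterance, or None if the key was not held.
        if not self.recording:
            return None
        self.recording = False
        return b"".join(self._chunks)
```

In a real application the `key_down` and `key_up` calls would be driven by a keyboard hook and `feed` by the microphone stream; both are left out here.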

System Requirements

Requirement    Details
Python         3.10 or higher
OS             Windows (primary), Linux, macOS (best-effort)
Microphone     Required for voice input
API Keys       Groq required; OpenAI/Anthropic optional
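The Python and OS rows can be checked programmatically. The sketch below is an assumption-laden helper, not something VoiceUse ships: `check_environment` is a hypothetical name, and it only tests whether the platform is one of the three listed, without distinguishing support levels. The microphone is not checked.

```python
import platform
import sys

MIN_PYTHON = (3, 10)  # "3.10 or higher" from the requirements above
# Values that platform.system() returns for the listed operating systems.
LISTED_SYSTEMS = {"Windows", "Linux", "Darwin"}  # Darwin is macOS


def check_environment(version=None, system=None):
    """Return a list of human-readable problems; an empty list means no issue found."""
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    system = platform.system() if system is None else system
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found; 3.10 or higher is required"
        )
    if system not in LISTED_SYSTEMS:
        problems.append(f"{system!r} is not one of the listed operating systems")
    return problems
```

With no arguments the function inspects the running interpreter; the optional parameters exist so the logic can be exercised with other values.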

Architecture Overview

VoiceUse follows a modular pipeline architecture:

flowchart LR
    User[User speaks] --> STT[STT<br/>Groq Whisper]
    STT --> Brain[Brain<br/>LLM Orchestrator]
    Brain --> Tools[Tool Registry]
    Tools --> OS[OS Controller]
    Tools --> Vision[Vision Bridge]
    Brain --> TTS[TTS<br/>edge-tts/pyttsx3]
    TTS --> User
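To make the data flow in the diagram concrete, here is a toy version of one turn through the pipeline. Every name in it (`ToolRegistry`, `fake_stt`, `fake_brain`, `fake_tts`, `run_turn`, and the two registered tools) is hypothetical, and each stage is a trivial stand-in: no speech recognition, LLM, or OS control takes place. It only shows how the stages hand results to one another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToolRegistry:
    """Maps tool names to callables (stand-ins for the OS controller and vision bridge)."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def call(self, name: str, arg: str) -> str:
        if name not in self.tools:
            return f"unknown tool: {name}"
        return self.tools[name](arg)


def fake_stt(audio: bytes) -> str:
    # Stand-in for speech-to-text: treats the "audio" as UTF-8 text.
    return audio.decode("utf-8")


def fake_brain(text: str, registry: ToolRegistry) -> str:
    # Stand-in for the LLM orchestrator: the first word picks the tool,
    # the rest of the sentence is passed as its argument.
    name, _, arg = text.partition(" ")
    return registry.call(name.lower(), arg)


def fake_tts(reply: str) -> str:
    # Stand-in for text-to-speech: returns what would be spoken.
    return f"[spoken] {reply}"


def run_turn(audio: bytes, registry: ToolRegistry) -> str:
    """One pass through the pipeline: STT -> Brain -> Tools -> TTS."""
    return fake_tts(fake_brain(fake_stt(audio), registry))


registry = ToolRegistry()
registry.register("open", lambda app: f"opened {app}")      # hypothetical OS tool
registry.register("describe", lambda what: f"saw {what}")   # hypothetical vision tool
```

Keeping the tools behind a registry means the orchestrator only needs a tool name and an argument, so new capabilities can be added without changing the rest of the loop.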

Learn more in the Architecture section.

Next Steps