Getting Started

VoiceUse is a local desktop voice agent that lets you control your computer hands-free. Speak a command, and VoiceUse transcribes it, plans the steps, executes them, and responds. The agent runs natively on your machine and uses the API providers configured below (Groq is required).

What VoiceUse Can Do

Control windows — open, focus, minimize, resize, and move applications
Type text — dictation into any text field
Click UI elements — describe what you want to click in natural language
Take screenshots — capture full screens or specific windows
Execute system commands — run shell commands through an allow-list
Browse the web — open URLs and navigate pages
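
To illustrate what running shell commands through an allow-list can mean, here is a minimal sketch. It is not VoiceUse's actual implementation: `ALLOWED_COMMANDS` and `is_allowed` are hypothetical names, and the real allow-list format may differ.

```python
import shlex

# Hypothetical allow-list: only these executables may be launched.
ALLOWED_COMMANDS = {"ls", "echo", "git"}


def is_allowed(command_line: str) -> bool:
    """Return True if the first token of the command is on the allow-list."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS


if __name__ == "__main__":
    for cmd in ["git status", "rm -rf /", ""]:
        verdict = "allowed" if is_allowed(cmd) else "blocked"
        print(f"{cmd!r}: {verdict}")
```

A check like this is only meaningful if the approved command is then executed as a token list (for example with `subprocess.run(tokens)`) rather than through a shell, so that characters such as `;` or `&&` are never interpreted.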

Quick Start

1. Install

pipx (Recommended)

pipx install "voice-computer-use-agent[all]"

uv

uv tool install "voice-computer-use-agent[all]"

pip

pip install "voice-computer-use-agent[all]"

2. Set API Keys

export GROQ_API_KEY="gsk_..."
export OPENAI_API_KEY="sk-..."      # optional fallback
export ANTHROPIC_API_KEY="sk-ant-..." # optional vision

3. Run

voiceuse

Hold Right Ctrl while you speak, then release the key to submit. If the wake word is enabled, you can say "Computer" instead.
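
The hold-to-talk interaction can be pictured as a small state machine: audio is buffered only while the key is down, and releasing the key hands the buffered utterance to the next stage. The sketch below is an illustration under that assumption; the `PushToTalk` class and its method names are hypothetical and are not taken from the VoiceUse codebase.

```python
from dataclasses import dataclass, field


@dataclass
class PushToTalk:
    """Collects audio chunks while the talk key is held down."""

    recording: bool = False
    chunks: list[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Start a fresh recording; repeated key-down events are ignored.
        if not self.recording:
            self.recording = True
            self.chunks.clear()

    def feed(self, chunk: bytes) -> None:
        # Audio that arrives while the key is up is dropped.
        if self.recording:
            self.chunks.append(chunk)

    def key_up(self) -> bytes:
        # Releasing the key ends the recording and returns the utterance.
        self.recording = False
        audio = b"".join(self.chunks)
        self.chunks.clear()
        return audio
```

In a real agent, `key_down` and `key_up` would be driven by a keyboard hook and `feed` by the microphone stream; both are left out here to keep the sketch free of third-party dependencies.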

System Requirements

Requirement	Details
Python	3.10 or higher
OS	Windows (primary), Linux, macOS (best-effort)
Microphone	Required for voice input
API Keys	Groq required; OpenAI/Anthropic optional
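
Before the first run, the Python version and API key rows of this table can be checked with a short script. This is a sketch, not a tool that ships with VoiceUse: `preflight` and `missing_optional` are hypothetical helpers, the environment variable names are the ones used in the Quick Start, and the microphone is not checked because that would need a third-party audio library.

```python
import os
import sys

MIN_PYTHON = (3, 10)                # from the requirements table
REQUIRED_KEYS = ["GROQ_API_KEY"]    # Groq is required
OPTIONAL_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]


def preflight(env, version=None):
    """Return a list of blocking problems; an empty list means none found."""
    major, minor = (version or sys.version_info)[:2]
    problems = []
    if (major, minor) < MIN_PYTHON:
        problems.append(
            f"Python 3.10 or higher is required, found {major}.{minor}"
        )
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is not set")
    return problems


def missing_optional(env):
    """Return the optional keys that are unset or empty."""
    return [key for key in OPTIONAL_KEYS if not env.get(key)]


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("ERROR:", problem)
    for key in missing_optional(os.environ):
        print("note:", key, "is not set (optional)")
```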

Architecture Overview

VoiceUse follows a modular pipeline architecture:

flowchart LR
    User[User speaks] --> STT[STT<br/>Groq Whisper]
    STT --> Brain[Brain<br/>LLM Orchestrator]
    Brain --> Tools[Tool Registry]
    Tools --> OS[OS Controller]
    Tools --> Vision[Vision Bridge]
    Brain --> TTS[TTS<br/>edge-tts/pyttsx3]
    TTS --> User
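
The flow in the diagram can be read as a chain of function calls. The following sketch shows one turn of that chain with stand-in stages; every name in it (`transcribe`, `plan`, `run_turn`, `TOOLS`) is hypothetical, the keyword rule replaces the LLM call, and the OS Controller and Vision Bridge are reduced to two toy tools.

```python
from typing import Callable

# Tool registry stand-in: maps a tool name to a callable.
TOOLS: dict[str, Callable[[str], str]] = {
    "open_app": lambda arg: f"opened {arg}",
    "type_text": lambda arg: f"typed {arg!r}",
}


def transcribe(audio: bytes) -> str:
    """STT stage stand-in: treat the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def plan(text: str) -> tuple[str, str]:
    """Brain stage stand-in: a keyword rule instead of an LLM call."""
    if text.lower().startswith("open "):
        return "open_app", text[5:]
    return "type_text", text


def run_turn(audio: bytes) -> str:
    """One pass through the pipeline: STT, Brain, Tool Registry, reply."""
    text = transcribe(audio)
    tool_name, argument = plan(text)
    result = TOOLS[tool_name](argument)
    # A TTS stage would speak this string; here it is simply returned.
    return f"Done: {result}"


if __name__ == "__main__":
    print(run_turn(b"open notepad"))
```

Keeping each stage behind a plain function boundary is what makes a pipeline like this modular: a different STT or TTS provider can be swapped in without touching the other stages.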

Learn more in the Architecture section.

Next Steps

Installation Guide — Detailed platform-specific setup
Configuration — Customize behavior with config.yaml
Usage — Learn how to use VoiceUse effectively
Plugins — Extend with custom providers