Wake Word & Hotkey
Activate with "Computer" or hold Right Ctrl. Voice Activity Detection knows when you stop speaking.
A local desktop voice agent that runs natively on your machine. Speak commands, get results. No clicking required.
VoiceUse combines speech recognition, AI reasoning, and system control into a single seamless experience.
Groq Whisper transcribes your speech in milliseconds. Runs off the main thread so the UI never freezes.
Windows, macOS, and Linux support for window management, typing, screenshots, and multi-monitor setups.
Click UI elements described in natural language. Powered by Codex CLI or Anthropic Computer Use API.
Speech output through edge-tts and pyttsx3 with multi-backend playback. It confirms actions, reports errors, and keeps you informed.
Destructive actions trigger spoken confirmation. Keyword detection + allow-lists keep your system safe.
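The off-main-thread transcription mentioned above can be pictured with a small sketch. This is not VoiceUse's actual code: `fake_transcribe` and `BackgroundTranscriber` are hypothetical names, and the real speech-to-text call is replaced by a stand-in so the example runs without network access.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for a network call to a speech-to-text service."""
    return f"<{len(audio)} bytes transcribed>"


class BackgroundTranscriber:
    """Runs transcription on a worker thread and hands results back via a queue."""

    def __init__(self, transcribe=fake_transcribe):
        self._transcribe = transcribe
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.results: "queue.Queue[str]" = queue.Queue()

    def submit(self, audio: bytes) -> None:
        # Returns immediately, so a caller such as a UI loop is not blocked
        # while the (slow) transcription call is in flight.
        future = self._pool.submit(self._transcribe, audio)
        future.add_done_callback(lambda f: self.results.put(f.result()))

    def close(self) -> None:
        # Wait for any pending transcription before shutting down.
        self._pool.shutdown(wait=True)
```

A UI loop would call `submit()` when recording ends and poll `results` (for example with `get_nowait()`) on each tick, so rendering never waits on the network.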
A streamlined pipeline from voice to action.
Hold the hotkey or say the wake word. Speak naturally — VAD detects when you finish.
Groq Whisper converts speech to text with high accuracy, running asynchronously.
An LLM orchestrator plans the actions. Groq is the primary provider, with OpenAI or Cerebras as fallback for reliability.
Execute window commands, type text, take screenshots, or click UI elements via vision.
TTS speaks the result. You hear confirmations, errors, and status updates in real time.
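The primary-plus-fallback idea in the planning step can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `plan_with_fallback` is a hypothetical helper, and each provider is modeled as a plain callable rather than a real Groq, OpenAI, or Cerebras client.

```python
from typing import Callable, Sequence

# A provider is modeled as "prompt in, plan text out" (an assumption for this sketch).
Provider = Callable[[str], str]


def plan_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, plan) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Ordering the list as primary first, fallbacks after, keeps the common path fast while still surfacing every error message if all providers fail.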
Choose your platform. No Python install required — these are standalone executables built with PyInstaller.
Prefer the command line? Install via pipx or uv:
pipx install "voice-computer-use-agent[all]"
Replace or extend entire subsystems. Mix and match providers to suit your workflow.
Replace the entire STT→LLM→TTS pipeline with a single xAI Realtime API WebSocket connection. Stream 24 kHz PCM audio end-to-end for ultra-low latency voice interaction.
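To give a feel for what streaming 24 kHz PCM involves, here is a rough sketch of the chunk-size arithmetic. Only the 24 kHz sample rate comes from the text above; 16-bit mono samples and a 20 ms chunk length are assumptions made for illustration, and `chunk_bytes` and `split_pcm` are hypothetical helpers.

```python
SAMPLE_RATE_HZ = 24_000   # from the description above
BYTES_PER_SAMPLE = 2      # assumption: 16-bit samples
CHANNELS = 1              # assumption: mono audio


def chunk_bytes(duration_ms: int) -> int:
    """Number of bytes in a PCM chunk of the given duration."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * duration_ms // 1000


def split_pcm(pcm: bytes, duration_ms: int = 20):
    """Yield fixed-duration chunks of raw PCM, ready to send one at a time."""
    size = chunk_bytes(duration_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Under these assumptions a 20 ms chunk is 960 bytes, so one second of audio is sent as 50 small messages. Smaller chunks reduce latency at the cost of more messages per second.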
Speech-to-text: Groq Whisper (default)
LLM: Groq, OpenAI, Cerebras
Text-to-speech: edge-tts, pyttsx3
Vision: Codex CLI, Anthropic
Your system is protected by multiple layers of safeguards.
Before any destructive action — close, quit, delete, shutdown — the agent speaks a confirmation prompt and waits for your verbal response.
The configurable destructive keyword list covers close, quit, delete, remove, kill, terminate, shutdown, reboot, format, and rm -rf, as well as password entry.
System commands run through an allow-list by default. Unknown commands are blocked with an error message, not executed silently.
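The two checks described above, keyword detection and an allow-list, can be sketched like this. It is a minimal illustration, not the agent's real safety layer: the function names and the contents of `ALLOWED_COMMANDS` are hypothetical, and the keyword tuple simply mirrors the list given above.

```python
import re
import shlex

DESTRUCTIVE_KEYWORDS = (
    "close", "quit", "delete", "remove", "kill",
    "terminate", "shutdown", "reboot", "format", "rm -rf",
)
ALLOWED_COMMANDS = {"ls", "echo", "whoami"}  # hypothetical allow-list entries


def needs_confirmation(utterance: str) -> bool:
    """True if the request contains a destructive keyword as a whole word."""
    text = utterance.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text)
        for keyword in DESTRUCTIVE_KEYWORDS
    )


def check_command(command: str) -> str:
    """Return the executable name if allow-listed; otherwise raise, never run."""
    parts = shlex.split(command)
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not on allow-list: {command!r}")
    return parts[0]
```

Matching on word boundaries avoids false positives such as "format" inside "information", and raising an error for unknown commands means a blocked command is reported rather than silently dropped.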
Join the growing community of developers controlling their desktops with voice.