Browser Agent
The Browser Agent is a core component of the Q-ACE framework, providing high-level AI-powered web automation. It leverages the browser-use library to translate natural language tasks into precise browser actions.
🚀 Key Capabilities
- Natural Language Control: Describe tasks like "Find the lowest price for a 4K monitor on Amazon" or "Log in to the staging portal and verify the latest report."
- Real-Time Visual Feedback: Watch the agent work through a live SSE (Server-Sent Events) stream, showing every step and URL transition.
- Intelligent Decision Making: Powered by advanced LLMs that understand DOM structures and interactive elements.
- Multimodal Support: Uses vision (screenshots) to navigate complex UIs where text analysis alone might fail.
- Robust Failure Recovery: Automatically retries failed actions and adjusts strategies when encountering unexpected layouts.
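As a rough illustration of the live stream mentioned above, here is a minimal sketch of how a client could parse the standard Server-Sent Events wire format. The event names (step, done) and the JSON fields (step, url, success) are assumptions made for this example; the actual payloads emitted by the agent are not documented here.

```python
import json
from typing import Iterable, Iterator


def _field_value(line: str, prefix: str) -> str:
    """Return the text after 'prefix', dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = _field_value(line, "event:")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data:"))


# Hypothetical stream content, for illustration only.
sample_stream = [
    ": keep-alive\n",
    "event: step\n",
    'data: {"step": 1, "url": "https://example.com"}\n',
    "\n",
    "event: done\n",
    'data: {"success": true}\n',
    "\n",
]

parsed_events = list(parse_sse(sample_stream))
first_payload = json.loads(parsed_events[0]["data"])
```

In a real client the lines would come from a streaming HTTP response rather than a list, but the parsing logic stays the same.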
🛠️ Technology Stack
- Library: browser-use v0.12+
- Engine: Playwright (via browser-use)
- Runtime: Isolated .venv using the uv package manager for high performance.
- Interface: FastAPI SSE for streaming updates to a modern Alpine.js frontend.
🤖 Supported LLM Providers
The Browser Agent supports a wide range of LLM providers:
- Google Gemini: Optimized for efficiency and speed (e.g., gemini-2.5-flash).
- Ollama: Run privacy-focused automation locally (gemma3:1b or higher recommended).
- OpenAI: Support for o3, gpt-4o, etc.
- Anthropic: High-precision reasoning with claude-3-5-sonnet.
- DeepSeek: Specialized coding and reasoning models.
- Azure/AWS Bedrock: Enterprise-grade cloud integrations.
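One way to picture provider selection is a small lookup that falls back to an example model when none is given. This is a hypothetical sketch: the provider keys, the LLMChoice type, and the resolve_llm helper are invented for illustration and are not part of Q-ACE or browser-use. Only the model names come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMChoice:
    provider: str
    model: str


# Example models taken from the provider list; the keys are assumed names.
EXAMPLE_MODELS = {
    "google": "gemini-2.5-flash",
    "ollama": "gemma3:1b",
    "openai": "gpt-4o",
    "anthropic": "claude-3-5-sonnet",
}


def resolve_llm(provider: str, model: Optional[str] = None) -> LLMChoice:
    """Normalize the provider name and fill in an example model if needed."""
    key = provider.strip().lower()
    if model:
        return LLMChoice(key, model)
    if key not in EXAMPLE_MODELS:
        raise ValueError(f"No example model known for provider {provider!r}")
    return LLMChoice(key, EXAMPLE_MODELS[key])


default_choice = resolve_llm("Ollama")
explicit_choice = resolve_llm("openai", "o3")
```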
⚙️ Configuration & Customization
The agent is highly configurable via the Browser Settings and Agent Settings tabs:
Browser Settings
- Headless Mode: Run silently in the background or watch the window in "headful" mode.
- Recording & Tracing: Enable video recording (ffmpeg required) and Playwright tracing for debugging.
- Device Emulation: Set custom window/viewport sizes and User-Agents.
- Domain Restrictions: Define allowed_domains or prohibited_domains for safety.
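To make the domain restriction idea concrete, here is a minimal sketch of an allow/deny check using only the standard library. It assumes simple suffix matching on the hostname, with prohibited entries taking priority. The matching rules actually applied by browser-use may differ, and is_url_allowed is a hypothetical helper.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def _matches(host: str, domain: str) -> bool:
    """True if host equals the domain or is one of its subdomains."""
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def is_url_allowed(
    url: str,
    allowed_domains: Optional[Iterable[str]] = None,
    prohibited_domains: Optional[Iterable[str]] = None,
) -> bool:
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # no hostname to check, so refuse by default
    # Assumption: a prohibited entry always wins over an allowed one.
    if prohibited_domains and any(_matches(host, d) for d in prohibited_domains):
        return False
    if allowed_domains:
        return any(_matches(host, d) for d in allowed_domains)
    return True


ok_subdomain = is_url_allowed("https://staging.example.com/login", allowed_domains=["example.com"])
lookalike = is_url_allowed("https://evil-example.com/", allowed_domains=["example.com"])
blocked = is_url_allowed("https://ads.example.com/", prohibited_domains=["ads.example.com"])
```

Matching on a leading dot keeps look-alike hosts such as evil-example.com from passing a check for example.com.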
Agent Settings
- Thinking Mode: Allow the agent to pause and "think" before committing to an action.
- Flash Mode: Rapid-fire execution for simple, high-speed tasks.
- Max Steps/Actions: Control the execution depth to prevent infinite loops or excessive token usage.
- Safety: Built-in sensitive data filtering to protect credentials and private info.
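The sensitive data filtering idea can be sketched as replacing known secret values with named placeholders before text is logged or sent to an LLM. This is an illustrative approach under assumptions, not the framework's actual filter. The mask_sensitive helper, the placeholder format, and the sample credentials are all hypothetical.

```python
from typing import Mapping


def mask_sensitive(text: str, secrets: Mapping[str, str]) -> str:
    """Replace each secret value in text with a named placeholder."""
    # Longest values first, so a secret that contains a shorter one
    # is masked as a whole rather than partially.
    for name, value in sorted(secrets.items(), key=lambda kv: len(kv[1]), reverse=True):
        if value:
            text = text.replace(value, f"<secret>{name}</secret>")
    return text


# Hypothetical credentials, for illustration only.
demo_secrets = {"portal_user": "qa@example.com", "portal_password": "hunter2"}
raw_log = "Typed qa@example.com and hunter2 into the login form"
masked_log = mask_sensitive(raw_log, demo_secrets)
```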
📊 History & Analytics
Every execution is logged and stored locally in data/auth.db:
- Step-by-Step Replay: View the history of actions, thoughts, and extracted data.
- AI-Powered Analysis: After a run, use the "Analyze" feature to get a professional Root Cause Analysis (RCA) and performance report generated by an LLM.
- Stats Dashboard: Track success rates and daily trends across your automation suite.
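A dashboard like this could be backed by a simple aggregate query. The sketch below uses an in-memory SQLite database with an assumed runs table (started_at, success). The real schema of data/auth.db is not described here, so the table and column names are placeholders.

```python
import sqlite3

# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs ("
    "id INTEGER PRIMARY KEY, started_at TEXT NOT NULL, success INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO runs (started_at, success) VALUES (?, ?)",
    [
        ("2025-01-01T09:00:00", 1),
        ("2025-01-01T10:30:00", 0),
        ("2025-01-01T14:15:00", 1),
        ("2025-01-02T08:45:00", 1),
        ("2025-01-02T16:20:00", 0),
    ],
)


def daily_success_rates(connection: sqlite3.Connection) -> list:
    """Return (day, total_runs, success_rate) tuples ordered by day."""
    rows = connection.execute(
        "SELECT date(started_at) AS day, COUNT(*) AS total, AVG(success) AS rate "
        "FROM runs GROUP BY day ORDER BY day"
    ).fetchall()
    return [(day, total, round(rate, 2)) for day, total, rate in rows]


daily_stats = daily_success_rates(conn)
```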
🚀 Execution Model
The agent runs in a dedicated subprocess (handlers/browser_agent_runner.py). This ensures that:
- The main FastAPI server remains responsive.
- The agent has its own resource pool.
- Errors in automation don't crash the entire framework.
- Cleanup and process management work reliably on both Windows and Linux.
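The isolation described above can be sketched with asyncio subprocesses: the parent reads progress lines from the child and survives a non-zero exit. This is a minimal illustration, not the contents of handlers/browser_agent_runner.py. The inline child script, the JSON line protocol, and the run_isolated helper are assumptions.

```python
import asyncio
import json
import sys

# Stand-in for the real runner script: emits one JSON line per step,
# then exits with a non-zero code to simulate a crash in the automation.
CHILD_CODE = r"""
import json, sys
for i in range(1, 3):
    print(json.dumps({"step": i}), flush=True)
sys.exit(3)
"""


async def run_isolated(timeout: float = 30.0):
    """Run the child process, collect its step events, return (events, exit_code)."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdout=asyncio.subprocess.PIPE,
    )
    collected = []

    async def pump() -> int:
        # Reading line by line keeps the event loop free for other work.
        async for line in proc.stdout:
            collected.append(json.loads(line))
        return await proc.wait()

    try:
        code = await asyncio.wait_for(pump(), timeout)
    except asyncio.TimeoutError:
        proc.kill()  # make sure a hung child never outlives the timeout
        code = await proc.wait()
    return collected, code


steps, exit_code = asyncio.run(run_isolated())
```

Because the failure surfaces only as an exit code, the parent can report the error to the user without its own process being affected.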
Built with ❤️ by ATID College