Control any webpage using natural language.
A Chrome extension that lets you automate browser actions — click, type, scroll, navigate — just by describing what you want in plain English or Chinese.
- Natural Language Control — Type commands like "click the login button" or "fill in my email" and ChromePilot does it for you
- Multi-step Automation — Chain complex tasks: "Go to Habitica and complete all my daily tasks"
- URL Navigation — Say "open YouTube" or "go to google.com" to navigate anywhere
- Smart Result Extraction — Ask "translate 'hello' on Google Translate" and get the answer directly in the chat
- Persistent Side Panel — The panel stays open across tab switches (powered by Chrome's native Side Panel API)
- Multi-provider LLM Support — Works with OpenAI, Anthropic Claude, GitHub Copilot, Ollama (local), or any OpenAI-compatible API
- Configurable Execution — Adjust action delay, max steps, and open-in-new-tab behavior from the panel header
- Dialog Awareness — Automatically detects and prioritizes popups, modals, and dialogs
- Teach Mode — Record your actions to demonstrate workflows, then replay them with AI assistance
- Action Preview & Confirm — Review planned actions with visual highlights before execution; provide feedback to re-analyze
Command: "drink water 10 times"
Command: "go to tasks and drink water 10 times"
Command: "go to Google Translate and translate 'what is surprise' to Chinese"
Command: "go to my github homepage and star the repository ChromePilot"
Use the 👁 button to visualize all detected interactive elements with their index numbers.
Actions are highlighted with numbered labels. Confirm to execute, or type feedback and re-analyze.
Toggle "Auto-run" to execute actions immediately without preview.
- Clone the repository: `git clone https://github.com/GOODDAYDAY/ChromePilot.git`
- Open Chrome and navigate to `chrome://extensions`
- Enable Developer mode (toggle in the top right)
- Click Load unpacked and select the `src` folder
- Click the ChromePilot icon in the toolbar to open the side panel
- Right-click the ChromePilot icon → Options (or go to `chrome://extensions` → ChromePilot → Details → Extension options)
- Select a Provider Preset:
| Provider | Base URL | Notes |
|---|---|---|
| OpenAI | https://api.openai.com | Requires API key |
| Anthropic Claude | https://api.anthropic.com | Requires API key |
| GitHub Copilot | https://models.inference.ai.azure.com | Requires GitHub token |
| Ollama (Local) | http://localhost:11434 | Free, runs locally |
| Custom | Any OpenAI-compatible endpoint | |
- Enter your API Key and Model name
- Click Test Connection to verify, then Save
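For readers wiring up a Custom or Ollama preset, the sketch below shows roughly what a request to an OpenAI-compatible endpoint looks like. It is not ChromePilot's actual client code: `buildChatRequest` and the `config` field names are hypothetical, and the `/v1/chat/completions` path with Bearer authentication is assumed from the OpenAI API convention. Presets such as Anthropic Claude use a different native request shape, so a real multi-provider client would likely branch per provider.

```javascript
// Hypothetical helper: builds fetch() arguments for an OpenAI-compatible
// chat endpoint. The path and auth header follow the OpenAI convention and
// are assumptions, not ChromePilot's implementation.
function buildChatRequest(config, messages) {
  const base = config.baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const headers = { "Content-Type": "application/json" };
  if (config.apiKey) {
    // Skipped when no key is set, e.g. for a local Ollama server.
    headers["Authorization"] = `Bearer ${config.apiKey}`;
  }
  return {
    url: `${base}/v1/chat/completions`,
    options: {
      method: "POST",
      headers,
      body: JSON.stringify({ model: config.model, messages }),
    },
  };
}

// Example: a local Ollama preset (the model name is a placeholder).
const ollamaRequest = buildChatRequest(
  { baseUrl: "http://localhost:11434/", apiKey: "", model: "llama3" },
  [{ role: "user", content: "click the login button" }]
);
console.log(ollamaRequest.url); // http://localhost:11434/v1/chat/completions
```

The returned pair can be passed straight to `fetch(url, options)`. Keeping the builder free of network calls makes it easy to check without a running server.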
The side panel header provides quick settings:
| Setting | Options | Default | Description |
|---|---|---|---|
| Same tab | On/Off | Off | Navigate in current tab instead of opening new tabs |
| Auto-run | On/Off | Off | Skip action preview, execute immediately |
| Max Steps | 5 / 10 / 20 / 50 / Unlimited | 10 | Maximum LLM rounds per command |
| Action Delay | 0s – 5s | 0.5s | Delay between each action execution |
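
As an illustration of how these quick settings might be represented and sanitized, here is a minimal sketch. The defaults and allowed values come from the table above. The key names (`sameTab`, `autoRun`, `maxSteps`, `actionDelayMs`) and the `normalizeSettings` helper are hypothetical, not taken from ChromePilot's source.

```javascript
// Defaults copied from the settings table; the key names are hypothetical.
const DEFAULT_SETTINGS = {
  sameTab: false,
  autoRun: false,
  maxSteps: 10,       // Infinity stands in for "Unlimited"
  actionDelayMs: 500, // 0.5s
};

const ALLOWED_MAX_STEPS = [5, 10, 20, 50, Infinity];

// Merge user-supplied values over the defaults and coerce anything invalid.
function normalizeSettings(partial = {}) {
  const merged = { ...DEFAULT_SETTINGS, ...partial };
  if (!ALLOWED_MAX_STEPS.includes(merged.maxSteps)) {
    merged.maxSteps = DEFAULT_SETTINGS.maxSteps;
  }
  const delay = Number(merged.actionDelayMs);
  merged.actionDelayMs = Number.isFinite(delay)
    ? Math.min(5000, Math.max(0, delay)) // clamp to the 0s to 5s range
    : DEFAULT_SETTINGS.actionDelayMs;
  merged.sameTab = Boolean(merged.sameTab);
  merged.autoRun = Boolean(merged.autoRun);
  return merged;
}

console.log(normalizeSettings({ maxSteps: 7, actionDelayMs: 9000 }));
// { sameTab: false, autoRun: false, maxSteps: 10, actionDelayMs: 5000 }
```

Note that `Infinity` does not survive JSON serialization (it becomes `null`), so a persisted form of "Unlimited" would need a sentinel value instead.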
| Action | Description | Example Command |
|---|---|---|
| click | Click any interactive element | "click the submit button" |
| type | Type text into input fields | "type 'hello world' in the search box" |
| scroll | Scroll the page | "scroll down" |
| navigate | Open a URL | "open YouTube", "go to baidu.com" |
| read | Extract text from the page | "what does the error message say?" |
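
Because the LLM's output drives real page interactions, it can help to validate each action before executing it. The sketch below checks an action object against the five action types in the table. The field names (`action`, `index`, `text`, `direction`, `url`) are illustrative assumptions, not ChromePilot's actual wire format.

```javascript
// Required fields per action type. The field names are assumptions made
// for this example only.
const ACTION_FIELDS = {
  click: ["index"],
  type: ["index", "text"],
  scroll: ["direction"],
  navigate: ["url"],
  read: [],
};

// Returns null when the action is well-formed, otherwise an error string.
function validateAction(action) {
  if (action === null || typeof action !== "object") {
    return "action must be an object";
  }
  const known = Object.prototype.hasOwnProperty.call(ACTION_FIELDS, action.action);
  if (!known) {
    return `unknown action: ${String(action.action)}`;
  }
  const missing = ACTION_FIELDS[action.action].filter(
    (field) => action[field] === undefined
  );
  return missing.length > 0 ? `missing: ${missing.join(", ")}` : null;
}

console.log(validateAction({ action: "click", index: 3 })); // null
console.log(validateAction({ action: "type", index: 1 }));  // missing: text
console.log(validateAction({ action: "hover" }));           // unknown action: hover
```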
src/
├── manifest.json # Chrome MV3 manifest
├── background/
│ ├── service-worker.js # Orchestrator: DOM → LLM → Actions loop
│ └── llm-client.js # Multi-provider LLM client
├── content/
│ ├── content-script.js # Message handler on web pages
│ ├── dom-extractor.js # Extracts interactive elements
│ ├── action-executor.js # Simulates click/type/scroll/read
│ ├── action-previewer.js # Preview overlay (red borders + step labels)
│ └── action-recorder.js # Teach mode action recording
├── sidepanel/
│ ├── sidepanel.html # Chat UI (Chrome Side Panel API)
│ ├── sidepanel.js # Panel logic & settings
│ └── sidepanel.css # Styles
├── options/ # LLM provider configuration page
├── lib/utils.js # Shared helpers
└── icons/ # Extension icons
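
To make the role of the DOM extractor more concrete, here is a minimal sketch of how extracted elements could be turned into an indexed list for the LLM prompt, so that actions can refer to elements by number. The input shape (`tag`, `text`, `placeholder`) and the `formatElements` helper are hypothetical. The real `dom-extractor.js` may collect different fields and format them differently.

```javascript
// Input shape is hypothetical: one plain object per interactive element.
function formatElements(elements) {
  return elements
    .map((el, i) => {
      // Prefer visible text, fall back to the placeholder, cap the length.
      const label = (el.text || el.placeholder || "").trim().slice(0, 40);
      return `[${i}] <${el.tag}> ${label}`.trimEnd();
    })
    .join("\n");
}

console.log(
  formatElements([
    { tag: "input", placeholder: "Email" },
    { tag: "button", text: "  Log in " },
  ])
);
// [0] <input> Email
// [1] <button> Log in
```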
1. User types a command in the side panel
2. Service worker extracts interactive elements from the active tab
3. Elements + command are sent to the configured LLM
4. LLM returns a list of actions (click, type, scroll, navigate, read)
5. Actions are previewed with red highlights and step labels (unless Auto-run is on)
6. User confirms or provides feedback to re-analyze
7. Confirmed actions are executed sequentially on the page
8. If the task isn't done (`done: false`), repeat from step 2
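
The loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `service-worker.js`: `runCommand` is a hypothetical name, and `extract`, `plan`, `confirm`, and `execute` are injected stand-ins for the content script, the LLM client, and the side panel. It also assumes that a feedback round counts toward Max Steps, since each one costs an LLM call.

```javascript
// Hypothetical orchestrator loop; every dependency is an injected stand-in.
async function runCommand(command, deps, settings) {
  const history = [];
  for (let step = 1; step <= settings.maxSteps; step++) {
    const elements = await deps.extract();
    const plan = await deps.plan({ command, elements, history });

    if (!settings.autoRun) {
      // confirm() resolves to null on approval, or to a feedback string.
      const feedback = await deps.confirm(plan.actions);
      if (feedback !== null) {
        history.push({ feedback }); // re-analyze with the user's feedback
        continue;
      }
    }

    for (const action of plan.actions) {
      await deps.execute(action);
      history.push({ action });
      // Pause between actions, as the Action Delay setting describes.
      await new Promise((resolve) => setTimeout(resolve, settings.actionDelayMs || 0));
    }

    if (plan.done) return { done: true, steps: step };
  }
  return { done: false, steps: settings.maxSteps };
}

// Usage with stubs: the fake LLM reports done on its second round.
const executed = [];
const stubs = {
  extract: async () => [{ tag: "button", text: "Drink water" }],
  plan: async ({ history }) => ({
    actions: [{ action: "click", index: 0 }],
    done: history.length >= 1,
  }),
  confirm: async () => null,
  execute: async (action) => executed.push(action),
};
const demo = runCommand("drink water 2 times", stubs, {
  maxSteps: 10,
  autoRun: true,
  actionDelayMs: 0,
});
demo.then((result) => console.log(result, executed.length));
// { done: true, steps: 2 } 2
```

Injecting the dependencies keeps the loop testable outside the browser. In an extension, the same slots would presumably be filled by message passing between the service worker, content script, and side panel.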
- Chrome 114+ (for Side Panel API support)
- An LLM API endpoint (cloud or local)
MIT






