ChromePilot

Control any webpage using natural language.

A Chrome extension that lets you automate browser actions — click, type, scroll, navigate — just by describing what you want in plain English or Chinese.

English · 中文

Features

Natural Language Control — Type commands like "click the login button" or "fill in my email" and ChromePilot does it for you
Multi-step Automation — Chain complex tasks: "Go to Habitica and complete all my daily tasks"
URL Navigation — Say "open YouTube" or "go to google.com" to navigate anywhere
Smart Result Extraction — Ask "translate 'hello' on Google Translate" and get the answer directly in the chat
Persistent Side Panel — The panel stays open across tab switches (powered by Chrome's native Side Panel API)
Multi-provider LLM Support — Works with OpenAI, Anthropic Claude, GitHub Copilot, Ollama (local), or any OpenAI-compatible API
Configurable Execution — Adjust action delay, max steps, and open-in-new-tab behavior from the panel header
Dialog Awareness — Automatically detects and prioritizes popups, modals, and dialogs
Teach Mode — Record your actions to demonstrate workflows, then replay them with AI assistance
Action Preview & Confirm — Review planned actions with visual highlights before execution; provide feedback to re-analyze

Demo

Basic Actions — Click, Type, Scroll

Command: "drink water 10 times"

In-page Navigation — Multi-step Tasks

Command: "go to tasks and drink water 10 times"

Cross-page Navigation — Open URLs & Extract Results

Command: "go to Google Translate and translate 'what is surprise' to Chinese"

Cross-site Automation — Navigate & Interact

Command: "go to my github homepage and star the repository ChromePilot"

Debug Overlay — Inspect Detected Elements

Use the 👁 button to visualize all detected interactive elements with their index numbers.

Action Preview & Confirm — Review Before Execution

Actions are highlighted with numbered labels. Confirm to execute, or type feedback and re-analyze.

Auto-run Mode — Skip Confirmation

Toggle "Auto-run" to execute actions immediately without preview.

Installation

Clone the repository:

git clone https://github.com/GOODDAYDAY/ChromePilot.git

Open Chrome and navigate to chrome://extensions
Enable Developer mode (toggle in the top right)
Click Load unpacked and select the src folder
Click the ChromePilot icon in the toolbar to open the side panel

Configuration

Right-click the ChromePilot icon → Options (or go to chrome://extensions → ChromePilot → Details → Extension options)
Select a Provider Preset:

Provider	Base URL	Notes
OpenAI	`https://api.openai.com`	Requires API key
Anthropic Claude	`https://api.anthropic.com`	Requires API key
GitHub Copilot	`https://models.inference.ai.azure.com`	Requires GitHub token
Ollama (Local)	`http://localhost:11434`	Free, runs locally
Custom	Any OpenAI-compatible endpoint

Enter your API Key and Model name
Click Test Connection to verify, then Save
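
As a rough illustration of what a provider preset amounts to, the sketch below builds an OpenAI-style chat request from a base URL, an API key, and a model name, without sending it. The `PRESETS` map, the `buildChatRequest` helper, and the `/v1/chat/completions` path are assumptions made for illustration, not ChromePilot's actual code. Anthropic's native API uses a different path and headers, so it is left out here.

```javascript
// Hypothetical preset table mirroring two rows of the Base URL column above.
const PRESETS = {
  openai: { baseUrl: "https://api.openai.com", needsKey: true },
  ollama: { baseUrl: "http://localhost:11434", needsKey: false },
};

// Build (but do not send) an OpenAI-compatible chat completion request.
// The request path is an assumption; check your provider's documentation.
function buildChatRequest(presetName, { apiKey = "", model, messages }) {
  if (!Object.hasOwn(PRESETS, presetName)) {
    throw new Error(`Unknown preset: ${presetName}`);
  }
  const preset = PRESETS[presetName];
  if (preset.needsKey && !apiKey) {
    throw new Error(`${presetName} requires an API key`);
  }

  const headers = { "Content-Type": "application/json" };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;

  return {
    url: `${preset.baseUrl}/v1/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`, which is roughly what a connection test would need to do.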

Panel Settings

The side panel header provides quick settings:

Setting	Options	Default	Description
Same tab	On/Off	Off	Navigate in current tab instead of opening new tabs
Auto-run	On/Off	Off	Skip action preview, execute immediately
Max Steps	5 / 10 / 20 / 50 / Unlimited	10	Maximum LLM rounds per command
Action Delay	0s – 5s	0.5s	Delay between each action execution
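
The table above can be read as a small settings object with defaults and allowed ranges. The following is a minimal sketch of that idea; the key names (`sameTab`, `autoRun`, `maxSteps`, `actionDelayMs`) and the `normalizeSettings` helper are hypothetical and are not taken from the extension's source.

```javascript
// Defaults taken from the table above; the key names are hypothetical.
const DEFAULT_SETTINGS = { sameTab: false, autoRun: false, maxSteps: 10, actionDelayMs: 500 };

// Infinity stands in for the "Unlimited" option.
const MAX_STEP_OPTIONS = [5, 10, 20, 50, Infinity];

// Merge stored values over the defaults and coerce them into the documented ranges.
function normalizeSettings(stored = {}) {
  const merged = { ...DEFAULT_SETTINGS, ...stored };

  // Fall back to the default when the step limit is not one of the listed options.
  if (!MAX_STEP_OPTIONS.includes(merged.maxSteps)) {
    merged.maxSteps = DEFAULT_SETTINGS.maxSteps;
  }

  // Clamp the delay to the 0s-5s range (expressed here in milliseconds).
  const delay = Number(merged.actionDelayMs);
  merged.actionDelayMs = Number.isFinite(delay)
    ? Math.min(5000, Math.max(0, delay))
    : DEFAULT_SETTINGS.actionDelayMs;

  merged.sameTab = Boolean(merged.sameTab);
  merged.autoRun = Boolean(merged.autoRun);
  return merged;
}
```

Calling `normalizeSettings({ actionDelayMs: 9000 })` would keep every default except the delay, which is clamped to 5000 ms.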

Supported Actions

Action	Description	Example Command
click	Click any interactive element	"click the submit button"
type	Type text into input fields	"type 'hello world' in the search box"
scroll	Scroll the page	"scroll down"
navigate	Open a URL	"open YouTube", "go to baidu.com"
read	Extract text from the page	"what does the error message say?"
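
To make the five action types more concrete, here is a sketch of how an action object returned by the LLM might be checked before execution. The object shape (`action`, `index`, `text`, `direction`, `url`) and the `validateAction` helper are assumptions for illustration; the README does not specify the real schema.

```javascript
// Hypothetical required fields per action type; the real schema may differ.
const REQUIRED_FIELDS = {
  click: ["index"],          // index of the detected element to click
  type: ["index", "text"],   // target element plus the text to enter
  scroll: ["direction"],     // e.g. "up" or "down"
  navigate: ["url"],         // destination URL
  read: [],                  // no required fields in this sketch
};

// Return { ok: true } for a well-formed action, or { ok: false, error } otherwise.
function validateAction(step) {
  const name = step?.action;
  if (!Object.hasOwn(REQUIRED_FIELDS, String(name))) {
    return { ok: false, error: `Unsupported action: ${name}` };
  }
  const missing = REQUIRED_FIELDS[name].filter((field) => step[field] === undefined);
  if (missing.length > 0) {
    return { ok: false, error: `Action "${name}" is missing: ${missing.join(", ")}` };
  }
  return { ok: true };
}
```

Rejecting malformed actions before they reach the page is one way to keep an unexpected LLM response from triggering a half-specified click or keystroke.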

Architecture

src/
├── manifest.json              # Chrome MV3 manifest
├── background/
│   ├── service-worker.js      # Orchestrator: DOM → LLM → Actions loop
│   └── llm-client.js          # Multi-provider LLM client
├── content/
│   ├── content-script.js      # Message handler on web pages
│   ├── dom-extractor.js       # Extracts interactive elements
│   ├── action-executor.js     # Simulates click/type/scroll/read
│   ├── action-previewer.js    # Preview overlay (red borders + step labels)
│   └── action-recorder.js     # Teach mode action recording
├── sidepanel/
│   ├── sidepanel.html         # Chat UI (Chrome Side Panel API)
│   ├── sidepanel.js           # Panel logic & settings
│   └── sidepanel.css          # Styles
├── options/                   # LLM provider configuration page
├── lib/utils.js               # Shared helpers
└── icons/                     # Extension icons
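
The split between `background/` and `content/` implies message passing between the service worker and the page. The sketch below shows one common way to route such messages with a listener shaped like a `chrome.runtime.onMessage` callback. The `createMessageRouter` helper and the message type names in the comment are hypothetical; the actual wiring in `content-script.js` may look different.

```javascript
// Build a listener with the (message, sender, sendResponse) shape that
// chrome.runtime.onMessage expects. Handlers are looked up by message.type.
function createMessageRouter(handlers) {
  return function onMessage(message, sender, sendResponse) {
    const type = String(message?.type);
    if (!Object.hasOwn(handlers, type)) {
      return false; // not handled here; other listeners may respond
    }
    Promise.resolve()
      .then(() => handlers[type](message.payload, sender))
      .then((result) => sendResponse({ ok: true, result }))
      .catch((error) => sendResponse({ ok: false, error: String(error?.message ?? error) }));
    return true; // keep the message channel open for the asynchronous response
  };
}

// Possible registration inside a content script (type names are made up):
//   chrome.runtime.onMessage.addListener(createMessageRouter({
//     EXTRACT_DOM: () => collectInteractiveElements(),
//     EXECUTE_ACTION: (action) => performAction(action),
//   }));
```

Returning `true` from the listener is what tells Chrome that `sendResponse` will be called later, which matters when a handler is asynchronous.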

How It Works

1. User types a command in the side panel
2. Service worker extracts interactive elements from the active tab
3. Elements + command are sent to the configured LLM
4. LLM returns a list of actions (click, type, scroll, navigate, read)
5. Actions are previewed with red highlights and step labels (unless Auto-run is on)
6. User confirms or provides feedback to re-analyze
7. Confirmed actions are executed sequentially on the page
8. If the task isn't done (done: false), repeat from step 2
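
The steps above can be sketched as a loop with its dependencies injected. This is a simplified illustration under stated assumptions, not the code in `service-worker.js`: the `runCommand` function, the four dependency names, and the `{ actions, done }` response shape are hypothetical, and a rejected plan is counted as one LLM round.

```javascript
// Hypothetical orchestration loop: extract elements, ask the LLM, preview, execute, repeat.
async function runCommand(command, deps, { maxSteps = 10, autoRun = false } = {}) {
  const { extractElements, askLLM, confirmPlan, executeAction } = deps;
  let feedback = null;

  for (let round = 1; round <= maxSteps; round++) {
    const elements = await extractElements();                    // extract interactive elements
    const plan = await askLLM({ command, elements, feedback });  // LLM returns { actions, done }

    if (!autoRun) {
      const verdict = await confirmPlan(plan.actions);           // preview and wait for the user
      if (!verdict.confirmed) {
        feedback = verdict.feedback ?? null;                     // re-analyze with the feedback
        continue;
      }
    }

    feedback = null;
    for (const action of plan.actions) {
      await executeAction(action);                               // run actions one at a time
    }
    if (plan.done) return { done: true, rounds: round };         // task finished
  }
  return { done: false, rounds: maxSteps };                      // round limit reached
}
```

Injecting the dependencies keeps the loop testable without a browser: each of the four functions can be replaced with a stub that returns canned data.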

Requirements

Chrome 114+ (for Side Panel API support)
An LLM API endpoint (cloud or local)

License

MIT
