Touchless hand control for the web. Add it to any web app and let users move a cursor, click, and trigger custom intents with their hand, using just a webcam and no extra hardware.
- Requirements
- Setup
- Quick start
- Enabling the AirMouse ML model
- How it works
- Recipes
- API reference
- About AirMouse
- Contributing
- License
- Modern browser with `getUserMedia` (Chrome, Edge, Safari 16+, Firefox).
- Page served over HTTPS (or `localhost`), which the camera API requires.
- A bundler (Vite, Next.js, webpack, …) or any static host that can serve a public assets directory.
Single prompt install
Help me add touchless controls to my app with this repo:
https://github.com/marioandf/airpoint-sdk
Done.
You can also use SKILL.md to install and configure Airpoint for you interactively.
1. Install the package.
npm install airpoint-sdk
# or: pnpm add airpoint-sdk · yarn add airpoint-sdk · bun add airpoint-sdk
2. Copy the runtime assets into your public directory.
The MediaPipe model and WASM files can't be bundled; they need to be served as static files.
npx airpoint-sdk-copy-assets --out public --base airpoint
This writes everything under `public/airpoint/`. If your framework uses a different static folder (e.g. SvelteKit's `static/`), pass `--out static`.
3. (Optional) Add your AirMouse license key to .env.
The SDK works without a key using the built-in heuristic engine. If you have a license, drop the key in your environment file:
# .env / .env.local
VITE_AIRPOINT_API_KEY=your-license-key-here
Use whatever env-var prefix your framework requires: `VITE_*` for Vite, `NEXT_PUBLIC_*` for Next.js, `PUBLIC_*` for SvelteKit/Astro, etc. Keys are loaded in the browser, so anything you expose to the client is fine.
That's the full setup. Now you can wire it up.
import {
  createAirpointPlugin,
  createAirpointCursorOverlay,
  createAirpointDomAdapter,
} from "airpoint-sdk";

const video = document.querySelector("video")!;
const cursor = createAirpointCursorOverlay({ style: "arrow" });
const apiKey = import.meta.env.VITE_AIRPOINT_API_KEY; // or process.env.NEXT_PUBLIC_AIRPOINT_API_KEY, etc.

const plugin = createAirpointPlugin({
  apiKey, // optional: enables AirMouse if present
  video,
  adapter: createAirpointDomAdapter(),
  manifest: {
    runtime: { assets: { basePath: "/airpoint" } },
    tracking: {
      config: {
        enableMLClassifier: Boolean(apiKey),
        gestureModel: "airmouse-4.3-onnx",
      },
    },
    intents: {
      thumb_middle_pinch: { tap: "primary-select" },
    },
  },
});

plugin.on("move", (e) => {
  cursor.move(e.x, e.y, {
    space: "normalized",
    clicking: e.clicking,
    grabbing: e.grabbing,
    rightClicking: e.rightClicking,
    hand: e.hand,
  });
});
plugin.on("hand_lost", () => cursor.hide());

// Recommended: warm assets and the gesture engine as soon as your app loads.
// This keeps the user's first "enable tracking" click fast.
void plugin.prepare().catch((error) => {
  console.warn("Airpoint prepare failed:", error);
});

async function enableAirpoint() {
  await plugin.startCamera(video);
  await plugin.start();
}

function disableAirpoint() {
  plugin.pause(); // keep loaded models warm for the next enable
  plugin.stopCamera();
  cursor.hide();
}
// Wire enableAirpoint/disableAirpoint to your app's touchless toggle.

That's it. Show your hand to the camera, the cursor follows your fingertip, and a thumb-to-middle pinch clicks whatever's under it.
If `start()` can't find an asset, it throws with the exact missing path and the copy-assets command to run, so there are no silent failures.
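If you would rather catch a missing asset before the user ever clicks "enable" (for example in a dev-mode check), a reachability probe can be sketched as below. This is an illustration, not SDK code: `findMissingAssets` and `HeadRequest` are hypothetical names, and the file names you pass in depend on what the copy-assets command actually wrote to your public directory.

```typescript
// Minimal response shape, so the sketch works with fetch or with a test double.
type HeadRequest = (url: string) => Promise<{ ok: boolean }>;

// Hypothetical helper (not an airpoint-sdk export): returns the URLs that
// did not answer with a successful status.
async function findMissingAssets(
  basePath: string,
  fileNames: string[],
  head: HeadRequest,
): Promise<string[]> {
  const base = basePath.replace(/\/+$/, ""); // drop trailing slashes
  const urls = fileNames.map((name) => `${base}/${name.replace(/^\/+/, "")}`);
  const reachable = await Promise.all(
    urls.map(async (url) => {
      try {
        return (await head(url)).ok;
      } catch {
        return false; // network errors count as missing
      }
    }),
  );
  return urls.filter((_, i) => !reachable[i]);
}
```

In a browser you could pass `(url) => fetch(url, { method: "HEAD" })` as the `head` argument. The API reference also lists `validateAirpointSdkAssets(assets, profile)` for verifying that assets are reachable, which is likely the better choice once you know its return shape.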
Three things need to be true:
- You have a license key in your env (see Setup step 3).
- You pass the key as `apiKey` when creating the plugin.
- Your manifest has `tracking.config.enableMLClassifier: true` and `gestureModel: "airmouse-4.3-onnx"`.
The Quick start above already does all three. Without the key, the plugin falls back to the heuristic engine: same API, lower accuracy, and no need to set `enableMLClassifier`.
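The three conditions can be folded into one small helper so the manifest never asks for the classifier without a key. A minimal sketch: `buildTrackingConfig` and `TrackingConfig` are hypothetical names rather than SDK exports, while the field names and model id are the ones used in the Quick start.

```typescript
// Local type for the sketch; the SDK's own manifest types may differ.
interface TrackingConfig {
  enableMLClassifier: boolean;
  gestureModel?: string;
}

// Hypothetical helper: derives `tracking.config` from an optional license key.
function buildTrackingConfig(apiKey?: string): TrackingConfig {
  const key = apiKey?.trim();
  if (!key) {
    // No key: leave the classifier off and let the heuristic engine run.
    return { enableMLClassifier: false };
  }
  return { enableMLClassifier: true, gestureModel: "airmouse-4.3-onnx" };
}
```

You would then write `tracking: { config: buildTrackingConfig(apiKey) }` in the manifest and still pass `apiKey` to `createAirpointPlugin`.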
Webcam ──▶ MediaPipe hand tracker ──▶ Gesture engine ──▶ Plugin events ──▶ Adapter (DOM, your code)
                                             │
                         (optional AirMouse classifier, with key)
- Tracker: MediaPipe runs on-device and produces 21 hand landmarks per frame.
- Gesture engine: Built-in heuristics or AirMouse turn landmarks into pinches, grabs, scrolls, and a moving cursor.
- Manifest: You declare which gestures map to which intents (`tap`, `dispatch_event`, `focus`, …) and which targets they hit.
- Adapter: The bridge between intents and your app. The bundled DOM adapter turns taps into real DOM clicks.
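To make the "landmarks into pinches" step concrete, here is a rough sketch of one way a thumb-to-middle pinch could be detected from 21 landmarks. It is not the SDK's built-in heuristic. The landmark indices follow MediaPipe's hand model (0 wrist, 4 thumb tip, 9 middle-finger MCP, 12 middle-finger tip); the `Landmark` shape, the function names, and the 0.25 / 0.4 thresholds are assumptions chosen for illustration.

```typescript
interface Landmark { x: number; y: number; z: number; }

// MediaPipe hand landmark indices.
const WRIST = 0;
const THUMB_TIP = 4;
const MIDDLE_MCP = 9;
const MIDDLE_TIP = 12;

function distance(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Thumb-to-middle-fingertip distance divided by palm length, so the value
// is roughly independent of hand size and distance from the camera.
function pinchRatio(hand: Landmark[]): number {
  if (hand.length !== 21) throw new Error(`expected 21 landmarks, got ${hand.length}`);
  const palm = distance(hand[WRIST], hand[MIDDLE_MCP]);
  if (palm === 0) return Infinity;
  return distance(hand[THUMB_TIP], hand[MIDDLE_TIP]) / palm;
}

// Hysteresis: engage below `on`, release only above `off`, which avoids
// flickering when the ratio hovers around a single threshold.
function createPinchDetector(on = 0.25, off = 0.4) {
  let pinching = false;
  return (hand: Landmark[]): boolean => {
    const ratio = pinchRatio(hand);
    if (!pinching && ratio < on) pinching = true;
    else if (pinching && ratio > off) pinching = false;
    return pinching;
  };
}
```

Fixed thresholds like these are exactly what tends to break across different hands and camera angles, which is the problem the optional AirMouse classifier is meant to address.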
Lifecycle:
- `plugin.prepare()`: preload assets and warm the engine before the user starts. Recommended on app startup.
- `plugin.startCamera(video)` + `plugin.start()`: use when the user enables touchless tracking.
- `plugin.pause()` + `plugin.stopCamera()`: use for in-app toggles; processing and camera stop, but loaded models stay warm.
- `plugin.stop()`: full teardown. Reserve it for unmount, logout, or permanent disable because it unloads warmed state.
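The lifecycle above maps naturally onto a small toggle controller that guards against double clicks. The sketch below is an assumption-laden illustration: `createTouchlessToggle` and `LifecyclePlugin` are hypothetical names, the interface only mirrors the method names used in this README, and it assumes `stopCamera()` is safe to call even if the camera never started.

```typescript
// Subset of the plugin surface used in the Quick start, typed locally so the
// sketch does not depend on the SDK's exported types.
interface LifecyclePlugin<V> {
  prepare(): Promise<unknown>;
  startCamera(video: V): Promise<unknown>;
  start(): Promise<unknown>;
  pause(): unknown;
  stopCamera(): unknown;
  stop(): unknown;
}

type ToggleState = "idle" | "starting" | "running";

function createTouchlessToggle<V>(plugin: LifecyclePlugin<V>, video: V) {
  let state: ToggleState = "idle";
  return {
    getState: (): ToggleState => state,
    // Call once at app startup to preload assets.
    warm: (): Promise<unknown> => plugin.prepare(),
    async enable(): Promise<void> {
      if (state !== "idle") return; // ignore double clicks while starting
      state = "starting";
      try {
        await plugin.startCamera(video);
        await plugin.start();
        state = "running";
      } catch (error) {
        plugin.stopCamera(); // assumption: safe even if the camera never started
        state = "idle";
        throw error;
      }
    },
    disable(): void {
      if (state !== "running") return;
      plugin.pause(); // keep loaded models warm for the next enable
      plugin.stopCamera();
      state = "idle";
    },
    destroy(): void {
      plugin.stop(); // full teardown: unmount, logout, permanent disable
      state = "idle";
    },
  };
}
```

Wire `enable` and `disable` to your UI toggle and call `destroy` from your framework's unmount hook.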
createAirpointDomAdapter({
  actions: {
    "open-menu": "dispatch_event",
    "focus-search": "focus",
  },
});

Force intents to act on their declared manifest target instead of whatever's under the cursor:

createAirpointDomAdapter({ pointerTarget: "intent" });

`createAirpointCursorOverlay()` ships with a built-in pulse. Forward click state from `move` events and it animates automatically:

const cursor = createAirpointCursorOverlay({
  style: "arrow",
  clickAnimation: "pulse",
});
plugin.on("intent", () => cursor.pulse()); // for app-defined intents

Use `clickAnimation: "none"` to handle feedback yourself.
Disable the classifier and listen for raw landmarks to build your own pose/dwell/swipe logic:
const plugin = createAirpointPlugin({
  video,
  manifest: {
    runtime: {
      emitRawLandmarks: true,
      assets: { basePath: "/airpoint" },
    },
    tracking: { config: { enableMLClassifier: false } },
  },
});
plugin.on("raw_landmarks", (event) => {
  // your pinch / dwell / swipe logic
});

The built-in `intents` map is driven by SDK pose events. Plugging a custom recognizer directly into that pipeline isn't a stable public API yet; for now, custom heuristics live in your app code on top of `raw_landmarks` and `move`.
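As one example of app-side logic, a dwell-to-click detector can be built from cursor positions alone. The sketch below assumes you feed it the normalized `x` / `y` from `move` events plus a timestamp you supply yourself (for instance `performance.now()`). `createDwellDetector`, `CursorSample`, and the default radius and duration are hypothetical, not part of the SDK.

```typescript
interface CursorSample { x: number; y: number; timeMs: number; }

// Returns a function that reports `true` once each time the cursor has stayed
// within `radius` (normalized units) of where it settled for `dwellMs`.
function createDwellDetector(radius = 0.03, dwellMs = 800) {
  let anchor: CursorSample | null = null;
  let fired = false;
  return (sample: CursorSample): boolean => {
    const movedAway =
      anchor === null ||
      Math.hypot(sample.x - anchor.x, sample.y - anchor.y) > radius;
    if (anchor === null || movedAway) {
      anchor = sample; // cursor moved: restart the timer at the new spot
      fired = false;
      return false;
    }
    if (!fired && sample.timeMs - anchor.timeMs >= dwellMs) {
      fired = true; // fire only once per dwell
      return true;
    }
    return false;
  };
}
```

A typical wiring would be to call the detector inside your `move` handler and trigger your own click feedback (such as `cursor.pulse()`) when it returns `true`.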
Camera, DOM, and MediaPipe all need browser APIs. Create the plugin only in client-side code (a useEffect, a dynamic import, etc.).
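A framework-agnostic way to enforce this is a small guard that only runs the factory when browser globals exist. This is a sketch with hypothetical helper names (`isBrowser`, `createOnClient`); in React you would typically call it from a `useEffect`, which already runs only on the client.

```typescript
// True only where browser globals exist (false during SSR or in plain Node).
function isBrowser(): boolean {
  const g = globalThis as { window?: unknown; document?: unknown };
  return typeof g.window !== "undefined" && typeof g.document !== "undefined";
}

// Runs `factory` on the client and returns null on the server, so the caller
// never touches camera or DOM APIs during server rendering.
function createOnClient<T>(factory: () => T): T | null {
  return isBrowser() ? factory() : null;
}
```

For example, `const plugin = createOnClient(() => createAirpointPlugin({ /* options */ }))` yields `null` during server rendering. Pairing it with a dynamic `import("airpoint-sdk")` inside the factory also keeps the SDK out of the server bundle.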
Stable v0 surface (won't break in patch/minor releases):
| Export | Purpose |
|---|---|
| `createAirpointPlugin(options)` | Main entry. Wires a video, manifest, and adapter into a running plugin. |
| `createAirpointCursorOverlay(options)` | Prebuilt cursor with click animations. |
| `createAirpointDomAdapter(options)` | Framework-agnostic DOM adapter. |
| `validateAirpointManifest(manifest)` | Throws on invalid manifests. Useful in tests. |
| `normalizeAirpointManifest(manifest)` | Fills in defaults; returns the resolved manifest. |
| `resolveAirpointSdkAssetPaths(assets)` | Resolves the full set of runtime asset URLs. |
| `getAirpointSdkRequiredAssets(assets, profile)` | Lists assets required for a given profile. |
| `validateAirpointSdkAssets(assets, profile)` | Verifies assets are reachable. |
Types: AirpointPlugin, AirpointPluginManifest, AirpointHostAdapter, AirpointIntent.
If you've ever tried to write your own gesture detection on top of hand landmarks, you know how it goes: a pinch threshold that works for your hand but not your coworker's, a "click" that fires when someone scratches their nose, distance heuristics that fall apart the moment the hand tilts. It's a lot of trial and error, and the result is usually still flaky.
AirMouse is the model we built so you don't have to do that.
It's a temporal convolutional network (TCN) trained on a hand-collected, hand-labeled dataset of pinches, clicks, grabs, scrolls, and idle motion across many hands, lighting conditions, and camera angles.
| Metric | airmouse-4.3-onnx |
|---|---|
| Test accuracy | 97.73% |
| Inference (ONNX, in-browser) | ~1–2 ms / frame |
| Gesture classes | idle, click, right_click, grab, scroll |
| Runtime | ONNX Runtime Web (WASM, on-device) |
Runs locally: no frames leave the user's machine.
Licenses are how the model and the rest of Airpoint stay maintained. Grab one at airpoint.app, or reach out if you're a student, researcher, or OSS maintainer.
PRs and issues welcome. The repo is a small pnpm workspace.
pnpm install
pnpm typecheck
pnpm test
pnpm build
pnpm dev:example # runs examples/basic
The example app lives in `examples/basic`. Copy `.env.example` to `.env.local` and set `VITE_AIRPOINT_API_KEY` to try AirMouse; without a key it uses the heuristic engine.
Questions? hello@airpoint.app.
Apache-2.0. MediaPipe and ONNX Runtime browser assets are covered by their upstream licenses; see NOTICES.md.
The AirMouse model is not part of the OSS package and is delivered separately under its own terms.