
MarioAndF/airpoint-sdk


Airpoint SDK

Touchless hand control for the web. Add it to any web app and let users move a cursor, click, and trigger custom intents with their hand, using just a webcam and no extra hardware.



Requirements

  • Modern browser with getUserMedia (Chrome, Edge, Safari 16+, Firefox).
  • Page served over HTTPS (or localhost), which the camera API requires.
  • A bundler (Vite, Next.js, webpack, …) or any static host that can serve a public assets directory.
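You may want to check these requirements before offering a touchless toggle. The sketch below is a minimal example and is not part of the SDK. `checkCameraRequirements` and `MaybeBrowserGlobals` are hypothetical names. It only checks three things: that a browser window exists, that the page is a secure context (HTTPS or localhost), and that `navigator.mediaDevices.getUserMedia` is available. It does not verify browser versions.

```typescript
// Structural types so this compiles with or without the DOM lib.
interface MaybeBrowserGlobals {
  window?: unknown;
  isSecureContext?: boolean;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Hypothetical helper: returns a list of unmet camera requirements.
// An empty array means the basic requirements look satisfied.
export function checkCameraRequirements(
  g: MaybeBrowserGlobals = globalThis as unknown as MaybeBrowserGlobals,
): string[] {
  if (g.window === undefined) {
    // Server rendering or a non-browser runtime: nothing else applies.
    return ["Not running in a browser."];
  }
  const problems: string[] = [];
  if (g.isSecureContext !== true) {
    problems.push("Not a secure context: serve the page over HTTPS or from localhost.");
  }
  if (typeof g.navigator?.mediaDevices?.getUserMedia !== "function") {
    problems.push("navigator.mediaDevices.getUserMedia is not available.");
  }
  return problems;
}
```

Passing the globals in as a parameter keeps the check testable outside a browser. In the browser, call it with no arguments and hide the toggle if the returned array is non-empty.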

Setup

AI-Assisted Setup (Claude, Codex, Copilot, etc.)

Single prompt install

Help me add touchless controls to my app with this repo:
https://github.com/marioandf/airpoint-sdk

Done.

You can also point your AI assistant at SKILL.md to have it install and configure Airpoint interactively.

Manual Setup

1. Install the package.

npm install airpoint-sdk
# or: pnpm add airpoint-sdk · yarn add airpoint-sdk · bun add airpoint-sdk

2. Copy the runtime assets into your public directory.

The MediaPipe model and WASM files can't be bundled; they need to be served as static files.

npx airpoint-sdk-copy-assets --out public --base airpoint

This writes everything under public/airpoint/. If your framework uses a different static folder (e.g. SvelteKit's static/), pass --out static.

3. (Optional) Add your AirMouse license key to .env.

The SDK works without a key using the built-in heuristic engine. If you have a license, drop the key in your environment file:

# .env / .env.local
VITE_AIRPOINT_API_KEY=your-license-key-here

Use whatever env-var prefix your framework requires: VITE_* for Vite, NEXT_PUBLIC_* for Next.js, PUBLIC_* for SvelteKit/Astro, etc. The key is read in the browser, so exposing it to client-side code this way is expected.

That's the full setup. Now you can wire it up.

Quick start

import {
  createAirpointPlugin,
  createAirpointCursorOverlay,
  createAirpointDomAdapter,
} from "airpoint-sdk";

const video = document.querySelector("video")!;
const cursor = createAirpointCursorOverlay({ style: "arrow" });

const apiKey = import.meta.env.VITE_AIRPOINT_API_KEY; // or process.env.NEXT_PUBLIC_AIRPOINT_API_KEY, etc.

const plugin = createAirpointPlugin({
  apiKey, // optional; enables AirMouse if present
  video,
  adapter: createAirpointDomAdapter(),
  manifest: {
    runtime: { assets: { basePath: "/airpoint" } },
    tracking: {
      config: {
        enableMLClassifier: Boolean(apiKey),
        gestureModel: "airmouse-4.3-onnx",
      },
    },
    intents: {
      thumb_middle_pinch: { tap: "primary-select" },
    },
  },
});

plugin.on("move", (e) => {
  cursor.move(e.x, e.y, {
    space: "normalized",
    clicking: e.clicking,
    grabbing: e.grabbing,
    rightClicking: e.rightClicking,
    hand: e.hand,
  });
});

plugin.on("hand_lost", () => cursor.hide());

// Recommended: warm assets and the gesture engine as soon as your app loads.
// This keeps the user's first "enable tracking" click fast.
void plugin.prepare().catch((error) => {
  console.warn("Airpoint prepare failed:", error);
});

async function enableAirpoint() {
  await plugin.startCamera(video);
  await plugin.start();
}

function disableAirpoint() {
  plugin.pause(); // keep loaded models warm for the next enable
  plugin.stopCamera();
  cursor.hide();
}

// Wire enableAirpoint/disableAirpoint to your app's touchless toggle.

That's it. Show your hand to the camera, the cursor follows your fingertip, and a thumb-to-middle pinch clicks whatever's under it.

If start() can't find an asset, it throws with the exact missing path and the copy-assets command to run, so there are no silent failures.

Enabling the AirMouse ML model

Three things need to be true:

  1. You have a license key in your env (see Setup step 3).
  2. You pass the key as apiKey when creating the plugin.
  3. Your manifest has tracking.config.enableMLClassifier: true and gestureModel: "airmouse-4.3-onnx".

The Quick start above already does all three. Without the key, the plugin falls back to the heuristic engine β€” same API, lower accuracy, and no need to set enableMLClassifier.
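The three conditions can be kept in one place with a small helper. This is a sketch, not an SDK export. `airmouseOptions` and `TrackingFragment` are hypothetical names, and the returned shape only mirrors the `apiKey` option and the `tracking.config` fields used in the Quick start.

```typescript
// Mirrors only the fields used in the Quick start; not an SDK type.
interface TrackingFragment {
  apiKey?: string;
  tracking: {
    config: {
      enableMLClassifier: boolean;
      gestureModel?: "airmouse-4.3-onnx";
    };
  };
}

// Hypothetical helper: maps an optional license key to plugin options.
// With a key, the AirMouse classifier is enabled; without one, it stays off
// and the plugin uses the heuristic engine.
export function airmouseOptions(rawKey: string | undefined): TrackingFragment {
  const apiKey = rawKey?.trim();
  if (!apiKey) {
    // Missing, empty, or whitespace-only key: heuristic fallback.
    return { tracking: { config: { enableMLClassifier: false } } };
  }
  return {
    apiKey,
    tracking: { config: { enableMLClassifier: true, gestureModel: "airmouse-4.3-onnx" } },
  };
}
```

To use it, call `const { apiKey, tracking } = airmouseOptions(import.meta.env.VITE_AIRPOINT_API_KEY)`, then pass `apiKey` to `createAirpointPlugin` and use `tracking` in your manifest. An empty .env entry is treated as missing, so the plugin falls back to the heuristic engine.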

How it works

Webcam ─▶ MediaPipe hand tracker ─▶ Gesture engine ─▶ Plugin events ─▶ Adapter (DOM, your code)
                                          │
                              (optional AirMouse classifier, with key)

  • Tracker: MediaPipe runs on-device and produces 21 hand landmarks per frame.
  • Gesture engine: Built-in heuristics or AirMouse turn landmarks into pinches, grabs, scrolls, and a moving cursor.
  • Manifest: You declare which gestures map to which intents (tap, dispatch_event, focus, …) and which targets they hit.
  • Adapter: The bridge between intents and your app. The bundled DOM adapter turns taps into real DOM clicks.

Lifecycle:

  • plugin.prepare(): preload assets and warm the engine before the user starts. Recommended on app startup.
  • plugin.startCamera(video) + plugin.start(): use when the user enables touchless tracking.
  • plugin.pause() + plugin.stopCamera(): use for in-app toggles; processing and camera stop, but loaded models stay warm.
  • plugin.stop(): full teardown. Reserve it for unmount, logout, or permanent disable, because it unloads warmed state.
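If users can click your toggle repeatedly, it helps to serialize enable and disable so that `startCamera`/`start` never interleave with `pause`/`stopCamera`. Below is a minimal sketch that assumes only the method names listed above. `LifecyclePlugin` and `createTouchlessToggle` are hypothetical. Return types are typed loosely as `unknown` because the exact signatures aren't documented here.

```typescript
// Structural view of the lifecycle methods described above. Return types are
// deliberately loose (`unknown`); awaiting a non-promise value is harmless.
interface LifecyclePlugin<V> {
  startCamera(video: V): unknown;
  start(): unknown;
  pause(): unknown;
  stopCamera(): unknown;
}

// Hypothetical wrapper that queues enable/disable calls so they run one at a time.
export function createTouchlessToggle<V>(plugin: LifecyclePlugin<V>, video: V) {
  let enabled = false;
  let queue: Promise<void> = Promise.resolve();

  // Chain each task after the previous one, even if the previous one failed.
  const run = (task: () => Promise<void>): Promise<void> => {
    queue = queue.then(task, task);
    return queue;
  };

  return {
    get enabled(): boolean {
      return enabled;
    },
    enable: () =>
      run(async () => {
        if (enabled) return;
        await plugin.startCamera(video);
        try {
          await plugin.start();
        } catch (error) {
          plugin.stopCamera(); // don't leave the webcam on if start() fails
          throw error;
        }
        enabled = true;
      }),
    disable: () =>
      run(async () => {
        if (!enabled) return;
        plugin.pause(); // stop processing; loaded models stay warm
        plugin.stopCamera(); // release the webcam
        enabled = false;
      }),
  };
}
```

Keep calling `plugin.prepare()` on startup as in the Quick start, and bind `toggle.enable` and `toggle.disable` to your UI. Reserve `plugin.stop()` for unmount or permanent disable.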

Recipes

DOM adapter β€” non-click actions

createAirpointDomAdapter({
  actions: {
    "open-menu": "dispatch_event",
    "focus-search": "focus",
  },
});

Force intents to act on their declared manifest target instead of whatever's under the cursor:

createAirpointDomAdapter({ pointerTarget: "intent" });

Cursor click animation

createAirpointCursorOverlay() ships with a built-in pulse. Forward click state from move events and it animates automatically:

const cursor = createAirpointCursorOverlay({
  style: "arrow",
  clickAnimation: "pulse",
});

plugin.on("intent", () => cursor.pulse()); // for app-defined intents

Use clickAnimation: "none" to handle feedback yourself.

Custom gestures from raw landmarks

Disable the classifier and listen for raw landmarks to build your own pose/dwell/swipe logic:

const plugin = createAirpointPlugin({
  video,
  manifest: {
    runtime: {
      emitRawLandmarks: true,
      assets: { basePath: "/airpoint" },
    },
    tracking: { config: { enableMLClassifier: false } },
  },
});

plugin.on("raw_landmarks", (event) => {
  // your pinch / dwell / swipe logic
});
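As one example of that logic, here is a sketch of a thumb-to-middle pinch detector with hysteresis. It makes several assumptions:

  • Landmarks arrive as 21 `{ x, y, z }` points in MediaPipe's hand-landmark order (0 wrist, 4 thumb tip, 9 middle-finger MCP, 12 middle-finger tip).
  • Coordinates are normalized image coordinates.
  • The exact shape of the `raw_landmarks` event payload isn't documented here.
  • `createPinchDetector` is hypothetical, and the 0.25/0.35 thresholds are starting points to tune.

```typescript
export interface Landmark {
  x: number;
  y: number;
  z?: number;
}

// MediaPipe hand landmark indices (21 points per hand).
const WRIST = 0;
const THUMB_TIP = 4;
const MIDDLE_MCP = 9;
const MIDDLE_TIP = 12;

const dist2d = (a: Landmark, b: Landmark): number => Math.hypot(a.x - b.x, a.y - b.y);

// Hypothetical detector: reports "down"/"up" edges of a thumb-to-middle pinch.
// The fingertip gap is divided by palm size (wrist to middle MCP) so one
// threshold works across hand sizes and camera distances. Separate enter/exit
// ratios (hysteresis) stop jitter near the boundary from firing repeatedly.
export function createPinchDetector(enterRatio = 0.25, exitRatio = 0.35) {
  let pinched = false;
  return (landmarks: readonly Landmark[]): "down" | "up" | null => {
    if (landmarks.length < 21) return null; // incomplete hand
    const palm = dist2d(landmarks[WRIST], landmarks[MIDDLE_MCP]);
    if (palm === 0) return null; // degenerate frame
    const ratio = dist2d(landmarks[THUMB_TIP], landmarks[MIDDLE_TIP]) / palm;
    if (!pinched && ratio < enterRatio) {
      pinched = true;
      return "down";
    }
    if (pinched && ratio > exitRatio) {
      pinched = false;
      return "up";
    }
    return null; // no state change this frame
  };
}
```

Call it from your `raw_landmarks` handler with whichever field holds the landmark array. Because the distance is measured in 2D normalized image coordinates, a non-square camera frame slightly skews the ratio, which is one more reason to tune the thresholds on real hands.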

The built-in intents map is driven by SDK pose events. Plugging a custom recognizer directly into that pipeline isn't a stable public API yet; for now, custom heuristics live in your app code on top of raw_landmarks and move.
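A dwell click built on `move` works the same way. This sketch makes the following assumptions:

  • `e.x` and `e.y` are in the normalized space that the Quick start passes to the cursor (`space: "normalized"`).
  • You supply a timestamp, such as `performance.now()`.
  • `createDwellDetector` is hypothetical, and the 0.02 radius and 800 ms delay are illustrative values.

```typescript
// Hypothetical dwell detector for pointer positions (e.g. from "move" events).
export function createDwellDetector(radius = 0.02, dwellMs = 800) {
  let anchor: { x: number; y: number; t: number } | null = null;
  let fired = false;

  return {
    // Returns true exactly once per dwell: when the pointer has stayed within
    // `radius` of the anchor point for at least `dwellMs`.
    update(x: number, y: number, now: number): boolean {
      if (anchor === null || Math.hypot(x - anchor.x, y - anchor.y) > radius) {
        anchor = { x, y, t: now }; // first sample or moved away: restart here
        fired = false;
        return false;
      }
      if (!fired && now - anchor.t >= dwellMs) {
        fired = true; // stays fired until the pointer leaves the radius
        return true;
      }
      return false;
    },
    // Forget the current anchor, e.g. when the hand is lost.
    reset(): void {
      anchor = null;
      fired = false;
    },
  };
}
```

Wire it as `plugin.on("move", (e) => { if (dwell.update(e.x, e.y, performance.now())) { /* your action */ } })`, and call `dwell.reset()` from `hand_lost` so a returning hand doesn't inherit a stale timer.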

SSR / Next.js

Camera, DOM, and MediaPipe all need browser APIs. Create the plugin only in client-side code (a useEffect, a dynamic import, etc.).
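One way to keep the SDK out of server code is to load it lazily behind a browser check. This is a minimal sketch. `loadInBrowser` is a hypothetical helper, and the `"window" in globalThis` test is the only environment check it makes.

```typescript
// Hypothetical helper: runs `loader` only when a browser `window` exists, so
// server rendering never evaluates camera, DOM, or MediaPipe code.
export async function loadInBrowser<T>(loader: () => Promise<T>): Promise<T | null> {
  if (!("window" in globalThis)) {
    return null; // server render or non-browser runtime: skip loading
  }
  return loader();
}
```

For example, `loadInBrowser(() => import("airpoint-sdk"))` resolves to `null` during SSR and to the module in the browser. Inside a React `useEffect` the guard is redundant, because effects only run on the client. It matters for module-level code that is shared between server and client. For teardown, call `plugin.stop()` in the effect cleanup, as described in the lifecycle list.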

API reference

Stable v0 surface (won't break in patch/minor releases):

  • createAirpointPlugin(options): Main entry. Wires a video, manifest, and adapter into a running plugin.
  • createAirpointCursorOverlay(options): Prebuilt cursor with click animations.
  • createAirpointDomAdapter(options): Framework-agnostic DOM adapter.
  • validateAirpointManifest(manifest): Throws on invalid manifests. Useful in tests.
  • normalizeAirpointManifest(manifest): Fills in defaults; returns the resolved manifest.
  • resolveAirpointSdkAssetPaths(assets): Resolves the full set of runtime asset URLs.
  • getAirpointSdkRequiredAssets(assets, profile): Lists assets required for a given profile.
  • validateAirpointSdkAssets(assets, profile): Verifies assets are reachable.

Types: AirpointPlugin, AirpointPluginManifest, AirpointHostAdapter, AirpointIntent.

About AirMouse

If you've ever tried to write your own gesture detection on top of hand landmarks, you know how it goes: a pinch threshold that works for your hand but not your coworker's, a "click" that fires when someone scratches their nose, distance heuristics that fall apart the moment the hand tilts. It's a lot of trial and error, and the result is usually still flaky.

AirMouse is the model we built so you don't have to do that.

It's a temporal convolutional network (TCN) trained on a hand-collected, hand-labeled dataset of pinches, clicks, grabs, scrolls, and idle motion across many hands, lighting conditions, and camera angles.

Metrics for airmouse-4.3-onnx:

  • Test accuracy: 97.73%
  • Inference (ONNX, in-browser): ~1–2 ms per frame
  • Gesture classes: idle, click, right_click, grab, scroll
  • Runtime: ONNX Runtime Web (WASM, on-device)

Runs locally: no frames leave the user's machine.

Licenses are how the model and the rest of Airpoint stay maintained. Grab one at airpoint.app, or reach out if you're a student, researcher, or OSS maintainer.

Contributing

PRs and issues welcome. The repo is a small pnpm workspace.

pnpm install
pnpm typecheck
pnpm test
pnpm build
pnpm dev:example   # runs examples/basic

The example app lives in examples/basic. Copy .env.example to .env.local and set VITE_AIRPOINT_API_KEY to try AirMouse; without a key it uses the heuristic engine.

Questions? hello@airpoint.app.

License

Apache-2.0. MediaPipe and ONNX Runtime browser assets are covered by their upstream licenses; see NOTICES.md.

The AirMouse model is not part of the OSS package and is delivered separately under its own terms.
