Real-time hand, face, and body pose tracking via webcam — powered by Google MediaPipe Vision
The MediaPipe module brings computer-vision tracking to Nova64 carts. Track hands, faces, and body poses using your webcam, then use the landmark data to drive 3D scenes — perfect for AR filters, gesture controls, motion capture, and interactive installations.
Tracking is loaded lazily: nothing is downloaded until a cart calls one of the init*Tracking() functions, so carts that don't use tracking pay nothing. Call getHandLandmarks() etc. in your update() to read the latest results. showCameraBackground() sets the webcam feed as your scene background for an AR passthrough effect. Note that getUserMedia only works over HTTPS or localhost.

Starts the webcam and initializes the MediaPipe Hand Landmarker. Begins detecting hands immediately. This function is async — the model download may take a few seconds on the first call.
'user' (default) or 'environment'
'GPU' (default) or 'CPU'
export async function init() {
await initHandTracking({ numHands: 2 });
showCameraBackground(); // AR passthrough
}
Returns the latest detected hand landmarks. Each hand has 21 landmark points with normalized coordinates (0–1 range, relative to the video frame).
| Index | Landmark | Index | Landmark |
|---|---|---|---|
| 0 | Wrist | 11 | Middle finger DIP |
| 1 | Thumb CMC | 12 | Middle finger tip |
| 2 | Thumb MCP | 13 | Ring finger MCP |
| 3 | Thumb IP | 14 | Ring finger PIP |
| 4 | Thumb tip | 15 | Ring finger DIP |
| 5 | Index finger MCP | 16 | Ring finger tip |
| 6 | Index finger PIP | 17 | Pinky MCP |
| 7 | Index finger DIP | 18 | Pinky PIP |
| 8 | Index finger tip | 19 | Pinky DIP |
| 9 | Middle finger MCP | 20 | Pinky tip |
| 10 | Middle finger PIP | | |
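Simple finger-state checks fall out of the index table above. A minimal sketch (these helpers are illustrative, not part of the module API): treat a finger as extended when its tip lies farther from the wrist than its PIP joint.

```javascript
// Squared distance between two normalized landmarks.
function dist2(a, b) {
  const dx = a.x - b.x, dy = a.y - b.y;
  return dx * dx + dy * dy;
}

// A finger counts as extended when its tip is farther from the wrist
// (landmark 0) than its PIP joint. This is a heuristic that works for
// hands held roughly parallel to the camera, not a trained classifier.
function isFingerExtended(hand, tipIdx, pipIdx) {
  const wrist = hand[0];
  return dist2(hand[tipIdx], wrist) > dist2(hand[pipIdx], wrist);
}

// Tip/PIP index pairs per the table: index (8,6), middle (12,10),
// ring (16,14), pinky (20,18). The thumb is omitted — its geometry
// needs a different test.
function countExtendedFingers(hand) {
  const pairs = [[8, 6], [12, 10], [16, 14], [20, 18]];
  return pairs.filter(([tip, pip]) => isFingerExtended(hand, tip, pip)).length;
}
```

Feed it one entry from getHandLandmarks(), e.g. `countExtendedFingers(hands[0])`.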
export function update(dt) {
const hands = getHandLandmarks();
if (hands.length === 0) return;
const thumb = hands[0][4]; // thumb tip
const index = hands[0][8]; // index tip
const dx = thumb.x - index.x;
const dy = thumb.y - index.y;
const dist = Math.sqrt(dx * dx + dy * dy);
if (dist < 0.04) {
// Pinch detected!
sfx('coin');
}
}
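The bare `dist < 0.04` check above fires on every frame while the fingers stay together, so the sound would retrigger continuously. A small hysteresis latch avoids that (a sketch; the two thresholds are tuning assumptions, not module constants):

```javascript
// Pinch latch with hysteresis: returns true exactly once when the
// thumb–index distance drops below `closeAt`, then re-arms only after
// the distance rises above `openAt`. The gap between the two thresholds
// prevents flicker right at the boundary.
function makePinchDetector(closeAt = 0.04, openAt = 0.06) {
  let pinched = false;
  return function update(dist) {
    if (!pinched && dist < closeAt) {
      pinched = true;
      return true;      // rising edge: pinch just started
    }
    if (pinched && dist > openAt) {
      pinched = false;  // released; armed for the next pinch
    }
    return false;
  };
}
```

In update(), pass it the thumb-to-index distance and call sfx('coin') only when it returns true.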
Returns the recognized gesture for each detected hand. Uses MediaPipe's built-in gesture classifier.
null if no gesture is recognized
| Gesture | Description |
|---|---|
| None | No recognized gesture |
| Closed_Fist | All fingers curled into a fist |
| Open_Palm | All fingers extended, palm facing the camera |
| Pointing_Up | Index finger pointing upward |
| Thumb_Down | Thumbs-down gesture |
| Thumb_Up | Thumbs-up gesture |
| Victory | Peace / victory sign (index + middle fingers) |
| ILoveYou | ASL "I love you" (thumb + index + pinky extended) |
const gestures = getHandGesture();
if (gestures[0]?.name === 'Closed_Fist') {
// Fist = attack!
triggerAttack();
} else if (gestures[0]?.name === 'Open_Palm') {
// Open palm = shield
activateShield();
}
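Classifier output can flicker from frame to frame, which makes attack/shield logic like the above twitchy. One way to stabilize it (a sketch, not part of the module) is to accept a gesture only after it has been reported for several consecutive frames:

```javascript
// Debounce a stream of gesture names: report a gesture only once it has
// been seen for `holdFrames` consecutive frames; report null otherwise.
// Any change of gesture resets the streak.
function makeGestureDebouncer(holdFrames = 5) {
  let current = null;
  let streak = 0;
  return function update(name) {
    if (name === current) {
      streak++;
    } else {
      current = name;
      streak = 1;
    }
    return streak >= holdFrames ? current : null;
  };
}
```

In update(), feed it `gestures[0]?.name ?? null` each frame and branch on the debounced result instead of the raw one.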
Starts the webcam and initializes the MediaPipe Face Landmarker with blend shape output. Detects face mesh (478 landmarks) and expression blend shapes.
'user' (default) or 'environment'
'GPU' (default) or 'CPU'
export async function init() {
await initFaceTracking({ numFaces: 1 });
showCameraBackground(); // webcam behind the 3D scene
}
Returns the 478-point face mesh for each detected face. Landmarks are normalized coordinates (0–1) relative to the video frame.
let hat;
export async function init() {
await initFaceTracking();
showCameraBackground();
hat = createCone(0.3, 0.8, 0xff0000, [0, 0, 0]);
}
export function update(dt) {
const faces = getFaceLandmarks();
if (faces.length === 0) return;
// Place hat above forehead (landmark 10)
const forehead = faces[0][10];
const x = (forehead.x - 0.5) * 8;
const y = -(forehead.y - 0.5) * 6 + 0.6;
setPosition(hat, x, y, -3);
}
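The `(v - 0.5) * scale` arithmetic above converts normalized video coordinates (origin top-left, y pointing down) into centered scene coordinates (y pointing up). Factored out as a helper (the 8x6-unit view size matches these examples; it is not a documented module constant):

```javascript
// Map a normalized landmark (0–1, y growing downward) to scene
// coordinates centered on the origin with y growing upward.
// viewW/viewH set how many world units the video frame spans at depth z.
function landmarkToWorld(lm, viewW = 8, viewH = 6, z = -3) {
  return {
    x: (lm.x - 0.5) * viewW,
    y: -(lm.y - 0.5) * viewH, // flip: video y points down, scene y up
    z,
  };
}
```

The hat placement above becomes `const p = landmarkToWorld(forehead); setPosition(hat, p.x, p.y + 0.6, p.z);`.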
Returns blend shape coefficients for each detected face — 52 expression values (0–1) for driving avatar animation or expression detection.
| Category | Blend Shapes (selection) |
|---|---|
| Eyes | eyeBlinkLeft, eyeBlinkRight, eyeWideLeft, eyeWideRight, eyeSquintLeft, eyeSquintRight |
| Brows | browDownLeft, browDownRight, browInnerUp, browOuterUpLeft, browOuterUpRight |
| Mouth | mouthSmileLeft, mouthSmileRight, mouthFrownLeft, mouthFrownRight, mouthClose, jawOpen |
| Nose / Cheek | noseSneerLeft, noseSneerRight, cheekPuff, cheekSquintLeft |
const shapes = getFaceBlendShapes();
if (shapes.length > 0) {
const bs = shapes[0].categories;
const smile = bs.find(b => b.categoryName === 'mouthSmileLeft');
if (smile?.score > 0.5) {
print('😊 Smiling!', 10, 10, 0x00ff00);
}
}
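Scanning the categories array with find() per shape gets verbose once you read more than one or two scores. A small helper (a sketch over the result shape shown above) turns a face's categories into a name-to-score lookup:

```javascript
// Convert a face's blend-shape categories into a plain lookup object,
// so scores read as map.jawOpen instead of a find() call per name.
function toShapeMap(categories) {
  const map = {};
  for (const c of categories) map[c.categoryName] = c.score;
  return map;
}
```

Usage: `const s = toShapeMap(shapes[0].categories); if ((s.mouthSmileLeft ?? 0) > 0.5) { /* smiling */ }`.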
Starts the webcam and initializes the MediaPipe Pose Landmarker. Detects 33 body keypoints for full-body pose estimation.
'user' (default) or 'environment'
'GPU' (default) or 'CPU'
Returns 33 body keypoints per detected person. Coordinates are normalized (0–1).
| Index | Landmark | Index | Landmark |
|---|---|---|---|
| 0 | Nose | 17 | Left pinky |
| 1–3 | Left eye (inner, center, outer) | 18 | Right pinky |
| 4–6 | Right eye (inner, center, outer) | 19–20 | Left / right index |
| 7–8 | Left / right ear | 21–22 | Left / right thumb |
| 9–10 | Left / right mouth | 23–24 | Left / right hip |
| 11–12 | Left / right shoulder | 25–26 | Left / right knee |
| 13–14 | Left / right elbow | 27–28 | Left / right ankle |
| 15–16 | Left / right wrist | 29–32 | Left / right heel + foot index |
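Beyond raw positions, many pose effects need joint angles — how bent an elbow or knee is. The angle at a middle joint follows from the dot product of the two limb vectors. A sketch using the indices above (left arm: shoulder 11, elbow 13, wrist 15):

```javascript
// Angle in degrees at joint b, formed by segments b→a and b→c.
// For a left elbow: angleAt(p[11], p[13], p[15]) reads ~180° when the
// arm is straight and shrinks as it bends. Uses x/y only; fine for
// screen-space effects, though it ignores depth.
function angleAt(a, b, c) {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const m1 = Math.hypot(v1.x, v1.y);
  const m2 = Math.hypot(v2.x, v2.y);
  const cos = Math.min(1, Math.max(-1, dot / (m1 * m2))); // clamp fp error
  return (Math.acos(cos) * 180) / Math.PI;
}
```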
let head, leftHand, rightHand;
export async function init() {
await initPoseTracking();
showCameraBackground();
head = createSphere(0.3, 0xff8800, [0, 0, -3]);
leftHand = createSphere(0.15, 0x00ff88, [0, 0, -3]);
rightHand = createSphere(0.15, 0x00ff88, [0, 0, -3]);
}
export function update(dt) {
const poses = getPoseLandmarks();
if (poses.length === 0) return;
const p = poses[0];
// Map nose to head position
setPosition(head,
(p[0].x - 0.5) * 8,
-(p[0].y - 0.5) * 6,
-3
);
// Map wrists to hand spheres
setPosition(leftHand, (p[15].x - 0.5) * 8, -(p[15].y - 0.5) * 6, -3);
setPosition(rightHand, (p[16].x - 0.5) * 8, -(p[16].y - 0.5) * 6, -3);
}
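Raw landmarks jitter a little from frame to frame, so meshes pinned directly to them tend to vibrate. Exponential smoothing is a cheap fix (a sketch; the 0.5 factor is a tuning assumption):

```javascript
// Exponential moving average for a 2D landmark stream.
// alpha in (0, 1]: higher = snappier tracking, lower = smoother but laggier.
function makeSmoother(alpha = 0.5) {
  let prev = null;
  return function smooth(lm) {
    prev = prev
      ? { x: prev.x + alpha * (lm.x - prev.x),
          y: prev.y + alpha * (lm.y - prev.y) }
      : { x: lm.x, y: lm.y }; // first sample passes through unchanged
    return prev;
  };
}
```

Keep one smoother per tracked point (e.g. one for the nose, one per wrist) and run each landmark through it before mapping to scene coordinates.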
Starts the webcam. Called automatically by init*Tracking(), but can be called independently to preview the camera feed without tracking.
'user' (front camera, default) or 'environment' (rear camera)
Stops the webcam, releases the media stream, and disposes the video texture. Also calls hideCameraBackground().
Returns a THREE.VideoTexture of the webcam feed. Useful for mapping the camera onto 3D objects (e.g., a TV screen, a mirror, a portal).
await startCamera();
const tex = getCameraTexture();
// Apply to a plane as a "TV screen"
tv.material.map = tex;
Sets the webcam feed as the scene's background, creating an AR passthrough effect. 3D objects appear overlaid on the real world.
Removes the webcam background and restores the previous scene background (skybox, color, etc.).
Complete cleanup — closes all landmark detectors, stops the detection loop, stops the camera, and clears all cached results. Call this on game exit or cart reload.
A full example combining hand tracking with AR camera background, pinch gestures, and 3D particle effects.
let cursor, particles = [];
export async function init() {
await initHandTracking({ numHands: 2 });
showCameraBackground();
cursor = createSphere(0.15, 0xff00ff, [0, 0, -3],
{ material: 'holographic' });
setAmbientLight(0xffffff, 0.8);
}
export function update(dt) {
const hands = getHandLandmarks();
if (hands.length === 0) return;
// Move cursor to index finger tip
const tip = hands[0][8];
const x = (tip.x - 0.5) * 8;
const y = -(tip.y - 0.5) * 6;
setPosition(cursor, x, y, -3);
// Pinch = spawn particle
const thumb = hands[0][4];
const dist = Math.hypot(thumb.x - tip.x, thumb.y - tip.y);
if (dist < 0.04) {
particles.push({
mesh: createSphere(0.08, Math.floor(Math.random() * 0xffffff), [x, y, -3]),
life: 2,
});
}
// Fade and clean up particles
for (let i = particles.length - 1; i >= 0; i--) {
particles[i].life -= dt;
if (particles[i].life <= 0) {
removeMesh(particles[i].mesh);
particles.splice(i, 1);
}
}
}
export function draw() {
print('AR Hand Tracking', 10, 10, 0x00ffcc);
const g = getHandGesture();
if (g[0]) print(`Gesture: ${g[0].name}`, 10, 30, 0xffffff);
}