ESP32-CAM RC Car — Hand-Gesture Controlled (Team Project)

5/15/2025 · 1 min read

A teammate and I built one gesture-recognition app that could control two different devices: my wireless RC car and his bionic hand. I focused on the car: live video from an ESP32-CAM, servo steering + motor control, and a Python app that translated MediaPipe hand gestures into HTTP commands. It was fast, surprisingly reliable, and a nice proof that one software pipeline can drive multiple hardware targets.

Team & Goal

  • Team: 2 people

  • Goal: Use one gesture-recognition app to control two separate hardware systems (an RC car and a 10-servo bionic hand), showing the approach scales across devices.

My Role

  • Build the car

  • ESP32-CAM firmware (video streaming + motor/servo control endpoints)

  • Python control app & UI, plus networking and test tooling (a quick smoke-test sketch follows this list)

  • Worked with my teammate to keep the gesture → action schema consistent for both devices
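
Part of the networking and test tooling was simply a way to poke the car's HTTP endpoint without the camera or gestures in the loop. Below is a minimal sketch of that kind of smoke test; the car's IP address and the direction codes are assumptions based on the /control?dir=... scheme described later, not values copied from the project.

```python
# Hypothetical smoke test for the car's HTTP control endpoint.
# CAR_HOST and the dir codes are assumptions; adjust to the real firmware.
import sys
import requests

CAR_HOST = "http://192.168.4.1"  # assumed ESP32-CAM address on the car's network


def send(direction: str, timeout: float = 1.0) -> bool:
    """Send one drive/steer command and report whether the car acknowledged it."""
    try:
        r = requests.get(f"{CAR_HOST}/control", params={"dir": direction}, timeout=timeout)
        return r.ok
    except requests.RequestException as exc:
        print(f"command '{direction}' failed: {exc}")
        return False


if __name__ == "__main__":
    # e.g. `python test_control.py w x` -> forward, then stop
    for d in sys.argv[1:] or ["x"]:
        print(d, "->", "ok" if send(d) else "error")
```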

How it works

  • Webcam → Python app (MediaPipe Hands) counts visible fingers

  • Gesture map: e.g., Fist = stop, One finger = forward, Two = reverse, Three/Four = left/right, Open palm = center/neutral

  • App sends HTTP commands to the ESP32-CAM (/control?dir= with one of w/a/s/d/x/c); see the sketch after this list

  • Tkinter UI shows two live panels: the ESP32 video stream and the gesture overlay

  • Small “freeze window” after each command to avoid rapid, repeated inputs
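
A minimal sketch of that loop, assuming the finger-count gesture map and the /control?dir=... endpoint above. The car's address, the cooldown length, the helper names, and the exact mapping of finger counts to the w/a/s/d/x/c codes are illustrative guesses, not the code from Car_software.py.

```python
# Illustrative gesture -> command loop (not the project's exact code).
import time
import cv2
import mediapipe as mp
import requests

CAR_URL = "http://192.168.4.1/control"  # assumed car address
COOLDOWN_S = 0.6                        # "freeze window" after each command

# Fist = stop, 1 = forward, 2 = reverse, 3/4 = left/right, open palm = center
GESTURE_TO_DIR = {0: "x", 1: "w", 2: "s", 3: "a", 4: "d", 5: "c"}

hands = mp.solutions.hands.Hands(max_num_hands=1)


def count_fingers(lm) -> int:
    """Rough finger count: a fingertip counts as 'up' if it sits above its PIP joint."""
    tips, pips = [8, 12, 16, 20], [6, 10, 14, 18]
    up = sum(lm[t].y < lm[p].y for t, p in zip(tips, pips))
    up += lm[4].x < lm[3].x  # thumb: compare tip vs. IP joint (mirrored image assumed)
    return up


cap = cv2.VideoCapture(0)               # operator's webcam
last_sent = 0.0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        continue
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks and time.time() - last_sent > COOLDOWN_S:
        direction = GESTURE_TO_DIR.get(count_fingers(result.multi_hand_landmarks[0].landmark))
        if direction:
            try:
                requests.get(CAR_URL, params={"dir": direction}, timeout=0.5)
            except requests.RequestException:
                pass                     # drop the command; the next gesture will retry
            last_sent = time.time()
    cv2.imshow("gesture", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```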

Car_software.py used MediaPipe, OpenCV, Tkinter, and requests; it rotated/flipped frames for correct orientation and debounced commands with a short cooldown for stability.
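
The orientation fix itself is only a couple of OpenCV calls. A rough sketch, assuming a 90° rotation plus a mirror flip; the actual transform in Car_software.py may differ:

```python
import cv2


def correct_orientation(frame):
    """Undo the camera's mounting rotation and mirror the image for the operator."""
    frame = cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)  # assumed mounting angle
    return cv2.flip(frame, 1)                            # horizontal mirror
```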

Hardware & Stack

  • ESP32-CAM (camera + Wi-Fi), motor driver, micro-servo for steering

  • Chassis: simple, robust build (3D-printed mounts, TT motors)

  • Firmware: tiny HTTP handler for drive/steer commands + MJPEG stream

  • Control app: Python (OpenCV, MediaPipe, Tkinter UI)

Challenges → Solutions

  • Noisy gesture detection: Added debounce + cooldown and mapped only distinct finger-count changes to commands.

  • Network hiccups: On failed frames, the UI kept the last good image, so the operator wasn’t “flying blind” (sketched after this list).

  • Human factors: Tuned steering angles and motor PWM so it felt smooth, not twitchy.
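
The “last good image” behavior is just a small loop around the frame read. A sketch, assuming the video is pulled as an MJPEG stream through OpenCV; the stream URL and port are assumptions, not the project's actual values:

```python
# Keep showing the last successfully decoded frame when a read fails.
import cv2

STREAM_URL = "http://192.168.4.1:81/stream"  # assumed ESP32-CAM MJPEG endpoint

cap = cv2.VideoCapture(STREAM_URL)
last_good = None
while True:
    ok, frame = cap.read()
    if ok:
        last_good = frame
    if last_good is not None:
        cv2.imshow("ESP32-CAM", last_good)   # on a dropped frame, re-show the previous one
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```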

Teamwork Wins

  • We agreed on a shared gesture schema and kept a simple API contract so the same app could drive both devices.

  • Paired testing: while I tuned car control, my teammate matched servo motions on the hand. It was great practice in versioning, quick feedback, and clear commit messages.

Outcome

  • Reliable demo: smooth live video + responsive gesture control

  • Reusability: the same app controlled two hardware systems with zero code forks

  • What I’d improve: tracked-gesture “trails” for speed control, obstacle sensors, and a safer “arm/disarm” state machine