Building Aura: Engineering a Real-Time AI Pitch Mentor with Google Gemini

These articles are AI-generated summaries. Please check the original sources for full details.

Building Aura: What We Learned Building a Real-Time AI Mentor

Aura is a real-time AI pitch mentor built for the Google Gemini Live Agent Challenge. It uses MediaPipe for frame-by-frame body language tracking and Gemini for high-level content analysis.

Why This Matters

Behavioral AI systems often fail when they apply hard-coded thresholds to human movement, because physical baselines vary significantly between users. Aura addresses this with gesture-driven calibration, so that metrics like ‘Neck Ratio’ and ‘Shoulder Expansion’ are measured against the user’s own anatomy rather than a generic, often inaccurate, model.
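To make the calibration idea concrete, here is a minimal TypeScript sketch. The source does not define how ‘Neck Ratio’ is computed, so the formula below (vertical ear-to-shoulder distance divided by shoulder width) is an assumption, as are the type names, function names, and the 0.85 tolerance. It only illustrates comparing live frames against a per-user baseline instead of a universal constant.

```typescript
// All names and the metric definition here are hypothetical, for illustration only.
interface Point { x: number; y: number; }

interface PoseSample {
  leftEar: Point;
  rightEar: Point;
  leftShoulder: Point;
  rightShoulder: Point;
}

const midpoint = (a: Point, b: Point): Point => ({
  x: (a.x + b.x) / 2,
  y: (a.y + b.y) / 2,
});

// Assumed definition: vertical ear-to-shoulder distance divided by shoulder width.
// Dividing by shoulder width keeps the value independent of distance to the camera.
function neckRatio(s: PoseSample): number {
  const ears = midpoint(s.leftEar, s.rightEar);
  const shoulders = midpoint(s.leftShoulder, s.rightShoulder);
  const shoulderWidth = Math.hypot(
    s.leftShoulder.x - s.rightShoulder.x,
    s.leftShoulder.y - s.rightShoulder.y,
  );
  if (shoulderWidth === 0) return NaN;
  return Math.abs(shoulders.y - ears.y) / shoulderWidth;
}

// Average the ratio over the frames captured while the calibration gesture is held.
function calibrate(samples: PoseSample[]): number {
  const ratios = samples.map(neckRatio).filter((r) => Number.isFinite(r));
  if (ratios.length === 0) throw new Error("no usable calibration frames");
  return ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
}

// Flag a slouch when the live ratio falls below a fraction of the user's own baseline.
function isSlouching(sample: PoseSample, baseline: number, tolerance = 0.85): boolean {
  const ratio = neckRatio(sample);
  return Number.isFinite(ratio) && ratio < baseline * tolerance;
}
```

With this shape, the same live frame can be flagged for one user and accepted for another, because each is judged against their own calibrated baseline.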

Key Insights

  • Gesture-Driven Calibration: Aura uses a ‘thumbs up’ gesture to capture a personalized baseline, preventing errors in posture detection for users of different heights.
  • Stable Metric View: The team refactored custom React hooks to freeze the last known data point while a session is paused, which avoids wasted battery and data loss.
  • AudioContext Synchronization: The team fixed RMS amplitude readings stuck at 0.0000 by synchronizing permission handling for the video and audio streams.
  • MediaPipe Integration: The system tracks granular metrics using Face, Pose, and Gesture Recognizers to quantify ‘shrimp’ (kyphotic) posture in real time.
  • Persona-Based Logic: The team used Gemini’s reasoning capabilities to build a ‘Shark’ coaching persona that processes data packets and delivers brutal content analysis.
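The ‘Stable Metric View’ behavior can be sketched without any framework. The class below is a hypothetical illustration, not Aura's actual hook: the names `StableMetricView` and `MetricSnapshot` are assumptions. It shows the core idea of skipping work while paused and continuing to expose the last known data point; in a React app, logic like this would typically sit inside a custom hook.

```typescript
// Hypothetical names; a framework-free sketch of "freeze the last known data point".
interface MetricSnapshot {
  neckRatio: number;
  timestampMs: number;
}

class StableMetricView {
  private last: MetricSnapshot | null = null;
  private paused = false;

  pause(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
  }

  // Callers check this before running per-frame tracking, so no work is done while paused.
  shouldProcessFrame(): boolean {
    return !this.paused;
  }

  // Stores a fresh snapshot; updates are ignored while paused so the frozen value survives.
  update(snapshot: MetricSnapshot): void {
    if (!this.paused) {
      this.last = snapshot;
    }
  }

  // The value to render: the latest snapshot while running, the frozen one while paused.
  current(): MetricSnapshot | null {
    return this.last;
  }
}
```

Keeping the frozen snapshot instead of resetting to an empty state means the display does not go blank during a pause, and tracking can resume from a known value.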

Practical Applications

  • Use Case: Personalized posture monitoring for remote presenters using custom ‘Neck Ratio’ metrics. Pitfall: Relying on universal constants for posture leads to false positives for taller or shorter users.
  • Use Case: High-stakes pitch training with low-latency feedback via the Aura CI design system. Pitfall: Clunky UI in high-stress environments increases user anxiety and degrades performance.
  • Use Case: Real-time audio amplitude monitoring for public speakers. Pitfall: Async race conditions in browser permissions can cause microphone inputs to fail silently.
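For the audio amplitude use case, the sketch below shows one way to compute RMS amplitude and to surface a silently failing microphone. It assumes samples arrive as `Float32Array` buffers in the range [-1, 1], as they would from the Web Audio `AnalyserNode.getFloatTimeDomainData` call once the `getUserMedia` promise has resolved. The `SilentInputDetector` class and its 30-buffer threshold are hypothetical, not part of Aura.

```typescript
// Root-mean-square amplitude of one buffer of time-domain samples in [-1, 1].
function rmsAmplitude(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical helper: reports a likely dead input once the RMS has been exactly zero
// for `maxSilentFrames` consecutive buffers. A live microphone usually shows at least
// some noise, so a long run of exact zeros suggests no signal is reaching the audio graph.
class SilentInputDetector {
  private zeroFrames = 0;
  private readonly maxSilentFrames: number;

  constructor(maxSilentFrames = 30) {
    this.maxSilentFrames = maxSilentFrames;
  }

  // Feed one buffer; returns true when the input looks dead.
  push(samples: Float32Array): boolean {
    this.zeroFrames = rmsAmplitude(samples) === 0 ? this.zeroFrames + 1 : 0;
    return this.zeroFrames >= this.maxSilentFrames;
  }
}
```

A check like this turns a silent failure into a visible one: instead of displaying 0.0000 indefinitely, the app can prompt the user to re-grant microphone access.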
