Building Heritage Keeper: A Gemini Live Agent for Family Story Preservation
These articles are AI-generated summaries. Please check the original sources for full details.
Heritage Keeper is a voice-first AI agent built on the Gemini Live API that preserves oral histories by streaming 16-bit PCM audio at 16 kHz. The system autonomously coordinates five function-calling tools and Google Search grounding to verify historical facts in real time.
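The Python sketch below makes the audio format concrete. It shows the conversion a client might perform before streaming: floating-point samples in [-1.0, 1.0] become little-endian signed 16-bit integers. It assumes mono audio and little-endian byte order, and the function names are hypothetical. A real browser client would do this in JavaScript (for example inside an AudioWorklet) rather than in Python. At 16 kHz and 2 bytes per sample, one second of mono audio is 32,000 bytes.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # input rate described for the Live session
BYTES_PER_SAMPLE = 2     # 16-bit PCM


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes (mono assumed)."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip so out-of-range input cannot overflow int16
        ints.append(int(round(s * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)


def chunk_duration_ms(num_bytes, sample_rate=SAMPLE_RATE_HZ):
    """Duration in milliseconds of a mono PCM16 chunk of num_bytes bytes."""
    return 1000 * (num_bytes // BYTES_PER_SAMPLE) / sample_rate
```

For example, a 3,200-byte chunk holds 1,600 samples, or 100 ms of audio at 16 kHz.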
Why This Matters
Family history often goes unrecorded because traditional genealogy software demands tedious manual data entry. Heritage Keeper replaces rigid forms with a bidirectional audio session in which the AI manages state and extracts context from the conversation. The trade-off is that developers must filter out the model's internal reasoning parts themselves to keep the user experience clean.
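The following minimal Python sketch shows that filtering step. It assumes each response part exposes an optional `text` field and a boolean `thought` flag. The `Part` type in Google's google-genai SDK carries a similar flag, but the `Part` dataclass here is a hypothetical stand-in so the example runs without the SDK.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Part:
    """Hypothetical stand-in for an SDK response part."""
    text: Optional[str] = None
    thought: Optional[bool] = None  # True when the part is internal reasoning


def visible_text(parts: Iterable[object]) -> str:
    """Concatenate user-facing text, dropping parts flagged as thoughts."""
    pieces = []
    for part in parts:
        if getattr(part, "thought", False):
            continue  # internal reasoning: never forward to the UI
        text = getattr(part, "text", None)
        if text:
            pieces.append(text)
    return "".join(pieces)


parts = [
    Part(text="The user mentioned a date; call save_story.", thought=True),
    Part(text="What a lovely memory. What year was that?"),
]
print(visible_text(parts))  # What a lovely memory. What year was that?
```

Because the filter uses `getattr`, the same function accepts SDK objects and the stand-in dataclass. Parts without text, such as audio-only parts, are skipped.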
Key Insights
- The gemini-2.5-flash-native-audio model includes internal reasoning ‘thought’ parts in its responses that must be filtered before forwarding to the user interface.
- Grounding AI responses with Google Search turns historical context from speculative trivia into verifiable data, such as cost-of-living figures and historical wage comparisons.
- Keeping the WebSocket stable on Cloud Run requires an exponential backoff reconnection strategy (waiting 1 s, then 2 s, then 4 s) to ride out network blips and timeouts.
- The agent uses five tools, including save_story and search_photos, to autonomously extract names, dates, and relationships from streaming audio.
- The browser must capture audio as 16-bit PCM at 16 kHz for the bidirectional session, while Gemini responds with native 24 kHz audio.
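The Python sketch below implements the 1 s, 2 s, 4 s reconnection schedule described in the list above. `connect` is a hypothetical callable standing in for whatever opens the WebSocket, and it is assumed to raise `ConnectionError` or `TimeoutError` on failure. The `sleep` parameter is injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, retries: int = 3) -> List[float]:
    """Exponential schedule: base, 2*base, 4*base, ... (1s, 2s, 4s by default)."""
    return [base * 2 ** i for i in range(retries)]


def connect_with_backoff(
    connect: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
    base: float = 1.0,
    retries: int = 3,
) -> T:
    """Call connect(); after each failure, wait the next backoff delay and retry."""
    last_error: Optional[Exception] = None
    # The first attempt is immediate; each retry is preceded by a growing delay.
    for delay in [0.0] + backoff_delays(base, retries):
        if delay > 0:
            sleep(delay)
        try:
            return connect()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
    raise ConnectionError(f"gave up after {retries} retries") from last_error
```

Production clients often add random jitter to the delays and reset the schedule after a successful reconnect. Both refinements are omitted here for brevity.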
Practical Applications
- Use Case: Building complex family trees through natural voice commands such as ‘Bob is my father’, which trigger the add_family_member function tool. Pitfall: Without specific instructions for handling short commands, the agent may wrongly attempt a full story extraction.
- Use Case: Automatically retrieving historical photos for timeline entries via the Wikimedia Commons API, filtered to bitmap images only. Pitfall: Failing to handle the different message formats the SDK can deliver (LiveServerMessage objects vs. raw JSON) can crash the parser during audio streaming.
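One way to guard against the message-format pitfall is to normalize every incoming message before acting on it. The Python sketch below accepts either an SDK-style object or a raw JSON string, extracts any function calls, and routes them to handlers such as `add_family_member`. The field names (`tool_call`/`function_calls` on objects, `toolCall`/`functionCalls` in JSON) are assumptions modeled loosely on the Live API's message shape. The handler's parameters (`name`, `relationship`) are hypothetical. Messages that carry only audio yield no calls instead of crashing the parser.

```python
import json
from typing import Any, Callable, Dict, List

family_tree: List[Dict[str, str]] = []  # in-memory store for the demo


def add_family_member(name: str, relationship: str) -> Dict[str, str]:
    """Hypothetical tool handler; real parameter names may differ."""
    family_tree.append({"name": name, "relationship": relationship})
    return {"status": "added", "name": name}


HANDLERS: Dict[str, Callable[..., Dict[str, str]]] = {
    "add_family_member": add_family_member,
}


def extract_function_calls(message: Any) -> List[Dict[str, Any]]:
    """Return [{'id', 'name', 'args'}] from a JSON string/dict or an SDK-style object."""
    if isinstance(message, (str, bytes)):
        message = json.loads(message)
    if isinstance(message, dict):
        # Assumed raw-JSON shape: {"toolCall": {"functionCalls": [...]}}
        tool_call = message.get("toolCall") or message.get("tool_call") or {}
        calls = tool_call.get("functionCalls") or tool_call.get("function_calls") or []
        return [
            {"id": c.get("id"), "name": c.get("name"), "args": c.get("args") or {}}
            for c in calls
        ]
    # Assumed object shape: message.tool_call.function_calls[i].name / .args / .id
    tool_call = getattr(message, "tool_call", None)
    calls = getattr(tool_call, "function_calls", None) or []
    return [
        {"id": getattr(c, "id", None), "name": c.name, "args": dict(c.args or {})}
        for c in calls
    ]


def dispatch(message: Any) -> List[Dict[str, Any]]:
    """Run each requested tool and collect responses keyed by call id."""
    responses = []
    for call in extract_function_calls(message):
        handler = HANDLERS.get(call["name"])
        if handler is None:
            responses.append({"id": call["id"], "error": f"unknown tool: {call['name']}"})
        else:
            responses.append({"id": call["id"], "result": handler(**call["args"])})
    return responses
```

In a live session, the collected responses would be sent back to the model as tool responses. That send step depends on the client library and is not shown.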