Hey builders! 🛑 Stop typing, and start interacting! We are moving beyond the text box. The future isn't about just chatting with AI—it's about immersive, real-time experiences. To celebrate the power of multimodal AI, we’re challenging you to build the next generation of agents that can help you see 🙈, hear 🙉, speak 🙊, and create in the Gemini Live Agent Challenge.
Requirements
What to Build
Entrants must develop a NEW next-generation AI Agent that utilizes multimodal inputs and outputs and moves beyond simple text-in/text-out interactions. Projects should leverage Google’s Live API with the creative power of video/image generation to solve complex problems or create entirely new user experiences within one of these three categories:
Live Agents 🗣️
- Focus: Real-time Interaction (Audio/Vision)
- Build an agent that users can talk to naturally and interrupt mid-response. This could be a real-time translator, a vision-enabled customized tutor that "sees" your homework, or a customer support voice agent that handles interruptions gracefully.
- Mandatory Tech: Must use the Gemini Live API or ADK. Agents must be hosted on Google Cloud.
Creative Storyteller ✍️
- Focus: Multimodal Storytelling with Interleaved Output
- Build an agent that thinks and creates like a creative director, seamlessly weaving together text, images, audio, and video in a single, fluid output stream. Leverage Gemini's native interleaved output to generate rich, mixed-media responses that combine narration with visuals, explanations with generated imagery, or storyboards with voiceover, all in one cohesive flow. Examples include interactive storybooks (text + generated illustrations inline), marketing asset generators (copy + visuals + video in one go), educational explainers (narration woven with diagrams), and social content creators (caption + image + hashtags together).
- Mandatory Tech: Must use Gemini's interleaved/mixed output capabilities. Agents must be hosted on Google Cloud.
UI Navigator ☸️
- Focus: Visual UI Understanding & Interaction
- Build an agent that becomes the user's hands on screen. The agent observes the browser or device display, interprets visual elements with or without relying on APIs or DOM access, and performs actions based on user intent. Examples include a universal web navigator, a cross-application workflow automator, or a visual QA testing agent.
- Mandatory Tech: Must use Gemini multimodal to interpret screenshots/screen recordings and output executable actions. Agents must be hosted on Google Cloud.
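For the UI Navigator category, "output executable actions" usually means the model's reply has to be checked before anything touches the screen. Below is a minimal sketch of that validation step, assuming the agent prompts the model to reply with a JSON list of actions. The action names (`click`, `type`, `scroll`), their fields, and the `UIAction` type are hypothetical and are not defined by the challenge or by any Gemini API.

```python
import json
from dataclasses import dataclass

# Hypothetical action schema: action name -> exact set of required argument names.
ALLOWED_ACTIONS = {
    "click": {"x", "y"},
    "type": {"text"},
    "scroll": {"dy"},
}


@dataclass(frozen=True)
class UIAction:
    kind: str
    args: dict


def parse_actions(model_output: str) -> list[UIAction]:
    """Parse a JSON list of actions and reject anything outside the schema."""
    try:
        raw = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(raw, list):
        raise ValueError("expected a JSON list of actions")

    actions = []
    for item in raw:
        if not isinstance(item, dict):
            raise ValueError(f"action must be a JSON object, got: {item!r}")
        kind = item.get("action")
        if kind not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {kind!r}")
        args = {key: value for key, value in item.items() if key != "action"}
        if set(args) != ALLOWED_ACTIONS[kind]:
            raise ValueError(f"wrong arguments for {kind!r}: {sorted(args)}")
        actions.append(UIAction(kind, args))
    return actions


if __name__ == "__main__":
    reply = '[{"action": "click", "x": 10, "y": 20}, {"action": "type", "text": "hi"}]'
    for action in parse_actions(reply):
        print(action.kind, action.args)
```

Rejecting unknown actions up front keeps a malformed or hallucinated reply from ever reaching the code that drives the browser or device.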
All projects MUST:
- Leverage a Gemini model
- Be built using either the Google GenAI SDK or ADK (Agent Development Kit)
- Use at least one Google Cloud service
What to Submit
- 📃 Text Description: Summary of the Project’s features and functionality, technologies used, information about any other data sources used, and your findings and learnings as you worked through the project.
- 👨💻 URL to your Public Code Repository: Let us see how you built it!
- Include spin-up instructions in your README so judges can verify that your project is reproducible
- 🖥️ Proof of Google Cloud Deployment: Demonstrate that your Project's backend is running on Google Cloud. Proof can be either (1) a short screen recording, separate from your demo, that shows your app running behind the scenes on GCP (e.g., console logs or the console view of a deployment) or (2) a link to a file in your code repo that demonstrates use of Google Cloud services and APIs (e.g., API calls to Vertex AI endpoints)
- 🏗️ Architecture Diagram: A clear visual representation of your system (e.g., how Gemini connects to your backend, database, and frontend)
- Pro tip: Add this to the file upload or image carousel so it's easy for judges to find!
- 📹 Demonstration Video:
- <4-minute video
- Demos your multimodal/agentic features working in real time (no mockups)
- Pitches your project: what problem did you solve, and what value does your solution bring?
For bonus points, you can optionally do any or all of the following:
- Publish a piece of content (blog, podcast, video) covering how the project was built with Google AI models and Google Cloud. You must include language stating that you created the content for the purposes of entering this hackathon. When sharing on social media, use the hashtag #GeminiLiveAgentChallenge.
- Prove you automated your Cloud Deployment using scripts or infrastructure-as-code tools. This code must be included in your public repository.
- Sign up for a Google Developer Group and provide a link to your public GDG profile
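For the deployment-automation bonus, the script can be quite small. The sketch below assumes the backend is deployed to Cloud Run from source with the `gcloud` CLI. The challenge does not mandate Cloud Run, and the service name, region, and project ID shown are placeholders.

```python
import shlex
import subprocess


def build_deploy_command(service: str, region: str, project: str,
                         source_dir: str = ".") -> list[str]:
    """Build a `gcloud run deploy` command that deploys from local source."""
    return [
        "gcloud", "run", "deploy", service,
        "--source", source_dir,
        "--region", region,
        "--project", project,
        "--quiet",  # skip interactive prompts so the script can run unattended
    ]


def deploy(service: str, region: str, project: str,
           source_dir: str = ".", dry_run: bool = True) -> list[str]:
    """Print the command; run it only when dry_run is False."""
    command = build_deploy_command(service, region, project, source_dir)
    print(shlex.join(command))
    if not dry_run:
        # Requires an installed and authenticated gcloud CLI.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Placeholder values; replace them with your own service, region, and project.
    deploy("my-agent", "us-central1", "my-project-id", dry_run=True)
```

Committing a script like this, or an equivalent infrastructure-as-code configuration, to the public repository is what lets judges see that deployment is automated.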
Prizes
Grand Prize
• $25,000 in USD
• $3,000 in Google Cloud Credits for use with a Cloud Billing Account
• Virtual Coffee with a Google Team Member
• Social Promo
• Maximum of two (2) Google Cloud Next 2026 conference tickets for two (2) teammates (April 22-24, 2026) (Value: $2,299 each)
• Maximum of two (2) travel stipends for airfare and hotel to Google Cloud Next 2026 in Las Vegas, NV for two (2) teammates (maximum of $3,000 USD each)
• Opportunity to demo your Project in a Google Cloud Next 2026 presentation (additional requirements in Official Rules)
Best of Live Agents
• $10,000 in USD
• $1,000 in Google Cloud Credits for use with a Cloud Billing Account
• Virtual Coffee with a Google Team Member
• Social Promo
• Maximum of two (2) Google Cloud Next 2026 conference tickets for two (2) teammates (April 22-24, 2026) (Value: $2,299 each)
Best of Creative Storytellers
• $10,000 in USD
• $1,000 in Google Cloud Credits for use with a Cloud Billing Account
• Virtual Coffee with a Google Team Member
• Social Promo
• Maximum of two (2) Google Cloud Next 2026 conference tickets for two (2) teammates (April 22-24, 2026) (Value: $2,299 each)
Best of UI Navigators
• $10,000 in USD
• $1,000 in Google Cloud Credits for use with a Cloud Billing Account
• Virtual Coffee with a Google Team Member
• Social Promo
• Maximum of two (2) Google Cloud Next 2026 conference tickets for two (2) teammates (April 22-24, 2026) (Value: $2,299 each)
Best Multimodal Integration & User Experience
• $5,000 in USD
• $500 in Google Cloud Credits for use with a Cloud Billing Account
Best Technical Execution & Agent Architecture
• $5,000 in USD
• $500 in Google Cloud Credits for use with a Cloud Billing Account
Best Innovation & Thought Leadership
• $5,000 in USD
• $500 in Google Cloud Credits for use with a Cloud Billing Account
Honorable Mentions
• $2,000 in USD
• $500 in Google Cloud Credits for use with a Cloud Billing Account
Judges
TBD
Judging Criteria
Innovation & Multimodal User Experience (40%)
Does the project break the "text box" paradigm? Does the agent help "See, Hear, and Speak" in a way that feels seamless? Does it have a distinct persona/voice? Is the experience "Live" and context-aware, or does it feel disjointed and turn-based?
Technical Implementation & Agent Architecture (30%)
Does the code effectively utilize the Google GenAI SDK or ADK? Is the backend robustly hosted on Google Cloud? Is the agent logic sound? Does it handle errors gracefully? Does the agent avoid hallucinations? Is there evidence of grounding?
Demo & Presentation (30%)
Does the video define the problem and solution? Is the architecture diagram clear? Is there visual proof of Cloud deployment? Does the video show the actual software working?
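To see how the 40/30/30 weighting combines, here is a rough sketch of a weighted score. The 1-to-5 scale per criterion and the criterion keys are assumptions for illustration only. This page does not state the scoring scale or how bonus points are added.

```python
# Weights from the judging criteria, in percent.
WEIGHTS_PERCENT = {
    "innovation_ux": 40,  # Innovation & Multimodal User Experience
    "technical": 30,      # Technical Implementation & Agent Architecture
    "demo": 30,           # Demo & Presentation
}


def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores (assumed 1-5 scale) into one weighted score."""
    missing = set(WEIGHTS_PERCENT) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total = sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT)
    return total / 100


if __name__ == "__main__":
    # Hypothetical scores for a single project.
    print(weighted_score({"innovation_ux": 4, "technical": 5, "demo": 3}))
```

Because innovation and user experience carry the largest weight, a one-point gain there moves the total more than the same gain in either of the other two criteria.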
