
Building Voice Applications with ElevenLabs Agents

Voice-based AI applications represent the next frontier in customer service training, offering a more natural and effective alternative to traditional methods. They create consistent, risk-free practice environments where trainees can master complex interactions before facing real customers.

To build great user experiences with these voice agents, there are three critical design principles to keep in mind (the first two are sketched in code after the list):

  • Thoughtful Context Design: Creating authentic scenarios using real-world data like actual menus and service protocols
  • Strategic Randomization: Implementing controlled variability that builds adaptability rather than rote responses
  • Skill-Targeted Scenarios: Focusing on specific competencies rather than general conversation abilities
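
To make the first two principles concrete, here is a minimal Python sketch of how a context-rich, randomized scenario prompt might be assembled before handing it to the agent. The menu, service protocol text, and scenario parameters are illustrative placeholders, not data from a real deployment:

```python
import random

# Illustrative data: in practice, a real first-class menu and the airline's
# actual service protocol would be loaded from the carrier's own documents.
MENU = ["seared halibut", "braised short rib", "wild mushroom risotto"]
SERVICE_PROTOCOL = "Greet by name, confirm meal preference, offer a beverage pairing."

# Controlled randomization: vary passenger temperament and complications so
# trainees build adaptability rather than memorizing one script.
TEMPERAMENTS = ["friendly", "hurried", "frustrated about a delayed connection"]
COMPLICATIONS = ["first-choice entrée is sold out", "passenger has a nut allergy", None]

def build_scenario_prompt() -> str:
    """Assemble the system prompt for one randomized training scenario."""
    temperament = random.choice(TEMPERAMENTS)
    complication = random.choice(COMPLICATIONS)
    lines = [
        "You are a premium-cabin passenger in a training simulation.",
        f"Your temperament: {temperament}.",
        f"Tonight's menu: {', '.join(MENU)}.",
        f"Crew service protocol (used to grade realism): {SERVICE_PROTOCOL}",
    ]
    if complication:
        lines.append(f"Complication to surface mid-conversation: {complication}.")
    return "\n".join(lines)

if __name__ == "__main__":
    # The resulting prompt would be supplied as the agent's system prompt
    # when configuring the conversation in the voice platform of your choice.
    print(build_scenario_prompt())
```
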

This article unpacks the practical lessons learned while building a voice-based training solution with ElevenLabs' new conversational AI, letting flight attendants perfect premium cabin service interactions without risking passenger satisfaction or airline reputation during the learning process.

Why are voice agents hard to build?

Voice interfaces feel inevitable: they promise hands‑free, universally accessible computing that matches the cadence of ordinary conversation. Yet building production‑grade voice agents remains stubbornly difficult.

The difficulty comes down to three specific constraints (a rough latency-budget sketch follows the list):

  1. A strict 500ms response window, needed to keep replies feeling natural, which imposes fundamental UX limitations
  2. Complex end-to-end latency optimization challenges across the entire processing pipeline
  3. The lack of established UX patterns for different conversational contexts
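
As an illustration of the second constraint, the sketch below allocates the roughly 500ms window across the typical speech-to-text → LLM → text-to-speech pipeline and flags when a turn blows its budget. The stage names and budget numbers are assumptions, not measured figures from a real system:

```python
import time
from contextlib import contextmanager

# Assumed per-stage budgets in milliseconds; real numbers depend heavily on
# the models, streaming setup, and network path actually in use.
BUDGET_MS = {"asr": 150, "llm_first_token": 200, "tts_first_audio": 150}

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    """Record wall-clock latency for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000

def report() -> None:
    """Print per-stage latency against budget and the end-to-end total."""
    total = sum(timings.values())
    for stage, ms in timings.items():
        flag = "OVER" if ms > BUDGET_MS.get(stage, float("inf")) else "ok"
        print(f"{stage:>16}: {ms:6.1f} ms ({flag})")
    verdict = "OVER 500ms window" if total > 500 else "within window"
    print(f"{'total':>16}: {total:6.1f} ms ({verdict})")

# Usage (transcribe/generate/synthesize are placeholders for your own stack):
#   with timed("asr"): transcript = transcribe(audio_chunk)
#   with timed("llm_first_token"): reply = generate(transcript)
#   with timed("tts_first_audio"): audio = synthesize(reply)
#   report()
```
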

MCPs are really LLM microservices

Language Model applications have a fundamental problem: they need better ways to access tools and services. Developers currently spend hours coding custom integrations, maintaining authentication flows, and defining complex schemas for each external service. This creates bottlenecks that limit what AI systems can actually do for users.

Anthropic's Model Context Protocol (MCP) offers a potential solution by providing a standardized way for LLMs to discover and use tools dynamically. Think of MCP as an API specification for AI microservices: it defines how AI systems can find, call, and combine different tools without requiring developers to hardcode every possible interaction.
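
To show what "LLM microservice" means in practice, here is a minimal tool server written against the official MCP Python SDK's FastMCP helper; the flight-status tool itself is a made-up stand-in. The point is that an MCP-aware client discovers the tool's name, description, and argument schema from the function signature rather than from hand-written integration code:

```python
# Minimal MCP tool server, assuming the official MCP Python SDK
# (pip install "mcp[cli]"). The flight-status lookup is illustrative only.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def get_flight_status(flight_number: str) -> str:
    """Look up the current status of a flight (stubbed for illustration)."""
    # A real server would call an airline or flight-tracking API here.
    return f"Flight {flight_number}: on time, departing gate B12."

if __name__ == "__main__":
    # Serves the tool over stdio; an MCP-aware client can list and call it
    # without any bespoke integration code on the application side.
    mcp.run()
```
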

In this article, I'll explore what makes MCPs promising, the challenges they solve, and what's still missing for them to become production-ready. This largely captures my own thoughts after chatting with people about MCPs over the past week or so; I'd love to know if you think differently.