Index

April 22, 2025
in Voice, LLM, Deployment
6 min read

Building Voice Applications with ElevenLabs Agents

Voice-based AI applications represent the next frontier in customer service training, offering a more natural and effective alternative to traditional methods. They create consistent, risk-free practice environments where trainees can master complex interactions before facing real customers.

Building great user experiences with these voice agents comes down to three critical design principles:

Thoughtful Context Design: Creating authentic scenarios using real-world data like actual menus and service protocols
Strategic Randomization: Implementing controlled variability that builds adaptability rather than rote responses
Skill-Targeted Scenarios: Focusing on specific competencies rather than general conversation abilities

This article unpacks the practical lessons learned while building a voice-based training solution with ElevenLabs's new conversational AI that enables flight attendants to perfect premium cabin service interactions—without risking passenger satisfaction or airline reputation during the learning process.

April 20, 2025
in Voice, LLM, Deployment
3 min read

Why are voice agents hard to build?

Voice interfaces feel inevitable: they promise hands‑free, universally accessible computing that matches the cadence of ordinary conversation. Yet building production‑grade voice agents remains stubbornly difficult.

Three specific constraints make these interfaces challenging to build:

A strict 500ms response window for keeping replies feeling natural, which creates fundamental UX limitations
Complex end-to-end latency optimization challenges across the entire processing pipeline
The lack of established UX patterns for different conversational contexts
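
To make the 500ms window concrete, here is a minimal sketch of a latency budget check. The stage names and timings are hypothetical placeholders for illustration, not measurements from a real pipeline.

```python
# Response window from the list above, in milliseconds.
BUDGET_MS = 500


def remaining_budget(stage_latencies_ms: dict[str, int], budget_ms: int = BUDGET_MS) -> int:
    """Return how many milliseconds are left once every stage has run.

    A negative result means the pipeline has blown through the budget.
    """
    return budget_ms - sum(stage_latencies_ms.values())


# Hypothetical per-stage latencies (assumed numbers, purely illustrative).
pipeline = {
    "speech_to_text": 150,
    "llm_first_token": 250,
    "text_to_speech": 120,
    "network": 60,
}

left = remaining_budget(pipeline)
status = "over budget" if left < 0 else "within budget"
print(f"{left}ms remaining ({status})")
```

Even with fairly optimistic numbers for each stage, the total overshoots the window, which is why the optimization has to happen across the entire pipeline rather than in any single component.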

March 25, 2025
in UI/UX, LLMs
5 min read

Use streaming UIs and customers are willing to wait 5x longer

If you're not building your application with streaming in mind, you're making two major mistakes:

You'll have to spend months refactoring your code once you do decide to adopt streaming.
You're missing out on a major opportunity to improve the user experience.

I've built a fair number of user interfaces that rely on streaming content with tools like the ai-sdk by Vercel, and I've found that there are three main considerations to think about when building out your application.

March 8, 2025
in MCPs, LLM, Trends
6 min read

MCPs are really LLM microservices

Language Model applications have a fundamental problem: they need better ways to access tools and services. Developers currently spend hours coding custom integrations, maintaining authentication flows, and defining complex schemas for each external service. This creates bottlenecks that limit what AI systems can actually do for users.

Anthropic's Model Context Protocol (MCPs) offers a potential solution by providing a standardized way for LLMs to discover and use tools dynamically. Think of MCPs as an API specification for AI microservices - they define how AI systems can find, call, and combine different tools without requiring developers to hardcode every possible interaction.
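
The find-and-call idea can be sketched with a toy tool registry. This is not the MCP SDK or its wire format; `Tool`, `ToolRegistry`, and the `add` tool are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """A hypothetical tool description: a name, a summary, and a handler."""
    name: str
    description: str
    handler: Callable[..., Any]


class ToolRegistry:
    """Toy registry that lets a caller discover tools and then invoke them by name."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def list_tools(self) -> list[dict[str, str]]:
        # Discovery step: expose only metadata, never the handler itself.
        return [{"name": t.name, "description": t.description} for t in self._tools.values()]

    def call(self, name: str, **arguments: Any) -> Any:
        # Invocation step: look the tool up by name and pass arguments through.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**arguments)


registry = ToolRegistry()
registry.register(Tool("add", "Add two integers", lambda a, b: a + b))

print(registry.list_tools())
print(registry.call("add", a=2, b=3))
```

The point of the sketch is the split between discovery (`list_tools`) and invocation (`call`): the caller never needs a hardcoded integration, only the metadata it discovers at runtime.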

In this article, I'll explore what makes MCPs promising, the challenges they solve, and what's still missing for them to become production-ready. These are largely my own thoughts after chatting with people about MCPs over the past week or so, and I'd love to know if you think differently.

January 27, 2025
in Evals, Instructor, Synthetic Data
8 min read

Why Structured Outputs matter for LLM Applications in 2025

I gave a short talk at NUS in January 2025 about structured outputs and how they enable faster iteration and testing when building language models. I've written up a more detailed version of the talk here as well as provided the slides below.

LLM applications in 2025 face a unique challenge: while they enable rapid deployment compared to traditional ML systems, they also introduce new risks around reliability and safety.

In this article, I'll explain why structured outputs remain crucial for building robust LLM applications, and how they enable faster iteration and testing.
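
As a rough illustration of what a structured output buys you, here is a minimal sketch that parses a model response into a typed object and rejects anything that does not fit. It uses only the standard library to stay self-contained; the `Ticket` schema, its fields, and the 1 to 5 priority range are assumptions made up for this example, and in practice a library like Instructor would handle this step.

```python
import json
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical schema we expect the model to fill in."""
    category: str
    priority: int


def parse_ticket(raw: str) -> Ticket:
    """Parse and validate a raw model response, raising ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    try:
        ticket = Ticket(category=str(data["category"]), priority=int(data["priority"]))
    except (KeyError, TypeError) as exc:
        raise ValueError(f"missing or malformed field: {exc}") from exc
    if not 1 <= ticket.priority <= 5:
        raise ValueError(f"priority out of range: {ticket.priority}")
    return ticket


# Stand-in for a model response; a real application would get this from an LLM call.
raw_output = '{"category": "billing", "priority": 2}'
ticket = parse_ticket(raw_output)
print(ticket)
```

Because every failure surfaces as a `ValueError`, the same function can double as a test assertion, which is what makes iteration and testing faster.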

January 4, 2025
in Clustering, LLMs
15 min read

Using Language Models to make sense of Chat Data without compromising user privacy

If you're interested in the code for this article, you can find it here where I've implemented a simplified version of CLIO without the PII classifier and most of the original prompts (to some degree).

Analysing chat data at scale is a challenging task for 3 main reasons:

Privacy - Users don't want their data to be shared with others and we need to respect that. This makes it challenging to run analysis on specific user data.
Explainability - Unsupervised clustering methods are sometimes difficult to interpret because we don't have a good way to understand what the clusters mean.
Scale - We need to be able to process large amounts of data efficiently.

An ideal solution allows us to understand broad general trends and patterns in user behaviour while preserving user privacy. In this article, we'll explore an approach that addresses this challenge - Claude insights and observations (CLIO), which was recently discussed in a research paper released by Anthropic.

We'll do so in 3 steps:

We'll start by understanding on a high level how CLIO works
We'll then implement a simplified version of CLIO in Python
We'll then discuss some of the clusters that we generated and some of the limitations of such an approach

Let's walk through these concepts in detail.
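
As a rough illustration of surfacing broad trends without exposing individual conversations, here is a minimal sketch that reports only aggregate topic counts and drops groups that are too small. This is not the CLIO implementation; the `aggregate_topics` helper, the `min_cluster_size` threshold, and the sample topics are all assumptions made for this example.

```python
from collections import Counter


def aggregate_topics(topics: list[str], min_cluster_size: int = 3) -> dict[str, int]:
    """Count conversations per topic, keeping only topics with enough members.

    Small groups are dropped so that no reported trend can be traced back
    to one or two individual users.
    """
    counts = Counter(topics)
    return {topic: n for topic, n in counts.items() if n >= min_cluster_size}


# Hypothetical per-conversation topic labels (in practice produced by a model).
topics = [
    "debugging", "debugging", "debugging", "travel planning",
    "debugging", "cooking", "cooking", "cooking",
]

print(aggregate_topics(topics))
```

Only the aggregate counts leave the function, so the single "travel planning" conversation never appears in the output.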

January 4, 2025
in AI, Personal
5 min read

How I Use Claude

I've been a heavy user of Claude for the past few months and anecdotally, ever since Sonnet 3.6, I've been using it more and more.

I was kind of curious to see how I use it on a day-to-day basis, so when I realised I could export my Claude chat history, I thought I'd try to do some analysis on it.

I'll write a separate post on how I did the analysis but I thought I'd share some of the results here. Here is a guide on how to export your Claude chat history.

I was inspired by this post by Boretti Fernando and thought I'd try to do something similar.

December 29, 2024
in LLMs, Advice
11 min read

Getting Started with Language Models in 2025

After a year of building AI applications and contributing to projects like Instructor, I've found that getting started with language models is simpler than most people think. You don't need a deep learning background or months of preparation - just a practical approach to learning and building.

Here are three effective ways to get started (and you can pursue all of them at once):

Daily Usage: Put Claude, ChatGPT, or other LLMs to work in your daily tasks. Use them for debugging, code reviews, planning - anything. This gives you immediate value while building intuition for what these models can and can't do well.
Focus on Implementation: Start with Instructor and basic APIs. Build something simple that solves a real problem, even if it's just a classifier or text analyzer. The goal is getting hands-on experience with development patterns that actually work in production.
Understand the Tech: Write basic evaluations for your specific use cases. Generate synthetic data to test edge cases. Read papers that explain the behaviors you're seeing in practice. This deeper understanding makes you better at both using and building with these tools.

You can, and should, do all of these at once. Remember that the goal isn't expertise but to discover which aspect of the space you're most interested in.

There's a tremendous number of possible directions to work on, such as dataset curation, model architecture, and hardware optimisation, alongside other exciting directions like Post Transformer Architectures and Multimodal Models, all happening at the same time.

December 26, 2024
in Career, Machine Learning, Personal Development
4 min read

What Happened in 2024

2024 has been a year of remarkable transformation. Just two and a half years out of college, I went from feeling uncertain about my path in software engineering to finding my stride in machine learning engineering. It's been a journey of pushing boundaries – improving my health, contributing to open source, and diving deeper into research.

The year has felt like a constant acceleration, especially in the last six months, where everything from technical growth to personal development seemed to shift into high gear.

Four achievements stand out from this transformative year:

Helped grow Instructor from ~300k downloads to 1.1M downloads this year as a core contributor
Quit my job as a SWE and started working full time with LLMs
Got into better shape: lost about 6kg in total, and my total cholesterol dropped by 32% with lifestyle changes
Delivered my first technical talks this year, 4 in total

December 22, 2024
in Modal, Diffusion Models, ComfyUI
6 min read

A Weekend of Text to Image/Video Models

You can find the code for this post here.

I had a lot of fun playing around with text to image models over the weekend and thought I'd write a short blog post about some of the things I learnt. I ran all of this on Modal and spent ~10 USD across the entire weekend which is honestly well below the Modal $20 free tier credit.

This was mainly for a small project I've been working on called CYOA, where users get to create their own stories and have a language model automatically generate images and choices for each of them.