
2024

Setting Up My MacBook for ML Development: A Living Guide

This is a living document that I'll be updating over the next few weeks as I continue setting up and optimizing my new MacBook development environment. While I followed most of Eugene Yan's excellent minimal setup guide, I made some specific choices based on my workflow needs.

Core Development Setup

I kept things minimal for my general development environment, following most of Eugene's recommendations but focusing on just the essentials:

Write Stupid Evals

Evals should start simple and get progressively more complex. Starting simple matters because the goal is to build a habit of writing these simple assertions early. By doing so, we can start turning our vibes into objective metrics. This lets us compare different approaches easily and make data-driven decisions about what works and what doesn't.

Don't overthink it; really, just use an assert statement at the start.
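For instance, a "stupid eval" can literally be a function call wrapped in a couple of assert statements. Here's a minimal sketch; the answer_question function and the expected values are placeholders standing in for your own pipeline.

    # A minimal "stupid eval": call the model, assert on properties you care about.
    # `answer_question` and the expected strings are placeholders for your own pipeline.

    def answer_question(question: str) -> str:
        # ... your LLM call would go here ...
        return "Paris is the capital of France."

    def test_capital_question():
        response = answer_question("What is the capital of France?")
        assert "Paris" in response   # the correct entity is mentioned
        assert len(response) < 200   # the response stays concise

    if __name__ == "__main__":
        test_capital_question()
        print("all assertions passed")

It's crude, but once a few of these exist you can run them after every prompt change and see immediately what broke.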

There's a famous story about a pottery teacher who divided their class into two groups. The first group would be graded solely on quantity - how many pieces they could produce. The second group would be graded on quality - they just needed to produce one perfect piece. When grading time came, something interesting happened: the best pieces all came from the quantity group. While the quality group got stuck theorizing about perfection, the quantity group learned through iterative practice.

Is RAG dead?

What is RAG?

RAG is a fancy way of stuffing additional information into the prompt of a language model. By giving the model more information, we can get responses that are contextually relevant to what we need. But don't language models already have access to all of the world's information?

Imagine you're starting a new job. Would you rather:

  1. Have access to all of Wikipedia and hope the information you need is somewhere in there
  2. Have your company's specific documentation, procedures, and guidelines
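To make the "stuff extra information into the prompt" idea concrete, here's a rough sketch using the OpenAI chat completions API. The retrieve_docs helper, the model name, and the example document are assumptions for illustration; in practice retrieval would hit a vector store or search index.

    from openai import OpenAI

    client = OpenAI()

    def retrieve_docs(query: str) -> list[str]:
        # Placeholder: in a real system this would query a vector store or search index.
        return ["Refund requests must be filed within 30 days of purchase."]

    def answer_with_rag(question: str) -> str:
        context = "\n".join(retrieve_docs(question))
        # RAG in its simplest form: prepend the retrieved context to the prompt.
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": f"Answer using this context:\n{context}"},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content

    print(answer_with_rag("What is the refund policy?"))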

Synthetic Data is no Free Lunch

I spent some time playing with a new framework called Dria recently that uses LLMs to generate synthetic data. I couldn't get it to work but I did spend some time digging through their source code, and I thought I'd share some of my thoughts on the topic.

Over the past few weeks, I've generated a few million tokens of synthetic data for some projects. I'm still figuring out the best way to do it but I think it's definitely taught me that it's no free lunch. You do need to spend some time thinking about how to generate the data that you want.

The Premise

An example

When I first started generating synthetic data for question-answering systems, I thought it would be straightforward - all I had to do was to ask a language model to generate a few thousand questions that a user might ask.
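For illustration, here's roughly what that naive first attempt looks like; the prompt, model name, and document chunk are placeholders, and splitting the output on newlines is deliberately simplistic.

    from openai import OpenAI

    client = OpenAI()

    def generate_questions(chunk: str, n: int = 5) -> list[str]:
        # The naive approach: ask the model for n questions about a passage and
        # hope they resemble what real users would ask.
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Generate {n} questions a user might ask about this text:\n{chunk}",
            }],
        )
        return response.choices[0].message.content.splitlines()

    questions = generate_questions("Our warranty covers manufacturing defects for two years.")
    print(questions)

As the title suggests, this alone turned out not to be enough; the questions you get back need real scrutiny before they're useful.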

You're probably not doing experiments right

I recently started working as a research engineer, and it's been a significant mindset shift in how I approach my work. It's tricky to run experiments with LLMs efficiently and accurately, and after months of trial and error, I've found that three key factors make the biggest difference:

  1. Being clear about what you're varying
  2. Investing time to build out some infrastructure
  3. Doing some simple sensitivity analysis

Let's see how each of these can make a difference in your experimental workflow.
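As a rough sketch of the first point, one way to make "what you're varying" explicit is to run a small grid of configurations through the same harness so the results are directly comparable. The models, temperatures, and run_pipeline function below are placeholders for your own setup.

    import itertools

    # Factors we're deliberately varying; everything else in the pipeline stays fixed.
    MODELS = ["gpt-4o-mini", "gpt-4o"]
    TEMPERATURES = [0.0, 0.7]

    def run_pipeline(model: str, temperature: float) -> float:
        # Placeholder: run your eval set through the pipeline and return a score.
        return 0.0

    results = {}
    for model, temperature in itertools.product(MODELS, TEMPERATURES):
        results[(model, temperature)] = run_pipeline(model, temperature)

    # A crude sensitivity check: scan how much the score moves as each factor changes.
    for (model, temperature), score in sorted(results.items()):
        print(f"{model} @ temperature={temperature}: {score:.3f}")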

Why Instructor might be a better bet than Langchain

Introduction

If you're building LLM applications, a common question is which framework to use: Langchain, Instructor, or something else entirely. I've found that this decision really comes down to a few critical factors, which we'll walk through in three parts:

  1. First, we'll talk about testing and granular controls, and why you should be thinking about them from the start
  2. Then we'll explain why you should evaluate how quickly a framework lets you experiment with different models and prompts and adopt new features.
  3. Finally, we'll consider why long-term maintenance is also an important factor, and why Instructor often provides a balanced solution, offering both simplicity and flexibility.

How does Instructor work?

For Python developers working with large language models (LLMs), instructor has become a popular tool for structured data extraction. While its capabilities may seem complex, the underlying mechanism is surprisingly straightforward. In this article, we'll walk through a high-level overview of how the library works and how we support the OpenAI client.

We'll start by looking at

  1. Why should you care about Structured Extraction?
  2. What is the high level flow
  3. How does a request go from Pydantic Model to Validated Function Call?

By the end of this article, you'll have a good understanding of how instructor helps you get validated outputs from your LLM calls, and a better sense of how you might contribute to the library yourself.
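To ground the discussion, here's roughly what the happy path looks like with instructor: define a Pydantic model, patch the OpenAI client, and pass the model as response_model. The model name and fields here are just an example, not anything from the library itself.

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    class UserInfo(BaseModel):
        name: str
        age: int

    # Patch the OpenAI client so it accepts a response_model argument.
    client = instructor.from_openai(OpenAI())

    user = client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=UserInfo,
        messages=[{"role": "user", "content": "John Doe is 30 years old."}],
    )

    print(user.name, user.age)  # a validated UserInfo instance, not raw JSON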

Getting Started with Evals - a speedrun through Braintrust

For software engineers struggling with LLM application performance, simple evaluations are your secret weapon. Forget the complexity — we'll show you how to start testing your LLM in just 5 minutes using Braintrust. By the end of this article, you'll have a working example of a test harness that you can easily customise for your own use cases.

We'll be using a cleaned version of the GSM8k dataset that you can find here.

Here's what we'll cover:

  1. Setting up Braintrust
  2. Writing our first task to evaluate an LLM's response to the GSM8k with Instructor
  3. Simple recipes that you'll need
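As a taste of what the harness looks like, here's a minimal sketch built around Braintrust's Eval entry point. The project name is made up, the inline dataset is a toy stand-in for the cleaned GSM8k file, and the task function is a placeholder for the Instructor-backed call the article walks through; the Levenshtein scorer comes from the autoevals package.

    from autoevals import Levenshtein
    from braintrust import Eval

    def solve(input: str) -> str:
        # Placeholder: call your LLM here (the article uses Instructor for this step).
        return "42"

    Eval(
        "gsm8k-speedrun",  # hypothetical project name; created on first run
        data=lambda: [{"input": "What is 6 * 7?", "expected": "42"}],
        task=solve,
        scores=[Levenshtein],  # string-similarity scorer from autoevals
    )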

How to create synthetic data that works

Synthetic data can accelerate AI development, but generating high-quality datasets remains challenging. In this article, I'll walk through a few experiments I've done with synthetic data generation and the takeaways I've learnt so that you can do the same.

We'll do so by covering:

  1. Limitations of simple generation methods: Why simple generation methods produce homogeneous data
  2. Entropy and why it matters: Techniques to increase diversity in synthetic datasets
  3. Practical implementations: Some simple examples of how to increase entropy and diversity to get better synthetic data
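As one illustration of the kind of practical implementation the third point refers to, a simple way to raise entropy is to condition each generation on randomly sampled attributes so the prompts themselves differ from call to call. The personas, topics, and model name below are made up for the example.

    import random
    from openai import OpenAI

    client = OpenAI()

    # Injecting sampled attributes into each prompt is one simple way to raise the
    # entropy of generations, compared to firing the same prompt repeatedly.
    PERSONAS = ["new employee", "frustrated customer", "power user"]
    TOPICS = ["billing", "account security", "API rate limits"]

    def generate_diverse_question() -> str:
        persona = random.choice(PERSONAS)
        topic = random.choice(TOPICS)
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Write one question a {persona} might ask about {topic}.",
            }],
        )
        return response.choices[0].message.content

    questions = [generate_diverse_question() for _ in range(3)]
    print(questions)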

AI Engineering World Fair

What's new?

Last year, we saw a lot of interest in the use of LLMs for new use cases. This year, with more funding and interest in the space, we've finally started thinking about productionizing these models at scale and making sure that they're reliable, consistent and secure.

Let's start with a few definitions

  • Agent: An LLM that is provided with a few tools it can call. The agentic part of the system comes from its ability to make decisions based on some input. This is similar to Harrison Chase's article here

  • Evaluations: A set of metrics that we can look at to understand where our current system falls short. An example could be measuring precision and recall (see the short sketch after these definitions).

  • Synthetic Data Generation: Data generated by an LLM that is meant to mimic real data
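Picking up on the precision and recall example from the Evaluations definition above, here's a tiny self-contained sketch of how the two are computed over binary relevance labels; the predictions and ground truth are made up.

    # Toy precision/recall computation over binary relevance labels.
    predicted = [1, 1, 0, 1, 0]   # what the system flagged as relevant
    actual    = [1, 0, 0, 1, 1]   # ground-truth labels

    true_positives = sum(p == a == 1 for p, a in zip(predicted, actual))
    precision = true_positives / sum(predicted)   # of what we flagged, how much was right
    recall = true_positives / sum(actual)         # of what was right, how much we flagged

    print(f"precision={precision:.2f}, recall={recall:.2f}")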