Structured Outputs From Scratch
Building a structured output parser from first principles — from JSON Schema to regex to FSM to constrained decoding.
Pydantic to Regex: Compiling JSON Schemas for Structured Outputs
Build a recursive JSON Schema-to-regex compiler from first principles
If you've ever asked an LLM to output JSON, you've likely seen it return markdown code blocks, hallucinate extra fields, or produce invalid syntax. A common way to solve this is to handle it at the decoding level -…
Building our FSM
Going beyond a simple regex to an FSM for quick lookups
In the previous article, we compiled our Pydantic model into a regex that can validate a JSON string. This helps validate a final response, but it can't really be used at each step of the decoding…
Compiling IR to NFA
Turn regex IR into a graph-based automaton for fast next-step lookup
In the previous articles, we went from a JSON Schema to a regular expression, and then from a regular expression to a structured intermediate representation (IR). That IR made all of the implicit…