Smart LLM Routing
Building LLM apps is easy, but scaling them without setting a pile of money on fire is hard. You really don't need the massive brainpower of GPT-5 for every single user query. Routing is how we fix th
Building a multi-turn conversational AI is surprisingly easy right now. Evaluating it is incredibly hard. For single-turn tasks, a standard static dataset works fine: you just feed in a prompt and ass
I’ve been reading through a recent paper titled "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search". It’s a provocative piece with a premise I normally love. The authors claim that simp
I’ve been spending the last few weeks messing around with open-weight models to build conversational interfaces. By now, the new reality is obvious: generating natural language is no longer the bottle
If you’ve built anything with LLMs in the past couple of years, you’ve probably wired up a Retrieval-Augmented Generation (RAG) pipeline. The playbook is burned into our brains: take a PDF, smash it i
I’ve been thinking a lot recently about the "chunking problem" in Retrieval-Augmented Generation. If you've played around with the llm CLI tool or built anything with vector embeddings, you've probabl
If you’ve spent any time building Retrieval-Augmented Generation (RAG) prototypes, you inevitably hit the exact same wall. You wire up a great embedding model, point it at an excellent local LLM, and
In an article I recently co-authored, we argued that a fundamental shift is underway in product design. The traditional principles of User Experience (UX), which rest on direct user control and manipulation, are becoming obsolete with the rise of tru...
For years, evaluating traditional machine learning models, while never simple, followed a well-trodden path. Your team knew the drill: assemble a labeled dataset, define success with metrics like precision and recall, and track performance. The core ...