
Launch Video: https://www.youtube.com/watch?v=E4_v-pY_4fs
Founded by Jerry Zhang & Cole Gawin
They met freshman year at USC and have been building together ever since, usually instead of going to class.
Before starting Lemma, they were engineers at two high-growth, AI-native startups: Tandem (AI for healthcare) and Chipstack (AI agents for chip design). At both companies, setting up evaluations meant clunky Retool dashboards and multiple engineers manually tweaking experiments. They built internal systems that automated both the evaluation runs themselves and the error-driven feedback loop. The result: a 2x improvement in accuracy and iteration speed.
They soon realized every AI company was reinventing the same internal tooling in-house. So they left college, joined YC, and are now bringing continuous learning infrastructure to everyone else.

AI agents don’t learn from their mistakes. In fact, they get worse with use.
In production, prompts and agents continuously degrade due to real-world input drift (new user behaviors or unseen edge cases). Agent performance can drop ~40% within a few weeks, and suddenly what worked in testing breaks in front of customers.
When that happens, engineers are forced to dig through logs, collect failing examples, and manually tweak prompts rather than building core product features.
That’s why the team built Lemma: the first end-to-end system that closes the loop between agent deployment and improvement.
Here's what that means:
Step 1: Lemma detects failed outcomes directly from live traffic and automatically pinpoints the exact cause in your agent chain.
Step 2: Lemma alerts you, and with one click, it runs targeted prompt optimizations to fix the failing behavior without any manual tracing or guesswork.
Step 3: Lemma gives you back an improved prompt and automatically opens a PR in your codebase, so your prompts can live where you want them. Alternatively, you can fetch the latest prompt from the Lemma dashboard at runtime (see the sketch below).
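To make the "fetch at runtime" path concrete, here is a minimal sketch of what pulling the most recently optimized prompt on each request could look like. The endpoint, prompt ID, credential, and response fields are illustrative assumptions for the sake of the example, not Lemma's actual API.

```python
# Hypothetical sketch: load the freshest optimized prompt at runtime instead
# of hardcoding it, so prompt improvements land without a redeploy.
# The URL, header, and response shape below are assumptions, not Lemma's real API.
import os
import requests

LEMMA_API_URL = "https://api.lemma.example/v1/prompts"   # hypothetical endpoint
API_KEY = os.environ["LEMMA_API_KEY"]                     # hypothetical credential

def fetch_prompt(prompt_id: str) -> str:
    """Fetch the latest optimized version of a managed prompt."""
    resp = requests.get(
        f"{LEMMA_API_URL}/{prompt_id}/latest",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["text"]  # assumed response field

# The agent uses whatever prompt is current at call time.
system_prompt = fetch_prompt("support-triage-agent")
```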
Plus, Lemma provides all the LLM eval and observability features you rely on, reimagined for continuous learning.



Teams using Lemma cut manual prompt iteration by 90%, resolve production drift in minutes instead of days, and improve model performance by ~2–5% every optimization cycle.