Cascade Launches: Evaluation Infrastructure to Run AI Agents Reliably

Cascade recently launched!

Launch YC: Cascade: Evaluation Infrastructure to Run AI Agents Reliably

"Operational Intelligence safe and personalized"

TL;DR:

Companies are trying to deploy general-purpose AI agents onto highly specific workflows. The agents fail in subtle ways, teams can’t see the full scope of those failures, and the agents are vulnerable.

Cascade learns from real agent behavior in production and turns it into training signal, allowing the agent to continuously improve after deployment.

We believe every company will end up with its own operational intelligence - models and agents specialized to how that organization actually works and the data it uniquely produces.

Founded by Adam AlSayyad & Haluk Cem Demirhan

They are best friends from UC Berkeley who were working on different research problems around agent systems and reliability. Over time they realized that their work pointed to the same underlying issue, and that almost everyone deploying agents would face the same problem. They gave up their return offers and paused PhD paths to build Cascade.

Haluk previously built production monitoring infrastructure and scaled agent systems at companies like Netflix and Amazon. His research at BAIR Lab covered long-horizon memory optimization and failure mode taxonomies for AI agents. Haluk studied Computer Science and Mathematics at UC Berkeley.

Adam previously conducted research at BAIR Lab, where his work focused on graph reasoning models and agentic safety, under some of the world's leading ML and AI safety researchers. He studied Computer Science at UC Berkeley.

🎬 Launch Video:

https://youtu.be/MNVVHCZHwc4

🚨 The Problem:

Right now, organizations deploy generalist agents into custom processes. An agent that performs well on benchmarks might fail terribly in production. Teams understand these pains:

Knowing the agent is failing but not understanding why
Writing evals that cover an immensely large scope
Hitting a performance plateau
Getting prompt injected

Teams inspect logs, tweak prompts, and write rubrics, but they’re mostly guessing. As a result, they can’t deploy agents where they matter most.

🚀 Their Solution:

Cascade treats agent execution as data.

They observe real production runs and train evaluator models that learn what correct behavior looks like inside a company’s workflows. They analyze reasoning steps, tool usage, and outcomes to detect failure modes, threats, and reliability issues automatically.

Those evaluations are then converted into structured feedback that can improve rubrics, prompts, and models.

Learn More

Visit runcascade.com to learn more.

Follow Cascade on LinkedIn & X.

About the author

David J. Phillips

CEO & Founder

David is the CEO & Founder of Fondo (YC W18). He is an angel investor in Rippling, Flexport, LiquidDeath, and 100+ other startups. David began his career as an accountant at Deloitte before learning to code and becoming a founder. Previously, he co-founded Hackbright, which trained and placed 1,000+ software engineers at tech companies including Slack, Disney, and Uber, and which was acquired by Capella Education (NASDAQ: CPLA) in 2016.
