"Half of AI’s answers are brilliant, half aren’t. Atla trained a model to tell them apart."
TL;DR: Meet Selene, a state-of-the-art LLM Judge trained specifically to evaluate AI responses. Selene is the best model on the market for evals, beating all frontier models from leading labs across 11 commonly used evaluator benchmarks. Atla has announced the release of:
• API/SDK - Integrate Selene into your AI workflow
• Alignment Platform - Build custom evaluation metrics for your use case
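To make the API/SDK idea concrete, here is a minimal sketch of what wiring an LLM judge into a pipeline might look like. The client payload fields, model identifier, and response shape below are illustrative assumptions, not Atla's actual SDK:

```python
# Hypothetical sketch of calling an LLM-judge evaluation endpoint.
# Payload fields and response shape are assumptions for illustration,
# not Atla's real API.

def build_eval_request(model_input: str, model_output: str, criteria: str) -> dict:
    """Package one (input, output) pair with scoring criteria for a judge."""
    return {
        "model": "selene",         # assumed model identifier
        "input": model_input,      # the prompt the AI under test received
        "response": model_output,  # the answer being evaluated
        "criteria": criteria,      # what the judge should score against
    }

def parse_eval_response(raw: dict) -> tuple[float, str]:
    """Extract a numeric score and a natural-language critique."""
    return float(raw["score"]), raw["critique"]

request = build_eval_request(
    model_input="What is the boiling point of water at sea level?",
    model_output="Water boils at 100 °C (212 °F) at sea level.",
    criteria="Score 1-5 for factual accuracy.",
)

# A judge typically returns both a score and a critique explaining it,
# which is what makes it more useful than a bare pass/fail check.
score, critique = parse_eval_response(
    {"score": 5.0, "critique": "Correct and complete."}
)
```

In practice the request would be sent over HTTP and the judge's reply parsed from the response body; the two helpers above just show the shape of the data on each side of that call.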
Atla is a small, highly technical team of AI researchers and engineers, with folks from leading AI labs and startups. Their mission is to enable the safe development of AGI. As models grow more powerful, the field needs a ‘frontier evaluator’ that keeps pace with frontier AI. The team at Atla sees Selene as a stepping stone toward scalable oversight of powerful AI.
The Problem
Generative AI is unpredictable. Even the best models occasionally hallucinate, contradict themselves, or produce unsafe outputs. Many teams rely on the same general-purpose LLMs to evaluate AI outputs, but these models weren’t trained to be judges. That leads to:
• Inaccurate evaluations and inefficient iteration cycles in development.
• Risky, unpredictable AI behavior in production.
The Solution
A SOTA model for evals: Selene outperforms all frontier models (OpenAI’s o-series, Claude 3.5 Sonnet, DeepSeek R1, etc.) across 11 benchmarks for scoring, classifying, and pairwise comparisons.
A platform to align the evaluator: Adapt Selene to your exact evaluation criteria—like “detect medical advice,” “flag legal errors,” or “judge whether the agent upgraded its workflow correctly.”
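A custom criterion like “detect medical advice” typically reduces to a binary judge prompt plus a verdict parser. A minimal sketch of that pattern follows; the template wording and helper names are my own illustration, not the platform's:

```python
# Sketch of a custom binary evaluation criterion (e.g. "detect medical
# advice"). The prompt template is an illustrative assumption, not
# Atla's actual alignment format.

JUDGE_TEMPLATE = """You are an evaluator. Criterion: {criterion}
Response to evaluate:
{response}
Answer strictly "yes" or "no", then briefly explain."""

def make_judge_prompt(criterion: str, response: str) -> str:
    """Fill the template with the criterion and the response under test."""
    return JUDGE_TEMPLATE.format(criterion=criterion, response=response)

def parse_binary_verdict(judge_reply: str) -> bool:
    """Map the judge's yes/no reply to a boolean flag."""
    return judge_reply.strip().lower().startswith("yes")

prompt = make_judge_prompt(
    criterion="Does the response contain medical advice?",
    response="You should take 400 mg of ibuprofen every four hours.",
)
flagged = parse_binary_verdict("Yes - this recommends a specific dosage.")
```

The same pattern extends to the other examples above (“flag legal errors,” “judge whether the agent upgraded its workflow correctly”): only the criterion text changes, while the prompt-and-parse plumbing stays fixed.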
Selene works seamlessly with popular frameworks like DeepEval (YC W25) and Langfuse (YC W23) — just add it to your pipeline. And it runs faster than GPT-4o and Claude 3.5 Sonnet.