"Half of AI’s answers are brilliant, half aren’t. Atla trained a model to tell them apart."
TL;DR: Meet Selene, a state-of-the-art LLM Judge trained specifically to evaluate AI responses. Selene is the best model on the market for evals, beating all frontier models from leading labs across 11 commonly used evaluator benchmarks. Atla has announced the release of:
• API/SDK - Integrate Selene into your AI workflow
• Alignment Platform - Build custom evaluation metrics for your use case
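To make the API/SDK idea concrete, here is a minimal sketch of what wiring an LLM judge into a pipeline might look like. The client payload fields, model identifier, and response shape below are illustrative assumptions, not Atla's actual SDK:

```python
# Hypothetical sketch of calling an LLM-judge evaluation endpoint.
# Payload fields and response shape are assumptions for illustration,
# not Atla's real API.

def build_eval_request(model_input: str, model_output: str, criteria: str) -> dict:
    """Package one (input, output) pair with scoring criteria for a judge."""
    return {
        "model": "selene",         # assumed model identifier
        "input": model_input,      # the prompt the AI under test received
        "response": model_output,  # the answer being evaluated
        "criteria": criteria,      # what the judge should score against
    }

def parse_eval_response(raw: dict) -> tuple[float, str]:
    """Extract a numeric score and a natural-language critique."""
    return float(raw["score"]), raw["critique"]

request = build_eval_request(
    model_input="What is the boiling point of water at sea level?",
    model_output="Water boils at 100 °C (212 °F) at sea level.",
    criteria="Score 1-5 for factual accuracy.",
)

# A judge typically returns both a score and a critique explaining it,
# which is what makes it more useful than a bare pass/fail check.
score, critique = parse_eval_response(
    {"score": 5.0, "critique": "Correct and complete."}
)
```

In practice the request would be sent over HTTP and the judge's reply parsed from the response body; the two helpers above just show the shape of the data on each side of that call.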
Atla is a small, highly technical team of AI researchers and engineers, with folks from leading AI labs and startups. Their mission is to enable the safe development of AGI. As models grow more powerful, the field needs a ‘frontier evaluator’ that keeps pace with frontier AI. The team at Atla sees Selene as a stepping stone toward scalable oversight of powerful AI.
The Problem
Generative AI is unpredictable. Even the best models occasionally hallucinate, contradict themselves, or produce unsafe outputs. Many teams rely on the same general-purpose LLMs to evaluate AI outputs, but these models weren’t trained to be judges. That leads to:
• Inaccurate evaluations and inefficient iteration cycles in development.
• Risky, unpredictable AI behavior in production.
The Solution
A SOTA model for evals: Selene outperforms all frontier models (OpenAI’s o-series, Claude 3.5 Sonnet, DeepSeek R1, etc.) across 11 benchmarks for scoring, classifying, and pairwise comparisons.
A platform to align the evaluator: Adapt Selene to your exact evaluation criteria—like “detect medical advice,” “flag legal errors,” or “judge whether the agent upgraded its workflow correctly.”
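A custom criterion like “detect medical advice” typically reduces to a binary judge prompt plus a verdict parser. A minimal sketch of that pattern follows; the template wording and helper names are my own illustration, not the platform's:

```python
# Sketch of a custom binary evaluation criterion (e.g. "detect medical
# advice"). The prompt template is an illustrative assumption, not
# Atla's actual alignment format.

JUDGE_TEMPLATE = """You are an evaluator. Criterion: {criterion}
Response to evaluate:
{response}
Answer strictly "yes" or "no", then briefly explain."""

def make_judge_prompt(criterion: str, response: str) -> str:
    """Fill the template with the criterion and the response under test."""
    return JUDGE_TEMPLATE.format(criterion=criterion, response=response)

def parse_binary_verdict(judge_reply: str) -> bool:
    """Map the judge's yes/no reply to a boolean flag."""
    return judge_reply.strip().lower().startswith("yes")

prompt = make_judge_prompt(
    criterion="Does the response contain medical advice?",
    response="You should take 400 mg of ibuprofen every four hours.",
)
flagged = parse_binary_verdict("Yes - this recommends a specific dosage.")
```

The same pattern extends to the other examples above (“flag legal errors,” “judge whether the agent upgraded its workflow correctly”): only the criterion text changes, while the prompt-and-parse plumbing stays fixed.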
Selene works seamlessly with popular frameworks like DeepEval (YC W25) and Langfuse (YC W23) — just add it to your pipeline. And it runs faster than GPT-4o and Claude 3.5 Sonnet.