TL;DR: AI workers must learn how to use computers, browsers, and software interfaces to deliver real-world value. Today, computer use agents are unreliable and inaccurate. Halluminate is building realistic sandboxes and datasets to train better computer/browser use AI.
Wyatt and Jerry met while studying CS at Cornell and have been living and working together for over 7 years.
Jerry previously led product/research at Capital One Labs, where he launched one of the first AI agents in banking. Wyatt was previously a Cornell Milstein scholar and did large-scale data engineering for two early-stage startups in NYC.
They faced these problems first-hand while building evals for browser/computer use agent companies. They didn’t see any good solutions, so they are building one themselves.
They are in SF for the foreseeable future. Contact them here if you wanna grab a coffee, go for a walk, or play pick-up basketball!
OpenAI’s Operator and Anthropic’s Claude Computer Use offer a glimpse of a future where AI can take control of digital interfaces and do real work.
Today, performance is inaccurate and unreliable. Two bottlenecks stand in the way of improvement.
First, reliance on real-world testing: researchers today train and test their browser- and computer-use agents on real-world sites. This is:
Unsafe - agent actions have real consequences and can affect live data
Slow - challenging to parallelize
Difficult to reproduce - the real world is dynamic; the starting conditions cannot be “reset” easily, and data changes
Noisy - proxies, captchas, auth/login, ads, etc. make clean testing and training difficult
Second, a lack of high-quality data: high-quality data provides the basis for evaluation and benchmarking, but producing it at scale is expensive, time-consuming, and logistically exhausting.
At Halluminate, they are building a suite of products and services to address both these issues.
Realistic sandboxes – fully managed, parallelizable environments modeled after popular systems (e.g., Salesforce, Slack, ticketing software, websites) for safe and accurate computer/browser-use training and testing (see the sketch after this list)
Datasets – proprietary benchmarks and datasets
Evaluations – high-quality error analysis powered by expert annotators to identify and prioritize the biggest failure modes for their customers
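To make the "safe, resettable, parallelizable" idea concrete, here is a minimal Python sketch of what training or testing against such a sandbox could look like. SandboxEnv, reset, step, and run_episode are hypothetical names used purely for illustration; they are not Halluminate's actual API.

```python
# Hypothetical illustration only: SandboxEnv, reset(), and step() are invented
# names for a generic sandbox interface, not Halluminate's actual API.
from concurrent.futures import ThreadPoolExecutor


class SandboxEnv:
    """Toy stand-in for a hosted environment modeled after a real web app."""

    def __init__(self, template: str):
        self.template = template  # e.g. "salesforce", "slack", "ticketing"
        self.state = {}

    def reset(self) -> dict:
        # Restore a known starting snapshot, so every run is reproducible.
        self.state = {"template": self.template, "steps": 0}
        return self.state

    def step(self, action: str) -> dict:
        # Apply an agent action (click, type, navigate) with no real-world side effects.
        self.state["steps"] += 1
        return self.state


def run_episode(template: str, actions: list[str]) -> dict:
    env = SandboxEnv(template)
    env.reset()
    for action in actions:
        env.step(action)
    return env.state


# Isolated, resettable episodes can run safely in parallel.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(
        lambda template: run_episode(template, ["open", "click", "submit"]),
        ["salesforce", "slack", "ticketing", "salesforce"],
    ))
```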
Their customers see:
Improved browser- and computer-use agent performance
New and emergent frontier agent capabilities
Data-driven prioritization that significantly accelerates development
Increased revenue/sales via marketing from public benchmarks
🌐 Their Mission
Unlock significant advancements in browser- and computer-use AI capabilities. They believe this is necessary to usher in a new generation of use cases, startups, and productive AI workers.
🙏 Asks
Introductions to researchers and founders building computer/browser use agents, experts in RL & post-training, and companies that sell training data to large labs.