Foundry
The Training and Evaluation Infrastructure for Web-Native Agents
The next step for AI is not more scale, but more agency. We’re moving beyond static models to agents that operate in the real world—and that world is the web. They're navigating SaaS tools, searching databases, and automating workflows. But today's agents are at a GPT-1 level of web intelligence. Silent failures are the norm, not the exception. Unforeseen layout shifts, DOM mutations, and race conditions cause agents to break in ways that are rarely caught and even more rarely understood.
Foundry addresses this by building high-fidelity simulations of real websites and workflows. We provide a rigorous, reproducible environment for testing agents on end-to-end tasks under realistic conditions.
Our Core Functionality:
- Deterministic Environments: We provide frozen content with website versioning, so you can run a million evals and be confident that any difference is due to agent performance and not web shift.
- State-Based Evaluation: We provide structured state JSON and handle state management on the backend, allowing you to define your own evaluation and reward functions with your own criteria.
- Informed Data Collection: When a failure occurs, we can collect demonstration data for behavioral cloning, or more similar simulation examples for on-policy RL.
This creates a critical feedback loop, enabling researchers to build reliable agents through iterative improvement. Foundry allows you to move beyond simple success/failure reporting and into a data-driven understanding of agent performance.
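As a rough illustration of state-based evaluation, a user-defined reward function over a structured state snapshot might look like the sketch below. Everything in it is an assumption made for the example: the JSON field names (`cart`, `order`, `status`), the task, and the helper names `order_submitted_reward` and `evaluate` are hypothetical and do not describe Foundry's actual schema or API.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical end-of-episode state snapshot. The field names are
# illustrative assumptions, not Foundry's real state schema.
FINAL_STATE_JSON = """
{
  "url": "/orders/1042",
  "cart": {"items": [{"sku": "A-100", "qty": 2}]},
  "order": {"id": 1042, "status": "submitted"}
}
"""


def order_submitted_reward(state: Dict[str, Any]) -> float:
    """Score a hypothetical task: 'order two units of A-100'.

    Returns 1.0 for a submitted order with the right item, 0.5 if the
    right item is in the cart but the order was never submitted, and
    0.0 otherwise.
    """
    items = (state.get("cart") or {}).get("items", [])
    order = state.get("order") or {}

    right_item = any(
        item.get("sku") == "A-100" and item.get("qty") == 2 for item in items
    )
    submitted = order.get("status") == "submitted"

    if right_item and submitted:
        return 1.0
    if right_item:
        return 0.5
    return 0.0


def evaluate(state_json: str, reward_fn: Callable[[Dict[str, Any]], float]) -> float:
    """Parse a state snapshot and apply a user-supplied reward function."""
    return reward_fn(json.loads(state_json))


if __name__ == "__main__":
    print(evaluate(FINAL_STATE_JSON, order_submitted_reward))
```

Because the reward is computed from the final state rather than from the agent's click trace, the same function can score any trajectory that reaches the goal, and partial credit (here, 0.5) shows where a failed run stalled.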
The economic implications are significant: over 60% of all global knowledge work is mediated through the browser. Agents that can master this environment will unlock a multi-trillion-dollar opportunity across support, sales, and internal operations.
Building these agents requires purpose-built infrastructure. That's Foundry.
- San Francisco. YC-backed.
- We're hiring exceptional researchers and engineers.
- Contact: founders @ foundryrl.com