After a long week of coding, you might think San Francisco builders would retreat to the Bay Area's mountains, beaches, or vibrant club scene. But in fact, the end of the week marks the start of an AI hackathon.
AI hackathons have become a fixture in San Francisco over the past few years. Every Saturday and Sunday, technologists gather to discuss the latest advances in AI, network, and, most importantly, turn their ideas into working demos. The events often offer cash prizes, cloud credits, and other rewards, but the real winners go home with startup prototypes.
“There's no better place in the world than San Francisco to build the most ambitious project of your life,” says Agency co-founder Alex Reibman. “You see these hackathon-style competitions all the time, but they're not cutthroat. They're competitive and collaborative at the same time.”
Last summer, at a San Francisco hackathon, Reibman decided to try his hand at building an AI agent that could scrape the web. Agents are a hot topic in Silicon Valley, where the AI boom is at its peak. The term isn't precisely defined, but it generally refers to AI-based bots that can perform tasks automatically, using interfaces and services that weren't originally designed for automation. They act as stand-ins for routine tasks that once required a human.
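The loop behind an agent like this can be sketched in a few lines of Python. This is an illustrative toy, not Agency's implementation: the `search_web` tool and the `pick_action` controller are invented for the example, and a real agent would call an LLM to choose each next step.

```python
# Minimal agent loop: a controller picks a tool, runs it, and feeds the
# result back into the history until the task is done.

def search_web(query: str) -> str:
    """Hypothetical tool: a real agent would scrape a page or call an API."""
    return f"results for '{query}'"

def pick_action(task: str, history: list) -> dict:
    """Stand-in for an LLM deciding what to do next based on prior results."""
    if not history:
        return {"tool": "search_web", "arg": task}
    return {"tool": "finish", "arg": history[-1]}

def run_agent(task: str) -> str:
    history = []
    while True:
        action = pick_action(task, history)
        if action["tool"] == "finish":
            return action["arg"]
        history.append(search_web(action["arg"]))

print(run_agent("AI hackathons"))
```

The failure modes Reibman describes tend to live in exactly this loop: the model picks the wrong tool, loops forever, or misreads a tool's output, which is why visibility into each step matters.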
But Reibman quickly ran into problems. “They were terrible,” he said in an interview. “The agents failed about 30 to 40 percent of the time, and they often failed in unexpected ways.”
To fix this, Reibman's team built an internal debugging tool to dig into the agent and see where things were going wrong. The agent worked a little better as a result, but it was the debugging tool itself that attracted attention and won the hackathon.
“We started showing the tool at hackathons and events in San Francisco, and people started asking for access to it,” Reibman says. “That was basically the validation I needed: instead of building agents ourselves, we should build tools that make it easy to build agents.”
So Reibman started Agency with co-founders Adam Silverman and Sean Chiu to give developers tools to see what their AI agents are actually doing and to catch where they go wrong. A year later, those tools became Agency's core product, the AgentOps platform, which is now used by thousands of teams every month, Reibman told TechCrunch. The startup has raised $2.6 million in pre-seed funding led by 645 Ventures and Afore Capital.
Chief Operating Officer Adam Silverman told TechCrunch that AgentOps is like “multi-device management for agents,” analyzing everything an agent does to make sure it doesn't get out of control.
“You have to understand if an agent is going to get out of control and identify what limitations you can put in place,” Silverman said in an interview. “A lot of the work is figuring out where the guardrails are and visually verifying that the agent is following them before putting it into production.”
The startup has partnered with AI model developers Cohere and Mistral, which also offer agent-building services, and customers can use the AgentOps dashboard to see how their agents interact with the world and how much each agent costs to run. Agency is model-agnostic, so it works with several different AI agent frameworks and integrates with popular tools like Microsoft's AutoGen, CrewAI, and AutoGPT.
Beyond the AgentOps dashboard, Agency also offers consulting services to help companies get started building their agents. (Reibman previously worked at consulting firm EY.) Agency wouldn't disclose the names of its clients, but did say that hedge funds, consultancies and marketing firms are using its tools.
For example, Reibman says Agency helped clients create AI agents that write blog posts about the companies they do business with, and those same clients now use the AgentOps dashboard to track agent performance and costs.
Large companies like OpenAI and Google are likely to develop agent products in the coming months, so AI startups like Agency will need to figure out how to work with those advances rather than against them.
“There are so many layers in the stack that it's unlikely that an LLM provider would try to capture them all,” Reibman says. “OpenAI and Anthropic are building agent builders, but there are all these layers around that to ensure a production-ready code base.”