What 200 engineers taught us about building AI products


Just over a month ago, we brought together 200 engineers from 40 product and engineering teams in London, Paris, and New York and gave them 48 hours to build agentic AI products from scratch.

Every portfolio team came with real items from their AI roadmaps: agentic builds they'd been putting off because they felt too ambitious for a normal sprint cycle. Everything had to run against their own production data.
To help, we partnered with Anthropic, who shared best practice on building agents in production. Engineers and product leaders from Cognition, Stripe, Ramp, Datadog, and Pendo also came along to share knowledge and hack alongside the teams.

The goal was to show engineering and product teams what's now possible and accelerate collective learning. In other words, put the right people in a room with expert support, compress the timeline, and let them see for themselves how fast they can go.

The foundations were the easy part

Nearly every team, regardless of vertical, independently arrived at multi-agent architectures: an orchestrator agent delegating to specialist agents, each with a bounded scope.
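To make the pattern concrete, here is a minimal Python sketch of an orchestrator delegating to specialists with a bounded scope. It is an illustration only, not any team's actual build: the names `Specialist`, `Orchestrator`, `billing_agent`, and `onboarding_agent` are hypothetical, and plain functions stand in for the model-backed agents a real system would call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class Specialist:
    """A hypothetical specialist agent with an explicit, bounded scope."""
    name: str
    topics: FrozenSet[str]          # the only topics this agent may handle
    handle: Callable[[str], str]    # stand-in for a model-backed agent call


class Orchestrator:
    """Routes each request to the one specialist whose scope covers it."""

    def __init__(self, specialists: Iterable[Specialist]) -> None:
        self._by_topic: Dict[str, Specialist] = {}
        for specialist in specialists:
            for topic in specialist.topics:
                self._by_topic[topic] = specialist

    def route(self, topic: str, request: str) -> str:
        specialist = self._by_topic.get(topic)
        if specialist is None:
            # Out-of-scope work is escalated rather than guessed at.
            return f"escalate: no specialist covers '{topic}'"
        return f"{specialist.name}: {specialist.handle(request)}"


# Assumed example agents; real ones would wrap model calls and tools.
billing = Specialist("billing_agent", frozenset({"invoice", "refund"}),
                     lambda r: f"handled {r!r}")
onboarding = Specialist("onboarding_agent", frozenset({"signup"}),
                        lambda r: f"handled {r!r}")
orchestrator = Orchestrator([billing, onboarding])
```

The design choice worth noting is that scope lives in data (`topics`) rather than in prompts, so anything outside every specialist's remit is escalated explicitly instead of being attempted.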

Nobody prescribed this. Waystone’s fund distribution agent, Empyrean's stress-testing tools for bank asset-liability management, and team.blue's onboarding portal spanning multiple product ecosystems all landed on the same pattern.

A fund distribution agent and a bank stress-testing tool share nothing in their business rules, data sources, or compliance requirements. But the underlying foundations (the engineering patterns, the quality frameworks, the monitoring infrastructure, the way outputs are validated) were remarkably consistent across all of them.

Where things diverge, and where real customer value lies, is the context that goes into these foundations. By this I mean the domain expertise, customer proximity, vertical knowledge, and proprietary data that each company brings and has accumulated over many years, sometimes decades. That's not something any team can replicate in a hackathon or anywhere else.

Waystone's team showed what this looks like in practice. They built an agent that processes fund distribution calculations, using AI for language understanding and rule extraction from legal documents, then handing off to a deterministic engine for the arithmetic. Output: 100% accurate validation against a real Q4 2024 production workbook. Fund administrators currently spend hours on manual reconciliation for each distribution cycle. This is exactly the kind of high-value, high-accuracy task where agentic software can shift work from the salary line into the product.
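The split between model-driven extraction and deterministic arithmetic can be sketched in a few lines of Python. This is not Waystone's implementation: the rule shape (`rate_per_unit`), the function names, and the per-unit calculation are all assumptions made for illustration. The model's job ends once it has produced structured fields, and everything after that is exact decimal arithmetic that can be checked against a reference.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List


@dataclass(frozen=True)
class DistributionRule:
    """Hypothetical rule shape; real fund documents are far richer."""
    rate_per_unit: Decimal


def parse_extracted_rule(raw: Dict[str, str]) -> DistributionRule:
    """Validate fields a language model extracted from a legal document.

    The model call itself is out of scope here; `raw` is assumed to be
    its structured output.
    """
    rate = Decimal(str(raw["rate_per_unit"]))
    if rate < 0:
        raise ValueError("rate_per_unit must be non-negative")
    return DistributionRule(rate_per_unit=rate)


def compute_distributions(rule: DistributionRule,
                          holdings: Dict[str, Decimal]) -> Dict[str, Decimal]:
    """Deterministic engine: exact decimal arithmetic, no model involved."""
    cent = Decimal("0.01")
    return {
        investor: (units * rule.rate_per_unit).quantize(cent, rounding=ROUND_HALF_UP)
        for investor, units in holdings.items()
    }


def mismatches(computed: Dict[str, Decimal],
               expected: Dict[str, Decimal]) -> List[str]:
    """Investors whose computed amount differs from the reference workbook."""
    return sorted(k for k in expected if computed.get(k) != expected[k])


# Assumed example data, standing in for a production workbook.
rule = parse_extracted_rule({"rate_per_unit": "0.125"})
holdings = {"investor_a": Decimal("100"), "investor_b": Decimal("3")}
computed = compute_distributions(rule, holdings)
```

Comparing `computed` against known-good figures with `mismatches` gives a yes-or-no validation result, which is the kind of check a claim like "100% accurate" needs behind it.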

Measuring quality is what separates products from demos

With tools like Replit and Lovable, anyone can build a demo. But in our verticals, there's a big gap between a demo and a real product that works consistently, that you can prove isn't degrading, and that runs at a viable cost. We designed the hackathon around that conviction, scoring every team against a rubric covering twelve dimensions of AI engineering maturity, from quality measurement and monitoring to cost tracking and guardrails. Multiple teams reached the highest tier in two days, meaning they had automated quality checks running on every change and blocking any release that fell below the bar.
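A release gate of the kind described above can be very small. The Python sketch below is an assumption-laden illustration rather than the rubric's actual tooling: `pass_rate`, `release_allowed`, the exact-match scoring, and the 0.9 threshold are all hypothetical, and `toy_agent` stands in for a real agent.

```python
from typing import Callable, Sequence, Tuple

Case = Tuple[str, str]  # (input, expected output)


def pass_rate(agent: Callable[[str], str], cases: Sequence[Case]) -> float:
    """Fraction of evaluation cases the agent answers exactly right."""
    if not cases:
        raise ValueError("an empty evaluation set cannot gate a release")
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)


def release_allowed(agent: Callable[[str], str],
                    cases: Sequence[Case],
                    threshold: float = 0.9) -> bool:
    """Block the release when quality falls below the agreed bar."""
    return pass_rate(agent, cases) >= threshold


# Assumed toy agent and cases; a real suite would call the product itself.
def toy_agent(prompt: str) -> str:
    return prompt.upper()


cases = [("a", "A"), ("b", "B"), ("c", "x")]
```

Run in CI on every change, a check like `release_allowed(toy_agent, cases)` would turn a quality regression into a failed build rather than a customer-facing surprise.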

Most teams got monitoring flowing quickly, which was encouraging. The area with the most room to grow was closing the loop: reviewing interaction logs, clustering errors, and feeding those patterns back into the next iteration. That's where the real compounding happens, and it's where many of our teams will be focusing in the months ahead. We've also published the full rubric as a companion to this piece.
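As a rough illustration of closing the loop, the Python sketch below groups logged errors by a normalized signature and counts them. This is a deliberately crude stand-in for real error clustering, and the log shape (a dict with an `error` field) and the function names are assumptions.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def signature(message: str) -> str:
    """Collapse numbers and case so near-identical errors group together."""
    return re.sub(r"\d+", "<n>", message.lower()).strip()


def top_error_clusters(logs: Iterable[Dict[str, Optional[str]]],
                       limit: int = 3) -> List[Tuple[str, int]]:
    """Most frequent error signatures across interaction logs."""
    counts = Counter(signature(entry["error"])
                     for entry in logs if entry.get("error"))
    return counts.most_common(limit)


# Assumed log entries; successful interactions carry no error.
logs = [
    {"error": "Timeout after 30s"},
    {"error": "timeout after 45s"},
    {"error": None},
    {"error": "Missing field id 7"},
]
clusters = top_error_clusters(logs)
```

The most frequent signatures are natural candidates for new evaluation cases, which is how patterns in the logs feed the next iteration.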

Cross-pollination is the portfolio advantage that compounds

The individual builds were impressive, but it was the speed of knowledge transfer across companies that showcased the value of the event. A guardrails architecture built for financial services post-trade reconciliation could be adapted by a team building a healthcare integration product. A quality pipeline from one team became a reference implementation for three others. The rubric gave everyone a shared vocabulary for what good looks like, and that common language made the transfer fast.

No single company can create that environment alone. Across 60 portfolio companies, each building AI products in their own domain, the learning rate accelerates because the engineering patterns are shared even when the domains aren't. Every engineer left with a sharper sense of what production-grade AI engineering requires and took that back to their company the following week.

What happens next

Products and features from the event are already being shipped at several companies. The Hg Catalyst team is working with engineering leaders on the transition to production: architecture decisions, eval frameworks, go-to-market readiness. The advantage goes to whoever learns fastest, and the question now is how fast these products reach customers.

We’ll be sharing more updates from some of the hackathon teams over the coming months, so keep an eye on the Hg Catalyst blog.
