The agentic flywheel: building the next decade of product differentiation in B2B Tech

By Chris Kindt, Head of Value Creation at Hg

B2B SaaS vendors are at the start of a generational value creation opportunity: agents let software move beyond assisting the work to actually doing it, which pushes vendors out of the IT budget and into the salary budget sitting behind every function, budgets that typically run 10x larger.

Vertical SaaS vendors enjoy a head start in capturing it. The complex domain knowledge they have built up over years, the proprietary customer data, and the customer distribution at scale are all advantages AI-native challengers cannot manufacture quickly.

But that is a head start, not a finishing position. It buys the right to build the next set of differentiation capabilities: agentic products that customers adopt at scale, the proprietary action data their adoption generates, and the golden data sets, evals and LLM Ops infrastructure that turn that data into compounding product advantage. We’re calling this the agentic flywheel. Get it spinning early and it becomes the moat that defines the next decade. Wait, and either an AI-native challenger or your customer’s rough prototype on a horizontal platform will spin theirs first.

This is the view from our 1,600 live AI projects across the Hg portfolio, more than 100 agentic product builds, and a decade of building out our Hg AI and Data value creation team.

From system of record to system of action

For two decades, vertical SaaS has been the system of record for a profession or industry: the place where the work is logged, reported, calculated, and reconciled. That is already a valuable role, and it has automated a meaningful slice of what the function used to do by hand.

But recording, calculating, and reporting is not the same as getting the task done. Around every interaction with a system of record sits the real work of the function: gathering the inputs, chasing the missing context, drafting the document, the email, the recommendation, reviewing what came out, compiling the reports, routing it onward. The system of record supports that work, but it does not do it.

Agents can change that. They can read the case file, draft the letter, schedule the build, triage the ticket, close the books. Not perfectly, not yet, and not without supervision, but increasingly enough that the task gets done by software rather than merely supported by it. That is the system of action: software that does the work rather than recording it.

It is what opens the salary budget, and it also reorders who wins in each vertical, because the capabilities that built leadership in the system of record are not the same capabilities that win the system of action.

The head start - and its limits

Three advantages set vertical SaaS incumbents apart at the starting line.

Domain expertise. Years of watching how the work gets done in a specific profession: the edge cases, the regulatory wrinkles, the shortcuts practitioners use but never document.

Proprietary data. In some verticals this is genuinely unique, cross-customer network data only an incumbent at scale could have assembled. Even where it is not, incumbents hold customer records, transactions, configurations, and integration patterns no foundation model has seen, and no challenger can replicate from public sources.

Customer trust and distribution. Customers, and in regulated verticals their regulators, have spent years relying on you with sensitive data and mission-critical workflows. That trust is what lets you put an early agent in front of thousands of customers at once and have them actually try it.

Together these advantages let you do the one thing that matters most in this phase: get agentic products into customers’ hands early, and get adoption at scale. That is the whole point of the head start. It is not a moat by itself. It is the chance to start building one.

The asset that then compounds is the action data that captures the logic of work: the reasoning traces, corrections, outcomes, and edge cases generated when your agents are used at scale in production. This is what improves the next version of the agent, and the version after that. No challenger working from public data alone can replicate it.

The head start lets you start spinning that flywheel sooner.

The competition will not wait

Vertical AI-native startups are unencumbered by legacy product. They are free to design for the agentic workflow from day one, and increasingly well-resourced to do it. Harvey has just published a new benchmark for long-running legal agents: 1,200 tasks across 24 practice areas, evaluated against 75,000 rubrics written by lawyers, with a team of 23 named contributors blending AI engineering and legal domain expertise. That is the scale of investment a single AI-native challenger is now putting into eval infrastructure alone. If competitors of this calibre reach scaled customer use first, they start their own flywheel and the incumbent head start dissolves.

Sophisticated customers are the second threat, and the more underestimated one. Many are already building their own agents on horizontal platforms: Claude, Copilot, ChatGPT, whatever sits on the desktop. The result might be rough, brittle, and something they would rather not maintain. Given a real choice, they would prefer an existing partner to maintain and develop it. But the risk is timing: once a workflow forms around the home-grown agent, the switching cost is the habit, not the technology, and habit is hard to dislodge.

The unglamorous work that wins

Initial CEO and boardroom efforts have rightly focused on product strategy: identifying which workflows to extend, how the core SaaS product becomes a compelling system of action, where the first agents should land. That work matters and has to happen first.

What now needs to follow, fast, is focus on the operational capabilities that turn an early agent into a durable competitive position, and the ongoing grind to build effective agents and have customers adopt them.

Building effective agents requires LLM Ops and evals, embedded in the product development cycle and kept running as models and agent behaviours change underneath them. That means the instrumentation that captures every agent interaction, the labelling pipelines, the trace storage, the regression tests, and the feedback loops back into the next model build. It also means the eval suites that tell you whether your agent meets the quality bar, whether the latest model release is sufficient to launch a withheld feature, and whether your latest release is genuinely better than the last.
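To make the three eval questions concrete, here is a minimal sketch of a release gate in Python. It is an illustration under stated assumptions, not a description of any particular vendor's tooling: the names `EvalResult` and `release_gate`, the 0 to 1 rubric scale, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EvalResult:
    case_id: str   # identifier of a golden-set case (hypothetical naming)
    score: float   # rubric score in [0, 1] from a domain reviewer or grader


def release_gate(baseline, candidate, quality_bar=0.8, max_case_drop=0.1):
    """Compare a candidate agent release with the current one on shared cases.

    Returns (ok, reasons). The thresholds are illustrative assumptions.
    """
    base = {r.case_id: r.score for r in baseline}
    cand = {r.case_id: r.score for r in candidate}
    shared = sorted(base.keys() & cand.keys())
    if not shared:
        return False, ["no shared golden-set cases to compare"]

    reasons = []
    base_mean = mean(base[c] for c in shared)
    cand_mean = mean(cand[c] for c in shared)

    # Question 1: does the candidate meet the quality bar?
    if cand_mean < quality_bar:
        reasons.append(f"mean score {cand_mean:.2f} is below the bar {quality_bar:.2f}")

    # Question 2: is the candidate at least as good as the last release overall?
    if cand_mean < base_mean:
        reasons.append(f"mean score fell from {base_mean:.2f} to {cand_mean:.2f}")

    # Question 3: did any single case regress by more than the tolerance?
    regressed = [c for c in shared if base[c] - cand[c] > max_case_drop]
    if regressed:
        reasons.append("regressed cases: " + ", ".join(regressed))

    return (not reasons), reasons
```

In practice the scores would come from rubrics written by domain experts and applied to production traces; the point of the sketch is only that "is the next version better?" becomes a check that can run on every release.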

Driving adoption also requires new practices. Agentic pricing is shifting from seats to consumption and outcomes, which means revenue now depends on the agent being actively used inside the customer's workflow and demonstrably delivering value. Commercial success depends on a function that can get the agent live, prove value in production, and keep it performing against contractually meaningful metrics. That is not a job for traditional customer success or professional services. It requires engineers sitting alongside customers and coding side-by-side with their teams, instrumenting their workflows, tuning the agent against real data, surfacing the edge cases that evals never catch in the lab, and closing the loop back into the product. Each deployment then becomes reusable product for the next ten customers. Without this capability, usage or outcome-based pricing becomes a commercial liability rather than a growth engine.

All this requires new specialist talent, organised into dedicated teams that most incumbents will need to build deliberately. Technical experts to stand up and run the eval and LLM Ops infrastructure; product and domain experts to author the rubrics, label edge cases in production, and define what "better" actually means for the product; and forward deployed engineers, who get critical agentic deployments live and working inside the customer's environment, and whose highest-leverage work over time is extracting the tacit knowledge that lives with subject matter experts and turning it into evals and product improvements that compound across customers.

Fin (formerly Intercom) built this new muscle for their Support agents: a 60+ person AI Group running evals and infrastructure, alongside a forward deployed engineering team that embeds with strategic customers. Harvey's 23-person eval team, blending AI engineers and lawyers, paired with a growing forward deployed engineering function, is what good looks like. Two of the most-watched AI-native companies in B2B software, in different verticals, have converged on the same answer.

This is not back-office plumbing; it is the strategy made operational. You cannot improve an agent from action data without an eval suite that tells you whether the next version is better. You cannot capture action data at scale without the infrastructure to log, label, and route it. Skip this work and you have shipped agents without the means to compound.

This is a CEO and board commitment, not a CTO line item, and one that deserves monthly tracking, especially in these early innings. Agent adoption and performance belong at the top of the monthly board pack: activated customers, weekly active usage, share of target workflows running through the agent, and the agent’s efficacy at completing outcomes, scored on a regular cadence with the trajectory visible to the board. Boards that anchor solely on revenue will get ERR, not ARR: experimental recurring revenue, the kind that turns over the moment the renewal cycle arrives and the customer realises the agent never quite worked. Real ARR follows real adoption and real agentic value delivered - in that order.
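As an illustration of how those four board-pack numbers could be derived from logged agent activity, here is a minimal Python sketch. The record shape (`AgentRun`), its field names, and the metric definitions are assumptions made for this example; each business will need to define "activated" and "outcome completed" for its own workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    customer_id: str
    week: int                # reporting week of the run (hypothetical field)
    via_agent: bool          # True if the target workflow ran through the agent
    outcome_completed: bool  # True if the run achieved the intended outcome


def board_metrics(runs, current_week):
    """Summarise adoption and efficacy from a list of target-workflow runs."""
    agent_runs = [r for r in runs if r.via_agent]

    # Customers who have run the agent at least once, ever.
    activated = {r.customer_id for r in agent_runs}
    # Customers who ran the agent in the current reporting week.
    weekly_active = {r.customer_id for r in agent_runs if r.week == current_week}

    # Share of all target-workflow runs that went through the agent.
    workflow_share = len(agent_runs) / len(runs) if runs else 0.0
    # Share of agent runs that completed the intended outcome.
    efficacy = (
        sum(r.outcome_completed for r in agent_runs) / len(agent_runs)
        if agent_runs
        else 0.0
    )

    return {
        "activated_customers": len(activated),
        "weekly_active_customers": len(weekly_active),
        "agent_workflow_share": workflow_share,
        "agent_efficacy": efficacy,
    }
```

Tracked month over month, the same four numbers give the board the trajectory rather than a single snapshot.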

Where to start

Five moves for the CEO and board to take now.

Target the customer outcomes that start to shift your proposition towards becoming ‘the system of action’. Not the easiest products to ship, but the ones that really matter: where you have the deepest data and trust, and where you can extend your software to get the customer’s task done end to end.

Ship before it feels polished. Early users are how the product gets good: real workflows, real failures, and the feedback that shapes the next version. Derisk with design partners or beta programmes to secure that early usage and the case studies that come with it. And your customers are already experimenting on horizontal platforms anyway, so every month you wait is a month their habits form somewhere else.

Embed engineers with your first customers. Adoption does not happen in the abstract. Your highest-stakes deployments need strong technical people alongside the operators who own the outcome, wiring the agent into real workflows, tuning against real data, and feeding what they learn back into the product and the eval suite. Start with two or three on the most strategic accounts, and treat it as a permanent capability, not a launch-phase workaround. Without it, outcome-based pricing is a promise you cannot keep.

Stand up evals and LLM Ops at launch, not after. The instrumentation, trace storage, labelling pipelines, and eval suites need to be live the day the first agent ships. Without them, the first six months of usage data is wasted: no baseline, no regression signal, no way to tell whether the next model release is better or worse. The capability has to be ready on day one, not retrofitted six months in.

Run the review loop with domain experts inside the product organisation. Eval rubrics, edge cases, and the definition of "better" cannot come from engineers alone. Much of this talent already sits inside the business: practitioners and customer-facing teams who understand the work and need to be redirected into the eval and product loop. Put them in the same room as the engineers, on a regular review cadence, scoring real outputs against real rubrics. The cadence is the point. Agents drift, models change underneath you, and customer workflows evolve. Treat this as a permanent rhythm, not a launch-phase project.

This piece has deliberately focused on building the product differentiation muscle. It is the foundation of the agentic flywheel and the prerequisite for everything else: without a differentiated agentic product, there is nothing meaningful to sell, deploy, or price for. But the full shift from SaaS-first to agents-first runs wider. It reshapes the commercial and pricing model, the go-to-market and customer adoption motion, and the back-office and operating model that supports both. Each is its own concerted CEO agenda, and each will demand the leadership and organisational capability to deliver it. We will return to them.

How we can help

The Hg Value Creation team and the Catalyst programme are already working alongside portfolio companies on every part of this agenda.

Catalyst can help you stand up agentic product builds at speed, pressure-test which workflows to attack first, put the LLM Ops and eval infrastructure in place, plug you into the Anthropic partnership, and connect you to portfolio peers already further along the journey.

Reach out to AI@hgcapital.com
