A World of Automatable Domains

Published: November 12, 2025
Location: SF

We are supervisors of increasingly capable and powerful reasoning models, and we need to build the tools that shape how these models learn from the world.

Fleet's description here is apt: "We aim to accelerate the shift towards the allocation economy, empowering humanity to transition from doing work to directing it—a capability previously reserved for the top ~1% of the global economy."

In software land, evals and tool-use capabilities are the last frontier to conquer before we can design models for enterprise use cases. To fuel this, we need to engineer real-world tasks that can be completed and verified deterministically, and we need that data delivered rapidly and at top-shelf quality.

Human data markets are crucial to this.

Human Data Markets

There are lots of reasons to start a human data company nowadays:

  1. Most high-agency people have some asymmetric access to a reasoning-rich, high-complexity source of data for automating some form of human labor
  2. At best, data companies transition into enduring, productized RL-as-a-service businesses; at worst, they remain highly profitable services firms
  3. Selling to hyperspenders and the first sophisticated enterprise buyers lets you learn as much as possible, positioning you for the transition once enterprises mature
  4. The business model is tried and true - once you establish a relationship with researchers and can credibly promise scalable data quality, you can access a fountain of gold
    1. Vendor consolidation doesn't exist as it does in enterprise SaaS - labs' research teams are siloed by research focus and have little cross-pollination (increasing the universe of buyers within a single lab!)

Human data markets don't show signs of slowing down either:

  1. It's easy to bet that whichever paradigm comes after RL and pretraining as the dominant way to advance models will still need frontier data only creatable by humans
  2. OAI only recently started seriously building out its own investment-banker team, the first publicized confirmation of something everyone knew had been going on for a long time
  3. The list of enterprises now contracting RL env and data vendor companies that traditionally only sold to labs has been growing - these are companies like FRL and First Intelligence

From the investor lens, I take the following as true in human data markets:

Treat margin sacrifice as strategic. I prioritize growth, reliability, and outcomes to get entrenched with top labs/startups, then decide whether to productize or stay a scaled services engine. This requires operational excellence via global workforce leverage, tight SLAs, security/compliance, and deep empathy for researchers to out-execute generic vendors. Once you get 2-3 labs, you don't have to build out a net new GTM motion, only an excellent researcher-centric AE-type motion to continue accessing lab revenues.

From a venture lens, I'm bullish on founding here (especially given NPV from founding in a founder friendly market) but selective on investing at today's prices. The upside depends on continued growth in human-in-the-loop/post-training spend and on winning concentrated lab contracts (OpenAI often delineates projects to 5-15 vendors; Anthropic uses few vendors). Multiple "Mercor-like" vendors can exist with vertical or supplier specialization, but venture-scale outcomes demand exceptional commercial chops and infra-grade operations; most failures stem from weak execution and shallow lab relationships, not lack of market.

Power-law concentration with a few anchor customers is a feature, not a bug; that's how this market behaves. This has produced an emerging consensus among venture investors that labs will remain the sole buyers forever and, consequently, that RL env companies are largely not venture scalable. Indeed, this also implies that many founders in the RL env/data space today are maximizing short-term NPV, leveraging pedigree to pitch eventual pivot-to-enterprise visions they don't have personal conviction behind, which nonetheless lets them raise A's/B's/C's with potential for lucrative secondaries. Even in that world, a few plausible outs exist for investors:

  1. A new paradigm shift happens, akin to RL supplanting pre-training, that dramatically reshapes data markets and lets more Mercor-type players emerge across verticals/modalities
  2. Albeit unplanned, talent aggregation lets these companies pivot into enterprise easily
  3. OAI and Anthropic, as predicted by Doria, continue their steady advance into the app layer by acqui-hiring domain-expert teams and giving them specialized early access to app-layer models, leaving investors with stock in companies now almost guaranteed to IPO in the next 2 years

Persisting Human Data Needs

Researchers themselves remain convinced that human data will stay central despite synthetic data, better autoraters, and speculation about model self-sufficiency.

One researcher told me, "Models will need even less data, but whoever can generate that data will capture most of the value. The spend will stay constant - less data, but harder to get data." He went on to describe how every major lab now runs internal synthetic-data pipelines yet still expects external providers to drive the frontier: "I foresee data providers becoming the real experts in that field and driving innovation."

Another researcher explained the economic shift within labeling work: "Some of the easier tasks are such that we're not going to be asking Invisible that much anymore because the models are really good. But there are very specific needs that synthetic data can't get right. That's an opportunity for them to charge higher. Every incremental specialized task the customer asks, they have to rely on me."

A third framed it as a natural flywheel: "People imagine AGI in two years and then you don't need anything and I don't think that's likely. There'll always be specialized tasks. Our autorating models have to be retrained to judge better quality outputs, which requires new human data. The raw quantity of labels may fall, but we'll need finer, more descriptive judgments. My best guess is the total dollars spent will be comparable." He added that despite more automation, "our labeling costs have grown these past two years … we're also licensing more unsupervised data from niche, high-quality sources, and we're willing to pay a lot for it."

Even as models improve, human data markets compound with fewer tasks, higher complexity, richer feedback, and sustained spend, and the human data companies that master operational excellence and research relationships will be top performers long after the next training paradigm arrives.

Observations Today

Splinter effects of increased competition among a triopoly of AI labs for novel frontier environments include:

  • Willingness-to-pay uplift for exclusivity in contracts, once a lab has clearly identified a team as able to scale rare, high-quality envs
  • Complex data-sourcing teams within RL env and human data companies alike that go as deep as obtaining bankruptcy filings from third-world countries with relatively unsophisticated financial-market regulation, in order to build complex finance RL envs
  • Synthetic data used not as a complete replacement for training data, but rather as a proxy (a minimal sketch follows this list):
    • Data anonymization for sensitive domains - preserving the variety and difficulty of the original data (which current synthetic data approaches struggle to reproduce) while using synthetic data to "anonymize" enough for enterprise usage and GDPR compliance
    • Synthetic data engines for task creation in environments, e.g. to maximize contributors' time spent providing reasoning traces and completed work rather than formulating scenarios that may or may not be representative of real-world ones
    • We generally find that ensuring quality and diversity, as well as predicting real-world tasks, is where most synthetic data generation fails today
  • ML 1.0 players that have pivoted from trust & safety for the big tech firms into data providers (TaskUs, SuperAnnotate, Appen, etc.)
  • Premature appetite to pre-empt the next phase of the market, given the proliferation of players now
    • Technically, RL environments fall under the category of RLaaS (RL envs are the first form of sophisticated post-training that labs are willing to outsource at scale), but many RL env companies now profess that their primary customers will be the likes of Doordash, Ramp, and Goldman Sachs' applied MLE teams
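
To make the synthetic-data-as-proxy point concrete, here is a minimal, hypothetical Python sketch (the record fields, entity-name scheme, and task format are all assumptions, not any vendor's actual pipeline): it swaps identifiers and lightly perturbs amounts in a sensitive source record while keeping the structural fields that carry the task's difficulty, then emits an env task with a deterministic ground-truth check.

```python
# Minimal sketch: synthetic data as an anonymization proxy for env task creation.
# All field names and the task schema are illustrative assumptions.
import random
import string
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """A sensitive real-world record, e.g. one claim from a bankruptcy filing."""
    debtor_name: str
    creditor_name: str
    claim_amount: float
    is_secured: bool


def _fake_entity(prefix: str) -> str:
    """Replace a real entity name with a synthetic stand-in."""
    suffix = "".join(random.choices(string.ascii_uppercase, k=4))
    return f"{prefix}-{suffix}"


def anonymize(record: SourceRecord, jitter: float = 0.05) -> SourceRecord:
    """Swap identifiers and lightly perturb amounts; keep structural fields intact."""
    return SourceRecord(
        debtor_name=_fake_entity("Debtor"),
        creditor_name=_fake_entity("Creditor"),
        claim_amount=round(record.claim_amount * (1 + random.uniform(-jitter, jitter)), 2),
        is_secured=record.is_secured,  # the difficulty-carrying structure stays untouched
    )


def make_env_task(record: SourceRecord) -> dict:
    """Turn an anonymized record into an env task with a deterministic answer."""
    anon = anonymize(record)
    collateral = "collateralized" if anon.is_secured else "non-collateralized"
    return {
        "prompt": (
            f"{anon.creditor_name} holds a {collateral} claim of "
            f"${anon.claim_amount:,.2f} against {anon.debtor_name}. "
            "Classify the claim as secured or unsecured."
        ),
        # Deterministic verifier target derived from the source field.
        "expected_answer": "secured" if anon.is_secured else "unsecured",
    }


if __name__ == "__main__":
    real = SourceRecord("Acme Holdings LLC", "First Provincial Bank", 1_250_000.00, True)
    print(make_env_task(real))
```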

Some last observations on labs as buyers compared to enterprise buyers:

  • Data procurement processes are much less bureaucratic, far more generous in spend limits, and much less sophisticated in scope. The attitude is effectively: "if your data is novel and our models perform poorly on your benchmarks, we'll buy your data"
  • Although the "list" of companies buying data is small, the structure of labs is such that different research teams are disparate units that procure data independently

From my previous piece - but still relevant here!

RL as a Service

What is RLaaS? Quoting Felicis, these are "managed platforms where companies can train RL agents on their own objectives without needing internal expertise in RL." Let's extend that definition to cover smaller tasks within RL that sophisticated teams at talent-aggregator startups can slot into their workflows, like what Judgement Labs does in providing detailed traces of AI agent trajectories and evals, or universal verifiers that make RL easier for non-strictly-verifiable tasks; a minimal sketch of such a verifier follows.
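
To ground the "universal verifier" idea, here is a minimal, hypothetical sketch (the rubric, the Judge interface, and the toy grader are all assumptions, not Judgement Labs' actual product): a non-strictly-verifiable response is scored by the fraction of rubric criteria a grader says it satisfies, which can then serve as a reward signal.

```python
# Minimal sketch of a rubric-based verifier for non-strictly-verifiable tasks.
from typing import Callable

# `Judge` stands in for any grader (an LLM call, a human, a heuristic); hypothetical interface.
Judge = Callable[[str], bool]

RUBRIC = [
    "Does the response answer every part of the user's request?",
    "Are all factual claims consistent with the provided context?",
    "Is the final recommendation actionable without further clarification?",
]


def verify(task: str, response: str, judge: Judge) -> float:
    """Return a reward in [0, 1]: the fraction of rubric criteria the response satisfies."""
    prompts = [
        f"Task: {task}\nResponse: {response}\nCriterion: {criterion}\nAnswer yes or no."
        for criterion in RUBRIC
    ]
    passed = sum(judge(p) for p in prompts)
    return passed / len(RUBRIC)


if __name__ == "__main__":
    # Toy stand-in judge: "passes" a criterion whenever the response text is non-trivially long.
    toy_judge: Judge = lambda prompt: len(prompt.split("Response:")[1]) > 40
    print(verify(
        "Summarize the contract's termination clauses.",
        "The contract allows termination with 30 days' written notice by either party...",
        toy_judge,
    ))
```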

Even as models advance in what they can do on benchmarks and ever more complex tasks (with extended tool use), they need proprietary enterprise data, context, and integration to actually do real-world tasks well. But with proper orchestration and reward training, you can optimize for business-specific metrics to great effect, replacing a lot of web 2.0 functionality (like brittle RPA automation, process-built software, etc.); a sketch of what such a business-specific reward might look like follows.
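
As a concrete illustration of reward training on business-specific metrics, here is a minimal sketch under assumed metric names (resolution, handle time, and escalation are stand-ins, not any particular company's KPIs): the episode outcome is blended into a single scalar that a policy can be optimized against instead of a generic benchmark score.

```python
# Minimal sketch: a business-specific reward an RLaaS pipeline might optimize.
from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    resolved: bool        # did the agent complete the workflow (e.g. ticket closed)?
    handle_time_s: float  # wall-clock time for the episode
    escalated: bool       # did it have to hand off to a human?


def business_reward(o: EpisodeOutcome,
                    target_handle_time_s: float = 300.0,
                    escalation_penalty: float = 0.5) -> float:
    """Blend resolution, speed, and escalation cost into one scalar for the policy."""
    reward = 1.0 if o.resolved else 0.0
    # Small bonus for finishing faster than the target handle time.
    reward += max(0.0, 1.0 - o.handle_time_s / target_handle_time_s) * 0.25
    if o.escalated:
        reward -= escalation_penalty
    return reward


if __name__ == "__main__":
    print(business_reward(EpisodeOutcome(resolved=True, handle_time_s=180.0, escalated=False)))
```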

If RL environments are a nascent space, then RL as a service is still in its infancy. Many early self-professed "RLaaS" startups are still discovering what enterprise MLE teams are or aren't willing to outsource. The ripest use cases today are with enterprises like AirBnB, Stripe, Doordash, Ramp, and others, where AI-native products are integral to core applications and data is well documented, well architected, and available to use (surprisingly not the case at a few recent hyper-growth startups despite willingness to adopt).

But if this is true, RL as a service today looks like a business that can only work by selling to power-law companies. I wrote previously about how quickly enterprises are becoming sophisticated enough to adopt, but with that first point in mind:

Inference-wise (and in qualitative sophistication), Cognition, Cursor, Reflection, top code-gen players, Perplexity, and Suno are the biggest AI inference users. Fireworks, for one, has close to 70% customer concentration in two customers - Cursor and Perplexity. With a usage/value-based pricing model, one can get to nine-figure ARR territory just by selling to or contracting with one of these players. In other words, you can "immediately bootstrap" to nine figures of ARR by doing RL-as-a-service for a few top power-law spenders (e.g., Reflection, Cursor, Cognition).

In a vacuum - if you take this market condition to last forever - my go-to-market is a Super FDE, services-first play: win the biggest, fastest-moving buyers even at zero or slightly negative margin to maximize learning, surface high-value workflows, and capture massive budgets quickly.

RL as a Service today can mean many things, but in practice it means everything within the training pipeline: how can I work with product teams or internal eng teams to transform their data into actionable models?

But RL as a service has flavors of approach too, molded in the spirit of ML as an art form, meaning there is no industry standard for RL adoption or sophistication. These split into:

  • Companies that build on-policy RL and model interactions across pre-existing company ontology and data without altering pre-existing data lakes and ERP/CRM systems
    • Because implementations are bespoke and highly custom, this is largely services-based and targets the highest value workflows first
  • Companies building generalized MLE platforms/products to abstract away academic MLE in enterprise adoption, allowing companies to frictionlessly plug in enterprise data and create workable models
    • This was the goal of companies like openpipe, osmosis, metis, and Veris AI, albeit with different assumptions about the ML sophistication of the end user
  • Companies that verticalize from the start, developing automations, models, and RL use cases one by one in an effort to create fully vertical agents
    • Teams here find the most success stealing from $20M UiPath/Cognizant budgets at legacy hospital chains
  • Companies that rip out and rebuild ERP/CRM systems completely from the ground up, to implement environments more conducive to on-policy RL for any particular automation
    • Companies here are still entirely seed stage and experimental

In no particular order, here are examples of companies that embody the above:

Forge, a37, brain co, phinity, osmosis, veris, e3group, distyl, dubsuf, sola, applied compute.

Helpful additional reading:

  1. Mary Meeker's 340 page report on trends in AI (https://www.bondcap.com/reports/tai)
  2. Redpoint ai64 (https://www.redpoint.com/ai64/)
  3. Coatue deep dive into whether we're in an AI bubble (https://drive.google.com/file/d/1Y2CLckBIjfjGClkNikvfOnZ0WyLZhkrT/view)
  4. Why AI is not a bubble (https://www.derekthompson.org/p/why-ai-is-not-a-bubble)
  5. Mechanize Blog on RL (https://www.mechanize.work/announcing-mechanize-inc/)
  6. Vintage Data / Alexander Doria - The Model is the product (https://vintagedata.org/blog/posts/model-is-the-product)
  7. Chemistry VC – "RL Reigns Supreme" (https://www.chemistry.vc/post/rl-reigns-supreme)
  8. Felicis – "Rocket Fuel for AI: Why Reinforcement Learning Is Having Its Moment" (https://www.felicis.com/insight/reinforcement-learning)
  9. Github Repo for inferring what the shape of human data for better post-training looks like in the future (continuous learning)
  10. Personal list of all notable human data/RL env companies I've seen (https://www.notion.so/RL-Envs-Dump-29545bdf028180c4ae84f3be65973d10?source=copy_link)