We all need personal vector databases
Today, agentic systems are built on the back of architectures linking together foundation model calls with carefully tuned prompts and vectorized data. Arguably, LLM output capabilities reached a critical point in 2022, when GPT-3.5 could perform well on many white collar workflows. With the advent of reasoning agents that not only perform better than humans on most white collar benchmarks, but are also given access to low latency capabilities (see BitNet advancements) as well as a diverse array of tooling (see computer use), it's easy to imagine that the barrier to creating JARVIS nowadays is not our systems or base capabilities, but finetuning. Put another way, if we had wanted to create truly agentic white collar squads in 2022, we couldn't have, because our agentic workforce was only at a high school level. Now, we essentially have millions of low latency, eager, Ivy League graduate-level agents available on demand and ready to be imprinted with knowledge workflows. The querying infrastructure for how agents "actually" learn is still nascent as I write this, and likely restricted to labs. Ceramic.ai and Theta Software are two very new startups tackling this that I've had an eye on.
In preparation for the fact that our base foundation models and system capabilities aren't the limiting factors anymore, we have to look at data. Currently, there's a huge dearth of multimodal data that effectively illustrates white collar workflows to agents. This is no fault of our own - we used humans to conduct most entry level tasks, and they "learned on the job," with perhaps some manuals and training videos (that were evidently far from reality in most jobs). This lack of data preparation relative to base foundation model advancements is the reason for the meteoric rise of startups like Mercor and Scale AI, and the explosion of evals providers today like Judgement Labs. Labs, having focused on building effective reasoning agents, now lack domain expertise in the trillion dollar TAM markets they are trying to build for, hence enlisting outside help to acquire data and build pipelines. This is the modus operandi behind Claude Code and Operator, which represent a special class of models trained on large corpora of specially sourced data. Of course, much of the data in the world is not actually observable from a computer. Much of it is proprietary workflows that are learned on the job. A strong majority of white collar industries also gain "alpha" by obscuring the exact diligence and execution workflows behind their outputs.
The democratization of training and agent-building infrastructure in the near future (see CrewAI and LangGraph) ought to mean there is substantial alpha for white collar workers who are somewhat versed in building with Cursor and who take the initiative to understand basic full stack design and architecture, to collect their own workflow data and build their own workflow models. If, say, one were to simply build a series of CrewAI agents in a conversation window with nothing more than reasoning model calls and some good prompting, one could pipe agent memory into a lightweight local store like SQLite (with a vector extension) or a purpose-built vector DB, and feed vectorized interactions and querying capability back into CrewAI's framework. Through just some good data hygiene, as well as "apprenticeship" based training, one could build up a database of vectorized interactions with these agents so as to essentially train your own "junior employees" (a rough sketch of this loop is below). What is unclear, though, is how many hours of interaction a single white collar worker would have to log before reaching a strong level of performance. Additionally, the evaluation suites needed to test these "employee agents" for deployment at scale would add another layer of complexity.
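To make that concrete, here is a minimal sketch of what that apprenticeship loop could look like. Everything in it is an assumption rather than CrewAI's actual memory API: the embed() function is a stand-in (in practice you'd swap in a real embedding model, whether a local one or an API embedder), interactions live in a plain SQLite table, and recall is brute-force cosine similarity, which is plenty at personal scale.

```python
import sqlite3, json, math, hashlib

DB = "apprentice_memory.db"

def embed(text: str, dim: int = 256) -> list[float]:
    """Placeholder embedding: a hashed bag-of-words vector, normalized.
    Swap in a real embedding model for anything beyond this toy sketch."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def init_db():
    con = sqlite3.connect(DB)
    con.execute(
        "CREATE TABLE IF NOT EXISTS interactions "
        "(id INTEGER PRIMARY KEY, prompt TEXT, response TEXT, embedding TEXT)"
    )
    con.commit()
    return con

def log_interaction(con, prompt: str, response: str):
    """Store one agent interaction plus its vector: the 'apprenticeship' data point."""
    con.execute(
        "INSERT INTO interactions (prompt, response, embedding) VALUES (?, ?, ?)",
        (prompt, response, json.dumps(embed(prompt + " " + response))),
    )
    con.commit()

def recall(con, query: str, k: int = 3):
    """Brute-force cosine similarity over stored vectors; fine for personal-scale data."""
    q = embed(query)
    scored = []
    for prompt, response, emb_json in con.execute(
        "SELECT prompt, response, embedding FROM interactions"
    ):
        e = json.loads(emb_json)
        scored.append((sum(a * b for a, b in zip(q, e)), prompt, response))
    return sorted(scored, reverse=True)[:k]

if __name__ == "__main__":
    con = init_db()
    log_interaction(
        con,
        "Draft the weekly ops summary for the client",
        "Pulled numbers from the tracker, flagged two overdue invoices...",
    )
    for score, prompt, response in recall(con, "write this week's ops summary"):
        print(f"{score:.2f} | {prompt} -> {response[:60]}")
```

The storage layer is the easy part; the alpha comes from the data hygiene - logging clean, representative prompt and response pairs - since the same rows could later double as a finetuning corpus and as the evaluation set for those "employee agents."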
While there remain many challenges to enterprise deployment of this model, I strongly believe the prerequisite is its informal deployment at the individual prosumer level. High agency junior employees who managed to get a job in this market, and whose workplace cultures allow for some degree of autonomy, should explore this. The dearth of junior employee hiring today, and the first-degree access junior white collar workers have to proprietary, domain specific information, should create an immense opportunity for high agency individuals. In my personal friend networks, I am already seeing small technical teams, often still in university, capture unreasonably large alpha by building AI contracting deployments for legacy brick and mortars.
The physical expansion is particularly interesting to me. We are seeing nascent examples of data collection in the physical world to train models on real world use cases (Peripheral Labs, every single prosumer AI wearable like the Friend necklace). Many workflows, of course, require some sort of physical input. If we were to put a "Gartner Hype Cycle" label on this, I'd say we are in the early prosumer days of adoption for some sort of physical workflow recording tool. For some reason, the robotics landscape has always lagged behind the software AI landscape. I won't dig into why here (though I suspect it has to do with high CapEx and the fact that every robotics company builds things in-house), but I strongly believe some medium already exists in our personal hardware devices to capture and vectorize data like in-person conversations, what we see, and what we hear (perhaps like the Black Mirror episode "Eulogy").
This may be unreasonable, but I see everyone becoming a "middle manager" in the future. While specialization of labor is how we scaled and improved output in the 20th century, agents have multiplied the population power of one individual, such that one trained manager with some savviness in building with modern day systems can specialize their agents instead. Every day, the product focused direction of top AI labs kills another wrapper-layer startup - Otter AI and internal business ops tooling, for example - and imagine the effectiveness of all of these apps when you can plug in a year's worth of your own operating data. I love the implication. On one hand, as someone who was almost a history major at uni, it means we are reaping the consequences of another abstraction layer in computing (von Neumann machine code and architectures, to compilers, to C, to Python, to LLM code prompting). In another sense, this is how our GDP per capita skyrockets while our population decreases. This is a new scaling law for enabling a higher quality of life, and the early opportunity is a huge enabler of social mobility.