The Unreasonable Hype of Generalist AI Agents

AI is replacing traditional software systems and is set to make the world move a lot faster, right in front of our eyes. But we must discuss a new trap the market has set: the dream of general, independently thinking AI agents that can handle anything you throw at them is entering peak hype. This seductive fantasy misunderstands both what AI can do near-term and what business processes need.

Generalist AI agents vs specialist AI agents. Wild claims surrounding AI agents – SaaS is dead – but the reality is different.
(Many such cases of egregious claims: it is easy to get hyped about AI agents, but the reality in the trenches with current technology is very different.)

Prediction time: in 2025, we will hear two new kinds of cautionary tales about AI agents. First, an enterprise will deploy generalist AI agents, expecting them to outperform its best long-tenured employees at their core specialty. But the generalist AI agents will be found to be missing three things: niche expertise,1 necessary meticulousness,2 and domain experience.3 In other words, a junior (digital) worker replaces a senior (human) one, work quality goes into free fall, and losses mount.

Second, a Fortune 500 corporation will deploy proactive AI agents designed to make independent decisions and “not just operate, but innovate.” The project will start hitting snags when the AI agents complete 50% of their assigned work items, get stuck in a loop on another 30%,4 and get distracted and go entirely off the rails on the remaining 20%. It is finally sent back to the drawing board when one order management agent tries to be proactively helpful by initiating a contract renegotiation with a major supplier, while the IRS puts the company under investigation after a finance agent creatively explores somewhat edgy opportunities for tax optimization. Over $10M in sunk costs, integration development, and damages have to be written off.

We love the tales of the Renaissance polymaths and out-of-the-box innovators. We aspire to be humans with such qualities. And without such people, the best organizations couldn’t succeed. But the reality of company operations is that the gears wouldn’t be turning without many more workers excelling in deep specialty, meticulousness, and (ugh, but it’s true) staying in their lane most of the time. Even the most celebrated human generalists are no exception – a Pixar creative director or a Coca-Cola CEO would likely do a terrible (though certainly expensive) job pushing paper in the back office.

What does this teach us about rolling out AI agents5 this year? I have some thoughts on this as someone who has led product development at Rossum (which can be described as a widely adopted, highly specialized AI agent platform) for almost a decade. My views are certainly biased toward what I have and haven’t seen work, but they are also quite informed.

Let’s take a harder look together at generalist AI agents vs. specialist AI agents, including some real success stories. Then we’ll lay out a path for organizations to start properly with AI agents and scale what works. AI-first organizations will beat the laggards – but how can you move fast without breaking things?


Why focus beats flexibility

Current AI models have an incredible level of intuition6 coupled with massive encyclopedic knowledge across many domains, and patterns learned from roughly the whole internet. This makes them excel at many tasks, especially narrowly defined ones. However, the models struggle7 with true expertise (knowing the limits of their knowledge – hence hallucinations and reasoning lapses), connectivity (the infrastructure to gather all necessary inputs and perform the actions they need), and reliability (when repeating mundane, trivial operations many times, they make a mistake much sooner than a human would).8

Challenges in scaling AI agent use cases. Reliability is the highest challenge when deploying agentic AI.
(Insight Partners surveyed 105 enterprise executives on short-term challenges in deploying agentic AI.)

That’s a problem when using these models to build AI agents! But instead of fixing it, the market’s newfound obsession with “proactive” agents that figure things out independently pours fuel on the fire, as every single one of these issues is exacerbated. With the current state of technology,9 jumping straight to out-of-the-box, proactive agents is not innovation; it is as reckless as building the first nuclear power plants without any experience operating steam engines. The “steam engines” of our day are in-the-box workflow agents. They can still behave more like coworkers than tools, but they work reliably within the guardrails of existing processes and follow clear escalation paths when facing uncertainty.

Look at where AI is delivering value today. Take Gong. They could have built a general sales AI that tries to do everything. Instead, they focused ruthlessly on one thing: Understanding sales conversations. They built specialized infrastructure to process calls, specialized models to analyze patterns, and specialized interfaces to deliver insights. The result? They’re redefining how enterprise sales works.

Or look at document processing systems like Rossum. We didn’t try to build a general-purpose AI that could handle any paperwork task. We focused on becoming the best in the world at extracting structured data from business documents and end-to-end automation of transactional paperwork. This focus allowed us to build deep domain expertise into our models and create validation systems that financial teams trust.

Different droids.
(Even the Star Wars universe is teeming with specialist robots!)10

Technology for specialist AI agents

Because of the current AI model challenges, AI agents are not going to work well “as is” without extra support. Anyone building AI agents must solve three main problems: domain expertise, domain integration, and quality control.

Domain expertise can be solved by purpose-built intelligence: Turning generalists into specialists involves (a) narrow factual knowledge, (b) even narrower procedural knowledge, and (c) adopting an appropriate style of work. In practice, this may mean fine-tuning AI models on specialized datasets, embedding them into predefined fixed workflows, and asking them to produce output in a handcrafted format that enables them to avoid hallucinations and remain 100% focused.11
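To make the “handcrafted output format” idea concrete, here is a minimal Python sketch of schema-constrained extraction. Everything here is an illustrative assumption: `call_model` stands in for whatever LLM API you use, and the invoice fields are made up for the example.

```python
# Sketch: turning a generalist model into a specialist by forcing its output
# into a fixed, validated schema. `call_model` is a hypothetical LLM callable.
import json

INVOICE_SCHEMA = {"invoice_id": str, "total_amount": float, "currency": str}

def extract_invoice_fields(document_text: str, call_model) -> dict:
    prompt = (
        "Extract exactly these fields as JSON and nothing else: "
        + ", ".join(INVOICE_SCHEMA) + "\n\n" + document_text
    )
    data = json.loads(call_model(prompt))
    # Enforce the handcrafted format: no missing fields, no extra fields.
    if set(data) != set(INVOICE_SCHEMA):
        raise ValueError(f"schema mismatch: {sorted(data)}")
    # Coerce each value to the expected type; fails loudly on nonsense.
    return {k: INVOICE_SCHEMA[k](v) for k, v in data.items()}
```

The narrow schema does double duty: it keeps the model focused on the task, and it gives the surrounding system a hard, machine-checkable contract instead of free-form prose.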

Domain integration is an infrastructure problem: Specialist agents, like specialist employees, need focused tools and clear interfaces to (a) understand the task context,12 and (b) execute appropriate actions. This could certainly be solved by a haphazard screen-and-mouse agent slowly navigating the corporate SAP on its own, like a generalist ops person might navigate SAP “intuitively” by trial and error.13 14 But it seems more efficient to replicate the specialist experience instead – someone deeply familiar with, and streamlined in, the specific tooling for their tasks – via an enterprise-grade integration platform that limits the agent’s action space to purpose-relevant operations executed immediately and reliably.
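As a sketch of what “limiting the agent action space” can look like, here is a minimal tool registry in Python. The tool names and their stub implementations are illustrative assumptions, not a real integration platform.

```python
# Sketch: an agent can only invoke operations explicitly registered for its
# purpose. Anything outside its remit simply does not exist for it.

class ToolRegistry:
    """Expose only a vetted set of operations to the agent."""
    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def invoke(self, name, **kwargs):
        if name not in self._tools:
            # The agent cannot reach anything outside its action space.
            raise PermissionError(f"tool {name!r} is not in this agent's remit")
        return self._tools[name](**kwargs)

# Hypothetical accounts-payable agent: two narrow, purpose-relevant tools.
ap_tools = ToolRegistry()
ap_tools.register("lookup_supplier", lambda supplier_id: {"id": supplier_id})
ap_tools.register("export_to_erp", lambda payload: "queued")
# Note: no "renegotiate_contract" tool is registered, so the agent can
# never attempt it – the guardrail lives in the infrastructure, not the prompt.
```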

Quality control is all about the right guardrails: When AI can access any system and make any decision, you have created a risk equivalent to giving a self-confident but junior person unchecked power. Experienced specialists instead (a) value and follow their processes, (b) know when to ask others for advice or a second opinion, and (c) want to measure their own work so they can improve. AI agents must be able to reliably follow the same processes end to end (with the ability to “self-configure” for that process as a huge bonus – that’s the true AI). They must have a human escalation mechanism (including a great in-app user experience for the humans) as a first-class capability. And they should be accountable for their automation, offering reporting and analytics on the AI’s performance within its process.
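A minimal sketch of the human-escalation guardrail, assuming the agent’s underlying model exposes a per-decision confidence score. The threshold and queue shape are illustrative assumptions:

```python
# Sketch: every low-confidence decision is routed to a human review queue
# instead of being executed automatically. The threshold would in practice
# be tuned per field and per process from measured accuracy data.

ESCALATION_THRESHOLD = 0.90

def route_decision(field: str, value: str, confidence: float, queue: list) -> str:
    """Auto-approve confident decisions; escalate everything else to a human."""
    if confidence >= ESCALATION_THRESHOLD:
        return "auto_approved"
    # Below threshold: park the item for human review with full context,
    # so the reviewer (and later the agent) can learn from the outcome.
    queue.append({"field": field, "value": value, "confidence": confidence})
    return "escalated_to_human"
```

The same queue doubles as the accountability record: counting auto-approvals versus escalations per field gives exactly the performance reporting the paragraph above calls for.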

New wave of systems of agents. Shows groups of cooperating AI agents as autonomous workers.
(Foundation Capital’s Joanne Chen and Jaya Gupta envision a new wave of Systems of Agents that “reimagine how software operates [differently from] traditional software systems that passively wait for human input. Groups of cooperating AI agents aren’t just tools; they’re autonomous workers—capable of capturing, processing, and acting on both structured and unstructured data with unprecedented intelligence. The agents understand context, make decisions, and continuously improve.” This is very close to our “AI agents platform” thesis.)

Perhaps you see the common theme – it would be desirable to have an AI agents platform covering these concerns within a specific domain. In fact, much like we had “systems of record” in past decades, “systems of agents” will be the agent-centric platform type of the future. These platforms will create an environment for both agents and humans – a robust integration framework, a performance management system, and a user interface for efficiently handling human escalations.15 When a document processor spots an anomaly, it needs to know exactly how to route it to the right human expert. When a sales analyzer spots a risk pattern, it needs to alert the right manager immediately. But escalation doesn’t end there – the agent must learn from human actions and not escalate the same situation again and again!

Drake meme. AI agent process – gather context, follow process, execute actions.

Ultimately, a workflow agent in this platform follows a very straight loop: gather context, follow process, and execute actions.16 This is a lot simpler than a proactive agent’s structure involving goal-gathering, planning, coordination17 and memory. And simpler means faster, cheaper, and more reliable. Now think about your company processes: does the simpler loop actually deliver any less value?
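The straight loop can be sketched in a few lines of Python. The step functions here are assumptions standing in for real platform components (context gathering, process logic, integrations):

```python
# Sketch: the workflow agent's "straight loop" – gather context, follow the
# fixed process, execute actions – with escalation as the only branch.

def workflow_agent(work_item, gather_context, follow_process,
                   execute_actions, escalate):
    context = gather_context(work_item)            # 1. gather context
    decision = follow_process(work_item, context)  # 2. follow the fixed process
    if decision.get("needs_human"):
        return escalate(work_item, decision)       # clear escalation path
    return execute_actions(decision)               # 3. execute actions
```

Note what is absent: no goal inference, no open-ended planning, no inter-agent coordination, no long-term memory. Every removed moving part is one less place for the run to go off the rails.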

Proof in practice

Did you know that the most successful enterprise AI practices have been running “systems of agents” for years, long before the current agent hype cycle began?

Gong: Specialized intelligence in sales

Gong could have built another CRM or a generic “AI sales assistant.” Instead, they focused on one crucial problem: Understanding sales conversations at scale.

They built specialized infrastructure to process conversations, analyze patterns, and deliver insights. Their system doesn’t only transcribe calls – it understands sales-specific concepts like deal stages, competitive mentions, and pricing discussions. This specialization allows them to solve problems that would be impossible for general AI, creating an intelligence layer that modern sales teams rely on daily.

Rossum: Because generic AI is bad at paperwork

At Rossum, we succeed in the Intelligent Document Processing space by understanding what enterprises need – not an AI that can “read anything,” but a complete system for handling business transactions end to end.

At the core of our approach is a specialized AI architecture. Our T-LLMs take the LLM formula and adapt it to the style of work that documents demand. The T-LLM addresses three critical gaps in generic LLMs: the ADHD of generative models that can hallucinate or lose track when capturing swathes of information, the lack of instant learning from user feedback on specific fields and layouts, and the absence of intelligent, auditable confidence scoring that knows when to escalate to humans.18 But the technology is just half of the equation.

Rossum is a fully-fledged AI agents platform because of how it handles the entire transaction flow. Instead of only extracting data, the system manages communication with document senders, orchestrates approvals and validations, and integrates deeply with ERPs and other enterprise systems to provide the context (such as master data) and actions (like communication or ERP export) to the AI workflow. This end-to-end coverage means businesses can finally treat their transaction documents as a proper digital process, not just a collection of files to process.19

Intelligent document processing framework for AI agents. Workflow AI agent following process and executing actions within Rossum's document AI agent platform.
(Workflow AI agents’ “straight loop” of gathering context, following process and executing actions within Rossum’s document AI agent platform. AI evaluation reporting and human escalation20 are also an integral part of the platform.)

Your road ahead

Let’s get tactical. The AI revolution is coming (maybe even all the way to Artificial General Intelligence), and you need a plan for making AI agents work for you.

Time it right: Don’t rush, don’t wait

More haste, less speed, as the saying goes. Rush into AI without proper foundations and you’ll build something that works in 70% of situations but fails catastrophically in the remaining 30%. Worse, you can’t predict which 30% until it’s too late. This isn’t just about technical failures – a catastrophic AI performance like this creates lasting mistrust in your organization that will slow you down even as the technology keeps improving. Think of self-driving car fleets that declared victory too early, burning billions in capital and years of market confidence.

But being too cautious is equally fatal. Adapting organizations to AI takes time, just as it took years to transform paper offices into digital workplaces. The business landscape may not be totally reshaped by AI in 2025, but wait five years, and everything will change. Margin structures will shift dramatically. The pace of business will be radically different. In most industries, being a year behind your competition in AI adoption will mean watching your book of business evaporate.

Remember Pan Am, Kodak, Polaroid, Compaq, Blockbuster, or Radio Shack? Perhaps not, but that’s the point – these were world-class brands backed by indomitable-looking enterprise structures. None exist today. Even industry leaders can and will get disrupted and outrun, and AI will be their ultimate stress test.

Start small, start right

Here’s the thing about AI agents: They’re going to transform everything, but not overnight. Your job isn’t to build the perfect AI system today. It’s to build something valuable now that’s ready for the age of agents tomorrow.

Start with the boring stuff. The repeatable processes. The high-volume workflows where reliability matters more than creativity.21 This isn’t playing it safe—it’s playing it smart. Build systems that nail the fundamentals first, then let them evolve as AI capabilities grow. Remember our specialist principle: Just as your best employees excel at specific tasks, your AI agents need to focus before they can scale.

“A typical enterprise runs hundreds of processes that could see value from even the most basic of automation, and many of the use cases that will provide direct and tangible value in the short-term will inevitably be agent-ish, not agentic.” – Your AI is not agentic, merely agent-ish by Leslie Joseph

And remember: to successfully identify practical AI agent use cases, the first step is to evaluate the infrastructure. You already provide your employees with office space, email & chat, and an onboarding bootcamp. Your AI agents need their “operating system”: tools and integrations. (Then, the very best excel mainly at how little onboarding they need.)

SpaceX roadmap for heavy lift systems
(Like SpaceX, it’s ok to start small.)22

The magic comes from getting these principles right, not from chasing the latest AI pixie dust. Once you nail reliability in fixed workflows, complex territory opens up naturally. Your institutional knowledge compounds with every deployment. To be clear, I personally believe that in the long term, the AI pixie dust will be completely real.23 But I also believe that one must get good at operating steam engines before building a nuclear reactor,24 and that SpaceX’s Starship wouldn’t fly without building hundreds of Falcons first.

The future belongs to the deliberate but decisive. Pick your battles carefully. And always build something that genuinely works in the real world.

(P.S. If you are interested in AI agents for high-precision paperwork, keep a close eye on Rossum – we are going to launch a lot, very soon!)

THANKS to Dan Lucarini, Leslie Joseph, Nathan Benaich,25 Nathan Warren, Petra Beck, Ralph Gammon, Tomas Matejcek and many of the Rossum team for reading drafts of this.

  1. The niche knowledge seems to be there, but often nothing but a void hides under a thin veneer of impressive-seeming surface knowledge and hallucinations. ↩︎
  2. A generalist creative gushing with ideas and mild ADHD (the current base LLM “personality”) might not be the best person to carefully follow through. Leonardo da Vinci did not finish most of his inventions. ↩︎
  3. It is one thing to have an LLM that has read many accounting books do your accounting, and another to have it done by a trained accountant who understands how your whole company operates and has learned from many past mistakes. ↩︎
  4. Devin the “autonomous” AI agent is famous for getting stuck in infinite loops the moment something doesn’t work as it expects. When testing agentic computer use, Claude takes a break from coding and begins to peruse photos of Yellowstone National Park. ↩︎
  5. Here, we use the term “AI agent” both for “agentic” and “agent-ish” AI as defined e.g. by Leslie Joseph. There exist even stricter technical definitions of agents, but they are ambiguous in practice and not aligned with lay audience expectations. We assume an intuitive, user-centric perspective: an AI agent is a virtual colleague I delegate to, rather than a tool I use. ↩︎
  6. The trope of “calculating rational machines” couldn’t be further from the truth when talking about AI based on artificial neural networks. The breakthroughs in deep learning and transformers over the last 15 years are based not on an increase in precise calculation abilities, but on finding an extremely efficient way of encoding intuition. The first demo was a neural network that could distinguish cars from birds from airplanes in pictures – not by reasoning about wings and wheels, but by a deeper intuitive pattern recognition that results in a snap moment of “I can just tell immediately without thinking”. ↩︎
  7. Or, as Anthropic’s Chief Scientist puts it: AI agents need to improve in tools, context, coding and safety. ↩︎
  8. They suffer from what could be described as severe ADHD – they’re anything but meticulous. One moment they’re brilliantly solving complex problems, the next they’re missing obvious details or chasing irrelevant tangents. Or, simply and most frequently, they give up and skip part of the work. For tasks that require precision – like processing thousands of line items or ensuring regulatory compliance – they need a specialized harness of controls to keep them focused and reliable. The easiest way to check your favorite AI on this is to ask it to multiply two extremely long numbers “manually”. Even top AIs like Claude Sonnet, OpenAI o1 and DeepSeek failed this egregiously (but confidently) 9 times out of 10 at the start of 2025. ↩︎
  9. And we are talking about the newest frontier Large Language Models here. ↩︎
  10. Picture by Andy Moore. ↩︎
  11. In some cases, adjusting the output requires algorithmic changes. For example, LLMs coupled with tool use can guarantee that the tool is called according to its JSON-schema function specification. And for many tasks, instead of a forward-looking LLM, a bi-directional transformer AI such as BERT is much more accurate. ↩︎
  12. Say, on-demand search in the master data list of suppliers, orders, or employees (or agents!). ↩︎
  13. The ops person will do the trial and error at first, but learn quickly. Experience-based procedural learning from tool use, however, is still a completely open research problem in the AI agent space – no one is near solving it! ↩︎
  14. Agents also need to carefully replicate human permissions. Just like humans, they cannot go and increase your salary or take 300 days off in the HR system. Instead, they have access and expertise to use only specific tools – HR will not work within GitHub, while software engineers won’t go into Workday to get their tasks done. ↩︎
  15. Looking to the future, there are three next milestones for systems of agents. Let’s agree with Wayne Hamadi that AI agents should also manage their own setup, and as AI research progresses they should identify issues proactively. Ultimately, systems of agents bring data close to the compute, which can reshape entire IT architectures and blur the boundary between structured and unstructured data. This will get rid of data quality and nuance loss challenges that data-driven businesses struggle with today. But more on the future of systems of agents another time! ↩︎
  16. This is a good moment to check our workflow agent against other popular AI agent definitions. A popular definition by Wayne Hamadi again is that “an agent is a system that can act autonomously and proactively.” The workflow agent is certainly autonomous–handling most runs of the process without any human escalation is the basic minimum we aim at. Is it proactive? It follows a fixed workflow, but that can include ambiguous steps such as “identify anomalies” or extracting data by analogy without ever seeing a particular situation it handles. Proactivity becomes a philosophical matter. That is why I believe in a user-centric agent definition more. ↩︎
  17. Agent coordination is a weird one – the concept of “billions of agents” and building “multi-agent systems” is even more tenuous than deploying proactive agents in the current state of technology. The golden rule is that if a small team – or, even better, a single excellent person – can get something done from start to end, the output will usually be (much) better than if a larger team does it. In fact, any distributed system brings awful complexity and overhead, and should be adopted only when there is no other option. Similarly, employ as few agents as possible within a single process – almost always, a single-agent system is the best call. ↩︎
  18. In case you are still curious about what it means to “adapt to the problem” from a technical perspective, I cover a little extra on T-LLMs and our “discriminative decoder” in my last essay on building software products for the age of AGI. ↩︎
  19. Comparatively, a generalist proactive AI agent would leave to chance what happens with each business transaction processed, whether an irrelevant detail on the document influences how it’s routed, and even whether and how all the line items on each document are transcribed. ↩︎
  20. To qualify as an AI agents platform, human escalation must of course mean that the AI learns from every escalation. In Rossum, this “instant learning” happens in real time. ↩︎
  21. This isn’t advice just for enterprise process owners but for product managers and founders alike! The Financial Times talks about “boring” AI agents through the lens of the Y Combinator partners, who “had been deluged with mind-blowing applications from start-ups looking to apply AI agents to fields that include recruitment, onboarding, digital marketing, customer support, quality assurance, debt collection, medical billing, and searching and bidding for government contracts. Their advice was: find the most boring, repetitive administrative work you can and automate it. Their conclusion was that vertical AI agents could well become the new SaaS. Expect more than 300 AI agent unicorns to be created.” ↩︎
  22. Picture by YNot1989. ↩︎
  23. I also strongly believe that specialist AI agents will retain their place and be immensely valuable even in a world where general AI agents finally work. In fact, I recently wrote a deep dive precisely on the topic of building software products for the age of AGI. ↩︎
  24. While the best scientists are running the AI “Manhattan project” and improving AI capabilities day and night. ↩︎
  25. Nathan seems way ahead of us! His main feedback was that this is nothing new and pretty obvious stuff. ↩︎
