It’s not the agents, it’s the system: How to move forward with multi-agent AI

Agentic AI is at the planning and piloting stages in many enterprises, with few implementing it at scale. Its emergence coincides with growing recognition that AI initiatives often take longer than expected to generate significant top- and bottom-line improvements. This Viewpoint details the benefits of moving to multi-agent AI despite understandable hesitancy and shows how a systems-based approach can deliver gains while maintaining adequate safeguards.

HOW AGENTIC AI IS EVOLVING

Well before generative AI (GenAI) becomes fully embedded and scaled up, the next wave, agentic AI, is already in full swing. In our Viewpoint “Beyond Prompts: Building Business for the Age of Agentic AI,” we explored the shift from generative to agentic AI, including our predicted compound annual market growth of nearly 50%, from around US $7 billion today to $45 billion in 2030.
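A quick sanity check on the market figures uses the standard compound annual growth rate (CAGR) formula, (end / start)^(1 / years) - 1. It assumes that the “nearly 50%” is an annual rate and that “today” means 2025. Both are our assumptions.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate that grows start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Assumed inputs: ~US $7B market "today" (taken as 2025) growing to $45B by 2030.
rate = cagr(start_value=7.0, end_value=45.0, years=5)
print(f"Implied CAGR: {rate:.1%}")
```

This works out to roughly 45% a year, in line with “nearly 50%.” Reading 50% as total growth would instead imply a 2030 market of only about $10.5 billion.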

Unlike GenAI, which responds to prompts and follows static rules, agentic AI systems act with goal-directed intelligence. Agentic AI itself is not new. Autonomous vehicles represent the first wave of early-stage agentic capabilities that perceive, reason, and act to navigate a vehicle toward a destination while continuously learning and adapting. What’s new is applying this same ability to perceive, reason, act, and learn to a new wave of AI tools for business (see Figure 1).

Figure 1. How does agentic AI work?

Evolution has been rapid. Three years ago, the first single AI agents were introduced, enabling large language models (LLMs) to call APIs, search, and run code. In 2023, tools like AutoGen emerged: multi-agent frameworks featuring role-based agents that talk with one another and with humans to complete complex tasks. Over the last two years, vendors such as Microsoft, Google, and AWS have introduced multi-agent platforms capable of deploying coordinated agent teams, in which one agent manages the others. Vendors claim to have built in the key features needed for enterprise deployment (e.g., security, compliance, audit trails, and workflow engines).

Early multi-agent use cases include multidisciplinary tasks, such as software development, customer support, and financial services onboarding. For example, MetaGPT is an open-source framework that lets users create a multidisciplinary team of AI agents to develop software solutions. Similarly, multi-agent solutions for customer support bring together individual AI agents that focus respectively on customer intake, data retrieval, decision-making, execution, compliance checking, and supervision (including escalation to humans). These solutions aim to reduce processing times, improve accuracy, and increase reliability while maintaining adequate personalization.
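The customer-support pattern can be sketched as a supervisor that runs specialist agents in sequence. This is a minimal, hypothetical illustration in plain Python and not any vendor’s framework. Each agent is a deterministic function standing in for what would usually be an LLM-backed component. The customer records, the 500 refund limit, and the `EscalateToHuman` signal are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for a CRM / system of record.
CUSTOMER_DB = {
    "C-001": {"name": "Ada", "kyc_verified": True},
    "C-002": {"name": "Bo", "kyc_verified": False},
}
REFUND_AUTONOMY_LIMIT = 500.0  # hypothetical: larger refunds go to a human


class EscalateToHuman(Exception):
    """Raised by any agent that must hand the case to a person."""


@dataclass
class Case:
    customer_id: str
    request: str
    amount: float = 0.0
    data: dict = field(default_factory=dict)
    decision: Optional[str] = None
    escalated: bool = False
    log: list = field(default_factory=list)


def intake_agent(case: Case) -> None:
    # Classify the request (a real intake agent might call an LLM here).
    case.data["intent"] = "refund" if "refund" in case.request.lower() else "general"


def retrieval_agent(case: Case) -> None:
    record = CUSTOMER_DB.get(case.customer_id)
    if record is None:
        raise EscalateToHuman("unknown customer")
    case.data["customer"] = record


def decision_agent(case: Case) -> None:
    if case.data["intent"] != "refund":
        case.decision = "answer_query"
    elif case.amount <= REFUND_AUTONOMY_LIMIT:
        case.decision = "approve_refund"
    else:
        case.decision = "review_refund"


def compliance_agent(case: Case) -> None:
    if not case.data["customer"]["kyc_verified"]:
        raise EscalateToHuman("KYC not verified")
    if case.decision == "review_refund":
        raise EscalateToHuman("refund above autonomy limit")


def execution_agent(case: Case) -> None:
    case.log.append(f"executed {case.decision}")


# Compliance runs before execution so nothing non-compliant is acted on.
PIPELINE = [intake_agent, retrieval_agent, decision_agent, compliance_agent, execution_agent]


def supervisor(case: Case) -> Case:
    """Run the specialist agents in order; stop and escalate on the first objection."""
    for agent in PIPELINE:
        try:
            agent(case)
        except EscalateToHuman as reason:
            case.escalated = True
            case.log.append(f"{agent.__name__} escalated: {reason}")
            break
        case.log.append(f"{agent.__name__}: ok")
    return case


ok = supervisor(Case("C-001", "Please refund my order", amount=120.0))
held = supervisor(Case("C-002", "Refund please", amount=50.0))
print(ok.decision, ok.escalated)     # approve_refund False
print(held.escalated, held.log[-1])  # True compliance_agent escalated: KYC not verified
```

Placing the compliance check before execution is a design choice of this sketch, since the text lists the roles but not their order. A failed check stops the case before anything is acted on, and the supervisor hands it to a human instead.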

Multi-agent AI promises to be key in getting significant business value from AI. By integrating more deeply into core processes — and more closely mirroring how humans collaborate — it offers inherently greater capabilities than traditional or single-agent AI tools, which typically focus on isolated tasks. As Andrew Ng, founder of DeepLearning.AI, has argued, for most businesses the greatest AI opportunity lies in building applications that leverage agentic workflows rather than focusing solely on scaling traditional AI models.

IF ONLY IT WERE THAT SIMPLE

Unfortunately, new multi-agent solutions and benefit claims come at a time when many large businesses are disillusioned with their AI ROI. For example, MIT’s “State of AI in Business 2025” report found that despite $30-$40 billion in enterprise investment in GenAI, 95% of organizations were getting zero return. What’s more, although nearly half of companies had implemented general-purpose LLMs, only 5% had successfully moved task-specific GenAI beyond the pilot stage. While a lag between deployment and measurable business impact is common with emerging technologies, AI appears to be experiencing a deeper-than-expected trough of disillusionment.

Business media have explored a range of reasons for this disconnect, including poor data quality foundations, a tendency to stall at the pilot phase rather than driving process change, overreliance on generic “horizontal” AI tools and copilots instead of domain-specific solutions, limited attention to the human dimension, and weak alignment with financial metrics. The common factor across these problems is an emphasis on the AI tool itself rather than on changes to the broader system: data, processes, organization, people, and governance. This narrow focus is perhaps understandable as long as AI is limited to single tasks or copilot roles. It’s also why multi-agent AI, which spans entire processes, has much greater transformational potential: it shifts attention from individual tools to end-to-end processes and the system around them.

Importantly, multi-agent AI is not a technology fix for these issues. It does not remove the need for process redesign, workforce adoption, and change management; if anything, it raises the bar. Multi-agent AI should be viewed as an orchestration capability that enables end-to-end process change, but its value still depends on whether the organization embeds it into redesigned daily workflows, redefines roles and decision rights, and actively manages adoption.

Executives who have already invested significantly in AI tools, including single AI agents, without seeing much ROI, may be understandably reluctant to double down on new investment in multi-agent AI. It’s natural to want to wait until GenAI has become more mature within the organization before going on to the next stage of technology development. However, early examples suggest that waiting for AI “maturity” without simultaneously addressing operating model and adoption challenges simply delays value and can lead to repeating the same pilot-to-production traps with the next wave.

REASONS TO MOVE NOW TO MULTI-AGENT AI

For the first time in AI history, a system appears capable of augmenting entire processes — rather than just individual steps. This represents a natural evolution of the agent archetypes that have already begun to emerge (see Figure 2).

Indeed, there are strong reasons to move toward multi-agent AI sooner rather than later, even if GenAI has not yet shown its full value:

The foundational investment is the same. Companies implementing AI tools have already invested in critical foundations, such as improving data quality and structure; modernizing integration architecture; and upskilling teams in prompting, monitoring, and decision oversight to support human-agent collaboration. These foundations are the same for multi-agent AI, so further investment in them benefits both current AI tools and multi-agent systems.
Multi-agent technology is already mature. Following rapid progress over the last two years, global ecosystems are already offering enterprise-ready multi-agent AI tools. Major platforms from vendors like Microsoft, Google Cloud, AWS, Anthropic, LangChain, and CrewAI offer built-in agentic and automation layers. Although integrating the technology in the core of the organization requires careful implementation, plug-and-play integration frameworks and APIs are quick to deploy, accessible, and secure. Governance, compliance, and responsible AI tooling have become increasingly standardized.
Multi-agent AI can finally deliver on AI’s ROI promise. Because multi-agent AI enables real process change, it can help companies climb out of the “trough” to deliver the expected business benefits from their AI investments. To stay competitive and avoid playing catch-up later, companies need to move now.

TAKING A SYSTEMS APPROACH

Implementing multi-agent AI in an enterprise requires focusing on data, processes, organization, and governance. Arthur D. Little suggests the following three steps.

1. Map out processes that add the most measurable value

Look for processes that are multi-role, rely on multiple data and information sources, involve both rules and judgment for decision-making, and are high-volume (thus inherently costly). Choose processes that yield observable, reversible outcomes so that risks associated with failures can be managed.

Remember to factor in change readiness: clear process ownership, stakeholder alignment, frontline willingness to adopt new ways of working, and the ability to measure adherence and outcomes.

Without these, even well-designed agents will remain trapped in pilots or parallel-run mode. Benefits are greater for processes that cut across organizational silos than for individual tasks. Good examples include customer management, IT operations, marketing, commercial/contracting functions, supply chain planning, and financial administration. Mapping should include clarifying workflows, roles, handoffs/interfaces, governance, and risks. The Bank of New York (BNY) is a good example of how multiple agents can be embedded into processes like payment validations and code repairs (see sidebar).
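The selection criteria in this step can be turned into a simple screening aid. In this hypothetical sketch, each criterion is a yes/no check with equal weight. Change readiness is treated as a hard gate, reflecting the point that without it agents stay trapped in pilots. Real assessments would likely weight and score criteria more finely, and the process names are placeholders.

```python
from dataclasses import dataclass, field

# Criteria taken from the text; equal-weight yes/no checks are an assumption.
VALUE_CRITERIA = (
    "multi_role",
    "multiple_data_sources",
    "rules_and_judgment",
    "high_volume",
    "observable_reversible_outcomes",
)
READINESS_CRITERIA = (
    "clear_ownership",
    "stakeholder_alignment",
    "frontline_willingness",
    "measurable_adherence",
)


@dataclass
class ProcessProfile:
    name: str
    met: frozenset = field(default_factory=frozenset)  # criteria this process satisfies

    def value_score(self) -> int:
        return sum(c in self.met for c in VALUE_CRITERIA)

    def change_ready(self) -> bool:
        return all(c in self.met for c in READINESS_CRITERIA)


def shortlist(candidates):
    """Keep change-ready processes only, ranked by how many value criteria they meet."""
    ready = [p for p in candidates if p.change_ready()]
    return sorted(ready, key=lambda p: p.value_score(), reverse=True)


candidates = [
    ProcessProfile("process A", frozenset(VALUE_CRITERIA + READINESS_CRITERIA)),
    ProcessProfile("process B", frozenset({"multi_role", "rules_and_judgment", *READINESS_CRITERIA})),
    ProcessProfile("process C", frozenset({"multi_role", "high_volume"})),  # not change-ready
]
print([p.name for p in shortlist(candidates)])
```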

BNY deploys Google’s Gemini 3

BNY is deploying more than 100 digital employees into its technology workflows to simplify processes and make them more efficient and seamless. The bank’s internal AI platform, Eliza, already supports more than 120 automated tasks. For example, AI agents have been deployed in client onboarding, where staff juggle collecting documents, verifying ID and tax forms, locating key details, looking up risk information, and logging everything into internal systems. Gemini 3, which debuted in November 2025, can interpret text, images, tables, PDFs, and audio together, so employees can load mixed financial materials and ask the model to interpret them and synthesize the important parts. Because safety and security are central to the deployment, clear boundaries are being set around what the technology can see, decide, and escalate. Each agent must pass an internal model-risk review before it goes live, and tight access controls determine what information each system is allowed to use. Once an agent is deployed, the team monitors its performance daily and feeds the results into a continuous feedback loop.

2. Embed agents into workflows & adapt operating model

Having identified which processes and roles are most suitable for AI agents, analyze them and redesign them for greater automation. In doing this, consider the end-to-end process, rethink it with AI in mind, and build a multi-agent team rather than applying individual AI tools. Be sure to define perceive, reason, act, and learn functions, as well as guardrails, escalation paths, and autonomy boundaries. In most cases, the operating model around the process (ways of working, resourcing, organizational structures, and governance arrangements) will need to be revisited. Structured change management will likely include training and certification for new human-agent collaboration tasks, updated procedures, incentives aligned to adoption (not workarounds), and clear accountability for outcomes when work is coproduced by agents and humans.
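To make “perceive, reason, act, and learn” and “autonomy boundaries” concrete, here is a hypothetical agent for a single invoice-approval step. The class name, the 1,000 approval limit, and the allow-listed actions are assumptions for illustration. In practice, the reason step would typically call a model, and learn would feed a review and retraining process rather than a simple list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InvoiceApprovalAgent:
    """Hypothetical agent for one step of an accounts-payable process."""

    autonomy_limit: float = 1_000.0  # largest amount the agent may approve on its own
    allowed_actions: frozenset = frozenset({"approve", "reject", "escalate"})
    feedback: List[bool] = field(default_factory=list)

    def perceive(self, invoice: dict) -> dict:
        # Turn raw input into the facts the agent reasons over.
        return {
            "amount": float(invoice["amount"]),
            "has_po": bool(invoice.get("po_number")),
        }

    def reason(self, facts: dict) -> str:
        # Propose an action (a real agent might ask an LLM for this).
        return "approve" if facts["has_po"] else "reject"

    def act(self, facts: dict, proposed: str) -> str:
        # Guardrails: allow-listed actions only, plus an autonomy boundary on amount.
        if proposed not in self.allowed_actions:
            return "escalate"
        if proposed == "approve" and facts["amount"] > self.autonomy_limit:
            return "escalate"
        return proposed

    def learn(self, action: str, human_outcome: str) -> None:
        # Feedback loop: record whether the human reviewer agreed with the agent.
        self.feedback.append(action == human_outcome)

    def agreement_rate(self) -> Optional[float]:
        return sum(self.feedback) / len(self.feedback) if self.feedback else None

    def run(self, invoice: dict) -> str:
        facts = self.perceive(invoice)
        return self.act(facts, self.reason(facts))


agent = InvoiceApprovalAgent()
print(agent.run({"amount": 250, "po_number": "PO-1"}))    # approve
print(agent.run({"amount": 5_000, "po_number": "PO-2"}))  # escalate
print(agent.run({"amount": 80}))                          # reject
agent.learn("approve", human_outcome="approve")
agent.learn("reject", human_outcome="approve")
print(agent.agreement_rate())                             # 0.5
```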

3. Establish guardrails & maintain AI as an additional workforce

It’s important to recognize that AI still makes mistakes and is poor at signaling uncertainty. For example, a 2025 study by Columbia University’s Tow Center showed an average 60% error rate when eight leading AI search and LLM systems were asked to find facts about a newspaper article. LLMs optimize for plausibility, not truth. In an agentic context, these errors can become material business risks, threatening control, compliance, and brand integrity. Multi-agent systems add a further challenge: brittleness can compound. Each agent has a non-zero probability of error, and when multiple agents pass work products, assumptions, or retrieved facts downstream, small errors can propagate and amplify. This is why more agents do not automatically mean more reliability — reliability must be engineered throughout the system.
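The compounding effect can be illustrated with simple probability. Assume each agent in a sequential chain is independently correct 95% of the time (an illustrative figure, not taken from the Tow Center study) and that nothing checks the hand-offs. Under those assumptions, the chance of a fully correct end result is the product of the per-agent accuracies.

```python
import math


def chain_reliability(per_agent_accuracy):
    """Probability that a sequential chain of agents produces a fully correct result.

    Assumes independent errors and no intermediate checks: any single error
    propagates to the final output.
    """
    return math.prod(per_agent_accuracy)


for n_agents in (1, 3, 5, 10):
    r = chain_reliability([0.95] * n_agents)  # illustrative 95% accuracy per agent
    print(f"{n_agents:>2} agents: {r:.1%} end-to-end")
```

Under these assumptions, reliability falls from 95% with one agent to roughly 60% with ten. Real errors are often correlated and some are caught downstream, so this is a simplification. It still shows why reliability has to be engineered into the system rather than assumed.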

Just like with humans, processes and guardrails can be designed to cope with AI agent errors. Practical measures are already emerging. Verification and validation layers, many of which can be automated, should become standard. Agents should cross-check their own outputs, compare reasoning paths, validate against structured data, and flag inconsistencies. Domain experts (humans in the loop) remain essential in the operating model to define what “correct” looks like, anticipate edge cases, and adjudicate exceptions when automated checks fail. In practice, this means designing for graceful degradation: restrict autonomy for high-impact actions, introduce approval gates at critical decision points, and use redundancy where it matters.
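The measures above (automated validation, redundancy, and approval gates for high-impact actions) can be combined into a single routing decision. This is a minimal sketch under assumed names: `route`, the two validators, and the simple high/low impact flag are hypothetical. The aim is graceful degradation. An action that fails a check or is high-impact is not dropped; it is routed to a human, with less autonomy for the agent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence


class Route(Enum):
    AUTO_EXECUTE = "auto_execute"
    HUMAN_APPROVAL = "human_approval"  # passed checks, but impact is high
    HUMAN_REVIEW = "human_review"      # failed a check or outputs disagree


@dataclass
class ProposedAction:
    kind: str
    payload: dict
    high_impact: bool


Validator = Callable[[ProposedAction], bool]


def route(action: ProposedAction, validators: Sequence[Validator],
          second_opinion: Optional[dict] = None) -> Route:
    # 1. Automated verification: every validator must pass.
    if not all(check(action) for check in validators):
        return Route.HUMAN_REVIEW
    # 2. Redundancy: an independently produced output, if available, must agree.
    if second_opinion is not None and second_opinion != action.payload:
        return Route.HUMAN_REVIEW
    # 3. Approval gate: high-impact actions never run unattended.
    if action.high_impact:
        return Route.HUMAN_APPROVAL
    return Route.AUTO_EXECUTE


# Hypothetical validators for a payment action.
def positive_amount(a: ProposedAction) -> bool:
    return a.payload.get("amount", 0) > 0


def known_currency(a: ProposedAction) -> bool:
    return a.payload.get("currency") in {"EUR", "USD", "GBP"}


checks = [positive_amount, known_currency]
pay = ProposedAction("payment", {"amount": 120, "currency": "EUR"}, high_impact=False)
print(route(pay, checks, second_opinion={"amount": 120, "currency": "EUR"}).name)  # AUTO_EXECUTE
```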

Examples include independent guardian agents designed to perform verification and validation, rule-based validators, or deterministic checks against source systems. Organizations must develop capabilities, such as hallucination spotting, automated validation pipeline design, scenario analysis (“What could this agent get wrong?”), and operational rules for when to trust the system and when to intervene. This means agentic AI requires onboarding, training, and iteration (similar to human workforces).
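A guardian agent does not need to be another LLM, and its most reliable checks are often deterministic. The hypothetical sketch below verifies fields that an upstream agent extracted from an invoice. It combines rule-based format validators with lookups against in-memory stand-ins for source systems. `VENDOR_MASTER`, `PURCHASE_ORDERS`, and `guardian_check` are invented for the example.

```python
import re
from datetime import date
from typing import List

# Hypothetical systems of record the guardian can query directly.
VENDOR_MASTER = {"V-100": "Acme GmbH"}
PURCHASE_ORDERS = {"PO-7": {"vendor_id": "V-100", "amount": 1200.00}}


def guardian_check(extracted: dict, tolerance: float = 0.01) -> List[str]:
    """Independently verify fields an upstream agent extracted from an invoice.

    Returns a list of issues; an empty list means every check passed.
    """
    issues = []
    po_number = str(extracted.get("po_number") or "")
    vendor_id = str(extracted.get("vendor_id") or "")

    # Rule-based validators: field formats.
    if not re.fullmatch(r"PO-\d+", po_number):
        issues.append("po_number has an unexpected format")
    try:
        date.fromisoformat(str(extracted.get("invoice_date") or ""))
    except ValueError:
        issues.append("invoice_date is not an ISO date")

    # Deterministic checks against source systems.
    if vendor_id not in VENDOR_MASTER:
        issues.append("vendor_id not found in vendor master")
    po = PURCHASE_ORDERS.get(po_number)
    if po is None:
        issues.append("po_number not found in purchase orders")
    else:
        if po["vendor_id"] != vendor_id:
            issues.append("vendor_id does not match the purchase order")
        try:
            amount = float(extracted.get("amount"))
        except (TypeError, ValueError):
            issues.append("amount is missing or not numeric")
        else:
            if abs(amount - po["amount"]) > tolerance:
                issues.append("amount differs from the purchase order")
    return issues


good = {"po_number": "PO-7", "vendor_id": "V-100", "amount": "1200.00", "invoice_date": "2025-03-14"}
bad = {"po_number": "PO-7", "vendor_id": "V-100", "amount": 1250, "invoice_date": "14/03/2025"}
print(guardian_check(good))  # []
print(guardian_check(bad))
```

Any non-empty result would typically block the action and route the case to a human, consistent with the validation and escalation measures described above.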

In the same vein, it’s important to tailor abilities to the job — an LLM might not be required for every case. For increased reliability and reduced probabilistic deviations from desired outcomes, small language models (SLMs) can play a role in constrained, high-control tasks. SLMs offer tighter controllability, domain specificity, lower hallucination rates in constrained tasks, and the ability to verify outputs against internal systems with higher precision.

Conclusion

ACT NOW & FOCUS ON SYSTEMS, NOT MODELS

It’s tempting to put off multi-agent AI until traditional AI has proved its worth, but we recommend a different path. Moving now does not mean adding more technology; it means accelerating the operating model and change management work required to embed AI into business processes and decision rights and drive adoption at the frontline. Multi-agent AI can unlock step changes in productivity, speed, and decision quality, precisely because it enables end-to-end orchestration rather than isolated task automation. However, multi-agent architectures can compound errors if unmanaged, so engineered reliability (e.g., guardrails, validation rules, escalation paths, and monitoring) must be treated as part of the system, not an afterthought. Winning enterprises will be those that redesign processes and mechanisms and build robust, resilient systems around imperfect models, just as they already do for humans.

By Joeri Samyn, Michael Majster, Dr. Thomas Thiele, Emilio Lapiello, Matteo Piscopo