AI Adoption Outpaces Infrastructure as Capacity Limits Emerge as Key Bottleneck, Says Datadog

As AI adoption accelerates, operational complexity – not model intelligence – is becoming the primary barrier to reliable AI at scale, according to new data from Datadog, Inc., the AI-powered observability and security platform.

Published on:

23 Apr 2026, 6:19 am

Datadog’s State of AI Engineering 2026 report, based on real-world data from thousands of organizations running AI in production, highlights a compounding complexity challenge as AI systems scale. Nearly seven in ten companies (69%) now use three or more models alongside increasingly complex agent workflows. Around 5% of AI model requests fail in production, with nearly 60% of those failures caused by capacity limits – leading to slowdowns, errors, and broken experiences in AI-powered applications.

Additional key findings:

· Multi-model is now the norm: OpenAI remains the most widely used provider at 63% share, alongside rising adoption of Google Gemini and Anthropic Claude, which grew by 20 and 23 percentage points, respectively.

· Agent framework adoption doubled year-over-year, accelerating development but also introducing more moving parts into production systems.

· The amount of data sent to AI models per request is also rising: the average number of tokens more than doubled for ‘median use’ teams (50th percentile of usage volume) and quadrupled for heavy users (90th percentile).

“AI is starting to look a lot like the early days of cloud,” said Yanbing Li, Chief Product Officer at Datadog. “The cloud made systems programmable but much more complex to manage. AI is now doing the same thing to the application layer. The companies that win won’t just build better models – they’ll build operational control around them. In this new era, AI observability becomes as essential as cloud observability was a decade ago.”

Speed Requires Control

Competitive pressure is accelerating AI deployment across startups and large enterprises alike. But as systems scale, speed without control creates risk. Failures are increasingly driven by system design, including fragmented workflows, excessive retries, and inefficient routing.
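To illustrate how retries interact with capacity limits, here is a minimal Python sketch of one common way to bound them: a fixed attempt budget with exponential backoff and jitter. It is not taken from the Datadog report. `CapacityError`, `call_with_backoff`, and all parameter values are hypothetical names and defaults chosen for illustration, and a real provider SDK would raise its own rate-limit exception.

```python
import random
import time


class CapacityError(Exception):
    """Hypothetical stand-in for a provider's rate-limit or capacity error."""


def call_with_backoff(request, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call request() and retry only on CapacityError, up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return request()
        except CapacityError:
            if attempt == max_attempts - 1:
                raise  # give up rather than retry indefinitely
            # Full jitter: wait a random time up to the capped exponential delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * rng())
```

Capping both the number of attempts and the delay keeps a capacity problem from being amplified by the client's own retry traffic, and the random jitter spreads retries from many clients over time.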

“The next wave of agent failures won't be about what agents can't do but what teams can't observe,” said Guillermo Rauch, CEO at Vercel, the company behind Next.js and a leading platform for building AI-powered web applications. “We built agentic infrastructure at Vercel because agents need the same production feedback loops as great software. Unlike traditional software, agents have control flow driven by the LLM itself, making observability not just useful, but essential.”

“Innovation alone isn’t enough,” added Li. “To scale AI with confidence, organizations need real-time visibility across the entire stack – from GPU utilization to model behavior to agent workflows. Visibility and operational control are what allow teams to move fast without sacrificing reliability or governance. At scale, how you operate AI may matter more than the models you choose.”

Commenting on India specifically, Yadi Narayana, Field CTO, APJ at Datadog, said: “In India, adoption is progressing at a significantly faster pace. Teams are implementing multi-model and agent-based architectures early in their journey, enabling rapid innovation but also introducing substantial hidden complexity. The accumulation of technical debt is a key challenge: new models are integrated quickly, while legacy patterns persist, making systems increasingly difficult to manage. Reliability risks, including rate limiting, are amplified given the scale at which many platforms operate.

“The critical challenge in India is not whether AI will scale, but how effectively it will perform at scale. As token usage and agent complexity grow, costs can escalate and stability can suffer if not addressed early. There is a clear opportunity to establish stronger foundations from the outset by adopting modular agent design, improving telemetry, and implementing tighter usage controls, so that scale is achieved without compromising reliability or margins.”

