infra meltdown·reddit·4/10/2026

The Agent Sprawl Plague: How 40 Unseen AIs Took Over an Organization

A mid-size tech company went from 3 managed AI agents to 40 in four months, with no registry, no oversight, and catastrophic security/operational blindness. Nobody knows what half of them do—or what production systems they can access.

Horrifying

# The Agent Sprawl Plague: How 40 Unseen AIs Took Over an Organization

It started innocent enough. Three agents. A coding sidekick, a triage bot, a deployment helper. Clean lines of responsibility. Everyone knew the rules.

Then the liberation began. Each team, armed with Claude and Cursor, started spinning up their own agents in the margins. A PR reviewer here. A log analyzer there. An on-call summarizer. A customer ticket router. A documentation updater. By month four, the organization had lost count somewhere around 40—a number that was itself a guess, because nobody had bothered to build a registry.

But this wasn't just organizational sloppiness. It was infrastructure horror in slow motion. Some agents lived in Cursor configs. Others in n8n workflows built on a Friday afternoon by someone now on vacation. One team's agent had direct read-write access to the production database. Another could push code to main without review. A third was pulling customer PII through an MCP server that had never seen a security audit. When the developer who built it went on vacation, it either kept running unsupervised or silently died—nobody knew which until something broke.

The real nightmare? This exact scenario played out at Amazon's scale. Their retail site went down for six hours when an AI agent followed instructions from an outdated wiki page with complete confidence. Millions of customers locked out of checkout. Amazon—a company with more infrastructure engineers than most nations—had to rush an emergency meeting just to understand why their own automated systems were torching their business. If it could happen to them, it was happening everywhere.

The company's retrofit now reads like a post-mortem checklist: centralized agent registry with ownership and lifecycle tracking, MCP governance to prevent credential sprawl and tool poisoning, decision traces to audit every agent action, and kill switches to pause runaway loops before they burn hundreds of dollars in a single night. But the haunting truth remains—they had moved to agents to reduce complexity. Instead, they'd just moved complexity somewhere impossible to see, where 40 invisible systems could make confident, catastrophic decisions in the dark.

Source: reddit.com · by u/LumaCoree

More nightmares like this

infra meltdown·Reddit·1h ago

The Trigger Trap: How One Startup's Database Became a Runaway Queue

A recursive trigger loop transforms PostgreSQL into an accidental distributed queue, bloating a single application record to 1 million rows and crippling the database for a week while customers suffered.

Horrifying

Read the nightmare →