AWS Downtime Shakes Business World: ‘Outages Now Cascade Not Just Across Services, But Entire Economies,’ Says Cockroach Labs CEO
Cockroach Labs’ CEO comments on why the business world keeps getting walloped by massive cloud service outages.
A massive, widespread outage affecting Amazon Web Services rocked the business world Monday morning, leaving some of the world’s largest businesses temporarily in the digital dark.
The outage shut down access to many major online platforms, including Snapchat, Roblox, Fortnite, online broker Robinhood, the McDonald’s app and Signal, The Associated Press reported.
Amazon officially confirmed the outage in a post early Monday on its website, saying it had identified “a potential root cause for error rates for the DynamoDB APIs in the US-EAST-1 Region.”
“We continue to investigate the root cause for the network connectivity issues that are impacting AWS services such as DynamoDB, SQS, and Amazon Connect in the US-EAST-1 Region. We have identified that the issue originated from within the EC2 internal network. We continue to investigate and identify mitigations,” read the latest update on the issue from the cloud services giant.
Now one CEO says this latest massive cloud outage shows why IT teams should not depend too heavily on a single provider, region or control path.
“Businesses are past treating events like this as one-offs. If your stack depends on a single region or control path, you’ve designed a single point of business failure. Minutes of disruption cascade into checkout stalls, payments fail, ads go dark, support explodes. Production-ready today means failure-tolerant by default, so a regional event becomes a non-event for customers and cash flow,” said Spencer Kimball, co-founder and CEO of Cockroach Labs, in a statement shared with MES Computing.
Following a widespread Google Cloud outage in June, which left Google’s cloud services inaccessible along with apps that run on Google Cloud, such as Discord, Spotify and Snapchat, Kimball told MES Computing that existing cloud architecture has inherent flaws.
“What we’re seeing today is the failure of systems that were never architected for real-world chaos, systems that assumed downtime was rare, that traffic was human-paced, and that regional dependency wasn’t a liability,” Kimball, a former Google engineer, said at the time.
The rise of AI is also putting cloud infrastructure to the test.
Massive cloud outages will “continue to increase, especially as we see more AI capabilities being introduced into the enterprise,” Bob Venero, CEO of Future Tech Enterprise, Fort Lauderdale, Fla., told MES Computing’s sister site, CRN.
Kimball voiced similar concerns after the Google Cloud outage.
“AI runs 24x7. Agents don’t sleep. Outages now cascade not just across services, but entire economies. And yet most infrastructure is still optimized for peak concurrency and steady-state assumptions. That’s the gap—and we’re watching it rupture in real time,” Kimball said.
“Resilience isn’t a feature you layer on. It’s an architectural commitment. Performance under adversity—not in perfect conditions—is the real benchmark now. If your system can’t absorb failure without taking your customers down with it, you’re not production-ready in 2025—especially not in the AI era,” he added.