On 20 October 2025, while much of the world was engrossed in Diwali celebrations, the digital sphere suffered a major mishap. Amazon Web Services (AWS), Amazon’s cloud computing platform, ran into a technical issue that, within minutes, took down a host of websites dependent on its hosting service. While cloud outages are a common phenomenon, this one was big – it took down prominent services like Snapchat, Perplexity, Amazon’s e-commerce website, Prime Video, Signal, Atlassian, Apple’s digital services, and more.
The outage was severe, crippling thousands of major online platforms for over 15 hours. The failure, which originated in the company’s critical US-EAST-1 region in Northern Virginia – one of AWS’s oldest regions – swiftly took down services ranging from social media and gaming to banking and government portals. Although resources were deployed and the situation was brought under control, the outage exposed a weakness – the internet relies heavily on ‘Big Tech’.
AWS outage: What exactly happened
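For those who want to understand what went wrong technically, here’s a simple explanation. The outage was triggered by a Domain Name System (DNS) resolution failure affecting the endpoint of the DynamoDB database service during a routine software update. DNS is the system that translates a service’s name into the network address computers use to reach it, so when that lookup failed, applications that depend on DynamoDB could no longer locate it. That immediately disabled critical functions for an enormous part of the digital economy.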
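For readers who think in code, here is a minimal Python sketch of what a DNS resolution failure looks like from a client's side, and one possible way to degrade gracefully. It is an illustration only, not a description of how AWS or any affected company handled the incident. The function name `resolve_first_available`, the injectable `resolver` parameter and the idea of falling back to a second endpoint are assumptions made for this example; `socket.getaddrinfo` and `socket.gaierror` are the only standard-library names used.

```python
import socket


def resolve_first_available(hostnames, resolver=socket.getaddrinfo, port=443):
    """Return (hostname, addresses) for the first hostname that resolves.

    `resolver` is injectable (an assumption of this sketch) so the behaviour
    can be exercised without touching the network; by default it is the
    standard socket.getaddrinfo.
    """
    errors = {}
    for name in hostnames:
        try:
            results = resolver(name, port)
        except socket.gaierror as exc:
            # This is what a client sees when a DNS name cannot be resolved:
            # the lookup fails, so the client has no address to connect to.
            errors[name] = str(exc)
            continue
        # Each getaddrinfo entry ends with a sockaddr tuple whose first
        # element is the IP address.
        addresses = sorted({entry[4][0] for entry in results})
        if addresses:
            return name, addresses
        errors[name] = "no addresses returned"
    raise ConnectionError(f"could not resolve any endpoint: {errors}")
```

Called with a list such as `["dynamodb.us-east-1.amazonaws.com", "dynamodb.us-west-2.amazonaws.com"]`, the function would return the first name that resolves. A fallback of this kind only helps if the application's data is actually available in the second region, which is a separate design decision.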
The list of affected platforms was staggering – Snapchat, Roblox, Signal, Zoom, Coinbase, Venmo, Etsy, and even Amazon’s own retail site and smart devices like Ring doorbells and Alexa experienced widespread connectivity issues. Major airlines, including Delta and United, faced operational challenges, while financial institutions like Lloyds Bank saw disruptions in the UK. Downdetector, the website that monitors outages, recorded millions of user complaints globally.
That’s almost 30 per cent of the cloud market!
AWS commands approximately 30% of the global cloud infrastructure market, ahead of Microsoft Azure and Google Cloud; together, the three power more than 60% of the public cloud. In the pursuit of cost efficiency and scale, most IT firms have consolidated their digital workloads onto these big names. When a core service at the dominant cloud provider fails, the effect is a cascading, global blackout, proving that a single technical glitch can be felt from a bank in London to a person streaming content in Sydney.
Server farms need to run healthier
The root cause of the outage — an internal technical error during a standard update to a core database API — highlights the responsibility resting on the shoulders of IT firms that manage these colossal server farms. Cloud infrastructure is a vast network of physical, power-intensive data centers. Maintaining these highly complex ecosystems demands relentless efforts in two critical areas:
1. Software and configuration control: Routine software changes, like the one that broke DNS resolution in this AWS incident, must pass through multi-layered deployment processes with redundant checks. An error in a single line of code can cripple services across the globe.
2. Redundancy and resiliency: While the cloud promises high availability, the incident highlights that Availability Zones — separate data centers designed to isolate failure — can still be linked by shared internal dependencies, allowing a localised error to propagate across an entire geographic region. Maintenance schedules must prioritise genuine isolation and immediate rollback capabilities.
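The two points above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not AWS's actual deployment process: the function `staged_rollout` and its callbacks `apply_change`, `is_healthy` and `rollback` are hypothetical names invented for this example. The idea is to change one zone at a time, check its health before moving on, and undo everything at the first sign of trouble.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    zones: Iterable[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Apply a change one zone at a time; undo everything on the first failure.

    Returns the zones that were updated. Raises RuntimeError if a zone fails
    its health check, after rolling back every zone touched so far.
    All four parameters are hypothetical hooks supplied by the caller.
    """
    updated: List[str] = []
    for zone in zones:
        apply_change(zone)
        updated.append(zone)
        if not is_healthy(zone):
            # Roll back in reverse order so the most recent change is undone first.
            for touched in reversed(updated):
                rollback(touched)
            raise RuntimeError(f"rollout halted at {zone}; rolled back {updated}")
    return updated
```

A sketch like this limits the blast radius only if the zones are genuinely independent. If every zone leans on the same internal dependency, a fault in that dependency can still reach all of them at once, which is the risk the second point describes.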
What damage could a major cloud outage do?
The AWS outage provided a taste of what a more severe or prolonged cloud failure could bring upon us.
Economic collapse and financial loss: This is probably the biggest consequence of a cloud outage today. For businesses, downtime equals lost revenue. Analysts often estimate the cost of a major outage in the millions of dollars per hour, from lost transactions, suspended manufacturing, and halted trading. Financial platforms, exchanges, and retail services all come to a standstill.
Disruption to critical services: Beyond consumer apps, a major cloud failure compromises essential national services. The recent outage affected government tax services in the UK and educational platforms used by universities, preventing students from accessing course materials and submitting assignments. In the worst-case scenario, utilities, healthcare systems, and national security infrastructure reliant on these cloud backbones could be compromised.
Can anything be done to mitigate such downtime errors in future?
As the world relies more heavily on internet infrastructure, it falls upon lawmakers to come up with policies that prevent market concentration with ‘Big Tech’. Smarter policies governing this sector could mitigate such issues to a large extent. Coupled with robust maintenance procedures and better data center infrastructure, they could help ensure that an outage like the one in AWS US-EAST-1 is resolved without services staying offline for hours.