A DNS resolution failure in Amazon Web Services' US-EAST-1 Region triggered cascading outages across 142 services between 11:49 PM PDT on October 19 and 3:01 PM PDT on October 20, 2025. According to Amazon staff in their official statement, the incident impacted both AWS services and Amazon.com operations, including AWS Support functions.
The technical failure demonstrated the concentrated dependencies within digital infrastructure that supports approximately 30% of global cloud computing operations. Gaming platforms Fortnite and Roblox experienced interruptions. Social applications including Snapchat faced service degradation. Financial services such as Coinbase and Robinhood reported issues affecting millions of users across multiple time zones.
Engineers at Amazon identified the root cause at 12:26 AM PDT on October 20. According to Amazon staff, the investigation determined that DNS resolution issues for regional DynamoDB service endpoints triggered the event. The company applied initial mitigations by 2:24 AM PDT, though full recovery required additional hours of systematic intervention.
The incident revealed technical dependencies within AWS infrastructure. After the DynamoDB DNS issue was resolved at 2:24 AM, services began recovering, but a subset of internal subsystems continued experiencing impairment. According to Amazon staff, the internal subsystem of EC2 responsible for launching instances remained affected due to its dependency on DynamoDB.
To facilitate recovery, AWS temporarily throttled specific operations, including EC2 instance launches. This decision affected downstream services relying on EC2 infrastructure. Network Load Balancer health checks became impaired, resulting in network connectivity issues across multiple services such as Lambda, DynamoDB, and CloudWatch. Engineers recovered Network Load Balancer health checks at 9:38 AM PDT.
The throttling measures extended beyond compute services. According to Amazon staff, the team temporarily restricted processing of SQS queues via Lambda Event Source Mappings and asynchronous Lambda invocations. These limitations aimed to prevent additional system strain while engineers worked through recovery procedures.
By 12:28 PM PDT, many AWS customers and services experienced significant recovery. Engineers gradually reduced throttling of new EC2 instance launches while addressing remaining impact. Full service restoration occurred at 3:01 PM PDT, concluding a disruption spanning more than 15 hours from initial detection to complete recovery.
Some services continued processing backlogs beyond the official resolution time. According to Amazon staff, AWS Config, Redshift, and Connect required additional hours to work through accumulated messages. The company committed to sharing a detailed post-event summary following the incident.
The outage affected organizations across sectors. Dead by Daylight acknowledged awareness of AWS issues affecting players' ability to access the game on various platforms. Genshin Impact reported problems with Epic services, noting issues with top-up functions and login capabilities. Morning Brew reported that monitoring website Downdetector received over 8 million reports around the globe during the incident.
The outage cost businesses millions in lost revenue, with estimates suggesting substantial losses for e-commerce and advertising platforms during the 15-hour disruption window. According to ParcelHero, when a comparable outage occurred at CrowdStrike in 2024, Fortune 500 companies experienced $5.4 billion in losses. The October 20 incident affected over 1,000 companies globally, creating comparable economic impact across sectors.
For digital advertising operations, the disruption highlighted infrastructure concentration risks. AWS powers significant portions of advertising technology infrastructure, with companies like VideoAmp running proprietary measurement methodologies on AWS Clean Rooms for privacy-enhanced analytics. Amazon's advertising business has grown substantially, with AWS demonstrating 19% growth to $108 billion in 2024 and serving as the technical foundation for many advertising technology implementations.
The incident also affected retailers using Amazon's cloud-based advertising solutions. Macy's had announced plans to implement Amazon Retail Ad Service, built on AWS infrastructure, for sponsored product advertisements. The October 20 outage demonstrated potential vulnerabilities in such dependencies. Amazon Publisher Cloud, launched in 2023, relies on AWS Clean Rooms infrastructure to enable publishers to plan programmatic deals and activate them in Amazon DSP. During the outage, these advertising technology systems experienced degradation alongside other AWS services.
The technical nature of the failure originated in what Amazon staff described as DNS resolution issues. Domain Name System infrastructure translates human-readable addresses into machine-readable IP addresses, forming a fundamental component of internet connectivity. When DNS resolution fails for critical service endpoints like DynamoDB, dependent systems cannot locate necessary resources, triggering cascading failures across interconnected services.
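A minimal sketch, using only Python's standard library and the public regional endpoint name, illustrates the failure mode from a client's perspective: when the hostname cannot be resolved, no connection is ever attempted and every dependent request fails.

```python
# Minimal sketch: how a client experiences a DNS resolution failure for a
# regional service endpoint. The error handling here is illustrative, not
# AWS SDK behavior.
import socket

ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

def resolve_endpoint(hostname: str) -> list[str]:
    """Return the IP addresses a client would connect to, or raise on failure."""
    try:
        results = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        return sorted({sockaddr[0] for *_rest, sockaddr in results})
    except socket.gaierror as exc:
        # This is the condition described in the incident: the name cannot be
        # resolved, so the dependent system never reaches the service at all.
        raise RuntimeError(f"DNS resolution failed for {hostname}: {exc}") from exc

if __name__ == "__main__":
    print(resolve_endpoint(ENDPOINT))
```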
AWS maintains the US-EAST-1 Region as one of its oldest and most heavily used data center locations, serving customers throughout North America and globally. The concentration of services within this region meant that a single point of failure, DNS resolution for DynamoDB endpoints, created widespread impact across unrelated applications and platforms.
The recovery process required systematic intervention across multiple layers of infrastructure. At 2:01 AM PDT, Amazon staff identified a potential root cause related to DNS resolution of the DynamoDB API endpoint in US-EAST-1. Engineers worked on multiple parallel paths to accelerate recovery, acknowledging that the issue affected other AWS services in the region. Global services or features relying on US-EAST-1 endpoints, such as IAM updates and DynamoDB global tables, also experienced issues.
By 3:35 AM PDT, the underlying DNS issue had been fully mitigated. According to Amazon staff, most AWS service operations were succeeding normally, though some requests experienced throttling while working toward full resolution. Services including CloudTrail and Lambda continued processing backlogs of events. Requests to launch new EC2 instances, or services launching EC2 instances such as ECS in US-EAST-1, still experienced elevated error rates.
The company recommended that customers flush DNS caches if they were still experiencing issues resolving DynamoDB service endpoints in US-EAST-1. This guidance reflected the distributed nature of DNS caching across internet infrastructure, where stale or incorrect DNS records can persist in local caches even after upstream resolution is restored.
At 4:48 AM PDT, Amazon staff provided guidance for minimizing impact from the ongoing EC2 launch issues. The company recommended EC2 instance launches that were not targeted to specific Availability Zones, allowing EC2 flexibility in selecting appropriate AZs. The impairment in new EC2 launches affected services such as RDS, ECS, and Glue. Amazon recommended that Auto Scaling groups be configured to use multiple AZs so that Auto Scaling could manage EC2 instance launches automatically.
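A minimal sketch of that multi-AZ guidance, assuming boto3 and placeholder resource names, shows an Auto Scaling group spread across subnets in three Availability Zones so the service, rather than the caller, decides where each new instance launches.

```python
# Sketch of the recommended multi-AZ Auto Scaling configuration. Group name,
# launch template, and subnet IDs are hypothetical placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-fleet",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=4,
    LaunchTemplate={
        "LaunchTemplateName": "web-fleet-template",
        "Version": "$Latest",
    },
    # Subnets in three different Availability Zones: Auto Scaling picks among
    # them at launch time instead of targeting a single zone.
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222,subnet-ccc333",
)
```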
Engineers pursued further mitigation steps to recover from Lambda's polling delays for Event Source Mappings for SQS. According to Amazon staff, AWS features relying on Lambda's SQS polling capabilities, such as Organizations policy updates, experienced increased processing times.
At 5:48 AM PDT, Amazon staff confirmed recovery of SQS queue processing via Lambda Event Source Mappings. The team worked through the backlog of SQS messages in Lambda queues. This represented one milestone in the broader recovery process affecting multiple interconnected systems.
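For readers unfamiliar with the mechanism involved, the sketch below, assuming boto3 and a hypothetical function name, shows the customer-facing controls for pausing and resuming SQS-backed Event Source Mappings; Amazon has not described its internal throttling tooling, so this illustrates the concept rather than the procedure its engineers used.

```python
# Illustrative only: disable and re-enable the SQS event source mappings that
# feed a Lambda function, the same lever a customer could pull to pause polling
# during an incident and resume it once backlogs are manageable.
import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

def set_sqs_polling(function_name: str, enabled: bool) -> None:
    """Enable or disable every event source mapping attached to a function."""
    paginator = lambda_client.get_paginator("list_event_source_mappings")
    for page in paginator.paginate(FunctionName=function_name):
        for mapping in page["EventSourceMappings"]:
            lambda_client.update_event_source_mapping(
                UUID=mapping["UUID"],
                Enabled=enabled,
            )

set_sqs_polling("order-processor", enabled=False)  # pause during the incident
set_sqs_polling("order-processor", enabled=True)   # resume after recovery
```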
The incident generated substantial discussion across social platforms. One user noted that when internet services experience downtime, users migrate to X to find out what happened. The concentration of reports on alternative platforms during AWS outages has become a pattern, with users seeking information about service status when major applications fail.
For marketing professionals, the incident underscored dependencies within digital advertising infrastructure. The concentration of advertising technology services on cloud platforms means that infrastructure failures can disrupt campaign delivery, measurement, and optimization across multiple channels simultaneously. Organizations building marketing technology stacks face decisions about infrastructure dependencies and contingency planning for service disruptions.
Technical architecture decisions made years earlier influenced the October 20 incident's scope and duration. According to Amazon staff, the underlying DNS issue affected DynamoDB service endpoints specifically, but the cascading impact reached services with dependencies on DynamoDB functionality. This architectural coupling meant that a single service's DNS failure could compromise broader platform operations.
The incident affected 142 AWS services across multiple categories. Resolved services included AWS Account Management, AWS Amplify, AWS AppConfig, AWS AppSync, AWS Application Migration Service, AWS B2B Data Interchange, AWS Batch, AWS Billing Console, AWS Client VPN, AWS Cloud WAN, AWS CloudFormation, AWS CloudHSM, AWS CloudTrail, AWS CodeBuild, AWS Config, AWS Control Tower, AWS DataSync, AWS Database Migration Service, AWS Deadline Cloud, and AWS Direct Connect.
Additional affected services encompassed AWS Directory Service, AWS Elastic Beanstalk, AWS Elastic Disaster Recovery, AWS Elastic VMware Service, AWS Elemental, AWS End User Messaging, AWS Firewall Manager, AWS Global Accelerator, AWS Glue, AWS HealthImaging, AWS HealthLake, AWS HealthOmics, AWS IAM Identity Center, AWS Identity and Access Management, AWS IoT Analytics, AWS IoT Core, AWS IoT Device Management, AWS IoT Events, AWS IoT FleetWise, AWS IoT Greengrass, and AWS IoT SiteWise.
The comprehensive list extended to AWS Lake Formation, AWS Lambda, AWS Launch Wizard, AWS License Manager, AWS NAT Gateway, AWS Network Firewall, AWS Organizations, AWS Outposts, AWS Parallel Computing Service, AWS Partner Central, AWS Payment Cryptography, AWS Private Certificate Authority, AWS Resource Groups, AWS Secrets Manager, AWS Security Incident Response, AWS Security Token Service, AWS Site-to-Site VPN, AWS Step Functions, AWS Storage Gateway, AWS Support API, AWS Support Center, and AWS Systems Manager.
Further affected services included AWS Systems Manager for SAP, AWS Transfer Family, AWS Transform, AWS Transit Gateway, AWS VPCE PrivateLink, AWS Verified Access, AWS WAF, AWS WickrGov, Amazon API Gateway, Amazon AppFlow, Amazon AppStream 2.0, Amazon Athena, Amazon Aurora DSQL Service, Amazon Bedrock, Amazon Chime, Amazon CloudFront, Amazon CloudWatch, Amazon CloudWatch Application Insights, Amazon Cognito, Amazon Comprehend, Amazon Connect, Amazon DataZone, Amazon DocumentDB, Amazon DynamoDB, Amazon EC2 Instance Connect, and Amazon EMR Serverless.
The outage also impacted Amazon ElastiCache, Amazon Elastic Compute Cloud, Amazon Elastic Container Registry, Amazon Elastic Container Service, Amazon Elastic File System, Amazon Elastic Kubernetes Service, Amazon Elastic Load Balancing, Amazon Elastic MapReduce, Amazon EventBridge, Amazon EventBridge Scheduler, Amazon FSx, Amazon GameLift Servers, Amazon GameLift Streams, Amazon GuardDuty, Amazon Interactive Video Service, Amazon Kendra, Amazon Kinesis Data Streams, Amazon Kinesis Firehose, and Amazon Kinesis Video Streams.
Additional services experiencing issues included Amazon Location Service, Amazon MQ, Amazon Managed Grafana, Amazon Managed Service for Apache Flink, Amazon Managed Service for Prometheus, Amazon Managed Streaming for Apache Kafka, Amazon Managed Workflows for Apache Airflow, Amazon Neptune, Amazon OpenSearch Service, Amazon Pinpoint, Amazon Polly, Amazon Q Business, Amazon Quick Suite, Amazon Redshift, Amazon Rekognition, Amazon Relational Database Service, Amazon SageMaker, Amazon Security Lake, Amazon Simple Email Service, Amazon Simple Notification Service, Amazon Simple Queue Service, Amazon Simple Storage Service, Amazon Simple Workflow Service, Amazon Textract, Amazon Timestream, Amazon Transcribe, Amazon Translate, Amazon VPC IP Address Manager, Amazon VPC Lattice, Amazon WorkMail, Amazon WorkSpaces, Amazon WorkSpaces Thin Client, EC2 Image Builder, and Traffic Mirroring.
The breadth of affected services demonstrated the interconnected nature of cloud infrastructure. Services spanning compute, storage, networking, database, analytics, machine learning, security, and application development all experienced degradation or failure during the incident window.
For customers using AWS services, the incident highlighted the importance of multi-region architecture and failover capabilities. Organizations relying on single-region deployments faced complete service unavailability during the outage window. Those with multi-region architectures could potentially route traffic to unaffected regions, though many services maintain dependencies on US-EAST-1 for global operations.
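A minimal sketch of that failover pattern, assuming boto3, a DynamoDB global table replicated to a second region, and placeholder table and key names, falls back to another region when the primary region's endpoint fails.

```python
# Sketch of a regional failover read. Table name, key, and region list are
# hypothetical; production failover would also need health checks, retries,
# and attention to replication lag.
import boto3
from botocore.exceptions import BotoCoreError, ClientError

REGIONS = ["us-east-1", "us-west-2"]  # primary first, then fallback

def get_item_with_failover(table: str, key: dict) -> dict | None:
    last_error = None
    for region in REGIONS:
        client = boto3.client("dynamodb", region_name=region)
        try:
            response = client.get_item(TableName=table, Key=key)
            return response.get("Item")
        except (BotoCoreError, ClientError) as exc:
            # Endpoint unreachable or erroring in this region; try the next one.
            last_error = exc
    raise RuntimeError("All configured regions failed") from last_error

item = get_item_with_failover("sessions", {"id": {"S": "user-123"}})
```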
The recovery timeline demonstrated the complexity of restoring interconnected cloud infrastructure. Initial DNS mitigation at 2:24 AM PDT did not immediately restore full functionality. Engineers spent subsequent hours addressing cascading effects throughout dependent systems. The gradual reduction of throttling mechanisms and the systematic restoration of individual service components extended recovery well into the afternoon PDT.
The incident generated over 6.5 million user reports globally to monitoring platform Downdetector, according to The Guardian. This volume represented one of the largest reporting events for a cloud infrastructure outage, demonstrating how deeply AWS dependencies have penetrated digital services. Reports peaked around 7:50 AM ET, with concentrations in North America and Europe, where business operations faced maximum disruption.
Banks across the United Kingdom experienced service degradation. Lloyds Banking Group, Bank of Scotland, and Halifax reported issues affecting customer access to online banking services. Airlines including United and Delta experienced system disruptions, with United implementing backup systems to manage technology disruptions affecting its app, website, and internal systems.
Universities reported cascading failures across educational technology platforms. Rutgers University documented impacts to Canvas, Kaltura, Smartsheet, Adobe Creative Cloud, Cisco Secure Endpoint, and ArcGIS. The educational sector's heavy reliance on cloud-based learning management systems meant that instruction, assignments, and student communications faced interruption during peak academic hours.
Financial services platforms provided urgent communications to users concerned about fund safety. Coinbase informed customers that "All funds are safe" despite service unavailability. The cryptocurrency exchange's message reflected broader anxieties about financial system reliability when underlying infrastructure fails. Similar concerns affected payment platforms including Venmo, where users expressed frustration about being unable to access funds.
Design and productivity tools experienced widespread disruption. Canva reported significantly elevated error rates impacting functionality, attributing the issues to "a major issue with our underlying cloud provider." The graphic design platform serves millions of users across business, education, and creative sectors. Service restoration occurred gradually throughout the day, with full access returning for most users by evening.
Artificial intelligence services built on AWS infrastructure faced outages. Perplexity CEO Aravind Srinivas confirmed on X that "The root cause is an AWS issue. We're working on resolving it." The incident demonstrated how emerging AI applications built on cloud infrastructure inherit the reliability characteristics, and the vulnerabilities, of their hosting platforms.
Internal Amazon operations experienced disruption alongside external customers. According to Reddit reports from Amazon employees, warehouse and delivery operations faced system unavailability at many sites. Workers received instructions to stand by in break rooms and loading areas during their shifts. The Anytime Pay app, which allows employees immediate access to earned wages, went offline. Seller Central, the hub used by Amazon's third-party sellers to manage their businesses, also experienced outages.
The incident sparked discussions about cloud infrastructure concentration. Betsy Cooper, director of the Aspen Institute's Policy Academy, noted that while large cloud providers offer strong cybersecurity protections and convenience, the downside emerges when issues occur. "We all have an incentive to use the big companies, because they're so ubiquitous and it's easier for us to access all of our data in one place," according to Cooper's comments to NPR. "That's great until something goes wrong, and then you really see just how dependent you are on a handful of these companies."
Mike Chapple, IT professor at the University of Notre Dame's Mendoza College of Business and a former National Security Agency computer scientist, explained the technical nature of the failure. "DynamoDB isn't a term that most users know," according to Chapple's statement. "However, it is one of the record-keepers of the modern Internet." Chapple noted that early reports indicated the problem wasn't with the database itself, with data appearing safe. "Instead, something went wrong with the records that tell other systems where to find their data."
The outage drew comparisons to the July 2024 CrowdStrike incident, when a faulty software update caused Microsoft Windows systems to go dark globally. That event grounded thousands of flights and affected hospitals and banks, revealing fragility in global technology infrastructure. The October 20 AWS incident, while stemming from different technical causes, demonstrated similar patterns of cascading failure across interconnected systems.
Some platforms used the outage for competitive positioning. Elon Musk promoted X's stability during the incident, responding "Not us" to posts showing affected services. Musk emphasized X's lack of "AWS dependencies" and promoted the platform's encrypted messaging capabilities as alternatives to affected services like Signal.
Economic analysts estimated millions in lost productivity and revenue. E-commerce delays, trading disruptions, and app failures created measurable financial impact across sectors. Small businesses and creators reliant on cloud-based tools expressed frustration over disrupted workflows. Gamers and streamers reported lost progress and interrupted entertainment access during peak usage hours.
Amazon committed to producing a detailed post-event summary following the incident. Such summaries typically provide technical analysis of root causes, contributing factors, and changes implemented to prevent recurrence. The industry awaits this documentation to understand the specific technical failures and architectural decisions that contributed to the widespread impact.
The incident occurred during a period when cloud infrastructure reliability has become increasingly critical to digital marketing operations. Amazon DSP's October 2025 integration with Microsoft Monetize as a preferred partner highlighted the programmatic advertising ecosystem's dependence on AWS infrastructure for real-time bidding and ad delivery.
Organizations evaluating cloud dependencies may reassess concentration risks following the October 20 incident. While cloud platforms provide scalability and operational efficiency, single-provider dependencies create potential single points of failure. The DNS issue affecting US-EAST-1 demonstrated how infrastructure-level failures can cascade across seemingly independent applications and services.
AWS holds approximately 30% of the global cloud computing market, according to Synergy Research Group. This market concentration means that AWS infrastructure supports a substantial portion of internet services, from e-commerce platforms to financial services, entertainment applications, and enterprise software. Microsoft Azure and Google Cloud represent the other major providers, but the market remains concentrated among these three companies.
The US-EAST-1 region's age and size contributed to the outage's impact. As AWS's original and largest web services location, the Virginia data center hosts legacy systems and serves as a primary hub for many organizations. The concentration of critical services within this single region meant that a DNS resolution failure affecting DynamoDB endpoints could compromise operations for thousands of unrelated applications.
Multi-cloud strategies present theoretical solutions to concentration risks, but implementation challenges limit adoption. According to industry discussions following the outage, the cost and complexity of maintaining parallel infrastructure across multiple cloud providers remain prohibitive for many organizations, particularly smaller businesses. Hybrid cloud approaches offer partial mitigation but require significant architectural planning and operational overhead.
The incident occurred during a period of rapid cloud infrastructure expansion. Amazon announced ongoing AWS investments to support artificial intelligence model deployment and expanded developer access. The company's focus on AI infrastructure, including the Nova family of foundation models available through AWS, increases the potential impact of infrastructure failures on emerging AI-powered services and applications.
The incident's resolution required coordination across multiple engineering teams addressing different components of the infrastructure stack. According to Amazon staff updates throughout the day, teams worked in parallel on DNS resolution, network connectivity, compute instance launches, and service-specific recovery procedures. This coordinated response reflected the organizational complexity required to manage large-scale cloud infrastructure.
Customer experience during the outage varied based on specific service dependencies and architectural decisions. Organizations using services heavily dependent on DynamoDB experienced immediate impact. Those relying primarily on compute services may have experienced delayed effects as EC2 launch throttling took hold. The cascading nature of the failure meant that impact manifested differently across use cases and configurations.
The incident generated millions of user reports to monitoring platforms. According to Morning Brew, Downdetector received over 8 million reports globally, indicating widespread user-perceived impact beyond the technical metrics reported by Amazon. This volume of reports demonstrated the extent to which modern digital services depend on AWS infrastructure.
Some organizations acknowledged their AWS dependency publicly during the incident. Gaming companies posted status updates referencing AWS issues. Financial services platforms noted connectivity problems. The transparency from affected organizations contrasted with earlier eras when infrastructure dependencies often remained opaque to end users.
The incident's impact on advertising technology operations remained difficult to quantify in real time. Campaign delivery interruptions, measurement gaps, and bidding failures likely occurred across platforms built on AWS infrastructure. The extent of advertising delivery impact during the 15-hour disruption window represents a data point for industry discussions about infrastructure reliability requirements for mission-critical marketing operations.
Timeline
- October 19, 11:49 PM PDT: AWS begins experiencing increased error rates for services in the US-EAST-1 Region
- October 20, 12:11 AM PDT: AWS begins investigating increased error rates and latencies for multiple services
- October 20, 12:26 AM PDT: Engineers identify DNS resolution issues for regional DynamoDB service endpoints as the root cause
- October 20, 12:51 AM PDT: AWS confirms increased error rates affecting multiple services and Support Case creation
- October 20, 2:01 AM PDT: AWS identifies potential root cause related to DNS resolution of the DynamoDB API endpoint
- October 20, 2:24 AM PDT: Initial DNS mitigation applied, services begin recovering
- October 20, 2:27 AM PDT: Significant signs of recovery observed across most affected services
- October 20, 3:35 AM PDT: Underlying DNS issue fully mitigated, most operations succeeding normally
- October 20, 5:48 AM PDT: Recovery of SQS queue processing via Lambda Event Source Mappings confirmed
- October 20, 9:38 AM PDT: Network Load Balancer health checks recovered
- October 20, 12:28 PM PDT: Many AWS customers and services seeing significant recovery
- October 20, 3:01 PM PDT: All AWS services return to normal operations
- April 2025: Amazon CEO justifies AI investments in shareholder letter, AWS growth reaches $108 billion
- July 2025: VideoAmp expands AWS partnership for privacy-enhanced measurement
- September 2025: Macy’s announces Amazon partnership for retail media using AWS infrastructure
- October 2023: Amazon launches Publisher Cloud on AWS Clean Rooms for programmatic deals
Summary
Who: Amazon Web Services experienced the outage, affecting millions of users across platforms including Snapchat, Fortnite, Roblox, Coinbase, Robinhood, and numerous AWS customers. AWS engineers worked to resolve the incident while Amazon staff provided status communications.
What: A DNS resolution failure for regional DynamoDB service endpoints in the US-EAST-1 Region triggered cascading outages across 142 AWS services and dependent applications. The incident caused increased error rates, latencies, and service unavailability across compute, storage, networking, database, and application services. Recovery required systematic mitigation of the DNS issues, throttling of operations, and gradual restoration of service functionality.
When: The incident began at 11:49 PM PDT on October 19, 2025, with root cause identification at 12:26 AM PDT on October 20. Initial DNS mitigation occurred at 2:24 AM PDT, with full service restoration at 3:01 PM PDT on October 20, spanning more than 15 hours from detection to complete recovery.
Where: The failure occurred in Amazon Web Services' US-EAST-1 Region, one of AWS's primary data center locations, serving customers throughout North America and globally. The outage affected services dependent on US-EAST-1 infrastructure, including global services with US-EAST-1 dependencies.
Why: DNS resolution issues for regional DynamoDB service endpoints prevented dependent systems from locating necessary resources. The architectural coupling between services meant that a single service's DNS failure compromised broader platform operations. Recovery complexity stemmed from cascading effects throughout interconnected infrastructure, requiring systematic restoration of individual components while managing throttling to prevent additional system strain.