Friday, September 17

Fastly Says Single Customer Triggered Bug That Caused Massive Disruption


An internet blackout that took down some of the world’s largest websites on Tuesday was ultimately caused by a single customer updating their settings, infrastructure provider Fastly revealed.

A bug in Fastly’s code introduced in mid-May lay dormant until Tuesday morning, according to Nick Rockwell, the company’s head of engineering and infrastructure. When the unidentified customer updated its configuration, it triggered the bug, which ultimately took down 85% of the company’s network.

“On May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances,” Rockwell said. “Early on June 8, a customer pushed a valid configuration change that included the specific circumstances that triggered the bug, causing 85% of our network to return errors.

“We detected the outage within a minute, then identified and isolated the cause and disabled the configuration. Within 49 minutes, 95% of our network was working normally.”

Rockwell added: “Although there were specific conditions that triggered this outage, we should have anticipated it. We provide mission-critical services and treat any actions that may cause service issues with the highest sensitivity and priority. We apologize to our customers and those who depend on them for the disruption, and we sincerely thank the community for their support.”

The content delivery network (CDN) operated by Fastly is one of the largest on the Internet, along with similar networks operated by Akamai, Cloudflare and Amazon’s CloudFront. They all operate on the same principle: that the Internet is faster and more stable if users can connect to servers physically close to them, optimized to handle a large amount of traffic.

In normal times, doing so not only reduces load times, but also allows CDN operators, experienced in running Internet infrastructure, to shoulder the burden of handling security threats, unexpected traffic spikes, and high bandwidth bills. But the outage highlighted the risks associated with concentrating critical Internet infrastructure in the hands of a few companies.

Counterintuitively, the outage and recovery led to a surge in Fastly’s share price, which rose 12% over the course of Tuesday. The increase may have been because the company had demonstrated an effective incident response plan, or simply because the outage had made investors more aware of the scale of Fastly’s business and the size of its customer base.

The effects will not have been so rosy for Fastly’s customers. For Amazon alone, for example, the 80-minute outage could have cost the company $32 million in sales, according to a calculation by the SEO agency Reboot.

“Although it appears they were not down for long, the impact it will have had will be enormous, especially on e-commerce sites,” said Naomi Aharony, the agency’s managing director. “Given that our research estimates that Amazon could potentially have lost $6,803 for every second it was down, it is clear that they will want to investigate what happened.”

Few Fastly customers were able to switch to a backup system in time to recover from the outage, in part because doing so is generally considered riskier than simply waiting for the vendor to fix the problem. For example, according to public documents, gov.uk has a backup contract with Amazon to provide CDN services, but it requires manual intervention to make the switch.


www.theguardian.com
