Apart from the three platforms, there have been reports that Facebook's internal email system was also down.
Global Facebook, Instagram, WhatsApp outage: On Monday, Facebook, Instagram and WhatsApp suffered a global outage that lasted six hours, leaving 3.5 billion users unable to access the social media platforms or the messaging app. According to Downdetector, a group that tracks websites and services that are down, this was the largest outage it has ever tracked. Apart from the three platforms, there have been reports that Facebook’s internal email system was also down, and that employees at the Menlo Park campus in California could not enter offices and conference rooms that required a security badge. So what exactly caused this massive disruption?
In a statement, Facebook has pinned the blame for the issue on a “faulty configuration change”, and has assured users that no user data was compromised during the event. “Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt,” it said.
While Facebook has not shared any more details about what the faulty configuration changes were, who implemented them or why, it is believed that the issue was most likely connected to the domain name system, or DNS. DNS is best described as the phone book of the internet. Everything on the internet has an internet protocol (IP) address, which is usually too complex for users to remember offhand. This is where the domain name comes in, like facebook.com or financialexpress.com. A domain name is of no use unless it is connected to the IP address of a website. You know when you sometimes type [yourname].com just to see if any website will turn up, but get a message saying there is no website with that domain name? That is because the domain name is not linked to any IP address. DNS acts as the directory that links a domain name to an IP address. It can be thought of as a file listing which domain name leads to which IP address.
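The directory idea can be shown with a few lines of Python. This is only a toy sketch and not how real DNS servers work: the `DIRECTORY` table and the `resolve` function are made up for illustration, and the IP addresses come from ranges reserved for documentation, not from the actual records of these sites.

```python
# Toy "phone book": domain name -> IP address.
# The addresses below are placeholders from documentation-reserved ranges,
# not the real addresses of these websites.
DIRECTORY = {
    "facebook.com": "192.0.2.10",
    "financialexpress.com": "198.51.100.20",
}


def resolve(domain):
    """Return the IP address listed for a domain, or None if there is no entry."""
    # Domain names are case-insensitive and may carry a trailing dot.
    key = domain.lower().rstrip(".")
    return DIRECTORY.get(key)


# A listed domain returns its address; an unlisted one returns None,
# which is the "no website with that domain name" case described above.
print(resolve("facebook.com"))
print(resolve("yourname.com"))
```

If the entry for a domain disappears from the directory, or the directory itself cannot be reached, the lookup fails even though the website's servers may still be running.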
An error linked to Facebook’s DNS records appears to have been behind the outage, since a DNS failure can leave the web browser or smartphone on the user’s end unable to find its way to Facebook services. The problem is believed to have originated in the Border Gateway Protocol, or BGP, which works like the postal service of the internet. Any data sent by the user needs to reach its destination, and it is BGP that determines the best available paths for the data to travel.
BGP is suspected because, a few minutes before the Facebook outage, a large number of changes were made to Facebook's BGP routes, according to public records cited by Cloudflare Chief Technology Officer John Graham-Cumming. Facebook has not said why these changes were made, however.
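To see why changes to BGP routes matter, consider a simplified route table in Python. This is a sketch under heavy assumptions: the `RouteTable` class, the address block and the network numbers are all hypothetical, and real BGP weighs many more factors than the number of networks on a path. Facebook has also not confirmed exactly what changed, so the withdrawal below is only one possible scenario.

```python
import ipaddress


class RouteTable:
    """Toy route table: address block -> list of advertised paths."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, as_path):
        # A network advertises that a block of addresses is reachable via as_path.
        network = ipaddress.ip_network(prefix)
        self.routes.setdefault(network, []).append(list(as_path))

    def withdraw(self, prefix):
        # All advertised paths to the block are removed.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def best_path(self, address):
        # Find the most specific block containing the address, then pick the
        # path that crosses the fewest networks (a simplification of BGP).
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        most_specific = max(matches, key=lambda net: net.prefixlen)
        return min(self.routes[most_specific], key=len)


table = RouteTable()
# Hypothetical block and network numbers from documentation-reserved ranges.
table.announce("192.0.2.0/24", [64500, 64501, 64502])
table.announce("192.0.2.0/24", [64503, 64502])
print(table.best_path("192.0.2.10"))  # the shorter of the two paths

table.withdraw("192.0.2.0/24")
print(table.best_path("192.0.2.10"))  # no path left to the address
```

Once no route to an address block is advertised, traffic has no path to follow, and publishing the routes again restores it.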
In any case, Cloudflare CEO Matthew Prince later tweeted on Monday that Facebook's BGP routes were being republished, which could mean that the paths to the service were being restored. Eventually, late at night (in India), Facebook's services were up and running again.