How did Amazon bring down some of the world’s largest social networks?
Some 12 hours after Amazon’s server failure brought down large swathes of the social web, the company has still not fixed the problem.
The outage has brought down the social bookmarking website Reddit, the social location sharing website Foursquare, the social questions and answers website Quora, the Twitter newspaper website paper.li, the Twitter web client Hootsuite (as well as its ow.ly and ht.ly link services), the social commenting system LiveFyre, the CRM application ContactMe and possibly the publishing industry’s Publishers Weekly website, as well as many others.
Error messages on various social networking sites.
Although better known for its online retail arm, Amazon also manages a cloud server business called Amazon Web Services (AWS). This service allows web companies to rent server space from Amazon at a fraction of the cost of setting up and managing a server farm of their own. Amazon have an impressive list of high-profile sites which depend on their AWS servers, including the technology giant Ericsson, the British newspaper The Guardian, the US paper The Washington Post, the social review site Yelp, the US media network PBS, the social sharing platform ShareThis and others (not all of whom have been affected by the outage).
So, how do companies use AWS? A retail site using AWS could rent extra server capacity from Amazon around the Christmas/holiday season to cope with the increased demand placed on its website. This would allow the site to meet the extra demand without having to buy an expensive server which would only be used for a short period of time each year. Microsoft and Google both offer similar services; these were unaffected by today’s outage.
Today’s server outage began at about 6 am GMT (1:00 am PDT) and was caused by “connectivity issues” in Amazon’s Northern Virginia data centre. At 1:40 am Amazon reported: “We are currently investigating latency and error rates with EBS volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region.” The company’s latest release, issued 11 hours later at 5:30pm GMT (~6 pm PDT), said that they had seen some improvements but were still working on the problem. At the time of writing, the sites listed above are still inaccessible.
Amazon have not released much information about the cause of the failure, and the official Amazon (@amazon) and Amazon AWS (@awscloud) Twitter accounts have been silent today. They have, however, reassured their clients and web users that “[the servers] that are affected by this event have not experienced data loss. We anticipate recovering them.” The latest information is available on the official AWS forum, although much of this is technical.
This is not the first time Amazon’s AWS servers have gone down: in 2008 a server outage brought down the young start-up Twitter. Amazon’s European server farms, based in Ireland, were unaffected by the outage.