Why The Cloud Is Not As Safe As It Sounds
Amazon Web Services is the world's largest hosting location, so even minor server issues make it feel like the entire Internet is breaking. But why?
Last week during Hurricane Sandy, 20 feet of water filled the basement of BuzzFeed's Internet service provider, Datagram, drowning BuzzFeed and other major sites in the process. Thanks to an incredible engineering team that quickly rebuilt the site from scratch and moved it over to Amazon Web Services (AWS), BuzzFeed was back up the next morning.
The speed and flexibility of hosting with Amazon are among its biggest draws, but just because we're sitting "safely" in the cloud now doesn't mean AWS is immune to hurricanes or other server disasters. A few weeks ago, AWS experienced some difficulties, taking down what seemed like half the Internet in the process. The issue, once again, was traced back to one of Amazon's massive data centers in northern Virginia (which is actually on the ground and not in any sort of cloud at all). That data center has lately been prone to power outages and "performance issues," temporarily killing major sites like Netflix, Instagram, and Reddit.
For smaller Web hosts, minor server issues would probably go unnoticed, but when just one of Amazon's data centers has an issue, it feels like the entire Internet is about to collapse. As of September, Amazon was the world's largest hosting location, with 118,000 Web-facing computers supporting 6.8 million websites, according to U.K. research firm Netcraft, and it continues to grow. As AWS becomes more and more massive, a tiny hiccup can take down your site for an entire day. So why host with Amazon?
Before AWS, Web hosting was kind of like renting a storage unit. You'd call up the Web host and say, "I need five servers for my cool new website dogswatchingdogs.com," and they would quote you a rental fee. Depending on the size and complexity of your site, setup could either be automated or take a few weeks, but eventually you'd be up and rolling with a hosting package tailored to your needs. Scaling was easy for smaller sites: you just rented more space or more servers. But scaling to extreme sizes was much harder: either you had to switch hosts or do your own hosting.
The worst part? In the event of an outage, you had to make sure you could quickly and easily reconfigure a lot of physical hardware from your end, or switch to a different host when yours went down. Your ability to recover your site in an extreme outage depended on often small crews of people managing far-flung data centers.
Switching or moving to your own data center was a chore. "When a customer was worried about staying online when their data center disappeared, the customer was responsible for making sure another data center had the same data and configuration — or at least something close enough to stay in business," Jeremiah Peschka, a cloud computing expert, told me.
AWS changed a lot of this. Now, with a few clicks and commands, you can have servers available the same day (or in the middle of the night when a hurricane hits), paying for only the amount of computing you need. AWS has a dozen or so different types of hosting on demand, ready (almost) instantaneously. There are really fancy servers, like the ones you rent through the Elastic Compute Cloud (EC2), which are good for complicated Web applications, or there's Simple Storage Service (S3), which works for static content like music and images. (Ars Technica has a thorough rundown on all the various types.)
"[AWS] is very much a design-your-own data center," said Peschka. "It's exactly the same old boring data center, it's just somebody else is managing it for you."
In essence, Amazon decided to take care of all the shitty parts of Web hosting, and over the past decade has made AWS a very flexible, scalable system too. If I know dogswatchingdogs.com is really busy from 8 a.m. to 5 p.m., I can have a lot of servers active during that time period, but then I can tell AWS to turn them off after 5 p.m. Beyond being faster, more reliable, and cheaper than other hosting options, AWS made things easy.
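For the curious, the dogswatchingdogs.com schedule boils down to a rule like the one in this rough Python sketch. It is only an illustration of the idea, not how AWS itself works: the function name desired_servers and the server counts (10 during the day, 0 at night) are made up for the example, and no real AWS call appears here.

```python
from datetime import time

# Busy window taken from the example above: 8 a.m. to 5 p.m.
BUSY_START = time(8, 0)
BUSY_END = time(17, 0)


def desired_servers(now, busy_count=10, idle_count=0):
    """Return how many servers should be running at a given time of day.

    busy_count and idle_count are hypothetical numbers chosen for
    illustration; a real site would pick them from its own traffic.
    """
    if BUSY_START <= now < BUSY_END:
        return busy_count
    return idle_count


if __name__ == "__main__":
    for hour in (7, 8, 12, 17, 23):
        print(f"{hour:02d}:00 ->", desired_servers(time(hour, 0)), "servers")
```

A scheduled job could run a check like this every few minutes and ask the hosting provider to add or remove servers until the running count matches.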
But with this ease of entry came an increased feeling of reliability in Web hosting services, aided by notions of the high and mighty CLOUD. What people forget, though, is that Web hosting hasn't really changed much in the last decade in terms of how susceptible we are to natural disasters or flaws in the service, like a default location in northern Virginia. Amazon has seven EC2 server farms scattered across the globe, but anyone who signs up for AWS is automatically set up with this East Coast location, the one that continues to have problems, Peschka said.
"People started developing more apps and never bothered to change the default region, adding more hardware when they needed it," he told me. For this reason, the servers at this particular data center might be disproportionately susceptible to failure. (AWS continually ignored requests to comment on this.)
In any case, if Amazon's servers disappeared tomorrow, a huge chunk of the Internet would undoubtedly be gone. The idea of disaster feels different now that dogswatchingdogs.com doesn't have to worry about these physical servers; Amazon's got that all under control. Because Amazon makes everything just work in its data centers, there's a lot less plumbing to worry about from the customer's view. But this also means that if 20 feet of water began flooding an Amazon data center, BuzzFeed would probably not be able to waltz down there and move our servers elsewhere.
Our engineers are working to figure out the best long-term solution for hosting our site, since neither local servers in Manhattan nor Amazon's cloud is hurricane-proof. But as more and more AWS issues keep cropping up, maybe we'll start to realize how reliant we are on something that, for the most part, is still just a bunch of giant machines hanging out in Virginia.