SE Podcast #36 – We Got Hit by a Hurricane




The Stack Overflow Podcast show

Summary: So as you may have heard in the news, the East Coast got hit pretty hard by Hurricane Sandy. In particular, our datacenter in Lower Manhattan was almost knocked entirely offline. If not for the incredible efforts of Fog Creek Software, Squarespace, and Peer1 (the datacenter), there would certainly have been days of outages for everyone involved. We've got a ton of people from Stack, Fog Creek and Squarespace on to tell the CRAZY story of exactly what happened last week!

Guests include: David Fullerton, VP Engineering at Stack Exchange; Geoff Dalgas & Nick Craver, both core developers at Stack Exchange; Alex Miller; Michael Pryor; Mendy Berkowitz, lead sysadmin for Fog Creek; Babak Ghaheremanpour, longtime Creeker; Anthony Casalena, CEO and founder of Squarespace.

We're planning on telling the whole story of Hurricane Sandy, roughly in chronological order. We are based in New York, and all of our offices and equipment are located there. We go back all the way to Monday night, 10/29. Nick got the first communications from Peer1, our datacenter, warning everyone that the power was going out for everything south of 34th Street. Monday night, we thought all was safe and sound. Stack Exchange did have some failover plans in place, however, as you heard on a previous podcast.

On the Fog Creek side, things were still relatively calm. They were basically blindsided, because the datacenter was confident that it had generator fuel for "like, days".

Then the storm hit. There was wind and a little bit of rain. Everything in Zone A got flooded basically immediately, as predicted, but if you didn't live in Zone A you didn't really notice.

Michael Pryor's foreshadowing: he saw a Hacker News post saying that Internap, another datacenter, was down, and started making plans to protect Fog Creek if Peer1 went down. Suddenly, we get word that the generator only has thirty minutes of fuel left. Mike Mazzei was the only Peer1 staffer there at the time, and he was stretched pretty thin. He is basically a superhero and ended up saving the day.

Anthony managed to get exactly one email on Tuesday morning, and it happened to be about running out of fuel in the middle of the day (when he had previously thought they had a few days of fuel to spare).

"Let me tell you what it looked like when I showed up." Michael describes the scene on Broad St. for us.

Based on flawed information from the NOC, Fog Creek makes plans to shut everything down at 10:45 AM. Bradford was the only sysadmin who was awake and connected. He said we had to start doing a controlled shutdown.

Mike has the idea that if we can get the fuel up to the generator, we can keep everything online. Someone from Squarespace found empty 55-gallon drums on Craigslist and brought them down to the datacenter. The first attempt is pushing these barrels of diesel up the stairs.

The building's major task was getting the water pumped out of the basement, so at first Fog Creek, Squarespace, and Peer1 were able to work on the fuel issue relatively unfettered. Fog Creek decides to bring their servers back up, since they now had people on the ground in the datacenter to monitor the situation. The bucket brigade begins!

Michael goes home and sleeps for three hours. He then heads back to Peer1 and checks the generator tank, which is only a quarter full...

Joel tells us about trying to raise the alarm with incommunicado sysadmins Mendy and Sven and get them back online. Sven starts working with some others on moving Trello onto AWS.

Michael tells us about how lucky he got with the Fog Creek fish tank during last year's power outage: another example of how we were very lucky to be accidentally prepared for this event.

Everyone laughs at us for having datacenters in Manhattan, but the clear benefit is that we had the physical ability to make things happen because t...