[ Note: since publishing this article we have created and deployed Proactive Nameservers (basically failover at the nameserver level), which automates some of the concepts outlined in this post. ]
Greetings from St. Lucia, where I'm with the family for an end-of-summer vacation. I wanted to post about this topic before I left and didn't get to it, but this article over at CircleID reminded me. The article discusses the ramifications and effects of the large, possibly record-setting DOS attack against DNSMadeEasy last weekend. (To clarify: DNSMadeEasy is a separate company, unrelated to easyDNS.)
The article states "An attack on DNS is an attack on The Internet" and this much is true. As we always quip around here, "DNS is something nobody notices until it stops working".
I have to admit that in the early days of easyDNS I was oblivious to the possibility of DOS attacks. It simply never occurred to me. We were able to proclaim 100% DNS uptime since launching in 1998 for a glorious 5 years, and then on April 14th, 2003, it all ended: we got hit with a DOS that pancaked all four single-node nameservers, and every domain on the system went dark for about 75 minutes. I nearly had a nervous breakdown. Over the summer I thought long and hard about the ramifications and, at the time, surmised that the DNS hosting model was doomed.
Then we started looking at DNS anycasting but it took us another 5 years to get there. In the meantime we had another outage from another DOS: about an hour on Sept. 14/2005. We added Prolexic DDoS mitigation within weeks of that attack and are happy to report we haven't had an outage since.
In the intervening years we also moved ourselves to a DNS Anycast architecture. While it is significantly harder to bring down an anycast architecture with a DOS attack, it can still happen. Usually instead of a complete and utter outage, you get "regional outages", which is basically a euphemism to deflect assertions of downtime: "Some users may experience regional outages….like North America and Europe" (credit to Steven Job for that bit of humour).
Some DNS providers guarantee you that they will never go down and assert 100% DNS uptime in the face of prior DOS attacks. In reality, every single DNS provider in existence for more than 5 years has had downtime. If the DOS attack that hit DNSMadeEasy last week really was 40 or 50 GIGS, and if it had hit us, I hesitate to say "we would have stayed up". In 2006 we got hit with an attack that was 20 to 25 gigs, and we didn't go down completely ("Some customers may have experienced regional outages"), but we sure felt it. Prolexic withstood the attacks and at the end of it we had to write a few enormous cheques to our providers to cover the bandwidth.
But I have long since backed away from my 2003 trepidations that the centralized DNS hosting model was doomed, for a few reasons:
- DNS Anycast changes the game and drastically raises the bar for a DOS attack so that even if the resources can be mustered to do it, the duration of an outage is usually decreased as more numerous network carriers become aware of the problem and act to corral it.
- DDoS Mitigation strategies have also improved. These days I think we are pretty well under a continuous state of low intensity DOS attack in one form or another. By low intensity I mean it doesn't bring us down anymore, but these attacks are about 10 to 20 times more powerful than the 2003 attack that did us in, so:
- The DOS attacks that DNS providers routinely mitigate every day would probably level many non-professional, non-dedicated DNS setups.
- The other benefits of using a specialized DNS hosting provider outweigh the isolated risks of DOS attacks. A good example of this is DNS Anycast: a DNS best practice that is simply not viable for many organizations to implement on their own. Commercial DNS providers make it viable through their economies of scale.
But this is the internet. If you elect to take part in it, there are certain unpleasant realities that will come home to roost. Like if you own a domain name, sooner or later it'll get joe-jobbed in a spam mailout. So too, eventually you will get caught in the crossfire of a DOS attack against some target that has nothing to do with you but is big enough to mess up one of your infrastructure suppliers. Like an empty bottle thrown at random into a crowd.
On the DNS side of things there are a few steps you can take to either not go down, even if your DNS provider does, or to make any impact minimal.
- Use a DNS provider that allows third-party zone transfers. Either one that lets you slave your DNS zone from a primary nameserver outside of their own system (basically using a DNS provider as secondary DNS), or one that lets you designate other nameservers outside their system that can slave your DNS zone from it. Ideally, both.
- Use two DNS providers. If you have the ability to set up point #1 above with multiple DNS providers, then you are pretty redundant right there. After the DNSMadeEasy DOS I got an email from someone at a large web services company that uses both their services and ours. He said they experienced no downtime and using two DNS providers was still a lot less expensive than their previous setup.
- Or, just use any third party nameserver, even one of your own. Have it slave your zone from your DNS host (or have your DNS host slave from it). Unless you are the actual target of the DOS, then, like a jet that can fly as long as one engine is firing, you'll be fine for the brief time your DNS provider may be down (or experiencing regional outages).
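Running secondaries across providers only helps if they are actually carrying the current copy of your zone. Here is a minimal sketch of that check, assuming you have already collected the SOA serial each nameserver reports (for example with `dig +short SOA yourdomain @nameserver`). The hostnames and the `stale_secondaries` helper are hypothetical, and the comparison assumes serials that only ever increase, not full RFC 1982 wraparound arithmetic.

```python
from typing import Dict, List


def stale_secondaries(primary_serial: int, observed: Dict[str, int]) -> List[str]:
    """Return the nameservers whose SOA serial is behind the primary's.

    Assumption: serials only ever increase (no RFC 1982 wraparound handling).
    """
    return sorted(ns for ns, serial in observed.items() if serial < primary_serial)


# Hypothetical serials gathered from two providers plus a nameserver of your own.
observed_serials = {
    "ns1.provider-a.example": 2010081201,
    "ns1.provider-b.example": 2010081201,
    "ns.your-own.example": 2010080903,
}

# Any name listed here has not picked up the latest zone transfer.
print(stale_secondaries(2010081201, observed_serials))
```

A nameserver that shows up as stale is one that would serve old data if it suddenly had to carry your traffic alone, so it is worth checking before you need it.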
Being connected to the internet has varying degrees of importance to different organizations. For some, no downtime is acceptable (i.e. for DNS providers or web hosts, it's very very bad). Other organizations take a couple years to notice that their domain name expired.
Depending on the seriousness of your web presence you may want to also consider additional measures and be aware of a few things.
- Many top-level rootzones (.com, .net, .org, .biz and .info) make modifications in near realtime. People may be accustomed to things like nameserver delegation modifications taking a day to kick in. In fact a lot of user-interface verbiage probably still says as much ("please allow 24 to 48 hours for your nameserver delegation to take effect"). In these rootzones it's closer to 3 to 5 minutes. Use that. The bad guys (spammers, botnets, etc) "fast-flux" their nameservers all the time to thwart tracing and reporting. It's a tactic you can take back from the black-hats: you can fast-flux your own nameservers to provide a moving target in a DOS situation.
- Warm spares: have your DNS mirrored on third party nameservers, but do not add them to your nameserver delegation. If your DNS provider goes down, you then temporarily swap in your warm spares.
- For web hosts or other infrastructure suppliers that run DNS for their clients: do the above, except when you need to make a switch to your warm spares, you change the rootzone glue record for your nameservers: this way you do not need to make changes to each customer domain's nameserver delegation. The caveat here is you tend to only buy time: if the DOS is targeting you and you change-up your nameserver glue, the DOS may eventually (or sooner) follow you to the new IPs. Having said that, you can keep doing this and you may be able to diffuse the attack.
- Another overlooked fact: you can round-robin a nameserver glue record. We've tried it and don't find it nearly as effective as DNS anycast, but in a DOS situation, if you can add more warm spares to your nameserver glue records, then do it. Again, this diffuses the attack. "Regional outages" may indeed be a euphemism but it really is better than "everything is down hard".
- Here's one we learned the hard way: don't have your nameservers in the same netblocks as your web interface and data storage, especially if you provide infrastructure services. If your nameservers are going to get clobbered you at least want to be able to get email and maybe provide a modicum of critical services to your users, something you can't do if your entire operation is within the same /24 that has been null routed by your upstream providers.
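Two of the points above lend themselves to a quick sanity check: whether any nameserver shares a /24 with your other services, and which warm spares you would swap into the delegation if some nameservers went down. What follows is only a rough sketch under stated assumptions. The function names, hostnames and addresses (taken from the reserved documentation ranges) are all hypothetical, and actually pushing a delegation or glue change to a registry is out of scope here.

```python
import ipaddress
from typing import Dict, List, Set


def shared_netblocks(nameservers: Dict[str, str], services: Dict[str, str],
                     prefix: int = 24) -> List[str]:
    """List the nameservers that sit in the same /prefix as any other service."""
    service_nets = {
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        for ip in services.values()
    }
    return sorted(
        name
        for name, ip in nameservers.items()
        if ipaddress.ip_network(f"{ip}/{prefix}", strict=False) in service_nets
    )


def plan_delegation(active: List[str], down: Set[str], spares: List[str]) -> List[str]:
    """Keep the healthy nameservers and swap in one warm spare per failed one."""
    healthy = [ns for ns in active if ns not in down]
    needed = len(active) - len(healthy)
    return healthy + spares[:needed]


# Hypothetical layout: ns1 shares a /24 with the web interface, ns2 does not.
nameservers = {"ns1": "192.0.2.53", "ns2": "198.51.100.53"}
services = {"web": "192.0.2.80", "mail": "203.0.113.25"}
print(shared_netblocks(nameservers, services))

# Hypothetical outage: ns1 is down, so one warm spare takes its place.
print(plan_delegation(
    ["ns1.host.example", "ns2.host.example"],
    {"ns1.host.example"},
    ["spare1.other.example", "spare2.other.example"],
))
```

The delegation plan keeps the healthy nameservers in place and only adds as many spares as there are failures, which leaves further spares in reserve if the attack follows you to the new IPs.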
If I can perhaps add some comments to this theme: I would not wish a nameserver outage on any DNS provider. And you can believe me, when it happens, the people inside that company are tearing their hair out, suffering extreme mental anguish and pulling out all the stops to restore services. When I see a DNS provider taken out by a DOS attack and chatter on twitter, etc along the lines of "XYZDNS is down #fail #fail #fail" I want to thwap those people upside the head. Get a life. Do you think your DNS provider is out on the golf course while his business is being taken apart by a DOS?
While I am a businessman and we are a for-profit company, I do not relish gaining business at the expense of a DNS provider who's down because of a DOS attack. I'd rather gain customers on price, service offerings, customer support, our good looks, anything but a competitor going down because of a DOS. I guess because I've been there, I know how it feels. (Not all DNS providers take this view, in fact some of them pounce with glee when the opportunity presents itself, firing up the telemarketing crew to cold call the fallen provider's customers. If you're a customer of ours you have perhaps received such a call in the past).
DOS attacks are criminal acts. Get pissed off at the criminals who undertake them, not the people who are on the front lines of having to deal with them. Use these tips to stay online regardless of who your DNS provider is. I'm not advocating you stop using your existing DNS provider, but rather that you modify your tactics so that instead of your DNS host being your single DNS host, it takes on more of a "DNS infrastructure management" role that you use to set up and maintain multiple DNS structures (combining in-band nameservers from your DNS host with out-of-band nameservers outside their cloud) and warm spares.
Update (Jan 08/2012):
Since we posted this, we have rolled out easyRoute53, a management UI for Amazon AWS Route53 DNS. We heard back from a few members after the most recent DDoS attack (unfortunately, against us) that they were able to use this to export their DNS to Route53 and add or switch to Route53 for the duration of the attack. See easyRoute53.com or this blog post for more details.
Update (Dec 06/2012):
Since we posted this, we have announced Proactive Nameservers, a system that basically automates and implements the "warm spares" approach we outline above. Think of it as "hot swappable nameservers" or "nameserver failover". See http://proactivenameservers.com.