On 13/07/2011 02:43, Doug Barton wrote:
On 07/12/2011 00:20, Emile Aben wrote:
We don't have all the answers, but it appears not to be related to a misconfigured zone
Thank you for satisfying my idle curiosity. :) I did not mean to imply that your report was in any way deficient in describing what you think caused the problem. My curiosity about this particular issue was raised for 2 reasons, one being (as I said previously) the history of previous incidents. The other is that, if this were a DDoS attempt, it is a rather weak one (on several levels), so I can't help finding that unlikely. (Which again, is not a criticism of your analysis, merely a disturbing lack of pieces falling neatly into previously-known patterns.)
I agree that this is a bit strange. My guess would be that this was a test of capabilities of some kind. It is not too strange to pick the root servers as a target, since they are relatively well instrumented.
I did note this from your scrubbed zone file:
<domain>.com. 7200 IN NS ns1.<nsdomain>.
<domain>.com. 7200 IN NS ns2.<nsdomain>.
<domain>.com. 7200 IN NS ns3.<nsdomain>.
<domain>.com. 7200 IN NS ns4.<nsdomain>.
Are we to conclude from that that <nsdomain> is different from <domain>.com? If so, and <nsdomain> is misconfigured somehow, that would
Yes, it was a different domain, not in COM. We asked the folks that operate COM, and they didn't see the same query storm for this domain. If these were all 'normal' resolvers dealing with a misconfigured zone, I'd expect them to follow the delegation chain. Also, when spot-checking some 20 source IPs for these queries, we found that they made no other queries to K-root than for names in <domain>.com. But again, we don't have the definite answer and are not excluding any possible explanation, so thanks for inquiring deeper into this.
start to look more like misconfiguration patterns that we've seen in the past; particularly if <nsdomain> is not in COM, and therefore the COM zone has no glue for those hostnames.
I also note that 2 hours seems to be a ridiculously short TTL for NS records, which would seem to put a little more weight on the "possible misconfiguration" side of the balance. One could imagine a moderately popular game site receiving the CN equivalent of being slashdotted, and previously-painless minor misconfigurations suddenly causing much larger problems.
I just looked at the query load for www.<domain>.com on 20110628. Before 16:28 UTC (0:28 China Standard Time) we have 2 queries for this domain, then it all starts:

#queries  timestamp
       1  1309252434
       1  1309274472
    8603  1309278521
    9630  1309278522
   11277  1309278523
   14123  1309278524
   12271  1309278525
   12457  1309278526
   12118  1309278527
   12369  1309278528
   12234  1309278529
   12402  1309278530
   12202  1309278531
   12469  1309278532
   12138  1309278533
   12149  1309278534
...
(continues to be in the 10-12k queries/second range for a while)

So either the misconfiguration started at around 16:28 UTC, or this wasn't a misconfiguration. The third possibility, already misconfigured plus CN-slashdotted, I think is not impossible but unlikely, both because it was past midnight at the ASes that were a major source of queries, and because of the very sudden increase in load.

regards,
Emile Aben
RIPE NCC
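
For anyone who wants to check the onset time against the per-second counts quoted above, here is a minimal Python sketch. The counts and timestamps are copied from this message; the `onset` helper and its threshold of 1000 queries per second are assumptions made only for illustration, not part of any RIPE NCC tooling.

```python
from datetime import datetime, timedelta, timezone

# (query count, Unix timestamp) pairs as quoted in this message.
SAMPLES = [
    (1, 1309252434), (1, 1309274472),
    (8603, 1309278521), (9630, 1309278522), (11277, 1309278523),
    (14123, 1309278524), (12271, 1309278525), (12457, 1309278526),
    (12118, 1309278527), (12369, 1309278528), (12234, 1309278529),
    (12402, 1309278530), (12202, 1309278531), (12469, 1309278532),
    (12138, 1309278533), (12149, 1309278534),
]

# China Standard Time is a fixed UTC+8 offset (no daylight saving).
CST = timezone(timedelta(hours=8))


def onset(samples, threshold=1000):
    """Return the first timestamp whose count reaches the threshold.

    The threshold is an arbitrary illustrative cut-off (assumption).
    """
    for count, ts in samples:
        if count >= threshold:
            return ts
    return None


start = onset(SAMPLES)
start_utc = datetime.fromtimestamp(start, tz=timezone.utc)
start_cst = start_utc.astimezone(CST)

print("onset (UTC):", start_utc.isoformat())  # 2011-06-28T16:28:41+00:00
print("onset (CST):", start_cst.isoformat())  # 2011-06-29T00:28:41+08:00
```

The jump from single queries to thousands per second falls within one second at 16:28:41 UTC, which is 00:28:41 local time in China on the following day.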