DNS Lameness Statistics and Notifications
Dear Colleagues,

In January 2007, the RIPE Document ripe-400 was created to address lameness in the reverse DNS tree. This RIPE Document can be found online at:
http://www.ripe.net/ripe/docs/ripe-400.html

We have been monitoring name servers for some time now and statistics from the data gathered are published online each month at:
http://www.ripe.net/info/stats/dns-lameness/

In October 2008, we sent out notifications to a small number of randomly selected server administrators. Replies to these messages indicated a need for further refinements to our probes and the interpretation of the results. Since then, we have made these improvements and fixed some bugs in our code.

We now have better data and are ready to begin sending email alerts about lame delegations to name server operators. The email messages will point to the DNS Lameness FAQs, which explain the possible problems and give some pointers to help solve them. The FAQs can be found online at:
http://www.ripe.net/info/stats/dns-lameness/faq.html

We will begin sending small batches of email alerts from Thursday, 26 February 2009, using data collected over the month of February. This process will continue until all alerts have been sent out. The alerts will show which zone was checked and what error conditions were detected for its name servers.

If you have any questions, please contact <dns-help@ripe.net>.

Regards,

Anand Buddhdev
DNS Services Manager
RIPE NCC
Anand,

On Tue, 2009-02-24 at 14:02 +0100, Anand Buddhdev wrote:
We have been monitoring name servers for some time now and statistics from the data gathered are published online each month at: http://www.ripe.net/info/stats/dns-lameness/
I'm curious about the numbers here, and the meaning of the counts of servers. That is, if I have:

2.0.192.in-addr.arpa NS ns1.example.com
                     NS ns2.example.net
3.0.192.in-addr.arpa NS ns1.example.com
                     NS ns2.example.net

ns1.example.com A 192.0.2.1
ns2.example.com A 192.0.3.1

Does this count as 2 servers or 4 servers?

Further, if I have:

2.0.192.in-addr.arpa NS ns1.example.com
                     NS ns2.example.net
3.0.192.in-addr.arpa NS ns1.example.com
                     NS ns2.example.net

ns1.example.com A 192.0.2.1
                A 192.0.3.1
ns2.example.com A 192.0.2.2
                A 192.0.3.2

How many servers does this get counted as?

It might also be informative to show the amount of address space that is affected by bad servers. It could be that the overall 6% of servers that are lame only affects 1% of the space... or it could affect 50%.

Another useful metric may be to look at the amount of traffic that arrives at the RIPE NCC parents and gets directed to lame servers. I think this should give a reasonable guesstimate of how lameness affects actual users.

The NCC can look at the answers they send, and since they know both the NS-sets they are answering with as well as the lameness for each of the servers in those answers, this information can be used to determine the likely effect of lameness on users.

So, for example, if a user gets an NS-set where 1 of 4 servers is lame, we can estimate that they will have a 25% chance of sending a query to a lame server and having to retry. If a user gets an NS-set where 2 of 4 servers are lame, then they have a 50% chance of sending a query to a lame server, and a 33% chance of their retry going to a lame server as well.

Combining a bit of analysis with actual traffic measurement could help us to understand what the actual impact of lameness on Internet users is(*).

I suppose the NCC would need to be careful about how it publishes results, as LIRs or DNS operators might be sensitive about someone publishing how much DNS traffic they get. I doubt it actually matters, but people may still get upset.

--
Shane

(*) And perhaps help me in my quest to rid the world of the evil of reverse DNS completely. ;)
Shane Kerr wrote:

Hello Shane,
On Tue, 2009-02-24 at 14:02 +0100, Anand Buddhdev wrote:
We have been monitoring name servers for some time now and statistics from the data gathered are published online each month at: http://www.ripe.net/info/stats/dns-lameness/
I'm curious about the numbers here, and the meaning of the counts of servers.
That is, if I have:
2.0.192.in-addr.arpa NS ns1.example.com
                     NS ns2.example.net
3.0.192.in-addr.arpa NS ns1.example.com
                     NS ns2.example.net

ns1.example.com A 192.0.2.1
ns2.example.com A 192.0.3.1
Does this count as 2 servers or 4 servers?
If the 2 servers are lame for both zones, it counts as 4 servers.
Further, if I have:
2.0.192.in-addr.arpa NS ns1.example.com
                     NS ns2.example.net
3.0.192.in-addr.arpa NS ns1.example.com
                     NS ns2.example.net

ns1.example.com A 192.0.2.1
                A 192.0.3.1
ns2.example.com A 192.0.2.2
                A 192.0.3.2
How many servers does this get counted as?
If none of the addresses of either server returns an authoritative answer to queries for the two zones, then this counts as 8. We query every IP address of every name server of a zone, and each unique combination of zone and name server IP address counts as a server.
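The counting rule described above can be illustrated with a small Python sketch. This is not the RIPE NCC's code; the function name and data layout are hypothetical, and it assumes that ns2.example.net and ns2.example.com in the example refer to the same host. It only shows how counting unique (zone, IP address) pairs yields 4 for the first example and 8 for the second.

```python
def count_probed_servers(delegations, addresses):
    """Count unique (zone, name server IP address) combinations.

    delegations: zone name -> list of name server host names (hypothetical layout)
    addresses:   name server host name -> list of its IP addresses
    """
    pairs = set()
    for zone, ns_names in delegations.items():
        for ns in ns_names:
            for ip in addresses.get(ns, []):
                pairs.add((zone, ip))
    return len(pairs)


# Delegations from the examples in this thread.
# Assumption: ns2 is the same host in the NS and A records.
DELEGATIONS = {
    "2.0.192.in-addr.arpa": ["ns1.example.com", "ns2.example.net"],
    "3.0.192.in-addr.arpa": ["ns1.example.com", "ns2.example.net"],
}

# First example: one address per name server.
ONE_ADDRESS_EACH = {
    "ns1.example.com": ["192.0.2.1"],
    "ns2.example.net": ["192.0.3.1"],
}

# Second example: two addresses per name server.
TWO_ADDRESSES_EACH = {
    "ns1.example.com": ["192.0.2.1", "192.0.3.1"],
    "ns2.example.net": ["192.0.2.2", "192.0.3.2"],
}

print(count_probed_servers(DELEGATIONS, ONE_ADDRESS_EACH))
print(count_probed_servers(DELEGATIONS, TWO_ADDRESSES_EACH))
```

If every probed combination is lame, these totals are the numbers that would appear in the statistics; a combination that answers authoritatively would simply not be counted as lame.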
It might also be informative to show the amount of address space that is affected by bad servers. It could be that the overall 6% of servers that are lame only affects 1% of the space... or it could affect 50%.
Another useful metric may be to look at the amount of traffic that arrives at the RIPE NCC parents and gets directed to lame servers. I think this should give a reasonable guesstimate of how lameness affects actual users.
The NCC can look at the answers they send, and since they know both the NS-sets they are answering with as well as the lameness for each of the servers in those answers, this information can be used to determine the likely effect of lameness on users.
So, for example, if a user gets an NS-set where 1 of 4 servers is lame, we can estimate that they will have a 25% chance of sending a query to a lame server and have to retry. If a user gets an NS-set where 2 of 4 servers are lame, then they have a 50% chance of sending a query to a lame server, and a 33% chance of their retry going to a lame server as well.
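The percentages in this estimate can be reproduced with a short Python sketch. It assumes a simplified resolver model that is not stated in the thread: the resolver picks uniformly at random among the servers in the NS-set and never retries a server that it has already found to be lame. Real resolvers often prefer servers by measured response time, so this is only a rough model. The function name is hypothetical.

```python
from fractions import Fraction


def lame_hit_probabilities(total, lame, attempts):
    """Chance that each successive query hits a lame server.

    Entry k is the probability that attempt k+1 goes to a lame server,
    given that all earlier attempts also hit lame servers (which is the
    only case in which a retry is needed at all).
    Assumes uniform random selection without repeating a server.
    """
    probabilities = []
    for k in range(attempts):
        remaining_servers = total - k
        remaining_lame = lame - k
        if remaining_servers <= 0 or remaining_lame <= 0:
            # No lame servers left to hit (or no servers left at all).
            probabilities.append(Fraction(0))
        else:
            probabilities.append(Fraction(remaining_lame, remaining_servers))
    return probabilities


# 1 of 4 servers lame: first query 1/4, retry cannot hit a lame server.
print(lame_hit_probabilities(4, 1, 2))

# 2 of 4 servers lame: first query 1/2, retry 1/3.
print(lame_hit_probabilities(4, 2, 2))
```

Under this model, 1 lame server out of 4 gives 1/4 (25%) for the first query, and 2 lame servers out of 4 give 1/2 (50%) for the first query and 1/3 (about 33%) for the retry, matching the figures above.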
Combining a bit of analysis with actual traffic measurement could help us to understand what the actual impact of lameness on Internet users is(*).
Thank you for these suggestions. They are very useful indeed, and we will consider making use of some of these ideas for future analyses. For the time being, however, we're focussing on the email alerts. Administrators of zones and name servers are beginning to receive email alerts about lame delegations. We hope that many people will act on these messages, and fix their servers.
I suppose the NCC would need to be careful about how it publishes results, as LIRs or DNS operators might be sensitive about someone publishing how much DNS traffic they get. I doubt it actually matters, but people may still get upset.
This is true, and it's why we do not make our detailed results available publicly.

Regards,

--
Anand Buddhdev
DNS Services Manager, RIPE NCC
participants (2)
- Anand Buddhdev
- Shane Kerr