Shane Kerr wrote:

Hello Shane,
On Tue, 2009-02-24 at 14:02 +0100, Anand Buddhdev wrote:
We have been monitoring name servers for some time now and statistics from the data gathered are published online each month at: http://www.ripe.net/info/stats/dns-lameness/
I'm curious about the numbers here, and the meaning of the counts of servers.
That is, if I have:
2.0.192.in-addr.arpa    NS    ns1.example.com
                        NS    ns2.example.net
3.0.192.in-addr.arpa    NS    ns1.example.com
                        NS    ns2.example.net

ns1.example.com         A     192.0.2.1
ns2.example.com         A     192.0.3.1
Does this count as 2 servers or 4 servers?
If the 2 servers are lame for both zones, it counts as 4 servers.
Further, if I have:
2.0.192.in-addr.arpa    NS    ns1.example.com
                        NS    ns2.example.net
3.0.192.in-addr.arpa    NS    ns1.example.com
                        NS    ns2.example.net

ns1.example.com         A     192.0.2.1
                        A     192.0.3.1
ns2.example.com         A     192.0.2.2
                        A     192.0.3.2
How many servers does this get counted as?
If none of the addresses of either server returns authoritative answers to queries for the two zones, then this counts as 8. We query every IP address of every name server of a zone, and each unique combination of zone and name server IP address counts as a server.
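To make the counting rule concrete, here is a minimal sketch in Python. It is only an illustration of the rule as described above, not the code we actually run. The function name `count_lame_servers` and the `is_lame` callback are hypothetical, and the data assumes that the address records for the second server belong to the host named in the NS records (ns2.example.net).

```python
# Illustrative sketch only: counts each unique (zone, name server IP)
# pair that is lame, following the rule described in the message.

# Delegations from the second example: two zones, same two name servers.
zone_nameservers = {
    "2.0.192.in-addr.arpa": ["ns1.example.com", "ns2.example.net"],
    "3.0.192.in-addr.arpa": ["ns1.example.com", "ns2.example.net"],
}

# Assumption: the second server's addresses belong to ns2.example.net.
nameserver_addresses = {
    "ns1.example.com": ["192.0.2.1", "192.0.3.1"],
    "ns2.example.net": ["192.0.2.2", "192.0.3.2"],
}


def count_lame_servers(zone_nameservers, nameserver_addresses, is_lame):
    """Count unique (zone, IP) pairs for which is_lame(zone, ip) is true.

    is_lame is a hypothetical callback standing in for the real check,
    i.e. whether the address fails to answer authoritatively for the zone.
    """
    lame_pairs = set()
    for zone, names in zone_nameservers.items():
        for name in names:
            for ip in nameserver_addresses.get(name, []):
                if is_lame(zone, ip):
                    lame_pairs.add((zone, ip))
    return len(lame_pairs)


def everything_lame(zone, ip):
    """Worst case from the example: no address answers authoritatively."""
    return True


print(count_lame_servers(zone_nameservers, nameserver_addresses, everything_lame))
```

With one address per server instead of two, the same function gives 4 for the first example.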
It might also be informative to show the amount of address space that is affected by bad servers. It could be that the overall 6% of servers that are lame only affects 1% of the space... or it could affect 50%.
Another useful metric may be to look at the amount of traffic that arrives at the RIPE NCC parents and gets directed to lame servers. I think this should give a reasonable guesstimate of how lameness affects actual users.
The NCC can look at the answers they send, and since they know both the NS-sets they are answering with as well as the lameness for each of the servers in those answers, this information can be used to determine the likely effect of lameness on users.
So, for example, if a user gets an NS-set where 1 of 4 servers is lame, we can estimate that they will have a 25% chance of sending a query to a lame server and have to retry. If a user gets an NS-set where 2 of 4 servers are lame, then they have a 50% chance of sending a query to a lame server, and a 33% chance of their retry going to a lame server as well.
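The arithmetic in this example can be sketched as follows. This is a simplified model, assuming a resolver picks uniformly at random among the servers in the NS-set and never retries a server it has already tried; real resolvers often prefer servers with lower round-trip times, so the numbers are only a rough guide. The function name `lame_hit_probabilities` is hypothetical.

```python
from fractions import Fraction


def lame_hit_probabilities(total_servers, lame_servers, attempts):
    """Return one probability per attempt.

    Element k is the chance that attempt k+1 goes to a lame server,
    given that all k earlier attempts also went to lame servers.
    Assumes uniform random selection without repeating a server.
    """
    probabilities = []
    for already_tried in range(attempts):
        remaining = total_servers - already_tried
        remaining_lame = lame_servers - already_tried
        if remaining <= 0 or remaining_lame <= 0:
            # No lame servers (or no servers at all) left to pick.
            probabilities.append(Fraction(0))
        else:
            probabilities.append(Fraction(remaining_lame, remaining))
    return probabilities


# 1 of 4 lame: 25% on the first query, and the retry cannot hit a lame server.
print(lame_hit_probabilities(4, 1, 2))
# 2 of 4 lame: 50% on the first query, then 1 in 3 on the retry.
print(lame_hit_probabilities(4, 2, 2))
```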
Combining a bit of analysis with actual traffic measurement could help us to understand what the actual impact of lameness on Internet users is(*).
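One way such a combination might look is sketched below. The input format and the function name `traffic_weighted_lame_share` are hypothetical, and the query counts are made-up numbers for illustration, not measurements. It uses the same simplifying assumption that a resolver picks uniformly among the servers in the NS-set it receives.

```python
def traffic_weighted_lame_share(referrals):
    """Estimate the share of referrals whose first query hits a lame server.

    referrals is an iterable of (query_count, total_servers, lame_servers)
    tuples, one per delegated zone. Zones with no servers are skipped.
    """
    total_queries = 0
    expected_lame_queries = 0.0
    for query_count, total_servers, lame_servers in referrals:
        if total_servers == 0:
            continue
        total_queries += query_count
        # Expected number of first queries landing on a lame server.
        expected_lame_queries += query_count * lame_servers / total_servers
    if total_queries == 0:
        return 0.0
    return expected_lame_queries / total_queries


# Made-up example: a busy healthy zone and a quiet half-lame zone.
example = [
    (900, 4, 0),  # 900 referrals, 4 servers, none lame
    (100, 4, 2),  # 100 referrals, 4 servers, 2 lame
]
print(traffic_weighted_lame_share(example))
```

In this made-up case a quarter of all (zone, server) pairs are lame, yet only about 5% of first queries would be expected to reach a lame server, which is the kind of gap between server counts and user impact the suggestion is aiming at.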
Thank you for these suggestions. They are very useful indeed, and we will consider making use of some of these ideas for future analyses. For the time being, however, we're focussing on the email alerts. Administrators of zones and name servers are beginning to receive email alerts about lame delegations. We hope that many people will act on these messages, and fix their servers.
I suppose the NCC would need to be careful about how it publishes results, as LIRs or DNS operators might be sensitive about someone publishing how much DNS traffic they get. I doubt it actually matters, but people may still get upset.
This is true, and it's why we do not make our detailed results available publicly.

Regards,

--
Anand Buddhdev
DNS Services Manager, RIPE NCC