Re: [dns-wg] a historical perspective on DNS lameness
Ed, On Thu, Oct 8, 2009 at 6:23 PM, Edward Lewis <Ed.Lewis@neustar.biz> wrote:
1. RFC 1912 has this definition:
A lame delegations exists when a nameserver is delegated responsibility for providing nameservice for a zone (via NS records) but is not performing nameservice for that zone (usually because it is not set up as a primary or secondary for the zone).
I would agree with the definition, but I'm not sure I would 100% agree with the 'usually because'. I think it can happen for a number of reasons, and I don't think anybody has looked into this in any detail. Misconfiguration or communication issues could equally be to blame as 'no configuration'.
Note - a remote party can only detect the situation if the server in question responds to a query. That is, if the server has no IP address, the IP address is unreachable, the query drops on the way there, the response drops, or the server simply won't answer, the remote party does not have the evidence to conclude lameness.
A common confusion is to treat misconfigured delegations as lame, but that is not the case. The distinction will be important later on.
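A minimal Python sketch of the distinction drawn in point 1, for illustration only. The `ProbeResult` fields and the three labels are hypothetical names, and treating "answered, but not authoritatively or with an error rcode" as lame is one common heuristic, not something specified in this thread or in RFC 1912.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Outcome of one SOA query for the zone, sent to one delegated server.

    All fields are hypothetical; a real checker would fill them in from
    whatever DNS library it uses.
    """
    answered: bool                 # False on timeout, no address, unreachable...
    rcode: Optional[str] = None    # e.g. "NOERROR", "REFUSED", "SERVFAIL"
    authoritative: bool = False    # AA bit set in the response
    has_answer: bool = False       # SOA record present in the answer section


def classify(probe: ProbeResult) -> str:
    """Return "unknown", "ok" or "lame" for a single probe."""
    if not probe.answered:
        # No response at all: there is no evidence to conclude lameness.
        return "unknown"
    if probe.rcode == "NOERROR" and probe.authoritative and probe.has_answer:
        return "ok"
    # The server responded, but is not serving the zone (assumed heuristic).
    return "lame"
```

The point of the three-way result is that a silent server lands in "unknown" and not in "lame", which matches the note above about what a remote party can and cannot conclude.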
2. There is no specified protocol response to indicate that a server is not set up to answer the question, nothing in the RFCs. You can say this is the root cause of the failure - an old, old version of BIND decided to return a referral "upwards" (as to the root or maybe the TLD) to be kind of helpful. This kind act turned out to be the problem when...
3. An implementation of an iterative resolver was released that "believed" the upwards referral and would follow it. (The implementation was widely distributed, legend said it was a MS release in about 2000. But I can't confirm that.) What this meant is that a lot of resolvers would query the root, then a TLD, then a SLD and then be referred back to the root and keep on cycling. That was the operational impact that led to...
4. ARIN policy 2002-1 which told ARIN staff to go off and stamp out lame servers. (This is where I came into the picture.) The purpose was to stop this cycle of activity, but by the time the policy got in place the software was patched pretty much out of existence and the operational pain lessened.
That's quite interesting. I had no idea that was the reason behind the ARIN policy; I thought it was for a similar reason to the RIPE one, which basically was a desire to ensure the reverse tree was as clean and functional as possible.
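For readers unfamiliar with the loop described in point 3, here is a hedged Python sketch of the check that avoids it: a resolver follows a referral only if it moves strictly closer to the query name. The function names are hypothetical, and this is an illustration of the idea, not the code of any particular resolver.

```python
def labels(name: str) -> list:
    """Split a domain name into lowercase labels; the root "." gives []."""
    name = name.rstrip(".").lower()
    return name.split(".") if name else []


def is_subdomain(name: str, ancestor: str) -> bool:
    """True if `name` equals `ancestor` or lies below it in the tree."""
    n, a = labels(name), labels(ancestor)
    return len(n) >= len(a) and (not a or n[-len(a):] == a)


def makes_progress(current_zone: str, referral_zone: str, qname: str) -> bool:
    """True only for a downward referral that is still on the path to qname.

    An "upward" referral (to the root or a TLD) or a sideways one fails
    this check, so a resolver applying it would not cycle.
    """
    deeper = len(labels(referral_zone)) > len(labels(current_zone))
    return (
        deeper
        and is_subdomain(referral_zone, current_zone)
        and is_subdomain(qname, referral_zone)
    )
```

With this check, a lame server for `example.com.` that refers the resolver to `.` or `com.` is simply not followed, so the root, TLD, SLD, root cycle never starts.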
5. At RIPE 45 I presented ( http://www.ripe.net/ripe/meetings/ripe-45/presentations/ripe45-dns-testing-l...) which has some interesting bullets and slides that are still pertinent today on the process of dealing with lameness. When I presented this, ARIN had a high lame rate and RIPE a low one, due largely to early registration mishaps and the difference in the timing of ARIN and RIPE "cutting" a delegation for a new assignment.
6. At some time RIPE adopted a lame inspection policy. I don't have a personal recollection of this. http://www.ripe.net/ripe/docs/ripe-400.html is the policy definition. I had left ARIN by then and wasn't as aware of what was going on then in the RIPE region.
Indeed, I was the author of said document (following input from the RIPE community, of course). I do seem to remember discussing this with you at the time the policy was going through the WG, but I think it was more related to what methodology you had used to detect the lameness rather than the reason for doing it.
So - the perspective I am trying to convey here is - "lame" per se refers to name servers that led software into a loop, which is why they were a problem for ARIN. This is why today you probably won't measure the impact of "misconfigured" delegations (which is what RIPE-400 calls lame) by inspecting traffic. The question of the value of RIPE-400 then turns into one of database cleanliness.
I'm not sure I really understand your point here. If an LIR has entered some data into the DB which it believes to be correct, but it (or its friends) have misconfigured (or not configured) the servers, then I would regard the DB as clean and the servers as needing fixing. Either way, I would regard notification of the problem to the LIR to be a good thing.
Brett
At 15:55 +0100 10/13/09, B C wrote:
I'm not sure I really understand your point here
The message was in reaction to Shane Kerr's presentation at RIPE 59. Temporary URL: http://www.ripe.net/ripe/meetings/ripe-59/presentations/uploads//presentatio...

J. Lame Delegation Analysis for the RIPE Region - Shane Kerr, ISC - 25'
Falling Trees (or If a DNS Server is Lame but Nobody Queries It, Should I Get an Email?)
Queries to the RIPE NCC server for reverse DNS were recorded and compared against which name servers are lame. Based on this, we can see which queries are affected by lameness, and how badly.

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis
NeuStar
You can leave a voice message at +1-571-434-5468

As with IPv6, the problem with the deployment of frictionless surfaces is that they're not getting traction.