DNS lameness notifications
Dear colleagues,
I was wondering if there is a detailed description of what and how is tested in order to check the lameness of a server (i.e. how are the names resolved, timeouts and retransmits of the queries, checks made,....)? Any pointer would be welcome.
The background is that I got notifications ("Unable to resolve nameserver ") which are most probably wrong, unless the resolving algorithm is very delicate...
Best regards, Gilles
-- Fondation RESTENA - DNS-LU 6, rue Coudenhove-Kalergi L-1359 Luxembourg tel: (+352) 424409 fax: (+352) 422473
We got a few notices as well. We are monitoring our DNS servers from several locations and we had no outages during this period.
--- Nuno Vieira nfsi telecom, lda. nuno.vieira@nfsi.pt Tel. (+351) 21 949 2300 - Fax (+351) 21 949 2301 http://www.nfsi.pt/
----- "Gilles Massen" <gilles.massen@restena.lu> wrote:
Dear colleagues,
I was wondering if there is a detailed description of what and how is tested in order to check the lameness of a server (i.e. how are the names resolved, timeouts and retransmits of the queries, checks made,....)? Any pointer would be welcome.
The background is that I got notifications ("Unable to resolve nameserver ") which are most probably wrong, unless the resolving algorithm is very delicate...
Best regards, Gilles
-- Fondation RESTENA - DNS-LU 6, rue Coudenhove-Kalergi L-1359 Luxembourg tel: (+352) 424409 fax: (+352) 422473
On Mar 09, Gilles Massen <gilles.massen@restena.lu> wrote:
The background is that I got notifications ("Unable to resolve nameserver ") which are most probably wrong, unless the resolving algorithm is very delicate...
Many people got those, I can definitely confirm that there are some unexplained false positives.
-- ciao, Marco
Gilles Massen wrote: Hello Gilles,
I was wondering if there is a detailed description of what and how is tested in order to check the lameness of a server (i.e. how are the names resolved, timeouts and retransmits of the queries, checks made,....)? Any pointer would be welcome.
This is currently not documented. However, I can provide a quick explanation here. The first phase of the lameness checks involves generating a canonical list of name servers for a zone. The process gathers all the name servers, and queries each name once for A and AAAA records, with a 3-second timeout. Once it has a complete list of zone and nameserver address pairs, it queries each address for a SOA record for the zone, with a 3-second timeout. If a particular address yields no response, it is queued, and queried up to 4 more times at varying intervals.
The background is that I got notifications ("Unable to resolve nameserver ") which are most probably wrong, unless the resolving algorithm is very delicate...
We are aware that around 1% of the servers we have in our list did not resolve to addresses, which resulted in these false positives. We are taking steps to ensure that we eliminate as many of these as possible in future probes. -- Anand Buddhdev DNS Services Manager, RIPE NCC
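The two phases described above can be sketched roughly as follows. This is only an illustration of the description given in this thread, not the RIPE NCC's actual code. The callbacks `resolve` and `query_soa`, the function names `build_pairs` and `probe_pairs`, and the convention that `query_soa` returns `True` (authoritative answer), `False` (answered, but not authoritative) or `None` (no response) are all assumptions made for the example. The "varying intervals" between retries are left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical callback types; the thread does not say which resolver
# library or query tool the real probe uses.
ResolveFn = Callable[[str, float], List[str]]         # (ns_name, timeout) -> addresses
SoaFn = Callable[[str, str, float], Optional[bool]]   # (zone, address, timeout) -> answer

TIMEOUT = 3.0     # seconds, as stated in the thread
MAX_RETRIES = 4   # "queried up to 4 more times"

Pair = Tuple[str, str]


def build_pairs(zone: str, ns_names: List[str],
                resolve: ResolveFn) -> Tuple[List[Pair], List[str]]:
    """Phase 1: resolve each name once and collect (zone, address) pairs."""
    pairs: List[Pair] = []
    unresolved: List[str] = []
    for name in ns_names:
        addresses = resolve(name, TIMEOUT)  # one attempt only, no retry
        if not addresses:
            unresolved.append(name)
        for address in addresses:
            pairs.append((zone, address))
    return pairs, unresolved


def probe_pairs(pairs: List[Pair], query_soa: SoaFn) -> Dict[Pair, str]:
    """Phase 2: ask each address for the zone's SOA; retry silent ones."""
    results: Dict[Pair, str] = {}
    queue: List[Pair] = []
    for zone, address in pairs:
        answer = query_soa(zone, address, TIMEOUT)
        if answer is None:
            queue.append((zone, address))  # no response: try again later
        else:
            results[(zone, address)] = "ok" if answer else "lame"
    for zone, address in queue:
        status = "no response"
        for _ in range(MAX_RETRIES):
            answer = query_soa(zone, address, TIMEOUT)
            if answer is not None:
                status = "ok" if answer else "lame"
                break
        results[(zone, address)] = status
    return results
```

Note that in this sketch a name that fails its single phase-1 lookup never reaches phase 2, which is the situation behind the "Unable to resolve nameserver" notices discussed here.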
Anand Buddhdev wrote:
Gilles Massen wrote:
Hello Gilles,
I was wondering if there is a detailed description of what and how is tested in order to check the lameness of a server (i.e. how are the names resolved, timeouts and retransmits of the queries, checks made,....)? Any pointer would be welcome.
This is currently not documented.
It needs to be documented for at least 2 reasons that come immediately to mind. First, it will allow operators to perform the same tests that you(pl) are performing, thus hopefully catching problems before they happen. Second, it will allow for peer review of the procedure. Given that there are a non-trivial number of operators reporting that your system is generating false positives, I think both of these goals are relevant. In an effort to be helpful (not critical) I will ask some questions about the details you've left out below. Hopefully fleshing this process out can help everyone involved.
However, I can provide a quick explanation here.
The first phase of the lameness checks involves generating a canonical list of name servers for a zone.
How is this done? IOW, what sources are queried to generate the list?
The process gathers all the name servers, and queries each name once for A and AAAA records, with a 3-second timeout.
What source is queried? If a query times out are other authoritative sources for that name tried? I would say offhand that 3 seconds is probably not long enough for a timeout, but it might be acceptable if more than one source is tried (for example, if you're looking for the address records for ns1.example.com and you try all of the name servers for example.com).
Once it has a complete list of zone and nameserver address pairs,
I think I know what you mean by this, but I'm not sure. Could you please flesh this out?
it queries each address for a SOA record for the zone, with a 3-second timeout. If a particular address yields no response, it is queued, and queried up to 4 more times at varying intervals.
That sounds a little more reasonable, although if it were me I would double the timeout on each successive query.
hope this helps,
Doug
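The doubling suggestion can be illustrated with a short sketch. It assumes the 3-second initial timeout and the 4 retries mentioned earlier in the thread; the names `timeout_schedule` and `query_with_backoff`, and the `query` callback that returns `None` on timeout, are hypothetical.

```python
from typing import Callable, List, Optional


def timeout_schedule(initial: float = 3.0, retries: int = 4) -> List[float]:
    """Timeouts for the first query plus each retry, doubling every time."""
    return [initial * 2 ** attempt for attempt in range(retries + 1)]


def query_with_backoff(query: Callable[[float], Optional[str]],
                       schedule: List[float]) -> Optional[str]:
    """Call query(timeout) with growing timeouts until it returns an answer.

    `query` is a hypothetical callback that returns None when it times out.
    """
    for timeout in schedule:
        answer = query(timeout)
        if answer is not None:
            return answer
    return None
```

With the defaults, the schedule is 3, 6, 12, 24 and 48 seconds, so a slow server gets up to 93 seconds in total before it is given up on.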
Re Anand, anandb@ripe.net (Anand Buddhdev) wrote:
The background is that I got notifications ("Unable to resolve nameserver ") which are most probably wrong, unless the resolving algorithm is very delicate...
We are aware that around 1% of the servers we have in our list did not resolve to addresses, which resulted in these false positives. We are taking steps to ensure that we eliminate as many of these as possible in future probes.
Actually, I personally would be more interested in more detailed error messages, especially in the case of those false positives. We dream of our stuff being always resolvable... Yours, Elmar. -- --[ bins@denic.de ]-----------------------------[ http://www.denic.de/ ]-- DENIC eG | Elmar K. Bins | Networking, Security Kaiserstr. 75-77 | AS8763, AS31529 | Tel +49 69 27235 0 D-60329 Frankfurt am Main | EKB2 @ RIPE | Fax +49 69 27235 239 -------------------------------------------------------------------------- Eingetr. Nr. 770 im Genossenschaftsregister Amtsgericht Frankfurt am Main Vorstand: Sabine Dolderer, Dr. Jörg Schweiger, Marcus Schäfer, Carsten Schiefner. Vorsitzender des Aufsichtsrats: Elmar Knipp
* Elmar K. Bins:
We are aware that around 1% of the servers we have in our list did not resolve to addresses, which resulted in these false positives. We are taking steps to ensure that we eliminate as many of these as possible in future probes.
Actually, I personally would be more interested in more detailed error messages, especially in the case of those false positives. We dream of our stuff being always resolvable...
I second that, especially if DNSSEC validation is involved. The current message format is good enough for trivial cases, but it's a bit terse in case of complex failure scenarios. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
Hello, Anand. On Tue, 2009-03-10 at 00:36 +0100, Anand Buddhdev wrote:
We are aware that around 1% of the servers we have in our list did not resolve to addresses, which resulted in these false positives.
Rather than false positives, doesn't that indicate a different kind of lameness -- of the delegation, rather than of the server? /Niall
Hi Niall,
Rather than false positives, doesn't that indicate a different kind of lameness -- of the delegation, rather than of the server?
Actually it depends on the test. If I interpret Anand's comment correctly, and only 1 query per name is made, then you'd only need one server to be slow or one packet to be lost (and isn't that what UDP is for? :)) to become flagged as 'unresolvable'. But I'd like to believe that the DNS can do better than that. Besides, flagging too many nameservers as unresolvable means that the actual lameness test won't be performed, so the statistics could end up seriously tainted...
Best, Gilles
-- Fondation RESTENA - DNS-LU 6, rue Coudenhove-Kalergi L-1359 Luxembourg tel: (+352) 424409 fax: (+352) 422473
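To put a rough number on this argument: if each lookup attempt fails independently with some probability (a simplifying assumption, since real packet loss is often bursty), a name is wrongly flagged only when every attempt fails. The function names below are made up for the illustration, and the failure rate is whatever you choose to plug in, not a measured value.

```python
def false_unresolvable_rate(failure_rate: float, attempts: int = 1) -> float:
    """Chance that all `attempts` independent lookups of a healthy name fail."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return failure_rate ** attempts


def expected_false_flags(names: int, failure_rate: float, attempts: int = 1) -> float:
    """Expected number of healthy names wrongly flagged as unresolvable."""
    return names * false_unresolvable_rate(failure_rate, attempts)
```

Under these assumptions a single attempt passes the per-lookup failure rate straight through to the notifications, while each additional attempt multiplies it by the same factor again.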
On Tue, 2009-03-10 at 10:42 +0100, Gilles Massen wrote:
Actually it depends on the test. [...]
Gilles, I think we're not disagreeing, but just identifying different edge and corner cases. I'm sure Anand knows how to merge hint-streams. 8-) /Niall
Hello,
Also, a test page like the one SIDN and some other registries have for name servers would be nice, so we can check what RIPE registers. That is, a web frontend for the current test, so people can run it whenever they want instead of waiting for a new email from RIPE to see whether a change was made correctly.
With kind regards,
Mark Scholten
Stream Service
-----Original Message-----
From: dns-wg-admin@ripe.net [mailto:dns-wg-admin@ripe.net] On Behalf Of Niall O'Reilly
Sent: Tuesday 10 March 2009 14:03
To: Gilles Massen
Cc: dns-wg; Anand Buddhdev
Subject: Re: [dns-wg] Re: DNS lameness notifications
On Tue, 2009-03-10 at 10:42 +0100, Gilles Massen wrote:
Actually it depends on the test. [...]
Gilles, I think we're not disagreeing, but just identifying different edge and corner cases. I'm sure Anand knows how to merge hint-streams. 8-) /Niall
Stream Service wrote:
Also, a test page like the one SIDN and some other registries have for name servers would be nice, so we can check what RIPE registers. That is, a web frontend for the current test, so people can run it whenever they want instead of waiting for a new email from RIPE to see whether a change was made correctly.
To be just a little bit more precise: the SIDN test page currently only checks domains with a .nl extension. We are in the process of renewing our DNScheck. In the future it might even be able to check more than just .nl. We are working closely with our fine colleagues from .SE on these improvements, and hence I would like to recommend their great dnscheck page: http://dnscheck.iis.se/ It should be able to test all zones, including in-addr.arpa. But bear in mind that it is new software, which may not be entirely free of errors. Feel free to contribute to it, in order to improve it! :-) http://opensource.iis.se/trac There is also a mailing list.
Best regards,
-- Marco Davids SIDN
Also, a test page like the one SIDN and some other registries have for name servers would be nice, so we can check what RIPE registers.
There is already <http://www.db.ripe.net/cgi-bin/delcheck/delcheck2.cgi> for this purpose.
jaap
At 0:36 +0100 3/10/09, Anand Buddhdev wrote:
We are aware that around 1% of the servers we have in our list did not resolve to addresses, which resulted in these false positives. We are taking steps to ensure that we eliminate as many of these as possible in future probes.
Servers that cannot be reached are not "lame" per se. The origin of the concern over lameness (excepting the desire for complete correctness by some) was in an old resolver implementation that didn't know that a referral to the root was an indication of lameness and not a referral to be followed. When this old resolver queried a server without getting a response (including not finding an IP address for the server's domain name), it stopped asking, and so that was not a problem. When the resolver got a response and it was a referral to the root, problems ensued.
In at least three RFCs lame servers are defined to be servers that are not authoritative for the zones they are thought to be authoritative for. To detect that, the querier has to get a response. So, for a server to be considered lame by a resolver (that's the "eye of the beholder" here), the server must have an IP address, be reachable, and respond.
For the lame testing I've written, I've always listed servers by IP address and not domain name. That is because some registrants would list multiple NS records with different domain names, all pointing to the same IP address. In one instance, three NS records each pointed to three IP addresses, but all sets of the three IP addresses were identical. That is:
    silly.example.      NS    ns1.silly.example.
    silly.example.      NS    ns2.silly.example.
    silly.example.      NS    ns3.silly.example.
    ns1.silly.example.  AAAA  ::1
    ns1.silly.example.  AAAA  ::2
    ns1.silly.example.  AAAA  ::3
    ns2.silly.example.  AAAA  ::1
    ns2.silly.example.  AAAA  ::2
    ns2.silly.example.  AAAA  ::3
    ns3.silly.example.  AAAA  ::1
    ns3.silly.example.  AAAA  ::2
    ns3.silly.example.  AAAA  ::3
So, my recommendation when implementing a lame server test: stick to the IP addresses and test them. If there is no IP address available, don't try to test. The problem may lie in the forward map, in getting to the forward map, etc., and may just be too confusing to explain.
-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Edward Lewis NeuStar You can leave a voice message at +1-571-434-5468 Getting everything you want is easy, if you don't want much.
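A minimal sketch of the "test by IP address" recommendation, using the silly.example data from the message above. The function names `addresses_to_test` and `classify` and the status strings are invented for this illustration; the classification follows the definition given in the message, where a server can only be called lame if it actually responds.

```python
from typing import Dict, List


def addresses_to_test(ns_addresses: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert name -> addresses into address -> names, so each IP is tested once."""
    by_address: Dict[str, List[str]] = {}
    for name, addresses in ns_addresses.items():
        for address in addresses:
            names = by_address.setdefault(address, [])
            if name not in names:
                names.append(name)
    return by_address


def classify(responded: bool, authoritative: bool) -> str:
    """Only a server that responds without being authoritative is lame."""
    if not responded:
        return "unreachable"  # not lame per se: nothing was learned
    return "ok" if authoritative else "lame"


silly = {
    "ns1.silly.example.": ["::1", "::2", "::3"],
    "ns2.silly.example.": ["::1", "::2", "::3"],
    "ns3.silly.example.": ["::1", "::2", "::3"],
}
targets = addresses_to_test(silly)  # three addresses to probe instead of nine
```

Names with no address never appear in `targets`, which matches the advice not to test at all when no IP address is available.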
participants (12)
- Anand Buddhdev
- Doug Barton
- Edward Lewis
- Elmar K. Bins
- Florian Weimer
- Gilles Massen
- Jaap Akkerhuis
- Marco Davids
- md@Linux.IT
- Niall O'Reilly
- Nuno Vieira - nfsi telecom
- Stream Service