Request to test TCP and UDP to the root servers separately.
My day job has had some interesting discussions lately about TCP to the root servers, based on a few odd behaviors reported to us over the past few years. There aren't any surprises in this data; it's been known for a long time that some people think DNS is UDP-only, some firewalls are broken, and so on. What we realized, though, is that there is no widespread measurement of how often TCP is a problem.

My thought went immediately to the Atlas probe network. It's distributed enough that it could give an interesting indication of how often TCP DNS fails compared to UDP DNS. Perhaps the rates are so similar it's a non-issue; perhaps TCP fails much more often.

Would it be of interest to the Atlas community to have the probes try to query "." via both UDP and TCP from each of the 13 root servers (perhaps v4 and v6 separately), report back that data, and then generate a report on the results? I would be happy to assist in preparing or presenting a report on the results!

-- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
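As a rough illustration of what such a probe measurement involves at the wire level (this is not the Atlas probe code; the function name and query ID are my own), here is a minimal sketch that builds a DNS query for "." and shows the only framing difference between the UDP and TCP variants: RFC 1035 requires a two-byte message-length prefix on TCP.

```python
import struct

def build_root_soa_query(qid=0x1234):
    """Build a minimal DNS query for the root zone ('.') SOA record.

    Header fields: id, flags (RD=0 for iterative queries to roots),
    QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0, then the question section.
    """
    header = struct.pack(">HHHHHH", qid, 0x0000, 1, 0, 0, 0)
    # The root name is a single zero byte; QTYPE=SOA(6), QCLASS=IN(1).
    question = b"\x00" + struct.pack(">HH", 6, 1)
    return header + question

udp_msg = build_root_soa_query()
# Over TCP, RFC 1035 section 4.2.2 prepends a two-byte message length;
# the DNS message itself is byte-for-byte identical.
tcp_msg = struct.pack(">H", len(udp_msg)) + udp_msg
```

The same bytes would then be sent with sendto() over UDP, or written to a connected TCP socket, which is exactly where a broken middlebox can make the two transports diverge.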
On 2/24/12 21:41 , Leo Bicknell wrote:
Would it be of interest to the Atlas community to have the probes try to query "." via both UDP and TCP from each of the 13 root servers (perhaps v4 and v6 separately), report back that data, and then generate a report on the results? I would be happy to assist in preparing or presenting a report on the results!
The probes are already measuring this. I don't think we made any graphs of the results.
philip.homburg@ripe.net:
The probes are already measuring this. I don't think we made any graphs of the results.
Do you also measure whether the path to the root servers (other DNS servers?) allows fragmented UDP packets? If you announce an EDNS(0) buffer size of 4K and ask for some DNSSEC-related records, you often receive fragmented UDP messages. It would be interesting to see whether those fragments make it back to the client. If not, there will be timeouts and TCP resends, which I call bad (YMMV).

The query produced by the command

dig @X.root-servers.net . ANY +dnssec

tickles the root servers into sending a 3966-byte message, which triggers interesting behaviour in my neck of the woods - such as fragment reordering and consequential reassembly failure - so it's important to not only look at the DNS result of the query, but to actually look at the arriving packets.

Just a thought ...

Cheers,
/Lars-Johan Liman

#----------------------------------------------------------------------
# Lars-Johan Liman, M.Sc.              ! E-mail: liman@netnod.se
# Senior Systems Specialist            ! Tel: +46 8 - 562 860 12
# Netnod Internet Exchange, Stockholm  ! http://www.netnod.se/
#----------------------------------------------------------------------
On 2/25/12 21:18 , Lars-Johan Liman wrote:
philip.homburg@ripe.net:
The probes are already measuring this. I don't think we made any graphs of the results.
dig @X.root-servers.net . ANY +dnssec

At the moment we don't have EDNS or ANY. That should get added whenever we get around to doing DNSSEC.

tickles the root servers into sending a 3966-byte message, which triggers interesting behaviour in my neck of the woods - such as fragment reordering and consequential reassembly failure - so it's important to not only look at the DNS result of the query, but to actually look at the arriving packets.
Why does packet reordering lead to reassembly failures? At the moment we also don't have code for capturing fragments. That may be non-trivial to add.
[Changing my From: address to the one used in my subscription, to avoid sender filters.]

philip.homburg@ripe.net:
Why does packet reordering lead to reassembly failures?
That's what I would like to know too. :-) I have noticed it (with tcpdump), but I haven't had the time to drill into what essentially has to be a kernel reassembly problem, or incorrect coding of the fragment headers. There are lots of small ones and zeroes, and deep hacking is needed to reach a solid conclusion, and time is a scarce resource.
At the moment we also don't have code for capturing fragments. That may be non-trivial to add.
Understood. May I suggest a middle-ground test, which might be easier for you to implement, and which might still give some useful feedback.

First, send a normal UDP DNS query, without DNSSEC and without any EDNS(0) features, purposely aiming for a small DNS response. That will test normal UDP connectivity. (This you already do, but ensure that the expected response is small.)

Then send a TCP DNS query. That will test normal TCP connectivity. (This you obviously already do.)

Finally, send a UDP DNS query with EDNS(0), a large (but reasonable - maybe 4k?) buffer size, the DO bit set, and many "bells and whistles", to purposely tickle the server into producing a large answer. If you don't receive a parseable DNS response for this, something along the path generates problems for you (fragmenting unit, intermediate firewall, or reassembly unit), and that in itself is interesting information (methinks). It may even be that the responding server sets the IP "DF" (don't fragment) bit, which unfortunately seems to be the default in some Linux distributions.

Or maybe you test this already? ;-)

Cheers,
/Liman
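For the third test, the EDNS(0) advertisement is carried in an OPT pseudo-RR appended to the query's additional section (RFC 6891). A minimal sketch of how that record is laid out (the function name and defaults are my own, not anything from the probe firmware):

```python
import struct

def build_opt_rr(udp_payload_size=4096, do_bit=True):
    """Build an EDNS(0) OPT pseudo-RR (RFC 6891).

    NAME is the root (a single zero byte), TYPE=OPT(41), and the
    CLASS field is reused to advertise the UDP payload size. The
    32-bit TTL field carries extended RCODE (8 bits), EDNS version
    (8 bits), and flags (16 bits) -- the DO bit is the top flag bit.
    RDLENGTH is 0 when no EDNS options are attached.
    """
    ttl = 0x00008000 if do_bit else 0  # DO bit set, version 0
    return b"\x00" + struct.pack(">HHIH", 41, udp_payload_size, ttl, 0)
```

Appending this record (and bumping ARCOUNT to 1) is what invites the server to send a response larger than 512 bytes over UDP, which in turn is what exercises the fragmentation path Liman describes.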
In a message written on Sat, Feb 25, 2012 at 10:49:04AM +0100, Philip Homburg wrote:
On 2/24/12 21:41 , Leo Bicknell wrote:
Would it be of interest to the Atlas community to have the probes try and query "." via both UDP and TDP from each of the 13 root servers (perhaps v4 and v6 separately), report back that data, and then generate a report on the results? I would be happy to assist in preparing or presenting a report on the results!
The probes are already measuring this. I don't think we made any graphs of the results.
I'm not sure a graph would be the best representation of this data. The interesting questions to me are:

- Compare Failure Levels
  - What is the overall level of UDP failure?
  - What is the overall level of UDP+EDNS0 failure?
  - What is the overall level of TCP failure?
- Exploring the differences in the ecosystem
  - Is there a statistical difference in the failure rate between IPv4 and IPv6? (Are we getting better or worse?)
  - Is there a statistical difference in the failure rate between two different root servers? (Does the root server deployment model matter?)
  - What is the difference in response time of TCP queries compared to UDP queries? (What's the penalty to fall back to TCP?)

-- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
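The "statistical difference" questions above boil down to comparing two failure proportions. A sketch of one standard way to do that, a two-proportion z-test; the counts below are purely hypothetical, not real Atlas results:

```python
from math import sqrt, erf

def two_proportion_z(fail_a, total_a, fail_b, total_b):
    """Two-sided two-proportion z-test: do groups A and B have
    different failure rates?  Returns (z statistic, p-value)."""
    p_a, p_b = fail_a / total_a, fail_b / total_b
    pooled = (fail_a + fail_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical example: 120 failed UDP queries out of 40000,
# versus 310 failed TCP queries out of 40000.
z, p = two_proportion_z(120, 40000, 310, 40000)
```

With samples this large, even a fraction-of-a-percent difference in failure rate is easily distinguishable from noise, which is exactly why the probe network's scale makes these questions answerable.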
Hello, This is an older thread, but since we made some enhancements in the meantime, it could be worth picking it up again. On 2012.02.26. 16:36, Leo Bicknell wrote:
In a message written on Sat, Feb 25, 2012 at 10:49:04AM +0100, Philip Homburg wrote:
On 2/24/12 21:41 , Leo Bicknell wrote:
Would it be of interest to the Atlas community to have the probes try and query "." via both UDP and TDP from each of the 13 root servers (perhaps v4 and v6 separately), report back that data, and then generate a report on the results? I would be happy to assist in preparing or presenting a report on the results!
The probes are already measuring this. I don't think we made any graphs of the results.
In case you've missed the recent announcement: we're now visualising more of the "latest DNS root measurements" here: https://atlas.ripe.net/contrib/root_anycast.html?msm_id=1 Basically, we do queries from all the probes for the serial in the SOA to all root servers, over IPv4 and IPv6, using UDP and TCP. The above visualisation now shows these too. We also have all the historical results of these. Taking this into account:
I'm not sure a graph would be the best representation of this data. The interesting questions to me are:
- Compare Failure Levels
  - What is the overall level of UDP failure?
  - What is the overall level of UDP+EDNS0 failure?
  - What is the overall level of TCP failure?
TCP+UDP are possible, EDNS0 not yet.
- Exploring the differences in the ecosystem
  - Is there a statistical difference in the failure rate between IPv4 and IPv6? (Are we getting better or worse?)
  - Is there a statistical difference in the failure rate between two different root servers? (Does the root server deployment model matter?)
  - What is the difference in response time of TCP queries compared to UDP queries? (What's the penalty to fall back to TCP?)
These are also possible. It's not trivial, as there are some artefacts to be taken into consideration (anycast instance switches, some nodes not responding to some queries, etc.), but it is possible in general. We're not yet ready with the Atlas data sharing facility, but the data is available on request. Let us know if you're really interested in doing this analysis. We can also discuss this at the upcoming RIPE meeting, with whoever is there :-) Cheers, Robert
In a message written on Mon, Mar 19, 2012 at 05:51:34PM +0100, Robert Kisteleki wrote:
In case you've missed the recent announcement: we're now visualising more of the "latest DNS root measurements" here: https://atlas.ripe.net/contrib/root_anycast.html?msm_id=1
Yep, awesome stuff. A third data set would be very interesting for the map: for all probes that have both IPv4 and IPv6 connectivity, plot "IPv6 RTT - IPv4 RTT". In an ideal world it would be a bunch of zeros, but we don't live in that world yet!
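The per-probe delta is a simple reduction once both measurements exist for a probe. A sketch with entirely hypothetical results (probe IDs, values, and the result layout are made up for illustration; they are not the Atlas data format):

```python
# Hypothetical per-probe RTTs in milliseconds to one root server;
# None means the measurement over that address family failed.
results = {
    "probe-1": {"v4": 18.2, "v6": 19.1},
    "probe-2": {"v4": 42.0, "v6": 41.5},
    "probe-3": {"v4": 25.3, "v6": None},  # v6 measurement failed
}

# Keep only dual-stack probes with results on both families,
# then compute the "IPv6 RTT - IPv4 RTT" value to plot.
deltas = {
    pid: r["v6"] - r["v4"]
    for pid, r in results.items()
    if r["v4"] is not None and r["v6"] is not None
}
```

Probes missing either family must be excluded rather than treated as zero, otherwise the map would understate the v6 penalty wherever v6 is broken outright.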
We're not yet ready with the Atlas data sharing facility, but the data is available on request. Let us know if you're really interested in doing this analysis. We can also discuss this at the upcoming RIPE meeting, with whoever is there :-)
I would be interested in digging in deeper, there are some specific questions we would like to know the answers to as the operators of F-Root, and the community may be interested in some of the same questions. I'm not going to be at RIPE this time around, so you'll have to let me know what to do in e-mail. :) -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
participants (4)
- Lars-Johan Liman
- Leo Bicknell
- Philip Homburg
- Robert Kisteleki