Trying to measure Quad9 latency
Hi all,

After the recent launch of the Quad9 service, I tried to find out whether it has a better response time than my current resolver, Google Public DNS.

To do this, I launched a comparative measurement using Atlas that ran for one day against the same address from the probes in my ISP. The address I was querying has a very high TTL, so I expected to always hit the servers' caches and thus get a good basis for comparison. I know that other factors are involved in server response time, but for my home connection I felt that this was enough.

When I got the results from the measurement, they were not what I expected: the measurements to Quad9 had a high rate of REFUSED responses.

Does anyone have a clue why?

The measurement result is available at https://atlas.ripe.net/measurements/10269508

Best regards,

--
Eduardo Duarte
Project Development Management
*DNS.PT*
Rua Latino Coelho, n.º 13, 5.º piso | 1050-132 Lisboa | Portugal
Tel: (+351) 211 308 200 Fax: (+351) 211 312 720
dns.pt <http://www.dns.pt> | dnssec.pt <http://www.dnssec.pt> | 3em1.pt <https://www.3em1.pt> | facebook.com/dns.pt <https://www.facebook.com/dns.pt> | pt.linkedin.com/in/dnspt <http://pt.linkedin.com/in/dnspt>

Disclaimer: This e-mail was written in accordance with the new orthographic agreement. This message is intended exclusively for its addressee. It may contain CONFIDENTIAL information protected by law. If this message has been received by error, please notify us via e-mail and delete it immediately.
[ Before printing this message, think about the environment ]
On Wed, Nov 22, 2017 at 06:27:42PM +0000, Eduardo Duarte <eduardo.duarte@dns.pt> wrote a message of 289 lines which said:
> When I got the results from the measurement they were not what I expected.... The measurements to Quad9 had high value of REFUSAL.
You did not set the RD (Recursion Desired) bit. Most resolvers refuse such queries to prevent cache snooping.

Compare with measurement #10290443, which has:

Recursion desired True
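To see the effect of the RD bit outside of Atlas, one could send the same query with and without it and compare the response codes. This is only a sketch, assuming a POSIX shell with dig installed; `rcode_of` and `compare_rd` are hypothetical helper names, and the resolver and query name are whatever the caller passes in.

```shell
# Extract the response code (NOERROR, REFUSED, ...) from dig output on stdin.
# Relies on dig's ";; ->>HEADER<<- opcode: ..., status: ..., id: ..." line.
rcode_of() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Send the same query with RD cleared and with RD set, and print both
# response codes. Needs dig and network access, so it is not called here.
compare_rd() {
    resolver=$1
    name=$2
    printf 'RD=0: %s\n' "$(dig +norecurse +noall +comments "@$resolver" "$name" A | rcode_of)"
    printf 'RD=1: %s\n' "$(dig +recurse +noall +comments "@$resolver" "$name" A | rcode_of)"
}

# Example invocation (placeholder name): compare_rd 9.9.9.9 example.com
```

A resolver that guards against cache snooping may show REFUSED on the RD=0 line and a normal status on the RD=1 line; other resolvers may answer both.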
On Nov 22, 2017, at 2:05 PM, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:
> You did not set the RD (Recursion Desired) bit. Most resolvers refuse these queries, to avoid cache snooping.
>
> Compare with #10290443, which have:
>
> Recursion desired True
I also created this ping measurement, which measures latency to identify which countries get poor response times:

https://atlas.ripe.net/measurements/10291137/#!probes

- jared
On 11/23/2017 02:23 AM, Jared Mauch wrote:
> I also created this measurement, which is a ping one to measure latency to identify what countries receive a poor response. https://atlas.ripe.net/measurements/10291137/#!probes
All those measurements will only show part of the story, though: Quad9 seems to do some _extremely_ weird nonsense. We measured ICMP latency from different machines on the same subnet and saw _huge_ differences in the responses:

rtt min/avg/max/mdev = 15.088/15.202/15.366/0.155 ms
rtt min/avg/max/mdev = 4.066/4.134/4.265/0.106 ms

Those times are reproducible: a given machine will always get the same 15, 10, or 4 ms ping. So their anycasted 9.9.9.9 seems to internally redirect queries based on a hash of the source IP, and if you're unlucky, they redirect you to a server at the other end of the continent, with a latency that is worse than if you used a DNS server in another European country.

What this means for your measurement is that if your probe's IP address changed, it might suddenly get a response time that is several times worse, even though it is still on the exact same network.

--
Michael Meier, Zentrale Systeme
Friedrich-Alexander-Universitaet Erlangen-Nuernberg
Regionales Rechenzentrum Erlangen
Martensstrasse 1, 91058 Erlangen, Germany
Tel.: +49 9131 85-28973, Fax: +49 9131 302941
michael.meier@fau.de
www.rrze.fau.de
On Thu, Nov 23, 2017 at 09:56:02AM +0100, Michael Meier <michael.meier@fau.de> wrote a message of 29 lines which said:
> Quad9 seem to do some _extremely_ weird nonsense. We were measuring ICMP-latency just from different machines on the same subnet and saw _huge_ differences in the responses:
Well, a DNS server does not HAVE TO handle ICMP like you want/expect. If you want the RTT of a DNS server, send it DNS requests.
2017-11-23 9:56 GMT+01:00 Michael Meier <michael.meier@fau.de>:
> Those times are reproducible, a machine will always get the same 15 or 10 or 4 ms ping. So their anycasted 9.9.9.9 seems to internally redirect queries based on source-IP-hash, and if you're unlucky, they redirect you to a server at the other end of the continent, that has a latency that is worse than if you used a DNS in another european country.
Hi All,

I think Quad9 uses round-robin DNS with server pools (3?) behind the 9.9.9.9 anycast address.

My findings:

$ for i in $(seq 0 9); do echo -n "$i "; dig +short @9.9.9.9 hostname.bind txt CH; sleep 3; done
0 "res300.ams.rrdns.pch.net"
1 "res300.ams.rrdns.pch.net"
2 "res100.ams.rrdns.pch.net"
3 "res200.ams.rrdns.pch.net"
4 "res200.ams.rrdns.pch.net"
5 "res200.ams.rrdns.pch.net"
6 "res300.ams.rrdns.pch.net"
7 "res200.ams.rrdns.pch.net"
8 "res300.ams.rrdns.pch.net"
9 "res300.ams.rrdns.pch.net"

... Quad9 DNS servers are not created equal:

$ for i in $(seq 0 9); do echo -n "$i "; dig +short @9.9.9.9 version.bind txt CH; sleep 3; done
0 "Q9-U-5.0"
1 "Q9-P-5.0"
2 "Q9-U-5.0"
3 "Q9-P-5.0"
4 "Q9-P-5.0"
5 "Q9-U-5.0"
6 "Q9-P-5.0"
7 "Q9-P-5.0"
8 "Q9-P-5.0"
9 "Q9-P-5.0"

E.

--
| ENRICO ARDIZZONI
| Head of the Networks and Systems Office
| Università degli Studi di Ferrara
Hi,

I think there may be some confusion here. Let me try to clarify a bit.

* For RTT measurements, use DNS queries (ICMP may get lower priority).
> I think Quad9 uses round robin dns with server pools (3?) behind 9.9.9.9 anycast address:
>
> My findigs:
>
> $ for i in $(seq 0 9); do echo -n "$i "; dig +short @9.9.9.9 hostname.bind txt CH; sleep 3; done
> 0 "res300.ams.rrdns.pch.net"
> 1 "res300.ams.rrdns.pch.net"
> 2 "res100.ams.rrdns.pch.net"
> 3 "res200.ams.rrdns.pch.net"
> 4 "res200.ams.rrdns.pch.net"
> 5 "res200.ams.rrdns.pch.net"
> 6 "res300.ams.rrdns.pch.net"
> 7 "res200.ams.rrdns.pch.net"
> 8 "res300.ams.rrdns.pch.net"
> 9 "res300.ams.rrdns.pch.net"
Typically, anycast services are built as in Figure 1 of [1]: an anycast service (such as Quad9) is distributed across sites. In your example, the site is AMS. At each site, they may use a load balancer (section 3.5 of [1]) that sends the queries to individual servers (res100, res200, and res300 in this case).

Now, from your measurements, you reach AMS all the time. You cannot control that, because that is what BGP does: it maps you to the "closest" site (closest in terms of BGP distance between you and Quad9). If you want to see other Quad9 anycast sites, you'll need to measure from other vantage points (using Atlas, for example).

And anycast is quite stable during normal operations [2]: once you reach a site, you'll tend to stick to it -- unless there's a DDoS or routing manipulation, as in [1].
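One way to look at how queries are spread over the servers behind a single site is to tally the hostname.bind answers from repeated queries. A minimal sketch, assuming a POSIX shell with dig installed; `tally_backends` and `sample_backends` are names made up for this illustration, not part of any tool.

```shell
# Count how often each backend name appears.
# Reads one `dig +short` TXT answer per line on stdin, most frequent first.
tally_backends() {
    tr -d '"' | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Ask the resolver for hostname.bind N times and tally the answers.
# Needs dig and network access, so it is not called here.
sample_backends() {
    resolver=$1
    count=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        dig +short "@$resolver" hostname.bind txt CH
        i=$((i + 1))
        sleep 1
    done | tally_backends
}

# Example invocation: sample_backends 9.9.9.9 20
```

Seeing several server names that share one site code would be consistent with a per-site load balancer; seeing different site codes would instead suggest the anycast route itself changed.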
> ... Quand9 dns server are not created equal:
>
> $ for i in $(seq 0 9); do echo -n "$i "; dig +short @9.9.9.9 version.bind txt CH; sleep 3; done
Diversity of versions, maybe, for resiliency.

/giovane

[1] https://www.isi.edu/~johnh/PAPERS/Moura16b.pdf
[2] https://www.isi.edu/%7ejohnh/PAPERS/Wei17b.pdf
Hi,

On 22-11-17, Eduardo Duarte wrote:
> Hi all,
>
> After the recent launch of the Quad9 service I try to discover if it had a best response time them my actual resolver, that is google public DNS.
>
> To do this I launch a comparing measurement using atlas that would run during one day to the same address from the probes in my ISP. The address that I was trying has a very high TTL so I was expecting to always hit the cache of the servers and them get a good comparison base. I know that there are other factors involve in server response time but to my home connection I fill that this where enough.
Instead of asking for a regular name, you can query a special name like "version.bind" in the CHAOS class. These queries are always answered directly, so they simulate a 100% cache hit and allow you to measure the RTT towards a resolver.

To test with dig:

$ dig @9.9.9.9 CH version.bind TXT

See https://atlas.ripe.net/measurements/9740262/ for a real measurement using this technique.

Baptiste
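To turn the single dig query above into a rough RTT estimate, one could repeat it and summarise the "Query time" values that dig reports. This is a sketch under assumptions: a POSIX shell with dig and awk available, and hypothetical helper names (`query_time_ms`, `summarise`, `measure_rtt`). dig only reports whole milliseconds, so the result is coarse.

```shell
# Extract the query time in milliseconds from dig output on stdin.
# Relies on dig's ";; Query time: N msec" line.
query_time_ms() {
    sed -n 's/^;; Query time: \([0-9]*\) msec.*/\1/p'
}

# Summarise a list of numbers (one per line) as min/avg/max.
summarise() {
    awk 'NR == 1 { min = $1; max = $1 }
         { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END { if (NR) printf "min/avg/max = %d/%.1f/%d ms\n", min, sum / NR, max }'
}

# Send N version.bind CHAOS queries and summarise the reported times.
# Needs dig and network access, so it is not called here.
measure_rtt() {
    resolver=$1
    count=${2:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        dig "@$resolver" CH version.bind TXT | query_time_ms
        i=$((i + 1))
    done | summarise
}

# Example invocation: measure_rtt 9.9.9.9 10
```

Running the same function against two resolvers from the same host gives a like-for-like comparison, since neither query depends on what happens to be in the cache.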
Hi!

Thank you for the pointers, Stephane and Baptiste! I am already running new measurements.

Best regards,
Eduardo Duarte

Baptiste Jonglez wrote on 22-11-2017 23:14:
> Instead of asking for a regular name, you can query a special name like "version.bind" in the CHAOS class. These queries are always answered directly, so it simulates a 100% cache hit and allows you to measure the RTT towards a resolver.
>
> To test with dig:
>
> $ dig @9.9.9.9 CH version.bind TXT
>
> See https://atlas.ripe.net/measurements/9740262/ for a real measurement using this technique.
>
> Baptiste
Hi all,

this is all very interesting, and I'd appreciate it if one or more of you would share your findings with the rest of the community, for example by describing your method and conclusions in a RIPE Labs article.

Thanks,
Vesna

On 23/11/2017 00:54, Eduardo Duarte wrote:
> Hi!
>
> Thank you for the pointers Stephane and Baptiste! Already running new measures!
>
> Best regards, Eduardo Duarte
participants (8)
- Baptiste Jonglez
- Eduardo Duarte
- Enrico Ardizzoni
- Giovane C. M. Moura
- Jared Mauch
- Michael Meier
- Stephane Bortzmeyer
- Vesna Manojlovic