DNSmon "not indicative of what happens to normal traffic" claims the root ops
On 8 December 2015 at 16:07:02, Stephane Bortzmeyer (bortzmeyer@nic.fr) wrote:

http://root-servers.org/news/events-of-20151130.txt

Thanks Stéphane for your quick reaction, as usual :)

“Such test traffic may not be indicative of what happens to normal traffic or user experience”.

Not entirely sure what they mean by this, or is it just them trying to minimise the impact?

Cheers,
-- Nico
On 08/12/2015 16:16, Nico CARTRON wrote:
On 8 December 2015 at 16:07:02, Stephane Bortzmeyer (bortzmeyer@nic.fr <mailto:bortzmeyer@nic.fr>) wrote:
Thanks Stéphane for your quick reaction, as usual :)
“Such test traffic may not be indicative of what happens to normal traffic or user experience”.
Not entirely sure what they mean by this, or is it just them trying to minimise the impact?
It could be a bit of expectation management on their part. I agree with the statement that DNSMON and other similar tools do not provide direct insight into end user experience, but that is also not their goal.

DNSMON at least is deliberately designed to measure from stable vantage points (RIPE Atlas anchors, formerly TTM boxes), and makes no attempt to simulate how recursive resolvers and end user operating systems may behave. In fact, it avoids such attempts, even to the point that it never retries a failed query, which most clients would do. I would argue that this is a feature of the system, providing as it does a nice clear signal that functions as a good metric for traffic between recursive resolvers and the authoritative servers.

For reference, this is what DNSMON "saw" during the reported time period, which seems to correspond to the times in the report:

https://atlas.ripe.net/dnsmon/?dnsmon.session.color_range_pls=0-7-7-100-100&dnsmon.session.exclude-errors=true&dnsmon.type=zone-servers&dnsmon.zone=root&dnsmon.startTime=1448798400&dnsmon.endTime=1449021600&dnsmon.ipVersion=both&dnsmon.selectedRows=198.41.0.4,2001:503:ba3e::2:30,192.228.79.201,2001:500:84::b,192.33.4.12,2001:500:2::c,199.7.91.13,2001:500:2d::d,192.203.230.10,192.5.5.241,2001:500:2f::f,192.112.36.4,128.63.2.53,2001:500:1::803f:235,192.36.148.17,2001:7fe::53,192.58.128.30,2001:503:c27::2:30,193.0.14.129,2001:7fd::1,199.7.83.42,2001:500:3::42,202.12.27.33,2001:dc3::35&dnsmon.filterProbes=true&dnsmon.session.color_range_rtt=0-60-60-250-5000&dnsmon.session.color_range_relative-rtt=0-26-26-298-1000

Regards,
Chris Amin
RIPE NCC Developer
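Chris's point about DNSMON never retrying can be sketched with a toy simulation (purely illustrative, not DNSMON's actual code): a single-shot probe sees packet loss directly, while a typical retrying client hides it.

```python
import random

def probe_once(query):
    """DNSMON-style measurement: one query, no retry, so every lost
    packet shows up directly as a failed measurement."""
    return query()

def client_lookup(query, retries=3):
    """Typical stub-resolver behaviour: retry on failure, which hides
    transient loss from the end user."""
    for _ in range(retries):
        if query():
            return True
    return False

random.seed(42)
flaky_server = lambda: random.random() > 0.5  # simulated 50% packet loss

probe_ok = sum(probe_once(flaky_server) for _ in range(1000))
client_ok = sum(client_lookup(flaky_server) for _ in range(1000))

# The probe fails roughly half the time, while the retrying client
# succeeds almost always (1 - 0.5**3 = 87.5% expected per lookup).
assert probe_ok < client_ok
```

This is why a no-retry probe gives the "nice clear signal" Chris describes: the retries that make clients resilient also blur the measurement.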
Hi Chris,

On 8 December 2015 at 16:35:29, Chris Amin (camin@ripe.net) wrote:

It could be a bit of expectation management on their part. I agree with the statement that DNSMON and other similar tools do not provide direct insight into end user experience, but that is also not their goal. […]

Fully agreed, but I still don’t see why they are downplaying the event like this. Tools such as DNSMON are useful, and of course need to be taken with a grain of salt and not blindly believed “as is”. A lot of users/eyeballs have noticed problems, so pretending this did not happen is not really… constructive. (OK, they did not pretend this did not happen, but to me downplaying amounts to much the same thing.)

Cheers,
-- Nico
On 8.12.15 16:45 , Nico CARTRON wrote:
Fully agreed, but I still don’t see why they are downplaying the event like this.
I would call it "putting into perspective". The DNS kept working.
Tools such as DNSMON are useful, and of course need to be taken with a grain of salt and not blindly believed “as is”.
Again: dnsmon is *not* a tool to measure DNS service. You know that, I know that and most people on this list know that. Most of the people reading a public statement probably *do not* know that. So it needs to be pointed out. Otherwise we will get headlines of doom for no good reason.
A lot of users/eyeballs have noticed problems, so pretending this did not happen is not really… constructive.
Provide data please!
(OK, they did not pretend this did not happen, but downplaying is kind of the same to me).
See above.

Daniel
one of "them"
Fully agreed, but I still don’t see why they are downplaying the event like this.
as shumon tweeted, this was the intersection of what the root ops were willing to say. given that, i am surprised they said anything at all. i guess they had to say something, and this was about as little as could be said while still being 'something'; impressively content free.

an interesting measurement experiment (not atlas) might be to see the smallest number of PR departments needed to remove all useful content from a post-mortem. :)

randy
Hi Randy, On 9 December 2015 at 20:33:46, Randy Bush (randy@psg.com) wrote:
Fully agreed, but I still don’t see why they are downplaying the event like this.
as shumon tweeted, this was the intersection of what the root ops were willing to say. […]

Well, that was the point I was trying to make: when you put out a PR statement, people usually expect to learn something. In this case, I got the feeling that I learnt absolutely nothing, while the PR was kind of finger-pointing at tools such as Atlas, saying that they do not necessarily represent real life. While this is true (as pointed out by Daniel), I found it quite awkward…

Cheers,
-- Nico
On 8.12.15 16:16 , Nico CARTRON wrote:
On 8 December 2015 at 16:07:02, Stephane Bortzmeyer (bortzmeyer@nic.fr <mailto:bortzmeyer@nic.fr>) wrote:
Thanks Stéphane for your quick reaction, as usual :)
“Such test traffic may not be indicative of what happens to normal traffic or user experience”.
Not entirely sure what they mean by this, or is it just them trying to minimise the impact?
Real resolvers:
- use caching,
- retry queries,
- can use all authoritative servers for a zone,
- perform recursion, and (again)
- use caching.

The typical TTL for caching in the root zone is 48 hours. dnsmon does none of this; it measures the responsiveness of particular servers. This means dnsmon is a diagnostic tool for DNS root name server operators, not a diagnostic tool for the DNS service.

Daniel
inventor of dnsmon (the first version)
co-inventor of RIPE Atlas
advisor for k.root-servers.net operations (one of "them")
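The caching point can be made concrete with a toy TTL cache (hypothetical and purely illustrative): with a 48-hour TTL on root zone data, a real resolver answers referrals from cache for hours or days, so a short outage at the root servers is invisible to it.

```python
ROOT_TTL = 48 * 3600  # typical root zone TTL mentioned above, in seconds

class TTLCache:
    """Minimal TTL cache sketch; real resolver caches are more elaborate."""
    def __init__(self):
        self._store = {}

    def put(self, name, value, ttl, now):
        self._store[name] = (value, now + ttl)

    def get(self, name, now):
        entry = self._store.get(name)
        if entry and now < entry[1]:
            return entry[0]  # still fresh: no query to the root is needed
        return None

cache = TTLCache()
cache.put("com. NS", "a.gtld-servers.net.", ROOT_TTL, now=0)

# Hours into an attack on the root servers, a caching resolver still
# answers the referral from cache and never notices the outage.
assert cache.get("com. NS", now=5 * 3600) == "a.gtld-servers.net."

# Only once the 48-hour TTL expires would a fresh root query be required.
assert cache.get("com. NS", now=49 * 3600) is None
```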
On 2015/12/08 16:45 , Daniel Karrenberg wrote:
Real resolvers:
- use caching,
- retry queries,
- can use all authoritative servers for a zone,
- perform recursion, and (again)
- use caching.
The typical TTL for caching in the root zone is 48 hours.
I wonder if, in the case of local DNSSEC-validating resolvers behind DNSSEC-unaware resolvers in CPEs, this model is still valid.
On 8.12.15 16:56 , Philip Homburg wrote:
On 2015/12/08 16:45 , Daniel Karrenberg wrote:
Real resolvers:
- use caching,
- retry queries,
- can use all authoritative servers for a zone,
- perform recursion, and (again)
- use caching.
The typical TTL for caching in the root zone is 48 hours.
I wonder if, in the case of local DNSSEC-validating resolvers behind DNSSEC-unaware resolvers in CPEs, this model is still valid.
At the risk of turning this into another DNS discussion list: why are you wondering exactly? DNSSEC-validating resolvers do cache, don't they?

Daniel
whose CPE runs unbound
On 2015/12/08 17:08 , Daniel Karrenberg wrote:
I wonder if, in the case of local DNSSEC-validating resolvers behind DNSSEC-unaware resolvers in CPEs, this model is still valid.
At the risk of turning this into another DNS discussion list: Why are you wondering exactly? DNSSEC validating resolvers do cache, don't they?
To give an example, the ssh client I use is linked with getdns. Getdns will try to fetch RRSIG records, etc. from the local resolver. If that fails, getdns will become a full recursive resolver. When ssh starts, getdns's cache is empty, and after resolution completes, whatever was cached will never be used again.
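The fallback Philip describes can be sketched roughly as follows. This is a hypothetical illustration, not getdns's real API: the names `DNSSECUnsupported`, `stub_lookup` and `recurse_from_root` are invented for the example.

```python
class DNSSECUnsupported(Exception):
    """Raised when the upstream (e.g. a CPE resolver) cannot supply
    the RRSIG records a validating client asks for."""

def resolve(name, stub_lookup, recurse_from_root):
    try:
        # Fast path: ask the local resolver, which has a warm shared cache.
        return stub_lookup(name)
    except DNSSECUnsupported:
        # Fallback: act as a full recursive resolver. The cache starts
        # empty and dies with the process (e.g. when the ssh client
        # exits), so every such lookup walks down from the root again.
        cold_cache = {}
        return recurse_from_root(name, cold_cache)

# Simulated DNSSEC-unaware upstream, forcing the fallback path.
def broken_cpe(name):
    raise DNSSECUnsupported(name)

def full_recursion(name, cache):
    cache[name] = "192.0.2.1"  # pretend we walked down from the root
    return cache[name]

assert resolve("example.org", broken_cpe, full_recursion) == "192.0.2.1"
```

The point of the sketch is the `cold_cache` created per call: a per-process, per-lookup cache gives none of the TTL-based insulation Daniel describes.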
On 8.12.15 17:15 , Philip Homburg wrote:
... To give an example, the ssh client I use is linked with getdns. Getdns will try to fetch RRSIG records, etc. from the local resolver. If that fails, getdns will become a full recursive resolver.
When ssh starts, the cache of getdns will be empty. And after DNS resolution whatever is cached will not be used anymore.
Linking a full recursive resolver into each application is a poor engineering choice. A better choice is to run a caching resolver on the host system, which is quite practical and already helps. Better still is to run a caching resolver on the home router.

Making the poor choice will be noticeable in the responsiveness of the application. This will provide push-back. Unfortunately the perceived cause will appear to be the choice for DNSSEC, and not the poor engineering choice in implementing DNSSEC.

Personally I have chosen to implement DNSSEC by running unbound on my home router. I realise that this is not for everyone at this point in time. However, CPE vendors are free to make a choice like this for new equipment or upgrades of existing CPE software.

Daniel
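A setup along the lines Daniel describes might look like the following minimal unbound.conf fragment. This is a hypothetical sketch: the LAN address and prefix are assumptions, and a real deployment would need more hardening.

```
# Hypothetical minimal unbound.conf for a home router acting as the
# LAN's validating, caching resolver (addresses are assumptions).
server:
    interface: 192.168.1.1
    access-control: 192.168.1.0/24 allow
    # DNSSEC validation with an automatically maintained trust anchor
    auto-trust-anchor-file: "/var/lib/unbound/root.key"
    # Keep answers cached so short upstream outages are absorbed
    cache-max-ttl: 86400
```

With one shared cache on the router, every host on the LAN benefits from the TTL-based insulation discussed earlier, instead of each application starting cold.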
participants (6)
- Chris Amin
- Daniel Karrenberg
- Nico CARTRON
- Philip Homburg
- Randy Bush
- Stephane Bortzmeyer