DNS probes: spurious SERVFAIL
Dear list,
while troubleshooting DNS issues on the Google network yesterday [1], we found that RIPE Atlas probes did not recover after Google fixed the issue: we kept seeing SERVFAIL answers on the RIPE Atlas probes. We now know that those Atlas results are bogus.
For example:
- google.us against 8.8.4.4: https://atlas.ripe.net/measurements/30456676/#probes
- google.us against 1.1.1.1: https://atlas.ripe.net/measurements/30464549/#probes
- zoom.us against 8.8.8.8: https://atlas.ripe.net/measurements/30464774/#probes
- zoom.us against 1.1.1.1: https://atlas.ripe.net/measurements/30464777/#probes
- amazon.com on quad9: https://atlas.ripe.net/measurements/30479180/#probes
- amazon.com on quad1: https://atlas.ripe.net/measurements/30479179/#probes
Even amazon.com on quad9 and quad1 gets numerous SERVFAIL answers.
To reproduce, just try to resolve zoom.us or google.us against 8.8.8.8 or 1.1.1.1 on your probe.
Any recommendations on how we could troubleshoot this?
[1] https://puck.nether.net/pipermail/outages/2021-May/013648.html
On Sun, May 30, 2021 at 02:27:18PM +0200, Lukas Tribus <lukas@ltri.eu> wrote a message of 41 lines which said:
while troubleshooting DNS issues on the Google network yesterday [1] we found that RIPE Atlas probes did not recover after Google fixed the issue, we kept seeing SERVFAIL answers on the RIPE Atlas probes. ... To reproduce, just try to resolve zoom.us or google.us against 8.8.8.8 or 1.1.1.1 on your probe.
It works, and I don't see a single SERVFAIL:
% blaeu-resolve --requested 100 --area North-Central --nameserver 8.8.8.8 --type A google.us
...
Test #30484765 done at 2021-05-30T12:34:54Z
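For comparison, a one-off measurement similar to the blaeu-resolve run above can also be requested through the RIPE Atlas REST API, where the "Recursion Desired" flag is an explicit field rather than a UI default. The sketch below only builds the JSON request body; it assumes the v2 measurement-creation field names (`set_rd_bit`, `query_argument`, `use_probe_resolver`, area-based probe selection), which should be checked against the API reference before use. Submitting it (an authenticated POST) is not shown, and the function name is hypothetical.

```python
import json


def dns_oneoff_definition(target: str, name: str, area: str, requested: int, rd: bool = True) -> dict:
    """Request body for a one-off DNS measurement (field names assumed, see above)."""
    return {
        "definitions": [
            {
                "type": "dns",
                "af": 4,
                "target": target,  # resolver to query, e.g. 8.8.8.8
                "query_class": "IN",
                "query_type": "A",
                "query_argument": name,
                "set_rd_bit": rd,  # the "Recursion Desired" flag
                "use_probe_resolver": False,  # query the target, not the probe's own resolver
                "description": f"{name} A via {target}",
            }
        ],
        "probes": [{"type": "area", "value": area, "requested": requested}],
        "is_oneoff": True,
    }


if __name__ == "__main__":
    body = dns_oneoff_definition("8.8.8.8", "google.us", "North-Central", 100)
    print(json.dumps(body, indent=2))
```

Keeping `rd` as a parameter makes it easy to create the same measurement twice, once with and once without the flag, and compare the answers.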
Hello,
On Sun, 30 May 2021 at 14:39, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:
To reproduce, just try to resolve zoom.us or google.us against 8.8.8.8 or 1.1.1.1 on your probe.
It works and I don't see one SERVFAIL:
% blaeu-resolve --requested 100 --area North-Central --nameserver 8.8.8.8 --type A google.us
...
Test #30484765 done at 2021-05-30T12:34:54Z
That's a major difference from what I'm seeing, and I don't understand why the difference is so large.
Just now: 1000 probes, area North-Central, A record for google.us against 8.8.8.8:
More than 90% of the answers are SERVFAIL (the rest are probably not in yet, or timed out). https://atlas.ripe.net/measurements/30484987/#probes
Only 12 (twelve) probes returned NOERROR.
Lukas
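Counting answer codes across a large measurement, as done by hand here, can be scripted over the downloaded results. A rough sketch, assuming each result object carries the base64-encoded DNS response in `result.abuf` and that timeouts appear without it (for example as an `error` object); both are assumptions about the result format. The RCODE itself is the low four bits of the fourth header byte (RFC 1035). `header_only_response` is a hypothetical helper used only to fabricate test input.

```python
import base64
import struct
from collections import Counter

# A few RCODE values from RFC 1035; anything else is reported numerically.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def rcode_from_abuf(abuf_b64: str) -> str:
    """Decode a base64 DNS response and return the name of its RCODE."""
    wire = base64.b64decode(abuf_b64)
    if len(wire) < 12:  # a DNS header is 12 bytes
        return "MALFORMED"
    rcode = wire[3] & 0x0F  # RCODE = low 4 bits of the second flags byte
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def tally(results: list) -> Counter:
    """Count RCODEs over Atlas-style result objects (assumed shape, see above)."""
    counts = Counter()
    for r in results:
        abuf = r.get("result", {}).get("abuf")
        if abuf is None:
            counts["NO_ANSWER"] += 1  # e.g. {"error": {"timeout": ...}}
        else:
            counts[rcode_from_abuf(abuf)] += 1
    return counts


def header_only_response(flags: int) -> str:
    """Hypothetical test helper: a bare 12-byte DNS header, base64-encoded."""
    return base64.b64encode(struct.pack("!6H", 0x1234, flags, 0, 0, 0, 0)).decode()


if __name__ == "__main__":
    sample = [
        {"result": {"abuf": header_only_response(0x8182)}},  # QR|RD|RA, RCODE=2
        {"result": {"abuf": header_only_response(0x8180)}},  # QR|RD|RA, RCODE=0
        {"error": {"timeout": 5000}},
    ]
    counts = tally(sample)
    total = sum(counts.values())
    print(counts, f"SERVFAIL share: {counts['SERVFAIL'] / total:.0%}")
```

Reading the RCODE from the raw answer, rather than from any pre-parsed summary, keeps the count independent of how a particular client library interprets the response.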
Lukas, I just tried as well, and I see no issues:
sanjeev@T450s-disco:~$ blaeu-resolve --requested 100 --area North-Central --nameserver 8.8.8.8 --type A google.us
Nameserver 8.8.8.8
[172.217.17.100] : 1 occurrences
[142.250.75.228] : 1 occurrences
[216.58.209.68] : 1 occurrences
[142.250.184.4] : 3 occurrences
[142.250.179.100] : 1 occurrences
[142.250.186.68] : 1 occurrences
[142.250.179.196] : 5 occurrences
[172.217.16.132] : 2 occurrences
[142.250.184.36] : 3 occurrences
[142.250.185.164] : 4 occurrences
[172.217.16.68] : 2 occurrences
[172.217.218.103 172.217.218.104 172.217.218.105 172.217.218.106 172.217.218.147 172.217.218.99] : 1 occurrences
[142.250.186.36] : 2 occurrences
[216.58.209.36] : 1 occurrences
[216.58.215.164] : 1 occurrences
[172.217.16.36] : 1 occurrences
[142.250.187.164] : 2 occurrences
[172.217.168.196] : 1 occurrences
[74.125.193.103 74.125.193.104 74.125.193.105 74.125.193.106 74.125.193.147 74.125.193.99] : 1 occurrences
[142.250.180.4] : 1 occurrences
[172.217.18.68] : 1 occurrences
[172.217.18.196] : 1 occurrences
[172.217.23.228] : 2 occurrences
[173.194.222.103 173.194.222.104 173.194.222.105 173.194.222.106 173.194.222.147 173.194.222.99] : 1 occurrences
[142.250.150.103 142.250.150.104 142.250.150.105 142.250.150.106 142.250.150.147 142.250.150.99] : 1 occurrences
[216.58.207.196] : 1 occurrences
[142.250.185.132] : 1 occurrences
[172.217.168.36] : 3 occurrences
[216.58.198.36] : 1 occurrences
[216.58.208.132] : 1 occurrences
[172.217.23.196] : 2 occurrences
[142.250.179.132] : 2 occurrences
[172.217.20.4] : 1 occurrences
[142.250.187.196] : 1 occurrences
[172.217.168.228] : 1 occurrences
[216.58.215.228] : 1 occurrences
[142.250.74.4] : 1 occurrences
[216.58.212.100] : 1 occurrences
[142.250.74.196] : 1 occurrences
[TIMEOUT] : 2 occurrences
[142.250.185.196] : 1 occurrences
[142.250.200.36] : 2 occurrences
[216.58.207.164] : 2 occurrences
[216.58.204.100] : 1 occurrences
[142.250.181.196] : 1 occurrences
[216.58.198.4] : 1 occurrences
[142.250.200.132] : 1 occurrences
[142.250.180.164] : 2 occurrences
[216.58.209.4] : 1 occurrences
[142.250.180.228] : 1 occurrences
[216.58.214.68] : 1 occurrences
[142.250.74.132] : 1 occurrences
[216.58.213.132] : 1 occurrences
[172.217.18.100] : 1 occurrences
[172.217.16.228] : 2 occurrences
[142.250.184.132] : 2 occurrences
[172.217.23.36] : 1 occurrences
[74.125.131.103 74.125.131.104 74.125.131.105 74.125.131.106 74.125.131.147 74.125.131.99] : 1 occurrences
[142.250.74.100] : 1 occurrences
[142.250.184.164] : 1 occurrences
[216.58.211.4] : 1 occurrences
[216.58.215.132] : 1 occurrences
[216.58.208.100] : 1 occurrences
[142.250.185.68] : 1 occurrences
[172.217.22.132] : 1 occurrences
[216.58.214.196] : 1 occurrences
[216.58.207.228] : 1 occurrences
[172.217.169.164] : 1 occurrences
[142.250.185.228] : 1 occurrences
[108.177.14.103 108.177.14.104 108.177.14.105 108.177.14.106 108.177.14.147 108.177.14.99] : 1 occurrences
[172.217.168.4] : 1 occurrences
[216.58.209.164] : 1 occurrences
Test #30485569 done at 2021-05-30T13:06:25Z
sanjeev@T450s-disco:~$ echo $?
0
https://atlas.ripe.net/measurements/30485569/#probes
-- 
Sanjeev Gupta
+65 98551208
http://www.linkedin.com/in/ghane
On Sun, May 30, 2021 at 8:50 PM Lukas Tribus <lukas@ltri.eu> wrote:
Only 12 (twelve) probes returned NOERROR.
Lukas
On Sun, 30 May 2021 at 14:49, Lukas Tribus <lukas@ltri.eu> wrote:
Only 12 (twelve) probes returned NOERROR.
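Here's the cause: by default, the Atlas UI for creating measurements has the "Recursion Desired" (RD) flag disabled, so the probes were sending non-recursive queries to the public resolvers, which answered most of them with SERVFAIL. The flag is buried under the advanced options, as if it were some optional flag. blaeu-resolve sets it by default.
Thanks,
Lukas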
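To make the difference concrete: RD is a single bit in the 16-bit flags field of the DNS header (RFC 1035, section 4.1.1). The sketch below builds a plain A query for google.us with and without that bit, and shows that only the flags field differs. It is illustrative only: it sends nothing and adds no EDNS, and the function names are hypothetical.

```python
import struct

RD_FLAG = 0x0100  # Recursion Desired: bit 8 of the 16-bit flags word (RFC 1035, 4.1.1)


def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name: str, qtype: int = 1, recursion_desired: bool = True, qid: int = 0x1234) -> bytes:
    """Build a minimal standard query (QTYPE 1 = A, class IN), without EDNS."""
    flags = RD_FLAG if recursion_desired else 0  # QR=0, OPCODE=0 (standard query)
    header = struct.pack("!6H", qid, flags, 1, 0, 0, 0)  # QDCOUNT=1, other counts 0
    question = encode_qname(name) + struct.pack("!2H", qtype, 1)  # QTYPE, QCLASS=1 (IN)
    return header + question


if __name__ == "__main__":
    with_rd = build_query("google.us")
    without_rd = build_query("google.us", recursion_desired=False)
    # Bytes 2-3 are the flags field; everything else is identical.
    print(with_rd[2:4].hex(), without_rd[2:4].hex())  # -> 0100 0000
    assert with_rd[:2] + with_rd[4:] == without_rd[:2] + without_rd[4:]
```

From the command line, `dig +norecurse @8.8.8.8 google.us A` sends the same kind of query with RD cleared, which may be a quick way to compare a resolver's behaviour with and without the flag.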
Participants (3):
- Lukas Tribus
- Sanjeev Gupta
- Stephane Bortzmeyer