Probe doesn't seem to automatically recover from IPv6 connectivity failure
Hello. The probe (v2) doesn't seem to recover from all network failures automatically. It currently connects using IPv4 and reports IPv6 tests as "can't send". I can't ping the probe over IPv6, and it seems it doesn't talk IPv6 at all on that VLAN, although the router sends RAs every minute. The web interface says that it has an IPv6 address (everything else under IPv6 is "Undetermined/Unknown"), but it doesn't actually participate in the network at all. According to tcpdump, only the router talks IPv6 on that network. I think not recovering from IPv6 failures is a bug. Somehow, it still recovers from IPv4 failures. The problem appeared after I had it physically disconnected from Ethernet for some hours. Before that, the probe also had no connection to the internet for a day. I wish I didn't need to pull the power every time this happens, although it's a rare occasion that anything is down here. Thanks.
I have noticed the same. I have an IPv4- and IPv6-enabled router. Every time I reboot the router I have to restart the probe.
(In reply to Sulev-Madis Silber <madis555@hot.ee>, 16 May 2013, 23:32.)
Hello, me too: I have the same problem.
-- Laurent Wattieaux Gsm: 06 86 67 62 30 Profil Google+ <https://plus.google.com/103905979884595865030>
Thank you for letting us know, and also for speaking up to let us know this happens to more than one of you. We will look at this after the holiday weekend. In the meantime it would help to know the type of your routers and details about your IPv6 routing configuration.
Thank you again,
Daniel
(In reply to Laurent Wattieaux <laurent@laurent-wattieaux.com>, 17.05.2013, 18:32.)
I am using Apple's AirPort Extreme on a modem (FTTC) with native IPv4 and an IPv6 tunnel over tunnelbroker.net. Probe: 2142
(In reply to Daniel Karrenberg <daniel.karrenberg@ripe.net>, 18 May 2013, 10:45.)
Here it's FreeBSD doing RA and giving out a /64 for probe #4415. It's nicely separated in its own VLAN. Weird to think, a /64 only for a single device... :) What a F... waste of IPv6 space...
P.S.: There is a field in probe's config that says "type of router". Everyone should fill that field up, so no more asking about what kind of router this is you run.
(In reply to Olaf Maunz, 2013-05-18 12:53.)
P.S.: There is a field in probe's config that says "type of router". Everyone should fill that field up, so no more asking about what kind of router this is you run.
Indeed, but we can only ask hosts to fill that up, it's up to them to actually do that. Quick statistics: almost 50% of the currently connected probes have this field filled in somehow. Eyeballing the contents tells me it's mostly sensible data. Regards, Robert
Hello, My router is a D-Link DIR-825 (Hardware Version: B1 Firmware Version: 2.06EU) with IPv6 in IPv4 tunnel over SixXS (Autoconfiguration type : SLAAC+Stateless DHCPv6 and automatic IPv6 address assignment enabled). I am using SixXS DNS cache service for my IPv6. -- Laurent Wattieaux
What the hell is going on there? It's in the "can't send" state again. And I somehow remember I could ping it from outside of its /64; now I can't. It doesn't respond. However, it now answers ICMP pings (from inside its /64) and also talks NDP like it should.
Update. After pulling the plug, ICMP ping suddenly works from everywhere? WHAT? And the probe runs all tests fine again...
(In reply to Sulev-Madis Silber, 2013-05-20 06:33.)
Hi, We'll work on fixing this issue. Please, if someone has a reliable way of reproducing the condition, send it to atlas-bugs@ripe.net. Thank you!
If you power cycle the probe, then it restarts its state too, meaning DHCP leases, SLAAC, whatever. That works all the time, but it's not very convenient... and it should be fixed.
Cheers, Robert
(In reply to Sulev-Madis Silber, 2013.05.21. 7:37.)
Here it goes again?! Global address unpingable and probe is even marked as disconnected, maybe it will try to reconnect over IPv4 later. Nothing like this happened before. Something is broken there I'm afraid. It's all the same router and everything.
(In reply to Sulev-Madis Silber, 2013-05-20 06:33.)
Dear Silber,
maybe at your location the problem is a bit more frequent? The other place I was watching, Tore's probe, has been happy for 3 days. I am not sure that this time it is an address configuration error. If you look at the graphs of your probe #4415, I see a different pattern now. It is no longer the yellow "can't send" error. Compare the weekly graph for weeks 19-20 with now. Maybe it is some other issue?
So far, in my experience, red is not the IPv6-address-disappearing error. Only a series of good results and yellow is the sign.
For details, you could download the data and look at it more closely. For example, look at measurement ID 2001, pinging K-root. Now it is working fine. The last time, the error occurred for a few measurements and it seems to have recovered on its own. The timestamp of the error is "time":1369197942
Nevertheless, we are sure there is an IPv6 RA related problem that affects a few probes. We are trying to figure out a solution. However, the harder part is to reproduce it in a controlled environment where we have access, and to create a solution.
The weird thing about this bug is that there is one place with two probes on the same router, and only one of them occasionally runs into this problem.
regards, -antony
(In reply to Sulev-Madis Silber, Thu, May 23, 2013 at 05:09:27AM +0300.)
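For anyone comparing the quoted error timestamp with their own router logs, here is a minimal sketch that converts it to a readable date. It assumes the "time" field is Unix epoch seconds in UTC; the variable name is only illustrative.

```python
from datetime import datetime, timezone

# Timestamp quoted in the message above ("time":1369197942).
# Assumption: the value is Unix epoch seconds, interpreted as UTC.
error_time = 1369197942

utc = datetime.fromtimestamp(error_time, tz=timezone.utc)
print(utc.isoformat())  # 2013-05-22T04:45:42+00:00
```

That puts the error in the early morning (UTC) of 22 May 2013, which can be lined up against router reboots or link events on the host side.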
* Antony Antony
Nevertheless, we are sure there is an IPv6 RA related problem that affects a few probes.
Just to highlight a data point I've mentioned before: My probe did not even respond to pings to its link-local address. The link-local address is *not* dependent on, or initialised based on, IPv6 RAs; it is always present on all interfaces with an IPv6 stack active. For this reason I strongly suspect that the problem is found deeper down in the stack than ICMPv6 RA processing is. Best regards, Tore Anderson
On Thu, May 23, 2013 at 6:19 PM, Tore Anderson <tore@fud.no> wrote:
Just to highlight a data point I've mentioned before: My probe did not even respond to pings to its link local address. The link local address is *not* dependent on, or initialised based on IPv6 RAs; it is always present on all interfaces with an IPv6 stack active.
Did it fail DAD on the link-local address, then?
Hi,
A quick update. DAD seems a good theory to look into. Thanks for the tip, Lorenzo! I will try to reproduce the issue. I see an error message similar to the one reported here when two development probes are configured with the same link-local and the same global IPv6 address.
This idea has been around, but we never got around to verifying it thoroughly. We did not want to create loops in the RIPE NCC office network to verify it. So if it is true, somehow the probe triggers Duplicate Address Detection (DAD) and disables its own IPv6 address. Maybe due to a temporary layer-two loop? One puzzle is that in some cases the probe disables the address after a while (it can be hours later, say 6 hours after a reboot) and in other cases right after a reboot.
Is DAD continuously running in the Linux kernel, or only when configuring an IPv6 address? I will try to figure it out. I have heard of other annoyances due to DAD.
regards, -antony
(In reply to Lorenzo Colitti, Thu, May 23, 2013 at 09:39:16PM +0900.)
* Antony Antony
DAD seems a good theory to look into. Thanks for the tip, Lorenzo! I will try to reproduce the issue.
Agreed, failing DAD could explain the problem.
This idea has been around, but we never got around to verifying it thoroughly. We did not want to create loops in the RIPE NCC office network to verify it. So if it is true, somehow the probe triggers Duplicate Address Detection (DAD) and disables its own IPv6 address. Maybe due to a temporary layer-two loop? One puzzle is that in some cases the probe disables the address after a while (it can be hours later, say 6 hours after a reboot) and in other cases right after a reboot.
In my case, there are no physical loops in the network, but since it's connected straight to a CPE (which contains internal software bridges and switch modules and the like) it's certainly not impossible that some packets are being looped during initialisation/bootup. In my experience, CPEs do all sorts of crazy things...
Is DAD continuously running in the Linux kernel, or only when configuring an IPv6 address? I will try to figure it out. I have heard of other annoyances due to DAD.
DAD is run when an address is added to an interface, or when the interface transitions to the UP state, I believe. When completed (either successfully or unsuccessfully), it is never retried. As I understand it, anyway. (DAD can be a very nice DoS attack actually!) One theory could be that the CPE rebooted, which made the probe's uplink go DOWN/UP, triggering DAD. If the CPE was retransmitting the probe's own packets back to itself at this point, DAD could have failed, and the probe would be left without IPv6 connectivity. OTOH unplugging/replugging the probe didn't solve the problem for me, but perhaps once the address goes into a DAD-failed state, a link DOWN/UP event won't help either. Or maybe I was too quick, so that the probe didn't register the DOWN/UP events. Tore
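To check the DAD theory on a Linux machine you control (for example a test box reproducing the setup, not necessarily the probe itself), one option is to read the per-address flags the kernel exposes in /proc/net/if_inet6. The sketch below assumes the usual six-column layout of that file and the IFA_F_DADFAILED (0x08) and IFA_F_TENTATIVE (0x40) flag values from the kernel's if_addr.h; the function name dad_status is made up for illustration.

```python
# Flag values as defined in Linux's include/uapi/linux/if_addr.h.
IFA_F_DADFAILED = 0x08   # a duplicate was detected; the address is not usable
IFA_F_TENTATIVE = 0x40   # DAD has not completed yet

def dad_status(if_inet6_text):
    """Map (device, address-hex) to 'dadfailed', 'tentative' or 'ok'.

    Expects text in the /proc/net/if_inet6 layout:
    address ifindex prefixlen scope flags device (all but device in hex).
    """
    status = {}
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip anything that does not look like an entry
        addr_hex, _ifindex, _prefixlen, _scope, flags_hex, dev = fields
        flags = int(flags_hex, 16)
        if flags & IFA_F_DADFAILED:
            state = "dadfailed"
        elif flags & IFA_F_TENTATIVE:
            state = "tentative"
        else:
            state = "ok"
        status[(dev, addr_hex)] = state
    return status

if __name__ == "__main__":
    try:
        with open("/proc/net/if_inet6") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or IPv6 disabled
    for (dev, addr), state in dad_status(text).items():
        print(dev, addr, state)
```

An address stuck in "dadfailed" after a link DOWN/UP would be consistent with the theory above; an address that has vanished from the list entirely would point elsewhere.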
Just thoughts, and I can't really confirm them at the moment. I have the feeling that it depends on a simultaneous reboot of the probe and the router, and on some timed boot sequences. I cannot reproduce it at the moment, but when the probe's USB was plugged into the router and the entire router was unplugged and restarted, IPv6 failed. With the probe plugged into a different power source, no IPv6 problems occurred when the router was power cycled. Each device rebooting on its own works fine for me. The problem might arise when the probe reboots while the router is in an undefined state.
(In reply to Antony Antony <antony@ripe.net>, 23 May 2013, 09:56.)
* Daniel Karrenberg
Thank you for letting us know and also for speaking up to let us know this happens to more than one of you. We will look at this after the holiday weekend. In the meantime it would help to know the type of the routers and details about your IPv6 routing configuration.
FYI, in case you want to debug this problem on a live probe, it appears to be present on probe 121 at the moment. It's connected behind an AVM Fritzbox using SLAAC. No other devices connected to it have a similar problem.
The probe should have had the IPv6 address 2a02:fe0:cf16:70:220:4aff:fec6:cd7a (which actually shows up on the probe's web page), but it does not respond to anything: not to ICMPv6 NS from the local link to the aforementioned address, nor to fe80::220:4aff:fec6:cd7a. No responses from its link-local address are seen when pinging ff02::1 either. All the default measurements say "Cannot send".
If you have no use for this debugging opportunity, let me know and I'll reboot the probe and see if its IPv6 connectivity comes back then.
Tore
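The two addresses quoted for probe 121 share the same low 64 bits, which is what SLAAC with a modified EUI-64 interface identifier would produce. The sketch below checks that, recovers the MAC address such an identifier would imply, and computes the solicited-node multicast group a Neighbor Solicitation for the address would be sent to. The helper names are invented for illustration, and the recovered MAC is an inference from the address format, not something stated in the thread.

```python
import ipaddress

def interface_id(addr):
    """Return the low 64 bits (interface identifier) of an IPv6 address."""
    return int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)

def mac_from_eui64(iid):
    """Recover the MAC implied by a modified EUI-64 identifier, or None."""
    b = iid.to_bytes(8, "big")
    if b[3:5] != b"\xff\xfe":
        return None  # identifier was not built from a 48-bit MAC
    # Undo the universal/local bit flip applied to the first octet.
    mac = bytes([b[0] ^ 0x02]) + b[1:3] + b[5:8]
    return ":".join(f"{x:02x}" for x in mac)

def solicited_node(addr):
    """Solicited-node multicast address: ff02::1:ff00:0/104 plus low 24 bits."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return str(ipaddress.IPv6Address(base | low24))

global_addr = "2a02:fe0:cf16:70:220:4aff:fec6:cd7a"
link_local = "fe80::220:4aff:fec6:cd7a"

print(interface_id(global_addr) == interface_id(link_local))  # True
print(mac_from_eui64(interface_id(link_local)))               # 00:20:4a:c6:cd:7a
print(solicited_node(global_addr))                            # ff02::1:ffc6:cd7a
```

Since both addresses map to the same solicited-node group, a probe that has left that group (or whose stack has stopped answering) would go silent for NS to either address, which matches what is described above.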
Hi Tore,
Could you please unplug the Ethernet cable and plug it in again? I want to see if it repeats systematically. The last time, after a reboot, your probe had a global V6 address for about an hour (~64 minutes), and after that the probe unconfigured all V6 addresses.
Olaf Maunz, I now see IPv6 on your probe; V6 measurements are working again. Maybe it rebooted on its own. I will look into the details later.
Laurent Wattieaux, could you please let me know your probe ID? You could drop me a private mail if you prefer that.
We have seen this on a couple of probes; however, we have yet to figure out the details and a solution. We know that at the place where we saw the issue, it repeats. So it is good to have a few more cases reported. Maybe we can use your help to debug and solve the problem.
Thanks for all the reports. We will increase the priority of chasing this bug. So far the symptom is that, some time after a reboot, the probe unconfigures all V6 addresses. The case where we saw it most frequently was using a Fritz Box 7270, I think. There is no userland client for RA, so it is a bit hard to debug. We are trying to add code to report the RA announcements seen by the probe.
regards, -antony
(In reply to Tore Anderson, Sun, May 19, 2013 at 04:57:39PM +0200.)
* Antony Antony
Could you please unplug the Ethernet cable and plug it in again? I want to see if it repeats systematically. The last time, after a reboot, your probe had a global V6 address for about an hour (~64 minutes), and after that the probe unconfigured all V6 addresses.
Done. Note that I did not unplug the USB cable, so the probe remained powered all the time. Tore
Hi Tore,
Yes, I was hoping that just unplugging the Ethernet cable would fix it temporarily, so that we could reproduce it again. It seems that was not enough. So let's reboot: unplug the USB and replug it. If the problem reappears shortly after that, could you try moving to another V6 router? The logs show that the last time it worked for ~6 hours, and then the probe dropped its global V6 address. Do you have another V6 router where you can plug it in?
regards, -antony
(In reply to Tore Anderson, Tue, May 21, 2013 at 10:15:01AM +0200.)
* Antony Antony
Hi Tore, Yes, I was hoping that just unplugging the Ethernet cable would fix it temporarily, so that we could reproduce it again. It seems that was not enough. So let's reboot: unplug the USB and replug it. If the problem reappears shortly after that, could you try moving to another V6 router? The logs show that the last time it worked for ~6 hours, and then the probe dropped its global V6 address.
Not just the global one, as I mentioned earlier it had stopped responding to link-local pings as well. It seemed like the entire IPv6 stack was dead, to be honest. In any case, I rebooted it now and now it responds fine to IPv6 pings again, both to its global and link-local addresses.
Do you have another V6 router where you can plug it in
Not at home, but I can connect it straight to my ISP's cable modem via a L2 switch (not behind my HGW as it is today). However my ISP does not use SLAAC, only DHCPv6 IA_NA - I don't know if the Atlas probes have gotten support for this nowadays? Otherwise I could always bring the probe to work and connect it there. Tore
Participants (8): Antony Antony, Daniel Karrenberg, Laurent Wattieaux, Lorenzo Colitti, Olaf Maunz, Robert Kisteleki, Sulev-Madis Silber, Tore Anderson