Probe #1000165 showed as down in tab#status
Dear RIPE NCC crew,
Some diag below:
# systemctl status atlas
mars 02 11:12:43 melusine.eu.org ATLAS[32011]: Do a controller INIT
mars 02 11:12:43 melusine.eu.org ATLAS[32011]: Controller init -p 443 atlas@ctr-fsn01.atlas.ripe.net INIT
mars 02 11:12:43 melusine.eu.org ATLAS[32011]: 255 controller INIT exit with error
Probe seems to have been down since the scheduled machine reboot last night.
Help with fixing it is welcome,
Thanks,
Jacques
--
GnuPg : 156520BBC8F5B1E3
Because privacy matters.
« Quand est-ce qu'on mange ? » AD (c) (tm)
Probe is back. Sorry for the noise.
Have a nice day,
Jacques
Hi Jacques,
Thank you for reporting! There was network maintenance at Hetzner last night. A few days earlier, ctr-fsn01 had been reinstalled. Due to a minor network misconfiguration, the controller's IPv6 connectivity was not restored. The problem is now solved.
Best regards,
/vty
On 3/2/21 11:17 AM, Jacques Lavignotte wrote:
Dear RIPE NCC crew,
Some diag below :
# systemctl status atlas
mars 02 11:12:43 melusine.eu.org ATLAS[32011]: Do a controller INIT
mars 02 11:12:43 melusine.eu.org ATLAS[32011]: Controller init -p 443 atlas@ctr-fsn01.atlas.ripe.net INIT
mars 02 11:12:43 melusine.eu.org ATLAS[32011]: 255 controller INIT exit with error
Probe seems to have been down since the scheduled machine reboot last night.
Help with fixing it is welcome,
Thanks, Jacques
Hi,
On Tue, Mar 02, 2021 at 12:46:05PM +0100, Viktor Naumov wrote:
There was network maintenance at Hetzner last night. A few days earlier, ctr-fsn01 had been reinstalled. Due to a minor network misconfiguration, the controller's IPv6 connectivity was not restored.
I know people that have preached "if you do IPv6, *always* remember to monitor IPv4 *and* IPv6, all the time, for all services" for like 15+ years now...
And I've been told that you have the largest monitoring system ever at your disposal... :-)
Gert Doering
        -- NetMaster
--
have you enabled IPv6 on something today...?
SpaceNet AG                      Vorstand: Sebastian v. Bomhard, Michael Emmer
Joseph-Dollinger-Bogen 14        Aufsichtsratsvors.: A. Grundner-Culemann
D-80807 Muenchen                 HRB: 136055 (AG Muenchen)
Tel: +49 (0)89/32356-444         USt-IdNr.: DE813185279
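To illustrate the "monitor IPv4 *and* IPv6" advice above, here is a minimal Python sketch that tries a TCP connection to a service separately over each address family. It is only an assumption about how such a check could look, not how RIPE Atlas or anyone in this thread actually monitors their controllers; the function name `check_dual_stack` and its parameters are made up for this example.

```python
import socket


def check_dual_stack(host, port, timeout=3.0):
    """Try a TCP connect to host:port over IPv4 and over IPv6, separately.

    Returns a dict such as {"IPv4": (True, "connected"), "IPv6": (False, "<reason>")}.
    Hypothetical helper for illustration only.
    """
    results = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            # Resolve the host for this address family only.
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            results[label] = (False, f"no address: {exc}")
            continue

        ok = False
        last_error = "no addresses returned"
        for fam, socktype, proto, _canonname, sockaddr in infos:
            try:
                with socket.socket(fam, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                ok = True
                break
            except OSError as exc:
                # Remember why this address failed and try the next one.
                last_error = str(exc)
        results[label] = (ok, "connected" if ok else last_error)
    return results


# Example call (host and port taken from the log earlier in the thread):
# for family, (ok, detail) in check_dual_stack("ctr-fsn01.atlas.ripe.net", 443).items():
#     print(family, "OK" if ok else "FAILED", detail)
```

Checking each family on its own is the point: a service that answers over IPv4 can otherwise hide a broken IPv6 path, which is roughly what happened here.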
On 02/03/2021 at 12:46, Viktor Naumov wrote:
the controller's IPv6 connectivity was not restored.
For post-mortem analysis: probe #1000165 is hosted on an IPv4-only machine (no IPv6).
J.
Anyway: the probe has been OK for 5 hours, 42 minutes.
Hi,
On 2021-03-02 17:59, Jacques Lavignotte wrote:
On 02/03/2021 at 12:46, Viktor Naumov wrote:
the controller's IPv6 connectivity was not restored.
For post-mortem analysis:
Probe #1000165 is hosted on an IPv4-only machine (no IPv6).
Well, in this curious case the machine ("controller") itself thought it *should* have IPv6, which indeed was the expected situation. Yet it didn't. Therefore it was very confused about its own state and, as a safety measure, it told its probes "hang on a minute while I figure out my own situation" :-)
We designed the controller-probe protocol so that it can handle cases such as this; that is, the probes will try reconnecting, and in really bad cases the system drives them to a different controller. In the meantime they execute what they were asked and store results.
J.
Anyway: the probe has been OK for 5 hours, 42 minutes.
Good, good!
Cheers,
Robert
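The behaviour Robert describes (keep measuring and storing results while disconnected, retry the controller, and end up on a different controller if things stay bad) can be sketched roughly as below. This is a toy model under loud assumptions, not the actual RIPE Atlas protocol: the `ProbeSession` class, the `connect` and `send` callables, and the idea that the probe itself walks a list of controllers are all hypothetical. In the real system, as stated above, it is the system that drives probes to another controller.

```python
from collections import deque


class ProbeSession:
    """Toy model of a probe that buffers results while it has no controller.

    All names here are hypothetical; this is not the RIPE Atlas implementation.
    """

    def __init__(self, controllers, connect, max_attempts=3):
        self.controllers = list(controllers)  # candidate controllers, in order
        self.connect = connect                # callable: controller name -> bool
        self.max_attempts = max_attempts      # retries per controller
        self.buffer = deque()                 # results stored while offline
        self.current = None                   # controller we are connected to

    def record(self, result):
        """Store a measurement result, whether or not we are connected."""
        self.buffer.append(result)

    def try_connect(self):
        """Retry each controller a few times, then move on to the next one."""
        for name in self.controllers:
            for _ in range(self.max_attempts):
                # A real client would wait (back off) between attempts.
                if self.connect(name):
                    self.current = name
                    return name
        self.current = None
        return None

    def flush(self, send):
        """Send buffered results, oldest first, once a controller is available."""
        sent = 0
        while self.current is not None and self.buffer:
            send(self.current, self.buffer.popleft())
            sent += 1
        return sent
```

The design point is simply that measuring and reporting are decoupled: results queue up locally, so a controller outage delays delivery rather than losing data.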
So, whilst volunteering as NHS vaccination stewards, we are still awaiting the probe "up" notifications by email that we have asked for before, so that we can see them on our mobile devices.
Col
participants (5)
- Colin Johnston
- Gert Doering
- Jacques Lavignotte
- Robert Kisteleki
- Viktor Naumov