[atlas] FYI probe problems after power failure
Today I had a power failure, and luckily everything came back up when I fixed it. However, a while later, I wondered if the probe had too, and it had not. The status page said it was up, but it hadn't reported data for about 6 hours (grey graphs).

I assume I hit one of the problems in the FAQ:

---
Technical background: the probe tries to get current time indication using NTP upon booting, which sometimes fails. This causes the probe to think the current date is November 1999, which confuses it a lot; it doesn't register properly or it sends measurement data for the twentieth century which is immediately discarded by the system. We plan to convince the probe to insist on valid NTP or acquire some other time indication. The fix will be deployed in a firmware update.
---

Since the probe boots a lot faster than my gateway, I guess it never retried NTP. Powercycle fixed it indeed.

So the lesson here is (at least until there is an update that retries ntpdate, or something): if the machine that provides your Internet connection takes a while to boot up, and both are powered on at the same time, you may need to powercycle your probe.

Jelte
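For illustration only, the "retry until the time is valid" idea mentioned above could look roughly like the Python sketch below. This is not the probe's firmware and not the planned fix. The cutoff date, the function names, and the `query_time` callable (a stand-in for whatever actually asks an NTP server) are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

# Assumption: anything earlier than this counts as "clock not set".
# The FAQ only says the probe ends up in November 1999 when NTP fails.
EARLIEST_PLAUSIBLE = datetime(2010, 1, 1, tzinfo=timezone.utc)


def clock_is_plausible(now: datetime,
                       earliest: datetime = EARLIEST_PLAUSIBLE) -> bool:
    """Return True if `now` is not earlier than the cutoff."""
    return now >= earliest


def wait_for_valid_time(
    query_time: Callable[[], Optional[datetime]],
    sleep: Callable[[float], None],
    max_attempts: int = 8,
    base_delay: float = 5.0,
    max_delay: float = 300.0,
) -> Optional[datetime]:
    """Retry a time query with exponential backoff.

    `query_time` is a hypothetical stand-in for an NTP query; it should
    return None when the query fails (e.g. the gateway is still booting).
    Returns the first plausible timestamp, or None if every attempt fails.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        result = query_time()
        if result is not None and clock_is_plausible(result):
            return result
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

In real use `sleep` would be `time.sleep`; passing it in as a parameter just makes the retry logic easy to exercise without waiting. With something like this, a probe that boots before its gateway would keep asking until the uplink is there instead of settling for 1999.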
On 12/02/2010 11:28 PM, Jelte Jansen wrote:
> Since the probe boots a lot faster than my gateway, I guess it never retried NTP. Powercycle fixed it indeed.
>
> So the lesson here is (at least until there is an update that retries ntpdate, or something), if the machine that provides your Internet connection takes a while to boot up, and they are both powered on at the same time, you may need to powercycle your probe.
FWIW, we deployed our probe in our datacenter on a Friday, and due to a typo in the firewall config the probe didn't receive any responses from the Internet. We fixed the firewall and the probe started doing its thing (pinging, connecting to RIPE's infrastructure etc.). So we did not powercycle the probe, as it appeared to be working.

Later we noticed the grey bars and were told by Antony that we had hit the NTP problem. We didn't want to send someone to the datacenter on the weekend and planned to powercycle the probe the following Monday. Just before we unplugged the probe we had another look at the graphs, and they were fine.

It appears the probe will eventually do another NTP request, maybe every 24 hours? It looks like if it's really, really a PITA to get to the probe, you can just wait a day or so and it will fix itself. (Btw. the probe still has the "original" 3830 firmware, so I don't think it has a fix for the NTP problem.)
regards,
florian

--
I remember yesterday, but the memory is in my head now. Was yesterday real? Or is it only the memory that is real?
participants (2): Florian Obser, Jelte Jansen