Re: [atlas]Probe flapping

17 Dec 2010

Also interesting: This event is pretty clearly visible in the "probe up count" graph:
<http://atlas.ripe.net/dynamic/stats/stats.goal.png>

On Dec 17, 2010, at 6:09 AM, Robert Kisteleki wrote:
...
Dear All,
Here's an update to Daniel's message from yesterday.
As Daniel mentioned, on Wednesday evening our system started to migrate the
probes away from a particular controller (ronin, in DE). We have a strong
suspicion about why this happened, but it's not confirmed, so I'm not going to
publicly speculate :-) In any case, since we don't yet have enough spare
capacity to handle this situation, another controller was overloaded.
We needed to fix the internal databases on these controllers, which took
some time. We were able to bring the system back to a stable state by the
afternoon.
This morning we revived some probes (25 or so) which were in limbo -- they
were not properly connected. We forced them to reconnect, so they are fine
now. About 10 probes are still not connected (down), and we can't really
help those from here. If your probe was working properly before Wednesday
but is now down, please power cycle it (using the USB power) and it will
very likely come back fine.
Probes in the US (and Asia, very likely) were not affected, as they have a
local controller on the West Coast, which was not involved. That's because
the system really doesn't like to send European probes to it; it's too far away.
Let us know if there's anything else not working properly, so that we can
look into it.
Regards,
Robert
On 2010.12.16. 14:57, Daniel Karrenberg wrote:
...
Intermediate update to keep those interested informed.
I am writing this to keep the engineers free to work
the problem. I do not know the nitty-gritty details, so
this is a general overview.
[...]

Richard L. Barnes