Also interesting: This event is pretty clearly visible in the "probe up count" graph:
<http://atlas.ripe.net/dynamic/stats/stats.goal.png>

On Dec 17, 2010, at 6:09 AM, Robert Kisteleki wrote:
Dear All,
Here's an update to Daniel's message from yesterday.
As Daniel mentioned, on Wednesday evening our system started to migrate the probes away from a particular controller (ronin, in DE). We have a strong suspicion about why this happened, but it's not confirmed, so I'm not going to publicly speculate :-) In any case, since we don't yet have enough spare capacity to handle this situation, another controller became overloaded.
We needed to fix the internal databases on these controllers, which took some time. We were able to bring the system back to a stable state by the afternoon.
This morning we revived some probes (25 or so) which were in limbo -- they were not properly connected. We forced them to reconnect, so they are fine now. About 10 probes are still not connected (down), and we can't really help those from here. If your probe was working properly before Wednesday but is down now, please power cycle it (using the USB power) and it will very likely come back fine.
Probes in the US (and, very likely, Asia) were not affected, as they have a local controller on the west coast, which was not involved. It stayed out of this because the system really doesn't like to send European probes to it: it's too far away.
Let us know if there's anything else not working properly, so that we can look into it.
Regards,
Robert
On 2010.12.16. 14:57, Daniel Karrenberg wrote:
Intermediate update to keep those interested informed. I am writing this to keep the engineers free to work the problem. I do not know the nitty-gritty details, so this is a general overview.
[...]