For what it's worth, I've seen similar trouble over the last couple of days. I've only had my probe hooked up since the 2nd of July, and compared to some of yours it sits on a pretty basic local network. Unfortunately, both times it has gone down were two hours before our store opens for the day, so I haven't been here to see what's up. Of course, it's very possible that something on our end is going down without my knowing about it, so I'll keep an eye on it.

Ross Weseloh


On Thu, Jul 17, 2014 at 8:03 AM, Aftab A. Siddiqui <aftabs@cyber.net.pk> wrote:
Hi Wilfried,
At least your probes stayed online for many days at a time. Here is the
availability report of my V1 probe 0303: 99.71% availability.

+---------------------+---------------------+------------+--------------+
| Connected (UTC)     | Disconnected (UTC)  | Connected  | Disconnected |
+---------------------+---------------------+------------+--------------+
| 2014-05-29 23:46:12 | 2014-06-02 06:30:06 |   1d 06:30 |     0d 00:00 |
| 2014-06-02 06:40:35 | 2014-06-03 06:52:14 |   1d 00:11 |     0d 00:10 |
| 2014-06-03 06:59:53 | 2014-06-04 22:11:56 |   1d 15:12 |     0d 00:07 |
| 2014-06-04 22:22:43 | 2014-06-16 15:48:25 |  11d 17:25 |     0d 00:10 |
| 2014-06-16 15:59:17 | 2014-06-17 22:11:24 |   1d 06:12 |     0d 00:10 |
| 2014-06-17 22:22:53 | 2014-06-21 21:13:51 |   3d 22:50 |     0d 00:11 |
| 2014-06-21 21:41:35 | 2014-06-23 15:44:56 |   1d 18:03 |     0d 00:27 |
| 2014-06-23 15:54:55 | 2014-06-29 04:19:02 |   5d 12:24 |     0d 00:09 |
| 2014-06-29 04:53:22 | Still up            |   1d 19:06 |     0d 00:34 |
+---------------------+---------------------+------------+--------------+

It is directly connected to our core router. I was never able to correlate
any of the disconnection times with any network incident.

Best Wishes,

Aftab A. Siddiqui


-----Original Message-----
From: ripe-atlas-bounces@ripe.net [mailto:ripe-atlas-bounces@ripe.net] On
Behalf Of Wilfried Woeber
Sent: Thursday, July 17, 2014 5:49 PM
To: ripe-atlas@ripe.net
Subject: [atlas] some thoughts and question regrding probe "stability"


Hi Folks,

Triggered by the discussion related to DNSMON, and an issue (power,
resolved) with one of my V1 probes, I'd like to get some input or start a
discussion or an investigation.

To start with, I am not very clear what the term "stability" would/should
mean in this context, as the probes are supposed to buffer measurement data
locally, at least for a while (true?).

So, here goes...

Obviously, looking at some Atlas Stat pages, there are probes with a 100%
uptime.

Now, looking at the 3 under my supervision (2x V1, 1 Anchor), ref "Connected"
and "Disconnected", there's no chance to get near that value, as all of them
tend to topple over on a regular basis, mostly for a *short* period of time
in the range of 0m(!) to some 30+m.

With respect to the behaviour of the Anchor, which is mounted in the same
rack as the backbone router it connects to, in a Data Center, we tried to
correlate the (reported) disconnection events with the router and interface
logs for the probe. No luck there, and no maintenance work or the like
either, so I presume the Anchor didn't reboot and that there were no "real"
network problems.

Let's compare the most recent dis/connection logs for my 3 pets (columns:
connected at, time connected, disconnected at, time disconnected):

ID 6009
2014-07-14 03:58:03     3d 8h 16m        Still Connected
2014-05-27 03:03:54     48d 0h 46m       2014-07-14 03:50:47    0h 7m
2014-05-20 15:19:02     6d 11h 37m       2014-05-27 02:57:00    0h 6m
2014-05-14 21:16:56     5d 17h 59m       2014-05-20 15:16:22    0h 2m
2014-04-08 16:03:21     36d 5h 1m        2014-05-14 21:05:17    0h 11m

ID 0466
2014-07-13 23:31:05     3d 12h 45m       Still Connected
2014-07-09 23:05:40     3d 23h 54m       2014-07-13 22:59:49    0h 31m
2014-06-16 10:53:21     23d 11h 55m      2014-07-09 22:49:04    0h 16m
2014-05-25 09:03:06     22d 1h 38m       2014-06-16 10:42:00    0h 11m
2014-05-24 20:34:50     11h 54m          2014-05-25 08:29:12    0h 33m

ID 0414
2014-07-07 23:41:23     9d 12h 35m       Still Connected
2014-07-02 03:58:45     5d 19h 31m       2014-07-07 23:29:54    0h 11m
2014-06-13 09:37:50     18d 18h 7m       2014-07-02 03:45:08    0h 13m
2014-06-08 13:22:14     4d 20h 7m        2014-06-13 09:29:38    0h 8m
2014-05-21 08:29:23     18d 4h 45m       2014-06-08 13:15:11    0h 7m

Again, I fail to see any obvious correlation. What am I missing?

Does anyone else see a similar pattern?

How to start debugging, if there's anything that needs debugging?
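
One low-tech starting point, as a rough Python sketch rather than anything
official: parse the three listings above and check (a) whether different
probes dropped off at about the same moment, which would point away from the
local network, and (b) whether the drops cluster at a particular hour of the
day. The names parse_disconnects and coincident, the one-hour window, and the
assumption that all three listings use the same time zone are mine; nothing
here talks to the Atlas infrastructure.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations

# Rows copied verbatim from the listings above, keyed by probe ID.
LOGS = {
    "6009": """
2014-07-14 03:58:03     3d 8h 16m        Still Connected
2014-05-27 03:03:54     48d 0h 46m       2014-07-14 03:50:47    0h 7m
2014-05-20 15:19:02     6d 11h 37m       2014-05-27 02:57:00    0h 6m
2014-05-14 21:16:56     5d 17h 59m       2014-05-20 15:16:22    0h 2m
2014-04-08 16:03:21     36d 5h 1m        2014-05-14 21:05:17    0h 11m
""",
    "0466": """
2014-07-13 23:31:05     3d 12h 45m       Still Connected
2014-07-09 23:05:40     3d 23h 54m       2014-07-13 22:59:49    0h 31m
2014-06-16 10:53:21     23d 11h 55m      2014-07-09 22:49:04    0h 16m
2014-05-25 09:03:06     22d 1h 38m       2014-06-16 10:42:00    0h 11m
2014-05-24 20:34:50     11h 54m          2014-05-25 08:29:12    0h 33m
""",
    "0414": """
2014-07-07 23:41:23     9d 12h 35m       Still Connected
2014-07-02 03:58:45     5d 19h 31m       2014-07-07 23:29:54    0h 11m
2014-06-13 09:37:50     18d 18h 7m       2014-07-02 03:45:08    0h 13m
2014-06-08 13:22:14     4d 20h 7m        2014-06-13 09:29:38    0h 8m
2014-05-21 08:29:23     18d 4h 45m       2014-06-08 13:15:11    0h 7m
""",
}

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
# connected-at, connected-for (ignored), disconnected-at, outage as "Nh Nm"
ROW = re.compile(rf"^({TS})\s+.+?\s+({TS})\s+(\d+)h (\d+)m$")


def parse_disconnects(text):
    """Return (disconnected_at, outage_length) for every completed session."""
    events = []
    for line in text.strip().splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # "Still Connected" rows have no disconnect time yet
        down = datetime.strptime(m.group(2), "%Y-%m-%d %H:%M:%S")
        outage = timedelta(hours=int(m.group(3)), minutes=int(m.group(4)))
        events.append((down, outage))
    return events


def coincident(events_a, events_b, window):
    """Disconnect times of two probes that lie within `window` of each other."""
    return [(a, b) for a, _ in events_a for b, _ in events_b
            if abs(a - b) <= window]


WINDOW = timedelta(hours=1)  # arbitrary choice; widen or narrow as needed
parsed = {probe: parse_disconnects(text) for probe, text in LOGS.items()}

for a, b in combinations(parsed, 2):
    hits = coincident(parsed[a], parsed[b], WINDOW)
    print(f"{a} vs {b}: {len(hits)} disconnect(s) within {WINDOW} of each other")

# Hour-of-day histogram over all probes, to spot e.g. a nightly pattern.
by_hour = Counter(down.hour for events in parsed.values() for down, _ in events)
for hour in sorted(by_hour):
    print(f"{hour:02d}h: {by_hour[hour]}")
```

With only the twelve events listed here, no two probes dropped within an
hour of each other, so a longer history would be needed before reading
anything into either result.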

Thanks for your ideas and help!
Wilfried