some thoughts and questions regarding probe "stability"
Hi Folks,

triggered by the discussion related to DNSMON, and an issue (power, resolved) with one of my V1 probes, I'd like to get some input or start a discussion or an investigation.

To start with, I am not very clear on what the term "stability" would/should mean in this context, as the probes are supposed to buffer measurement data locally, at least for a while (true?).

So, here goes...

Obviously, looking at some Atlas Stat pages, there are probes with 100% uptime.

Now, looking at the 3 under my supervision (2x V1, 1 Anchor), with reference to "Connected" and "Disconnected", there's no chance to get near that value, as all of them tend to topple over on a regular basis, mostly for a *short* period of time, in the range of 0m(!) to some 30+m.

With respect to the behaviour of the Anchor, which is mounted in the same rack as the backbone router it connects to, in a data center, we tried to correlate the (reported) disconnection events with the router and interface logs for the probe. No luck there; also, no maintenance works or the like, so I presume the Anchor didn't reboot and that there were no "real" network problems.

Let's compare the most recent dis/connection logs for my 3 pets:

ID 6009
  Connected at          Up for        Disconnected at       Down for
  2014-07-14 03:58:03   3d 8h 16m     Still Connected
  2014-05-27 03:03:54   48d 0h 46m    2014-07-14 03:50:47   0h 7m
  2014-05-20 15:19:02   6d 11h 37m    2014-05-27 02:57:00   0h 6m
  2014-05-14 21:16:56   5d 17h 59m    2014-05-20 15:16:22   0h 2m
  2014-04-08 16:03:21   36d 5h 1m     2014-05-14 21:05:17   0h 11m

ID 0466
  Connected at          Up for        Disconnected at       Down for
  2014-07-13 23:31:05   3d 12h 45m    Still Connected
  2014-07-09 23:05:40   3d 23h 54m    2014-07-13 22:59:49   0h 31m
  2014-06-16 10:53:21   23d 11h 55m   2014-07-09 22:49:04   0h 16m
  2014-05-25 09:03:06   22d 1h 38m    2014-06-16 10:42:00   0h 11m
  2014-05-24 20:34:50   11h 54m       2014-05-25 08:29:12   0h 33m

ID 0414
  Connected at          Up for        Disconnected at       Down for
  2014-07-07 23:41:23   9d 12h 35m    Still Connected
  2014-07-02 03:58:45   5d 19h 31m    2014-07-07 23:29:54   0h 11m
  2014-06-13 09:37:50   18d 18h 7m    2014-07-02 03:45:08   0h 13m
  2014-06-08 13:22:14   4d 20h 7m     2014-06-13 09:29:38   0h 8m
  2014-05-21 08:29:23   18d 4h 45m    2014-06-08 13:15:11   0h 7m

Again, I fail to see some obvious correlation; what am I missing?

Does anyone else see a similar pattern?

How does one start debugging, if there's anything that needs debugging?

Thanks for your ideas and help!
Wilfried
Hi Wilfried,

At least your probes were online for many days at a stretch. Here is the availability report of my V1 probe 0303: 99.71% availability.

+---------------------+---------------------+------------+--------------+
| Connected (UTC)     | Disconnected (UTC)  | Connected  | Disconnected |
+---------------------+---------------------+------------+--------------+
| 2014-05-29 23:46:12 | 2014-06-02 06:30:06 | 1d 06:30   | 0d 00:00     |
| 2014-06-02 06:40:35 | 2014-06-03 06:52:14 | 1d 00:11   | 0d 00:10     |
| 2014-06-03 06:59:53 | 2014-06-04 22:11:56 | 1d 15:12   | 0d 00:07     |
| 2014-06-04 22:22:43 | 2014-06-16 15:48:25 | 11d 17:25  | 0d 00:10     |
| 2014-06-16 15:59:17 | 2014-06-17 22:11:24 | 1d 06:12   | 0d 00:10     |
| 2014-06-17 22:22:53 | 2014-06-21 21:13:51 | 3d 22:50   | 0d 00:11     |
| 2014-06-21 21:41:35 | 2014-06-23 15:44:56 | 1d 18:03   | 0d 00:27     |
| 2014-06-23 15:54:55 | 2014-06-29 04:19:02 | 5d 12:24   | 0d 00:09     |
| 2014-06-29 04:53:22 | Still up            | 1d 19:06   | 0d 00:34     |
+---------------------+---------------------+------------+--------------+

It is directly connected to our core router. I was never able to correlate any of the disconnection times with any network incident.

Best Wishes,
Aftab A. Siddiqui
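An availability figure like the 99.71% above can be recomputed from such a table. Here is a minimal Python sketch; the session list is transcribed by hand from probe 0303's report, the function name and input format are invented for illustration (this is not an Atlas tool), and the exact percentage depends on the reference window chosen:

    from datetime import datetime

    FMT = "%Y-%m-%d %H:%M:%S"

    def availability(sessions, now):
        """Percent of time connected, from (connect, disconnect) pairs.
        Use None as the disconnect time of a still-open session."""
        connected = 0.0
        start = datetime.strptime(sessions[0][0], FMT)
        for conn, disc in sessions:
            t0 = datetime.strptime(conn, FMT)
            t1 = datetime.strptime(disc, FMT) if disc else now
            connected += (t1 - t0).total_seconds()
        return 100.0 * connected / (now - start).total_seconds()

    # Sessions for probe 0303, transcribed from the table above.
    sessions = [
        ("2014-05-29 23:46:12", "2014-06-02 06:30:06"),
        ("2014-06-02 06:40:35", "2014-06-03 06:52:14"),
        ("2014-06-03 06:59:53", "2014-06-04 22:11:56"),
        ("2014-06-04 22:22:43", "2014-06-16 15:48:25"),
        ("2014-06-16 15:59:17", "2014-06-17 22:11:24"),
        ("2014-06-17 22:22:53", "2014-06-21 21:13:51"),
        ("2014-06-21 21:41:35", "2014-06-23 15:44:56"),
        ("2014-06-23 15:54:55", "2014-06-29 04:19:02"),
        ("2014-06-29 04:53:22", None),  # still up at report time
    ]
    print("%.2f%% availability" % availability(
        sessions, datetime(2014, 6, 30, 23, 59, 22)))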
For what it's worth, I've seen similar trouble over the last couple of days. I've only had my probe hooked up since the 2nd of July, and compared to some of you it's a pretty basic local network. Unfortunately, the 2 times it's gone down have been two hours before our store opens for the day, so I haven't been here to see what's up. Of course it's very possible that something on our end is just going down and I don't know about it, so I'll keep an eye on it.

Ross Weseloh
Hi,

On 17 Jul 2014, at 14:48, Wilfried Woeber <Woeber@CC.UniVie.ac.at> wrote:
Let's compare the most recent dis/connection logs for my 3 pets:
I see something similar, where I have two probes connected at home, to the same switch, behind the same DSL connection. These probes are from different generations, though.

ID 3144
  Connected at          Up for        Disconnected at       Down for
  2014-07-16 10:22:03   1d 3h 38m     Still Connected
  2014-07-16 08:48:36   1h 10m        2014-07-16 09:58:51   0h 23m
  2014-07-15 22:15:36   10h 13m       2014-07-16 08:29:29   0h 19m
  2014-07-15 03:48:13   16h 49m       2014-07-15 20:37:48   1h 37m
  2014-07-11 23:16:11   3d 3h 49m     2014-07-15 03:05:51   0h 42m
  2014-07-06 19:00:04   5d 4h 1m      2014-07-11 23:01:08   0h 15m

ID 11849
  Connected at          Up for        Disconnected at       Down for
  2014-07-16 10:13:47   1d 3h 49m     Still Connected
  2014-07-16 08:41:47   1h 17m        2014-07-16 09:58:56   0h 14m
  2014-07-16 08:31:26   0h 1m         2014-07-16 08:33:10   0h 8m
  2014-07-15 22:08:37   10h 20m       2014-07-16 08:29:27   0h 1m
  2014-07-15 03:52:50   16h 45m       2014-07-15 20:38:05   1h 30m

Jeroen.
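Since the two probes share the switch and the DSL line, one quick check is whether their disconnect windows overlap in time. A rough Python sketch; each window is reconstructed by hand from consecutive rows of the logs above (a disconnect timestamp paired with the next connect timestamp), and the variable names are made up for illustration:

    from datetime import datetime

    def ts(s):
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

    def overlaps(a, b):
        """True if two (start, end) intervals intersect."""
        return a[0] < b[1] and b[0] < a[1]

    # Disconnect windows (went down, came back) from the logs above.
    p3144 = [(ts("2014-07-16 09:58:51"), ts("2014-07-16 10:22:03")),
             (ts("2014-07-16 08:29:29"), ts("2014-07-16 08:48:36")),
             (ts("2014-07-15 20:37:48"), ts("2014-07-15 22:15:36"))]
    p11849 = [(ts("2014-07-16 09:58:56"), ts("2014-07-16 10:13:47")),
              (ts("2014-07-16 08:29:27"), ts("2014-07-16 08:31:26")),
              (ts("2014-07-15 20:38:05"), ts("2014-07-15 22:08:37"))]

    for a in p3144:
        for b in p11849:
            if overlaps(a, b):
                print("overlap:", max(a[0], b[0]), "to", min(a[1], b[1]))

All three recent windows overlap between the two probes, which is consistent with a shared cause on the uplink rather than with the probes themselves.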
Hi Wilfried,
Let's compare the most recent dis/connection logs for my 3 pets:
Here is what I found in our logs:
ID 6009 2014-07-14 03:58:03 3d 8h 16m Still Connected
Upgrade to firmware 4650
2014-05-27 03:03:54 48d 0h 46m 2014-07-14 03:50:47 0h 7m
Hard to say, some network glitch
2014-05-20 15:19:02 6d 11h 37m 2014-05-27 02:57:00 0h 6m
Anchor was rebooted
2014-05-14 21:16:56 5d 17h 59m 2014-05-20 15:16:22 0h 2m
Network glitch
2014-04-08 16:03:21 36d 5h 1m 2014-05-14 21:05:17 0h 11m
Anchor was rebooted
ID 0466 2014-07-13 23:31:05 3d 12h 45m Still Connected
Some network glitch, unclear what
2014-07-09 23:05:40 3d 23h 54m 2014-07-13 22:59:49 0h 31m
Probe upgraded firmware, reason for disconnect got lost
2014-06-16 10:53:21 23d 11h 55m 2014-07-09 22:49:04 0h 16m
Network problem
2014-05-25 09:03:06 22d 1h 38m 2014-06-16 10:42:00 0h 11m
Some network problem.
2014-05-24 20:34:50 11h 54m 2014-05-25 08:29:12 0h 33m
Unclear
ID 0414 2014-07-07 23:41:23 9d 12h 35m Still Connected
Some network problem
2014-07-02 03:58:45 5d 19h 31m 2014-07-07 23:29:54 0h 11m
Power cycled?
2014-06-13 09:37:50 18d 18h 7m 2014-07-02 03:45:08 0h 13m
Some network problem. High RTTs
2014-06-08 13:22:14 4d 20h 7m 2014-06-13 09:29:38 0h 8m
Power cycled?
2014-05-21 08:29:23 18d 4h 45m 2014-06-08 13:15:11 0h 7m
Same.
Again, I fail to see some obvious correlation, what am I missing?
Does anyone else see a similar pattern?
How to start debugging, if there's anythig that needs debugging?
A couple of points:

1) The connection between a probe (or anchor) and its controller doesn't have to be perfectly stable. It has to be good enough that probes report results in a timely fashion and can receive commands, but nothing beyond that.

2) For a single probe to see a network failure (with measurements using the default parameters), the failure has to last for at least 10 minutes; that way a couple of measurements have a chance to report on it. In contrast, the connection between a probe and the controller is already terminated if the network is down for one minute.

3) When a target is measured by many probes, it is likely that at least some of them will pick up an event. But from one probe on its own, it is hard to say anything.

4) Version 1 probes tend to reboot after losing the connection to the controller, due to memory fragmentation issues. That is unfortunate, but we can't really do anything about it. Version 3 probes and anchors just report their results a little later.

Philip
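Philip's second point can be turned into a toy triage rule for the short disconnects in the logs above. The one-minute and ten-minute thresholds are the ones he quotes; the function name and labels are invented for illustration:

    def classify_outage(minutes):
        """Rough reading of an outage duration, per the thresholds above."""
        if minutes < 1:
            return "control connection may even survive it"
        if minutes < 10:
            return "shows as a disconnect, but default measurements from a single probe likely miss it"
        return "long enough for a couple of default measurements to report it"

    # Disconnect durations (minutes) for probe 6009, from Wilfried's log.
    for m in (7, 6, 2, 11):
        print("%2d min -> %s" % (m, classify_outage(m)))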
Just a thought: are you connected to a "green" switch that might be dropping power on idle ports? If the probe can't handle that situation, it disconnects from the network and the process starts over.

Bryan Socha
Network Engineer
DigitalOcean
On 7/17/2014 at 6:03 PM Philip Homburg wrote:

|Hi Wilfried,
| [major snip]
|
| In contrast, the connection between a probe and the
| controller is already terminated if the network is
| down for one minute.
=============

Pulling that one sentence out of a long-ish reply. If I read that correctly:

If a probe loses contact with its controller for 60 seconds, then the controller considers the probe offline and starts a disconnected timer.

Is that a correct interpretation?
On 2014/07/17 19:01, Mike. wrote:
If a probe loses contact with its controller for 60 seconds, then the controller considers the probe offline and starts a disconnected timer.
More like: the controller 'pings' the probe every 20 seconds, and after 3 missed responses the connection is terminated.

And for the Atlas system as a whole, that works. But the goal of the Atlas system is not to have a probe connected for as long as possible.

Philip
Hi Philip + Team,

Philip Homburg wrote:

first of all, thanks for investigating!

[...]
More like, the controller 'pings' the probe every 20 seconds and after 3 missed responses the connection is terminated.
And for the Atlas system as a whole, that works. But the goal of the Atlas system is not to have a probe connected as long as possible.
That's fully understood. I still have a couple of questions :-)

1) if I understand correctly, the decision to label a probe "disconnected" is made by the associated collector, based on pings? (btw: "real" ICMP pings, or internal ones over the channel?)

2) if that's the case, is there an easy way to find out to which collector a probe is "assigned"? (is this static or dynamic?)

3) if a probe, in particular an anchor, gets updated with a new firmware, is it possible that the ethernet IF does *not* go down? (Note, the 6009 is an old, big, beta box! Is there a difference with the new soekris probes?)
Just to be very clear: I just want to understand how to interpret things, because I already had an issue with one of my V1 probes, and in the end it turned out that the USB power feed was just borderline; the problem was gone after replacement.

And as an ISP and backbone operator, seeing stuff as "down" or "disconnected" without a good explanation starts to itch after a while :-)

All the best, have a nice weekend,
Wilfried.
On 2014/07/18 12:12, Wilfried Woeber wrote:
Hi Philip + Team,
Philip Homburg wrote:
first of all thanks for investigating!
No problem. I was also curious myself why 'normal' probes would disconnect. Most time is spent looking at the exceptions.
That's fully understood.
I'm still having a couple of questions :-)
1) if I understand correctly, the decision to label a probe "disconnected" is made by the associated collector, based on pings? (btw: "real" ICMP pings, or internal ones over the channel?)
Connected/disconnected is based on whether a probe has an ssh connection to a controller. There is a keepalive mechanism within the ssh protocol to see if the other end is still there. That ssh mechanism is used to abort the connection. Nothing to do with real (ICMP) pings.
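For reference, the 20-second/3-misses behaviour Philip described earlier maps directly onto OpenSSH's server-side keepalive options. A sketch in sshd_config syntax, purely as an illustration of the mechanism; nothing in this thread says the controllers actually run OpenSSH or use exactly these values:

    # Illustrative sshd_config fragment (assumed OpenSSH; not the actual
    # Atlas controller configuration). These keepalives travel inside the
    # encrypted channel, so they are unrelated to ICMP pings.
    ClientAliveInterval 20    # probe the client after 20s of silence
    ClientAliveCountMax 3     # give up after 3 unanswered keepalives
    # => a dead connection is torn down after roughly 60 seconds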
2) if that's the case, is there an easy way to find out to which collector a probe is "assigned"? (is this static or dynamic?)
I don't know why, but that information is not shown to normal users. Of course, if you can capture traffic, you can easily find out :-) The assignment is dynamic.
3) if a probe, in particular an anchor, gets updated with a new firmware, is it possible that the ethernet IF does *not* go down? (Note, the 6009 is an old, big, beta box! Is there a difference with the new soekris probes?)
On regular probes a firmware upgrade always involves a reboot. On anchors the Atlas 'firmware' is an rpm. There is no reason to reboot the box or bring its interface down to upgrade the Atlas rpm.
Just to be very clear: I just want to understand how to interpret things, because I already had an issue with one of my V1 probes, and in the end it turned out that the USB power feed was just borderline; the problem was gone after replacement.
Yes, it is good to keep an eye on those things. We can only look at probes statistically or in response to tickets, mail, etc.
And as an ISP and backbone operator, seeing stuff as "down" or "disconnected", without a good explanation, starts to itch after a while :-)
I think the best page to look at is the 'Result from Built-in Measurements'. If those graphs look fine, then there is no real reason to worry, unless the probe keeps connecting and disconnecting multiple times a day, or something like that.
On 18/07/14 14:23, Philip Homburg wrote:
3) if a probe, in particular an anchor, gets updated with a new firmware, is it possible that the ethernet IF does *not* go down? (Note, the 6009 is an old, big, beta box! Is there a difference with the new soekris probes?)
On regular probes a firmware upgrade always involves a reboot. On anchors the Atlas 'firmware' is an rpm. There is no reason to reboot the box or bring its interface down to upgrade the Atlas rpm.
To add to this point:

When we roll out a new probe firmware to the Anchors, the software restarts (and will disconnect from and reconnect to the controller), but nothing happens to the system OS: it will not result in a reboot or a network interface flap.

However, separately from the probe firmware, we also patch and maintain the Anchor system OS. This happens during a weekly scheduled maintenance window: Tuesdays between 14:00 and 15:00 UTC (at least we try; sometimes the window overruns a little!)

This may result in system services (such as the DNS and HTTP servers on the Anchors) restarting, and if there is a kernel update available, we will also reboot the Anchor during this window. This will result in the network interface going down and up during the reboot.

This applies to both the V1 Dell Anchors and the V2 Soekris Anchors.

Hope this helps,

Cheers,
Colin

--
Colin Petrie
Systems Engineer
RIPE NCC
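One practical consequence: an Anchor disconnect that starts inside that window is probably just maintenance. A small Python sketch for pre-filtering disconnect timestamps; the Tuesday 14:00-15:00 UTC window comes from Colin's mail, while the function name and the 30-minute slack for overruns are my own additions:

    from datetime import datetime

    def in_anchor_maintenance_window(ts, slack_minutes=30):
        """True if a UTC timestamp falls in the Tuesday 14:00-15:00 UTC
        maintenance window (plus slack, since the window can overrun)."""
        if ts.weekday() != 1:      # Monday == 0, so Tuesday == 1
            return False
        minutes = ts.hour * 60 + ts.minute
        return 14 * 60 <= minutes <= 15 * 60 + slack_minutes

    print(in_anchor_maintenance_window(datetime(2014, 7, 15, 14, 23)))  # True
    print(in_anchor_maintenance_window(datetime(2014, 7, 14, 14, 23)))  # False (a Monday)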
On 7/18/2014 at 6:11 PM Colin Petrie wrote:
| [ snip ]
=============

Many thanks to you and Philip for providing the technical overview messages. I like to understand what I see on the status page for the probe I host, and the overviews have been most helpful towards that end.

Thanks again.
Hello,

To provide a bit of background information about $subject:

In order to receive reports from the probes, and to deliver (measurement) commands to them, we maintain a bidirectional channel from the probe to the infrastructure. At the moment this uses SSH. We consider the probe to be "connected" as long as this channel is open, and "disconnected" when it's not. Note that this is only an indicator of the probe's stability, not a precise quality metric.

Said connections can break for a number of reasons: administrative actions, probe power loss, power cycles, path problems between the probe and the infrastructure (including the NAT box, if applicable), and infrastructure availability. For example, every now and then we disconnect the probes to make them upgrade, or we have to reboot the server the probe is connected to. All these events show up as disconnects.

The disconnection time mostly depends on the reason for the disconnect: for example, a probe reboot can be done in seconds, a firmware upgrade takes something like 5-15 minutes, and a controller reboot can cause up to 2 hours of non-connectedness.

Finally: as Philip mentioned, we don't optimise for high connection times. The probes execute the pre-scheduled measurements even if they are not connected to the infrastructure.

Regards,
Robert
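Robert's duration figures also suggest a rough first-pass guess at the cause of a disconnect of a given length. The thresholds below simply restate his numbers (seconds for a probe reboot, 5-15 minutes for a firmware upgrade, up to about 2 hours for a controller reboot); the function and labels are invented for illustration, and real causes can of course differ:

    def guess_disconnect_cause(seconds):
        """Very rough triage based on the durations quoted above."""
        if seconds < 60:
            return "probe reboot or brief glitch"
        if seconds < 5 * 60:
            return "short network problem"
        if seconds <= 15 * 60:
            return "could be a firmware upgrade"
        if seconds <= 2 * 3600:
            return "could be a controller reboot"
        return "longer outage; worth checking locally"

    for s in (30, 7 * 60, 33 * 60, 90 * 60):
        print("%5d s -> %s" % (s, guess_disconnect_cause(s)))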
Robert Kisteleki wrote:
Hello,
To provide a bit of background information about $subject:
Thanks to all who added to the picture, really appreciated!

I'm feeling considerably more comfortable now seeing some "disconnects"[1] for a few minutes, every now and then :-)

Still, I'd like to see, eventually, some sort of alerts "from the system" if the frequency of the disconnects grows outside of some sanity envelope. To be chatted about during the next RIPE meeting, maybe.

Have a nice summer period, everyone,
Wilfried

[1] Given that background, I wonder about the usefulness of the top-up-time page, when this is not under the control of the probe host ;-)
participants (9)

- Aftab A. Siddiqui
- Bryan Socha
- Colin Petrie
- Jeroen van der Ham
- Mike.
- Philip Homburg
- Robert Kisteleki
- Ross Weseloh
- Wilfried Woeber