Re: [atlas] Actual measurement interval much larger than planned

16 Oct 2016

Hi list,

A couple of weeks ago, I asked in this thread why some built-in measurements are missing or not performed at the scheduled interval.
Cristel and Robert very kindly shared what they thought could be the causes: scheduling, probe reboot, updating task list, etc.

The discussion gave me the idea to check whether measurements also go missing while the probe is powered and connected to an Atlas controller, i.e. while the probe appears to be working properly.

Here, I would like to share one case among many I’ve observed where built-in measurements are missed continuously for a long time, even when the probe is well connected to a controller.

Let’s look at the time window from '2016-06-16 21:53:20 +0000' to '2016-06-18 20:16:40 +0000' for probe 22144.

First, I queried its connection events to the Atlas controllers (msm_id 7000).
The result says the probe connected to a controller at '2016-06-16 21:54:19 +0000' and became disconnected at '2016-06-18 20:13:48 +0000'. Between these two moments, the probe is supposed to have remained connected, and thus continuously powered.
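
For anyone wanting to reproduce the query, it can be expressed roughly as below. This is only a sketch: the helper names (to_epoch, results_url) are made up for illustration, and the URL layout (results endpoint of the v2 REST API with probe_ids, start and stop parameters) is an assumption to be checked against the current RIPE Atlas API documentation. The code only builds the URL; it does not fetch anything.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed base of the RIPE Atlas v2 measurements API (verify against the docs).
API_BASE = "https://atlas.ripe.net/api/v2/measurements"


def to_epoch(ts: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS +0000' into a Unix timestamp."""
    return int(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S %z").timestamp())


def results_url(msm_id: int, probe_id: int, start: str, stop: str) -> str:
    """Build a results URL for one probe and one time window (hypothetical helper)."""
    params = {
        "probe_ids": probe_id,
        "start": to_epoch(start),
        "stop": to_epoch(stop),
        "format": "json",
    }
    return f"{API_BASE}/{msm_id}/results/?{urlencode(params)}"


# Time window and probe from this mail; 7000 is the connection-event measurement.
WINDOW = ("2016-06-16 21:53:20 +0000", "2016-06-18 20:16:40 +0000")
url = results_url(7000, 22144, *WINDOW)
print(url)
```

The same helper can be reused for the built-in ping measurements by changing the msm_id.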

Then, I queried the built-in ping measurements toward b-root (msm_id 1010) within the time window.
Below are the timestamps at which measurements were performed:
['2016-06-16 22:51:08 +0000', '2016-06-17 02:07:08 +0000', '2016-06-17 03:11:02 +0000',
 '2016-06-17 04:03:07 +0000', '2016-06-17 05:23:03 +0000', '2016-06-17 07:35:17 +0000',
 '2016-06-17 10:51:06 +0000', '2016-06-17 14:07:04 +0000', '2016-06-17 15:11:03 +0000',
 '2016-06-17 17:23:06 +0000', '2016-06-17 18:27:04 +0000', '2016-06-17 20:39:04 +0000',
 '2016-06-17 22:51:09 +0000', '2016-06-17 23:55:03 +0000', '2016-06-18 02:07:08 +0000',
 '2016-06-18 03:11:04 +0000', '2016-06-18 06:27:05 +0000', '2016-06-18 08:39:02 +0000',
 '2016-06-18 10:27:13 +0000', '2016-06-18 10:51:08 +0000', '2016-06-18 11:55:13 +0000',
 '2016-06-18 12:03:13 +0000', '2016-06-18 15:11:04 +0000', '2016-06-18 17:23:05 +0000',
 '2016-06-18 18:27:09 +0000', '2016-06-18 18:43:13 +0000', '2016-06-18 19:35:18 +0000',
 '2016-06-18 20:15:06 +0000']
We can see that the intervals between neighbouring measurements are much larger than the planned value of 240 s.
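
To make the size of the gaps concrete, here is a small sketch that computes the interval between neighbouring timestamps and a rough count of the runs missing in between. The function name gaps and the "round to the nearest multiple of 240 s" estimate are my own choices for illustration, not something the Atlas tooling provides; only the first six timestamps of the b-root list are used.

```python
from datetime import datetime

PLANNED_INTERVAL = 240  # seconds, the scheduled interval of the built-in ping

# First six timestamps of the b-root (msm_id 1010) list for probe 22144.
timestamps = [
    "2016-06-16 22:51:08 +0000",
    "2016-06-17 02:07:08 +0000",
    "2016-06-17 03:11:02 +0000",
    "2016-06-17 04:03:07 +0000",
    "2016-06-17 05:23:03 +0000",
    "2016-06-17 07:35:17 +0000",
]


def gaps(stamps, interval=PLANNED_INTERVAL):
    """Return (gap_in_seconds, estimated_missing_runs) for each neighbouring pair."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S %z") for s in stamps]
    result = []
    for prev, cur in zip(times, times[1:]):
        delta = int((cur - prev).total_seconds())
        # A gap of about k intervals means about k - 1 runs are missing in between.
        missing = max(round(delta / interval) - 1, 0)
        result.append((delta, missing))
    return result


for start, (delta, missing) in zip(timestamps, gaps(timestamps)):
    print(f"after {start}: gap of {delta} s, about {missing} runs missing")
```

For the first pair this gives a gap of 11760 s, i.e. about 48 missing runs at a 240 s interval.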

I also looked at other built-in ping measurements, e.g. toward k-root (msm_id 1001).
Below are the timestamps:
['2016-06-16 22:51:08 +0000', '2016-06-17 02:07:07 +0000', '2016-06-17 03:11:05 +0000',
 '2016-06-17 04:03:08 +0000', '2016-06-17 05:23:03 +0000', '2016-06-17 07:35:17 +0000',
 '2016-06-17 10:51:10 +0000', '2016-06-17 14:07:04 +0000', '2016-06-17 15:11:04 +0000',
 …]
A very similar phenomenon is observed.

Between the first two measurements in each of the above lists, there is an interval of more than three hours, which can hardly be explained by a measurement scheduling issue or temporarily high load.
What’s more, the probe remained connected at those moments, so a reboot or power-off can be ruled out.
As a reference, probe 12657 has all its measurements arriving at the due interval within the same time window.

What I am wondering is what could cause measurements to go missing like this.
I would much appreciate your thoughts on this, so that the measurements can be processed and analysed with proper caution.

Thanks.

Regards,
wenqin
...
On 02 Sep 2016, at 12:57, Robert Kisteleki <robert@ripe.net> wrote:
On 2016-09-02 12:20, Wenqin SHAO wrote:
...
Thanks for confirming. The specified frequency is indeed well respected. When there is no missing data, the interval shift rarely exceeds 14 s, which is small compared to the scheduled interval of 240 s.
What intrigues me is that the exact phase/timing is also kept after a power cut and reboot.
The probes have a crontab-like mechanism to remember what they need to do.
As long as their clock is more or less ok, they will stick to the
pre-allocated times and tasks.
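
As an illustration only (this is not the probe firmware, whose internals are not described in this thread), the sketch below shows one way a crontab-like schedule keeps its phase across a reboot: the next run time is derived from the wall clock rather than from the previous run. The function name next_run and the phase value of 180 s are hypothetical.

```python
def next_run(now: int, interval: int, phase: int) -> int:
    """Earliest Unix time t >= now such that t % interval == phase."""
    return now + (phase - now) % interval


INTERVAL = 240  # seconds, as for the built-in pings
PHASE = 180     # hypothetical pre-allocated offset within each interval

# Next run computed at some point before an outage...
before = next_run(1466114000, INTERVAL, PHASE)
# ...and again after a reboot at an arbitrary later time.
after = next_run(1466120000, INTERVAL, PHASE)

# Both land on the same grid, so the phase survives the reboot
# as long as the clock is roughly correct.
print(before, after, (after - before) % INTERVAL)
```

Because nothing but the clock, the interval and the offset enters the computation, no state needs to survive the power cut.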
...
By the way, can a measurement also be skipped, as designed behaviour, due to the scheduling issues mentioned by @Cristel?
We're trying to avoid overloading probes, but not everything is under our
full control. Some measurements can pile up; Cristel & Randy & co. had a
paper about the observed (worst-case) behaviour.
Regards,
Robert