Actual measurement interval much larger than planned
Dear list,

I have encountered some cases where the actual time interval between two neighbouring measurements is much larger than the scheduled value. Have you witnessed similar cases? I would appreciate explanations of the underlying reasons.

Here is an example:

    from ripe.atlas.cousteau import AtlasResultsRequest

    filters = dict(msm_id=1010, probe_ids=[16981], start=1470055258, stop=1470056217)
    is_success, results = AtlasResultsRequest(**filters).create()
    if is_success:
        for mes in results:
            print(mes.get('timestamp', -1))

The above code requests some built-in ping measurements (240 s interval) made by probe 16981 and prints their timestamps. The execution result is:

    1470055259
    1470055499
    1470055978
    1470056216

Four measurements are retrieved within the given time range. I noticed that the interval between the 2nd and the 3rd measurement is 479 s, much larger than the planned value of 240 s. How come?

Many thanks for your attention.

Best regards,
wenqin
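For anyone reproducing the check, here is a minimal sketch that turns the four timestamps printed above into consecutive intervals and flags any interval noticeably longer than the nominal 240 s. The helper name `intervals` and the 1.5x threshold are illustrative assumptions, not part of the Atlas API or of cousteau.

```python
# Timestamps printed by the query above (probe 16981, msm_id 1010).
timestamps = [1470055259, 1470055499, 1470055978, 1470056216]
NOMINAL = 240  # planned interval of the built-in ping, in seconds


def intervals(ts):
    """Return the differences between consecutive (sorted) timestamps."""
    ts = sorted(ts)
    return [b - a for a, b in zip(ts, ts[1:])]


gaps = intervals(timestamps)
print(gaps)  # [240, 479, 238]

# Flag intervals well above the nominal value (1.5x is an arbitrary choice).
suspicious = [g for g in gaps if g > 1.5 * NOMINAL]
print(suspicious)  # [479]
```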
Hi Wenqin,

It may be because the probe is busy doing other things. We observed long delays between the scheduled time and the actual execution of one-off measurements on busy probes in http://conferences.sigcomm.org/imc/2015/papers/p437.pdf

Cristel
On 2016-09-01 19:14, Wenqin SHAO wrote:
1470055259 1470055499 1470055978 1470056216
Four measurements are retrieved within the given time range. I noticed that the interval between the 2nd and the 3rd measurement is 479 s, much larger than the planned value of 240 s. How come?
479 is almost exactly twice the interval (240 s), so it's basically one skipped result. Most likely the probe just didn't execute that measurement (was it powered down? rebooting? refreshing its set of tasks at just the worst time?)

Regards,
Robert
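Robert's reading can be made explicit by rounding each interval to the nearest multiple of the planned interval: a ratio close to k means k - 1 scheduled results are missing. A sketch, assuming the probe stays on a fixed 240 s schedule; `missed_slots` is a hypothetical helper, not an Atlas function.

```python
NOMINAL = 240  # seconds between scheduled runs of msm_id 1010


def missed_slots(interval, nominal=NOMINAL):
    """Estimate how many scheduled runs fell inside one observed interval.

    An interval of roughly k * nominal is read as k - 1 skipped results.
    """
    k = max(1, round(interval / nominal))
    return k - 1


for gap in [240, 479, 238]:
    print(gap, round(gap / NOMINAL, 3), missed_slots(gap))
# 240 1.0 0
# 479 1.996 1
# 238 0.992 0
```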
Hi Cristel, hi Robert,

Thanks a lot for your feedback.

1. @Cristel, yes, you're right, I'm aware of this scheduling issue. Indeed, a large share of such ‘abnormal’ intervals is reasonably ‘short’: among 510 cases seen in the traces of 1074 probes over one week, 55% are between 2 and 4 times the planned value.

2. As observed by @Robert, most ‘abnormal’ intervals are actually very close to integer multiples of the planned value. If I understand your message correctly, you are saying that planned measurements can be skipped for the reasons you mentioned and then resume on the previous schedule, even after a reboot?

Again, many thanks.

Regards,
wenqin
On 02 Sep 2016, at 09:53, Robert Kisteleki <robert@ripe.net> wrote:
479 is almost exactly twice the interval (240 s), so it's basically one skipped result. Most likely the probe just didn't execute that measurement (was it powered down? rebooting? refreshing its set of tasks at just the worst time?)
2. As observed by @Robert, most ‘abnormal’ intervals are actually very close to integer multiples of the planned value. If I understand your message correctly, you are saying that planned measurements can be skipped for the reasons you mentioned and then resume on the previous schedule, even after a reboot?
Kind of. There's a reasonable expectation that probes will measure at the specified frequency, but in reality, for various reasons, you won't see all the results all the time.

Regards,
Robert
Thanks for confirming. The specified frequency is indeed well respected: when no data is missing, the interval deviation rarely exceeds 14 s, which is small compared to the scheduled 240 s interval. What intrigues me is that the exact phase/timing is also preserved after a power cut and reboot.

By the way, can a measurement also be skipped, by design, because of the scheduling issues mentioned by @Cristel?

Thanks,
wenqin
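The 14 s figure can be checked by keeping only intervals that contain no skipped run (those rounding to one slot) and measuring their drift from the nominal value. A sketch under the same fixed-schedule assumption; `max_drift` is a hypothetical helper. Applied to the four timestamps from the first message, it reports a maximum drift of 2 s.

```python
NOMINAL = 240  # planned interval, in seconds


def max_drift(ts, nominal=NOMINAL):
    """Largest |interval - nominal| among intervals that contain no skipped run."""
    ts = sorted(ts)
    regular = [b - a for a, b in zip(ts, ts[1:]) if round((b - a) / nominal) == 1]
    return max((abs(g - nominal) for g in regular), default=0)


print(max_drift([1470055259, 1470055499, 1470055978, 1470056216]))  # 2
```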
On 02 Sep 2016, at 11:45, Robert Kisteleki <robert@ripe.net> wrote:
2. As observed by @Robert, most ‘abnormal’ intervals are actually very close to integer multiples of the planned value. If I understand your message correctly, you are saying that planned measurements can be skipped for the reasons you mentioned and then resume on the previous schedule, even after a reboot?
Kind of. There's a reasonable expectation that probes will measure at the specified frequency, but in reality, for various reasons, you won't see all the results all the time.
Regards, Robert
On 2016-09-02 12:20, Wenqin SHAO wrote:
Thanks for confirming. The specified frequency is indeed well respected: when no data is missing, the interval deviation rarely exceeds 14 s, which is small compared to the scheduled 240 s interval. What intrigues me is that the exact phase/timing is also preserved after a power cut and reboot.
The probes have a crontab-like mechanism to remember what they need to do. As long as their clock is more or less ok, they will stick to the pre-allocated times and tasks.
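One way to picture the "pre-allocated times" is to place each result on a grid anchored at one known run and spaced by the interval: the slot index shows which runs are missing, and the signed offset shows how far each run lands from its slot. This is a sketch only. The anchor (the first observed result) and the helper `slot_and_offset` are assumptions, and the probe's actual scheduler is not described in this thread.

```python
NOMINAL = 240  # planned interval, in seconds


def slot_and_offset(t, anchor, nominal=NOMINAL):
    """Map a timestamp to (slot index, signed offset in seconds) on a fixed grid."""
    delta = t - anchor
    slot = round(delta / nominal)
    return slot, delta - slot * nominal


ts = [1470055259, 1470055499, 1470055978, 1470056216]
placed = [slot_and_offset(t, ts[0]) for t in ts]
print(placed)  # [(0, 0), (1, 0), (3, -1), (4, -3)]

present = {s for s, _ in placed}
missing = [s for s in range(max(present) + 1) if s not in present]
print(missing)  # [2]
```

If the phase were lost after a reboot, the offsets after the gap would drift away from zero; here they stay within a few seconds of the grid.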
By the way, can a measurement also be skipped, by design, because of the scheduling issues mentioned by @Cristel?
We're trying to avoid overloading probes, but not everything is under our full control. Some measurements can pile up; Cristel & Randy & co. had a paper about the observed (worst-case) behaviour. Regards, Robert
Hi list,

A couple of weeks ago, I asked in this thread why some built-in measurements are missing or not performed at the scheduled interval. Cristel and Robert very kindly shared what they thought the causes could be: scheduling, probe reboots, task-list updates, etc.

The discussion gave me the idea to check whether measurements also go missing while the probe is powered and connected to an Atlas controller, i.e. while the probe seemingly works fine.

Here I would like to share one case, among many I've observed, where built-in measurements were missed continuously for a long time even though the probe was connected to a controller.

Let's look at the time window from '2016-06-16 21:53:20 +0000' to '2016-06-18 20:16:40 +0000' for probe 22144.

First, I queried its connection events to the Atlas controller (msm_id 7000). The result says the probe connected to a controller at '2016-06-16 21:54:19 +0000' and disconnected at '2016-06-18 20:13:48 +0000'. Between these two moments, the probe should have remained connected, and thus continuously powered.

Then I queried the built-in ping measurements toward b-root (msm_id 1010) within the time window. Below are the timestamps at which measurements were performed:

['2016-06-16 22:51:08 +0000', '2016-06-17 02:07:08 +0000', '2016-06-17 03:11:02 +0000', '2016-06-17 04:03:07 +0000', '2016-06-17 05:23:03 +0000', '2016-06-17 07:35:17 +0000', '2016-06-17 10:51:06 +0000', '2016-06-17 14:07:04 +0000', '2016-06-17 15:11:03 +0000', '2016-06-17 17:23:06 +0000', '2016-06-17 18:27:04 +0000', '2016-06-17 20:39:04 +0000', '2016-06-17 22:51:09 +0000', '2016-06-17 23:55:03 +0000', '2016-06-18 02:07:08 +0000', '2016-06-18 03:11:04 +0000', '2016-06-18 06:27:05 +0000', '2016-06-18 08:39:02 +0000', '2016-06-18 10:27:13 +0000', '2016-06-18 10:51:08 +0000', '2016-06-18 11:55:13 +0000', '2016-06-18 12:03:13 +0000', '2016-06-18 15:11:04 +0000', '2016-06-18 17:23:05 +0000', '2016-06-18 18:27:09 +0000', '2016-06-18 18:43:13 +0000', '2016-06-18 19:35:18 +0000', '2016-06-18 20:15:06 +0000']

We can see that the intervals between neighbouring measurements are much larger than the planned value of 240 s.

I also investigated other built-in ping measurements, e.g. toward k-root (msm_id 1001). Below are the timestamps:

['2016-06-16 22:51:08 +0000', '2016-06-17 02:07:07 +0000', '2016-06-17 03:11:05 +0000', '2016-06-17 04:03:08 +0000', '2016-06-17 05:23:03 +0000', '2016-06-17 07:35:17 +0000', '2016-06-17 10:51:10 +0000', '2016-06-17 14:07:04 +0000', '2016-06-17 15:11:04 +0000', …]

A very similar phenomenon is observed. Between the first two measurements in the above lists there is an interval of more than three hours, which can hardly be explained by measurement scheduling issues or temporarily high load. What's more, the probe remained connected during that time, so it was neither rebooted nor powered off. As a reference, probe 12657 has all of its measurements arriving at the due interval within the same time window.

I wonder what could cause such missing results, and I would appreciate your thoughts on this so that the measurements can be processed and analysed with proper caution.

Thanks.

Regards,
wenqin
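The same gap analysis can be sketched for probe 22144 in Python 3: parse the timestamp strings quoted above, express each gap as a multiple of the 240 s schedule, and estimate how many runs fit into the connection window reported by msm_id 7000. Only the first six b-root results are copied in to keep it short, and the slot count assumes a run every 240 s from the moment of connection, which is a simplification.

```python
from datetime import datetime

NOMINAL = 240  # planned interval, in seconds

# First six b-root (msm_id 1010) results for probe 22144, copied from the list above.
raw = [
    "2016-06-16 22:51:08 +0000",
    "2016-06-17 02:07:08 +0000",
    "2016-06-17 03:11:02 +0000",
    "2016-06-17 04:03:07 +0000",
    "2016-06-17 05:23:03 +0000",
    "2016-06-17 07:35:17 +0000",
]


def to_epoch(s):
    """Parse 'YYYY-mm-dd HH:MM:SS +0000' into an integer Unix timestamp."""
    return int(datetime.strptime(s, "%Y-%m-%d %H:%M:%S %z").timestamp())


ts = [to_epoch(s) for s in raw]
gaps = [b - a for a, b in zip(ts, ts[1:])]
print(gaps)                                # [11760, 3834, 3125, 4796, 7934]
print([round(g / NOMINAL) for g in gaps])  # [49, 16, 13, 20, 33]

# Connection window taken from the msm_id 7000 events quoted above.
up = to_epoch("2016-06-16 21:54:19 +0000")
down = to_epoch("2016-06-18 20:13:48 +0000")
print((down - up) // NOMINAL)              # 694
```

Set against the 28 b-root timestamps listed for the window, roughly 694 scheduled slots means only a small fraction of the runs left a result.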
On 02 Sep 2016, at 12:57, Robert Kisteleki <robert@ripe.net> wrote:
On 2016-09-02 12:20, Wenqin SHAO wrote:
Thanks for confirming. The specified frequency is indeed well respected: when no data is missing, the interval deviation rarely exceeds 14 s, which is small compared to the scheduled 240 s interval. What intrigues me is that the exact phase/timing is also preserved after a power cut and reboot.
The probes have a crontab-like mechanism to remember what they need to do. As long as their clock is more or less ok, they will stick to the pre-allocated times and tasks.
By the way, can a measurement also be skipped, by design, because of the scheduling issues mentioned by @Cristel?
We're trying to avoid overloading probes, but not everything is under our full control. Some measurements can pile up; Cristel & Randy & co. had a paper about the observed (worst-case) behaviour.
Regards, Robert
Hi Wenqin,

that was a weird case where this specific probe has time synchronization issues. We saw a lot of debug messages in our raw logs in which the probe complained about time sync problems. If you have more examples involving other probes, please send us the details and we will happily check them for you.

Regards,
Andreas

On 16/10/16 22:23, Wenqin SHAO wrote:
Hi list,

A couple of weeks ago, I asked in this thread why some built-in measurements are missing or not performed at the scheduled interval. Cristel and Robert very kindly shared what they thought the causes could be: scheduling, probe reboots, task-list updates, etc.

The discussion gave me the idea to check whether measurements also go missing while the probe is powered and connected to an Atlas controller, i.e. while the probe seemingly works fine.

Here I would like to share one case, among many I've observed, where built-in measurements were missed continuously for a long time even though the probe was connected to a controller.

Let's look at the time window from '2016-06-16 21:53:20 +0000' to '2016-06-18 20:16:40 +0000' for probe 22144.

First, I queried its connection events to the Atlas controller (msm_id 7000). The result says the probe connected to a controller at '2016-06-16 21:54:19 +0000' and disconnected at '2016-06-18 20:13:48 +0000'. Between these two moments, the probe should have remained connected, and thus continuously powered.

Then I queried the built-in ping measurements toward b-root (msm_id 1010) within the time window. Below are the timestamps at which measurements were performed:

['2016-06-16 22:51:08 +0000', '2016-06-17 02:07:08 +0000', '2016-06-17 03:11:02 +0000', '2016-06-17 04:03:07 +0000', '2016-06-17 05:23:03 +0000', '2016-06-17 07:35:17 +0000', '2016-06-17 10:51:06 +0000', '2016-06-17 14:07:04 +0000', '2016-06-17 15:11:03 +0000', '2016-06-17 17:23:06 +0000', '2016-06-17 18:27:04 +0000', '2016-06-17 20:39:04 +0000', '2016-06-17 22:51:09 +0000', '2016-06-17 23:55:03 +0000', '2016-06-18 02:07:08 +0000', '2016-06-18 03:11:04 +0000', '2016-06-18 06:27:05 +0000', '2016-06-18 08:39:02 +0000', '2016-06-18 10:27:13 +0000', '2016-06-18 10:51:08 +0000', '2016-06-18 11:55:13 +0000', '2016-06-18 12:03:13 +0000', '2016-06-18 15:11:04 +0000', '2016-06-18 17:23:05 +0000', '2016-06-18 18:27:09 +0000', '2016-06-18 18:43:13 +0000', '2016-06-18 19:35:18 +0000', '2016-06-18 20:15:06 +0000']

We can see that the intervals between neighbouring measurements are much larger than the planned value of 240 s.

I also investigated other built-in ping measurements, e.g. toward k-root (msm_id 1001). Below are the timestamps:

['2016-06-16 22:51:08 +0000', '2016-06-17 02:07:07 +0000', '2016-06-17 03:11:05 +0000', '2016-06-17 04:03:08 +0000', '2016-06-17 05:23:03 +0000', '2016-06-17 07:35:17 +0000', '2016-06-17 10:51:10 +0000', '2016-06-17 14:07:04 +0000', '2016-06-17 15:11:04 +0000', …]

A very similar phenomenon is observed.

Between the first two measurements in the above lists there is an interval of more than three hours, which can hardly be explained by measurement scheduling issues or temporarily high load. What's more, the probe remained connected during that time, so it was neither rebooted nor powered off. As a reference, probe 12657 has all of its measurements arriving at the due interval within the same time window.

I wonder what could cause such missing results, and I would appreciate your thoughts on this so that the measurements can be processed and analysed with proper caution.

Thanks.

Regards,
wenqin
Hi Andreas,

Thank you for showing interest and looking into this case. Attached is a non-exhaustive list of probes, with the approximate times at which results go missing for the built-in ping toward b-root (msm_id 1010). I hope it is useful.

Regards,
wenqin
On 19 Oct 2016, at 14:28, Andreas Strikos <astrikos@ripe.net> wrote:
Hi Wenqin,
that was a weird case where this specific probe has time synchronization issues. We saw a lot of debug messages in our raw logs in which the probe complained about time sync problems. If you have more examples involving other probes, please send us the details and we will happily check them for you.
Regards, Andreas
participants (4)
- Andreas Strikos
- Cristel Pelsser
- Robert Kisteleki
- Wenqin SHAO