probe congestion?

Paul Vlaar

11 May 2016

12:25 p.m.

Hi all,

while running a DNS UDM on a fixed set of probes (reused from a previous UDM), I noticed the following. When I start 6 UDMs against the same set using the web UI, as one-off measurements starting "now", the RTTs for all of the measurements shoot up on all of the probes (500-1000+ ms). When I start the 6 tests individually, the RTTs are much lower, and close to what I'd expect them to be.

It appears to me that when multiple UDMs are scheduled on the same probe, they should be run serially, not in parallel, in order to avoid congestion issues. Or am I simply expecting the wrong behaviour, and should I not schedule one-off measurements in parallel in the first place?

~paul


Bajpai, Vaibhav

11 May 2016

12:46 p.m.

...

On 11 May 2016, at 12:25, Paul Vlaar <pvlaar@afilias.info> wrote:

while running a DNS UDM on a fixed set of probes (reused from a previous UDM), I noticed the following. When I start 6 UDMs against the same set using the web UI, as one-off measurements starting "now", the RTTs for all of the measurements shoot up on all of the probes (500-1000+ ms). When I start the 6 tests individually, the RTTs are much lower, and close to what I'd expect them to be.

It appears to me that when multiple UDMs are scheduled on the same probe, they should be run serially, not in parallel, in order to avoid congestion issues. Or am I simply expecting the wrong behaviour, and should I not schedule one-off measurements in parallel in the first place?

This is known [a, b]. RTT timestamps are applied in user space. As a result, if a probe is loaded with multiple measurements, the user-space timestamping will be delayed. These delays will be more pronounced on v1/v2 probes.

v3 probes reduce the impact of user-space timestamping, so they are more suitable for latency measurements that require high precision. RIPE Atlas system tags can be used to separate probes by hardware version.

[a] http://www.sigcomm.org/sites/default/files/ccr/papers/2015/July/0000000-0000...
[b] http://conferences.sigcomm.org/imc/2015/papers/p437.pdf
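To make the user-space timestamping effect concrete, here is a deliberately simplified queueing model in Python. It is not how the probe software is actually built: it assumes a single user-space worker that stamps replies one at a time in arrival order, with a made-up fixed processing cost `service_ms` per reply. Replies that arrive while the worker is busy get stamped late, so the recorded RTT grows with the number of measurements running at once while the on-the-wire RTT stays the same.

```python
def recorded_rtts(sends, service_ms):
    """Toy model of user-space RTT timestamping on a busy probe.

    sends: list of (name, send_ms, true_rtt_ms) tuples.
    service_ms: assumed CPU time to handle one reply before the next one
        can be timestamped (a made-up parameter, not a measured value).
    Returns {name: recorded_rtt_ms}.
    """
    # Replies reach the kernel at send time + true RTT; handle them in that order.
    arrivals = sorted((send + rtt, name, send) for name, send, rtt in sends)
    worker_free_at = float("-inf")
    recorded = {}
    for arrival, name, send in arrivals:
        stamped = max(arrival, worker_free_at)  # wait until the worker is free
        recorded[name] = stamped - send          # what the measurement reports
        worker_free_at = stamped + service_ms
    return recorded
```

With six measurements sent at the same moment, each with a true RTT of 30 ms and an assumed 100 ms of processing per reply, the recorded RTTs climb from 30 ms to 530 ms; spacing the same six measurements 20 s apart leaves every one of them at 30 ms.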

...

~paul

Best, Vaibhav

===================================
Vaibhav Bajpai
www.vaibhavbajpai.com
Room 91, Research I
School of Engineering and Sciences
Jacobs University Bremen, Germany
===================================

Paul Vlaar

1:13 p.m.

On 11/5/16 12:46, Bajpai, Vaibhav wrote:

...

This is known [a, b]. RTT timestamps are applied in user-space. As such, if a probe is loaded with multiple measurements, the user-space time stamping will be delayed. These delays will be more pronounced on v1/v2 probes.

Wow, this sounds pretty bad. The actual difference is really huge, look at these:

- as part of a 6-fold one-off UDM using the same probe set:

https://atlas.ripe.net/measurements/3793146/#!probes

- as a one-off UDM as well, but with manual spacing of 20 s between scheduling each of the set of 6 UDMs:

https://atlas.ripe.net/measurements/3793054/#!probes

This seems like a bug to me. The time between scheduling one-off measurements on the same probe influences the RTT? How can I trust the RTT at all now? What if others are using the same probe at the same time?

All scheduling, including one-offs, is centrally managed, no? Why can't the Atlas scheduler make sure that probes don't get loaded with more than one UDM at a time? I'm no longer trusting any of the RTTs now. Can this also happen with periodic UDMs?

...

v3 probes reduce the impact of user-space timestamping, so they are more suitable for latency measurements that require high precision. RIPE Atlas system tags can be used to separate probes by hardware version.

Aha. I'll have a try using just v3 probes.

Thank you for the PDFs, BTW; I will read those when I have more time.

~paul

Philip Homburg

2:07 p.m.

On 2016/05/11 13:13 , Paul Vlaar wrote:

...

This seems like a bug to me. The time between scheduling one-off measurements using the same probe is of influence on the RTT? How can I trust the RTT at all now? What if others are using the same probe at the same time?

Statistically speaking, the probes are idle, so the chance that your one-off arrives at the probe at the same time as one from someone else is extremely low.

The probe tries to execute 10 one-offs in parallel. But considering that on average the probes are idle, it may be effective to avoid parallelism and run one-offs sequentially.

Philip
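The choice Philip describes, up to 10 one-offs in parallel versus strictly sequential execution, comes down to a concurrency limit. The sketch below is a hypothetical Python dispatcher, not the probe's actual scheduler: `max_parallel` plays the role of that limit, and `asyncio.sleep` stands in for a real measurement.

```python
import asyncio


async def run_measurement(name, duration_s, sem, state):
    """Stand-in for one UDM: wait for a free slot, then 'measure'."""
    async with sem:  # blocks while max_parallel measurements are running
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        state["order"].append(name)
        await asyncio.sleep(duration_s)  # placeholder for real network work
        state["running"] -= 1


async def dispatch(names, max_parallel, duration_s=0.01):
    """Run all measurements with at most `max_parallel` active at a time."""
    sem = asyncio.Semaphore(max_parallel)
    state = {"running": 0, "peak": 0, "order": []}
    await asyncio.gather(*(run_measurement(n, duration_s, sem, state) for n in names))
    return state


if __name__ == "__main__":
    names = [f"udm{i}" for i in range(6)]
    print(asyncio.run(dispatch(names, max_parallel=10))["peak"])
    print(asyncio.run(dispatch(names, max_parallel=1))["peak"])
```

With `max_parallel=10` all six measurements overlap (peak concurrency 6); with `max_parallel=1` they run one after another (peak concurrency 1), trading a later finish for the last measurement against less contention on the probe.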

Bajpai, Vaibhav

2:51 p.m.

...

On 11 May 2016, at 13:13, Paul Vlaar <pvlaar@afilias.info> wrote:

On 11/5/16 12:46, Bajpai, Vaibhav wrote:

...
This is known [a, b]. RTT timestamps are applied in user-space. As such, if a probe is loaded with multiple measurements, the user-space time stamping will be delayed. These delays will be more pronounced on v1/v2 probes.

Wow, this sounds pretty bad. The actual result is really huge, look at the differences here:

- as part of a 6-fold one-off UDM using the same probe set:

https://atlas.ripe.net/measurements/3793146/#!probes

- as a one-off UDM as well, but with manual spacing of 20s in between scheduling the next of the set of 6 UDMs:

https://atlas.ripe.net/measurements/3793054/#!probes

This includes three v3 probes. Note how they show comparable latencies in both measurements, unlike the v1/v2 probes. Fig. 4 in [a] shows the probe ID ranges for each hardware revision.
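This observation can be checked per hardware group by comparing each probe's RTT in the parallel run with its RTT in the spaced run. The Python sketch below assumes you have already extracted one RTT per probe from each measurement and fetched each probe's tags; it further assumes the hardware revision shows up as a system tag named `system-v1`, `system-v2` or `system-v3`. Those tag names are an assumption to verify against the probe metadata; the ID ranges from Fig. 4 in [a] are an alternative way to group probes.

```python
from statistics import median

# Assumed system tag names; check them against the probe metadata you fetch.
HW_GROUP = {"system-v1": "v1/v2", "system-v2": "v1/v2", "system-v3": "v3"}


def inflation_by_hw(rtts, probe_tags):
    """Median RTT inflation (parallel minus spaced, in ms) per hardware group.

    rtts:       {probe_id: (rtt_parallel_ms, rtt_spaced_ms)}
    probe_tags: {probe_id: [tag, ...]}
    Probes without a recognised hardware tag land in the "unknown" group.
    """
    groups = {}
    for probe_id, (parallel_ms, spaced_ms) in rtts.items():
        tags = probe_tags.get(probe_id, [])
        group = next((HW_GROUP[t] for t in tags if t in HW_GROUP), "unknown")
        groups.setdefault(group, []).append(parallel_ms - spaced_ms)
    return {group: median(diffs) for group, diffs in groups.items()}
```

If the explanation above holds, the "v3" group should show a median inflation close to zero while the "v1/v2" group shows a large one.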

...

This seems like a bug to me. The time between scheduling one-off measurements using the same probe is of influence on the RTT? How can I trust the RTT at all now?

It’s generally not useful to draw any conclusions from one-off latency measurements. Run them for a longer period to get representative results, and then look at the distribution of RTTs across the measurement duration.
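For a longer-running measurement, a per-probe summary of the RTT distribution is more informative than any single sample. This Python sketch uses only the standard library `statistics` module; extracting the samples from the Atlas results is left out, and they are assumed to arrive as a plain list of RTTs in milliseconds. The median and percentiles are robust to a handful of inflated samples like the ones seen in the parallel one-offs.

```python
from statistics import median, quantiles


def rtt_summary(samples_ms):
    """Summarise RTT samples (ms) gathered over a longer measurement period."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two RTT samples")
    # 99 cut points: cuts[4] is the 5th percentile, cuts[94] the 95th.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {
        "n": len(samples_ms),
        "min": min(samples_ms),
        "p5": cuts[4],
        "median": median(samples_ms),
        "p95": cuts[94],
        "max": max(samples_ms),
    }
```

A single 1000 ms spike among otherwise steady 40 ms samples shows up in `max` but leaves the median at 40 ms.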

...

What if others are using the same probe at the same time?

RIPE Atlas does not provide this guarantee. You have to be prepared that your measurements are running in an uncontrolled setting.

...

All scheduling, including one-offs, is centrally managed, no? Why can't the Atlas scheduler make sure that probes don't get loaded with more than one UDM at a time? I'm no longer trusting any of the RTTs now. Can this also happen with periodic UDMs?

Careful vantage point selection and longitudinal measurements will give you reasonable RTT results.
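As an alternative to repeated one-offs, the same query can be run as a periodic measurement over a fixed probe set. The Python sketch below only builds a request body and sends nothing; the field names (`definitions`, `probes`, `is_oneoff`, `start_time`, `stop_time`, the `msm` probe-source type) are modeled on the RIPE Atlas v2 measurement-creation API as we understand it and should be checked against the official API documentation before use. `periodic_dns_spec`, the one-week duration, the hourly interval and the probe count are illustrative choices, not recommendations from this thread.

```python
import json


def periodic_dns_spec(target, query_argument, reuse_msm_id, n_probes,
                      start_time, days=7, interval_s=3600, af=4):
    """Hypothetical helper: request body for a periodic DNS SOA measurement
    that reuses the probe set of an earlier measurement.

    Field names are assumed from the RIPE Atlas v2 API; verify before use.
    start_time is a Unix timestamp in seconds.
    """
    return {
        "definitions": [{
            "type": "dns",
            "af": af,
            "target": target,                  # name server to query
            "query_class": "IN",
            "query_type": "SOA",
            "query_argument": query_argument,  # zone whose SOA is requested
            "interval": interval_s,            # seconds between runs
            "description": f"SOA {query_argument} @ {target}",
        }],
        # Reuse the probes of an earlier measurement (assumed "msm" source type).
        "probes": [{"type": "msm", "value": reuse_msm_id, "requested": n_probes}],
        "is_oneoff": False,
        "start_time": start_time,
        "stop_time": start_time + days * 86400,
    }


if __name__ == "__main__":
    spec = periodic_dns_spec("manus.authdns.ripe.net", "ripe.net",
                             reuse_msm_id=3793293, n_probes=10,
                             start_time=1462968000)
    print(json.dumps(spec, indent=2))
```

Running the query periodically over a week spreads samples across many probe states, so an occasional busy moment on a probe becomes an outlier in the distribution rather than the whole result.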

...

...
v3 probes reduce the impact of user-space timestamping, so they are more suitable for latency measurements that require high precision. RIPE Atlas system tags can be used to separate probes by hardware version.

Aha. I'll have a try using just v3 probes.

Thank you for the PDFs BTW, I will read those when I have more time.

~paul

[a] http://www.sigcomm.org/sites/default/files/ccr/papers/2015/July/0000000-0000...

===================================
Vaibhav Bajpai
www.vaibhavbajpai.com
Room 91, Research I
School of Engineering and Sciences
Jacobs University Bremen, Germany
===================================

Paul Vlaar

1:57 p.m.

Same thing here, I just did another test. I ran 7 UDMs in parallel for SOA ripe.net @ manus.authdns.ripe.net; the result of one of those:

https://atlas.ripe.net/measurements/3793292/#!probes

Compared to running another one on its own, without scheduling others in the same run:

https://atlas.ripe.net/measurements/3793293/#!probes

I'd call this a bug, or at least something to work around somehow, but I'm not sure how yet.

~paul

Daniel Karrenberg

2:06 p.m.

One-offs are not intended to be used like this. The emphasis with them is to schedule as quickly as possible *no matter what*. They are for quick look-see work, not for repeated measurements on the same set of probes. If you want to schedule that much from an identical set of probes, do not use one-offs.

Daniel

On 11.05.16 13:57 , Paul Vlaar wrote:

...

Same thing here, I just did another test. Ran 7 UDMs in parallel for SOA ripe.net @ manus.authdns.ripe.net, the result of one of those:

https://atlas.ripe.net/measurements/3793292/#!probes

Compared to running another one, but on its own w/o scheduling others in the same run:

https://atlas.ripe.net/measurements/3793293/#!probes

I'd call this a bug, or at least something to somehow work around, but not sure how, yet.

~paul


participants (4)

Bajpai, Vaibhav
Daniel Karrenberg
Paul Vlaar
Philip Homburg