packet loss seen by probes to labs.ripe.net? - ripe-atlas - mailman.ripe.net


Wilfried Woeber, UniVie/ACOnet

2 Nov 2011

5:19 p.m.

Dear Atlas Team,

it looks like many (or even most/all?) of the probes seem to "see" an increasing rate of packet loss to destination labs.ripe.net (on IPv4). The problem(?) seems to have started around Week 41 and has been getting worse since.

Could the Atlas Team have a look at that?

Thanks, Wilfried.


Robert Kisteleki

2 Nov

6:35 p.m.

On 2011.11.02. 17:19, Wilfried Woeber, UniVie/ACOnet wrote:

Dear Atlas Team,

it looks like many (or even most/all?) of the probes seem to "see" an increasing rate of packet loss to destination labs.ripe.net (on IPv4).

The problem(?) seems to have started around Week 41 and has been getting worse since.

Could the Atlas Team have a look at that?

Thanks, Wilfried.

Hi,

The preliminary answer is that there is ICMP rate limiting close to the Labs servers. The more probes execute ping measurements, the more often they trigger the limiting. Since the network grew continuously in the past weeks (and before), rising failure rates per probe can be expected.

I'll try to confirm this with people who know more about the Labs architecture. But if I'm right (I think I am), then Atlas is measuring precisely what it should... and as a by-product, also its own growth :-)

Regards, Robert
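The effect described above can be illustrated with a small simulation. This is only a sketch: it assumes a token-bucket limiter in front of the target, which the thread does not confirm, and `rate_per_s`, `burst`, `pings_per_probe_per_s` and the probe counts are made-up values, not the real settings near the Labs servers.

```python
import random


def simulate_loss(num_probes, rate_per_s=50.0, burst=50.0,
                  pings_per_probe_per_s=0.1, duration_s=600, seed=1):
    """Return the fraction of pings dropped by a token-bucket limiter.

    All parameters are hypothetical; they are not the real Labs settings.
    """
    rng = random.Random(seed)
    # Every probe pings at the same average rate, at random times.
    total = int(num_probes * pings_per_probe_per_s * duration_s)
    arrivals = sorted(rng.uniform(0, duration_s) for _ in range(total))

    tokens = burst
    last_t = 0.0
    dropped = 0
    for t in arrivals:
        # Refill tokens for the elapsed time, capped at the bucket size.
        tokens = min(burst, tokens + (t - last_t) * rate_per_s)
        last_t = t
        if tokens >= 1.0:
            tokens -= 1.0   # packet passes the limiter
        else:
            dropped += 1    # packet is rate limited
    return dropped / total if total else 0.0


if __name__ == "__main__":
    for n in (100, 500, 1000, 2000):
        print(n, "probes ->", round(simulate_loss(n), 3), "loss")
```

Because the limiter does not distinguish between probes, the dropped share is spread over all of them, so every probe sees its own loss rate rise as the network grows, even though its own behaviour has not changed.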

Daniel AJ Sokolov (lists)

3 Nov

5:03 a.m.

On 02.11.2011 14:35 wrote Robert Kisteleki:

The preliminary answer is that there is ICMP rate limiting close to the Labs servers. The more probes execute ping measurements, the more often they trigger the limiting. Since the network grew continuously in the past weeks (and before), rising failure rates per probe can be expected.

I'll try to confirm this with people who know more about the Labs architecture. But if I'm right (I think I am), then Atlas is measuring precisely what it should... and as a by-product, also its own growth :-)

So, in other words: The Atlas probes themselves are turning into a DDOS botnet?

BR Daniel AJ

Robert Kisteleki

4 Nov

12:05 p.m.

So, in other words: The Atlas probes themselves are turning into a DDOS botnet?

Daniel,

We are very conscious that we need to prevent RIPE Atlas from being used in a malicious way, including it being used for DDOS attacks. There are a number of features in the design that prevent that from happening:

- probes are low power compared to real bots
- the control infrastructure is reasonably secure against the expected threats
- user defined measurements can/will be rate limited per destination

The "built-in" measurements we are executing currently are rate limited by the rate of deployment: all probes do exactly the same measurements. However, if this becomes a problem, we can change the amount and distribution of these measurements.

Regards, Robert
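The thread does not say how per-destination rate limiting of user defined measurements is (or will be) implemented. As one possible shape, here is a sketch of a sliding-window limiter keyed by destination. The class name, the limits and the example addresses (taken from the documentation ranges) are all hypothetical and not RIPE Atlas code.

```python
from collections import defaultdict, deque


class PerDestinationLimiter:
    """Allow at most `max_events` measurements per destination per window.

    Hypothetical helper for illustration; not RIPE Atlas code.
    """

    def __init__(self, max_events, window_s):
        self.max_events = max_events
        self.window_s = window_s
        self._events = defaultdict(deque)  # destination -> start timestamps

    def allow(self, destination, now):
        """Return True if a measurement to `destination` may start at `now`."""
        q = self._events[destination]
        # Forget timestamps that have fallen out of the sliding window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.max_events:
            return False
        q.append(now)
        return True


if __name__ == "__main__":
    limiter = PerDestinationLimiter(max_events=2, window_s=10.0)
    requests = [(0, "192.0.2.1"), (1, "192.0.2.1"), (2, "192.0.2.1"),
                (3, "198.51.100.7"), (11, "192.0.2.1")]
    for t, dst in requests:
        print(t, dst, limiter.allow(dst, t))
```

Keying the state by destination means a burst of measurements towards one target does not use up the budget of any other target.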

Wilfried Woeber, UniVie/ACOnet

3 Nov

11:19 a.m.

Robert Kisteleki wrote: [...]

I'll try to confirm this with people who know more about the Labs architecture. But if I'm right (I think I am), then Atlas is measuring precisely what it should... and as a by-product, also its own growth :-)

Hi Robert, nice line of thought :-) Actually, from a purely technical point of view, I agree with your interpretation.

From a probe-herder's point of view, this behaviour sent me on a bounty hunt on our end, to find out why there is packet loss between our office network and the labs site :-)

Regards, Robert

I think both ends did learn a little by this :-)

Wilfried.

PS: please do not take away the Labs target (or another target dedicated to Atlas) "at the NCC"; I consider that a nice reference point for connectivity.

Philip Homburg

12:55 p.m.

On 11/3/11 11:19 , Wilfried Woeber, UniVie/ACOnet wrote:

Robert Kisteleki wrote: [...]

...
I'll try to confirm this with people who know more about the Labs architecture. But if I'm right (I think I am), then Atlas is measuring precisely what it should... and as a by-product, also its own growth :-)

Hi Robert, nice line of thought :-)

Actually, from a purely technical point of view, I agree with your interpretation.

...
From a probe-herder's point of view, this behaviour sent me on a bounty hunt on our end, to find out why there is packet loss between our office network and the labs site :-)

I checked only one probe (the one I have at home), but the traceroute for labs is quite telling. Though I'm surprised that it is the RIPE border router that is dropping the packets.

Robert Kisteleki

4 Nov

12:08 p.m.

The preliminary answer is that there is ICMP rate limiting close to the Labs servers. The more probes execute ping measurements, the more often they trigger the limiting. Since the network grew continuously in the past weeks (and before), rising failure rates per probe can be expected.

This has been confirmed by our IT team in the meantime.

Regards, Robert

Daniel AJ Sokolov (lists)

5 Nov

9:38 p.m.

New subject: Sudden increase in RTT to a.root-servers.net

Hi,

This is not urgent and probably not important.

I have noticed that with the turn of the month (31st -> 1st) the RTT from my probe (#1118) to a.root-servers.net has gone up a lot. It went from ~32 ms to ~115 ms (with a max of 494 ms).

I checked the statistics of a few other public probes ("randomly" picked) and could not find that pattern repeated there.

So what does this tell me? Has my ISP downgraded one of their uplinks?

There are only three other probes here in Canada and they are not public, so I can't check their numbers.

Maybe one of you has a few moments to enlighten me. I try to understand and learn from my probe's stats. :-)

TNX Daniel AJ

Paul Wouters

11:09 p.m.


On Sat, 5 Nov 2011, Daniel AJ Sokolov (lists) wrote:

This is not urgent and probably not important.

I have noticed that with the turn of the month (31st-> 1st) the RTT from my probe (#1118) to a.root-servers.net has gone up a lot. It went from ~32ms to ~115 ms (with a max of 494 ms).

I checked the statistics of a few other public probes ("randomly" picked) and could not find that pattern repeated there.

So what does this tell me? Has my ISP downgraded one of their uplinks?

There are only three other probes here in Canada and they are not public, so I can't check their numbers.

Mine should be public? probe ID 223 in downtown Toronto.

a.root-servers.net 198.41.0.4 86.274 ms / 86.633 ms / 87.010 ms 2011-11-05 21:59:19 UTC

http://zpm00.atlas.ripe.net/atlas/rrd.png?prb_id=223&msm_id=1009&graph=tiny_...

I'm using DSL with Teksavvy, and getting a-root from:

$ dig +short +norec @a.root-servers.net hostname.bind chaos txt
"ans13-lax2"

Paul

Daniel AJ Sokolov (lists)

11:19 p.m.


On 05.11.2011 19:09 wrote Paul Wouters:

...
There are only three other probes here in Canada and they are not public, so I can't check their numbers.

Mine should be public? probe ID 223 in downtown Toronto.

Ah, so there are four other probes in Canada. On the map you are so close to #1180 that I didn't see yours. :-)

a.root-servers.net 198.41.0.4 86.274 ms / 86.633 ms / 87.010 ms 2011-11-05 21:59:19 UTC

http://zpm00.atlas.ripe.net/atlas/rrd.png?prb_id=223&msm_id=1009&graph=tiny_...

You are showing an increase in RTT for a.root-servers.net since the turn of the month as well, though it is probably not as strong as mine. As you had some really high spikes, it is harder to see on your graphs. But the increase is there, from ~35 to ~77 ms.

I'm using DSL with Teksavvy, and getting a-root from:

$ dig +short +norec @a.root-servers.net hostname.bind chaos txt
"ans13-lax2"

"ansXX-lax2" for me as well - the XX is a two-digit number that changes with every dig.

BR Daniel AJ

Rene Wilhelm

6 Nov

5:49 p.m.


On 11/5/11 9:38 PM, Daniel AJ Sokolov (lists) wrote:

Hi,

This is not urgent and probably not important.

I have noticed that with the turn of the month (31st-> 1st) the RTT from my probe (#1118) to a.root-servers.net has gone up a lot. It went from ~32ms to ~115 ms (with a max of 494 ms).

I checked the statistics of a few other public probes ("randomly" picked) and could not find that pattern repeated there.

So what does this tell me? Has my ISP downgraded one of their uplinks?

Such sudden steps in RTT are usually an indication of a probe reaching a different, more distant instance of an anycasted server.

The Atlas RTT map (https://atlas.ripe.net/atlas/rtt_maps.html, selected measurement 'IPv4: a.root-servers.net') shows that all probes in the North Eastern US & Canada now have rather high round trip times to a.root-servers.net; 75ms is the lowest found. This is unusual: with global instances of the a root server in Ashburn, Virginia and New York [*] you would expect to see much lower RTTs on at least a good majority of these probes.

The public probe #338 (Culpeper, Virginia) sees ping times going up to 78ms at the same time yours increased to 115ms. [**]

So it's not your ISP downgrading the service. More likely it is something related to routing: somehow hosts in the east of the US now end up in the west of the US when talking to a.root-servers.net.

-- Rene

[*] http://www.root-servers.org/ lists six global sites for a.root-servers.net: two in the east and two in the west of the US, one in Germany and one in Hong Kong
[**] http://zpm00.atlas.ripe.net/atlas/rrd.png?prb_id=338&msm_id=1009&type=weekly...
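The "sudden step" pattern mentioned in this reply can also be looked for mechanically. The following is a minimal sketch, not an Atlas feature: it tries every split point of an RTT series and picks the one where the samples fit best around two separate medians. The function name, the thresholds and the sample values are assumptions; the samples are synthetic and only resemble the ~32 ms to ~115 ms jump (with one 494 ms spike) reported for probe #1118.

```python
from statistics import median


def find_rtt_step(rtts_ms, min_segment=3, min_jump_ms=20.0):
    """Return (index, median_before, median_after) of the best level shift.

    Tries every split point and keeps the one that minimises the summed
    absolute deviation from the median on each side. Returns None when the
    series is too short or the two medians differ by less than `min_jump_ms`.
    Medians keep a single spike from being mistaken for a step.
    """
    best = None
    for i in range(min_segment, len(rtts_ms) - min_segment + 1):
        left, right = rtts_ms[:i], rtts_ms[i:]
        m_left, m_right = median(left), median(right)
        cost = (sum(abs(x - m_left) for x in left)
                + sum(abs(x - m_right) for x in right))
        if best is None or cost < best[0]:
            best = (cost, i, m_left, m_right)
    if best is None or abs(best[3] - best[2]) < min_jump_ms:
        return None
    return best[1:]


if __name__ == "__main__":
    # Synthetic samples (not real Atlas data): a level shift plus one spike.
    samples = [32, 31, 33, 32, 34, 32, 31, 115, 116, 494, 114, 115, 117, 115]
    print(find_rtt_step(samples))
```

A step found this way only says that the RTT level changed; whether the cause is a different anycast instance still has to be checked separately, for example with the hostname.bind query shown earlier in the thread.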

There are only three other probes here in Canada and they are not public, so I can't check their numbers.

Maybe one of you has a few moments to enlighten me. I try to understand and learn from my probe's stats. :-)

TNX Daniel AJ
