Overuse of software probes
Hello all! Software probes are a great improvement for the Atlas environment; they have made deployments much easier and more scalable. One of my own probes is a software probe on a Raspberry Pi. However, my team and I have noticed that software probes are sometimes overused, which makes measurements more cumbersome to interpret. For example, we found 4 software probes in AWS Johannesburg [1] and 15 software probes in Hostinger Sao Paulo [2]. That's total overkill. These 15 probes in Hostinger Sao Paulo make up 21% of all connected probes in Brazil. All of them deliver practically the same results, so without additional data filtering (which is not easy) they can dramatically skew the final results for Brazil. With this I would like to open a discussion on how to handle this situation. Please share your thoughts on these questions:
* How can we standardize data filtering for such cases?
* How can we proactively and automatically detect overuse of probes?
* How can we encourage software probe hosts to avoid deployments in data centers that are already covered by Atlas?
* How can we give incentives to deploy probes in data centers without existing probes?
[1] 4 software probes inside AWS Johannesburg:
1000707 <https://atlas.ripe.net/frames/probes/1000707/>
1001397 <https://atlas.ripe.net/frames/probes/1001397/>
1002066 <https://atlas.ripe.net/frames/probes/1002066/>
1002632 <https://atlas.ripe.net/frames/probes/1002632/>
[2] 15 software probes inside Hostinger Sao Paulo:
1003554 <https://atlas.ripe.net/frames/probes/1003554/>
1003555 <https://atlas.ripe.net/frames/probes/1003555/>
1003556 <https://atlas.ripe.net/frames/probes/1003556/>
1003557 <https://atlas.ripe.net/frames/probes/1003557/>
1003558 <https://atlas.ripe.net/frames/probes/1003558/>
1003559 <https://atlas.ripe.net/frames/probes/1003559/>
1003560 <https://atlas.ripe.net/frames/probes/1003560/>
1003561 <https://atlas.ripe.net/frames/probes/1003561/>
1003562 <https://atlas.ripe.net/frames/probes/1003562/>
1003563 <https://atlas.ripe.net/frames/probes/1003563/>
1003564 <https://atlas.ripe.net/frames/probes/1003564/>
1003565 <https://atlas.ripe.net/frames/probes/1003565/>
1003566 <https://atlas.ripe.net/frames/probes/1003566/>
1003567 <https://atlas.ripe.net/frames/probes/1003567/>
1003568 <https://atlas.ripe.net/frames/probes/1003568/>
Regards,
Grzegorz Ponikierski
Of course I am wildly guessing here, but the 15 probes in Hostinger Sao Paulo actually look a bit fishy to me. Consecutive probe numbers, all created at roughly the same time, all in the same v4 /24 and v6 /64. To me this looks like an "Atlas credit mining farm" … so more a mis-use than an overuse. The 4 probes in Johannesburg are different. Same ASN but at least different /24s and different ages, so likely also different owners. Just my 2¢ Cheers -Andi
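Andi's heuristic (consecutive probe IDs that all share one IPv4 /24 and one IPv6 /64) is easy to check mechanically. The sketch below is a minimal illustration under assumptions, not a RIPE Atlas feature: probe records are assumed to be dicts with `id`, `address_v4` and `address_v6` keys, modelled on the probe JSON of the RIPE Atlas API (verify the field names against the API documentation), and the `min_size` threshold of 5 is an arbitrary, hypothetical choice. It ignores creation time, which Andi also mentions.

```python
import ipaddress
from collections import defaultdict


def prefix_key(probe):
    """Return (IPv4 /24, IPv6 /64) for a probe; either part may be None."""
    v4, v6 = probe.get("address_v4"), probe.get("address_v6")
    net4 = ipaddress.ip_network(f"{v4}/24", strict=False) if v4 else None
    net6 = ipaddress.ip_network(f"{v6}/64", strict=False) if v6 else None
    return net4, net6


def find_farm_candidates(probes, min_size=5):
    """Flag groups of >= min_size probes that share both prefixes and whose
    IDs form one consecutive run (hypothetical "farm" heuristic)."""
    groups = defaultdict(list)
    for probe in probes:
        key = prefix_key(probe)
        if key != (None, None):  # skip probes with no address at all
            groups[key].append(probe["id"])
    suspects = {}
    for key, ids in groups.items():
        ids.sort()
        consecutive = all(b - a == 1 for a, b in zip(ids, ids[1:]))
        if len(ids) >= min_size and consecutive:
            suspects[key] = ids
    return suspects
```

Run against the 15 Sao Paulo IDs from [2] paired with made-up documentation addresses (192.0.2.0/24, 2001:db8::/32), it reports a single group holding all 15 IDs, while probes spread over different /24s (as Andi describes for Johannesburg) are not grouped. Real addresses would have to come from the probe metadata, and a creation-time window could be added as a further condition.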
On 20:43 31/03, Andreas Härpfer wrote:
Of course I am wildly guessing here, but the 15 probes in Hostinger Sao Paulo actually look a bit fishy to me. Consecutive probe numbers, all created at roughly the same time, all in the same v4 /24 and v6 /64. To me this looks like an "Atlas credit mining farm" … so more a mis-use than an overuse.
So right there is the disincentive of having probes in the same ASN: they accumulate progressively fewer credits, until they reach zero. Hugo
There are a number of factors that can influence the performance and results of a software probe within the same ASN, especially when that ASN belongs to a last-mile network operator... I would say that the community is somewhat reliant upon people self-reporting the general street address and physical location of the probe correctly. Within the same ASN in a regional last-mile broadband network there might be probes on DOCSIS 3 last-mile technology, GPON, active Ethernet, in datacenter space/colocation, etc. My probe, which for a while was single-homed to a SpaceX Starlink v1 beta terminal, was in the same Google ASN as a bunch of terrestrial 10/100GbE fiber stuff, but its results were obviously quite different. On Thu, 31 Mar 2022 at 12:28, Hugo Salgado <hsalgado@nic.cl> wrote:
-- ripe-atlas mailing list ripe-atlas@ripe.net https://lists.ripe.net/mailman/listinfo/ripe-atlas
On 31. Mar 2022, at 21:28, Hugo Salgado <hsalgado@nic.cl> wrote:
So right there is the disincentive of having probes in the same ASN: they accumulate progressively fewer credits, until they reach zero.
If you also somehow take ASN size into account … 100 probes in the ASN of Deutsche Telekom make a lot more sense than 15 probes within the same /24. BTW, a hard limit of, e.g., one probe per /24 also sounds like a useful limitation to me. -Andi
On 31.03.2022 23:38, Andreas Härpfer wrote:
BTW, a hard limit of, e.g., one probe per /24 also sounds like a useful limitation to me.
I think ISPs utilizing CGNAT in multiple regions with exit IPs in the same /24 may create an issue. - Ilteris
BTW, a hard limit of, e.g., one probe per /24 also sounds like a useful limitation to me.
i have one hard and one soft on the same /24 lan so we can compare them. which would you like me to remove? as the elected legislators in my country are doing so scarily horribly, i am cheered that so many senior engineers are willing to fill the gap :) randy
Hello All,
I think we can separate out two issues here:
1. I believe we should keep incentivising (for lack of a better word) "benevolent" probe hosts, i.e. if someone would like to host a probe, either hw or sw, then they should be able to do so and enjoy the benefits. Having said this, with the hw probe distribution we prioritise yet-uncovered networks. This is much harder to do with sw probes...
2. We should dis-incentivise "farming", if that happens. The issue is that we can only recognise this once it happens. Coming up with hard policies to regulate this in advance is not easy, but one can come up with some guidelines on how to recognise such cases and what to do with them when they are found.
There's of course a benefit in having multiple probes in a network, and presumably the bigger that network is, the more probes it could have. Surely constructs like CGNATs create all kinds of corner cases... I'm very happy to see proposals on how to improve the built-in probe selection process. We'll evaluate these and of course implement the ones that make sense within the system. At the end of the day there will always be cases that can only be handled "manually", i.e. by the user selecting particular probes by whatever criteria they want to use.
Cheers,
Robert
On 2022-03-31 20:43, Andreas Härpfer wrote:
On Fri, 1 Apr 2022 at 02:43, Andreas Härpfer <ah@v6x.org> wrote:
Of course I am wildly guessing here, but the 15 probes in Hostinger Sao Paulo actually look a bit fishy to me. Consecutive probe numbers, all created at roughly the same time, all in the same v4 /24 and v6 /64. To me this looks like an "Atlas credit mining farm" … so more a mis-use than an overuse.
I am a bit confused. Why would someone want to mine Atlas credits? If you need a million, or ten, just ask, and people send you hundreds of millions in minutes. By the time I see the message, the demand has been oversubscribed, and I lose out on all the good karma of donating. Or is there a way to convert these credits into crypto tokens, in which case please do tell me. What is the problem that we are solving? -- Sanjeev Gupta +65 98551208 http://www.linkedin.com/in/ghane
On 4. Apr 2022, at 14:04, Sanjeev Gupta <ghane0@gmail.com> wrote:
I am a bit confused. Why would someone want to mine Atlas credits? If you need a million, or ten, just ask, and people send you hundreds of millions in minutes. By the time I see the message, the demand has been oversubscribed, and I lose out on all the good karma of donating.
[…] I totally agree. As I said, I was only guessing, but I couldn't come up with a better explanation why someone would want to have 15 probes within the same /24. If I am missing out on other cool things that can only be done with this kind of setup, I'd be happy to learn :-) -Andi
Hi, On Mon, Apr 04, 2022 at 02:30:19PM +0200, Andreas Härpfer wrote:
I totally agree. As I said, I was only guessing, but I couldn't come up with a better explanation why someone would want to have 15 probes within the same /24. If I am missing out on other cool things that can only be done with this kind of setup, I'd be happy to learn :-)
They might use the probes to do in-AS measurements... Historic anecdote: at some time, DECIX had multiple TTM boxes ("back then") to monitor their fabric across different locations in Frankfurt. So this might be something similar. Or not, maybe just a student competition "who can get more probes into Atlas" :-) Gert Doering -- NetMaster -- have you enabled IPv6 on something today...? SpaceNet AG Vorstand: Sebastian v. Bomhard, Michael Emmer Joseph-Dollinger-Bogen 14 Aufsichtsratsvors.: A. Grundner-Culemann D-80807 Muenchen HRB: 136055 (AG Muenchen) Tel: +49 (0)89/32356-444 USt-IdNr.: DE813185279
IMHO it’s most likely credit farming. I checked other Hostinger probes and they show a repeating pattern: 11-15 software probes deployed in the same subnet, behind the same router, in the same geolocation, with incrementing IPs and probe IDs (mass production of probes). They have done this in BR, US, NL, GB, LT, IN, SG and ID (8 countries, 11 cities) with 116 software probes. What do they achieve with that? They don’t have to ask for credits because they have 116 software probes generating tons of them. Alternatively, it could be some kind of internal monitoring, but the IPs used don’t support this guess. Could the RIPE Atlas team contact Hostinger and ask what they are doing with all these probes? Emile Aben, I would be happy to see code for detecting overly similar probes. It would save a lot of time spent on data filtering. Regards, Grzegorz From: Gert Doering <gert@space.net> Date: Monday 2022-04-04 at 15:51 To: Andreas Härpfer <ah@v6x.org> Cc: RIPE Atlas <ripe-atlas@ripe.net> Subject: Re: [atlas] Overuse of software probes
IMHO it’s most likely credit farming.
which is amusing, as a note on this list or nanog asking for credits usually requires a pickup truck.
Emile Aben, I would be happy to see code for detecting overly similar probes. It would save a lot of time spent on data filtering.
"Metis: Better Atlas Vantage Point Selection for Everyone" Malte Appel (Internet Initiative Japan), Emile Aben (RIPE NCC), and Romain Fontugne (Internet Initiative Japan) at TMA thursday randy
Looking forward to reading Emile’s paper, but in the meantime: Nick Kernan, a graduate student of mine, wrote a Python script for selecting a geographically diverse set of probes from a list of probes. The script implements a simple greedy algorithm, which we came up with after playing with several alternatives and which we found to work very well in practice. Nick’s script also has an option to add ASN diversity by specifying a limit, which the greedy selection attempts to respect, on the number of probes selected from the same ASN. See https://github.com/nicholaskernan/probe-filters. Feel free to use it (attribution would be appreciated). And the RIPE folks, if you find this generally useful, feel free to incorporate it into the API. —Misha
On Jun 27, 2022, at 4:02 PM, Randy Bush <randy@psg.com> wrote:
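Misha's description of a greedy, geographically diverse selection with an optional per-ASN limit can be illustrated with a farthest-point heuristic. This is a minimal sketch of that general idea under stated assumptions, not Nick Kernan's script (which may work differently; the repository linked in Misha's message is the real thing). Each probe is assumed to be a dict with hypothetical keys `id`, `asn` and `loc` (latitude, longitude in degrees); the RIPE Atlas API reports coordinates in its own format, so a conversion step would be needed. Unlike the "attempted" limit Misha mentions, the ASN cap here is hard.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def greedy_diverse(probes, k, max_per_asn=None):
    """Farthest-point greedy selection: seed with the first probe, then keep
    adding the candidate farthest from its nearest selected probe. Candidates
    whose ASN already reached max_per_asn are skipped (hard cap)."""
    if not probes or k <= 0:
        return []
    first = probes[0]
    selected = [first]
    per_asn = {first["asn"]: 1}
    remaining = {p["id"]: p for p in probes[1:]}
    # distance from every remaining candidate to its nearest selected probe
    nearest = {pid: haversine_km(p["loc"], first["loc"]) for pid, p in remaining.items()}
    while len(selected) < k and remaining:
        eligible = [pid for pid, p in remaining.items()
                    if max_per_asn is None or per_asn.get(p["asn"], 0) < max_per_asn]
        if not eligible:
            break
        pid = max(eligible, key=lambda i: nearest[i])
        chosen = remaining.pop(pid)
        selected.append(chosen)
        per_asn[chosen["asn"]] = per_asn.get(chosen["asn"], 0) + 1
        for other_id, other in remaining.items():
            nearest[other_id] = min(nearest[other_id],
                                    haversine_km(other["loc"], chosen["loc"]))
    return selected
```

Two co-located probes in the same ASN (the Hostinger pattern) end up at distance zero from each other, so the second one is picked last, or not at all once the ASN cap is reached.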
On 6/29/22 00:23, Michael Rabinovich wrote:
Looking forward to reading Emile’s paper, but in the meantime: Nick Kernan, a graduate student of mine, wrote a python script for selecting a geographically diverse set of probes from a list of probes.
The paper describes a similar approach, but using topological distances (e.g. AS path length, RTT). It is not perfect, but it is more useful than Atlas' worldwide probe selection. Results are updated weekly here: https://ihr.iijlab.net/ihr/en-us/metis/selection We've also extended this approach to find places where deploying new Atlas probes would add more diversity to Atlas: https://ihr.iijlab.net/ihr/en-us/metis/deployment The paper is now available: https://tma.ifip.org/2022/wp-content/uploads/sites/11/2022/06/tma2022-paper1... Thanks, Romain
Thanks Randy, Michael, Emile and Romain for your data. I will have to allocate some time to go through it. I think it’s also useful to explain my specific needs better. I usually do anycast measurements per country and I use all probes from the country (yes, I’m even able to use 1000+ probes from DE). I don’t really care about geographical distribution because I use all probes in the country anyway. My goal is to see whether my work is heading in the right direction and to catch all routing anomalies. That’s why my only concern is to filter out probes which are:
1. Definitely not in this country. I regularly report probes to the RIPE team that, for example, are assigned to DE but are actually located in GB. I usually catch them when I see that they are routed to an anycast node in another country and don’t see domestic hops in their traceroutes. Such probes get the system-geoloc-disputed tag, so other users can filter them out more easily.
2. Definite duplicates of already existing probes (in the same location, in the same AS, as in the case of Hostinger). They are an example that more data is not always good, because they skew cumulative results and lead to over-representation of a given location and ASN.
I hope that the data you shared will help me make this filtering easier. Regards, Grzegorz From: Romain Fontugne via ripe-atlas <ripe-atlas@ripe.net> Reply to: Romain Fontugne <romain@iij.ad.jp> Date: Thursday 2022-06-30 at 07:49 To: Michael Rabinovich <michael.rabinovich@case.edu>, Randy Bush <randy@psg.com> Cc: "Ponikierski, Grzegorz via ripe-atlas" <ripe-atlas@ripe.net>, Nicholas Kernan <nlk39@case.edu>, Emile Aben <emile.aben@ripe.net> Subject: Re: [atlas] Overuse of software probes
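Grzegorz's first criterion, spotting probes that are probably not in their registered country, combines two signals: the probe reaches an anycast node abroad, and its traceroute shows no domestic hop. A minimal sketch, assuming the anycast site country and per-hop countries have already been derived from measurement results (the `ProbeObservation` record is hypothetical, and hop geolocation is itself error-prone), might look like this. Flagged probes are candidates for a report to the RIPE Atlas team, not proof of mislocation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProbeObservation:
    probe_id: int
    claimed_country: str        # country code the probe is registered with
    anycast_site_country: str   # country of the anycast node that answered
    hop_countries: List[str] = field(default_factory=list)  # geolocated hops


def suspect_geolocation(obs: ProbeObservation) -> bool:
    """True when the probe reaches an anycast node abroad AND no traceroute
    hop geolocates to its claimed country."""
    routed_abroad = obs.anycast_site_country != obs.claimed_country
    no_domestic_hop = obs.claimed_country not in obs.hop_countries
    return routed_abroad and no_domestic_hop


def review_list(observations: List[ProbeObservation]) -> List[int]:
    """Probe IDs worth reporting for a manual geolocation check."""
    return sorted(o.probe_id for o in observations if suspect_geolocation(o))
```

Requiring both signals keeps probes that are merely routed abroad (a routing anomaly worth catching, not a geolocation error) off the review list.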
grzegorz: your work sounds fun, interesting, and useful. but your message pushed one of my minor buttons. just some food for thought, not a critique of any of these efforts. i really like them. maybe it is because i am more routing/forwarding oriented; but i think about topology more than i think of geography or political realms. e.g. for an example look at the ipv6 peering poolpah in asia. and one more level down, i am not entirely comfortable with AS topology. there are teensie local ASs, large global ASs, and many flavors in between. contrawise, when i discussed this with romain a while back, i had to admit that there are sooooo many small ASs that the law of large numbers is on his side. just food for thought. randy
Emile Aben, I would be happy to see code for detecting overly similar probes. It would save a lot of time spent on data filtering.
These are snapshots of data for probe similarity detection [1] in IPv4 and IPv6:
https://sg-pub.ripe.net/emile/probe-similarity/probe_similarity_ipv4-2022-06...
https://sg-pub.ripe.net/emile/probe-similarity/probe_similarity_ipv6-2022-06...
This calculates 3 similarity values between 0 and 1 (the last 3 values in the CSV). Pick the middle one if you don't care about the details. There are 54k probe pairs with similarity values over 0.5, 7.4k with values over 0.95, and 6.7k with values over 0.99. My gut feeling is that anything over 0.95 is likely very redundant for many types of measurements. There seems to be a cluster of about 400 probes that are very similar to each other, and a couple of smaller clusters too. Happy to work with you and others to see if we can make this into something that is operationally valuable.
kind regards,
Emile Aben
RIPE NCC
[1] Holterbach, Thomas, et al. "Measurement vantage point selection using a similarity metric." Proceedings of the Applied Networking Research Workshop. 2017. https://trac.ietf.org/trac/irtf/export/478/www/content/anrw/2017/anrw17-fina...
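Emile's snapshot files can be turned into clusters of near-duplicate probes. The following is a minimal sketch under assumptions: each CSV row is assumed to start with the two probe IDs and to end with the three similarity values (the full column layout is not described in the message, so check the header), the middle value is used as Emile suggests, and his 0.95 gut-feeling threshold is the default. Union-find yields single-linkage clusters, so two probes can land in one cluster through a chain of pairwise similarities.

```python
import csv
from collections import defaultdict


def similar_pairs(rows, threshold=0.95):
    """Yield (probe_a, probe_b) for rows whose middle similarity exceeds
    threshold. ASSUMED layout: first two columns are probe IDs, the last
    three columns are the similarity values."""
    for row in rows:
        try:
            a, b = int(row[0]), int(row[1])
            sim = float(row[-2])      # middle of the last three values
        except (ValueError, IndexError):
            continue                  # header or malformed row
        if sim > threshold:
            yield a, b


def clusters(pairs):
    """Single-linkage clusters via union-find, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


def cluster_file(path, threshold=0.95):
    """Convenience wrapper for a locally downloaded CSV snapshot."""
    with open(path, newline="") as f:
        return clusters(similar_pairs(csv.reader(f), threshold))
```

With a local copy of one of the snapshots (the archived URLs are truncated, so no file name is reproduced here), `cluster_file(path)[0]` would be the largest cluster; keeping one probe per cluster is one simple way to thin out redundant vantage points.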
On 31. 03. 22 20:15, Ponikierski, Grzegorz via ripe-atlas wrote:
Software probes are a great improvement for the Atlas environment; they have made deployments much easier and more scalable. One of my own probes is a software probe on a Raspberry Pi. However, my team and I have noticed that software probes are sometimes overused, which makes measurements more cumbersome to interpret. For example, we found 4 software probes in AWS Johannesburg [1] and 15 software probes in Hostinger Sao Paulo [2]. That's total overkill. These 15 probes in Hostinger Sao Paulo make up 21% of all connected probes in Brazil. All of them deliver practically the same results, so without additional data filtering (which is not easy) they can dramatically skew the final results for Brazil.
With this I would like to open a discussion on how to handle this situation. Please share your thoughts on these questions:
* How can we standardize data filtering for such cases?
I'm not sure what data filtering you have in mind, but I know for sure that richer _probe selection_ filters would help for my use cases. First, the current probe selection options I'm aware of are:
(web wizard)
- Geo name
- ASN # filter
- IP filter
- Probe # filter
(web "manual" selection)
- Type (mandatory)
- Area (mandatory)
- Number of probes (mandatory)
- Include tags
- Exclude tags
Proposal for new filter options:
- Spread selection evenly across geo locations, max. N probes per location
- Spread selection evenly across ASNs, max. N probes per ASN
- Spread selection across IP subnets, max. N probes per IP subnet
I imagine that these three should work as an intersection with the other filters (and with each other), i.e. it should be possible to specify:
- location = BR
- max 2 probes per ASN
- max 1 probe per subnet
Right now I'm trying to do that by manually selecting probe IDs when I need to, but obviously that does not scale. Thank you for considering this.
-- Petr Špaček @ Internet Systems Consortium
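Petr's proposed filters compose naturally in a single pass. A minimal sketch, assuming probe dicts with keys `id`, `country_code`, `asn_v4`, `address_v4` and `tags` (each tag a dict with a `slug`), modelled on the RIPE Atlas probe API output (verify the field names), and treating a /24 as the "subnet" by default. The lowest-ID-first admission order is an arbitrary choice; a geo-diverse or similarity-aware order could be swapped in.

```python
import ipaddress
from collections import Counter


def select_probes(probes, country, max_per_asn, max_per_subnet,
                  prefixlen=24, exclude_tags=()):
    """Intersect the proposed filters: keep probes in `country`, drop any
    carrying an excluded tag, then admit probes (lowest ID first) only while
    both the per-ASN and the per-subnet caps still hold."""
    per_asn, per_subnet = Counter(), Counter()
    chosen = []
    for p in sorted(probes, key=lambda p: p["id"]):
        if p.get("country_code") != country or not p.get("address_v4"):
            continue
        if set(exclude_tags) & {t["slug"] for t in p.get("tags", [])}:
            continue
        subnet = ipaddress.ip_network(f"{p['address_v4']}/{prefixlen}", strict=False)
        asn = p.get("asn_v4")
        if per_asn[asn] >= max_per_asn or per_subnet[subnet] >= max_per_subnet:
            continue
        per_asn[asn] += 1
        per_subnet[subnet] += 1
        chosen.append(p["id"])
    return chosen
```

For Petr's example (location = BR, max 2 probes per ASN, max 1 probe per subnet) the call would be `select_probes(probes, "BR", max_per_asn=2, max_per_subnet=1)`, optionally with `exclude_tags=["system-geoloc-disputed"]`.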
Hi Grzegorz, I've encountered similar problems and have done some research into potential solutions. One of them is calculating how similar probes are, and then using this in probe selection (i.e. use probes that are as dissimilar as possible). For your particular use case, the 15 software probes in Sao Paulo would likely be 100% similar to each other, which can be taken into account in probe selection (or in the data analysis if you have already taken measurements). I have some prototype code for this lying around; I will see if we can produce something useful out of it. We are currently also collaborating with some talented researchers on structurally looking into the bias of measurement infrastructures (see https://labs.ripe.net/author/pavlos_sermpezis/bias-in-internet-measurement-i... ). I hope some of the future results of that will also help with estimating and dealing with all kinds of measurement biases. kind regards, Emile Aben
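Emile's idea of choosing probes that are as dissimilar as possible can be sketched greedily. This is a minimal illustration under assumptions, not his prototype code: pairwise similarities are passed as a dict keyed by `frozenset` pairs (which could be built from similarity data such as the CSV snapshots), absent pairs count as fully dissimilar, and the seed is simply the first candidate.

```python
def pick_dissimilar(candidates, similarity, k):
    """Greedy selection: seed with the first candidate, then repeatedly add
    the probe whose highest similarity to any already-chosen probe is lowest.
    `similarity` maps frozenset({a, b}) -> value in [0, 1]; missing pairs
    count as 0 (fully dissimilar)."""
    def sim(a, b):
        return similarity.get(frozenset((a, b)), 0.0)

    if k <= 0 or not candidates:
        return []
    chosen = [candidates[0]]
    rest = list(candidates[1:])
    while rest and len(chosen) < k:
        best = min(rest, key=lambda c: max(sim(c, s) for s in chosen))
        chosen.append(best)
        rest.remove(best)
    return chosen
```

Under this scheme, a group of near-identical probes such as the 15 in Sao Paulo would likely contribute one member early on, with the others pushed to the end of the ranking.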
participants (13)
- Andreas Härpfer
- Emile Aben
- Eric Kuhnke
- Gert Doering
- Hugo Salgado
- İlteriş Yağıztegin Eroğlu
- Michael Rabinovich
- Petr Špaček
- Ponikierski, Grzegorz
- Randy Bush
- Robert Kisteleki
- Romain Fontugne
- Sanjeev Gupta