Hi Grzegorz,

I've encountered similar problems and have done some research into potential solutions. One of them is to calculate how similar probes are to each other and then use this in probe selection (i.e. use probes that are as dissimilar as possible). For your particular use case, the 15 software probes in Sao Paulo would likely be 100% similar to each other, which can be taken into account in probe selection (or in the data analysis, if you have already taken measurements).

I have some prototype code for this lying around; I will see if we can produce something useful out of it.

We are currently also collaborating with some talented researchers on structurally looking into the bias of measurement infrastructures (see https://labs.ripe.net/author/pavlos_sermpezis/bias-in-internet-measurement-i... ). I hope some of the future results of that will also help with estimating and dealing with all kinds of measurement biases.

Kind regards,
Emile Aben

On 2022-03-31 20:15, Ponikierski, Grzegorz via ripe-atlas wrote:
Hello all!
Software probes are a great improvement for the Atlas environment; they have made deployments much easier and more scalable. One of my probes is also a software probe, on a Raspberry Pi. However, my team and I have noticed that software probes are sometimes overused, which makes measurements more cumbersome to interpret. For example, we found that there are 4 software probes in AWS Johannesburg [1] and 15 software probes in Hostinger Sao Paulo [2]. That's total overkill. These 15 probes in Hostinger Sao Paulo are 21% of all connected probes in Brazil. All of them deliver practically the same results, so without additional data filtering (which is not easy) they can dramatically skew the final results for Brazil.
With this I would like to open a discussion on how to handle this situation. Please share your thoughts on these questions:
* How can we standardize data filtering for such cases?
* How can we proactively and automatically detect overuse of probes?
* How can we encourage software probe hosts to avoid deployments in data centers which are already covered by Atlas?
* How can we give incentives to deploy probes in data centers without existing probes?
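As a rough starting point for the first two questions, filtering and detection could both key on a per-site label. Below is a minimal sketch in Python, assuming each probe can be tagged with such a label (for example ASN plus city). The site labels and function names are illustrative only and not part of any Atlas API; the probe IDs are the ones listed in [1] and [2].

```python
from collections import defaultdict

# Probe IDs come from lists [1] and [2] in this message. The site labels are
# hypothetical grouping keys (for example, derived from ASN plus city).
PROBES = (
    [(pid, "aws-johannesburg") for pid in (1000707, 1001397, 1002066, 1002632)]
    + [(pid, "hostinger-sao-paulo") for pid in range(1003554, 1003569)]
)


def group_by_site(probes):
    """Map each site label to the sorted list of probe IDs seen there."""
    sites = defaultdict(list)
    for probe_id, site in probes:
        sites[site].append(probe_id)
    return {site: sorted(ids) for site, ids in sites.items()}


def overused_sites(probes, max_per_site=1):
    """Return the sites hosting more than max_per_site probes, with counts."""
    grouped = group_by_site(probes)
    return {
        site: len(ids) for site, ids in grouped.items() if len(ids) > max_per_site
    }


def one_probe_per_site(probes):
    """Keep the lowest probe ID per site as that site's single representative."""
    return sorted(ids[0] for ids in group_by_site(probes).values())


if __name__ == "__main__":
    print(overused_sites(PROBES, max_per_site=3))
    print(one_probe_per_site(PROBES))
```

Keeping one representative per site is the bluntest possible filter; weighting each probe by the inverse of its site's probe count would be a softer alternative that keeps all the data.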
[1] 4 software probes inside AWS Johannesburg:
1000707 <https://atlas.ripe.net/frames/probes/1000707/>
1001397 <https://atlas.ripe.net/frames/probes/1001397/>
1002066 <https://atlas.ripe.net/frames/probes/1002066/>
1002632 <https://atlas.ripe.net/frames/probes/1002632/>
[2] 15 software probes inside Hostinger Sao Paulo:
1003554 <https://atlas.ripe.net/frames/probes/1003554/>
1003555 <https://atlas.ripe.net/frames/probes/1003555/>
1003556 <https://atlas.ripe.net/frames/probes/1003556/>
1003557 <https://atlas.ripe.net/frames/probes/1003557/>
1003558 <https://atlas.ripe.net/frames/probes/1003558/>
1003559 <https://atlas.ripe.net/frames/probes/1003559/>
1003560 <https://atlas.ripe.net/frames/probes/1003560/>
1003561 <https://atlas.ripe.net/frames/probes/1003561/>
1003562 <https://atlas.ripe.net/frames/probes/1003562/>
1003563 <https://atlas.ripe.net/frames/probes/1003563/>
1003564 <https://atlas.ripe.net/frames/probes/1003564/>
1003565 <https://atlas.ripe.net/frames/probes/1003565/>
1003566 <https://atlas.ripe.net/frames/probes/1003566/>
1003567 <https://atlas.ripe.net/frames/probes/1003567/>
1003568 <https://atlas.ripe.net/frames/probes/1003568/>
Regards,
Grzegorz Ponikierski