On 31. 03. 22 20:15, Ponikierski, Grzegorz via ripe-atlas wrote:
Software probes are a great improvement to the Atlas environment and they have made deployments much easier and more scalable. One of my probes is also a software probe on a Raspberry Pi. However, my team and I noticed that software probes are sometimes overused and make measurements more cumbersome to interpret. For example, we found that there are 4 software probes in AWS Johannesburg [1] and 15 software probes in Hostinger Sao Paulo [2]. That is total overkill. These 15 probes in Hostinger Sao Paulo are 21% of all connected probes in Brazil. All of them deliver practically the same results, so without additional data filtering (which is not easy) they can dramatically skew the final results for Brazil.
With this I would like to open a discussion on how to handle this situation. Please share your thoughts on these questions:
* How can we standardize data filtering for such cases?
I'm not sure what data filtering you have in mind, but I know for sure that richer _probe selection_ filters would help for my use-cases.

First, the current probe selection options I'm aware of are:

(web wizard)
- Geo name
- ASN # filter
- IP filter
- Probe # filter

(web "manual" selection)
- Type (mandatory)
- Area (mandatory)
- Number of probes (mandatory)
- Include tags
- Exclude tags

Proposal for new filter options:
- Spread selection evenly across geo locations, max. N probes per location
- Spread selection evenly across ASNs, max. N probes per ASN
- Spread selection across IP subnets, max. N probes per IP subnet

I imagine that these three should work as an intersection with the other filters (and with each other). I.e. it should be possible to specify:
- location = BR
- max 2 probes per ASN
- max 1 probe per subnet

Right now I'm trying to do that by manually selecting probe IDs when I need to, but obviously that does not scale; a sketch of that manual workaround is below.

Thank you for considering this.
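For illustration, here is a minimal, untested Python sketch of the manual workaround. It assumes the public /api/v2/probes/ endpoint with its country_code and status filters and the asn_v4 / prefix_v4 fields; the announced prefix_v4 is used only as a rough stand-in for "IP subnet", and the parameter names are illustrative, not a finished tool.

#!/usr/bin/env python3
# Sketch: pick at most N connected probes per ASN and M per announced
# IPv4 prefix for one country, via the public RIPE Atlas probe API.
# Assumptions (mine, not from the original mail): the /api/v2/probes/
# endpoint, its country_code/status filters and the asn_v4/prefix_v4
# fields behave as currently documented.
import collections
import requests

API = "https://atlas.ripe.net/api/v2/probes/"

def select_probes(country="BR", max_per_asn=2, max_per_prefix=1):
    per_asn = collections.Counter()
    per_prefix = collections.Counter()
    selected = []

    url = API
    params = {"country_code": country, "status": 1, "page_size": 500}
    while url:
        reply = requests.get(url, params=params, timeout=30).json()
        params = None  # the "next" URL already carries the query string
        for probe in reply["results"]:
            asn = probe.get("asn_v4")
            prefix = probe.get("prefix_v4")
            if asn is None or prefix is None:
                continue
            if per_asn[asn] >= max_per_asn or per_prefix[prefix] >= max_per_prefix:
                continue
            per_asn[asn] += 1
            per_prefix[prefix] += 1
            selected.append(probe["id"])
        url = reply.get("next")
    return selected

if __name__ == "__main__":
    # Comma-separated probe IDs, ready to paste into a measurement's
    # explicit probe selection.
    print(",".join(str(i) for i in select_probes()))

The output is just a comma-separated list of probe IDs to paste into a measurement's probe selection, which is exactly the part that does not scale and that the proposed filters would replace.

--
Petr Špaček @ Internet Systems Consortium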