On 31. 03. 22 20:15, Ponikierski, Grzegorz via ripe-atlas wrote:
Software probes are a great improvement to the Atlas environment and they have made deployments much easier and more scalable. One of my probes is also a software probe on a Raspberry Pi. However, my team and I noticed that software probes are sometimes overused and make measurements more cumbersome to interpret. For example, we found that there are 4 software probes in AWS Johannesburg [1] and 15 software probes in Hostinger Sao Paulo [2]. That is total overkill. These 15 probes in Hostinger Sao Paulo are 21% of all connected probes in Brazil. All of them deliver practically the same results, so without additional data filtering (which is not easy) they can dramatically skew the final results for Brazil.
With this I would like to open a discussion on how to handle this situation. Please share your thoughts on these questions:
* How can we standardize data filtering for such cases?
I'm not sure what data filtering you have in mind, but I know for sure that richer _probe selection_ filters would help for my use-cases.

First, the current probe selection options I'm aware of are:

(web wizard)
- Geo name
- ASN # filter
- IP filter
- Probe # filter

(web "manual" selection)
- Type (mandatory)
- Area (mandatory)
- Number of probes (mandatory)
- Include tags
- Exclude tags

Proposal for new filter options:
- Spread selection evenly across geo locations, max. N probes per location
- Spread selection evenly across ASNs, max. N probes per ASN
- Spread selection across IP subnets, max. N probes per IP subnet

I imagine that these three should work as an intersection with the other filters (and with each other). I.e. it should be possible to specify:
- location = BR
- max 2 probes per ASN
- max 1 probe per subnet

Right now I'm trying to do that by manually selecting probe IDs when I need to, but obviously that does not scale; a sketch of that manual workaround is below.

Thank you for considering this.
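For illustration, here is a minimal, untested Python sketch of the manual workaround. It assumes the public /api/v2/probes/ endpoint with its country_code and status filters and the asn_v4 / prefix_v4 fields; the announced prefix_v4 is used only as a rough stand-in for "IP subnet", and the parameter names are illustrative, not a finished tool.

#!/usr/bin/env python3
# Sketch: pick at most N connected probes per ASN and M per announced
# IPv4 prefix for one country, via the public RIPE Atlas probe API.
# Assumptions (mine, not from the original mail): the /api/v2/probes/
# endpoint, its country_code/status filters and the asn_v4/prefix_v4
# fields behave as currently documented.
import collections
import requests

API = "https://atlas.ripe.net/api/v2/probes/"

def select_probes(country="BR", max_per_asn=2, max_per_prefix=1):
    per_asn = collections.Counter()
    per_prefix = collections.Counter()
    selected = []

    url = API
    params = {"country_code": country, "status": 1, "page_size": 500}
    while url:
        reply = requests.get(url, params=params, timeout=30).json()
        params = None  # the "next" URL already carries the query string
        for probe in reply["results"]:
            asn = probe.get("asn_v4")
            prefix = probe.get("prefix_v4")
            if asn is None or prefix is None:
                continue
            if per_asn[asn] >= max_per_asn or per_prefix[prefix] >= max_per_prefix:
                continue
            per_asn[asn] += 1
            per_prefix[prefix] += 1
            selected.append(probe["id"])
        url = reply.get("next")
    return selected

if __name__ == "__main__":
    # Comma-separated probe IDs, ready to paste into a measurement's
    # explicit probe selection.
    print(",".join(str(i) for i in select_probes()))

The output is just a comma-separated list of probe IDs to paste into a measurement's probe selection, which is exactly the part that does not scale and that the proposed filters would replace.

--
Petr Špaček @ Internet Systems Consortium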