Re: [atlas] Overuse of software probes

29 Jun 2022

      ...
Emile Aben, I would be happy to see code for detecting overly similar 
probes. It would save a lot of time spent on data filtering.
These are snapshots of data for probe similarity detection [1] in IPv4 
and IPv6:

https://sg-pub.ripe.net/emile/probe-similarity/probe_similarity_ipv4-2022-06...
https://sg-pub.ripe.net/emile/probe-similarity/probe_similarity_ipv6-2022-06...

This calculates 3 similarity values between 0 and 1 (the last 3 values 
in the csv). Pick the middle one if you don't care about the details.
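
A minimal sketch of reading such a file and picking the middle similarity 
value. The only thing taken from the description above is that the last 3 
fields are the similarity values. That the first two fields are the probe 
IDs of the pair is an assumption, and the sample rows and the name 
read_pairs are hypothetical.

```python
import csv
import io

# Hypothetical sample rows, not taken from the real snapshots.
# Assumption: fields 1 and 2 are the probe IDs of the pair;
# the last 3 fields are the similarity values (per the description above).
SAMPLE = """1001,1002,0.97,0.96,0.99
1001,1003,0.40,0.35,0.52
"""

def read_pairs(handle):
    """Yield (probe_a, probe_b, similarity), using the middle of the last 3 values."""
    for row in csv.reader(handle):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        try:
            first, middle, last = (float(v) for v in row[-3:])
        except ValueError:
            continue  # skip a header line, if the file has one
        yield row[0], row[1], middle

pairs = list(read_pairs(io.StringIO(SAMPLE)))
for probe_a, probe_b, similarity in pairs:
    print(probe_a, probe_b, similarity)
```

For a downloaded snapshot, the same function can be given an open file 
handle instead of the in-memory sample.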

There are 54k probe pairs with a similarity value over 0.5,
7.4k with a value over 0.95, and
6.7k with a value over 0.99.

My gut feeling is that anything over 0.95 is likely very redundant for 
many types of measurements.
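
One way the 0.95 threshold could be used for data filtering is a greedy 
pass that keeps a probe only if it is not too similar to a probe already 
kept. This is a sketch under assumptions, not an existing RIPE Atlas 
feature: the function name drop_redundant, the (probe_a, probe_b, 
similarity) tuple layout and the sample values are all hypothetical.

```python
def drop_redundant(probes, pairs, threshold=0.95):
    """Greedily keep probes, skipping any that is too similar to one already kept.

    probes: probe IDs in order of preference (earlier ones win).
    pairs:  iterable of (probe_a, probe_b, similarity) tuples (assumed layout).
    """
    # Build a lookup of "too similar" neighbours for each probe.
    similar = {}
    for a, b, sim in pairs:
        if sim > threshold:
            similar.setdefault(a, set()).add(b)
            similar.setdefault(b, set()).add(a)

    kept = []
    kept_set = set()
    for probe in probes:
        if similar.get(probe, set()) & kept_set:
            continue  # redundant with a probe we already kept
        kept.append(probe)
        kept_set.add(probe)
    return kept

# Hypothetical similarity values, for illustration only.
example_pairs = [("1", "2", 0.99), ("2", "3", 0.96), ("3", "4", 0.50)]
selected = drop_redundant(["1", "2", "3", "4"], example_pairs)
print(selected)
```

The result depends on the order of the probe list, so it makes sense to 
put the probes you most want to keep first.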

There seems to be a cluster of about 400 probes that are very similar to 
each other, and a couple of smaller clusters too.
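
Clusters like these can be approximated by treating every pair above a 
threshold as an edge and taking the connected components. The sketch 
below does that with a small union-find. It is not necessarily how the 
clusters mentioned above were found, and the function name, tuple layout 
and sample values are hypothetical.

```python
from collections import defaultdict

def similarity_clusters(pairs, threshold=0.95):
    """Group probes into connected components of the 'similar' graph.

    pairs: iterable of (probe_a, probe_b, similarity) tuples (assumed layout).
    Returns a list of sets of probe IDs, largest cluster first.
    """
    parent = {}

    def find(x):
        # Find the representative of x, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, sim in pairs:
        if sim > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(set)
    for probe in list(parent):
        groups[find(probe)].add(probe)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical similarity values, for illustration only.
example_pairs = [
    ("1", "2", 0.99),
    ("2", "3", 0.97),
    ("4", "5", 0.96),
    ("6", "7", 0.60),
]
clusters = similarity_clusters(example_pairs)
print(clusters)
```

Note that this is single-linkage grouping: two probes can end up in the 
same cluster through a chain of similar pairs even if their own pairwise 
similarity is below the threshold.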

Happy to work with you and others to see if we can make this into 
something that is operationally valuable.

kind regards,
Emile Aben
RIPE NCC

[1] Holterbach, Thomas, et al. "Measurement vantage point selection 
using a similarity metric." Proceedings of the Applied Networking 
Research Workshop. 2017.
https://trac.ietf.org/trac/irtf/export/478/www/content/anrw/2017/anrw17-fina...