Inconsistencies in anchor API and measurements

15 Nov 2021

Hi everyone,

Sorry in advance for the long mail.
tl;dr: The anchor API and UI give inconsistent results. Some anchor mesh
measurements could be fixed, some might target non-anchors. I am not sure what
to do about it.

I am currently working a lot with RIPE Atlas and recently wanted to use the
anchors and their mesh measurements in particular. I wanted to answer two simple
queries:

   1. Get a list of all active anchors
   2. Get a list of all active anchor mesh and probe measurements (traceroute,
      for my particular use case)

However, while trying to answer these queries, I stumbled upon quite a few
inconsistencies, depending on which interface / API is used. As far as I can
tell, there are four ways in which one could technically answer query 1:

   1. Look at the website: https://atlas.ripe.net/anchors/list/
      -> 840 results
   2. Query the /anchors API with attribute is_disabled = false:
      https://atlas.ripe.net/api/v2/anchors/?is_disabled=false
      -> 844 results
   3. Query the /probes API with attributes is_anchor = true and status = 1
      (Connected):
      https://atlas.ripe.net/api/v2/probes/?is_anchor=True&status=1
      -> 729 results
   4. Query the /probes API with attributes tags = system-anchor and status = 1
      (Connected):
      https://atlas.ripe.net/api/v2/probes/?tags=system-anchor&status=1
      -> 729 results
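
For anyone who wants to repeat the API-based counts (methods 2 to 4), here is a
minimal Python sketch. It assumes that the v2 API returns a paginated JSON
object with a top-level `count` field. The helper names (`build_url`,
`fetch_json`, `count_results`) are made up for illustration and are not part of
any RIPE Atlas tooling.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://atlas.ripe.net/api/v2"

# The three API queries from the list above (methods 2, 3 and 4).
QUERIES = {
    "anchors, is_disabled=false": ("anchors", {"is_disabled": "false"}),
    "probes, is_anchor=True, status=1": ("probes", {"is_anchor": "True", "status": 1}),
    "probes, tags=system-anchor, status=1": ("probes", {"tags": "system-anchor", "status": 1}),
}


def build_url(endpoint, params):
    """Build the query URL for one API endpoint."""
    return f"{API_BASE}/{endpoint}/?{urlencode(params)}"


def fetch_json(url):
    """Download and decode one JSON response (performs a network request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def count_results(endpoint, params, fetch=fetch_json):
    """Return the total number of matches.

    Assumption: the response carries a top-level "count" field.
    """
    return fetch(build_url(endpoint, params))["count"]


def count_all(fetch=fetch_json):
    """Run every query in QUERIES and return {label: count}."""
    return {label: count_results(endpoint, params, fetch)
            for label, (endpoint, params) in QUERIES.items()}
```

Calling `count_all()` runs the three queries against the live API. The numbers
will differ from the ones above, since anchors connect and disconnect over time.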

Methods 3 and 4 are actually consistent!

The main discrepancy is between the /anchors and /probes APIs: 116 anchors that
are listed as ‘active’ on the webpage and/or in the /anchors API are inactive
(97 abandoned; 19 disconnected at the time of writing) according to their
respective /probes entries.
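
The cross-check behind these numbers can be sketched roughly as follows. This
is only an illustration, under the assumption that each /anchors entry has a
`probe` field holding the ID of its probe, and that each /probes entry has an
`id` and a `status` object with a `name`. The function name is hypothetical,
and the input lists are expected to be downloaded already.

```python
from collections import Counter


def anchor_status_breakdown(anchors, probes):
    """Count anchors by the status of their corresponding probe entry.

    anchors: list of dicts from /anchors (assumed to contain "probe").
    probes:  list of dicts from /probes (assumed to contain "id" and
             "status": {"name": ...}).
    Anchors without a matching probe entry are counted as "Unknown".
    """
    status_by_probe = {p["id"]: p["status"]["name"] for p in probes}
    return Counter(status_by_probe.get(a["probe"], "Unknown") for a in anchors)
```

Feeding it the non-disabled anchors and all anchor probes gives one count per
status name, which makes abandoned and disconnected anchors easy to spot.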

I understand that disconnects might be temporary, but some anchors seem to have
been inactive for years (at least according to their status) and are still
listed as active.

I have attached a text file with some notes that go deeper into the differences,
but might be hard to read.

For query 2, I faced a similar situation:

   1. Look at the website: https://atlas.ripe.net/anchors/list/full/
       Anchoring Mesh IPv4:  840
       Anchoring Mesh IPv6:  739
     Anchoring Probes IPv4:  840
     Anchoring Probes IPv6:  705
                     Total: 3124
   2. Query the /anchor-measurements API
       Anchoring Mesh IPv4:  849
       Anchoring Mesh IPv6:  743
     Anchoring Probes IPv4:  902
     Anchoring Probes IPv6:  753
                     Total: 3247
   3. Query the /measurements API with attributes status = 2 (Ongoing),
      type = traceroute and corresponding attributes:
      Anchoring Mesh IPv4: af = 4; tags=anchoring,mesh
      Anchoring Mesh IPv6: af = 6; tags=anchoring,mesh
      Anchoring Probes IPv4: af = 4; tags=anchoring,probes
      Anchoring Probes IPv6: af = 6; tags=anchoring,probes
      For example:
        https://atlas.ripe.net/api/v2/measurements/?status=2&type=traceroute&af=6&ta...
       Anchoring Mesh IPv4: 1026
       Anchoring Mesh IPv6:  902
     Anchoring Probes IPv4:  668
     Anchoring Probes IPv6:  619
                     Total: 3215
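
The four /measurements queries of method 3 can be generated from the attribute
list above, for example like this. It is a sketch only: the example URL in this
mail is cut off, so passing the tags as a comma-separated value is taken from
the attribute list and is an assumption about the exact query syntax.
`category_url` is a made-up helper name.

```python
from urllib.parse import urlencode

MEASUREMENTS_URL = "https://atlas.ripe.net/api/v2/measurements/"

# (address family, tags) per category, as listed under method 3.
CATEGORIES = {
    "Anchoring Mesh IPv4": (4, "anchoring,mesh"),
    "Anchoring Mesh IPv6": (6, "anchoring,mesh"),
    "Anchoring Probes IPv4": (4, "anchoring,probes"),
    "Anchoring Probes IPv6": (6, "anchoring,probes"),
}


def category_url(af, tags):
    """Build the query URL for ongoing traceroute measurements.

    Assumption: tags are passed as one comma-separated value, so the comma
    is kept unescaped.
    """
    params = {"status": 2, "type": "traceroute", "af": af, "tags": tags}
    return MEASUREMENTS_URL + "?" + urlencode(params, safe=",")


def all_category_urls():
    """Return {category name: query URL} for the four categories."""
    return {name: category_url(af, tags) for name, (af, tags) in CATEGORIES.items()}
```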

These results are even more mixed:

   - Tags can be inconsistent: Some measurements have none, some have the
     ‘probes’ or ‘mesh’ tag but lack the ‘anchoring’ tag.
   - Some anchors have multiple measurements (especially probes measurements),
     most of which are actually run by the same set of probes, i.e., they are
     duplicates.
   - Which measurements are contained in which of the three result sets varies
     a lot; maybe I should draw a Venn diagram :)
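
To make the tag problem from the first point concrete, here is one possible way
to sort measurements by their tags. It assumes each measurement entry exposes
its tags as a list of strings under `tags`. The category labels and function
names are invented for this sketch.

```python
from collections import Counter

KIND_TAGS = {"mesh", "probes"}


def classify_tags(tags):
    """Classify a measurement's tag list.

    "complete":          has 'anchoring' plus 'mesh' or 'probes'
    "missing-anchoring": has 'mesh' or 'probes' but no 'anchoring'
    "missing-kind":      has 'anchoring' but neither 'mesh' nor 'probes'
    "untagged":          has none of the three tags
    """
    tags = set(tags)
    has_kind = bool(tags & KIND_TAGS)
    if "anchoring" in tags:
        return "complete" if has_kind else "missing-kind"
    return "missing-anchoring" if has_kind else "untagged"


def summarize(measurements):
    """Count measurements per tag category (assumes a "tags" list per entry)."""
    return Counter(classify_tags(m.get("tags", [])) for m in measurements)
```

A measurement in the "missing-anchoring" or "untagged" group is not returned by
a tags=anchoring,... query, which may explain part of the differences between
the three result sets.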

Finally, I looked at the consistency of the IP addresses of the /anchors API
(ip_v4 and ip_v6), the /probes API (address_v4, address_v6), the DNS result for
the FQDN of the anchors, and the target IP of the mesh/probes measurements.

I noticed these problems because our lab (IIJ) also operates an anchor (probe
6425 [0]). We updated its IP address some time ago, but the mesh measurement
does not actually reach us, because it still targets the old IP.

I attached a CSV that includes the raw data (of measurements with some form of
problem), but basically there are 93 measurements from connected anchors that
fail, of which 68 (from 29 anchors) could work if the measurement targeted the
correct IP. These measurements have matching anchor/probe IPs and DNS records,
so I do not know why the measurement target is stale. There are some additional
measurements that could work, but it is unclear what the intended ‘correct’ IP
is.
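
The kind of sanity check described here could look roughly like the sketch
below. It compares the four addresses for one address family: the anchor IP
(ip_v4 / ip_v6), the probe IP (address_v4 / address_v6), the DNS result for the
FQDN, and the measurement target. The category names are invented for
illustration and are not necessarily the ones used in the attached CSV.

```python
import ipaddress


def normalize(addr):
    """Parse an address string so that equivalent notations compare equal."""
    return ipaddress.ip_address(addr) if addr else None


def classify_target(anchor_ip, probe_ip, dns_ip, target_ip):
    """Compare a measurement target against the three address sources.

    "consistent":          all sources agree and the target matches them
    "stale-target":        all sources agree, but the target differs
    "target-matches-some": sources disagree; target equals one of them
    "target-matches-none": sources disagree; target equals none of them
    "incomplete":          at least one address is missing
    """
    target = normalize(target_ip)
    sources = {normalize(a) for a in (anchor_ip, probe_ip, dns_ip)}
    if target is None or None in sources:
        return "incomplete"
    if len(sources) == 1:
        return "consistent" if target in sources else "stale-target"
    return "target-matches-some" if target in sources else "target-matches-none"
```

Parsing with the `ipaddress` module avoids false mismatches between different
spellings of the same IPv6 address.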

On that note, there are 48 measurements that ‘work’, i.e., they get a response
from the target, but it is not clear if the target is the intended receiver:
   - 8 target abandoned anchors
   - 18 have different probe and anchor IPs and target one of them
   - 21 have the same probe and anchor IP but target something else

Again, I am sorry for this long mail. I understand that RIPE Atlas is a huge
project that has grown over time, so it might be hard to keep some things
synchronized, and some other things might not be easily decidable (e.g., when to
mark an anchor as inactive).
However, I think especially the IP address of an anchor in the /anchors and
/probes APIs, in the DNS entry, and the target of the mesh/probes measurements
need to be consistent. Currently some mesh measurements might target an entirely
different machine.

I wanted to bring some attention to this, but I am not sure what else I can do
as a user. I don’t want to complain too much :)
For now I will just use all data sources as input and apply some sanity checks.

Best,
Malte

P.S.: Some feedback on how we can get the measurement for our anchor to actually
target our anchor would be nice, though.

[0] https://atlas.ripe.net/probes/6425

Malte Appel
