Planned changes to "measurement archive"
Dear RIPE Atlas users,

You may be aware that we publish "measurement metadata dumps" on "FTP" (https://ftp.ripe.net/ripe/atlas/measurements/) every week. These are meant to make the complete measurement metadata downloadable for those who are interested.

Even though there is still a point in publishing our metadata in a downloadable format, the current method is not sustainable: as defined, the dump will grow forever, which is not realistic.

Currently *every* dump contains *all* the measurements we have *ever* had in the system. We plan to change that to "everything that's running now + everything that was running in the last two weeks". This set currently has about 200k items. Even if we published it every day instead of once a week, we would still produce only 10% of the output we do today, by virtue of eliminating 90% of the redundancy and repeated work. The syntax and semantics of the content will not change.

We will keep the existing dumps as they are, mostly because there's no reason to remove them.

What this means for users is:
* if you care about "current / recent" measurements, you can just use the latest dump, i.e. the one from yesterday
* if you really need all (or a large part of) the history, you can fetch one dump per 2 weeks or so and combine them
* you can still use the API for random, finite searches

We plan to implement the above change just after RIPE 81 (i.e. early November). Please be aware of this change if you rely on these dumps.

Regards,
Robert
Dear all, This change has been implemented - the measurement archive is created daily now, with reduced content, as described below. Regards, Robert On 2020-09-09 11:48, Robert Kisteleki wrote:
Dear RIPE Atlas users,
You may be aware that we publish "measurement metadata dumps" on "FTP" (https://ftp.ripe.net/ripe/atlas/measurements/) every week. These are meant to make the complete measurement metadata downloadable for those who are interested.
Even though there is still a point in publishing our metadata in a downloadable format, the current method is not sustainable: as defined, the dump will grow forever, which is not realistic.
Currently *every* dump contains *all* the measurements we have *ever* had in the system. We plan to change that to "everything that's running now + everything that was running in the last two weeks". This set currently has about 200k items. Even if we published it every day instead of once a week, we would still produce only 10% of the output we do today, by virtue of eliminating 90% of the redundancy and repeated work. The syntax and semantics of the content will not change.
We will keep the existing dumps as they are, mostly because there's no reason to remove them.
What this means for users is:
* if you care about "current / recent" measurements, you can just use the latest dump, i.e. the one from yesterday
* if you really need all (or a large part of) the history, you can fetch one dump per 2 weeks or so and combine them
* you can still use the API for random, finite searches
We plan to implement the above change just after RIPE 81 (i.e. early November). Please be aware of this change if you rely on these dumps.
Regards, Robert
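For the "fetch one dump per 2 weeks or so and combine them" case, here is a minimal sketch of how the combining step might look. It is not an official tool. It assumes each dump is a text file with one JSON object per line, and that each object carries a unique measurement ID under a key called `msm_id`; both the layout and the key name are assumptions to check against an actual dump. The function name `merge_dumps` is hypothetical.

```python
import json
from typing import Dict, Iterable


def merge_dumps(dumps: Iterable[Iterable[str]], id_key: str = "msm_id") -> Dict[int, dict]:
    """Merge several dumps into one mapping of measurement ID -> metadata.

    `dumps` must be ordered oldest first. Each dump is an iterable of text
    lines, one JSON object per line (assumed format). `id_key` is the
    assumed name of the measurement ID field.
    """
    merged: Dict[int, dict] = {}
    for lines in dumps:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # A measurement that appears in several dumps keeps the
            # metadata from the most recent dump that contains it.
            merged[record[id_key]] = record
    return merged
```

Because consecutive dumps overlap (each covers the last two weeks), the same measurement can show up more than once; keying on the ID removes those repeats. If the files turn out to be bz2-compressed, each one could be opened with `bz2.open(path, "rt")` from the standard library and passed in as one of the `dumps`.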