Hello everyone,
Apologies in advance for the long post!
I'm a long-time RIS data user, and I have a couple of suggestions related to the RIS data retention topic that Robert presented yesterday.
The first is about the usefulness of keeping multiple daily snapshots of peer RIBs from decades ago.
I agree that having three daily snapshots is useful for taking a quick look at routing tables and is very simple to use. However, I would like to point out that it is possible to recreate the RIB of each peer at any time by starting from any RIB snapshot and applying the UPDATE files collected by RIS between the snapshot creation time and the desired time. For example, if I want to see the RIB status of rrc00 at 04:00 UTC, I can take the RIB snapshot taken at midnight, evolve it with all the UPDATE files from midnight to 04:00 UTC and enjoy the results.
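To make the idea concrete, here is a minimal sketch of the replay logic in Python. It does not parse MRT files: the `Update` record, the `Rib` mapping and the `replay` function are hypothetical names of mine, and I assume the snapshot and the updates have already been decoded by some MRT reader. It also ignores peer session resets (state changes), which a real implementation would need to handle by dropping the routes of that peer.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Update(NamedTuple):
    """One decoded BGP update for a single prefix (hypothetical record)."""
    timestamp: int         # seconds since the epoch, UTC
    peer: str              # peer identifier, e.g. its IP address
    prefix: str            # announced or withdrawn prefix
    attrs: Optional[str]   # path attributes; None marks a withdrawal


# (peer, prefix) -> path attributes
Rib = Dict[Tuple[str, str], str]


def replay(snapshot: Rib, updates: Iterable[Update], start: int, end: int) -> Rib:
    """Evolve a RIB snapshot taken at `start` up to time `end`.

    Assumption: updates stamped exactly at `start` are already part of
    the snapshot, so only those with start < timestamp <= end are applied.
    """
    rib = dict(snapshot)  # work on a copy, leave the snapshot untouched
    # sorted() is stable, so updates sharing a timestamp keep their input order
    for u in sorted(updates, key=lambda u: u.timestamp):
        if not (start < u.timestamp <= end):
            continue
        key = (u.peer, u.prefix)
        if u.attrs is None:
            rib.pop(key, None)   # withdrawal: drop the route if present
        else:
            rib[key] = u.attrs   # announcement: add or replace the route
    return rib
```

With a midnight snapshot and `end` set to 04:00 UTC of the same day, the returned mapping would be the reconstructed RIB for that time, under the assumptions above.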
That said, I think one way to save some data could be to get rid of two of the three daily snapshots for older months of RIS. RIS could keep the most recent years' RIBs as they are now, and remove the RIBs taken at 08:00 and 16:00 for anything older than that - keeping the 00:00 one. Taking October 2023 for rrc00 as an example, RIBs took 38.8GB while UPDATEs took 45GB. Of course, different collectors have different peers and record a different volume of BGP updates. Still, cutting the RIBs to one third would give a good saving in data.
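As a back-of-the-envelope check on the October 2023 figures for rrc00, the small Python sketch below estimates the saving. The function name `monthly_saving` is mine, and it assumes the three daily snapshots are of roughly equal size, so that keeping one of them keeps one third of the RIB volume.

```python
def monthly_saving(rib_gb: float, update_gb: float, kept: int = 1, per_day: int = 3):
    """Return (GB saved, fraction of RIB + UPDATE volume saved).

    Assumes every daily snapshot has about the same size.
    """
    rib_after = rib_gb * kept / per_day
    saved = rib_gb - rib_after
    return saved, saved / (rib_gb + update_gb)


# Figures quoted for rrc00, October 2023
saved_gb, saved_fraction = monthly_saving(rib_gb=38.8, update_gb=45.0)
print(f"saved: {saved_gb:.1f} GB ({saved_fraction:.1%} of RIBs + UPDATEs)")
```

Under that assumption, the saving for that month would be about 25.9GB, or roughly 31% of the combined RIB and UPDATE volume.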
The second is about compression. I understand that RIS relies on the collecting software to create gz files, but it would probably be worth considering a switch to a compression technique that compresses data further - at least for older data.
I know RouteViews is using bz2 already, so that could be a good choice if the collecting software already handles it. Every MRT reader is capable of handling bz2 files. However, I found xz to perform extremely well on MRT files - even though only a few MRT readers are capable of reading it.
As an exercise, I took bview.20231129.0000.gz of rrc00. The file is 406MB, which becomes 4.1GB uncompressed. If I bzip2 the uncompressed file, I get a bview.20231129.0000.bz2 of 242MB (about 40% smaller than the gz file). If I xz the uncompressed file, I get a bview.20231129.0000.xz of 160MB (about 60% smaller than the gz file).
There may be other compression tools out there that are even more efficient on MRT data. I think a small study on the effectiveness of the different compression techniques should be performed before taking any decision - if you want to follow this route.
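A comparison of this kind could start from something like the Python sketch below, which uses only the standard library (zlib, bz2, lzma). The helper names `read_gz_chunks` and `compressed_sizes` are mine, and the compression levels are just one reasonable choice, not necessarily the settings behind the numbers above. It streams the data in chunks, so a 4.1GB dump does not need to fit in memory, and it measures size only, not compression or decompression time.

```python
import bz2
import gzip
import lzma
import zlib
from typing import Dict, Iterable, Iterator


def read_gz_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the uncompressed content of a .gz file in chunks."""
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def compressed_sizes(chunks: Iterable[bytes]) -> Dict[str, int]:
    """Return the compressed size in bytes of the same stream per codec."""
    codecs = {
        # wbits=31 makes zlib emit a gzip container
        "gz": zlib.compressobj(9, zlib.DEFLATED, 31),
        "bz2": bz2.BZ2Compressor(9),
        "xz": lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=6),
    }
    sizes = dict.fromkeys(codecs, 0)
    for chunk in chunks:
        for name, codec in codecs.items():
            sizes[name] += len(codec.compress(chunk))
    for name, codec in codecs.items():
        sizes[name] += len(codec.flush())  # count the trailing bytes too
    return sizes


if __name__ == "__main__":
    import sys

    # Usage: python compare.py bview.20231129.0000.gz
    for name, size in compressed_sizes(read_gz_chunks(sys.argv[1])).items():
        print(f"{name}: {size / 1e6:.1f} MB")
```

Running it over a sample of RIB and UPDATE files from a few collectors would give a first idea of how the formats compare on MRT data; other codecs could be added to the `codecs` dictionary as long as they expose `compress` and `flush`.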
Apologies once again for the long post!
Alessandro
Alessandro Improta
Engineering manager
p. +393488077654
a. Via Aurelia Sud km 367, Pietrasanta (LU)

From: mat-wg <mat-wg-bounces@ripe.net> on behalf of Robert Kisteleki <robert@ripe.net>
Sent: Wednesday, November 22, 2023 5:43 PM
To: Measurement Analysis and Tools Working Group <mat-wg@ripe.net>
Subject: [mat-wg] RIPE NCC measurement data retention