Update on improved rsync repository publication
Dear colleagues,

Over the last weeks, as mentioned in a previous email [0], we have been exploring two approaches to improve our rsync publication process in parallel.

The first approach (“batched publication”) changes the RPKI system to atomically publish the complete set of files instead of writing them at multiple places in the code. For rsync, after all files have been written to a new directory, the repository is switched to the new content by atomically updating a symlink. The repository is updated every 15 minutes. We plan to deploy this soon during a scheduled downtime. During this downtime the repositories will be available, but managing a certification authority will not be possible.

The second approach uses RRDP as a source of truth and writes the repository to a local disk (instead of to NFS) with krill-sync (which also uses a symlink as the base for the current rsync module).

We have an experimental rsync environment based on the second approach available for testing. The environment contains the production RPKI repository. Please note that this environment is experimental and thus is not considered to be a critical service. Do not rely on this service for production networks. We may cause downtime on this service to adjust the configuration. However, when services are available, the content should be valid and contain the same objects as the production RRDP repository. It updates from RRDP every minute. The non-repeatable reads should not occur.

rpki2.ripe.net provides all services (rsync repo, HTTPS trust anchor certificate) that rpki.ripe.net provides. The recommended way to test this is by making sure that your traffic that usually goes to rpki.ripe.net arrives at the IPs of rpki2.ripe.net.

We will evaluate this environment with multiple relying party instances and a tool that checks for repository divergence over the coming weeks. In parallel, we are finishing the configuration (e.g. metrics and alerting). Please share your feedback with us at rpki@ripe.net.

We aim to deploy the first approach before the end of the month.

Kind regards,
Ties de Kock

[0] https://www.ripe.net/ripe/mail/archives/routing-wg/2021-April/004314.html
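
For readers unfamiliar with the symlink switch described in the first approach, the following is a minimal sketch of the general technique, not the code used in the RPKI system. The function name `publish_batch`, the link name `current` and the directory layout are all assumptions made for illustration. It writes a complete set of files into a fresh directory and only then repoints a symlink, relying on POSIX rename semantics to replace the link atomically.

```python
import os
from pathlib import Path
from typing import Dict


def publish_batch(base: Path, files: Dict[str, bytes], batch_name: str,
                  link_name: str = "current") -> Path:
    """Write all files to base/batch_name, then repoint base/link_name at it.

    Hypothetical helper: names and layout are illustrative assumptions.
    """
    target = base / batch_name
    target.mkdir(parents=True)

    # Step 1: write the complete repository content into the new directory.
    # Readers cannot see it yet because nothing points at it.
    for rel_path, content in files.items():
        dest = target / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(content)

    # Step 2: create the new symlink under a temporary name, then rename it
    # over the live link. On POSIX systems the rename replaces the old link
    # in one step, so readers resolve either the old or the new directory.
    tmp_link = base / f".{link_name}.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()  # clean up a leftover from an interrupted run
    os.symlink(batch_name, tmp_link)  # relative target, resolved inside base
    os.replace(tmp_link, base / link_name)
    return target
```

Calling `publish_batch(base, files, "batch-2")` after an earlier `"batch-1"` leaves the old directory in place; removing old directories is a separate step in this sketch.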
Ties de Kock wrote on 20/05/2021 12:24:
> We aim to deploy the first approach before the end of the month.

nice one - this sounds really positive! Lots of stuff in there that ticks the right boxes (moving away from nfs, atomic update + cutover, etc). It will be great to see how this works out in production.

Nick
Dear colleagues,

Yesterday morning (2021-05-31, effective from ~9:434 UTC) we deployed a release that implemented "batched publication" of our RPKI repositories. This release writes the full rsync repository before changing the symlink to the root of the repository. Repositories are kept available for two hours after being written. This mitigates the non-repeatable reads that users observed.

We continue to be focused on the publication process of our RPKI systems. We will continue to evaluate using RRDP as a source of truth (to be independent of NFS/use cached IO) and to handle the expected increased load in the future due to relying party implementations implementing rsync fallback.

Kind regards,
Ties
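
The two-hour retention mentioned above can be illustrated with a small sketch. This is not the deployed implementation. It assumes a hypothetical layout in which each published repository is a directory next to a symlink named `current`, and it uses the directory modification time as a stand-in for the time of writing. Keeping older directories for a while presumably lets clients that started reading a previous directory finish against consistent content.

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

RETENTION_SECONDS = 2 * 60 * 60  # two hours, as stated in the announcement


def prune_old_batches(base: Path, link_name: str = "current",
                      now: Optional[float] = None) -> List[str]:
    """Remove repository directories older than the retention period.

    Hypothetical helper: the layout and names are illustrative assumptions.
    The directory the symlink points at is never removed.
    """
    now = time.time() if now is None else now
    live = os.readlink(base / link_name)  # name of the directory in use
    removed = []
    for entry in sorted(base.iterdir()):
        # Skip the symlink itself and anything that is not a directory.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if entry.name == live:
            continue
        # Directory mtime approximates when the batch was written.
        if now - entry.stat().st_mtime > RETENTION_SECONDS:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Running a function like this periodically, after each publication, would keep disk usage bounded while older content stays readable for the retention period.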