On 16 Feb 2021, at 19:22, Job Snijders via routing-wg <
routing-wg@ripe.net> wrote:
Dear RIPE NCC,
On Tue, Feb 16, 2021 at 04:56:31PM +0100, Nathalie Trenaman wrote:
On Monday, 15 February we encountered an issue with our RPKI software.
This issue prevented us from publishing RPKI object updates from
08:07-18:06 (UTC).
During this period, Certificate Authority activation and Route Origin
Authorization configuration updates were delayed and therefore not
visible in the RPKI repository.
It appears Certificate Authority revocation was also delayed.
Indeed, all modifications, creations and deletions were delayed.
The updates were published after we restarted the system at 17:45
(UTC), with full recovery completed by 18:06 (UTC). Since this
non-publishing period is shorter than our default RPKI object validity
period, set to 8 hours, existing objects that are not updated were
still valid. No data was lost during this period.
Can the following phrase "default RPKI object validity period, set to 8
hours" please be clarified?
For objects produced in the RIPE-hosted RPKI environment I observe the
following validity periods are commonly used:
Object type        | validity duration after issuance
-------------------+---------------------------------
CRLs               | 24 hours
ROA EE certs       | 18 months
Manifest eContent  | 24 hours
Manifest EE certs  | 7 days
CAs                | 18 months
I'm just guessing, is the '8 hour' period a reference to RIPE-751
section 2.3?
"A certificate will be published within eight hours of being issued (or deleted)."
Yes, the eight hours referred to this section of the CPS, and also to section 4.3.1:
"The Production CA and the ACA, as well as hosted CAs, make all subordinate certificates and objects available for publication. While the system will make a best effort to publish these materials as soon as possible, publication should happen no later than eight hours after issuance (as described in Section 2.3.)"
The validity periods are all longer than eight hours, and we can confirm that none of the RIPE hosted objects expired within the non-publishing time frame.
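The comparison above can be checked with a small sketch. It takes the nominal validity durations from the table earlier in this thread and compares them with the reported non-publishing window (08:07-18:06 UTC). Treating "18 months" as 18 * 30 days is an assumption made only for illustration. Whether a given object actually expired depends on when it was issued, which this sketch does not model.

```python
from datetime import datetime, timedelta, timezone

# Non-publishing window reported for 15 February 2021 (UTC).
outage_start = datetime(2021, 2, 15, 8, 7, tzinfo=timezone.utc)
outage_end = datetime(2021, 2, 15, 18, 6, tzinfo=timezone.utc)
outage = outage_end - outage_start

# Nominal validity durations from the table in this thread.
# Assumption: "18 months" is approximated as 18 * 30 days.
validity = {
    "CRLs": timedelta(hours=24),
    "ROA EE certs": timedelta(days=18 * 30),
    "Manifest eContent": timedelta(hours=24),
    "Manifest EE certs": timedelta(days=7),
    "CAs": timedelta(days=18 * 30),
}

# Object types whose nominal validity is not longer than the outage.
shorter_than_outage = [name for name, d in validity.items() if d <= outage]

print("outage duration:", outage)
print("validity not longer than outage:", shorter_than_outage)
```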
The RIPE-751 CPS also states in section 4.9.8 ("Maximum latency for
CRLs"): CRLs will be published to the repository system within one hour
of their generation.
As the outage appears to have exceeded both the 1 hour revocation window
and 8 hour object publication window, RIPE NCC was not compliant with
its own CPS.
Correct.
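As a rough check of the arithmetic behind this exchange, the sketch below compares the reported non-publishing window with the two CPS limits quoted in this thread (one hour for CRLs, eight hours for other objects). It assumes the worst case of an object or CRL generated right at the start of the window. The dictionary labels are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Non-publishing window reported for 15 February 2021 (UTC).
outage = (
    datetime(2021, 2, 15, 18, 6, tzinfo=timezone.utc)
    - datetime(2021, 2, 15, 8, 7, tzinfo=timezone.utc)
)

# Publication limits as quoted from RIPE-751 in this thread.
cps_windows = {
    "CRL publication (section 4.9.8)": timedelta(hours=1),
    "object publication (sections 2.3 and 4.3.1)": timedelta(hours=8),
}

# Worst case: something generated at the very start of the outage
# waits for the full outage duration before it is published.
exceeded = {
    name: outage - limit
    for name, limit in cps_windows.items()
    if outage > limit
}

for name, overrun in exceeded.items():
    print(f"{name}: limit exceeded by {overrun}")
```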
The multitude of RPKI service-impacting events resulting from
maloperation of the RIPE NCC trust anchor is starting to give me
cause for concern.
I’m sorry to hear this. Transparency is key for us, which means that we report every event. In this case, we were not compliant with our CPS and this non-publishing period had operational impact.
However, not all relying party software detected this non-publishing period: rpki-client, for example, did not, whereas Routinator logs warnings for it. Is this something all relying party software should log? Maybe this should be discussed in the SIDROPS wg at the IETF.
Kind regards,
Nathalie Trenaman
Routing Security Programme Manager
RIPE NCC