
Dear Job and Routing Working Group,

I have been reading this thread with interest! As a RIPE NCC employee I am happy to see this as a proposed policy. In my role I do not want to comment on the merit of the policy itself, at least not in any formal sense, but I believe that sharing technical and implementation-related observations would be helpful.

I am responding to the original email, not because I am unaware of the valuable comments that were made, but because I would like to make some additional technical observations that have not yet been raised. See in-line:
On 25 Feb 2025, at 13:05, Job Snijders <job@sobornost.net> wrote:
Dear all,
I'd like to propose a mechanism to automatically prune dead branches in the RPKI directly subordinate to RIPE NCC. Below is the policy proposal text that I have in mind.
In short: RIPE NCC should revoke a Delegated CA's certificate when absolutely no "sign of life" (valid Manifest) has been observed for over 100 days.
It probably is good to have some discussion around:
* should perpetually broken Delegated CA setups be culled at some point?
I will leave the policy answer to this to the community. But let me share some "prior art": from my previous employment I know that nic.br has a similar process for its member CAs. All of their member CAs are "delegated". This may not be apparent because most of them publish through a publication server and repository provided by nic.br. In any case, nic.br monitors for CAs that have gone offline and "prunes" them if they are not restored within a given time. I am not entirely sure what timeline they use, or whether it is an automated or manual process, but I believe it is less than 100 days.
* is the continuous lack of a valid current manifest for a 100-day period a good indicator for "Delegated CA brokenness"?
This is one way to do this. It requires that the parent CA (RIPE NCC in this case) monitors the repositories named in the CA certificates that it issues to delegated member CAs. These CAs may use the Publication-as-a-Service (PaaS) provided by RIPE NCC, or they could run their own publication server. If they run their own publication server, it may be that the delegated CA itself is available, and as a parent we see it connecting regularly through RFC 6492 (provisioning protocol) exchanges, but its repository is unavailable. In such cases, it may be better to advise these CAs to migrate to the RIPE NCC provided PaaS instead.

Another thing to note here is that revoking a CA certificate will stop RPKI validators from trying to get the content, and it will silence the warnings in the logs, but if these CAs use a shared repository such as the PaaS, then the content will still be there until that is also actively removed.

Delegated CAs can delegate... A member that uses a delegated CA may delegate all or some of their resources to "child" CAs of their own. Those CAs may publish at the PaaS (we currently allow the member to configure up to 10 publishers), or publish in their own repositories. It may happen that the member CA issued under RIPE NCC is functioning correctly, but its delegated CAs (or "grandchildren" etc.) are having an issue. I think the RIPE NCC needs clarity on what to do in such cases:

- Is this out-of-scope?
- Should the RIPE NCC monitor the entire delegated member tree?
- Should the RIPE NCC revoke a member CA with broken delegated CAs?
- Should the RIPE NCC actively engage with such member CAs, but leave the actions to them?

If the RIPE NCC is to monitor the entire tree under a certificate issued to a delegated member CA, then this could amount to significant work.
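To make the scoping questions above a bit more concrete, here is a rough sketch of what "monitoring the entire tree" under one member CA could look like. It is only an illustration: the `CaNode` structure and its `has_valid_manifest` flag are hypothetical and assumed to be filled in by some validator run; they are not an existing RIPE NCC or RPKI software interface. The sketch only reports where the breakage sits and deliberately takes no action, since what to do is the policy question.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class CaNode:
    """Hypothetical view of one CA in the tree below a member CA."""
    name: str
    has_valid_manifest: bool  # assumed to come from a recent validation run
    children: List["CaNode"] = field(default_factory=list)


def broken_descendants(member_ca: CaNode) -> Iterator[Tuple[int, CaNode]]:
    """Yield (depth, node) for each CA below member_ca without a valid manifest.

    Depth 1 is a direct child of the member CA, depth 2 a grandchild, etc.
    The subtree below a broken CA is skipped: without a valid manifest a
    validator would not normally descend any further at that point.
    """
    stack = [(1, child) for child in member_ca.children]
    while stack:
        depth, node = stack.pop()
        if not node.has_valid_manifest:
            yield depth, node
            continue
        stack.extend((depth + 1, child) for child in node.children)
```

Even in this toy form the cost is visible: every functioning CA in the subtree has to be fetched and validated before its children can be inspected, which is the "significant work" mentioned above.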
On the other hand, another advantage of such a process could be that the RIPE NCC can also monitor (reactively, because it is always after the fact) for other bad things that can happen, such as CAs issuing an enormous number of objects or delegating to a vast number of CAs. These may also impact RPKI validators and perhaps should warrant some kind of action.

Finally, I also want to mention PaaS once more. What may happen here is that a delegated member CA delegates to a CA of their own that also publishes at the PaaS. If the member CA then revokes that delegated CA, the revoked CA may actually continue to function and publish at the PaaS, or it may simply never remove its old content. The latter can be detected relatively easily (no more RFC 8181 interactions, and the manifests in the (sub-)repo are old). But the former is harder to detect from the RIPE NCC perspective. The result of not doing anything here is having content in the PaaS repository that is unreferenced in the RPKI tree. It may not result in warnings in RPKI validators, but it adds to their burden in terms of data usage for RRDP snapshot downloads, full rsync, and local storage in the validators. In short, this may also warrant thought.

I hope these observations are helpful. And I hope it is clear that they are not intended to dissuade people from the policy. There are many corner cases that we can consider, and many ways that we can deal with them. It may be hard to enumerate them all in a policy. In that respect I would like to echo a comment made earlier: if the policy describes the problem to solve rather than the solution, this may leave us at the RIPE NCC more room to come up with a good solution and adjust it over time. Of course, any such implementation of policy requirements would be published and open to community feedback.

Kind regards,
Tim Bruijnzeels (Principal Engineer RPKI, RIPE NCC)
Your feedback is most welcome!
Kind regards,
Job
-------------------
Policy Proposal Name:
Automatic Revocation of Persistently Non-functional Delegated RPKI Certification Authorities
Author:
a. name: Job Snijders
b. email: job@sobornost.net
Proposal Version: (to be assigned by the RIPE NCC)
Submission Date: February 25th, 2025
Suggested RIPE WG for discussion and publication: RIPE Routing Working Group
Proposal Type: NEW
Policy Term: Indefinite
Summary of proposal:
RIPE NCC offers users of its RPKI certification service two deployment models: "Hosted CA setup" and "Delegated CA setup". In the Hosted setup RIPE NCC is responsible for timely issuance and publication of new RPKI Manifests and CRLs, but in the Delegated setup users themselves manage their CA on their own infrastructure.
It is possible for Delegated CA infrastructure to be offline for extended periods of time or for the contents of publication repositories to become stale or otherwise invalid. This proposal suggests giving RIPE NCC a mandate to revoke resource certificates associated with long-term non-functional CAs in order to reduce Relying Party workload.
This policy proposal targets only pathologically non-functional CAs. An example of a situation considered out-of-scope for this policy would be a publication repository service advertised to also be available via IPv6 and RRDP but in practice only reachable via IPv4 and Rsync: the associated CA would still be considered functional (provided a valid and current Manifest could somehow be retrieved and validated sometime in the previous one hundred days). In other words: this policy proposal isn't about CAs that didn't achieve 100% uptime, but about CAs that are down all the time.
Policy text:
If RIPE NCC is unable to discover and validate a Delegated RPKI Certification Authority's (CA's) current Manifest and CRL for one hundred consecutive days, that Delegated CA's resource certificate shall be revoked by the RIPE NCC. RIPE NCC shall make reasonable efforts to discover new Manifests, for example, by corroborating information from multiple vantage points. After revocation, the Resource Holder may either reinitialize the Delegated CA setup or choose the Hosted CA setup.
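The rule in the policy text can be read as a simple predicate over observation timestamps. The following is a minimal sketch under stated assumptions, not a description of any RIPE NCC implementation: the `Observation` record, the vantage point labels, and the function names are hypothetical. Only the one-hundred-day threshold comes from the policy text. The sketch also assumes that, for a CA that has never shown a valid Manifest, the count starts when monitoring of that CA began, which the policy text does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Threshold from the policy text; everything else here is illustrative.
REVOCATION_THRESHOLD = timedelta(days=100)


@dataclass(frozen=True)
class Observation:
    """One attempt to fetch and validate a CA's current Manifest and CRL."""
    vantage_point: str     # hypothetical label, e.g. "probe-1-rsync-v4"
    observed_at: datetime  # time of the attempt
    valid: bool            # True if a current Manifest and CRL validated


def last_sign_of_life(observations: Iterable[Observation]) -> Optional[datetime]:
    """Most recent successful validation from any vantage point, if any."""
    times = [o.observed_at for o in observations if o.valid]
    return max(times) if times else None


def is_revocation_candidate(
    observations: Iterable[Observation],
    monitoring_since: datetime,
    now: datetime,
) -> bool:
    """True if no valid Manifest and CRL was seen for 100 consecutive days.

    A single success from any vantage point or transport resets the clock,
    so a CA reachable only via IPv4 and Rsync still counts as functional.
    Assumption: with no success at all, the clock starts at monitoring_since.
    """
    last_ok = last_sign_of_life(observations)
    reference = last_ok if last_ok is not None else monitoring_since
    return now - reference >= REVOCATION_THRESHOLD
```

Pooling observations from several vantage points before applying the threshold is one way to reflect the "reasonable efforts" wording: a local outage at one probe then cannot by itself make a CA look dead.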
Rationale:
a. Arguments supporting the proposal
Persistently Non-functional Delegated CAs (PNDCs, for short) have subtle effects within the RPKI ecosystem which may become more pronounced over time.
* PNDCs offer nothing of value to RPs (because without a current valid Manifest any signed payloads are unavailable).
* RP synchronisation becomes more economical with fewer purposeless caRepositories to traverse.
* PNDCs besmirch Relying Party (RP) syslog message archives and waste RP CPU cycles and network traffic.
* Automatic revocation is only a minor inconvenience for CAs (that already were non-functional to begin with), but a big deal for RPs - especially when taking into account many future synchronisation attempts over long periods of time.
b. Arguments opposing the proposal
* Resource holders might require more than one hundred days to complete the initial Delegated CA setup.
(Counterpoints: the initial setup procedure usually takes only a few minutes. Resource holders are free to simply retry the Delegated CA setup procedure following automatic revocation.)
Additional opposing arguments to be determined.