[routing-wg] Re: policy proposal: "Automatic Revocation of Persistently Non-functional Delegated RPKI CAs"

6 Mar 2025

Dear Job and Routing Working Group,

I have been reading this thread with interest!

As a RIPE NCC employee I am happy to see this as a proposed policy. In my
role I do not want to comment on the merit of the policy itself, at least
not in any formal sense, but I believe that sharing technical and
implementation-related observations would be helpful.

I am responding to the original email, not because I am unaware of valuable
comments that were made, but because I would like to make some additional
technical observations that were not yet raised.

See in-line:
...
On 25 Feb 2025, at 13:05, Job Snijders <job@sobornost.net> wrote:
Dear all,
I'd like to propose a mechanism to automatically prune dead branches
in the RPKI directly subordinate to RIPE NCC. Below is the policy proposal
text that I have in mind.
In short: RIPE NCC should revoke a Delegated CA's certificate when absolutely
no "sign of life" (valid Manifest) has been observed for over 100 days.
It probably is good to have some discussion around:
* should perpetually broken Delegated CA setups be culled at some point?
I will leave the policy answer to this to the community.

But let me share "prior art":

From my previous employment I know that nic.br have a similar process for
their member CAs. All of their member CAs are "delegated". This may not be
apparent because most of them publish through a publication server and
repository provided by nic.br. In any case, nic.br monitors for CAs that have
gone offline and prunes them if they are not restored within a given time. I am
not entirely sure what timeline they use, or whether it is an automated or
manual process, but I believe it is less than 100 days.
...
* is the continuous lack of a valid current manifest for a 100-day period
 a good indicator for "Delegated CA brokenness"?
This is one way to do this.

It requires that the parent CA (RIPE NCC in this case) monitors the
repositories referenced by the CA certificates that it issues to delegated CAs.
These CAs may use the Publication-as-a-Service (PaaS) provided by RIPE NCC,
or they could run their own publication server.

If they run their own publication server then it may be that the delegated
CA itself is available, and as a parent we see it connecting regularly
through RFC 6492 (provisioning protocol) exchanges, but their repository
is unavailable. In such cases, it may be better to advise these CAs to
migrate to using the RIPE NCC provided PaaS instead.
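
To make this distinction concrete, here is a minimal sketch of how a parent
could classify a delegated CA from two timestamps: the last time a valid
current manifest was seen from any vantage point, and the last RFC 6492
exchange. The names (CaObservation, classify), the 7-day window for "recent"
provisioning contact, and the choice to start the clock at certificate
issuance are assumptions for illustration only. The 100-day threshold comes
from the proposal. This is not a description of what the RIPE NCC runs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Threshold from the proposal text; the contact window is an arbitrary assumption.
MANIFEST_GRACE = timedelta(days=100)
CONTACT_WINDOW = timedelta(days=7)


@dataclass
class CaObservation:
    """Hypothetical record a parent could keep per delegated CA."""
    # Most recent time any vantage point validated a current manifest.
    last_valid_manifest: Optional[datetime]
    # Most recent RFC 6492 (provisioning protocol) exchange with the parent.
    last_rfc6492_contact: Optional[datetime]
    # When the CA certificate was issued; assumed start of the clock.
    issued: datetime


def classify(obs: CaObservation, now: datetime) -> str:
    """Return 'functional', 'repository-problem' or 'revocation-candidate'."""
    last_seen = obs.last_valid_manifest or obs.issued
    if now - last_seen <= MANIFEST_GRACE:
        return "functional"
    ca_talks_to_parent = (
        obs.last_rfc6492_contact is not None
        and now - obs.last_rfc6492_contact <= CONTACT_WINDOW
    )
    # The CA is alive but its publication point is not: advising a move to
    # the PaaS may be more helpful than revoking straight away.
    if ca_talks_to_parent:
        return "repository-problem"
    return "revocation-candidate"
```

Whether a "repository-problem" CA should still be revoked after 100 days is
a policy question; the sketch only shows that the two cases can be told apart.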

Another thing to note here is that revoking a CA certificate will stop
RPKI validators from trying to get the content, and it will silence the
warnings in the logs, but if these CAs use a shared repository, such
as the PaaS, then the content will still be there until that is also
actively removed.

Delegated CAs can delegate...

A member that uses a delegated CA may delegate all or some of their
resources to "child" CAs of their own. Those CAs may publish at the
PaaS (we currently allow the member to configure up to 10 publishers),
or publish in their own repositories. It may happen that the member
CA issued under RIPE NCC is functioning correctly, but their delegated
CAs (or "grandchildren" etc) are having an issue.

I think the RIPE NCC needs clarity on what to do in such
cases:
- Is this out-of-scope?
- Should the RIPE NCC monitor the entire delegated member tree?
- Should the RIPE NCC revoke a member CA with broken delegated CAs?
- Should the RIPE NCC actively engage with such member CAs,
  but leave the actions to them?

If the RIPE NCC is to monitor the entire tree under a certificate
issued to a delegated member CA, then this could amount to significant
work. On the other hand, an advantage of such a process could
be that the RIPE NCC can also monitor (reactively, because it is
always after the fact) for other bad things that can happen, such
as CAs issuing an enormous number of objects or delegating to a vast
number of CAs, which may also impact RPKI validators and perhaps
should warrant some kind of action.
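
As a rough illustration of what monitoring the whole tree could involve,
the sketch below walks every CA under a member CA and reports broken
manifests, excessive object counts and excessive delegation. The CaNode
structure and both limits are hypothetical; real thresholds would need
discussion, and a real monitor would build this view from validator output.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Illustrative limits only; not values proposed by anyone in this thread.
MAX_OBJECTS = 100_000
MAX_CHILDREN = 1_000


@dataclass
class CaNode:
    """Hypothetical view of one CA as seen from its publication point."""
    name: str
    object_count: int = 0          # objects listed on the current manifest
    manifest_valid: bool = True    # whether a current manifest validated
    children: List["CaNode"] = field(default_factory=list)


def findings(root: CaNode) -> Iterator[Tuple[str, str]]:
    """Walk the whole tree under `root`, yielding (CA name, issue) pairs."""
    stack = [root]
    while stack:
        ca = stack.pop()
        if not ca.manifest_valid:
            yield ca.name, "no valid manifest"
            # Nothing below a CA without a valid manifest is inspected.
            continue
        if ca.object_count > MAX_OBJECTS:
            yield ca.name, "too many objects"
        if len(ca.children) > MAX_CHILDREN:
            yield ca.name, "too many child CAs"
        stack.extend(ca.children)
```

The cost of such a walk grows with the size of each member's tree, which
is the "significant work" mentioned above.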

Finally, I want to mention the PaaS once more. What may happen
here is that a delegated member CA delegates to a CA of their own
that also publishes at the PaaS. If the member CA then removes
(revokes) that delegated CA, the revoked CA may actually continue to
function and publish at the PaaS, or it may simply not remove its old
content. The latter can be detected relatively easily (no more
RFC 8181 interactions, and the manifests in the (sub-)repo are old).
But the former is harder to detect from the RIPE NCC perspective.

The result of not doing anything here is having content in the PaaS
repository that is unreferenced in the RPKI tree. It may not result
in warnings in RPKI validators, but it adds to their burden in terms
of data usage for RRDP snapshot downloads, full rsync, and local
storage in the validators. In short, this may also warrant thought.
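
A minimal sketch of how such unreferenced content could be found, assuming
the RIPE NCC can obtain (a) the last RFC 8181 publication time per PaaS
publisher and (b) the set of publishers whose sub-repository is still
pointed to by a valid CA certificate in the tree. Both inputs, the function
name and the 30-day idle period are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Set, Tuple


def split_unreferenced(
    last_publish: Dict[str, datetime],
    referenced: Set[str],
    now: datetime,
    idle_after: timedelta = timedelta(days=30),
) -> Tuple[Set[str], Set[str]]:
    """Split unreferenced publishers into (abandoned, still_publishing)."""
    abandoned: Set[str] = set()
    still_publishing: Set[str] = set()
    for publisher, last in last_publish.items():
        if publisher in referenced:
            continue  # still part of the RPKI tree, nothing to do
        if now - last > idle_after:
            abandoned.add(publisher)         # old content left behind
        else:
            still_publishing.add(publisher)  # revoked CA that keeps publishing
    return abandoned, still_publishing
```

The second set is the harder case described above: it cannot be found from
publication activity alone and needs the view of the tree as input.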

I hope these observations are helpful. And I hope it is clear that
they are not intended to dissuade people from the policy.

There are many corner cases that we can consider, and many ways that
we can deal with them. It may be hard to enumerate them all in a
policy. In that respect I would like to echo a comment made earlier: if
the policy describes the problem to solve rather than the solution,
this may leave us at the RIPE NCC more room to come up with a good
solution and adjust it over time. Of course any such implementation
of policy requirements would be published and open to community
feedback.

Kind regards,

Tim Bruijnzeels

(Principal Engineer RPKI, RIPE NCC)
...
Your feedback is most welcome!
Kind regards,
Job
-------------------
Policy Proposal Name:
Automatic Revocation of Persistently Non-functional Delegated RPKI Certification Authorities
Author:
a. name: Job Snijders
b. email: job@sobornost.net
Proposal Version: (to be assigned by the RIPE NCC)
Submission Date: February 25th, 2025
Suggested RIPE WG for discussion and publication: RIPE Routing Working Group
Proposal Type: NEW
Policy Term: Indefinite
Summary of proposal:
RIPE NCC offers users of its RPKI certification service two
deployment models: "Hosted CA setup" and "Delegated CA setup".
In the Hosted setup RIPE NCC is responsible for timely issuance
and publication of new RPKI Manifests and CRLs, but in the
Delegated setup users themselves manage their CA on their own
infrastructure.
It is possible for Delegated CA infrastructure to be offline for
extended periods of time or for the contents of publication
repositories to become stale or otherwise invalid. This proposal
suggests giving RIPE NCC a mandate to revoke resource
certificates associated with longtime non-functional CAs to
reduce Relying Party workload.
This policy proposal targets only pathologically non-functional
CAs. An example of a situation considered out-of-scope for this
policy would be a publication repository service advertised to
also be available via IPv6 and RRDP but in practice only
reachable via IPv4 and Rsync: the associated CA would still be
considered functional (provided a valid and current Manifest
could somehow be retrieved and validated sometime in the
previous one hundred days). In other words: this policy proposal
isn't about CAs that didn't achieve 100% uptime, but about CAs
that are down all the time.
Policy text:
If RIPE NCC is unable to discover and validate a Delegated RPKI
Certification Authority's (CA's) current Manifest and CRL for
one hundred consecutive days, that Delegated CA's resource
certificate shall be revoked by the RIPE NCC. RIPE NCC shall
make reasonable efforts to discover new Manifests, for example,
by corroborating information from multiple vantage points. After
revocation, the Resource Holder may either reinitialize the
Delegated CA setup or choose the Hosted CA setup.
Rationale:
a. Arguments supporting the proposal
Persistently Non-functional Delegated CAs (PNDCs, for short)
have subtle effects within the RPKI ecosystem which may become
more pronounced over time.
* PNDCs offer nothing of value to RPs (because without a current
 valid Manifest any signed payloads are unavailable).
* RP synchronisation becomes more economic with fewer
 purposeless caRepositories to traverse.
* PNDCs besmirch Relying Party (RP) syslog message archives and
 waste RP CPU cycles and network traffic.
* Automatic revocation is only a minor inconvenience for CAs
 (that already were non-functional to begin with), but a big
 deal for RPs - especially when taking into account many
 future synchronisation attempts over long periods of time.
b. Arguments opposing the proposal
* Resource holders might require more than one hundred days to
 complete the initial Delegated CA setup.
(Counterpoints: initial setup procedures usually only take a
  few minutes. Resource holders are free to simply retry the
  delegated CA setup procedure following automatic revocation.)
Additional opposing arguments to be determined.