[routing-wg] Re: policy proposal: "Automatic Revocation of Persistently Non-functional Delegated RPKI CAs"

6 Mar 2025


A program that can't track a phone through its serial number, its 15-digit code, and its phone number
isn't a real program.

On Thursday, 6 March 2025, 17:58, Job Snijders <job@sobornost.net> wrote:
...
Dear Tim, others,
Thank you for sharing your feedback!
On Thu, Mar 06, 2025 at 11:01:25AM +0100, Tim Bruijnzeels wrote:
...
...
It probably is good to have some discussion around:
* should perpetually broken Delegated CA setups be culled at some
point?
I will leave the policy answer to this to the community.
But let me share "prior art":
From my previous employment I know that nic.br have a similar process
for their member CAs. All of their member CAs are "delegated". This
may not be apparent because most of them publish through a publication
server and repository provided by nic.br. In any case, nic.br monitors
for CAs that have gone offline and "prune" them if they are not
restored in a given time. I am not entirely sure what timeline they
use, and if it's an automated or manual process, but I believe it is
less than 100 days.
ah yes, yup, good to know!
...
...
* is the continuous lack of a valid current manifest for a 100-day
period of time a good indicator of "Delegated CA brokenness"?
This is one way to do this.
It requires that the parent CA (RIPE NCC in this case) monitors the
repositories for member CA certificates that it issues to delegated
CAs. These CAs may use the Publication-as-a-Service (PaaS) provided
by RIPE NCC, or they could run their own publication server.
Correct.
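To make the proposed indicator concrete, here is a minimal sketch of the 100-day test. It assumes a monitor has already reduced "valid, current manifest and CRL" (signature checks, thisUpdate/nextUpdate window) to a single "last seen" timestamp per CA; the function and parameter names are hypothetical, and this is not the RIPE NCC's or any RP's actual implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 100-day window comes from the proposal; everything else is a
# hypothetical illustration.
NON_FUNCTIONAL_AFTER = timedelta(days=100)


def is_non_functional(last_valid_current_manifest: Optional[datetime],
                      monitoring_started: datetime,
                      now: datetime) -> bool:
    """True if no valid, current manifest + CRL has been observed for the
    CA for at least 100 days.

    last_valid_current_manifest: most recent time a monitor saw a valid,
        current manifest and CRL for this CA (None if it never did).
    monitoring_started: reference point when nothing valid was ever seen,
        so a newly observed CA is not flagged on day one.
    """
    reference = last_valid_current_manifest or monitoring_started
    return now - reference >= NON_FUNCTIONAL_AFTER


now = datetime(2025, 3, 6, tzinfo=timezone.utc)
print(is_non_functional(now - timedelta(days=101), now - timedelta(days=400), now))  # True
print(is_non_functional(now - timedelta(days=3), now - timedelta(days=400), now))    # False
```

Anchoring on monitoring start, rather than treating "never seen" as instantly broken, gives a newly issued CA the same grace period as one that later breaks.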
...
If they run their own publication server then it may be that the
delegated CA itself is available, and as a parent we see it connecting
regularly through RFC 6492 (provisioning protocol) exchanges, but
their repository is unavailable. In such cases, it may be better to
advise these CAs to migrate to using the RIPE NCC provided PaaS
instead.
100% - I'd consider it a net positive whenever Delegated CAs migrate
towards using RIPE NCC's PaaS instead of running their own.
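The distinction above (CA alive via RFC 6492, repository dead) can be sketched by combining the parent's view with the RP's view. This is a hypothetical classification: the 7-day "connecting regularly" window and all names are assumptions for illustration only, and only the 100-day figure comes from the proposal.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

MANIFEST_WINDOW = timedelta(days=100)  # from the proposal
CONTACT_WINDOW = timedelta(days=7)     # hypothetical "connecting regularly"


class CAState(Enum):
    OK = "valid manifest seen within the window"
    REPO_BROKEN = "CA still talks RFC 6492, repository broken: suggest PaaS"
    NON_FUNCTIONAL = "no valid manifest and no recent RFC 6492 contact"


def within(ts: Optional[datetime], window: timedelta, now: datetime) -> bool:
    return ts is not None and now - ts < window


def classify(last_6492_exchange: Optional[datetime],
             last_valid_manifest: Optional[datetime],
             now: datetime) -> CAState:
    # RP view first: is the CA's published content usable?
    if within(last_valid_manifest, MANIFEST_WINDOW, now):
        return CAState.OK
    # Parent view: the CA software still provisions with its parent, so
    # the problem is likely its self-hosted publication server.
    if within(last_6492_exchange, CONTACT_WINDOW, now):
        return CAState.REPO_BROKEN
    return CAState.NON_FUNCTIONAL


now = datetime(2025, 3, 6, tzinfo=timezone.utc)
print(classify(now - timedelta(days=1), None, now).name)  # REPO_BROKEN
```

In this sketch the REPO_BROKEN case is where outreach ("migrate to PaaS") would apply instead of revocation.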
...
Another thing to note here is that revoking a CA certificate will stop
RPKI validators from trying to get the content, and it will silence
the warnings in the logs, but if these CAs use a shared repository -
such as the PaaS, then the content will still be there until that is
also actively removed.
Yes, your observation is correct. To this point I'd argue that RPs (and
also publication point operators) simply are no worse off, as with (or
without) a revocation policy, such content continues to be distributed.
Cleaning up 'useless data in PaaS' probably is a good next policy
proposal to consider (after a non-functional CA revocation policy has
reached consensus and been implemented).
...
Delegated CAs can delegate...
A member that uses a delegated CA may delegate all or some of their
resources to "child" CAs of their own. Those CAs may publish at the
PaaS (we currently allow the member to configure up to 10 publishers),
or publish in their own repositories. It may happen that the member CA
issued under RIPE NCC is functioning correctly, but their delegated
CAs (or "grandchildren" etc) are having an issue.
I think we should give the RIPE NCC clarity on what to do in such
cases:
- Is this out-of-scope?
I believe "Subjects of RIPE NCC's direct subjects" are out of scope,
quite literally - because RIPE NCC's CA cannot revoke child-of-child
certificates; such certificates simply are not within the scope of RIPE
NCC's CRL.
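The scoping argument (a CRL can only list certificates issued by the CA that signs it) can be expressed as a simple filter. The CA record and names below are hypothetical and model only what the argument needs: who issued each certificate and whether the CA is functional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CA:
    name: str
    issuer: Optional[str]  # CA that signed this CA's certificate
    functional: bool       # valid, current manifest + CRL observed?


def revocation_scope(parent: str, cas: List[CA]) -> List[str]:
    """Non-functional CAs that `parent` could revoke on its own CRL.

    A broken grandchild (issued by a member CA) is out of reach for
    `parent`, because `parent` did not issue its certificate.
    """
    return [ca.name for ca in cas if ca.issuer == parent and not ca.functional]


cas = [
    CA("member-a", "ripe-ncc", functional=False),
    CA("member-b", "ripe-ncc", functional=True),
    CA("child-of-b", "member-b", functional=False),
]
print(revocation_scope("ripe-ncc", cas))  # ['member-a']
```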
...
- Should the RIPE NCC monitor the entire delegated member tree?
Personally I'd monitor _at least_ the directly subordinate subjects, but
there may be advantages to having (separate?) instances monitor as much
as possible.
...
- Should the RIPE NCC revoke a member CA with broken delegated CAs?
In short: no.
I think strong arguments can be made that - when there is no current
valid Manifest/CRL for an extended period of time - nothing of value is
lost by revoking that non-functional CA.
On the other hand, the moment a CA is functional (functional enough to
have delegated some authority to other CAs), the CA might serve some
purpose to someone, even though (a subset of) its subordinate products
are broken in one way or another.
Perhaps this goes without saying, but a non-functional CA cannot delegate
authority, because functional delegation requires a functional CA :)
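A sketch of the "in short: no" answer: the revocation decision looks only at the member CA's own state, while broken descendants are collected separately (for example, for outreach). The tree model and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    functional: bool  # does this CA itself have a valid, current manifest + CRL?
    children: List["Node"] = field(default_factory=list)


def should_revoke(member: Node) -> bool:
    # Only the member CA's own state counts; broken descendants never
    # turn a functional member CA into a revocation candidate.
    return not member.functional


def broken_descendants(node: Node) -> List[str]:
    """Depth-first list of broken CAs below `node`."""
    found = []
    for child in node.children:
        if not child.functional:
            found.append(child.name)
        found.extend(broken_descendants(child))
    return found


member = Node("member", True, [
    Node("child-1", True, [Node("grandchild", False)]),
    Node("child-2", False),
])
print(should_revoke(member), broken_descendants(member))  # False ['grandchild', 'child-2']
```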
...
- Should the RIPE NCC actively engage with such member CAs, but leave
the actions to them?
Such an initiative probably doesn't need to be ratified in this policy
proposal, but in general it probably is good to attempt to understand
_why_ certain forms of brokenness exist in the ecosystem, with questions
such as "is the PaaS too hard to use?", "is there some kind of unforeseen
friction in one of RIPE NCC's RPKI services?", etc.
I suspect most of the time things are just broken because someone didn't
clean up the leftovers of a study / experiment - but who knows? :-)
...
If the RIPE NCC is to monitor the entire tree under a certificate
issued to a delegated member CA, then this could amount to significant
work. On the other hand, another advantage of such a process could
be that the RIPE NCC can also monitor (reactively, because it is
always after the fact) for other bad things that can happen, such
as CAs issuing an enormous amount of objects or delegating to a vast
number of CAs - which may also impact RPKI validators and perhaps
should warrant some kind of action.
Yes, that certainly is an advantage.
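The "enormous amount of objects" or "vast number of CAs" check could be a reactive scan over per-CA snapshots. The thresholds below are made up for illustration; neither the proposal nor this thread defines them, and the snapshot fields are assumptions.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limits for illustration only.
MAX_OBJECTS = 50_000
MAX_CHILD_CAS = 1_000


@dataclass
class CASnapshot:
    name: str
    object_count: int    # objects listed on the CA's current manifest
    child_ca_count: int  # CA certificates this CA has issued


def anomalies(snapshots: List[CASnapshot]) -> List[str]:
    """Flag CAs whose output may burden validators. This is necessarily
    after the fact: it can only inspect what has already been published."""
    findings = []
    for s in snapshots:
        if s.object_count > MAX_OBJECTS:
            findings.append(f"{s.name}: {s.object_count} objects")
        if s.child_ca_count > MAX_CHILD_CAS:
            findings.append(f"{s.name}: {s.child_ca_count} child CAs")
    return findings


print(anomalies([CASnapshot("a", 10, 1), CASnapshot("b", 60000, 2000)]))
```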
If I were to technically implement this I'd advocate for separate
monitoring pipelines so that one doesn't block the other. The
rpki-client implementation allows the RP operator to exactly specify
(through an allowlist/blocklist) how 'deep' to traverse the tree. So it
is conceivable to have instances that monitor "only the directly
subordinate CAs" and separate instances that monitor "everything under
the RIPE NCC TAL".
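The idea of separate shallow and deep pipelines can be sketched as a depth-limited breadth-first walk. This does not reproduce rpki-client's allowlist/blocklist mechanism; the tree representation and function name are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def walk(tree: Dict[str, List[str]], root: str,
         max_depth: Optional[int]) -> List[str]:
    """Breadth-first walk of a CA hierarchy.

    tree maps a CA name to the names of the CAs it has certified.
    max_depth=1 visits only directly subordinate CAs;
    max_depth=None visits everything under root.
    """
    seen, order = {root}, []
    queue = deque([(root, 0)])
    while queue:
        name, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for child in tree.get(name, []):
            if child not in seen:  # guard against loops in bad data
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order


tree = {"ripe-ncc": ["member-a", "member-b"],
        "member-a": ["child-a1"],
        "child-a1": ["grandchild"]}
print(walk(tree, "ripe-ncc", 1))     # shallow pipeline
print(walk(tree, "ripe-ncc", None))  # deep pipeline
```

Running the two depths as separate jobs means a slow or hostile deep subtree cannot delay the shallow pass.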
...
Finally, I also want to mention PaaS - once more.
Given that Easter is just around the corner, I anticipate that the Dutch
word 'paas' will be mentioned many many times in the coming weeks ;-)
...
What may happen here is that a delegated member CA delegates to a CA
of their own that also publishes at the PaaS. If the member CA then
removes the delegated CA (revokes it), that CA may actually continue to
function and publish at the PaaS, or its operator may simply not remove its
old content. The latter can be detected relatively easily (no more RFC
8181 interactions, and the manifests in the (sub-)repo are old). But
the former is harder from the RIPE NCC perspective.
Yeah, it sounds like a separate initiative to 'automatically clean PaaS'
is warranted down the road. Whereas a CA's functional/non-functional
state can be deduced from the observability of a current, valid
Manifest/CRL, for PaaS the "liveness of a publication client" might
need to be inferred from RFC 8181 interactions.
I think you raise important questions which suggest there is more work
to be done; I suspect it would be good to attempt to tackle that after
dealing with non-functional CAs.
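The "easy" PaaS case (no more RFC 8181 interactions, and old manifests in the sub-repository) can be sketched as below. Both windows are hypothetical. As noted above, a revoked CA that keeps publishing fresh content would not be caught by this check.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical windows, for illustration only.
PUBLISH_SILENCE = timedelta(days=100)
MANIFEST_STALENESS = timedelta(days=100)


def publisher_looks_abandoned(last_8181_request: Optional[datetime],
                              newest_manifest_this_update: Optional[datetime],
                              now: datetime) -> bool:
    """True if a PaaS publisher has stopped sending RFC 8181 requests AND
    the newest manifest in its sub-repository is old."""
    silent = (last_8181_request is None
              or now - last_8181_request >= PUBLISH_SILENCE)
    stale = (newest_manifest_this_update is None
             or now - newest_manifest_this_update >= MANIFEST_STALENESS)
    return silent and stale


now = datetime(2025, 3, 6, tzinfo=timezone.utc)
print(publisher_looks_abandoned(now - timedelta(days=200),
                                now - timedelta(days=200), now))  # True
```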
...
The result of not doing anything here is having content in the PaaS
repository that is unreferenced in the RPKI tree. It may not result in
warnings in RPKI validators, but it adds to their burden in terms of
data usage for RRDP snapshot downloads, full rsync, and local storage
in the validators. In short, this may also warrant thought.
I've been tracking 'unreferenced objects' over time and have reached out
to RIRs where the number was higher than 'the usual churn'. Situations
in which excessive numbers of unreferenced objects existed were all
resolved in a timely fashion. In this context 'excessive' was
tens-of-thousands of objects.
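One way to count 'unreferenced objects' is a set difference between what a repository serves and what reachable manifests list. This is a minimal sketch assuming manifest file-list entries have already been resolved to full URIs (real manifests list bare file names with hashes). The example URIs are made up.

```python
from typing import Dict, Iterable, Set


def unreferenced(repo_files: Iterable[str],
                 manifests: Dict[str, Set[str]]) -> Set[str]:
    """Files served by a publication point that no reachable manifest lists.

    repo_files: every URI found in the repository (e.g. from an RRDP
        snapshot or a full rsync fetch).
    manifests: URI of each manifest reachable from a trust anchor, mapped
        to the URIs named in its file list.
    """
    referenced = set(manifests)  # manifests are referenced via their CA certificate
    for listed in manifests.values():
        referenced |= listed
    return set(repo_files) - referenced


base = "rsync://rpki.example.net/repo/"
repo = [base + f for f in ("ca.mft", "ca.crl", "r1.roa", "old.roa", "gone/x.mft")]
mfts = {base + "ca.mft": {base + "ca.crl", base + "r1.roa"}}
print(sorted(unreferenced(repo, mfts)))
```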
...
I hope these observations are helpful. And I hope it is clear that
they are not intended to dissuade people from the policy.
Yes, thank you for chiming in.
...
There are many corner cases that we can consider, and many ways that
we can deal with them. It may be hard to enumerate them all in a
policy. In that regard, I would like to echo a comment made earlier: if
the policy describes the problem to solve rather than the solution,
this may leave us at the RIPE NCC more room to come up with a good
solution and adjust it over time. Of course any such implementation of
policy requirements would be published publicly and open to community
feedback.
Yup - we should try to avoid painting ourselves into a corner by having
to re-do the PDP because the first attempt was overly prescriptive. :)
In closing - I want to mention I have already implemented my own
"non-functional CA detection system" and currently run two independent
instances in two cities. If this policy moves forward to the point that
RIPE NCC actually starts to study implementation, from my side there
will be an opportunity to compare notes and verify that both our
implementations arrive at the exact same outcome. I can check your
homework and you can check mine! :-)
I'm happy to explain (in-person?) how my detection system works, share
my code, and provide access to the ongoing measurements.
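Comparing notes between two independent detection systems could be as simple as diffing their flagged sets. This sketch assumes each system reports CAs by a stable key, taken here to be the CA certificate's Subject Key Identifier in hex. The function name and values are hypothetical.

```python
from typing import Set, Tuple


def compare(mine: Set[str], theirs: Set[str]) -> Tuple[bool, Set[str], Set[str]]:
    """Compare two independently computed sets of CAs flagged as
    non-functional. Returns (agree, only_mine, only_theirs)."""
    only_mine = mine - theirs
    only_theirs = theirs - mine
    return (not only_mine and not only_theirs), only_mine, only_theirs


agree, a_only, b_only = compare({"3f2a", "91bc"}, {"91bc", "c0de"})
print(agree, a_only, b_only)  # False {'3f2a'} {'c0de'}
```

The two difference sets are the cases worth discussing, since each points at a CA where the implementations disagree.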
Kind regards,
Job
ps. Here is an overview of the top 10 locations' current number of
'unreferenced files' in the global RPKI.
Unref_Objects  Publication_FQDN          Size
         2065  rpki.apnic.net            12MB
         1776  rpki-repo.registro.br     14MB
          619  rsync.paas.rpki.ripe.net  4.4MB
          206  rpki.sub.apnic.net        1.4MB
          202  rpki.afrinic.net          1.2MB
          144  rpki-rps.arin.net         1.1MB
           39  rpki.cnnic.cn             775KB
           23  r.magellan.ipxo.com       130KB
           21  rpki.apernet.io           125KB
           20  repo.kagl.me              116KB
The total number of objects is around 5,500 - which is not perfect, but
certainly not as bad as it has been in the past.
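Summing the ten rows shows how concentrated the unreferenced objects are. The top ten account for 5,115 objects, which leaves a few hundred of the roughly 5,500 spread across other publication points (taking the 5,500 figure as approximate).

```python
# Counts copied from the top-10 table above.
unref_top10 = {
    "rpki.apnic.net": 2065,
    "rpki-repo.registro.br": 1776,
    "rsync.paas.rpki.ripe.net": 619,
    "rpki.sub.apnic.net": 206,
    "rpki.afrinic.net": 202,
    "rpki-rps.arin.net": 144,
    "rpki.cnnic.cn": 39,
    "r.magellan.ipxo.com": 23,
    "rpki.apernet.io": 21,
    "repo.kagl.me": 20,
}

total_top10 = sum(unref_top10.values())
top_two_share = (unref_top10["rpki.apnic.net"]
                 + unref_top10["rpki-repo.registro.br"]) / total_top10

print(total_top10)             # 5115
print(f"{top_two_share:.0%}")  # 75%
```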


elahemosavi