Hi all,
Many of you might not know me, but I’m part of RIPE’s software engineering team that takes care of RPKI.
I’ve been following this discussion closely and I've noticed some lack of clarity about our decision to duplicate our RPKI infrastructure.
So I think it’s important for us to explain a few things about our approach.
First, what we have today in production:
- TA software (offline box)
- HSM for the TA (plus backups and spare parts)
- A few application servers running our RPKI software - I’ll call it RPKI-Core
- Redundant HSMs used by RPKI-Core
- RRDP publication service (cloud)
- Some rsync nodes (internal infra)
Something like the diagram below.
Our testing environment has practically the same infra.
And for the public test environment (localcert) we use ‘soft’ keys and no HSMs.
About the new AS0 TA, yes, we could simplify our infra.
One option would be to use ‘soft’ keys all around, or to use an HSM for the TA only.
We could also use third-party software for TA, Core and publication service.
It crossed my mind, for a fraction of a second, to skip AS0 TA instances for our internal and/or public test environments.
But I don’t think we should treat it as a "second class citizen".
If we provide another TA, it deserves as much TLC as our production TA, meaning that it would also require the same (or a similar) process around it as our production TA does. That includes keeping track of HSM card holders, defining a proper admin and operator quorum, scheduling periodic re-signing sessions, etc.
I’m not here to advocate either for or against an AS0 TA.
But when discussing our implementation, this was our rationale to duplicate the infrastructure.
And that’s why it would cost us a lot to implement it.
Let me know if you need more info about this subject.
Kind regards,
Thiago da Cruz
Sr. software engineer - RPKI Team
RIPE NCC
+---------------------+
|                     |            +-------+
|    TA (offline)     +------------+  HSM  |
|                     |            +-------+
+---------------------+

                                                                       +----------------------------+
                                                                       |                            |
                                                                 +---> |      RRDP publication      |
                                                                 |     |          (cloud)           |
+-----------------+         +-----------------+                  |     |                            |
|                 |         |                 |                  |     +----------------------------+
|   RPKI-Core 1   |  (...)  |   RPKI-Core n   | -- Publication --+
|                 |         |                 |                  |     +----------------------------+
+--------+--------+         +--------+--------+                  |     |                            |
         |                           |                           +---> |     Rsync publication      |
         +-------------+-------------+                                 | (internal infra - x nodes) |
                       |                                               |                            |
         +-------------+-------------+                                 +----------------------------+
         |                           |
+--------+--------+         +--------+--------+
|      HSM 1      |  (...)  |      HSM m      |
+-----------------+         +-----------------+
Anton pointed out I may have both misunderstood and not answered your question.
The testbed is a soft TA. In deployment, people will have to move to a
new (as yet not created) TAL for AS0, as long as it runs independently
of the mainline TAL.
We intend to run a distinct TA for AS0 until we get a clear signal
our community wants it integrated. We have stated concerns about the
automatic adoption of AS0 products worldwide without visible agreement
to this activity; a separate TAL turns the activity from opt-out to
opt-in.
We are duplicating the software signing infrastructure, but with lower
costs overall given commonalities.
We are still discussing if we can run the offline-TA HSM and the
online production key HSM for both activities, or if we need a
distinct infrastructure for AS0 and mainline. Duplication overall is
not in APNIC's model; we rely on spares and alternate use of the HSM,
but production signing systems are single instances. I believe they
are capable of some virtualisation or segmentation but that skirts the
underlying physical risk/dependency.
Sorry for not being clearer before
-George
On Wed, Feb 26, 2020 at 6:18 PM Carlos Friaças via routing-wg <routing-wg@ripe.net> wrote:
Hi,
Any clue if APNIC has duplicated the infrastructure (and cost) as it is
foreseen in the NCC's impact analysis...?
Carlos
On Wed, 26 Feb 2020, JORDI PALET MARTINEZ via routing-wg wrote:
Hi Max,
I think it is too early to take a decision, and in fact I don't think we are in case A yet.
Consensus is about justified objections. I can also see people in favor, and I understand, as we usually do in any proposal discussion, that non-objection is consent.
The only justification that I can see is from Job, about possible cost. However, I don't see figures for how much it costs to develop this AS0 plus how much it costs operators to use it (if they want to), versus developing the SLURM, making sure it is as secure as RPKI, and how much it costs operators to use it.
And by the way, AS0 is compatible with SLURM, so operators can choose.
Regards,
Jordi
@jordipalet
On 25/2/20 20:30, "routing-wg on behalf of Massimiliano Stucchi" <routing-wg-bounces@ripe.net on behalf of max@stucchi.ch> wrote:
Hi everyone,
On 20/02/2020 15:39, Petrit Hasani wrote:
As per the RIPE Policy Development Process (PDP), the purpose of this four week Review Phase is to continue discussion of the proposal, taking the impact analysis into consideration, and to review the full draft RIPE Policy Document.
At the end of the Review Phase, the Working Group (WG) Chairs will determine whether the WG has reached rough consensus. It is therefore important to provide your opinion, even if it is simply a restatement of your input from the previous phase.
Today, the other proposers of this policy change and I had a meeting to
discuss the feedback we have been receiving on the list.
We understand that many people find this proposal controversial, and
many have expressed themselves against it in the past days.
We would like to encourage discussion and ask you to provide us with a
bit of guidance on how the community would like to proceed. At present
we have identified three ways of progressing:
A) We can try to go ahead with this proposal, although it will be hard
to get consensus;
B) We can drop the proposal, and leave everything as is;
C) We can change the proposal to a different ask for RIPE NCC. The idea
could be to ask RIPE NCC to provide a SLURM file (similar to what APNIC
does), so that single users can decide if they want to feed it to their
validators.
From what we gathered in the discussions, I think B) could be the most
sought-after decision, but we would like to propose C) as the way
forward. It would give the possibility to those who want to implement
this solution to do it in a lightweight fashion. It would for sure be
much much cheaper to implement.
In any case, as Job already pointed out, I prepared a simple tool to
generate a SLURM file using either the Team Cymru bogons list, or
considering any unassigned space from the NRO delegated stats file.
RIPE NCC has kindly provided help and patches to improve it. If you
want to give it a go, you can find it here:
https://github.com/stucchimax/rpki-as0-bogons
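For readers unfamiliar with SLURM (RFC 8416), here is a minimal sketch of what such a file could look like when it asserts AS0 for a list of prefixes. It is not the code of the tool linked above: the function name build_as0_slurm, the comment text and the example prefixes (documentation ranges) are assumptions made for illustration, and setting maxPrefixLength to the full address length is just one possible choice, since that field is optional in RFC 8416.

```python
import ipaddress
import json


def build_as0_slurm(prefixes, comment="Unassigned space, AS0 assertion"):
    """Return a SLURM (RFC 8416) document asserting AS0 for each prefix.

    Hypothetical helper for illustration; not taken from any existing tool.
    """
    assertions = []
    for text in prefixes:
        # strict=True rejects input with host bits set, e.g. 192.0.2.1/24
        net = ipaddress.ip_network(text, strict=True)
        assertions.append({
            "asn": 0,
            "prefix": str(net),
            # Assumption: state the full address length explicitly
            # (32 for IPv4, 128 for IPv6); the field is optional.
            "maxPrefixLength": net.max_prefixlen,
            "comment": comment,
        })
    return {
        "slurmVersion": 1,
        # No filters: this sketch only adds local assertions.
        "validationOutputFilters": {
            "prefixFilters": [],
            "bgpsecFilters": [],
        },
        "locallyAddedAssertions": {
            "prefixAssertions": assertions,
            "bgpsecAssertions": [],
        },
    }


if __name__ == "__main__":
    # Documentation prefixes, used here only as placeholders.
    slurm = build_as0_slurm(["192.0.2.0/24", "2001:db8::/32"])
    print(json.dumps(slurm, indent=2))
```

The resulting JSON would be handed to a validator as a local exceptions file, so each operator decides individually whether to apply it, which is the opt-in behaviour option C is after.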
Thank you for any suggestion or any discussion around this.
Ciao!
--
Massimiliano Stucchi
MS16801-RIPE
Twitter/Telegram: @stucchimax