New on RIPE Labs: RPKI Repositories and the RIPE Database in the Cloud
Dear colleagues, The mission critical services the RIPE NCC provides to the Internet community require a solid technical foundation. In this new article on RIPE Labs, Felipe Silveira looks at plans to use cloud infrastructure as a means to that end. The full article is available here: https://labs.ripe.net/author/felipe_victolla_silveira/rpki-repositories-and-the-ripe-database-in-the-cloud/ Kind regards, Alun Davies RIPE Labs Editor RIPE NCC
Hello Alun, while I welcome this technical move towards the cloud and a more resilient infrastructure, please let's not forget to avoid any deadlocks in the future. I could also envision a difficult situation for DNS over HTTPS (i.e. how would you run HTTPS _without_ DNS?). What I want to say is that core services that are fundamentally important, such as the RIPE Database or RPKI validation, rely on _direct_ availability. In case of a catastrophic AWS failure, or simply "bootstrapping the Internet", let's always keep in mind that we must be able to start up the net without too many interwoven services. So this is a vote towards "keep also a plan-b in reserve" in case there is a problem. All the best - and please keep healthy, regards, Kurt
Dear Kurt, Thank you for your response. We agree with you: avoiding deadlocks and keeping our architecture simple without too many interdependencies are very important principles in our design. We believe that having a secondary cloud provider for RPKI repositories and keeping our on-premises infrastructure for WHOIS is one of the steps in achieving this goal, and we will continuously review our plans and adapt as needed. We appreciate your input and welcome others to join the discussion too. Next week I will present part of those plans in the NCC Services WG, and I look forward to hearing your feedback there. Kind regards, Felipe
* Felipe Victolla Silveira (fvictolla@ripe.net) [210511 15:03]:
Next week I will present part of those plans in NCC Services WG, looking forward to hear your feedback there.
Great :-) Certainly, all of these new cloud services will provide full IPv6 support without exceptions (which doesn't seem to have been easy with some cloud providers in the past). It would be nice to hear something about the challenges you faced on the way and how you managed to overcome them. Bjørn
Hi,
As a member, I do not want the RIPE NCC to spend our money on "give away control on critical services".
Use of cloud services (or any other "outsourced infrastructure") is something I consider acceptable as a *backup* in case of something catastrophic happening to the RIPE NCC operated machines, to restore services to members and community quicker.
Using cloud services generally implies
- loss of control --> so the NCC *must* be primary authority on all data, and the cloud can only be a cache
- loss of contact and responsibility --> if an NCC-provided service does not work, I do not want to talk to a cloud provider hotline, or hear from the NCC "well, there is nothing we can do, something in the cloud is broken"
All this cloudstuff is really great if you need "elastic services" (like, when the big run on the last IPv4 space starts, scale up the LIR portal to 200 instances - oh, wait, this opportunity got missed), or "low latency for high bandwidth content delivery" (so, yeah, ... no?).
But for the services the NCC provides, "cloud" sounds like "yeah, someone else to blame if it explodes", and this is not why we give the NCC money.
(And no, there is not much trust from my side, since the ticket system is *still* a major annoyance in our day to day dealing with the NCC - despite promises, two years ago, to make this more usable)
Gert Doering -- voting LIR contact
-- have you enabled IPv6 on something today...?
SpaceNet AG, Joseph-Dollinger-Bogen 14, D-80807 Muenchen, Tel: +49 (0)89/32356-444
Management Board: Sebastian v. Bomhard, Michael Emmer; Chair of the Supervisory Board: A. Grundner-Culemann; HRB: 136055 (AG Muenchen); VAT ID: DE813185279
This article is an excellent teaser for next week’s presentation in the RIPE NCC Services WG at RIPE 82. It raises a number of questions in the reader’s mind and gives an implicit promise that these will be addressed in the presentation and in subsequent discussion.
Here below are some questions which occur to me, in no particular order.
I am interested to know the extent to which community input has informed the project, as it touches the scope of three RIPE working groups (Database, Routing, and NCC Services) and that of the Database Requirements Task Force.
The intention to include an additional cloud provider is mentioned; I would like to know what the timeline for this is, and what the current state of progress is.
I wonder how the availability of the service will be assessed for testing against the proposed targets (five nines and worst-case 1-hour recovery from catastrophe). I also wonder how the apparent conflict between these availability targets will be resolved. I will share my arithmetic separately later.
I would like to know the rationale for two design choices: the use of the DNS, rather than BGP and anycast, for maintaining or restoring service under fault conditions, and the selection of a “fallback” model rather than a “composite cloud” model.
I am sure that other members of the community will have equally (or even more) interesting questions, and look forward to an interesting presentation at RIPE 82 and to the subsequent discussion.
Best regards, Niall O’Reilly
On 13 May 2021, at 16:19, Niall O'Reilly wrote:
I will share my arithmetic separately later.
So, as promised.
1. Time to recover five-nines availability after worst-case service-recovery delay: An hour’s outage must be matched by a minimum of 99,999 hours uninterrupted availability. 99999 / ( 24 * 365 ) gives 11.4. That’s nearly eleven and a half years.
2. Maximum cumulative outage between RIPE meetings which would allow RIPE NCC to report achieving the five-nines target: RIPE meetings are held twice a year, but not precisely six months apart; the interval is approximately 180 days. Converting to seconds and dividing by 100,000 is equivalent to multiplying by 0.864, as there are 86400 seconds in a day. 180 * 0.864 gives 155.5 seconds, or just over 2 minutes and 30 seconds.
Best regards, Niall
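The two calculations above can be reproduced with a short Python sketch. The function names and the use of exact fractions are illustrative choices, not part of the original message; the five-nines target, the one-hour outage and the 180-day interval are the figures quoted above.

```python
from fractions import Fraction

# Figures taken from the message above; the names are illustrative.
FIVE_NINES = Fraction(99_999, 100_000)  # 99.999 % availability target
HOURS_PER_YEAR = 24 * 365
SECONDS_PER_DAY = 86_400


def uptime_needed_hours(outage_hours, availability=FIVE_NINES):
    """Uninterrupted uptime needed so that an outage of the given
    length still averages out to the availability target."""
    unavailability = 1 - availability
    return outage_hours * availability / unavailability


def downtime_budget_seconds(window_days, availability=FIVE_NINES):
    """Total outage allowed within a window while meeting the target."""
    return window_days * SECONDS_PER_DAY * (1 - availability)


if __name__ == "__main__":
    hours = uptime_needed_hours(1)
    years = hours / HOURS_PER_YEAR
    print(f"{float(hours):.0f} hours, about {float(years):.1f} years")
    # prints: 99999 hours, about 11.4 years

    budget = downtime_budget_seconds(180)
    print(f"{float(budget):.2f} seconds between meetings")
    # prints: 155.52 seconds between meetings
```

Exact fractions are used only to keep rounding noise out of the results; plain floats would give the same figures to the precision quoted in the message.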
On Thu, May 13, 2021 at 04:57:21PM +0100, Niall O'Reilly wrote:
Time to recover five-nines availability after worst-case service-recovery delay: An hour’s outage must be matched by a minimum of 99,999 hours uninterrupted availability.
I'm concerned the 99.99% and 99.999% numbers splayed in the blog post are very optimistic and unattainable. Even if a cloud provider offers such availability, at this moment I'm under the impression that RIPE NCC is not in a position to even be able to observe whether 98%, 99%, or some other level of service availability is achieved.
The types of outages that the RPKI service has seen in the last year appear to be the result of human error and flaws in the application design, exacerbated by a lack of monitoring. Below is a small selection of outages in the last year.
Apr 1st, 2020 - "A subset of hosted RPKI ROAs were deleted (hours)" [1]
Apr 6th, 2020 - "The rsync server was down (hours)" [2]
Dec 16th, 2020 - "A subset of hosted RPKI ROAs were deleted (hours)" [3]
Feb 15th, 2021 - "the publication server stopped working until it was rebooted (hours)" [4]
ONGOING, 2021 - "RSYNC clients periodically are fed inconsistent data by RIPE RSYNC server" [5]
The above list suggests general availability currently is less than 99.99%. Also, the above don't appear to be of the class of issues 'the cloud' solves. The scaling requirements don't appear to exceed what a modern medium sized gigabit-connected SSD-backed server can muster.
In this posting, "Improving operations at RIPE NCC TA" https://www.ripe.net/ripe/mail/archives/routing-wg/2021-February/004237.html I suggested that RIPE NCC should make a dashboard available to the public that shows all metrics and aspects of the RPKI service. There are probably as many metrics as there are line items on a cloud invoice. :-)
My suggestion would be to first make service availability statistics available, before migrating to the cloud. This way both the RIPE membership and RIPE NCC staff can easily compare 'before' and 'after'. The community would benefit from better insight into how RIPE NCC themselves appear to think things are going.
In the meantime, I recommend solving the ongoing inconsistent publication problem affecting RSYNC clients - before moving to the cloud. The cloud does not solve flawed application designs. Deploying a new application and at the same time moving into the cloud is akin to making multiple unrelated changes at the same time.
The blog post suggests a second cloud provider will not be part of 'phase 1' of moving into the cloud. Despite repeated pleas from the community to RIPE NCC to please come up with some kind of quick-fix/workaround for the RSYNC issue, RIPE NCC has been unable to come up with any form of relief or issue masking in the short term.
The lack of a clear plan for a second provider, the lack of agility to mitigate ongoing service problems in the short term, and the apparent lack of monitoring make me question the current strategy. Are the right priorities set? What KPIs and goals does RIPE NCC set for themselves?
I look forward to today's updates in services-wg.
Kind regards, Job
[1]: https://www.ripe.net/support/service-announcements/accidental-roa-deletion
[2]: https://www.ripe.net/support/service-announcements/rsync-rpki-repository-dow...
[3]: https://www.ripe.net/support/service-announcements/rpki-roas-deleted-for-som...
[4]: https://www.ripe.net/support/service-announcements/delay-publishing-rpki-obj...
[5]: no service announcement (?): https://www.ripe.net/ripe/mail/archives/routing-wg/2021-April/004297.html
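As a rough check on the claim in this message that availability is currently below 99.99%, the sketch below compares a yearly downtime budget with the four dated incidents listed. The incident durations are an assumption: the list only says "(hours)", so each incident is counted as exactly 60 minutes of full unavailability, and the ongoing RSYNC inconsistency is left out. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(target_percent, period_minutes=MINUTES_PER_YEAR):
    """Downtime allowed in the period while still meeting the target."""
    return period_minutes * (100 - target_percent) / 100


def availability_percent(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Availability achieved given the total downtime in the period."""
    return 100 * (1 - downtime_minutes / period_minutes)


if __name__ == "__main__":
    # ASSUMPTION: four incidents, each counted as only 60 minutes.
    assumed_downtime = 4 * 60

    print(f"99.99% budget:  {downtime_budget_minutes(99.99):.2f} min/year")
    print(f"99.999% budget: {downtime_budget_minutes(99.999):.2f} min/year")
    print(f"assumed result: {availability_percent(assumed_downtime):.3f}%")
    # prints:
    # 99.99% budget:  52.56 min/year
    # 99.999% budget: 5.26 min/year
    # assumed result: 99.954%
```

Under these assumptions a single one-hour incident already exceeds the yearly 99.99% budget, so longer real durations would only lower the figure further.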
Friends,
I am unable to attend the NCC Services Working Group session at RIPE 82, so I thought that I would say something here. My main concern with moving RPKI repositories and the RIPE Database to the cloud is with the choice of AWS as provider, basically because Amazon is a US-based company. We know that tech companies in the US have handed over data to the US government - sometimes without a warrant, sometimes with. We know that US law has provisions for secret subpoenas, where a service provider cannot reveal that subpoenas were issued. Using any US-based cloud provider means basically hoping that none of the data that RIPE puts there or the meta-data derived from usage of the service is interesting for any part of the US government. I know all of the big cloud providers are US-based, except for Alibaba Cloud. I would not feel a lot safer with a China-based cloud provider for RIPE data and associated services. Not using one of the big cloud providers means going with smaller cloud providers. I think that's probably fine - the RIPE NCC's requirements are surely quite small, and can surely be met by at least two cloud vendors in Europe. I realize that using a European vendor might not be especially comforting for people outside of the EU sphere of influence. I don't think this can be completely resolved, although since the RIPE NCC is already a Netherlands-based member association it should not add much extra legal or technical risk. A separate concern is with vendor lock-in. If the RIPE NCC really deploys their stuff to multiple cloud providers, then this won't be a problem, but the very real, seemingly firm choice of AWS and the hand-waving about what a second provider might look like don't fill me with confidence. My own suggestion would be to not use a second provider as a back-up but to run two cloud providers at all times (not necessarily with an equal split of load though).
I wasn't sure whether I should bother sending this mail, because I worry that this effort is being run like a Dutch government project. That means that people are fully informed, their opinions are listened to, and then the project proceeds exactly as the government planned without change. 😉 Hopefully that is not the case here. Cheers, -- Shane
while i share your wariness of aws, it is for vendor lock-in, not
My main concern with moving RPKI repositories and the RIPE Database to the cloud is with the choice of AWS as provider, basically because Amazon is a US-based company. We know that tech companies in the US have handed over data to the US government - sometimes without a warrant, sometimes with. We know that the US law has provisions for secret subpoenas...
i don't take this as a credible threat; though i assume well thought out encryption, both at rest and in transit, and serious key hygiene by the ncc. but i would be inclined to a multi-cloud approach, providing not only redundancy, but also forcing avoidance of vendor lock-in.
I wasn't sure whether I should bother sending this mail, because I worry that this effort is being run like a Dutch government project. That means that people are fully informed, their opinions are listened to, and then the project proceeds exactly as the government planned without change.
it does have that aroma. but, being an engineer, i will judge by results. randy
Dear RIPE NCC Services team, In the context of GDPR, what is RIPE NCC's legal basis as a data controller to export PII to inadequate third countries? What supplementary measures are used [1]? Has a review according to GDPR Art 35 been performed, and will the result of this also be shared? Best regards, -- Martin Millnert [1] https://edpb.europa.eu/sites/default/files/consultation/edpb_recommendations...
Hi Alun, I've read the article and I think one possible long-term risk has been overlooked: loss of skill/proficiency. I would like to point to a very interesting and eye-opening presentation by Bert Hubert about this very topic: https://www.youtube.com/watch?v=PQccNdwm8Tw (transcript: https://berthub.eu/articles/posts/how-tech-loses-out/ )
participants (11)
- Alun Davies
- Bjoern Buerger
- Felipe Victolla Silveira
- Gert Doering
- Job Snijders
- Kurt Kayser
- Martin Millnert
- Michiel Klaver
- Niall O'Reilly
- Randy Bush
- Shane Kerr