My apologies. I missed the reference to our Technical Emergency Hotline:

[1] RIPE NCC Technical Emergency Hotline:

https://www.ripe.net/support/contact/technical-emergency-hotline


On 6 Apr 2020, at 16:19, Felipe Victolla Silveira <fvictolla@ripe.net> wrote:

Dear Danny and all,

Thank you for your email.

We understand the importance of RPKI for Internet operations and we are
taking recent outages very seriously.

We already have alerting systems in place that did not report the
deletions because deletion of ROAs is sometimes a normal and necessary
action. However, as Nathalie mentions in her post-mortem, we have
already taken steps to ensure our systems prevent this from happening again.

We are also carrying out a separate investigation on the impact this
outage had on networks in terms of hijacking and route leaks.

There is a 24/7 hotline[1] in place that people can use to report
outages outside of office hours. In this case, none of the people who
contacted us used this method to alert us.

In our Activity Plan and Budget 2020, we requested a significant budget
allocation for resiliency of RPKI in anticipation of increased global
demand and operational reliance on this system. Lessons learned from
these outages will be incorporated into the RPKI activity and we will
take all necessary steps to ensure the stability of the system.

Kind regards,

Felipe Victolla Silveira
Chief Operations Officer
RIPE NCC

On 3 Apr 2020, at 22:56, Danny McPherson <danny@tcb.net> wrote:


Agreed, thanks for this Nathalie.

Given the operational importance of RPKI now and each RIR's role therein, can you say anything about what plans RIPE has to provide 24x7 monitoring / support for these services (i.e., beyond your current "office hours")?

I also look forward to [your] analysis of the Rostelecom incident that occurred in the same timeframe.

Thanks,


-danny



On 2020-04-03 08:55, Nathalie Trenaman wrote:
Dear colleagues,

After our accidental deletion of RPKI ROAs on Wednesday evening, we have
a post-mortem report to share with the working group.

Following an update to our internal registry software on 1 April at
18:16 (UTC+2), 2,669 ROAs were deleted from Provider Independent (PI)
address assignments.

This was caused by our registry software classifying these assignments
as not-certifiable. From our logs, we can confirm that these blocks
never left the RIPE Registry, and within 15 minutes the registry was
back to normal. However, by that time the ROAs had already been deleted
and could not be restored without intervention from our engineers.

Affected users with alerts set up in the LIR Portal received a
notification email on 1 April at 22:23, stating that their ROAs were
missing. Some of these users emailed our Customer Service Department to
ask why their ROAs had been deleted. As this was outside of office
hours, our staff did not discover the issue until the next morning.

Our engineers were able to reinstate all of the missing ROAs by 13:15 on
2 April. We then informed our membership via ncc-announce and notified
the affected users directly.

We have since implemented stricter checks on both our registry and RPKI
software.

We are also investigating whether any of these PI assignments suffered
from route leaks or hijacks after their ROAs were deleted.

We apologise for any inconvenience this may have caused and we are
taking all necessary steps to ensure this does not happen again in the
future.

Kind regards,

Nathalie Trenaman
Routing Security Programme Manager
RIPE NCC