Re: [dns-wg] [dns-operations] Additional information about the RIPE NCC reverse DNS issue

19 Mar 2017

      On 03/19/2017 08:09 AM, Shane Kerr wrote:
...
Doug,
At 2017-03-18 18:34:25 -0700
Doug Barton <dougb@dougbarton.email> wrote:
...
On 03/18/2017 08:46 AM, Anand Buddhdev wrote:
...
Dear colleagues,
This is a follow-up to our message of Friday about issues with some
reverse delegations.
After doing a thorough analysis, and with the help of ARIN staff, we
found more issues with our zonelet generation code
Can you say more about the benefit of this "zonelet" system vs. ARIN
simply delegating the appropriate zones to you, and you managing them
like any other DNS zone?
I do appreciate you keeping the community informed about the causes of
the outage, but it seems that at least part of the root cause is that
you're operating what sounds like a fairly fragile system in the first
place, with (de facto) insufficient validity checking.
I was at the RIPE NCC when we adopted the zonelet approach, although I
haven't worked for them for over a decade.
The zonelet system was designed to allow reverse DNS for IPv4 space
that was originally assigned to one RIR but was later partially migrated
to other RIRs. This happened when LACNIC and AfriNIC were formed,
although I think that an audit was done at the time and so space was
moved around between all 5 RIRs.
The problem is that we could have a delegation like 999.in-addr.arpa
going to the RIPE NCC and then 888.999.in-addr.arpa being managed by
ARIN... but want 888.999.in-addr.arpa to point to the **address
holder's** name servers, not **ARIN's** name servers.
So ARIN needs a way to get the information about the name servers to
the RIPE NCC somehow (and RIPE NCC to LACNIC, and so on). Zonelets are
used for this; a zonelet is basically just the NS records needed,
probably picked up using SSH.
I think that we discussed using dynamic DNS (DDNS) for this at the
time, but decided that the simplest & best solution was zonelets.
DNAME could be used, but it would involve an extra lookup for
resolvers, right? (DNAME was pretty new when zonelets were adopted, and
I don't know that BIND 8 supported them, which was still the most
popular DNS server at that time.)
My guess is that the bugs are probably more due to ancient Perl code
than an overly-complicated system for exchanging this information. Heck,
it's possible that the bugs are due to MY ancient Perl code, although I
really don't remember who wrote or tested the code....
Thank you, Shane, for the explanation, which makes perfect sense.

RIPE folks, the operational answer to this problem would seem to be
having ARIN implement a sanity check such that if more than N% of the
information changes in a given pass, a human has to get involved and
approve the change. I had a lovely chat with John Curran about that
on NANOG, which you can see starting here:
https://mailman.nanog.org/pipermail/nanog/2017-March/090626.html

Short version, they won't do anything differently unless you 
specifically ask them to.

We all make mistakes, and I have no doubt that y'all have done your
best to find/fix the bugs that created the most recent problem. But I've
used similar sanity check systems in the past with good success, and
there is no shame in a "belt and braces" approach to critical
infrastructure like this. I hope that you'll consider it.
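
To make the idea concrete, here is a minimal sketch of that kind of check.
It is an illustration under stated assumptions, not a description of
anyone's actual tooling: the function names, the dict-of-NS-sets data
model, and the 5% default threshold are all hypothetical.

```python
# Hypothetical sketch: compare the delegations from the previous pass with
# the ones about to be published, and require human sign-off when too much
# has changed. The data model (zone name -> set of NS host names) is an
# assumption made for illustration only.

def changed_fraction(old, new):
    """Return the fraction of zones whose NS set differs between passes.

    A zone that appears in only one of the two passes (added or removed)
    counts as changed.
    """
    zones = set(old) | set(new)
    if not zones:
        return 0.0
    changed = sum(1 for zone in zones if old.get(zone) != new.get(zone))
    return changed / len(zones)


def needs_human_approval(old, new, max_percent=5.0):
    """True if more than max_percent of the zones changed in this pass."""
    return changed_fraction(old, new) * 100 > max_percent


# Example with made-up zone names: two of four delegations vanish.
previous = {
    "1.999.in-addr.arpa": frozenset({"ns1.example.net", "ns2.example.net"}),
    "2.999.in-addr.arpa": frozenset({"ns1.example.org"}),
    "3.999.in-addr.arpa": frozenset({"ns1.example.com"}),
    "4.999.in-addr.arpa": frozenset({"ns2.example.com"}),
}
current = {
    "1.999.in-addr.arpa": frozenset({"ns1.example.net", "ns2.example.net"}),
    "2.999.in-addr.arpa": frozenset({"ns1.example.org"}),
}

if needs_human_approval(previous, current):
    print("Too many changes in one pass; holding for manual review.")
```

The point of the design is that the check only looks at the size of the
diff, not at why it changed, so it still catches failures in the generation
code that nobody anticipated.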

Doug