Re: [atlas] testing DNS flag day compatibility

19 Dec 2018

Hello once again,

I'm glad you are willing to consider it. Here we go:

On 18. 12. 18 14:09, Daniel Suchy wrote:
...
Hello,
I think it should be specified which tests/options are really
*necessary* for this compatibility testing related to the DNS flag day.
From an operator's perspective, you just need to know whether your
implementation will have a problem or whether it's OK... and I think
many details reported by [2] will not even be understood by normal users.
Let me clarify that [2] is a low-level tool with many tests, and all of
them are used for DNS flag day testing (see below).

Normal users are supposed to use the form at [1], which post-processes
the results from [2] and transforms them into a green/yellow/orange/red
signal with a more human-friendly description.

The important difference here is that some tests in [2] have non-binary
results, whilst DNS flag day [1] is concerned only with the
timeout/non-timeout result and ignores other details of individual
tests. (In the end this distinction is not important from the Atlas
point of view because we either get the message back or not.)

More technical details about DNS flag day 2019
----------------------------------------------
In short, the DNS protocol specification does not allow a server to
drop queries (i.e. not respond at all) based on the options in them, so
the tool attempts to test whether something in the DNS path is dropping
queries in violation of the DNS protocol.

A DNS client is free to set any flags or add arbitrary options, and the
protocol defines what the other side should do if a flag/option is not
understood. Thus, if the test passes you are safe with any version of
any DNS resolver (regardless of its particular configuration).
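
For illustration only, a probe of this kind can be sketched in Python
using nothing but the standard library. This is not the code used by
[2]; the function names, the option code 65001 (taken from the EDNS
local/experimental range), the UDP payload size of 4096 and the 5 second
timeout are all assumptions made for the example. It builds an SOA query
carrying an EDNS OPT record with an option the server is unlikely to
understand, and reduces the outcome to timeout/non-timeout:

```python
import socket
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as uncompressed DNS labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_edns_query(name: str, qtype: int = 6, option_code: int = 65001,
                     option_data: bytes = b"", query_id: int = 0x1234) -> bytes:
    """Build a query (default type SOA) with one EDNS option attached."""
    # Header: ID, flags (all zero, so no recursion desired),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    option = struct.pack("!HH", option_code, len(option_data)) + option_data
    # OPT pseudo-record: root name, type 41, class = UDP payload size,
    # TTL = extended RCODE/version/flags (all zero), then RDLENGTH.
    opt_rr = b"\x00" + struct.pack("!HHIH", 41, 4096, 0, len(option)) + option
    return header + question + opt_rr


def probe(server: str, wire: bytes, timeout: float = 5.0) -> str:
    """Send the query over UDP and report only 'response' or 'timeout'."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(wire, (server, 53))
        try:
            sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
        return "response"
```

A call such as probe("192.0.2.1", build_edns_query("example.com")) would
then return "timeout" if the query was silently dropped somewhere on the
path. The address is a documentation placeholder, and the sketch does
not check that a reply actually matches the query it sent.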

Different resolvers use different sets of options by default, and the
set also depends on configuration. E.g. the latest versions of BIND send
the DNS cookie option by default, and of course it breaks queries to
some subset of servers, which will be the subject of DNS flag day 2019
(among others).

Finally to your question: Is it really needed?
----------------------------------------------
...
From a quick look, you're missing the ability to set some bits (flags)
and other options in the query packet. The majority of tests in the
linked source code use SOA and some other common query types, which are
already included in the available options. Some aren't, but they're
quite exotic query types and probably not widely used, so are these
really needed?
...
For DNS flag day 2019 in particular we are interested only in the
queries listed at [2] and tagged with the constant EDNS; sorry for not
making that clear at the beginning. Of course, future DNS flag days will
have different requirements ... see below.

More generally, as mentioned above, the purpose of the test is to answer
"will this network work with any standard-compliant resolver", and to
answer this we need to test the full spectrum.

Using only a subset of the tests would answer sub-questions like:
- will this network work with BIND 9.10
- will this network work with BIND 9.11 with cookies disabled
but would not answer other sub-questions like:
- will this network work with BIND 9.14 in its default configuration
  (no workarounds for non-compliance)
etc.

Please note that BIND is just an example; the test matrix is in fact
V vendors * N versions * O options in each implementation, i.e. a huge
matrix, and reducing it to a minimal set is not feasible because it
changes with each release.
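
To give a feeling for how quickly such a matrix grows, here is a small
Python sketch. The vendor names are real resolvers, but the version
labels and option toggles are purely hypothetical placeholders, and
treating O as all on/off combinations of independent toggles is an
assumption of this example, not something the tools above do:

```python
from itertools import product

# Hypothetical inputs, chosen only to illustrate the growth.
vendors = ["BIND", "Unbound", "Knot Resolver"]
versions = ["old", "current", "next"]
toggles = ["cookies", "nsid", "client-subnet"]


def configurations(vendors, versions, toggles):
    """Yield every (vendor, version, option settings) combination."""
    option_sets = list(product([False, True], repeat=len(toggles)))
    for vendor, version, enabled in product(vendors, versions, option_sets):
        yield vendor, version, dict(zip(toggles, enabled))


# 3 vendors * 3 versions * 2**3 option combinations
count = sum(1 for _ in configurations(vendors, versions, toggles))
```

Even these toy numbers give 72 configurations, and every additional
toggle doubles the total.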

Generalization - why are we talking about it at all?
----------------------------------------------------
Having said all that, I now realize this e-mail should have had a
different subject - it is *also* about looking forward *beyond* DNS flag
day 2019 itself.

Robert Kisteleki made it clear during RIPE 77 that we are not going to
get anything for DNS flag day 2019 because the Atlas planning cycle does
not allow getting more features in.

That's understood, and the purpose of this exercise is to find out
whether there are safe ways to make Atlas more useful for future DNS
flag days (and other uses, of course), because in fact we are already
too late for a (hypothetical) DNS flag day 2020!

Problem description by example:
1. Imagine that there will be e.g. a DNS flag day 2020, 2021, 2022, etc.
2. A DNS flag day is announced roughly a year before it happens to give
operators room for preparation.
3. We test the authoritative side using our own tools like [2] and
scripts around it [3]. It basically implements "DNS query carpet
bombing" of ~ 23 million domains.
4. The uncertainty that is left is the question of compatibility
problems in *client* networks - that's why we are looking at Atlas.
/ end of introduction /
5. The current state of things forces people who write DNS clients to
specify what kind of queries we want to do for DNS flag day 2020 much,
much earlier than necessary for other purposes, so that it can be
included in the Atlas planning cycle. (In fact we would already be late
if we wanted to do experiments now and announce it in February 2019,
i.e. a year ahead.)
6. Naturally, if we found out that a different type of query is also
needed (which always happens once you start experimenting), it is either
too late to repeat the full cycle, or we have to do experiments years
before the DNS flag day itself.

Such a big delay does not reflect the pace of DNS ecosystem development,
i.e. it is good only for measurement after the fact instead of being
usable for precaution/data gathering before the event. In other words,
we have to hope for the best and let operators find out what the problem
is, because there is no way to measure it beforehand (again, in client
networks).

I hope this illustrates the limitations and the problems stemming from
them.

Proposal
--------
The proposal is to allow an Atlas user to input a wider variety of DNS
messages in some form, and to validate the user-provided DNS message
before sending it out.

This can be done in multiple ways, and it is up for discussion which way
gives reasonable assurance that the client query will not cause problems.

Assessing impact
----------------
While assessing the impact of this proposal we should take into account
the current state of things. Even the current ability to send out a
simple A query for a user-provided name can trigger a wide variety of
bugs, including security/denial-of-service bugs in DNS resolvers used by
client networks.

One example for all is
https://doc.powerdns.com/recursor/security-advisories/powerdns-advisory-2017...
(not picking on this particular implementation!)

An attacker who controls a single authoritative server can trigger this
bug by sending a plain A query from the current Atlas to the DNS
resolver "under attack". Effectively all resolvers have had similar bugs
in the past; it is certainly not a one-off.

From this example I conclude that anyone who can buy their own domain
(for like ~ 6 USD/year) can mount this attack using the current Atlas
API, today.

In my opinion, an implementation which takes a user-provided DNS message
and checks it using 3 independent parsers compiled with Valgrind/ASAN
(e.g. BIND, Unbound/ldns, Knot DNS, or any other) provides roughly the
same level of (in)security as the current limited set of options.
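
As a rough illustration of the "accept only if every parser agrees"
idea, here is a minimal Python sketch. The two toy parsers below merely
stand in for real, independent implementations (such as the ones named
above), which are not invoked here; all function names and the specific
checks are assumptions made for this example:

```python
import struct
from typing import Callable, Iterable

# A parser raises ValueError when it rejects the message.
Parser = Callable[[bytes], None]


def parse_header_only(wire: bytes) -> None:
    """Toy parser 1: sanity-check the fixed 12-byte DNS header."""
    if len(wire) < 12:
        raise ValueError("shorter than a DNS header")
    _id, flags, qdcount, _an, _ns, _ar = struct.unpack("!HHHHHH", wire[:12])
    if flags & 0x8000:
        raise ValueError("QR bit set: this is a response, not a query")
    if qdcount != 1:
        raise ValueError("expected exactly one question")


def parse_question_strict(wire: bytes) -> None:
    """Toy parser 2: walk the first question without trusting lengths."""
    if len(wire) < 12:
        raise ValueError("shorter than a DNS header")
    pos = 12
    while True:
        if pos >= len(wire):
            raise ValueError("truncated name")
        length = wire[pos]
        pos += 1
        if length == 0:
            break
        if length > 63:
            raise ValueError("oversized label or compression pointer")
        pos += length
    if pos + 4 > len(wire):
        raise ValueError("truncated question (missing type/class)")


def accepted_by_all(wire: bytes, parsers: Iterable[Parser]) -> bool:
    """Return True only if no parser rejects the message."""
    for parse in parsers:
        try:
            parse(wire)
        except ValueError:
            return False
    return True
```

A message would be sent out only when accepted_by_all() returns True for
the complete set of parsers; a single rejection is enough to refuse it.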

I hope this clarifies the case. Where do we go from here?

[1] https://dnsflagday.net/
[2]
https://gitlab.isc.org/isc-projects/DNS-Compliance-Testing/blob/master/genre...
[3] https://gitlab.labs.nic.cz/knot/edns-zone-scanner/

Petr Špaček  @  CZ.NIC

On 19. 12. 18 11:29, Daniel Suchy wrote:
...
Hello,
On 12/19/18 10:33 AM, Petr Špaček wrote:
...
On 18. 12. 18 14:09, Daniel Suchy wrote:
I remember from the RIPE 77 meeting that there are strong opinions on
limiting what can be done and that there are reasons for that. The
purpose of my e-mail is to find out if there is a middle ground.
Does your answer mean "it is not going to happen, go away"
or is there room for negotiation?
In my previous email I tried to ask you to specify more precisely which
tests are really *necessary* (important) for DNS flag day compatibility
testing. I'm missing this information from you :-)
I think if you reduce (and explain) your needs, there's space for
discussion. In general, the proposed test is useful in my opinion - but
you're asking for more than you really need for that purpose, I think.
With regards,
Daniel