testing DNS flag day compatibility
Hello everyone,
this is a follow-up to the RIPE 77 hallway discussion, sorry for the delay.
We are looking for ways to test DNS flag day [1] compatibility from client networks. The objective is to test the hypothesis that most breakage happens on the authoritative side of DNS. In other words, we would like to verify that DNS recursive infrastructure and client networks do not significantly influence compatibility.
That would help to provide precise information for network operators who will have to deal with DNS flag day.
The problem is that RIPE Atlas does not allow sending all types of queries [2] required for a full test. It was discussed at length that the Atlas team has its reasons for not sending random blobs to random IP addresses, which is understood.
The question is: Can we find a middle ground that allows a greater variety of valid DNS queries without forcing the Atlas team to reimplement everything?
My notes from the meeting mention two approaches for further discussion:
a) The user provides command-line arguments for the well-known tool dig, which gets executed in a controlled environment ("as part of RIPE Atlas infrastructure") and generates the query packet/blob. This blob generated by dig is then used as the payload, so the user cannot ship anything but a syntactically valid DNS packet.
b) The user provides a blob for the payload, which is then analyzed by a packet parser of choice (BIND/ldns/Knot DNS/all of them). The payload can be sent out only if the packet parsers do not find any problem, i.e. the blob is syntactically valid.
These two approaches can also be combined to guard against quirks in either component.
c) <propose your own here>
What do you think? Is there a way to allow greater flexibility for Atlas DNS?
[1] https://dnsflagday.net/
[2] https://gitlab.isc.org/isc-projects/DNS-Compliance-Testing/blob/master/genre...
--
Petr Špaček @ CZ.NIC
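To make approach (b) more concrete, here is a minimal sketch of a syntactic check on a user-provided blob, written with the Python standard library only. It is not how BIND, ldns or Knot DNS parse messages, and it is not anything RIPE Atlas implements. The names `InvalidQuery`, `skip_name` and `validate_query` are made up for this illustration, and the acceptance policy (exactly one question, no answer or authority records, no compression pointers) is an assumption, not something agreed in this thread.

```python
import struct


class InvalidQuery(ValueError):
    """Raised when a blob is not an acceptable DNS query (hypothetical name)."""


def skip_name(msg: bytes, pos: int) -> int:
    """Skip one uncompressed domain name and return the offset just past it."""
    name_len = 0
    while True:
        if pos >= len(msg):
            raise InvalidQuery("name runs past end of message")
        label_len = msg[pos]
        pos += 1
        if label_len == 0:            # root label terminates the name
            return pos
        if label_len > 63:            # compression pointer or reserved label type
            raise InvalidQuery("compressed or malformed label")
        pos += label_len
        name_len += label_len + 1
        if name_len > 254:            # 255 octets at most, including the root label
            raise InvalidQuery("name longer than 255 octets")


def validate_query(msg: bytes) -> None:
    """Accept only a well-formed query: one question plus optional additional RRs."""
    if len(msg) < 12:
        raise InvalidQuery("truncated header")
    _msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", msg[:12])
    if flags & 0x8000:
        raise InvalidQuery("QR bit set, this is a response")
    if (qdcount, ancount, nscount) != (1, 0, 0):   # policy assumption of this sketch
        raise InvalidQuery("unexpected section counts")
    pos = skip_name(msg, 12)
    if pos + 4 > len(msg):
        raise InvalidQuery("truncated question")
    pos += 4                                       # QTYPE + QCLASS
    for _ in range(arcount):                       # e.g. the EDNS OPT record
        pos = skip_name(msg, pos)
        if pos + 10 > len(msg):
            raise InvalidQuery("truncated record header")
        rdlength = struct.unpack("!H", msg[pos + 8:pos + 10])[0]
        pos += 10 + rdlength
        if pos > len(msg):
            raise InvalidQuery("RDATA runs past end of message")
    if pos != len(msg):
        raise InvalidQuery("trailing bytes after last record")
```

A caller would pass the raw payload to `validate_query` and send it only if no exception is raised. A production check would more likely delegate to one or more of the established parsers named above rather than to hand-written code like this.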
Hello, I think it should be specified which tests/options are really *necessary* for this compatibility testing related to the DNS flag day.
From an operator's perspective, you just need to know whether your implementation will have a problem or is OK... and I think many details reported by [2] will not even be understood by normal users.
From a quick look, you're missing the ability to set some bits (flags) and other options in the query packet. The majority of tests in the linked source code use SOA and some other common query types, which are already included in the available options. Some aren't, but they're quite exotic query types and probably not widely used, so are these really needed?
I don't think allowing "simply" anything (as you're proposing in [a] or [b] below) is a good approach. Some options (ignoretc, for example) will not even be understood by current `dig` implementations; that's another problem. And there's always some risk of malicious use, and an "open" Atlas network may be misused. So I prefer to stay restrictive in terms of queries allowed over the Atlas network.
Daniel
On 12/17/18 6:40 PM, Petr Špaček wrote:
Hello everyone,
this is follow-up from RIPE 77 hallway discussion, sorry for delay.
We are looking for ways to test DNS flag day [1] compatibility from client networks. Objective is to test hypothesis that most breakage happens on authoritative side of DNS. In other words, we would like to test that DNS recursive infrastructure and client networks do not significantly influence compatibility.
That would help to provide precise information for network operators who will have to deal with DNS flag day.
Problem here is that RIPE Atlas does not allow to send all types of queries [2] required for full test. It was discussed at length that Atlas team has its reasons for not sending random blobs to random IP addresses, which is understood.
Question here is: Can we find a middle ground to allow greater variety of valid DNS queries without forcing Atlas team to reimplement everything?
My notes from meeting mention two approaches for further discussion:
a) User provides command line arguments for well-known tool dig, which gets executed in controlled environment ("as part of RIPE Atlas infrastructure") and generates query packet/blob. This blob generated by dig is then used as payload so user cannot ship anything but syntactically valid DNS packet.
b) User provides blob for payload, which is then analyzed by packet parser of choice (BIND/ldns/Knot DNS/all of them). The payload can be sent out only if packet parsers do not find out any problem/blob is syntactically valid.
These two approaches can also be combined to guard against quirks in either component.
c) <propose your own here>
What do you think? Is there a way to allow greater flexibility to Atlas DNS?
[1] https://dnsflagday.net/ [2] https://gitlab.isc.org/isc-projects/DNS-Compliance-Testing/blob/master/genre...
Hello Daniel and others,
On 18. 12. 18 14:09, Daniel Suchy wrote:
Hello, I think it should be specified which tests/options are really *necessary* for this compatibility testing related to the DNS flag day. From an operator's perspective, you just need to know whether your implementation will have a problem or is OK... and I think many details reported by [2] will not even be understood by normal users.
From a quick look, you're missing the ability to set some bits (flags) and other options in the query packet. The majority of tests in the linked source code use SOA and some other common query types, which are already included in the available options. Some aren't, but they're quite exotic query types and probably not widely used, so are these really needed?
I don't think allowing "simply" anything (as you're proposing in [a] or [b] below) is a good approach. Some options (ignoretc, for example) will not even be understood by current `dig` implementations; that's another problem. And there's always some risk of malicious use, and an "open" Atlas network may be misused. So I prefer to stay restrictive in terms of queries allowed over the Atlas network.
I remember from the RIPE 77 meeting that there are strong opinions on limiting what can be done and that there are reasons for that. The purpose of my e-mail is to find out if there is a middle ground.
Does your answer mean "it is not going to happen, go away", or is there room for negotiation? I can provide detailed argumentation if you are willing to negotiate.
Petr Špaček @ CZ.NIC
Hello,
On 12/19/18 10:33 AM, Petr Špaček wrote:
I remember from RIPE 77 meeting that there are strong opinions on limiting what can be done and that there are reasons for that. Purpose of my e-mail is to find out if there is a middle ground.
Does your answer mean "it is not going to happen, go away" or is there a room for negotiation?
In my previous email I tried to ask you to specify more precisely which tests are really *necessary* (important) for DNS flag day compatibility testing. I'm missing this information from you :-)
I think if you reduce (and explain) your needs, there's space for discussion. In general, the proposed test is useful in my opinion, but you're asking for more than you really need for that purpose, I think.
With regards,
Daniel
Hello once again,
I'm glad you are willing to consider it. Here we go:
On 18. 12. 18 14:09, Daniel Suchy wrote:
Hello, I think it should be specified which tests/options are really *necessary* for this compatibility testing related to the DNS flag day. From an operator's perspective, you just need to know whether your implementation will have a problem or is OK... and I think many details reported by [2] will not even be understood by normal users.
Let me clarify that [2] is a low-level tool with many tests, and all of them are used for DNS flag day testing (see below). Normal users are supposed to use the form at [1], which post-processes the results from [2] and transforms them into a green/yellow/orange/red signal with a more human-friendly description.
The important difference here is that some tests in [2] have non-binary results, whilst DNS flag day [1] is concerned only with the timeout/non-timeout result and ignores other details of individual tests. (In the end this distinction is not important from the Atlas point of view, because we either get the message back or not.)

More technical details about DNS flag day 2019
----------------------------------------------
In short, the DNS protocol specification does not allow the server to drop queries (i.e. not respond at all) based on the options in them, so the tool tests whether something in the DNS path is dropping queries in violation of the DNS protocol.
A DNS client is free to set any flags or add arbitrary options, and the protocol defines what the other side should do if a flag/option is not understood. Thus, if the test passes, you are safe with any version of a DNS resolver (regardless of its particular configuration).
Different resolvers use different sets of options by default, and the set also depends on configuration. E.g. the latest versions of BIND send the DNS cookie option by default, and it of course breaks queries to some subset of servers, which will be a subject of DNS flag day 2019 (among others).

Finally to your question: Is it really needed?
----------------------------------------------
From a quick look, you're missing ability to set some bits (flags) and other options in query packet. Majority of tests in linked source code are using SOA, some other common types in query, which are already included in options available, some aren't - but they're quite exotic query types and probably not widely used - so are these really needed?
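To make the flags and options under discussion concrete, the following is a minimal sketch of how queries differing only in their EDNS parameters can be built, using the Python standard library only. It is not the test set of [2]. The helper names `encode_name` and `build_edns_query`, the query name `example.com`, the 4096-octet UDP size and the option code 65001 (taken from the local/experimental range) are all assumptions chosen for illustration.

```python
import struct

OPT = 41        # EDNS pseudo-record type (RFC 6891)
COOKIE = 10     # DNS cookie option code (RFC 7873)


def encode_name(name: str) -> bytes:
    """Encode a domain name in uncompressed wire format."""
    labels = [] if name in ("", ".") else name.rstrip(".").split(".")
    wire = b""
    for label in labels:
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError("bad label length")
        wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def build_edns_query(name, qtype=6, msg_id=0, rd=False,
                     edns_version=0, do_bit=False, options=(), udp_size=4096):
    """Build one query (default QTYPE 6 = SOA) carrying a single OPT record."""
    flags = 0x0100 if rd else 0
    header = struct.pack("!6H", msg_id, flags, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)   # class IN
    rdata = b"".join(struct.pack("!HH", code, len(data)) + data
                     for code, data in options)
    # OPT reuses the TTL field: extended RCODE (8 bits), version (8), DO + Z (16)
    ttl = (edns_version << 16) | (0x8000 if do_bit else 0)
    opt = b"\x00" + struct.pack("!HHIH", OPT, udp_size, ttl, len(rdata)) + rdata
    return header + question + opt


# Illustrative variations only; the names and the selection are assumptions.
probes = {
    "plain EDNS0": build_edns_query("example.com"),
    "EDNS version 1": build_edns_query("example.com", edns_version=1),
    "DO bit set": build_edns_query("example.com", do_bit=True),
    "unknown option": build_edns_query("example.com", options=[(65001, b"")]),
    # A real client cookie is 8 random octets; zeros are a placeholder here.
    "cookie": build_edns_query("example.com", options=[(COOKIE, bytes(8))]),
}
```

Each entry in `probes` is a syntactically valid query. The timeout/non-timeout distinction described earlier in this message applies to every one of them: a compliant server or middlebox should answer each, even if only with an error code.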
For DNS flag day 2019 in particular we are interested only in the queries listed at [2] and tagged with the constant EDNS, sorry for not making that clear at the beginning. Of course future DNS flag days will have different requirements ... see below.
More generically, as mentioned above, the purpose of the test is to answer "will this network work with any standard-compliant resolver", and to answer this we need to test the full spectrum. Using only a subset of tests would answer sub-questions like:
- will this network work with BIND 9.10
- will this network work with BIND 9.11 with cookies disabled
but will not answer other sub-questions like:
- BIND 9.14 in default configuration (no workarounds for non-compliance)
etc.
Please note that BIND is just an example; the test matrix is in fact V vendors * N versions * O options in each implementation, i.e. a huge matrix, and reducing it to a minimal set is not feasible as it changes with each release.

Generalization - why are we talking about it at all?
----------------------------------------------------
Having said all that, I now realize this e-mail should have a different subject - it is *also* about looking forward *beyond* DNS flag day 2019 itself.
Robert Kisteleki made clear during RIPE 77 that we are not going to get anything for DNS flag day 2019 because the Atlas planning cycle does not allow getting more features in. That's understood, and the purpose of this exercise is to find out if there are safe ways to make Atlas more useful for future DNS flag days (and other uses, of course), because in fact we are already too late for a (hypothetical) DNS flag day 2020!
Problem description by example:
1. Imagine that there will be e.g. a DNS flag day 2020, 2021, 2022, etc.
2. A DNS flag day is announced roughly a year before it happens to give operators room for preparation.
3. We test the authoritative side using our own tools like [2] and scripts around it [3]. It basically implements "DNS query carpet bombing" of ~ 23 million domains.
4. The uncertainty which is left is the question of compatibility problems in *client* networks - that's why we are looking at Atlas.
/ end of introduction /
5. The current state of things forces people who write DNS clients to specify what kind of queries we want to do for DNS flag day 2020 much, much earlier than necessary for other purposes, so that it can get included in the Atlas planning cycle. (In fact we would already be late if we wanted to do experiments now and announce it in February 2019, i.e. a year ahead.)
6. Naturally, if we found out that a different type of query is also needed (which always happens once you start experimenting), it is either too late to repeat the full cycle, or we have to do experiments years before the DNS flag day itself.
Such a big delay does not reflect the pace of DNS ecosystem development, i.e. it is good only for measurement after the fact instead of being usable as a precaution/data gathering before the event. In other words, we have to hope for the best and let operators find out what the problem is, because there is no way to measure it beforehand (again, in client networks).
I hope this illustrates the limitations and the problems stemming from them.

Proposal
--------
The proposal is to allow an Atlas user to input a wider variety of DNS messages in some form, and to validate them before sending the user-provided DNS message out. This can be done in multiple ways, and it is up for discussion which way gives reasonable assurance that the client query will not cause a problem.

Assessing impact
----------------
While assessing the impact of this proposal we should take into account the current state of things. Even the current ability to send out a simple A query for a user-provided name can trigger a wide variety of bugs, including security/denial-of-service bugs in DNS resolvers used by client networks.
One example for all is
https://doc.powerdns.com/recursor/security-advisories/powerdns-advisory-2017...
(not picking on this particular implementation!)
An attacker who controls a single authoritative server can trigger this bug by sending a plain A query from the current Atlas to the DNS resolver "under attack". Effectively all resolvers have had similar bugs in the past; it is certainly not a one-off.
From this example I conclude that anyone who can buy their own domain (for like ~ 6 USD/year) can mount this attack using the current Atlas API, today.
In my opinion, an implementation which takes a user-provided DNS message and checks it using 3 independent parsers compiled with Valgrind/ASAN (e.g. BIND, Unbound/ldns, Knot DNS, or any other) provides roughly the same level of (in)security as the current limited set of options.
I hope this clarifies the case. Where do we go from here?
[1] https://dnsflagday.net/
[2] https://gitlab.isc.org/isc-projects/DNS-Compliance-Testing/blob/master/genre...
[3] https://gitlab.labs.nic.cz/knot/edns-zone-scanner/
Petr Špaček @ CZ.NIC
On 19. 12. 18 11:29, Daniel Suchy wrote:
Hello,
On 12/19/18 10:33 AM, Petr Špaček wrote:
I remember from RIPE 77 meeting that there are strong opinions on limiting what can be done and that there are reasons for that. Purpose of my e-mail is to find out if there is a middle ground.
Does your answer mean "it is not going to happen, go away" or is there a room for negotiation?
In my previous email I tried to ask you to specify more precisely which tests are really *necessary* (important) for DNS flag day compatibility testing. I'm missing this information from you :-)
I think if you reduce (and explain) your needs, there's space for discussion. In general, the proposed test is useful in my opinion, but you're asking for more than you really need for that purpose, I think.
With regards, Daniel
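The "several independent parsers" check proposed in the message above can be sketched as a simple all-must-agree gate. This is only an illustration of the control flow: the two checks below are hypothetical stand-ins written for this example and do not call BIND, Unbound/ldns or Knot DNS, and the names `check_with_all`, `may_send` and `PARSERS` are invented here.

```python
from typing import Callable, List, Sequence, Tuple

Parser = Callable[[bytes], None]   # a parser raises an exception to reject a blob


def check_with_all(blob: bytes, parsers: Sequence[Tuple[str, Parser]]) -> List[str]:
    """Run every parser; return the rejection reasons (empty list means accept)."""
    rejections = []
    for name, parse in parsers:
        try:
            parse(blob)
        except Exception as exc:   # a crash inside a parser also counts as a rejection
            rejections.append(f"{name}: {exc}")
    return rejections


# Hypothetical stand-ins; a real deployment would wrap independent DNS libraries.
def header_check(blob: bytes) -> None:
    if len(blob) < 12:
        raise ValueError("shorter than a DNS header")


def query_check(blob: bytes) -> None:
    if len(blob) > 2 and blob[2] & 0x80:
        raise ValueError("QR bit set")


PARSERS = [("header", header_check), ("query", query_check)]


def may_send(blob: bytes) -> bool:
    """The payload goes out only if no parser objects."""
    return not check_with_all(blob, PARSERS)
```

The design choice illustrated is that a single objection is enough to block the payload, so a quirk in one parser that wrongly accepts a malformed message is covered by the others.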
participants (2)
- Daniel Suchy
- Petr Špaček