Re: [dns-wg] Draft of RIPE DNS Resolver Best Common Practices

5 Dec 2023

      Shane and other TF members,

Great document!

I have some comments about the anycast bit :)
...
On Nov 26, 2023, at 12:01 PM, Shane Kerr <shane@time-travellers.org> wrote:
Colleagues,
Here is a draft of the RIPE DNS Resolver Best Common Practices document
that the task force of that name has been working on.
[..]
...
#### Anycasting
**Anycasting may be considered**
Anycasting means routing the same IP prefix to more than one location.
Anycast isn’t just for multiple locations; local-scope anycast is also common, typically used to support high availability of the service by providing low-cost load balancing and failover.

There are other ways to achieve the same goal, e.g. dedicated load balancers, other network level failover, etc.
...
As mentioned above for addressing, client support for multiple
addresses is not always good;
This is key: stub failover to a second resolver address can be painful, so adding resilience to the primary resolver address is often worthwhile.
...
with anycasting you can use a single IP
address and have redundancy from different sites. This will often
allow you to place sites close to the user - although it is tricky to
get optimal routing with BGP.
For a resolver service with a single site there is no benefit. For a
resolver service with multiple sites, it may be better to configure
clients with different IP addresses rather than use anycasting.
[RFC7094](https://www.rfc-editor.org/rfc/rfc7094.html) discusses
anycast in detail, including references to various other RFCs which
discuss anycasting in general and to DNS in particular.
If a separate prefix is to be used for anycasting, usually this means
a /24 in IPv4 and a /48 in IPv6, as those are the smallest sizes that
will be widely propagated in BGP. A common practice is to use a
covering prefix (/23 in IPv4 or /47 in IPv6) for fallback, and a
more-specific prefix (/24 or /48) for the traffic. The more-specific
prefix can then be withdrawn to send traffic to a backup site; this
will happen automatically if the site is disconnected from routing.
Perhaps this section might better deal with high availability in general, where anycast (both global and local scope) is suggested, along with examples of other techniques.

Here’s some possible alternative text:

#### High Availability

This can be considered in terms of local and global scope.

Local scope

Inside a single location/region, such as an office, campus, or small ISP
network, the main availability concern is that a resolver is always reachable.
Client systems can be configured with multiple resolver addresses, but the
failover behaviour of stub resolvers to a second address can be painful.
Ideally the primary address is highly available and such fallback rarely
required. How much effort is put into ensuring this is true should probably
scale in line with the number of users, or sensitivity of the clients using
that resolver to delayed resolution.
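
To put a rough number on "painful", here is a minimal sketch. It assumes a stub that tries its configured addresses strictly in order and waits one fixed timeout on each unresponsive address before moving on. The 5 second default mirrors the documented glibc `resolv.conf` timeout; other stubs behave differently, and the function name is purely illustrative.

```python
def delay_before_answer(servers_up, timeout_s=5.0):
    """Seconds a stub waits before it reaches the first live resolver.

    servers_up: booleans in configured order (True = resolver is answering).
    Returns None when none of the configured resolvers answers.
    """
    waited = 0.0
    for up in servers_up:
        if up:
            return waited
        waited += timeout_s  # each dead address costs one full timeout
    return None


print(delay_before_answer([True, True]))   # primary healthy: 0.0
print(delay_before_answer([False, True]))  # primary down: 5.0
```

Under these assumptions every lookup pays the full timeout while the primary address is down, which is the cost a highly available primary address avoids.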

There are several ways to promote high availability of an individual resolver
address, such as dedicated load balancing equipment, or network techniques
like VRRP, or IP anycast. These generally have in common a pool of recursive
servers and the means to direct queries to them when a health check has
determined them to be capable of answering those queries.
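
The common pattern described above (a pool plus a health check that gates membership) could be sketched as follows. This is an illustration only: `eligible_servers`, the `probe` callable and the `max_failures` threshold are hypothetical names, and a real probe would send an actual DNS query to each server.

```python
from typing import Callable, Dict, List


def eligible_servers(pool: List[str],
                     probe: Callable[[str], bool],
                     failures: Dict[str, int],
                     max_failures: int = 3) -> List[str]:
    """Return the pool members that should currently receive queries.

    A server is removed only after max_failures consecutive failed probes
    and is restored as soon as a single probe succeeds.
    """
    eligible = []
    for server in pool:
        if probe(server):
            failures[server] = 0
        else:
            failures[server] = failures.get(server, 0) + 1
        if failures[server] < max_failures:
            eligible.append(server)
    return eligible


# Example with a stand-in probe: the second server never answers.
pool = ["192.0.2.1", "192.0.2.2"]
state: Dict[str, int] = {}
for _ in range(3):
    active = eligible_servers(pool, lambda s: s != "192.0.2.2", state)
print(active)
```

Requiring several consecutive failures is one way to avoid removing a server because of a single lost probe; the right threshold depends on the probe interval and on how quickly failover is needed.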

Free and commercial load balancing solutions are available, in hardware or
software. These typically own the resolver IP address and forward queries to
the currently available instances of a pool of recursive servers.

VRRP makes the resolver IP address available on multiple servers and is
often used to provide automatic failover between two. A pool of recursive
servers using this technique must reside in the same broadcast domain.

IP anycast in the local scope typically involves a pool of recursive servers
advertising a route to a shared resolver IP address into a routing protocol.
This can be configured in failover or load-sharing configurations. A load
sharing configuration typically requires network equipment able to balance
traffic to a destination over equal cost paths (ECMP). A pool of recursive
servers using this technique can be distributed in different parts of the
network.
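
As an illustration of load sharing over equal cost paths, the sketch below hashes a flow's addresses and ports to pick one next hop. It is a simplified model, not how any particular router implements ECMP: real equipment uses vendor-specific hash functions and inputs, and `pick_next_hop` is a hypothetical name.

```python
import hashlib


def pick_next_hop(src_ip, src_port, dst_ip, dst_port, next_hops):
    """Choose one equal-cost next hop by hashing the flow's addresses and ports."""
    if not next_hops:
        raise ValueError("no next hops available")
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Three recursive servers all advertise a route to the resolver address.
servers = ["rec1", "rec2", "rec3"]
first = pick_next_hop("192.0.2.10", 40000, "198.51.100.53", 53, servers)
again = pick_next_hop("192.0.2.10", 40000, "198.51.100.53", 53, servers)
print(first == again)  # the same flow always maps to the same server
```

With this simple modulo scheme, removing a server from the list can remap flows that were going to other servers. That matters little for single-packet UDP queries but can interrupt TCP-based DNS sessions.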

Global scope

The same concerns as for local service availability are present in the global
scope, with the added issue that DNS resolution over long distances may be
slow. Practically speaking, only multiple resolver addresses and IP anycast
are useful strategies here. The motivations for finding better failover
solutions than multiple resolver addresses have been covered above.

IP anycast in the global scope means routing the same IP prefix to more than
one location. This can provide effective failover and, when optimally
configured, route client queries to the topologically nearest recursive
server location. IP anycast in the global scope requires
the use of globally routable prefixes. If a separate prefix is to be used for
anycasting, usually this means a /24 in IPv4 and a /48 in IPv6, as those are
the smallest sizes that will be widely propagated in BGP. A common practice
is to use a covering prefix (/23 in IPv4 or /47 in IPv6) for fallback, and a
more-specific prefix (/24 or /48) for the traffic. The more-specific prefix
can then be withdrawn to send traffic to a backup site; this will happen
automatically if the site is disconnected from routing.
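
The covering-prefix fallback relies on longest-prefix matching, which can be illustrated with Python's standard `ipaddress` module. This is a toy model of route selection only: the prefixes come from the IPv6 documentation range, and the site names and `best_route` helper are hypothetical.

```python
import ipaddress


def best_route(destination, routes):
    """Longest-prefix match: return the site announcing the most specific
    prefix that contains the destination, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, site in routes.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, site))
    if not matches:
        return None
    return max(matches)[1]


routes = {
    "2001:db8::/47": "backup-site",   # covering prefix, always announced
    "2001:db8::/48": "primary-site",  # more-specific prefix carrying traffic
}
print(best_route("2001:db8::53", routes))  # primary-site

del routes["2001:db8::/48"]                # more-specific withdrawn
print(best_route("2001:db8::53", routes))  # backup-site
```

Because the more-specific route wins whenever it is present, withdrawing it (deliberately or through loss of connectivity) moves traffic to the covering prefix without any change on the clients.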

[RFC7094](https://www.rfc-editor.org/rfc/rfc7094.html) discusses anycast
architecture in detail, including references to various other RFCs which
discuss anycast in general and for DNS in particular.

[RFC4786](https://datatracker.ietf.org/doc/html/rfc4786) discusses operation
of anycast services.

Generally

Operators of a globally scoped recursive service are encouraged to also adopt
the local scope recommendations in each of the locations where the service is
provisioned.

Though the above deals with the shortcomings of relying on stub resolver
failover between a list of addresses, those recommendations shouldn’t be seen
as an exclusive alternative. Multiple resolver addresses, where each is
provisioned using differing failover strategies, can provide a resolver of
last resort and further improved resilience.

dave

Dave Knight