Shane and other TF members,

Great document! I have some comments about the anycast bit :)
On Nov 26, 2023, at 12:01 PM, Shane Kerr <shane@time-travellers.org> wrote:
Colleagues,
Here is a draft of the RIPE DNS Resolver Best Common Practices document that the task force of that name has been working on.
[..]
#### Anycasting
**Anycasting may be considered**
Anycasting means routing the same IP prefix to more than one location.
Anycast isn’t just for multiple locations; local scope anycast is a thing too. It is typically used to support high availability of the service, providing low cost load balancing and failover. There are other ways to achieve the same goal, e.g. dedicated load balancers, other network level failover, etc.
As mentioned above for addressing, client support for multiple addresses is not always good;
This is key. Stub failover to a second resolver address can be painful, so adding resilience to the primary resolver address is often worthwhile.
with anycasting you can use a single IP address and have redundancy from different sites. This will often allow you to place sites close to the user - although it is tricky to get optimal routing with BGP.
For a resolver service with a single site there is no benefit. For a resolver service with multiple sites, it may be better to configure clients with different IP addresses rather than use anycasting.
[RFC7094](https://www.rfc-editor.org/rfc/rfc7094.html) discusses anycast in detail, including references to various other RFCs which discuss anycasting in general and to DNS in particular.
If a separate prefix is to be used for anycasting, usually this means a /24 in IPv4 and a /48 in IPv6, as those are the smallest sizes that will be widely propagated in BGP. A common practice is to use a covering prefix (/23 in IPv4 or /47 in IPv6) for fallback, and a more-specific prefix (/24 or /48) for the traffic. The more-specific prefix can then be withdrawn to send traffic to a backup site; this will happen automatically if the site is disconnected from routing.
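The covering-prefix fallback described above relies on longest-prefix matching. Here is a minimal Python sketch of that behaviour, assuming a toy routing table. The site names, the `best_route` helper, and the prefixes (taken from the IPv6 documentation range 2001:db8::/32) are illustrative only and not from the draft. Real BGP route selection involves far more than prefix length.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (prefix, site) pair with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, site) for net, site in routes.items() if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering route wins.
    return max(matches, key=lambda item: item[0].prefixlen)


# Hypothetical announcements: a /47 covering prefix from the backup site
# and a /48 more-specific prefix from the site that normally takes traffic.
covering = ipaddress.ip_network("2001:db8::/47")
specific = ipaddress.ip_network("2001:db8::/48")
routes = {covering: "backup-site", specific: "primary-site"}

resolver = "2001:db8::53"  # hypothetical anycast resolver address

before = best_route(routes, resolver)

# Withdrawing the more-specific prefix (deliberately, or because the site
# lost its routing sessions) leaves only the covering prefix.
del routes[specific]
after = best_route(routes, resolver)

print(before[1], "->", after[1])
```

With the more-specific route present, the resolver address is reached at the primary site. Once that route is withdrawn, the same address is still covered by the /47 and traffic follows it to the backup site.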
Perhaps this section might better deal with high availability in general, where anycast (both global and local scope) is suggested, along with examples of other techniques. Here’s some possible alternative text:

#### High Availability

This can be considered in terms of local and global scope.

Local scope

Inside a single location/region, such as an office, campus, or small ISP network, the main availability concern is that a resolver is always reachable. Client systems can be configured with multiple resolver addresses, but the failover behaviour of stub resolvers to a second address can be painful. Ideally the primary address is highly available and such fallback is rarely required. How much effort is put into ensuring this should probably scale with the number of users, or with how sensitive the clients using that resolver are to delayed resolution.

There are several ways to promote high availability of an individual resolver address, such as dedicated load balancing equipment, network techniques like VRRP, or IP anycast. What these generally have in common is a pool of recursive servers and a means of directing queries to them only when a health check has determined that they are capable of answering those queries.

Dedicated load balancing solutions are available, free or commercial, in hardware or software. These typically own the resolver IP address and forward queries to the currently available instances of a pool of recursive servers.

VRRP makes the resolver IP address available on multiple servers and is often used to provide automatic failover between two. A pool of recursive servers using this technique must reside in the same broadcast domain.

IP anycast in the local scope typically involves a pool of recursive servers advertising a route to a shared resolver IP address into a routing protocol. This can be set up in failover or load-sharing configurations. A load-sharing configuration typically requires network equipment able to balance traffic to a destination over equal cost paths (ECMP). A pool of recursive servers using this technique can be distributed in different parts of the network.

Global scope

The same concerns as for local service availability are present in the global scope, with the added issue that DNS resolution over long distances may be slow. Practically speaking, only multiple resolver addresses or IP anycast are useful strategies here. The motivations for finding better failover solutions than multiple resolver addresses have been covered above.

IP anycast in the global scope means routing the same IP prefix to more than one location. This can provide effective solutions for failover and, when optimally configured, for routing client queries to the topologically nearest recursive server location.

IP anycast in the global scope requires the use of globally routable prefixes. If a separate prefix is to be used for anycasting, usually this means a /24 in IPv4 and a /48 in IPv6, as those are the smallest sizes that will be widely propagated in BGP. A common practice is to use a covering prefix (/23 in IPv4 or /47 in IPv6) for fallback, and a more-specific prefix (/24 or /48) for the traffic. The more-specific prefix can then be withdrawn to send traffic to a backup site; this will happen automatically if the site is disconnected from routing.

[RFC7094](https://www.rfc-editor.org/rfc/rfc7094.html) discusses anycast architecture in detail, including references to various other RFCs which discuss anycast in general and to DNS in particular. [RFC4786](https://datatracker.ietf.org/doc/html/rfc4786) discusses operation of anycast services.

Generally

Operators of a globally scoped recursive service are encouraged to also adopt the local scope recommendations in each of the locations where the service is provisioned.
Though the above deals with the shortcomings of relying on stub resolver failover between a list of addresses, those recommendations shouldn’t be seen as an exclusive alternative. Multiple resolver addresses, where each is provisioned using a different failover strategy, can provide a resolver of last resort and further improve resilience.

dave