Dear Routing WG,

thanks to Joachim, who took notes and provided me with very detailed draft minutes of our "Route Flap Dampening BOF"!

I made some additions and am now asking all participants to come back with comments.

I'd also like to remind you to send me your pointers to similar discussions and related recommendations (IETF, NANOG, ...), since I haven't yet been able to attend forums other than RIPE.

Regards
Christian

--- Christian Panigl : Vienna University Computer Center - ACOnet ---
--- VUCC - ACOnet : -------------------------------------------- ---
--- Universitaetsstrasse 7 : Mail: Panigl@CC.UniVie.ac.at (CP8-RIPE) ---
--- A-1010 Vienna / Austria : Tel: +43 1 4065822-383 (Fax: -170) ---

===================================================

Route Flap Dampening BOF, RIPE 26, 22.1.97, 14:00

Chairman: Christian Panigl (CP)
Scribe: Joachim Schmitz (JS)
Attendees: approx. 30

In the Routing WG session, Christian Panigl asked whether people were interested in participating in a BOF on route flap dampening. The BOF session was held after the plenary session of the RIPE meeting on Wednesday.

CP had experienced quite severe reachability problems for customer networks because route flap dampening became active at various AS borders following scheduled maintenance on a core router. Had the default dampening parameters been used everywhere, it wouldn't have hurt that much, since dampening would have lasted only ~20-30 minutes for all prefixes. Some backbone ISPs, however, have started to implement "progressive route flap dampening", typically using different parameters. The common effect is that longer prefixes are dampened more aggressively than shorter prefixes. In the observed case, all /24 customer networks were cut off from parts of the Internet for more than 2 hours and were no longer able to reach, for instance, the root nameservers.
By the way, many nameservers, even top- and second-level ones, sit in /24 (192/TWD) prefixes themselves and could easily become "victims" of such a progressive dampening policy!

CP wasn't condemning route flap dampening itself, but the aggressiveness of some of the implemented "progressive" parameters, and he questioned whether progressive dampening is really useful at all.

Following CP's introduction, a lively discussion on route flap dampening ensued:

* Does flapping really depend on the prefix length?

  - To the knowledge of the people attending the BOF session, no measurements exist. Although Merit has already measured several aspects of route stability (as seen in the presentation by G. Winters in the Routing WG), these did not include a stability analysis with regard to prefix length. If flapping does not necessarily depend on the prefix length, longer prefixes should not be punished by more aggressive dampening.

  - However, the number of longer prefixes in the routing tables is much larger than the number of shorter ones. As a consequence, if the percentage of flapping routes is the same for all prefix lengths, the absolute number of flaps will definitely be higher for longer prefixes. Since each flap costs the router the same amount of processing (regardless of the prefix length), longer prefixes should be dampened more aggressively to get the best CPU saving.

  - Further justification for the latter was based primarily on the assumption that longer prefixes serve fewer users, which of course did not go unchallenged (think of important servers sitting in a /24).

* Which networks or prefixes are "important"?

  - Stating that shorter prefixes are more important because they cover more users doesn't hold in general.
    On the one hand, favoring shorter prefixes may be valuable and motivate ISPs to CIDRize and customers to renumber; on the other hand, it may lead to a situation where organisations try to get (or keep, think of Class A/B recycling) as short a prefix as possible, wasting address space without having to care about stability. In this case instability would be moved to shorter prefixes, which is far from desirable.

  - Long prefixes need not be unstable. There are discussions about using long prefix routes ("golden networks") for root nameservers or for other Internet infrastructure servers (even for application servers such as news, etc.). It can safely be assumed that these routes are more stable than others, and they must not be dampened too aggressively, so as not to impair the functioning of the Internet itself.

Throughout the discussion the general consensus was clear: for routers with large BGP tables (notably with full routing), the CPU load caused by instabilities would kill any existing router. To survive instabilities, route flap dampening should be applied by *everybody*. However, it was obvious that dampening parameters need to be coordinated throughout the Internet in order to

  - allow efficient dampening and easy clearing after repair
  - dampen flaps at their source by keeping them from spreading in the network

This will significantly increase overall stability and manageability. The more broadly (soft and default) dampening is deployed all over the Internet, the less need there will be for aggressive parameters.

The group split into two major camps with regard to how dampening should be done:

  - progressive dampening: needs to be accompanied by means to explicitly exclude "golden networks" from "hostile acts"

  - flat (default) dampening: because it's very hard to make a distinction between less and more "important", not to say "golden", networks, all prefixes should be treated equally. Efforts should be focussed on the propagation of dampening throughout the Internet.
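
As a rough illustration of the difference between the two camps, the sketch below models dampening as a penalty that grows with each flap and decays exponentially with a configurable half-life. All numbers are assumptions and do not come from the BOF: the penalty and limit values follow commonly cited router defaults, and the "progressive" tiers are invented purely for illustration, not taken from any ISP's configuration. The function name `suppression_minutes` is likewise hypothetical.

```python
import math

PENALTY_PER_FLAP = 1000   # assumed penalty added per flap
SUPPRESS_LIMIT = 2000     # assumed: suppress once the penalty exceeds this
REUSE_LIMIT = 750         # assumed: route is reused once the penalty decays to this

# Each tier: (longest prefix length covered, half-life in min, max suppress time in min)
FLAT = [(32, 15.0, 60.0)]                 # one setting for every prefix
PROGRESSIVE = [(21, 15.0, 60.0),          # hypothetical tiers, invented for this sketch
               (23, 30.0, 120.0),
               (32, 60.0, 180.0)]


def suppression_minutes(prefix_len, flaps, policy):
    """Minutes a prefix stays suppressed after `flaps` back-to-back flaps."""
    for max_len, half_life, max_suppress in policy:
        if prefix_len <= max_len:
            break
    else:
        raise ValueError("prefix length not covered by policy")
    penalty = flaps * PENALTY_PER_FLAP
    if penalty <= SUPPRESS_LIMIT:
        return 0.0  # never crossed the suppress limit
    # Exponential decay: penalty(t) = penalty * 2 ** (-t / half_life).
    # Solve penalty(t) = REUSE_LIMIT for t, then cap at the max suppress time.
    minutes = half_life * math.log2(penalty / REUSE_LIMIT)
    return min(minutes, max_suppress)


if __name__ == "__main__":
    for name, policy in (("flat", FLAT), ("progressive", PROGRESSIVE)):
        for prefix_len in (16, 24):
            t = suppression_minutes(prefix_len, 3, policy)
            print(f"{name:12s} /{prefix_len}: {t:.0f} min")
```

With these assumed numbers, three rapid flaps suppress any prefix for about 30 minutes under the flat policy, while the invented progressive tiers keep a /24 suppressed for about 120 minutes and leave a /16 at 30 minutes.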
The default values for dampening parameters as found in Cisco routers are based upon some experiments carried out approximately one year ago. These experiments led to recommendations by the IETF last year. Nevertheless, many ISPs have moved away from the default values and are using their own parameters.

Because of the urgent need to coordinate these values, CP will try to collect related recommendations and the outcome of similar discussions. This is an activity of the RIPE Routing WG; therefore everybody who is aware of related efforts (IETF, NANOG, ...) should come back to the Routing WG list with hints and pointers!

New Action 26.R4 on Christian Panigl
  To collect reasonable route flap dampening parameter values and to present them at the next RIPE meeting in the Routing WG.

Further reading:

  ftp://ftp.ripe.net/ripe/minutes/ripe-m-24.ps
  ftp://ftp.ripe.net/ripe/minutes/ripe-m-25.ps
  http://www.ripe.net/wg/routing/r25-routing.html
  ftp://ftp.ripe.net/ripe/presentations/ripe-m25-tbarber-bgp-damp.html