Dear members of the Routing-WG,

here is the final draft version of the "Route Flap Dampening" paper
as it came out of our task-force.

Please, if possible, have a look at it before the WG meeting on
Wednesday morning and bring your comments to the meeting and/or post
them to the list.

See you
Christian

--- ---------------------------------------------------------------------- ---
--- Christian Panigl        : Vienna University Computer Center - ACOnet   ---
--- VUCC - ACOnet - VIX     : -------------------------------------------- ---
--- Universitaetsstrasse 7  : Mail: Panigl@CC.UniVie.ac.at  (CP8-RIPE)     ---
--- A-1010 Vienna / Austria : Tel: +43 1 4277-14032  (Fax: -9140)          ---
--- ---------------------------------------------------------------------- ---

===============================================================================

RIPE Routing-WG Recommendation
for coordinated route-flap dampening parameters

Tony Barber
Sean Doran
Daniel Karrenberg
Christian Panigl
Joachim Schmitz

Document status: DRAFT 1.2  22-SEP-1997

ABSTRACT

This paper recommends a set of route-flap dampening parameters which
should be applied by all ISPs in the Internet and should be deployed
as new default values by BGP router vendors.

Table of Contents

ABSTRACT
1. Introduction
   1.1 Motivation for route-flap dampening
   1.2 What is route-flap dampening ?
   1.3 "Progressive" versus "flat&gentle" approach
   1.4 Motivation for coordinated parameters
2. Recommended dampening parameters
   2.1 Motivation for recommendation
   2.2 Description of recommended dampening parameters
   2.3 Example configuration for Cisco IOS
   2.4 No BGP fast-external-fallover (Cisco IOS)
   2.5 Clear IP BGP soft (Cisco IOS)
3. Open problems
   3.1 Multiplication of flaps through multiply interconnected ASes
   3.2 Is dampening of customer route-flaps a good idea ?
4. References

1. Introduction

In the Routing WG session of RIPE26 Christian Panigl asked whether
people were interested in participating in a BOF on route flap
dampening.
The BOF session was held after the plenary session of RIPE26. The
discussion was continued in the Routing WG session of RIPE27 and led
to a task-force directed to write a proposal document for coordinated
route-flap dampening parameters.

1.1 Motivation for route-flap dampening

Around 1993/94 the massive growth of the Internet in the number of
announced prefixes (often due to inadequate prefix aggregation),
multiple paths and instabilities started to significantly harm the
efficiency of the core routers of the Internet. Every single line-flap
at the periphery which makes a routing prefix unreachable has to be
advertised to the whole core Internet and has to be dealt with by
every single router by means of updates of the routing table.

To overcome this situation a route-flap dampening mechanism was
"invented" in 1993 and has been integrated into several router
implementations since 1995 (Cisco, ISI/RSd, GateD Consortium). It now
helps significantly to keep severe instabilities more local. And
there is a second benefit: it raises awareness of the existence of
instabilities, because severe route/line-flapping problems lead to
permanent suppression of the unstable area by means of holding down
the flapping prefixes.

1.2 What is route-flap dampening ?

Route-flap dampening is a mechanism for (BGP) routers which is aimed
at improving the overall stability of the Internet routing table and
offloading the CPUs of core routers.

When BGP route-flap dampening is enabled in a router, the router
starts to collect statistics about the announcement and withdrawal of
prefixes. Route-flap dampening is governed by a set of parameters with
vendor-supplied default values which may be modified by the router
manager.
The names, semantics and syntax of these parameters differ between the
various implementations; however, the behaviour of the dampening
mechanism is basically the same:

If a threshold of the number of pairs of withdrawals/announcements
(=flap) is exceeded in a given timeframe the prefix is held down for a
calculated period (penalty) which is further incremented with every
subsequent flap. The penalty is then decremented by using a half-life
parameter whenever the prefix is visible until the penalty is below a
reuse-threshold. Therefore, after remaining stable for a certain
period, the hold-down is released from the prefix and it is re-used
and re-advertised.

Pointers to some more detailed and vendor specific documents:

   Cisco BGP Case Studies: Route Flap Dampening
   http://www.cisco.com/warp/public/459/16.html

   ISI/RSd Configuration: Route Flap Dampening
   http://www.isi.edu/div7/ra/RSd/doc/dampen.html

   GateD Configuration: Weighted Route Dampening Statement
   http://www.gated.org/new_web/code/doc/gated-uni/config_guide/wrd.html

See also "4. References"

1.3 "Progressive" versus "flat&gentle" approach

One easy approach would be to just apply the current default
parameters, which treat all prefixes equally ("flat&gentle"),
everywhere. However, there is a major concern to penalise longer
prefixes (=smaller aggregates) more than well aggregated short
prefixes ("progressive"), because the number of short prefixes in the
routing table is significantly lower and in general those seem to be
much more stable.

Another aspect is that progressive dampening might increase the
awareness of aggregation needs. However, it has to be accompanied by a
careful design which doesn't force a rush for higher than required IP
address-range allocations.

Because a significant number of important services sit in long
prefixes (e.g. root nameservers), the progressive approach has to
exclude those long but "golden" prefixes from strong penalisation.
With this recommendation we are trying to make a compromise and
therefore call it "graded dampening".

1.4 Motivation for coordinated parameters

There is a strong need for the coordinated use of dampening parameters
for several reasons:

Coordination of "progressiveness":

If the boundaries for different treatment of longer prefixes and the
penalties are not coordinated throughout the Internet, route-flap
dampening could even lead to additional flapping or temporary
routing-loops because longer prefixes might already be re-announced
through some parts of the Internet where shorter prefixes are still
held down through other paths.

Coordination of "aggressiveness":

If an upstream or peering provider dampens more aggressively (e.g.
triggered by fewer flaps or applying longer hold-down timers) than an
access-provider does towards his customers, it will lead to a very
inconsistent situation, where a flapping network might still be able
to reach "near-line" parts of the Internet. Debugging of such
instabilities is then much harder because the effect leads the
customer to assume that there is a problem "somewhere" in the
"upstream" Internet instead of making him just call his ISP's hotline
and complain that he can't get out any longer.

Further, after successful repair of the problem the access-provider
can easily clear the flap-dampening for his customer on his local
router instead of needing to contact upstream NOCs all over the
Internet to get the dampening cleared.

2. Recommended dampening parameters

2.1 Motivation for recommendation

At RIPE26 and 27 Christian Panigl presented the following network
backbone maintenance example from his own experience, which triggered
flap dampening in some upstream and peering ISPs' routers for all of
his own and his customers' /24 prefixes for more than 3 hours because
of too "aggressive" parameters:

   scheduled SW upgrade of backbone router failed:

   - reload after SW upgrade     1 flap
   - new SW crashed              1 flap
   - reload with old SW          1 flap
                                 ------
                                 3 flaps within 10 minutes

which resulted in the following dampening scenario at some boundaries
with progressive route-flap dampening enabled:

   Prefix length:    /24     /19      /16
   suppress time:    ~3h     45-60'   <30'

Therefore, in the Routing-WG session at RIPE27, it was agreed that
suppression should not start until the 4th flap in a row and that the
maximum suppression should in no case last longer than 1 hour from the
last flap.

It was agreed that a recommendation from RIPE would be desirable.
Given that the current allocation policies are expected to hold for
the foreseeable future, it was suggested that all /19's or shorter
prefixes are not penalised harder than current Cisco default dampening
does.

With those suggestions in mind, Tony Barber designed the following set
of route-flap dampening parameters, which have proved to work smoothly
in his environment for a couple of months.

2.2 Description of recommended dampening parameters

Basically the recommended values do the following, with harsher
treatment for /24 and longer prefixes:

   - don't start dampening before the 4th flap in a row

   - /24 and longer prefixes: max=min outage 60 minutes

   - /22 and /23 prefixes:    max outage 45 minutes but potential for
                              less because of half life value -
                              minimum of 30 minutes outage

   - all other prefixes:      max outage 30 minutes
                              min outage 10 minutes

2.3 Example configuration for Cisco IOS

! Parameters are :
! set dampening <half life> <reuse-at> <suppress-at> <max suppress time>
! There is a 1000 penalty for each flap
! Penalty decays at granularity of 5 seconds
! Unsuppressed at granularity of 10 seconds
! Dampening info kept until penalty becomes < half of reuse limit.
!
router bgp 65500
!no bgp damp
 bgp damp route-map graded-flap-dampening
!
! don't dampen candidate default routes
! OPTIONAL (not part of recommendation)
! access-list 189 is the candidate default routes
!
no route-map graded-flap-dampening deny 5
route-map graded-flap-dampening deny 5
 match ip address 189
!
! don't dampen root nameserver nets
!
no route-map graded-flap-dampening deny 7
route-map graded-flap-dampening deny 7
 match ip address 180
!
! Heavy dampening of all networks which have a mask of
! /24 and above. These are suppressed into a datastructure
! with a half life of 30 minutes, only re-use when reaches 750
! Max outage of 60 minutes.
!
no route-map graded-flap-dampening permit 10
route-map graded-flap-dampening permit 10
 match ip address 181
 set dampening 30 750 3000 60
!
! dampen /23 /22
! half life is now 15 minutes and reuse at 750
!
no route-map graded-flap-dampening permit 20
route-map graded-flap-dampening permit 20
 match ip address 182
 set dampening 15 750 3000 45
!
! default dampening on all less than /22 defaults to this
! different to CISCO defaults which are 15 750 2000 30
! bgp dampening command
!
no route-map graded-flap-dampening permit 40
route-map graded-flap-dampening permit 40
 set dampening 10 1500 3000 30
!
!-----------------------------------------------------------------------
! ACCESS LISTS 180-189 GO BELOW
!-----------------------------------------------------------------------
! access-lists 180 to 189 used or reserved for progressive route flap dampening
!
! 180 - BGP dampening - root-nameservers.net networks are NOT dampened
! This filter stops these networks being dampened.
! Also DONT dampen routes used to derive default (see list 7)
! but this is handled in a separate route-map statement.
! in the file dampening-confg.
! Route map uses DENY to drop out of map on matching.
!
no access-list 180
!
! A.ROOT-SERVERS.NET.
access-list 180 permit ip 198.41.0.0 0.0.0.0 255.255.252.0 0.0.0.0
!
! B.ROOT-SERVERS.NET.
access-list 180 permit ip 128.9.0.0 0.0.0.0 255.255.0.0 0.0.0.0
!
! C.ROOT-SERVERS.NET.
access-list 180 permit ip 192.33.4.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! D.ROOT-SERVERS.NET.
access-list 180 permit ip 128.8.0.0 0.0.0.0 255.255.0.0 0.0.0.0
!
! E.ROOT-SERVERS.NET.
access-list 180 permit ip 192.203.230.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! F.ROOT-SERVERS.NET.
access-list 180 permit ip 192.5.4.0 0.0.0.0 255.255.254.0 0.0.0.0
!
! G.ROOT-SERVERS.NET.
access-list 180 permit ip 192.112.36.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! H.ROOT-SERVERS.NET.
access-list 180 permit ip 128.63.0.0 0.0.0.0 255.255.0.0 0.0.0.0
!
! I.ROOT-SERVERS.NET.
access-list 180 permit ip 192.36.148.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! J.ROOT-SERVERS.NET. 198.41.0.10 same net as A
!
! K.ROOT-SERVERS.NET.
access-list 180 permit ip 193.0.14.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! L.ROOT-SERVERS.NET. 198.32.64.12
access-list 180 permit ip 198.32.64.0 0.0.0.255 255.255.255.0 0.0.0.255
!
! M.ROOT-SERVERS.NET. 198.32.65.12
access-list 180 permit ip 198.32.65.0 0.0.0.255 255.255.255.0 0.0.0.255
!
!
! - 181 - dampens /24 and greater prefixes
!
no access-list 181
!
access-list 181 permit ip 0.0.0.0 255.255.255.255 255.255.255.0 0.0.0.255
access-list 181 deny ip 0.0.0.0 255.255.255.255 0.0.0.0 255.255.255.255
!
! - 182 - dampens /23 /22 and above
!
no access-list 182
!
access-list 182 permit ip 0.0.0.0 255.255.255.255 255.255.252.0 0.0.3.255
access-list 182 deny ip 0.0.0.0 255.255.255.255 0.0.0.0 255.255.255.255
!
! - 189 - Candidate default networks used in some customer bgp implementations
!
no access-list 189
!
access-list 189 permit ip !!! put your defaults in here
access-list 189 deny ip any any
!
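
The minimum outage figures in section 2.2 can be cross-checked against
the "set dampening" values in 2.3. What follows is a minimal sketch in
Python, not vendor code. It assumes a pure exponential decay with the
configured half-life, assumes the penalty sits exactly at the suppress
limit at the moment suppression starts, and ignores the 5 and 10 second
granularity mentioned in the configuration comments. The names PARAMS,
minutes_until_reuse and min_outage_minutes are made up for this
illustration.

```python
import math

# (half-life [min], reuse limit, suppress limit, max suppress time [min]),
# copied from the "set dampening" lines of the example configuration.
PARAMS = {
    "/24 and longer": (30, 750, 3000, 60),
    "/22 and /23": (15, 750, 3000, 45),
    "all other prefixes": (10, 1500, 3000, 30),
}


def minutes_until_reuse(penalty, half_life_min, reuse, max_suppress_min):
    """Minutes until `penalty` has decayed to `reuse`, capped at the maximum.

    Assumes penalty(t) = penalty * 2 ** (-t / half_life_min).
    """
    if penalty <= reuse:
        return 0.0
    decay_time = half_life_min * math.log2(penalty / reuse)
    return min(decay_time, max_suppress_min)


def min_outage_minutes(half_life_min, reuse, suppress, max_suppress_min):
    """Shortest outage: suppression starts with the penalty at the suppress limit."""
    return minutes_until_reuse(suppress, half_life_min, reuse, max_suppress_min)


if __name__ == "__main__":
    for name, params in PARAMS.items():
        print(f"{name}: minimum outage {min_outage_minutes(*params):.0f} minutes")
```

Under these assumptions the three classes come out at 60, 30 and 10
minutes, which matches the minimum outages listed in 2.2. A prefix that
keeps flapping while it is suppressed accumulates more penalty and
stays out longer, up to the max suppress time.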
2.4 No BGP fast-external-fallover (Cisco IOS)

In Cisco IOS there is a BGP configuration parameter
"fast-external-fallover" which, when on (default), leads to an
immediate clearing of a BGP neighbor whenever the line-protocol to
this external neighbor goes down.

If it is turned off, the BGP sessions will survive short line-flaps as
they will use the longer BGP keepalive/hold timers (default 60/180
seconds).

The drawback of turning it off is that the switchover to an
alternative path will take longer.

It is recommended to turn off fast-external-fallover:

!
router bgp 65501
 no bgp fast-external-fallover
!

2.5 Clear IP BGP soft (Cisco IOS)

There is a new "soft" mechanism for the clearing of BGP sessions
available with Cisco IOS. To be able to make use of the
"clear ip bgp x.x.x.x soft" command, both sides must support it and
need to be configured to accept it from the neighbor:

!
router bgp 65501
 neighbor 10.0.0.2 remote-as 65502
 neighbor 10.0.0.2 soft-reconfiguration inbound
!

Without the keyword "soft" a "clear ip bgp x.x.x.x" will always
withdraw all announced prefixes from/to neighbor x.x.x.x and
re-advertise them (= route-flap for all prefixes which are available
before and after the clear). With "clear ip bgp x.x.x.x soft" only
those prefixes will be withdrawn which are no longer available with
the new update and only those prefixes will be re-advertised which
haven't been known before (no route-flap).

3. Open problems

3.1 Multiplication of flaps through multiply interconnected ASes

Christian Panigl recently had the following experience with a line
upgrade of an Ebone customer:

- It is absolutely positive that through the upgrade process just ONE
  flap was generated (disconnect router-port from modem A, reconnect
  to modem B), nevertheless the customer's prefix was dampened in all
  ICM routers (ICM/AS1800 is US upstream for Ebone).

- The flap statistics in the ICM routers stated *4* flaps !!!
- The only explanation would be that the multiple interconnections
  between Ebone/AS1755 and ICM/AS1800 did multiply the flaps
  (advertisements/withdrawals arrived time-shifted at ICM routers
  through the multiple paths).

- This would then potentially hold true for any meshed topology
  because of the propagation delays of advertisements/withdrawals.

- Workaround for scheduled actions such as the given example:
  Schedule a downtime of at least 3-5 minutes, which should be enough
  for the prefix withdrawals to have propagated through all paths
  before reconnection and re-advertisement of the prefix. Avoid
  clearing BGP sessions, as this usually generates a 30-second outage
  which might easily give the same result.

- A solution has to be provided by the vendors !

3.2 Is dampening of customer route-flaps a good idea ?

As already explained in section 1.3, flap-dampening is at its best
value and most consistent and helpful if applied as near to the source
of the problem as possible. Therefore flap-dampening should not only
be applied at peering boundaries but even more at customer boundaries !

4. References

RIPE/Routing-WG Minutes dealing with Route Flap Dampening:
ftp://ftp.ripe.net/ripe/minutes/ripe-m-24.ps
ftp://ftp.ripe.net/ripe/minutes/ripe-m-25.ps
http://www.ripe.net/wg/routing/r25-routing.html
http://www.ripe.net/wg/routing/r26-routing.html
http://www.ripe.net/wg/routing/r27-routing.html

Curtis Villamizar, ANS: Controlling BGP/IDRP Routing Overhead
http://figaro.ans.net/route-dampen/

NANOG-Feb-1995 Route Flap Dampening Presentation (slides):
ftp://engr.ans.net/pub/papers/slides/nanog-feb-1995-route-dampen.ps

Merit/IPMA: Internet Routing Recommendations
http://www.merit.edu/~ipma/docs/help.html

Cisco BGP Case Studies: Route Flap Dampening
http://www.cisco.com/warp/public/459/16.html

ISI/RSd Configuration: Route Flap Dampening
http://www.isi.edu/div7/ra/RSd/doc/dampen.html

GateD Configuration: Weighted Route Dampening Statement
http://www.gated.org/new_web/code/doc/gated-uni/config_guide/wrd.html
Hi Christian,
1.2 What is route-flap dampening ?
If a threshold of the number of pairs of withdrawals/announcements (=flap) is exceeded in a given timeframe the prefix is held down for a
scheduled SW upgrade of backbone router failed:
- reload after SW upgrade     1 flap
- new SW crashed              1 flap
- reload with old SW          1 flap
                              ------
                              3 flaps within 10 minutes
You regard a withdrawal + announcement as 1 flap, right? Look at this:

router-a#sh ip bgp 195.206.64.168
BGP routing table entry for 195.206.64.168/30, version 6
Paths: (1 available, best #1)
  65001 65002
    192.168.1.1 from 192.168.1.1
      Origin IGP, valid, external, best

Now the BGP session between AS65001 and AS65002 is reset...

router-a#sh ip bgp 195.206.64.168
BGP routing table entry for 195.206.64.168/30, version 7
Paths: (1 available, no best path)
  65001 65002 (history entry)
    192.168.1.1 from 192.168.1.1
      Origin IGP, external
      Dampinfo: penalty 1000, flapped 1 times in 00:00:00

...the route 195.206.64.168/30 becomes a history entry.
The BGP session comes up again...

router-a#sh ip bgp 195.206.64.168
BGP routing table entry for 195.206.64.168/30, version 8
Paths: (1 available, best #1)
  65001 65002
    192.168.1.1 from 192.168.1.1
      Origin IGP, valid, external, best
      Dampinfo: penalty 1480, flapped 2 times in 00:00:32

...and the route is back, but flapped 2 times!

The reset of the BGP session could be caused by a reload of the
AS65002 router and results in 2 flaps. It seems that the Cisco
definition of 'one flap' is different from the one in your paper, or
am I overlooking something?

(btw, it's not clear to me why the penalty is 1480 already after 32
seconds)

---
Thomas
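
On the penalty of 1480 after 32 seconds, a rough arithmetic check is
possible. The sketch below is an illustration only, not Cisco's
implementation. It assumes a pure exponential decay and assumes that
the router in the output above runs with the Cisco default half-life of
15 minutes quoted in the draft; the thread does not state the router's
actual configuration. The names HALF_LIFE_S and decayed are made up for
this illustration.

```python
# Assumption: default half-life of 15 minutes, expressed in seconds.
HALF_LIFE_S = 15 * 60


def decayed(penalty, elapsed_s, half_life_s=HALF_LIFE_S):
    """Penalty remaining after elapsed_s seconds of pure exponential decay."""
    return penalty * 2 ** (-elapsed_s / half_life_s)


# The first event charged 1000; 32 seconds later the display shows 1480.
after_32s = decayed(1000, 32)
implied_second_increment = 1480 - after_32s

if __name__ == "__main__":
    print(f"first penalty after 32 s: {after_32s:.0f}")
    print(f"implied increment for the second event: {implied_second_increment:.0f}")
```

Under these assumptions the first penalty has decayed only to roughly
975, so the second counted event would have added roughly 500 rather
than a full 1000. The output shown above does not say why that would be
the case.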
In message <009BAB0A.A37D7B9E.15@cc.univie.ac.at>, "Christian Panigl, ACOnet/UniVie" writes:
Curtis Villamizar, ANS: Controlling BGP/IDRP Routing Overhead
http://figaro.ans.net/route-dampen/
       ^^^^^^
engr.ans.net (it's a CNAME but engr.ans.net will continue to work when
we move to another machine)

Curtis
In message <009BAB0A.A37D7B9E.15@cc.univie.ac.at>, "Christian Panigl, ACOnet/UniVie" writes:
1.4 Motivation for coordinated parameters
There is a strong need for the coordinated use of dampening parameters because of several reasons:
Coordination of "progressiveness":
If the boundaries for different treatment of longer prefixes and the penalties are not coordinated throughout the Internet, route-flap dampening could even lead to additional flapping or temporary routing-loops because longer prefixes might already be re-announced through some parts of the Internet where shorter prefixes are still held down through other paths.
This is not true. If route flap damping is only applied to EBGP routes
there are no problems except long secondary paths getting used. Some
more specifics will be blocked in a few places and not in others and
they will follow whatever route remains. If all of the more specifics
are lost an aggregate will be followed and either blackholed at the
aggregator or it will get to the dest.

I don't see any opportunity for routing loops. I don't see any issue
at all with less specifics being withdrawn and more specifics
remaining as described above (I assume you meant the opposite).
Coordination of "aggressiveness":
If an upstream or peering provider would be dampening more aggressively (e.g. triggered by less flaps or applying longer hold-down timers) than an access-provider towards his customers it will lead to a very inconsistent situation, where a flapping network might still be able to reach "near-line" parts of the Internet. Debugging of such instabilities is then much harder because the effect for the customer leads to the assumption that there is a problem "somewhere" in the "upstream" Internet instead of making him just call his ISPs hotline and complain that he can't get out any longer.
Further, after successful repair of the problem the access-provider can easily clear the flap-dampening for his customer on his local router instead of needing to contact upstream NOCs all over the Internet to get the dampening cleared.
This would be an argument in favor of very aggressive damping of ones own customer routes which is unlikely to be a good idea.
2. Recommended dampening parameters
2.1 Motivation for recommendation
At RIPE26 and 27 Christian Panigl presented the following network backbone maintenance example from his own experience, which was triggering flap dampening in some upstream and peering ISPs routers for all his and his customers /24 prefixes for more than 3 hours because of too "aggressive" paramters:
scheduled SW upgrade of backbone router failed:
- reload after SW upgrade     1 flap
- new SW crashed              1 flap
- reload with old SW          1 flap
                              ------
                              3 flaps within 10 minutes
which resulted in the following dampening scenario at some boundaries with progressive route-flap dampening enabled:
Prefix length:    /24     /19      /16
suppress time:    ~3h     45-60'   <30'
Therefore, in the Routing-WG session at RIPE27, it was agreed that suppression should not start until the 4th flap in a row and that the maximum suppression should in no case last longer than 1 hour from the last flap.
It was agreed that a recommendation from RIPE would be desirable. Given that the current allocation policies are expected to hold for the foreseeable future, it was suggested that all /19's or shorter prefixes are not penalised harder than current Cisco default dampening does.
Those suggestions in mind Tony Barber designed the following set of route-flap dampening parameters which have prooved to work smoothly in his environment for a couple of months.
Why is a /24 being announced globally? Our private peerings use a prefix taken from one of the provider's aggregates. The answer to this problem is to arrange things so the rest of the world doesn't need to know about a /24 that can be taken up and down by the software upgrade of a single router. That's what route flap damping can encourage and it seems to have worked in this case except the message didn't register.
3. Open problems
3.1 Multiplication of flaps through multiply interconnected ASes
Christian Panigl recently made the following experience with a line upgrade of an Ebone customer:
- It is absolutely positive that through the upgrade process just ONE flap was generated (disconnect router-port from modem A reconnect to modem B), nevertheless the customers prefix was dampened in all ICM routers (ICM/AS1800 is US upstream for Ebone).
- The flap statistics in the ICM routers stated *4* flaps !!!
- The only explanation would be that the multiple interconnections between Ebone/AS1755 and ICM/AS1800 did multiply the flaps (advertisements/withdrawals arrived time-shifted at ICM routers through the multiple paths).
The flap damping parameters should be applied to Adj-In routes which are per peer. The only problem then can occur if the AS-path changes multiple times. The only solution to that is to keep separate data structs for Adj-In and each observed AS path.
3.2 Is dampening of customer route-flaps a good idea ?
As already explained in section 1.3 flap-dampening is at its best value and most consistent and helpful if applied as near to the source of the problem as possible. Therefore flap-dampening should not only be applied at peering boundaries but even more at customer boundaries !
This is highly unreasonable. Do you really expect to shut off peer
route damping every where and ask [insert irresponsible and clueless
ISP name here] to damp at the customer attachment?

Don't damp the customer attachment. Aggregate!

If the customer's connectivity gets hosed a few times, be very
persistent in reminding them that renumbering into an aggregate is an
option that will solve that problem. Then the rest of the Internet has
less flapping routes to damp.

Curtis
Curtis Villamizar wrote:
This is not true. If route flap damping is only applied to EBGP routes there are no problems except long secondary paths getting used. Some more specifics will be blocked in a few places and not in others and they will follow whatever route remains. If all of the more specifics are lost an aggregate will be followed and either blackholed at the aggregator or it will get to the dest.
I don't see any opportunity for routing loops. I don't see any issue at all with less specifics being withdrawn and more specifics remaining as described above (I assume you meant the opposite).
Curtis

Maybe it is better to say that inconsistent routing can occur ?
Further, after successful repair of the problem the access-provider can easily clear the flap-dampening for his customer on his local router instead of needing to contact upstream NOCs all over the Internet to get the dampening cleared.
This would be an argument in favor of very aggressive damping of ones own customer routes which is unlikely to be a good idea.
Why, there should be no reason to do anything different on the customer access point than is done on the inter ISP peerings.
It was agreed that a recommendation from RIPE would be desirable. Given that the current allocation policies are expected to hold for the foreseeable future, it was suggested that all /19's or shorter prefixes are not penalised harder than current Cisco default dampening does.
Those suggestions in mind Tony Barber designed the following set of route-flap dampening parameters which have prooved to work smoothly in his environment for a couple of months.
Why is a /24 being announced globally? Our private peerings use a prefix taken from one of the provider's aggregates.
Normally that would be the case. In many multihomed scenarios there is a requirement *not* to aggregate this /24 (or whatever longer prefix) This is why we should be seeing such small aggregates leaked out.
The answer to this problem is to arrange things so the rest of the world doesn't need to know about a /24 that can be taken up and down by the software upgrade of a single router. That's what route flap damping can encourage and it seems to have worked in this case except the message didn't register.
This isn't always possible Curtis :-( Also remember that many of your customers do not want to renumber into your PA, they may have class Bs for instance.
3.2 Is dampening of customer route-flaps a good idea ?
As already explained in section 1.3 flap-dampening is at its best value and most consistent and helpful if applied as near to the source of the problem as possible. Therefore flap-dampening should not only be applied at peering boundaries but even more at customer boundaries !
This is highly unreasonable. Do you really expect to shut off peer route damping every where and ask [insert irresponsible and clueless ISP name here] to damp at the customer attachment?
Don't damp the customer attachment. Aggregate!
Yes, but it's not always going to achieve anything, see above. I think that Christians assertion is highly reasonable. In an ideal world you would not dampen but aggregate if the downstream is in your PA. If not you would dampen. What is the best mode of attack and the gives consistency ? This will be down to ISP opinion probably. In your last sentence you agree with it :-?
If the customer's connectivity gets hosed a few times, be very persistent in reminding them that renumbering into an aggregate is an option that will solve that problem. Then the rest of the Internet has less flapping routes to damp.
Curtis
I don't understand this last sentence as it seems to contradict your
previous paragraph re Aggregate!, sorry.

Regards
--Tony
In message <19970923205326.10979.qmail@pool.pipex.net>, Tony Barber writes:
Curtis Villamizar wrote:
This is not true. If route flap damping is only applied to EBGP routes there are no problems except long secondary paths getting used. Some more specifics will be blocked in a few places and not in others and they will follow whatever route remains. If all of the more specifics are lost an aggregate will be followed and either blackholed at the aggregator or it will get to the dest.
I don't see any opportunity for routing loops. I don't see any issue at all with less specifics being withdrawn and more specifics remaining as described above (I assume you meant the opposite).
Curtis
Maybe it is better to say that inconsistent routing can occur ?
Only if you can substantiate it. Looks to me like alternate paths would be used by some providers. No big deal.
Further, after successful repair of the problem the access-provider can easily clear the flap-dampening for his customer on his local router instead of needing to contact upstream NOCs all over the Internet to get the dampening cleared.
This would be an argument in favor of very aggressive damping of ones own customer routes which is unlikely to be a good idea.
Why, there should be no reason to do anything different on the customer access point than is done on the inter ISP peerings.
For a direct customer we would disable dampening completely. If a customer is part of an aggregate that is a purely local issue. If not, we'd expect to keep connectivity to our paying customer and only enable damping if they requested it. If others want to damp, we'd just encourage our customer to number into an aggregate. If they were single homed then statics would be a consideration but we'd prefer the customer was part of an aggregate.
Why is a /24 being announced globally? Our private peerings use a prefix taken from one of the provider's aggregates.
Normally that would be the case. In many multihomed scenarios there is a requirement *not* to aggregate this /24 (or whatever longer prefix) This is why we should be seeing such small aggregates leaked out.
If the /24 flaps, that prefix had better be part of a larger aggregate. The two or more direct providers can agree to exchange the /24 with little or no damping so the backup from the rest of the world is to follow the aggregate. The aggregator would then forward the traffic to the working attachment. When damping restrictions are lifted, the path might just be more optimal. Better yet, the /24 should **always** not be announced outside of some aggregation boundary beyond which there would be no change in the path regardless of which provider attachment was in use. (ie: routing from one continent to a multihome on another continent).
The answer to this problem is to arrange things so the rest of the world doesn't need to know about a /24 that can be taken up and down by the software upgrade of a single router. That's what route flap damping can encourage and it seems to have worked in this case except the message didn't register.
This isn't always possible Curtis :-( Also remember that many of your customers do not want to renumber into your PA, they may have class Bs for instance.
Yep. That's their choice. When their /16 gets damped we can only remind them that it was their choice. We recommend that they dual home their /16 to the same provider (us of course:) so they don't get damped. Since we got rid of the last of our Cisco routers in our backbone very little route flap originates from ANS so that is a very safe thing to do (of course some flap does come from ANS but if you look at the AS paths you can see that the vast majority of it reflects our redistributing flap we heard from transit customers).
3.2 Is dampening of customer route-flaps a good idea ?
As already explained in section 1.3 flap-dampening is at its best value and most consistent and helpful if applied as near to the source of the problem as possible. Therefore flap-dampening should not only be applied at peering boundaries but even more at customer boundaries !
This is highly unreasonable. Do you really expect to shut off peer route damping every where and ask [insert irresponsible and clueless ISP name here] to damp at the customer attachment?
Don't damp the customer attachment. Aggregate!
Yes, but it's not always going to achieve anything, see above. I think that Christians assertion is highly reasonable. In an ideal world you would not dampen but aggregate if the downstream is in your PA. If not you would dampen. What is the best mode of attack and the gives consistency ? This will be down to ISP opinion probably. In your last sentence you agree with it :-?
I don't favor providing my customer consistency if that means no one at all can reach them to be consistent with a number of providers on another continent not willing to put up with their prefix flapping. I have yet to meet a customer that thought this would be a better way to meet their needs. I doubt other providers have.
If the customer's connectivity gets hosed a few times, be very persistent in reminding them that renumbering into an aggregate is an option that will solve that problem. Then the rest of the Internet has less flapping routes to damp.
Curtis
I don't understand this last sentence as it seems to contradict your previous paragraph re Aggregate!, sorry.
If the customer renumbers into a provider aggregate, then the rest of
the Internet has one less flapping prefix. The customer prefix
disappears into the aggregate and is no longer part of the DFZ. If
this is a dual home across providers, then it's also OK for the more
specific to get damped as long as the providers can keep the more
specific from being damped. If some subset of nearby providers can
manage not to announce the more specific beyond some boundary (and do
it reliably), then all the better.

Curtis
participants (4):
- Christian Panigl, ACOnet/UniVie
- Curtis Villamizar
- Thomas Telkamp
- Tony Barber