In message <009BAB0A.A37D7B9E.15@cc.univie.ac.at>, "Christian Panigl, ACOnet/UniVie" writes:
1.4 Motivation for coordinated parameters
There is a strong need for the coordinated use of dampening parameters, for several reasons:
Coordination of "progressiveness":
If the boundaries for different treatment of longer prefixes and the penalties are not coordinated throughout the Internet, route-flap dampening could even lead to additional flapping or temporary routing-loops because longer prefixes might already be re-announced through some parts of the Internet where shorter prefixes are still held down through other paths.
This is not true. If route flap damping is only applied to EBGP routes, there are no problems except that long secondary paths get used. Some more specifics will be blocked in a few places and not in others, and they will follow whatever route remains. If all of the more specifics are lost, an aggregate will be followed and traffic will either be blackholed at the aggregator or get to the destination. I don't see any opportunity for routing loops. I don't see any issue at all with less specifics being withdrawn and more specifics remaining as described above (I assume you meant the opposite).
Coordination of "aggressiveness":
If an upstream or peering provider would be dampening more aggressively (e.g. triggered by fewer flaps or applying longer hold-down timers) than an access-provider towards his customers, it will lead to a very inconsistent situation, where a flapping network might still be able to reach "near-line" parts of the Internet. Debugging of such instabilities is then much harder, because the effect for the customer leads to the assumption that there is a problem "somewhere" in the "upstream" Internet instead of making him just call his ISP's hotline and complain that he can't get out any longer.
Further, after successful repair of the problem the access-provider can easily clear the flap-dampening for his customer on his local router instead of needing to contact upstream NOCs all over the Internet to get the dampening cleared.
This would be an argument in favor of very aggressive damping of one's own customer routes, which is unlikely to be a good idea.
2. Recommended dampening parameters
2.1 Motivation for recommendation
At RIPE26 and 27 Christian Panigl presented the following network backbone maintenance example from his own experience, which was triggering flap dampening in some upstream and peering ISPs' routers for all his and his customers' /24 prefixes for more than 3 hours because of too "aggressive" parameters:
scheduled SW upgrade of backbone router failed:

  - reload after SW upgrade     1 flap
  - new SW crashed              1 flap
  - reload with old SW          1 flap
                                ------
                                3 flaps within 10 minutes

which resulted in the following dampening scenario at some boundaries with progressive route-flap dampening enabled:

  Prefix length:    /24    /19       /16
  suppress time:    ~3h    45-60'    <30'

Therefore, in the Routing-WG session at RIPE27, it was agreed that suppression should not start until the 4th flap in a row and that the maximum suppression should in no case last longer than 1 hour from the last flap.
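
The two constraints agreed above (no suppression before the 4th flap, at most 1 hour from the last flap) can be checked against a simple decaying-penalty model. The following is only a minimal sketch: the per-flap penalty of 1000, the reuse threshold of 750, the 15 minute half-life and the suppress threshold of 3000 are illustrative assumptions, not values taken from this text, and the function name `simulate` is hypothetical. The penalty ceiling is chosen so that decaying from it down to the reuse threshold takes exactly the maximum suppress time.

```python
import math


def simulate(flap_times, per_flap=1000.0, suppress=3000.0,
             reuse=750.0, half_life=15.0, max_suppress=60.0):
    """Replay flaps (times in minutes, ascending) through a decaying penalty.

    Returns (n, wait): n is the 1-based flap at which the penalty first
    exceeds the suppress threshold (None if it never does); wait is the
    number of minutes after the last flap until the penalty has decayed
    to the reuse threshold (0.0 if the route was never suppressed).
    Simplification: a route being reused between flaps is not tracked.
    All default values are assumptions chosen for illustration.
    """
    # Cap the penalty so suppression can never outlast max_suppress minutes.
    ceiling = reuse * 2.0 ** (max_suppress / half_life)
    penalty = 0.0
    last = None
    suppressed_at = None
    for n, t in enumerate(flap_times, start=1):
        if last is not None:
            # Exponential decay since the previous flap.
            penalty *= 0.5 ** ((t - last) / half_life)
        penalty = min(penalty + per_flap, ceiling)
        last = t
        if suppressed_at is None and penalty > suppress:
            suppressed_at = n
    if suppressed_at is None:
        return None, 0.0
    return suppressed_at, half_life * math.log2(penalty / reuse)


print(simulate([0, 5, 10]))                   # the failed upgrade: 3 flaps in 10 minutes
print(simulate([0, 5, 10], suppress=2000.0))  # a lower, more aggressive threshold
print(simulate([0, 5, 10, 12]))               # a 4th flap shortly afterwards
```

Under these assumptions, any threshold of at least three times the per-flap penalty cannot be crossed by three flaps, because the penalty only decays between flaps. With the threshold lowered to 2000, the same three flaps are suppressed at the third flap.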
It was agreed that a recommendation from RIPE would be desirable. Given that the current allocation policies are expected to hold for the foreseeable future, it was suggested that no /19 or shorter prefix be penalised harder than the current Cisco default dampening would.
With those suggestions in mind, Tony Barber designed the following set of route-flap dampening parameters, which have proved to work smoothly in his environment for a couple of months.
Why is a /24 being announced globally? Our private peerings use a prefix taken from one of the provider's aggregates. The answer to this problem is to arrange things so the rest of the world doesn't need to know about a /24 that can be taken up and down by the software upgrade of a single router. That's what route flap damping can encourage and it seems to have worked in this case except the message didn't register.
3. Open problems
3.1 Multiplication of flaps through multiply interconnected ASes
Christian Panigl recently had the following experience with a line upgrade of an Ebone customer:
- It is absolutely certain that the upgrade process generated just ONE flap (disconnect router-port from modem A, reconnect to modem B); nevertheless the customer's prefix was dampened in all ICM routers (ICM/AS1800 is US upstream for Ebone).
- The flap statistics in the ICM routers stated *4* flaps !!!
- The only explanation would be that the multiple interconnections between Ebone/AS1755 and ICM/AS1800 did multiply the flaps (advertisements/withdrawals arrived time-shifted at ICM routers through the multiple paths).
The flap damping parameters should be applied to Adj-In routes, which are per peer. The only problem can then occur if the AS-path changes multiple times. The only solution to that is to keep separate data structs for Adj-In and each observed AS path.
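
A minimal sketch of why the bookkeeping key matters, assuming a toy flap counter (the `FlapCounter` class, the peer names, the prefix 192.0.2.0/24 and AS 64500 are all hypothetical and only for illustration): one physical flap that arrives time-shifted over four sessions is counted four times if state is kept per prefix, but once per session if state is kept per peer and prefix.

```python
from collections import defaultdict


class FlapCounter:
    """Counts flaps under a configurable key (illustrative only)."""

    def __init__(self, key_fields):
        self.key_fields = key_fields
        self.flaps = defaultdict(int)

    def record(self, update):
        # Build the bookkeeping key from the chosen fields of the update.
        key = tuple(update[f] for f in self.key_fields)
        self.flaps[key] += 1

    def worst(self):
        # Highest flap count held against any single key.
        return max(self.flaps.values(), default=0)


# One flap at the source, seen as a withdrawal on four separate sessions.
updates = [
    {"peer": f"peer{i}", "prefix": "192.0.2.0/24", "as_path": (1755, 64500)}
    for i in range(1, 5)
]

per_prefix = FlapCounter(["prefix"])
per_peer = FlapCounter(["peer", "prefix"])
for u in updates:
    per_prefix.record(u)
    per_peer.record(u)

print(per_prefix.worst(), per_peer.worst())
```

Adding "as_path" to the key fields would give the separate per-AS-path state mentioned above, at the cost of more state per prefix.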
3.2 Is dampening of customer route-flaps a good idea?
As already explained in section 1.3, flap-dampening is most valuable, consistent and helpful if applied as near to the source of the problem as possible. Therefore flap-dampening should not only be applied at peering boundaries but even more at customer boundaries!
This is highly unreasonable. Do you really expect to shut off peer route damping everywhere and ask [insert irresponsible and clueless ISP name here] to damp at the customer attachment?

Don't damp the customer attachment. Aggregate! If the customer's connectivity gets hosed a few times, be very persistent in reminding them that renumbering into an aggregate is an option that will solve that problem. Then the rest of the Internet has fewer flapping routes to damp.

Curtis