Re: Draft of Route-Flap Dampening Paper

23 Sep 1997

In message <009BAB0A.A37D7B9E.15@cc.univie.ac.at>, "Christian Panigl, ACOnet/UniVie" writes:
...
1.4 Motivation for coordinated parameters
There is a strong need for the coordinated use of dampening parameters
    for several reasons:
Coordination of "progressiveness":
If the boundaries for different treatment of longer prefixes and the
    penalties are not coordinated throughout the Internet, route-flap
    dampening could even lead to additional flapping or temporary
    routing-loops because longer prefixes might already be re-announced
    through some parts of the Internet where shorter prefixes are still held
    down through other paths.
This is not true.  If route flap damping is only applied to EBGP
routes there are no problems except long secondary paths getting used.
Some more specifics will be blocked in a few places and not in others
and they will follow whatever route remains.  If all of the more
specifics are lost, an aggregate will be followed and the traffic will
either be blackholed at the aggregator or get to the destination.

I don't see any opportunity for routing loops.  I don't see any issue
at all with less specifics being withdrawn and more specifics
remaining as described above (I assume you meant the opposite).
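
To make the forwarding behaviour above concrete, here is a minimal
sketch of longest-prefix-match lookup.  The prefixes, the router names
and the lookup() helper are all made up for illustration; real routers
do this in the forwarding table, not in Python.

```python
import ipaddress


def lookup(table, dest):
    """Return the longest prefix in `table` covering `dest`, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [prefix for prefix in table if addr in prefix]
    return max(matches, key=lambda p: p.prefixlen) if matches else None


# Hypothetical prefixes: an aggregate and one more specific inside it.
AGGREGATE = ipaddress.ip_network("10.1.0.0/16")
SPECIFIC = ipaddress.ip_network("10.1.5.0/24")

# One router still holds the more specific, one has damped it, and one
# has neither route (the aggregate is gone as well).
undamped_router = [AGGREGATE, SPECIFIC]
damping_router = [AGGREGATE]
no_aggregate_router = []

via_specific = lookup(undamped_router, "10.1.5.9")       # follows the /24
via_aggregate = lookup(damping_router, "10.1.5.9")       # falls back to the /16
unreachable = lookup(no_aggregate_router, "10.1.5.9")    # no route at all
```

Where the more specific is damped the packet simply follows the
aggregate toward the aggregator, which is the "long secondary path"
case rather than a loop.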
...
Coordination of "aggressiveness":
If an upstream or peering provider would be dampening more aggressively
    (e.g. triggered by fewer flaps or applying longer hold-down timers) than
    an access-provider towards his customers it will lead to a very
    inconsistent situation, where a flapping network might still be able to
    reach "near-line" parts of the Internet.  Debugging of such
    instabilities is then much harder because the effect for the customer
    leads to the assumption that there is a problem "somewhere" in the
    "upstream" Internet instead of making him just call his ISP's hotline and
    complain that he can't get out any longer.
Further, after successful repair of the problem the access-provider can
    easily clear the flap-dampening for his customer on his local router
    instead of needing to contact upstream NOCs all over the Internet to get
    the dampening cleared.
This would be an argument in favor of very aggressive damping of one's
own customer routes, which is unlikely to be a good idea.
...
2. Recommended dampening parameters
2.1 Motivation for recommendation
At RIPE26 and 27 Christian Panigl presented the following network
    backbone maintenance example from his own experience, which was
    triggering flap dampening in some upstream and peering ISPs' routers for
    all his and his customers' /24 prefixes for more than 3 hours because of
    too "aggressive" parameters:
scheduled SW upgrade of backbone router failed:
  - reload after SW upgrade     1 flap
  - new SW crashed              1 flap
  - reload with old SW          1 flap
                                ------
                                3 flaps within 10 minutes
which resulted in the following dampening scenario at some boundaries
    with progressive route-flap dampening enabled:
    Prefix length:    /24     /19       /16
    Suppress time:    ~3h     45-60'    <30'
Therefore, in the Routing-WG session at RIPE27, it was agreed that
    suppression should not start until the 4th flap in a row and that the
    maximum suppression should in no case last longer than 1 hour from the
    last flap.
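
The two constraints above (no suppression before the 4th flap, at most
1 hour of suppression after the last flap) can be checked against a
simple exponential-decay penalty model.  The following is only a
sketch: the penalty per flap, half-life, suppress and reuse limits are
assumed values picked so the arithmetic works out, not parameters taken
from this draft, and the function names are hypothetical.

```python
import math

# All values below are assumptions chosen for illustration.
PENALTY_PER_FLAP = 1000.0   # penalty added per flap
HALF_LIFE_MIN = 30.0        # minutes for the penalty to halve
SUPPRESS_LIMIT = 3000.0     # route is suppressed at or above this
REUSE_LIMIT = 820.0         # route is reused once penalty decays to this
MAX_SUPPRESS_MIN = 60.0     # longest allowed suppression after last flap


def decay(penalty, minutes):
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def penalty_ceiling():
    """Largest penalty that still decays to REUSE_LIMIT within MAX_SUPPRESS_MIN."""
    return REUSE_LIMIT * 2.0 ** (MAX_SUPPRESS_MIN / HALF_LIFE_MIN)


def minutes_until_reuse(penalty):
    """Minutes of decay needed before the penalty reaches REUSE_LIMIT."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)


def first_suppressed_flap(flap_times_min):
    """Return the 1-based index of the flap that triggers suppression, or None."""
    penalty = 0.0
    previous = None
    for count, now in enumerate(flap_times_min, start=1):
        if previous is not None:
            penalty = decay(penalty, now - previous)
        # Capping at the ceiling is what bounds the suppression time.
        penalty = min(penalty + PENALTY_PER_FLAP, penalty_ceiling())
        previous = now
        if penalty >= SUPPRESS_LIMIT:
            return count
    return None


three_flaps = first_suppressed_flap([0, 5, 10])        # the upgrade example
four_flaps = first_suppressed_flap([0, 5, 10, 15])     # one flap more
worst_case_wait = minutes_until_reuse(penalty_ceiling())
```

Under these assumed numbers the three flaps within 10 minutes stay
below the suppress limit, a fourth flap crosses it, and the ceiling on
the penalty keeps the wait after the last flap at 60 minutes.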
It was agreed that a recommendation from RIPE would be desirable.  Given
    that the current allocation policies are expected to hold for the
    foreseeable future, it was suggested that all /19's or shorter prefixes
    are not penalised harder than current Cisco default dampening does.
With those suggestions in mind, Tony Barber designed the following set of
    route-flap dampening parameters which have proved to work smoothly in
    his environment for a couple of months.
Why is a /24 being announced globally?  Our private peerings use a
prefix taken from one of the provider's aggregates.

The answer to this problem is to arrange things so the rest of the
world doesn't need to know about a /24 that can be taken up and down
by the software upgrade of a single router.  That's what route flap
damping can encourage and it seems to have worked in this case except
the message didn't register.
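
As a rough illustration of the point, the sketch below filters a list
of prefixes down to the ones the rest of the world actually has to
see.  The prefixes and the helper name are hypothetical, and it
assumes a covered prefix never needs to be announced separately.

```python
import ipaddress


def needs_global_announcement(prefixes):
    """Drop every prefix already covered by another prefix in the list."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [
        net for net in nets
        if not any(net != other and net.subnet_of(other) for other in nets)
    ]


# Hypothetical case 1: the /24 is numbered out of the provider aggregate.
inside = needs_global_announcement(["10.1.0.0/16", "10.1.5.0/24"])

# Hypothetical case 2: the /24 lies outside the aggregate.
outside = needs_global_announcement(["10.1.0.0/16", "192.0.2.0/24"])
```

In the first case only the /16 remains, so a router reload that takes
the /24 up and down is invisible outside the provider.  In the second
case the /24 has to be announced on its own and every flap is seen
globally.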
...
3. Open problems
3.1 Multiplication of flaps through multiply interconnected ASes
Christian Panigl recently had the following experience with a line
    upgrade of an Ebone customer:
- It is absolutely positive that through the upgrade process just ONE
      flap was generated (disconnect router-port from modem A, reconnect to
      modem B), nevertheless the customer's prefix was dampened in all ICM
      routers (ICM/AS1800 is US upstream for Ebone).
- The flap statistics in the ICM routers stated *4* flaps !!!
- The only explanation would be that the multiple interconnections
      between Ebone/AS1755 and ICM/AS1800 did multiply the flaps
      (advertisements/withdrawals arrived time-shifted at ICM routers
      through the multiple paths).
The flap damping parameters should be applied to Adj-In routes which
are per peer.  The only problem then can occur if the AS-path changes
multiple times.  The only solution to that is to keep separate data
structs for Adj-In and each observed AS path.
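
A small sketch of why the keying matters.  It assumes one customer
flap is heard as a withdrawal on four separate sessions; the session
names, the prefix, the customer AS number and the FlapCounter class
are all hypothetical, and nothing here claims to describe what the ICM
routers actually did.

```python
from collections import defaultdict


class FlapCounter:
    """Count withdrawals under a caller-chosen damping key."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.flaps = defaultdict(int)

    def withdraw(self, peer, prefix, as_path):
        self.flaps[self.key_fn(peer, prefix, as_path)] += 1

    def worst(self):
        """Highest flap count recorded under any single key."""
        return max(self.flaps.values(), default=0)


def by_prefix(peer, prefix, as_path):
    # One shared record per prefix, regardless of where it was heard.
    return prefix


def by_peer_and_path(peer, prefix, as_path):
    # Separate record per Adj-In (peer) and per observed AS path.
    return (peer, prefix, as_path)


# Hypothetical: one flap, seen time-shifted over four sessions.
PREFIX = "10.1.5.0/24"
AS_PATH = (1755, 64512)   # Ebone, then a made-up customer AS
SESSIONS = ["session-1", "session-2", "session-3", "session-4"]

prefix_only = FlapCounter(by_prefix)
per_session = FlapCounter(by_peer_and_path)
for session in SESSIONS:
    prefix_only.withdraw(session, PREFIX, AS_PATH)
    per_session.withdraw(session, PREFIX, AS_PATH)
```

Keyed by prefix alone, the single flap is charged four times.  Keyed
per peer and per AS path, each record sees it once.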
...
3.2 Is dampening of customer route-flaps a good idea ?
As already explained in section 1.3 flap-dampening is at its best value
    and most consistent and helpful if applied as near to the source of
    the problem as possible.  Therefore flap-dampening should not only be
    applied at peering boundaries but even more at customer boundaries !
This is highly unreasonable.  Do you really expect to shut off peer
route damping everywhere and ask [insert irresponsible and clueless
ISP name here] to damp at the customer attachment?

Don't damp the customer attachment.  Aggregate!

If the customer's connectivity gets hosed a few times, be very
persistent in reminding them that renumbering into an aggregate is an
option that will solve that problem.  Then the rest of the Internet
has fewer flapping routes to damp.

Curtis

Curtis Villamizar