Re: [routing-wg] New on RIPE Labs: BGP Zombies

24 Apr 2019

      ...
> One of the issues we found (Philip Smith and I) "back then" was indeed
> router bugs.  The combination of "export policy is changed" with "an
> update is queued for this neighbour right then" led to control-plane
> confusion and missing withdraws.  This was fixed.
cool
...
> My conclusion then was that something along the following lines happens:
>  - router R1 remembers which peers an UPDATE was sent to
>  - export policy on R1 is changed, changing whether or not a given
>    peer would receive an UPDATE for a given prefix
>  - R1 receives a withdraw for its best (and only) path; the prefix is gone
>  - R1 sends a withdraw to "all peers it remembers"
>  - and something goes wrong if that list of peers does not reflect the
>    real set of peers, possibly due to "BGP internal state not fully in
>    sync between 'export policy is changed' and 'withdraw comes in'", so
>    R1 is no longer aware that one of its neighbours received the prefix
>    originally.
believable conjecture.  could and should be tested in lab.
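The conjectured sequence can be sketched as a toy model. It is a minimal illustration under stated assumptions, not how any real BGP implementation is structured. The `Router` and `Peer` classes, the `sent_to` memory, `change_policy()` and its `buggy` flag are all hypothetical. The model only shows how a "sent to" list that has drifted from what peers actually hold turns a single withdraw into a zombie route. 192.0.2.0/24 is a documentation prefix.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    rib_in: set = field(default_factory=set)  # prefixes this peer holds from R1

    def receive_update(self, prefix):
        self.rib_in.add(prefix)

    def receive_withdraw(self, prefix):
        self.rib_in.discard(prefix)


@dataclass
class Router:
    peers: dict           # peer name -> Peer
    export_allowed: dict  # peer name -> bool (hypothetical export policy)
    sent_to: dict = field(default_factory=dict)  # prefix -> peers R1 remembers

    def advertise(self, prefix):
        targets = {n for n, ok in self.export_allowed.items() if ok}
        for n in targets:
            self.peers[n].receive_update(prefix)
        self.sent_to[prefix] = targets

    def change_policy(self, peer_name, allowed, buggy):
        self.export_allowed[peer_name] = allowed
        if allowed:
            return
        for prefix, targets in self.sent_to.items():
            if peer_name in targets:
                if not buggy:
                    # Correct behaviour: tell the peer the route is gone.
                    self.peers[peer_name].receive_withdraw(prefix)
                # Assumed bug: memory is updated as if the peer never got
                # the prefix, but no withdraw is actually sent.
                targets.discard(peer_name)

    def withdraw(self, prefix):
        # R1 sends the withdraw only to the peers it remembers.
        for n in self.sent_to.pop(prefix, set()):
            self.peers[n].receive_withdraw(prefix)


def simulate(buggy):
    """Return the set of peers still holding the prefix after the withdraw."""
    prefix = "192.0.2.0/24"
    peers = {n: Peer(n) for n in ("A", "B")}
    r1 = Router(peers, {"A": True, "B": True})
    r1.advertise(prefix)
    r1.change_policy("B", allowed=False, buggy=buggy)
    r1.withdraw(prefix)
    return {n for n, p in peers.items() if prefix in p.rib_in}


print(simulate(buggy=False))  # set()
print(simulate(buggy=True))   # {'B'}
```

With the assumed bug, peer B keeps 192.0.2.0/24 after R1's only path is gone. A lab test would need to reproduce that ordering on real routers.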

but does not explain the cases where we see stuck routes on devices
which have no config changes for a loooong time (if you believe rancid).

randy

Randy Bush