bgp behavior with exchange prefix in ebgp

Dear EIXers,

I thought I'd expand on my comments in the EIX session earlier this week.

In a network that peers over an exchange, you'll see routes like this in the BGP table:

   Network          Next Hop         Metric LocPrf Weight Path
*>i62.3.192.0/18    193.148.15.112      200    110      0 6728 i
*>i62.8.128.0/17    193.148.15.209             110      0 9132 ?
*>i62.12.128.0/17   193.148.15.224     1100    110      0 15623 i

Note that the next hop points to an address that falls within the (old) AMS-IX peering LAN prefix. Since this router isn't directly connected to the AMS-IX, it needs to resolve this next hop:

Routing entry for 193.148.15.0/24
  Known via "ospf 1", distance 110, metric 20, type extern 2, forward metric 1
  Last update from 62.212.94.61 on GigabitEthernet1/0.1, 00:00:41 ago
  Routing Descriptor Blocks:
  * 62.212.94.61, from 62.212.80.68, 00:00:41 ago, via GigabitEthernet1/0.1
      Route metric is 20, traffic share count is 1

So in this case the peering LAN prefix is announced over OSPF, so the next hop resolves correctly and all the traffic is sent to the next router, which is the one directly connected to the AMS-IX here.

Now consider what happens when someone elsewhere announces the peering LAN prefix over EBGP (this is taken from route-views.oregon-ix.net):

BGP routing table entry for 193.148.15.0/24, version 426250
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  3277 8482 28968 3246 1200
    194.85.4.249 from 194.85.4.249 (194.85.4.249)
      Origin IGP, localpref 100, valid, external, best

And:

Routing entry for 193.148.15.0/24
  Known via "bgp 6447", distance 20, metric 0
  Tag 3277, type external
  Last update from 194.85.4.249 03:30:30 ago
  Routing Descriptor Blocks:
  * 194.85.4.249, from 194.85.4.249, 03:30:30 ago
      Route metric is 0, traffic share count is 1
      AS Hops 5

The important part here is that the 193.148.15.0/24 prefix in OSPF has an administrative distance of 110, while the EBGP route shown here has a much lower administrative distance of 20.
This means that the next hop address for the AMS-IX would resolve to an EBGP neighbor and all traffic to exchange peers is redirected over this neighbor! Now fortunately this only happens for routers that learn the peering LAN prefix over EBGP, so the impact is usually limited, but it's well worth the effort to filter out the prefixes of the peering LANs you connect to yourself.

What happened this week at AMS-IX was even more fun: the new prefix for the peering LAN is a /23, but someone started to announce a /24 within that /23. (Probably typed ... 255.255.255.0 rather than ... 255.255.254.0, easy mistake.) The rule that kicked in now wasn't the relatively minor "EBGP has a lower administrative distance than IGPs" but the much heavier "longest match first" one. So in this case the next hop resolved to an external route even on the AMS-IX connected routers themselves, breaking the peering sessions.

Iljitsch van Beijnum
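The two rules at play above can be sketched in a few lines of Python. This is only a toy model, not how any router actually implements it: the `Route` class and the `build_table` and `resolve` helpers are made up for illustration, and since the new AMS-IX /23 isn't quoted in this thread, 10.10.0.0/23 stands in for it as a hypothetical placeholder. Per prefix, the route with the lowest administrative distance is installed; a next hop is then resolved against the longest matching installed prefix.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    source: str      # e.g. "ospf", "ebgp", "connected"
    distance: int    # administrative distance


def build_table(candidates):
    """Per prefix, install only the route with the lowest administrative distance."""
    table = {}
    for route in candidates:
        current = table.get(route.prefix)
        if current is None or route.distance < current.distance:
            table[route.prefix] = route
    return table


def resolve(table, next_hop):
    """Resolve a next hop: the longest matching installed prefix wins."""
    addr = ipaddress.ip_address(next_hop)
    covering = [r for r in table.values() if addr in r.prefix]
    return max(covering, key=lambda r: r.prefix.prefixlen, default=None)


def net(text):
    return ipaddress.ip_network(text)


# Case 1: router not connected to the exchange. Same /24 from OSPF and EBGP.
remote = build_table([
    Route(net("193.148.15.0/24"), "ospf", 110),
    Route(net("193.148.15.0/24"), "ebgp", 20),
])
print(resolve(remote, "193.148.15.112").source)   # ebgp (distance 20 beats 110)

# Case 2: exchange-connected router. Hypothetical /23 LAN, stray /24 inside it.
ixp_router = build_table([
    Route(net("10.10.0.0/23"), "connected", 0),
    Route(net("10.10.0.0/24"), "ebgp", 20),
])
print(resolve(ixp_router, "10.10.0.5").source)    # ebgp (the /24 is longer than the /23)
print(resolve(ixp_router, "10.10.1.5").source)    # connected (outside the stray /24)
```

Note that in case 2 the administrative distance never gets a say: the two routes are for different prefixes, so both are installed and the more specific one wins the lookup.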

Hi,

On Fri, Sep 05, 2003 at 10:18:45AM +0200, Iljitsch van Beijnum wrote:
Note that the next hop points to an address that falls within the (old) AMS-IX peering LAN prefix. Since this router isn't directly connected to the AMS-IX, it needs to resolve this next hop:
Two ways to remedy this:

- use next-hop-self on all iBGP peerings (which isn't always the perfect choice, but will easily fix these issues)
- filter the exchange prefix(es) for all IXes that you participate in *and all more specifics* on *all* eBGP sessions. This is strongly recommended even when already using next-hop-self (more specifics can break eBGP sessions at the IXP peering routers)

Gert Doering
        -- NetMaster
--
Total number of prefixes smaller than registry allocations: 55575 (56535)

SpaceNet AG                 Mail: netmaster@Space.Net
Joseph-Dollinger-Bogen 14   Tel : +49-89-32356-0
80807 Muenchen              Fax : +49-89-32356-299
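The second remedy boils down to a simple membership test, sketched here in Python. The names `IXP_LANS` and `accept_from_ebgp` are invented for this illustration, and the list holds only the old AMS-IX /24 mentioned in this thread; in practice it would contain the LAN prefix of every exchange you connect to.

```python
import ipaddress

# Peering LAN prefixes of the exchanges this network connects to (assumption:
# only the old AMS-IX /24 quoted in the thread is listed here).
IXP_LANS = [ipaddress.ip_network("193.148.15.0/24")]


def accept_from_ebgp(prefix, ixp_lans=IXP_LANS):
    """Reject an exchange LAN prefix and every more specific inside it."""
    announced = ipaddress.ip_network(prefix)
    return not any(
        announced.version == lan.version and announced.subnet_of(lan)
        for lan in ixp_lans
    )


print(accept_from_ebgp("193.148.15.0/24"))    # False: the LAN prefix itself
print(accept_from_ebgp("193.148.15.128/25"))  # False: a more specific
print(accept_from_ebgp("62.3.192.0/18"))      # True: unrelated prefix
```

On a Cisco-style router the same logic would typically be expressed as a prefix-list entry along the lines of "deny 193.148.15.0/24 le 32", applied inbound on every eBGP session.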

I know -- catching up on old e-mail...
On Fri, Sep 05, 2003 at 10:18:45AM +0200, Iljitsch van Beijnum wrote:
Note that the next hop points to an address that falls within the (old) AMS-IX peering LAN prefix. Since this router isn't directly connected to the AMS-IX, it needs to resolve this next hop:
Two ways to remedy this:
- use next-hop-self on all iBGP peerings (which isn't always the perfect choice, but will easily fix these issues)
- filter the exchange prefix(es) for all IXes that you participate in *and all more specifics* on *all* eBGP sessions. This is strongly recommended even when already using next-hop-self (more specifics can break eBGP sessions at the IXP peering routers)
A third and additional defensive measure could be to adjust the preference of your IGP to a value below that for eBGP, and do the moral equivalent of "redistribute connected" on the exchange point router into your IGP.

Regards,

- Håvard

Hi,

On Thu, Oct 02, 2003 at 10:10:51AM +0200, Havard Eidnes wrote:
- filter the exchange prefix(es) for all IXes that you participate in *and all more specifics* on *all* eBGP sessions. This is strongly recommended even when already using next-hop-self (more specifics can break eBGP sessions at the IXP peering routers)
A third and additional defensive measure could be to adjust the preference of your IGP to a value below that for eBGP, and do the moral equivalent of "redistribute connected" on the exchange point router into your IGP.
While I agree that it's useful, it still needs measure 2 - the problem being more-specific announcements of the IXP prefixes (which can't be caught by the IGP<->BGP preference).

Gert Doering
        -- NetMaster
--
Total number of prefixes smaller than registry allocations: 56883 (56833)

SpaceNet AG                 Mail: netmaster@Space.Net
Joseph-Dollinger-Bogen 14   Tel : +49-89-32356-0
80807 Muenchen              Fax : +49-89-32356-299
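To make that last point concrete, here is a small Python sketch. It is a simplification: the `winner` helper is invented for this example, the IGP preference of 19 is just an arbitrary value below the eBGP distance of 20, and the stray /25 is hypothetical. Candidate routes are ranked by longest prefix first, and administrative distance only breaks ties between routes for the same prefix.

```python
import ipaddress


def winner(routes, next_hop):
    """Return the source of the route used for next_hop.

    routes: iterable of (prefix, source, distance) tuples.
    Longest prefix wins; distance only decides between equal-length prefixes.
    """
    addr = ipaddress.ip_address(next_hop)
    covering = [
        (ipaddress.ip_network(prefix), source, distance)
        for prefix, source, distance in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    best = min(covering, key=lambda r: (-r[0].prefixlen, r[2]))
    return best[1]


# IGP preference lowered to 19 (hypothetical), below the eBGP distance of 20.
same_length = [
    ("193.148.15.0/24", "igp", 19),
    ("193.148.15.0/24", "ebgp", 20),
]
print(winner(same_length, "193.148.15.112"))    # igp: the preference change helps

# Someone leaks a (hypothetical) more specific /25 over eBGP.
with_specific = same_length + [("193.148.15.0/25", "ebgp", 20)]
print(winner(with_specific, "193.148.15.112"))  # ebgp: the /25 wins regardless
print(winner(with_specific, "193.148.15.209"))  # igp: outside the leaked /25
```

So the preference change protects against a same-length announcement, but only filtering keeps the more specific out of the table in the first place.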