not announcing IXP IPv6 peering lan prefixes in global BGP table possibly breaks PMTUD

Hello,

we had an IPv6 path MTU discovery issue last week and I would like to discuss possible solutions here. The problem is a combination of not announcing IXP IPv6 peering prefixes in the global BGP table and activating loose uRPF at the border of a network.

I made the following traceroutes and pings after deactivating loose uRPF at the border. Before this I did not see any packet from the LINX peering LAN address 2001:7f8:4::d1c:1, because it is not announced to the global BGP table and therefore not routable. Previously the traceroute ended after 2001:450:2001:800e::2.

chris@router> traceroute 2a01:e0c:1:1599::1 source 2001:x:x:x::2
traceroute6 to 2a01:e0c:1:1599::1 (2a01:e0c:1:1599::1) from 2001:x:x:x::2, 64 hops max, 12 byte packets
[...]
 2  2001:450:2001:800e::2 (2001:450:2001:800e::2)  0.793 ms  0.611 ms  0.604 ms
 3  2001:7f8:4::d1c:1 (2001:7f8:4::d1c:1)  25.692 ms  227.307 ms  18.160 ms
 4  2001:1900:5:2::12e (2001:1900:5:2::12e)  26.565 ms  26.310 ms  26.248 ms
 5  2a01:e00:2:9::1 (2a01:e00:2:9::1)  26.746 ms  28.774 ms  40.343 ms
[...]

An ICMP echo request packet with more than 1410 bytes (1450 bytes including the IPv6 header) shows that there is a smaller MTU between two routers in the backbone of Level3:

PING www.free.fr(www.free.fr) 1403 data bytes
From 2001:7f8:4::d1c:1 icmp_seq=86 Packet too big: mtu=1450
1411 bytes from www.free.fr: icmp_seq=87 ttl=53 time=41.5 ms
[...]

2001:7f8:4::d1c:1 seems to be a router of Level3 at LINX. The next router, 2001:1900:5:2::12e, also has an IP address from the Level3 IPv6 allocation. There seems to be an MTU of 1450 bytes between those two routers. The router at LINX sends out an ICMP "Packet too big" with the source address of the interface through which it sees the route to the source address. This is the LINX peering LAN, which is currently not announced in BGP. We use loose uRPF at the border to drop all packets from source addresses that are not globally routed. The ICMP "Packet too big" gets lost and path MTU discovery is broken. Communication with big packets is not possible.

Some IXPs decided not to announce their peering LAN prefixes for various reasons, but in combination with loose uRPF this leads to problems like this one. I would like to discuss the best current practice and possible solutions here.

Possible solutions from my point of view:

1) Do not activate loose uRPF at the border of any network.
2) Any network where loose uRPF is configured at the border has to configure static routes for the IXP ranges of every RIR and redistribute them in the IGP so there is a valid route for loose uRPF checks.
3) IXPs announce their peering LAN prefixes in the global BGP table and make loose uRPF work for the rest of the world. Members of the IXPs should possibly filter BGP announcements of the IXP peering LAN prefixes from external peers when they do not use "next-hop-self" in iBGP within their network.
4) Remove tunnels, use native IPv6. There will always be links with an MTU lower than 1500 bytes (access), so this is possibly not the best solution.
5) ?

Regards from Berlin,
Chris
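The failure mode described above, and the effect of option 2, can be pictured with a small Python sketch. It is only an illustration: the covering prefixes and their lengths (including the /48 assumed for the LINX peering LAN) are assumptions made for the example, not real BGP data, and `loose_urpf_permits` is a hypothetical helper, not any router's implementation.

```python
import ipaddress


def loose_urpf_permits(source, routing_table):
    """Loose uRPF: accept a packet if any route at all covers its source."""
    addr = ipaddress.ip_address(source)
    return any(addr in prefix for prefix in routing_table)


# Assumed covering prefixes for addresses seen in the traceroute.
global_table = [
    ipaddress.ip_network("2001:450::/32"),
    ipaddress.ip_network("2001:1900::/32"),
    ipaddress.ip_network("2a01:e00::/32"),
]

# Assumed prefix of the LINX peering LAN; it is not in global_table.
linx_peering_lan = ipaddress.ip_network("2001:7f8:4::/48")

ptb_source = "2001:7f8:4::d1c:1"  # source of the "Packet too big"

# Without a route for the peering LAN the ICMP error is dropped.
print(loose_urpf_permits(ptb_source, global_table))            # False
# A router address from routed space passes the check.
print(loose_urpf_permits("2001:1900:5:2::12e", global_table))  # True
# Option 2: a static route for the IXP range makes the check pass.
print(loose_urpf_permits(ptb_source, global_table + [linx_peering_lan]))  # True
```

Option 3 has the same effect as the last line, except that the route arrives via BGP instead of being configured statically in every network.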

Hi,
5) ?
Adapt uRPF so that it doesn't filter ICMP error messages. Whether this is useful depends on how many ICMP error messages with unreachable source addresses we expect to see… When people/organizations start to use ULA addresses it might be more than we see now.
- Sander
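A rough sketch of what such an exception could look like in logic (not in any vendor's configuration language) follows. The `Packet` type and the `accept` function are hypothetical names introduced for the example; the only protocol facts relied on are that ICMPv6 uses next-header value 58 and that RFC 4443 reserves types 0 to 127 for error messages.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

IPPROTO_ICMPV6 = 58  # IPv6 next-header value for ICMPv6


@dataclass
class Packet:
    source: str
    next_header: int
    icmp_type: Optional[int] = None  # only set for ICMPv6 packets


def is_icmpv6_error(pkt):
    # RFC 4443: types 0-127 are error messages, 128-255 are informational.
    return (pkt.next_header == IPPROTO_ICMPV6
            and pkt.icmp_type is not None
            and pkt.icmp_type < 128)


def accept(pkt, routes):
    """Loose uRPF with an exemption for ICMPv6 error messages."""
    if is_icmpv6_error(pkt):
        return True
    addr = ipaddress.ip_address(pkt.source)
    return any(addr in route for route in routes)


# Illustrative table that does not contain the IXP peering LAN.
routes = [ipaddress.ip_network("2001:1900::/32")]

packet_too_big = Packet("2001:7f8:4::d1c:1", IPPROTO_ICMPV6, icmp_type=2)
echo_request = Packet("2001:7f8:4::d1c:1", IPPROTO_ICMPV6, icmp_type=128)

print(accept(packet_too_big, routes))  # True: error message is exempt
print(accept(echo_request, routes))    # False: informational, source unrouted
```

Whether a given platform can apply such an exemption in hardware is a separate question.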

Hi, On Mon, Jul 25, 2011 at 11:37:05AM +0200, Sander Steffann wrote:
5) ?
Adapt uRPF so that it doesn't filter ICMP error messages. Whether this is useful depends on how many ICMP error messages with unreachable source addresses we expect to see… When people/organizations start to use ULA addresses it might be more than we see now.
Indeed this sounds like a good "option #5". Christian, can your gear do IPv6-uRPF-with-permit-ACLs in hardware? (My gear can only do IPv6 uRPF in software, no matter what options I use, so we currently filter by ACL.)
Gert Doering -- NetMaster
--
have you enabled IPv6 on something today...?
SpaceNet AG                 Vorstand: Sebastian v. Bomhard
Joseph-Dollinger-Bogen 14   Aufsichtsratsvors.: A. Grundner-Culemann
D-80807 Muenchen            HRB: 136055 (AG Muenchen)
Tel: +49 (89) 32356-444     USt-IdNr.: DE813185279

5) Easier said than done, as uRPF checks are done solely at layer 3 and nobody wants to send rejected packets to the route processor, do we?
-----Original Message-----
From: ipv6-wg-admin@ripe.net [mailto:ipv6-wg-admin@ripe.net] On Behalf Of Gert Doering
Sent: Monday, 25 July 2011 05:43
To: Sander Steffann
Cc: Christian Seitz; ipv6-wg@ripe.net
Subject: Re: [ipv6-wg] not announcing IXP IPv6 peering lan prefixes in global BGP table possibly breaks PMTUD

Hi, On Mon, Jul 25, 2011 at 05:24:10PM +0200, Eric Vyncke (evyncke) wrote:
5) Easier said than done, as uRPF checks are done solely at layer 3 and nobody wants to send rejected packets to the route processor, do we?
Well, that completely depends on the hardware capabilities. Maybe some vendors can do IPv6 uRPF-checks plus uRPF-exception-ACLs in hardware now...
Gert Doering

Hi Gert, On Mon, 25 Jul 2011, Gert Doering wrote:
On Mon, Jul 25, 2011 at 11:37:05AM +0200, Sander Steffann wrote:
5) ?
Adapt uRPF so that it doesn't filter ICMP error messages. Whether this is useful depends on how many ICMP error messages with unreachable source addresses we expect to see… When people/organizations start to use ULA addresses it might be more than we see now.
Indeed this sounds like a good "option #5".
Christian, can your gear do IPv6-uRPF-with-permit-ACLs in Hardware?
(My gear can only do IPv6-uRPF in software, no matter what options I use, so we currently filter by ACL)
Both the Juniper MX960 and the Cisco CRS-1 are able to do IPv6 uRPF in hardware, including permit ACLs (Cisco) and fail-filters (Juniper), so we would like to use it as we do for IPv4.
Regards,
Chris

Hello, On Mon, 25 Jul 2011, Sander Steffann wrote:
5) ?
Adapt uRPF so that it doesn't filter ICMP error messages. Whether this is useful depends on how many ICMP error messages with unreachable source addresses we expect to see… When people/organizations start to use ULA addresses it might be more than we see now.
Do you really want to disable filtering of all ICMP packets from non-routed addresses? I do not want an ICMP DoS from unroutable addresses in my network. ICMP is important for IPv6 communication to work, yes, but only from routable addresses.
ULA could be the next problem. Not only loose uRPF may be the problem in this case, but also infrastructure ACLs which deny ULA addresses from outside. RFC 4193, section 4.3, says that packets from ULA addresses should be filtered at the border. If somebody sends an ICMP "Packet too big" with a source address from the ULA range, it is to be expected that it will be dropped somewhere (at the border of the sender's own network, at the border of the destination network, or somewhere in a backbone between those two networks).
Regards,
Chris
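For illustration, the kind of infrastructure ACL mentioned here can be reduced to a membership test against the ULA block fc00::/7 defined in RFC 4193. This is a minimal sketch; `border_acl_permits` is a hypothetical name, and the ULA address used is an arbitrary example.

```python
import ipaddress

# RFC 4193 Unique Local Addresses
ULA_BLOCK = ipaddress.ip_network("fc00::/7")


def is_ula(address):
    """Return True if the address lies inside the ULA block."""
    return ipaddress.ip_address(address) in ULA_BLOCK


def border_acl_permits(source):
    """Sketch of a border ACL that drops packets with ULA sources."""
    return not is_ula(source)


print(border_acl_permits("fd12:3456:789a::1"))  # False: ULA source is dropped
print(border_acl_permits("2001:7f8:4::d1c:1"))  # True: not a ULA address
```

Such an ACL drops an ICMP "Packet too big" with a ULA source regardless of what the uRPF check would have decided.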

Hi, On Tue, Jul 26, 2011 at 09:13:39AM +0200, Christian Seitz wrote:
On Mon, 25 Jul 2011, Sander Steffann wrote:
5) ?
Adapt uRPF so that it doesn't filter ICMP error messages. Whether this is useful depends on how many ICMP error messages with unreachable source addresses we expect to see… When people/organizations start to use ULA addresses it might be more than we see now.
do you really want to disable filtering all ICMP packets from non-routed addresses? I do not like to have an ICMP DoS from unroutable addresses in my network. ICMP is important for IPv6 communication to work, yes, but only from routable addresses.
Uh, I don't think that point is valid. Regarding DoS possibilities, for ICMP *error* messages (which are not replied to) there's no difference between "coming from routed space" and "coming from non-routed space". If you're worried about DoS-by-ICMP, you need rate-limits. uRPF won't help, as it's easy for a moderate-sized botnet to send you enough traffic from legitimate sources without needing to spoof source addresses...
ULA could be the next problem. Not only loose uRPF may be the problem in this case, but also infrastructure ACLs which deny ULA addresses from outside. RFC4193 4.3 says that packets from ULA addresses should be filtered at the border. If somebody sends ICMP "Packet too big" with an address from the ULA range as the source address it is expected that it will be dropped somewhere (at the border of the own network, at the border of the destination network or somewhere in a backbone between those two networks).
Now that's a different can of worms. If someone numbers their transit network with ULAs and sends ICMP errors from ULA space, they deserve what you can think up for them.
Gert Doering -- NetMaster
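The rate limits mentioned above are commonly built as a token bucket. The following is a minimal sketch under that assumption, with arbitrary rate and burst values and explicit timestamps so the behaviour is easy to follow; it is not how any particular router implements its policers.

```python
class TokenBucket:
    """Token-bucket limiter: `rate` packets per second, bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous packet

    def allow(self, now):
        """Return True if a packet arriving at time `now` may pass."""
        elapsed = now - self.last
        self.last = now
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limit for inbound ICMPv6 errors: 10 per second, burst of 5.
bucket = TokenBucket(rate=10, burst=5)

# Eight packets arriving at the same instant: five pass, three are dropped.
print([bucket.allow(0.0) for _ in range(8)])
# Half a second later the bucket has refilled and packets pass again.
print(bucket.allow(0.5))
```

The limiter never looks at the source address, so it treats ICMP errors from routed and non-routed space alike.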

ULA could be the next problem. Not only loose uRPF may be the problem in this case, but also infrastructure ACLs which deny ULA addresses from outside. RFC4193 4.3 says that packets from ULA addresses should be filtered at the border. If somebody sends ICMP "Packet too big" with an address from the ULA range as the source address it is expected that it will be dropped somewhere (at the border of the own network, at the border of the destination network or somewhere in a backbone between those two networks).
Now that's a different can of worms. If someone numbers their transit network with ULAs and sends ICMP errors from ULA space, they deserve what you can think up for them.
Enterprise IPv6 networks using PA space will do exactly that (to avoid renumbering after an ISP change), and you'll see ICMP errors coming from them when the inbound IPv6 packets transit VPN tunnels.
Ivan

* Christian Seitz:
5) ?
Send those ICMP messages from a globally reachable IP address. The source address doesn't matter, after all.
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100          tel: +49-721-96201-1
D-76133 Karlsruhe         fax: +49-721-96201-99

Hello Florian, On Mon, 25 Jul 2011, Florian Weimer wrote:
* Christian Seitz:
5) ?
Send those ICMP messages from a globally reachable IP address. The source address doesn't matter, after all.
So Level3 should fix the problem in this case? I notified them about the problem last week, but nothing has happened so far. I still see
From 2001:7f8:4::d1c:1 icmp_seq=1 Packet too big: mtu=1450
when sending big packets. free.fr still has many IPv6 customers online and I hope that more and more access providers will enable IPv6. These issues have to be fixed in one's own network first. Waiting for the big carriers to do anything can take some time... :-/
Regards,
Chris
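The sizes in the ping output earlier in the thread can be checked with simple arithmetic, assuming a plain 40-byte IPv6 header with no extension headers and the 8-byte ICMPv6 echo header. The function names below are made up for the example.

```python
IPV6_HEADER = 40    # fixed IPv6 header in bytes, no extension headers
ICMPV6_HEADER = 8   # ICMPv6 echo request/reply header in bytes


def ipv6_packet_size(echo_payload):
    """Total on-wire IPv6 packet size for a given ping payload."""
    return IPV6_HEADER + ICMPV6_HEADER + echo_payload


def max_echo_payload(path_mtu):
    """Largest ping payload that fits without fragmentation."""
    return path_mtu - IPV6_HEADER - ICMPV6_HEADER


print(ipv6_packet_size(1403))  # 1451: one byte more than the 1450-byte MTU
print(max_echo_payload(1450))  # 1402
```

So a ping with 1403 data bytes (a 1411-byte ICMPv6 message) is the smallest one that triggers the "Packet too big: mtu=1450" message on this path.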

On Tue, Jul 26, 2011 at 09:05:25AM +0200, Christian Seitz wrote:
Send those ICMP messages from a globally reachable IP address. The source address doesn't matter, after all.
so Level3 should fix the problem in this case?
Surely not. There is nothing to fix, they don't do anything wrong. The problem is indeed that the IXP prefix is not advertised, which is inherently incompatible with any uRPF (and exactly the reason why you see e.g. 2001:7f8:2c::/48 in the DFZ since 2004 :-)).
From 2001:7f8:4::d1c:1 icmp_seq=1 Packet too big: mtu=1450
The address is correct. See RFC 4443:
===================================================================
2.2. Message Source Address Determination

A node that originates an ICMPv6 message has to determine both the Source and Destination IPv6 Addresses in the IPv6 header before calculating the checksum. If the node has more than one unicast address, it MUST choose the Source Address of the message as follows:

(a) If the message is a response to a message sent to one of the node's unicast addresses, the Source Address of the reply MUST be that same address.

(b) If the message is a response to a message sent to any other address, such as
    - a multicast group address,
    - an anycast address implemented by the node, or
    - a unicast address that does not belong to the node
    the Source Address of the ICMPv6 packet MUST be a unicast address belonging to the node. The address SHOULD be chosen according to the rules that would be used to select the source address for any other packet originated by the node, given the destination address of the packet. However, it MAY be selected in an alternative way if this would lead to a more informative choice of address reachable from the destination of the ICMPv6 packet.
===================================================================
And the address which would be selected as "source address for any other packet originated by the node, given the destination address of the packet" is quite usually the interface IP of the egress interface towards the ICMP packet destination. Asking L3 to change that (if they could at all) is unreasonable IMHO.
Best regards,
Daniel
--
CLUE-RIPE -- Jabber: dr@cluenet.de -- dr@IRCnet -- PGP: 0xA85C8AA0
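The two rules quoted above can be sketched as a small decision function. This is a simplification for illustration only: `icmpv6_error_source` is a hypothetical name, the egress address is passed in instead of being derived from a routing lookup, the optional "MAY" alternative is not modelled, and 2001:db8::1 is a documentation address standing in for the router's other interface.

```python
import ipaddress


def icmpv6_error_source(trigger_dst, own_addresses, egress_address):
    """Choose the source of an ICMPv6 message, following rules (a) and (b).

    trigger_dst:    destination of the packet that triggered the message
    own_addresses:  unicast addresses configured on this node
    egress_address: address of the interface facing the ICMPv6 destination
    """
    dst = ipaddress.ip_address(trigger_dst)
    own = {ipaddress.ip_address(a) for a in own_addresses}
    if dst in own:
        # Rule (a): reply from the address the packet was sent to.
        return str(dst)
    # Rule (b), SHOULD: use normal source selection, which usually yields
    # the egress interface address towards the ICMPv6 destination.
    return str(ipaddress.ip_address(egress_address))


router_addresses = ["2001:7f8:4::d1c:1", "2001:db8::1"]

# A transit packet for 2a01:e0c:1:1599::1 is too big; the error goes back
# via the IXP-facing interface, so that interface's address is used.
print(icmpv6_error_source("2a01:e0c:1:1599::1",
                          router_addresses,
                          egress_address="2001:7f8:4::d1c:1"))
```

With these inputs the function returns the peering LAN address, which matches the source seen in the "Packet too big" messages in this thread.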
participants (7)
- Christian Seitz
- Daniel Roesen
- Eric Vyncke (evyncke)
- Florian Weimer
- Gert Doering
- Ivan Pepelnjak
- Sander Steffann