Hello,

We had an IPv6 path MTU discovery issue last week and I would like to discuss possible solutions here.

The problem is caused by a combination of two things: IXP IPv6 peering prefixes are not announced in the global BGP table, and loose uRPF is active at the border of a network.

I made the following traceroutes and pings after deactivating loose uRPF at the border. Before that, I did not see any packets from the LINX peering LAN address 2001:7f8:4::d1c:1, because its prefix is not announced in the global BGP table and the address is therefore not routable. The traceroute previously ended after 2001:450:2001:800e::2.

chris@router> traceroute 2a01:e0c:1:1599::1 source 2001:x:x:x::2
traceroute6 to 2a01:e0c:1:1599::1 (2a01:e0c:1:1599::1) from 2001:x:x:x::2, 64 hops max, 12 byte packets
[...]
 2  2001:450:2001:800e::2 (2001:450:2001:800e::2)  0.793 ms  0.611 ms  0.604 ms
 3  2001:7f8:4::d1c:1 (2001:7f8:4::d1c:1)  25.692 ms  227.307 ms  18.160 ms
 4  2001:1900:5:2::12e (2001:1900:5:2::12e)  26.565 ms  26.310 ms  26.248 ms
 5  2a01:e00:2:9::1 (2a01:e00:2:9::1)  26.746 ms  28.774 ms  40.343 ms
[...]

An ICMP echo request of more than 1410 bytes (1450 bytes including the 40-byte IPv6 header) shows that there is a smaller MTU between two routers in the Level3 backbone:

PING www.free.fr(www.free.fr) 1403 data bytes
From 2001:7f8:4::d1c:1 icmp_seq=86 Packet too big: mtu=1450
1411 bytes from www.free.fr: icmp_seq=87 ttl=53 time=41.5 ms
[...]
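For anyone who wants to check the sizes in the ping output above, here is a small Python sketch of the arithmetic. It assumes the standard fixed 40-byte IPv6 header and 8-byte ICMPv6 echo header with no extension headers; the function names are made up for illustration.

```python
IPV6_HEADER = 40         # fixed IPv6 header, in bytes (no extension headers assumed)
ICMPV6_ECHO_HEADER = 8   # ICMPv6 echo request/reply header, in bytes


def ipv6_packet_size(ping_data_bytes: int) -> int:
    """Size on the wire (at the IPv6 layer) of an echo request with this payload."""
    return IPV6_HEADER + ICMPV6_ECHO_HEADER + ping_data_bytes


def max_ping_data(path_mtu: int) -> int:
    """Largest ping payload that fits into one unfragmented packet for this MTU."""
    return path_mtu - IPV6_HEADER - ICMPV6_ECHO_HEADER


print(ipv6_packet_size(1403))  # one byte more than the reported mtu=1450
print(max_ping_data(1450))     # largest payload that still passes unfragmented
```

So 1403 data bytes give a 1451-byte packet, which is exactly one byte too large for the reported MTU of 1450.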
2001:7f8:4::d1c:1 seems to be a Level3 router at LINX. The next router, 2001:1900:5:2::12e, also has an IP address from the Level3 IPv6 allocation. There seems to be an MTU of 1450 bytes between those two routers.

The router at LINX sends the ICMP "Packet too big" message with the source address of the interface through which it sees the route back to the sender of the original packet. This is its interface on the LINX peering LAN, whose prefix is currently not announced in BGP. We use loose uRPF at the border to drop all packets from source addresses that are not globally routed. The ICMP "Packet too big" message is therefore dropped and path MTU discovery is broken. Communication with large packets is not possible.

Some IXPs have decided not to announce their peering LAN prefixes, but in combination with loose uRPF this leads to problems like this one. I would like to discuss the best current practice and possible solutions here.

Possible solutions from my point of view:

1) Do not activate loose uRPF at the border of any network.

2) Any network with loose uRPF configured at the border has to configure static routes for the IXP ranges of every RIR and redistribute them into the IGP, so that there is a valid route for the loose uRPF check.

3) IXPs announce their peering LAN prefixes in the global BGP table, so that loose uRPF works for the rest of the world. Members of the IXPs should possibly filter BGP announcements of the IXP peering LAN prefixes from external peers if they do not use "next-hop-self" in iBGP within their network.

4) Remove tunnels, use native IPv6. There will always be links with an MTU lower than 1500 bytes (access), so this is possibly not the best solution.

5) ?

Regards from Berlin,
Chris
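
P.S. To make the failure mode concrete, here is a minimal Python sketch of a loose uRPF check. It is only a model of the idea, not any vendor's implementation. The routing table is a hypothetical, heavily shortened stand-in for the global table (three /32 prefixes chosen only to cover the other traceroute hops, no default route), and 2001:7f8:4::/48 is my assumption for the LINX peering LAN prefix.

```python
import ipaddress


def loose_urpf_accepts(source: str, routing_table: list[str]) -> bool:
    """Loose uRPF: accept the packet if ANY route covers its source address.

    Unlike strict uRPF, the route does not have to point back out of the
    interface the packet arrived on.
    """
    addr = ipaddress.ip_address(source)
    return any(addr in ipaddress.ip_network(prefix) for prefix in routing_table)


# Hypothetical, shortened "global table" without a default route.
global_table = ["2001:450::/32", "2001:1900::/32", "2a01:e00::/32"]

# Assumed LINX peering LAN prefix (not taken from the traceroute itself).
linx_peering_lan = "2001:7f8:4::/48"

# Source of the "Packet too big" message: no covering route, so it is dropped.
print(loose_urpf_accepts("2001:7f8:4::d1c:1", global_table))

# The next Level3 hop is covered by a route and passes the check.
print(loose_urpf_accepts("2001:1900:5:2::12e", global_table))

# Solutions 2) and 3) both amount to making a covering route available.
print(loose_urpf_accepts("2001:7f8:4::d1c:1", global_table + [linx_peering_lan]))
```

The first check fails and the other two pass. It makes no difference to the check whether the covering route is a local static route (solution 2) or learned via BGP (solution 3).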