Fw: [Open Peering] Routers hit by malformed BGP attribute lists

Before we start playing with asn32 beacons, this might be of interest:
> It might very well be that this route was trying to encode the newly (January 1) introduced 32-bit AS numbers in a backwards-compatible (but apparently problematic) way. In that case the problem might reoccur until everybody has upgraded their OS version to support 32-bit AS numbers.
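For context, the backwards-compatible encoding the quote alludes to is presumably the scheme later published as RFC 4893 (an IDR working-group draft at the time). A 4-byte-capable speaker talking to a 2-byte-only peer writes the placeholder AS_TRANS (23456) into AS_PATH wherever an AS number does not fit in 16 bits, and carries the real path in the new optional transitive AS4_PATH attribute, which 2-byte-only routers are expected to pass along untouched. Whether this mechanism was involved in the incident is, as the message itself says, not established. The Python sketch below models only plain AS_SEQUENCE paths (no AS_SETs, confederations or AS4_AGGREGATOR); the function names are hypothetical and the ASNs are documentation-range numbers, not the ones from the route in question.

```python
from typing import List, Optional, Tuple

AS_TRANS = 23456  # 2-byte placeholder for AS numbers that do not fit in 16 bits


def encode_for_old_peer(path: List[int]) -> Tuple[List[int], Optional[List[int]]]:
    """Return (AS_PATH, AS4_PATH) as sent by a 4-byte-capable speaker to a 2-byte-only peer.

    AS4_PATH is omitted (None) when every AS number already fits in 16 bits.
    """
    for asn in path:
        if not 0 <= asn <= 0xFFFFFFFF:
            raise ValueError(f"not a valid AS number: {asn}")
    as_path = [asn if asn <= 0xFFFF else AS_TRANS for asn in path]
    needs_as4 = any(asn > 0xFFFF for asn in path)
    return as_path, (list(path) if needs_as4 else None)


def reconstruct(as_path: List[int], as4_path: Optional[List[int]]) -> List[int]:
    """Rebuild the full path on a 4-byte-capable receiver.

    2-byte-only routers in between prepend to AS_PATH but pass AS4_PATH along
    unchanged, so the leading extra hops come from AS_PATH and the rest from AS4_PATH.
    """
    if as4_path is None or len(as4_path) > len(as_path):
        # An AS4_PATH longer than AS_PATH is inconsistent and is ignored.
        return list(as_path)
    extra = len(as_path) - len(as4_path)
    return list(as_path[:extra]) + list(as4_path)


if __name__ == "__main__":
    # 65536 is a 32-bit documentation ASN; 64496/64497/64499 are 16-bit documentation ASNs.
    sent_as_path, sent_as4_path = encode_for_old_peer([64496, 65536, 64497])
    print(sent_as_path, sent_as4_path)  # [64496, 23456, 64497] [64496, 65536, 64497]
    # A 2-byte-only router (AS64499) prepends itself to AS_PATH only.
    print(reconstruct([64499] + sent_as_path, sent_as4_path))  # [64499, 64496, 65536, 64497]
```

A router that mishandles either attribute on the way (for example by rejecting AS4_PATH instead of passing it on) is exactly the kind of "backwards compatible but problematic" behaviour the quote speculates about.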
Although it's not yet certain whether it's asn32 or something else, we should be careful not to cause other people's sessions to die. Full message follows:

----- Forwarded message from openpeering@openpeering.nl -----

Date: Thu, 04 Jan 2007 11:28:39 +0100
Organization: Open Peering
User-Agent: Thunderbird 1.5.0.7 (X11/20060916)
To: 'Open Peering NOC' <noc@openpeering.nl>
From: openpeering@openpeering.nl
Subject: [risops] [Open Peering] Routers hit by malformed BGP attribute lists

Dear Open Peering customers,

Yesterday, Wednesday January 3, 2007, around 17:36, several parties saw their BGP session(s) towards Open Peering go down or flap with a log message about bad/invalid/malformed BGP attribute lists.

This problem was caused by a provider in Ukraine (AS35731), which advertised a /30 route on the Internet sourced by AS65422 (an AS number from the IANA private range) with an attribute list that apparently was not handled well by several router brands/OS versions. The Open Peering router and all peer router hops on the path between this Ukrainian provider and Open Peering (4 ASes) apparently had no problem with the route, but some Open Peering customer routers responded to the route announcement by shutting down/resetting their BGP session with Open Peering.

Routers should probably not shut down an entire BGP session when they receive a route with an attribute list they do not understand or cannot handle: they should probably simply log and drop that specific route instead.

The Open Peering router itself did not have any problem with the route, so it did not log any data about the route either. The route is no longer advertised, so it is also not possible to debug it from the current routing table. It might very well be that this route was trying to encode the newly (January 1) introduced 32-bit AS numbers in a backwards-compatible (but apparently problematic) way.
In that case the problem might reoccur until everybody has upgraded their OS version to support 32-bit AS numbers. We are therefore currently relying on error messages, and especially hex dumps, in the logs of customer routers to find out what was 'special' about this route, and whether the route was actually invalid (not according to the official specifications) or officially valid but not handled correctly by some router brands/OS versions.

Apart from the attribute list, we know that the AS number this route was announced under was special: AS65422, which is part of the range 64512 through 65534 that IANA reserves for private AS numbers. Such numbers should not be visible on the Internet at all. Like many of our customers, we already use a 'remove-private-as' statement on all BGP sessions, but this does not filter out routes whose AS path contains both normal and private AS numbers, which was the case here. We are therefore currently filtering private AS numbers with an as-path access list like ^*6[45][0-9][0-9][0-9], and it would be advisable for customers to implement such a filter on all their BGP sessions as well.

Of course, a route with the same 'special' attribute list could in the future also be advertised from a normal/official source AS number, so filtering private AS numbers does not protect you from this problem in all cases. In this specific case, however, it would probably have prevented problems.

Furthermore, in this specific case the route was also more specific than /24. Customers could therefore decide to filter out all received routes that are more specific than /24. Again, this is not a solution that protects you in all cases, but it would have prevented problems in this specific case.
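The two filters suggested above can be modelled in a few lines of Python. This is a sketch with hypothetical function names, not router configuration: it rejects a route if any AS number anywhere in its path lies in the private range 64512 through 65534 quoted in the message, or if an IPv4 prefix is more specific than /24. One caveat on the as-path expression in the message: ignoring anchoring, its digit pattern 6[45][0-9][0-9][0-9] covers 64000 through 65999, so within the 16-bit AS space it also catches 64000 through 64511 and 65535, none of which are in the private range. The sketch therefore checks the numeric range instead. The path [35731, 65422] is a shortened stand-in for the real path, and 192.0.2.0/30 is a documentation prefix, not the actual route.

```python
import ipaddress
import re
from typing import Sequence

# Private-use 16-bit AS range as quoted in the message (64512 through 65534).
PRIVATE_ASN_MIN, PRIVATE_ASN_MAX = 64512, 65534

# Digit pattern from the message's as-path filter, applied here to one ASN at a time.
MESSAGE_PATTERN = re.compile(r"6[45][0-9][0-9][0-9]")


def has_private_asn(as_path: Sequence[int]) -> bool:
    """True if any hop, not just the origin, is a private AS number."""
    return any(PRIVATE_ASN_MIN <= asn <= PRIVATE_ASN_MAX for asn in as_path)


def more_specific_than(prefix: str, max_len: int = 24) -> bool:
    """True if an IPv4 prefix is longer (more specific) than max_len.

    The /24 rule is an IPv4 convention, so IPv6 prefixes are not checked here.
    """
    net = ipaddress.ip_network(prefix)
    return net.version == 4 and net.prefixlen > max_len


def accept_route(prefix: str, as_path: Sequence[int]) -> bool:
    """Hypothetical import policy combining both suggested filters."""
    return not has_private_asn(as_path) and not more_specific_than(prefix)


if __name__ == "__main__":
    print(accept_route("192.0.2.0/30", [35731, 65422]))  # False
    print(accept_route("192.0.2.0/24", [35731, 65422]))  # False (private ASN in path)
    print(accept_route("192.0.2.0/30", [35731]))         # False (more specific than /24)
    print(accept_route("192.0.2.0/24", [35731]))         # True
    # The digit pattern also matches ASNs outside the private range:
    sample = [64000, 64511, 64512, 65534, 65535]
    print([a for a in sample if MESSAGE_PATTERN.fullmatch(str(a))])  # all five
    print([a for a in sample if has_private_asn([a])])               # [64512, 65534]
```

As the message points out, either check would have stopped this particular route, but neither helps if the same attribute list shows up on a /24 or shorter prefix with a public origin AS.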
As many global transit carriers already filter out all routes more specific than /24, you probably did not receive the same problematic route from other transit providers in this case. This explains why other BGP sessions did not go down at the same time. Again, if the problem reoccurs in the future with an official AS number and a prefix of /24 or larger, the route would not be filtered out by other global transit carriers either.

It would be advisable to contact your BGP router supplier under your OS support contract to check:

- whether they have a 'known problem' with BGP attribute lists that causes BGP sessions to go down, and a fix for it;
- if you have detailed log info (a hex dump), whether they can explain what is going wrong with the route, why their OS cannot handle it, and whether there is a fix;
- whether they have an OS option to change this behaviour into simply logging and ignoring a 'problem' route instead of shutting down the BGP session;
- whether they have an OS upgrade available that supports 32-bit AS numbers;
- whether they have an OS option, or can implement one, to enforce filtering of routes with private AS numbers anywhere in the AS path.

We will of course continue to investigate the issue ourselves, and report if we have any other relevant information. Our current filtering will prevent the problem from reoccurring if the route is announced under the same conditions. But as we know neither the details of the route (the actual attribute list) nor why it causes problems on some customers' routers, it is currently impossible to filter it out in all cases. Therefore the problem might theoretically reoccur.

Regards,
--
Open Peering NOC
noc@openpeering.nl
Raamweg 17, 2596 HL Den Haag, Holland
+31 (0)70 363 16 61 (voice)
+31 (0)70 392 22 16 (fax)

_______________________________________________
OpenPeering announcement
Email: noc@openpeering.nl
Web: http://www.openpeering.nl/

----- End forwarded message -----
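The forwarded message's suggestion that a router should log and drop a route it cannot parse, rather than tear down the whole session, is roughly what was standardized much later as "treat-as-withdraw" in RFC 7606; for the errors modelled here, the base BGP-4 specification (RFC 4271) instead has the receiver send a NOTIFICATION and close the session. The Python sketch below contrasts the two reactions using the RFC 4271 path-attribute layout (flags, type code, then a 1-byte length, or a 2-byte length when the Extended Length flag 0x10 is set). The checks are deliberately minimal (truncation and duplicate attributes only), and the names SessionReset and handle_update are hypothetical.

```python
import logging
from typing import Dict, Optional

log = logging.getLogger("bgp-update-sketch")

EXTENDED_LENGTH = 0x10  # flag bit: attribute length is 2 bytes instead of 1 (RFC 4271)


class MalformedAttributes(Exception):
    """The path-attribute block of an UPDATE could not be parsed."""


class SessionReset(Exception):
    """Hypothetical stand-in for 'send NOTIFICATION and close the BGP session'."""


def parse_path_attributes(data: bytes) -> Dict[int, bytes]:
    """Split a raw path-attribute block into {type_code: value}."""
    attrs: Dict[int, bytes] = {}
    i = 0
    while i < len(data):
        if i + 3 > len(data):
            raise MalformedAttributes(f"truncated attribute header at offset {i}")
        flags, type_code = data[i], data[i + 1]
        if flags & EXTENDED_LENGTH:
            if i + 4 > len(data):
                raise MalformedAttributes(f"truncated extended length at offset {i}")
            length = int.from_bytes(data[i + 2:i + 4], "big")
            i += 4
        else:
            length = data[i + 2]
            i += 3
        if i + length > len(data):
            raise MalformedAttributes(f"attribute type {type_code} overruns the block")
        if type_code in attrs:
            raise MalformedAttributes(f"duplicate attribute type {type_code}")
        attrs[type_code] = data[i:i + length]
        i += length
    return attrs


def handle_update(prefix: str, data: bytes, reset_on_error: bool) -> Optional[Dict[int, bytes]]:
    """Return the parsed attributes, or None if the route was dropped."""
    try:
        return parse_path_attributes(data)
    except MalformedAttributes as err:
        if reset_on_error:
            # Behaviour the customers saw: one bad route takes the whole session down.
            raise SessionReset(f"NOTIFICATION sent, session closed: {err}") from err
        # Behaviour the message recommends: log, drop this route, keep the session.
        log.warning("dropping %s: %s", prefix, err)
        return None


if __name__ == "__main__":
    origin_igp = bytes([0x40, 0x01, 0x01, 0x00])  # ORIGIN = IGP, well-formed
    bad_length = bytes([0x40, 0x01, 0x05, 0x00])  # claims 5 value bytes, only 1 present
    print(handle_update("192.0.2.0/24", origin_igp, reset_on_error=False))  # {1: b'\x00'}
    print(handle_update("192.0.2.0/30", bad_length, reset_on_error=False))  # None
```

Dropping only the offending route keeps every other prefix learned over the session, at the cost of a possibly inconsistent view of that one prefix; that trade-off is why later standards limited treat-as-withdraw to specific, well-understood error cases.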
Erik Romijn