On Wed, 6 Apr 2005 17:52:30 +0200, Iljitsch van Beijnum wrote:
So what? Most of us aren't. Does your native language use asterisks and smileys for punctuation? If you don't like them, ignore them.
> Well, worst case a 10 Gbps router has 67.2 nanoseconds to forward a packet before the next one comes in. At these speeds every single memory cycle is relevant.
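(The 67.2 ns figure presumably assumes back-to-back minimum-size Ethernet frames: 64 bytes of frame plus 20 bytes of preamble and inter-frame gap is 672 bits, and 672 bits / 10 Gbit/s = 67.2 ns.)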
This is simply wrong. I know this for sure because we are designing such boxes. The key point is that packet forwarding is a process which can be (and in current designs is) perfectly pipelined and parallelized. What you do is, e.g.:

- Convert the table to a radix tree, or to microcode representing a radix tree.
- Wait until the packet header is complete.
- Extract the destination of packet 1, attach a cyclic local id. to the address and the packet buffer (or take the buffer id. as the local id.), and put both into the pipeline for checking the first tree level.
- In the meantime do the same for packets 2, 3, etc.
- With the first-level decision done, re-schedule the addresses of the packets for levels 2, 3, etc.
- At the end there is a decision for the packet referenced by the local id.: flush the packet buffer to the next hop and release it for new packets.

Typically one would implement an online compiler to translate the table into microcode and would have multiple "threads" running on the same engine. The response time is not critical, only the throughput, which can easily be increased by adding additional processing units. Another approach is using multiple network CPUs on a single chip. But if you like, you can also make a forwarding decision on a 256K table in one TCAM cycle; IDT delivers such devices and is happy to sell them.
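To make the radix-tree step a bit more concrete, here is a plain-software sketch (illustration only, not our microcode or the actual hardware pipeline): a binary trie doing IPv4 longest-prefix match. Each iteration of the lookup loop corresponds to the work one pipeline stage would do for one tree level, so several packets can be in flight at different levels at the same time.

#include <stdint.h>
#include <stdlib.h>

struct trie_node {
    struct trie_node *child[2];  /* next tree level, indexed by one address bit */
    int               next_hop;  /* -1 if no route terminates at this node */
};

static struct trie_node *node_new(void)
{
    struct trie_node *n = calloc(1, sizeof *n);   /* error handling omitted */
    n->next_hop = -1;
    return n;
}

/* Insert prefix/plen (host byte order) with its next-hop index into the trie. */
static void trie_insert(struct trie_node *root, uint32_t prefix, int plen, int next_hop)
{
    struct trie_node *n = root;
    for (int i = 0; i < plen; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = node_new();
        n = n->child[bit];
    }
    n->next_hop = next_hop;
}

/* Longest-prefix match: one tree level per step; in a pipelined engine each
 * step would be a separate stage, tagged with the packet's local id. */
static int trie_lookup(const struct trie_node *root, uint32_t dst)
{
    const struct trie_node *n = root;
    int best = -1;
    for (int i = 0; i < 32 && n; i++) {
        if (n->next_hop >= 0)
            best = n->next_hop;            /* remember the longest match so far */
        n = n->child[(dst >> (31 - i)) & 1];
    }
    if (n && n->next_hop >= 0)             /* /32 routes end exactly at depth 32 */
        best = n->next_hop;
    return best;                           /* -1 means: no route, drop or use default */
}

int main(void)
{
    struct trie_node *root = node_new();
    trie_insert(root, 0x0A000000UL,  8, 1);   /* 10.0.0.0/8  -> next hop 1 */
    trie_insert(root, 0x0A010000UL, 16, 2);   /* 10.1.0.0/16 -> next hop 2 */
    return trie_lookup(root, 0x0A010203UL) == 2 ? 0 : 1;  /* 10.1.2.3 matches the /16 */
}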
> However, I must admit that I was a bit careless with updating the table and forwarding; things will have to get pretty bad before updating will run into memory bandwidth problems. But if and when that happens, we're in a deep pile of brown stuff, as random-access read memory bandwidth for large memories hasn't been able to keep up with progress in other areas in the past, and it's unlikely that this will happen in the future.

I can give you numbers from a full-table, full-policy test of our implementation on a standard PC (1.5 GHz, nothing really fast): the full table (150K prefixes) is read and processed in less than 1 second if the other side can deliver it as fast as we read it.

> Can you please point me to the part in the BGP spec that explains where in BGP the SPF calculations are done?

It is a shortest inter-AS path calculation, which is applied if there is no policy which enforces other rules. The selection rules are outside the scope of RFC1771 (see para 9.3), but they are implemented by all vendors in a similar way: if no other policy enforces a different selection, the path lists provided are evaluated and the shortest list is taken. As this (hopefully, if there are no politics and thus no specific policies active) happens within the other ASes too, typically a shortest inter-AS path is determined. Please keep in mind that if local policies (e.g. localpref) are active, they are preferred; because of this and of the missing AS-internal knowledge, the route determination does not have the same quality as e.g. intra-area OSPF.
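As a rough illustration only (illustrative struct and function names, neither RFC text nor any vendor's actual code), the first two steps of that per-prefix selection boil down to a comparison like this:

#include <stddef.h>

struct bgp_route {
    unsigned local_pref;    /* local policy preference: higher wins      */
    size_t   as_path_len;   /* number of ASes in the AS_PATH: fewer wins */
    /* other attributes (MED, origin, IGP metric, router id, ...) omitted */
};

/* Return the preferred of two candidate routes for the same prefix. */
static const struct bgp_route *
better_route(const struct bgp_route *a, const struct bgp_route *b)
{
    if (a->local_pref != b->local_pref)              /* policy first ...       */
        return a->local_pref > b->local_pref ? a : b;
    if (a->as_path_len != b->as_path_len)            /* ... then shortest path */
        return a->as_path_len < b->as_path_len ? a : b;
    return a;   /* remaining tie-breakers not shown */
}

A real implementation applies the configured import policies first and then walks the full tie-breaker list; the point here is only that, absent policy, the AS-path length decides.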
> Yes, if you sell hardware. Not so much if you're the person who has to foot the bill. Obviously I'm not saying we should be able to run core networks on 15 year old hardware.

The 2501 is of the 10 to 15 years class ...
Take *any* PC from Ebay, put Zebra/Quagga on it and you will get *much* better routing performance, and all your problems with the "large" table are solved. A table with 100K to 200K entries is not large for current hardware. If we are talking about >> 1 million prefixes and PI for everyone, then we should consider an update of BGP.

Greetings
Oliver

Oliver Bartels F+E + Bartels System GmbH + 85435 Erding, Germany
oliver@bartels.de + http://www.bartels.de + Tel. +49-8122-9729-0