On Wed, 6 Apr 2005 17:52:30 +0200, Iljitsch van Beijnum wrote:
So what? Most of us aren't. Does your native language use asterisks and smileys for punctuation? If you don't like them, ignore them.
> Well, worst case a 10 Gbps router has 67.2 nanoseconds to forward a packet before the next one comes in. At these speeds every single memory cycle is relevant.
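(The 67.2 ns figure presumably assumes back-to-back minimum-size Ethernet frames: 64 bytes of frame plus 20 bytes of preamble and inter-frame gap is 672 bits, and 672 bits / 10 Gbit/s = 67.2 ns.)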
This is simply wrong. I know this for sure because we are designing such boxes. The key point is that packet forwarding is a process which can be (and in current designs is) perfectly pipelined and parallelized. What you do is, e.g.:

- Convert the table to a radix tree, or to microcode representing a radix tree.
- Wait until the packet header is complete.
- Extract the destination of packet 1, attach a cyclic local id. to the address and the packet buffer (or take the buffer id. as the local id.), and put both into the pipeline for checking the first tree level.
- In the meantime do the same for packets 2, 3, etc.
- With the first-level decision done, re-schedule the addresses of the packets for levels 2, 3, etc.
- At the end there is a decision for the packet referenced by the local id.: flush the packet buffer to the next hop and release it for new packets.

Typically one would implement an online compiler to translate the table into microcode and would have multiple "threads" running on the same engine. The response time is not critical, only the throughput, which can easily be increased by adding additional processing units. Another approach is using multiple network CPUs on a single chip. But if you like, you can also make a forwarding decision on a 256K table in one TCAM cycle; IDT delivers such devices and is happy to sell them.
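To make the radix-tree step a bit more concrete, here is a plain-software sketch (illustration only, not our microcode or the actual hardware pipeline): a binary trie doing IPv4 longest-prefix match. Each iteration of the lookup loop corresponds to the work one pipeline stage would do for one tree level, so several packets can be in flight at different levels at the same time.

#include <stdint.h>
#include <stdlib.h>

struct trie_node {
    struct trie_node *child[2];  /* next tree level, indexed by one address bit */
    int               next_hop;  /* -1 if no route terminates at this node */
};

static struct trie_node *node_new(void)
{
    struct trie_node *n = calloc(1, sizeof *n);   /* error handling omitted */
    n->next_hop = -1;
    return n;
}

/* Insert prefix/plen (host byte order) with its next-hop index into the trie. */
static void trie_insert(struct trie_node *root, uint32_t prefix, int plen, int next_hop)
{
    struct trie_node *n = root;
    for (int i = 0; i < plen; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = node_new();
        n = n->child[bit];
    }
    n->next_hop = next_hop;
}

/* Longest-prefix match: one tree level per step; in a pipelined engine each
 * step would be a separate stage, tagged with the packet's local id. */
static int trie_lookup(const struct trie_node *root, uint32_t dst)
{
    const struct trie_node *n = root;
    int best = -1;
    for (int i = 0; i < 32 && n; i++) {
        if (n->next_hop >= 0)
            best = n->next_hop;            /* remember the longest match so far */
        n = n->child[(dst >> (31 - i)) & 1];
    }
    if (n && n->next_hop >= 0)             /* /32 routes end exactly at depth 32 */
        best = n->next_hop;
    return best;                           /* -1 means: no route, drop or use default */
}

int main(void)
{
    struct trie_node *root = node_new();
    trie_insert(root, 0x0A000000UL,  8, 1);   /* 10.0.0.0/8  -> next hop 1 */
    trie_insert(root, 0x0A010000UL, 16, 2);   /* 10.1.0.0/16 -> next hop 2 */
    return trie_lookup(root, 0x0A010203UL) == 2 ? 0 : 1;  /* 10.1.2.3 matches the /16 */
}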
> However, I must admit that I was a bit careless with updating the table and forwarding; things will have to get pretty bad before updating will run into memory bandwidth problems. But if and when that happens, we're in a deep pile of brown stuff, as random-access read memory bandwidth for large memories hasn't been able to keep up with progress in other areas in the past, and it's unlikely that this will happen in the future.

I can give you numbers from a full-table, full-policy test of our implementation on a standard PC (1.5 GHz, nothing really fast): the full table (150K prefixes) is read and processed in less than 1 second if the other side can deliver it as fast as we read it.

> Can you please point me to the part in the BGP spec that explains where in BGP the SPF calculations are done?

It is a shortest inter-AS path calculation, which is applied if there is no policy which enforces other rules. The selection rules are outside the scope of RFC1771 (see para 9.3), but they are implemented by all vendors in a similar way: if no other policy enforces a different selection, the path lists provided are evaluated and the shortest list is taken. As this (hopefully, if there are no politics and thus no specific policies active) happens within the other ASes too, typically a shortest inter-AS path is determined. Please keep in mind that if local policies (e.g. localpref) are active, they are preferred; because of this and of the missing AS-internal knowledge, the route determination does not have the same quality as e.g. intra-area OSPF.
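As a rough illustration only (illustrative struct and function names, neither RFC text nor any vendor's actual code), the first two steps of that per-prefix selection boil down to a comparison like this:

#include <stddef.h>

struct bgp_route {
    unsigned local_pref;    /* local policy preference: higher wins      */
    size_t   as_path_len;   /* number of ASes in the AS_PATH: fewer wins */
    /* other attributes (MED, origin, IGP metric, router id, ...) omitted */
};

/* Return the preferred of two candidate routes for the same prefix. */
static const struct bgp_route *
better_route(const struct bgp_route *a, const struct bgp_route *b)
{
    if (a->local_pref != b->local_pref)              /* policy first ...       */
        return a->local_pref > b->local_pref ? a : b;
    if (a->as_path_len != b->as_path_len)            /* ... then shortest path */
        return a->as_path_len < b->as_path_len ? a : b;
    return a;   /* remaining tie-breakers not shown */
}

A real implementation applies the configured import policies first and then walks the full tie-breaker list; the point here is only that, absent policy, the AS-path length decides.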
> Yes, if you sell hardware. Not so much if you're the person who has to foot the bill. Obviously I'm not saying we should be able to run core networks on 15 year old hardware.

The 2501 is of the 10 to 15 years class ...
Take *any* PC from Ebay, put Zebra/Quagga on it and you will get *much* better routing performance, and all your problems with the "large" table are solved. A table with 100K to 200K entries is not large for current hardware. If we are talking about >> 1 million prefixes and PI for everyone, then we should consider an update of BGP.

Greetings
Oliver

Oliver Bartels F+E + Bartels System GmbH + 85435 Erding, Germany
oliver@bartels.de + http://www.bartels.de + Tel. +49-8122-9729-0