Oliver Bartels:
However, a memory access cycle can be a lot longer than a CPU clock cycle.
With 10000 prefixes put into a TCAM, you get a result per pipeline cycle. This means 100M packet lookups per second, which is sufficient for >10G. The TCAM does the complete prefix lookup in one cycle.
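A back-of-the-envelope check of the 100 Mpps claim (the link and frame-size figures below are standard Ethernet numbers, not taken from the post): the worst case for a lookup engine is a 10G link carrying minimum-size frames.

```python
# Worst-case packet rate on a 10GbE link: minimum-size Ethernet frames
# generate the most lookups per second.
LINK_BPS = 10e9            # 10 Gbit/s
MIN_FRAME_BYTES = 64       # minimum Ethernet frame
OVERHEAD_BYTES = 20        # preamble (8) + inter-frame gap (12)

wire_bits = (MIN_FRAME_BYTES + OVERHEAD_BYTES) * 8   # 672 bits per packet
worst_case_pps = LINK_BPS / wire_bits                # ~14.88 Mpps

print(f"worst-case 10G packet rate: {worst_case_pps / 1e6:.2f} Mpps")
print(f"100 Mpps covers {100e6 / worst_case_pps:.1f} such links")
```

So one lookup per pipeline cycle at 100 MHz indeed covers several fully loaded 10G links, consistent with the "sufficient for >10G" claim.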
I know. Do you know that the current Internet backbone is operating with parallel 10G links?
Its limit, set by price, is the table size (typically 64K to 256K entries).
So, large memory is costly. Do you also know that the access speed of memory (including, but not limited to, TCAM) degrades in proportion to the log or square root of the number of entries?
If it is combined with regular RAM (a per-cluster table), one can select among millions of prefixes *pipelined* within a few accesses, in a fully pipelined architecture in the >=10G and >=100Mpps range.
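The combined scheme described above can be sketched in software as a two-stage lookup (an illustration of the idea, not real hardware; all prefixes, cluster names, and interfaces below are made up, and matching on dotted-decimal string prefixes stands in for bit-level prefix matching). A small TCAM-like first stage matches a coarse prefix and returns a pointer to a per-cluster table in ordinary RAM, where the remaining bits select the next hop. In hardware the two stages run as pipeline stages, so one result completes per cycle.

```python
# Stage 1: coarse prefixes, longest first (a priority-ordered TCAM
# returns the first, i.e. most specific, matching entry).
TCAM_STAGE = [
    ("10.1.", "cluster_A"),
    ("10.",   "cluster_B"),
]

# Stage 2: per-cluster tables held in regular RAM.
RAM_TABLES = {
    "cluster_A": {"10.1.2": "if0", "10.1.3": "if1"},
    "cluster_B": {"10.9":   "if2"},
}

def lookup(addr):
    for prefix, cluster in TCAM_STAGE:      # stage 1: one TCAM "access"
        if addr.startswith(prefix):
            table = RAM_TABLES[cluster]     # stage 2: one RAM access
            # longest match within the cluster table
            best = max((p for p in table if addr.startswith(p)),
                       key=len, default=None)
            return table[best] if best is not None else None
    return None

print(lookup("10.1.2.7"))   # "if0" via cluster_A
print(lookup("10.9.0.1"))   # "if2" via cluster_B
```

The point of the split is capacity: the expensive TCAM only needs enough entries to name the clusters, while the millions of specific prefixes live in cheap RAM.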
I know. And it costs a lot.
On typical modern chips, tens of registers can be accessed within a CPU cycle. An on-chip primary cache with thousands of entries needs about two or three times longer than that. An off-chip cache needs about ten, twenty, or maybe a hundred times longer to access.
Modern routers no longer use traditional CPU/cache architectures. Either fast static RAM together with trie structures (e.g. a Patricia/radix tree) or TCAM is used, together with highly pipelined processors.
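The static-RAM-plus-trie alternative can be illustrated with a minimal binary trie for longest-prefix match (the radix/Patricia idea without path compression; prefixes and interface names below are invented for the example). Each lookup walks at most one node per prefix bit, and hardware implementations pipeline one trie level per stage.

```python
class TrieNode:
    __slots__ = ("child", "next_hop")
    def __init__(self):
        self.child = [None, None]   # 0-bit and 1-bit branches
        self.next_hop = None        # set if a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.child[i] is None:
            node.child[i] = TrieNode()
        node = node.child[i]
    node.next_hop = next_hop

def longest_prefix_match(root, addr_bits):
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop    # remember the deepest match so far
        node = node.child[int(b)]
        if node is None:
            break
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "if_a")      # prefix 10/2
insert(root, "1011", "if_b")    # prefix 1011/4, more specific

print(longest_prefix_match(root, "10110000"))  # "if_b"
print(longest_prefix_match(root, "10000000"))  # "if_a"
```

The trade-off versus TCAM: the trie needs several RAM accesses per lookup (one per level) instead of one, but plain static RAM scales to far more prefixes at far lower cost, which is exactly the tension this exchange is about.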
Both modern routers and modern CPUs are highly pipelined, which means there is some performance loss if a TCAM or primary cache miss occurs. The secondary or third-level caches of modern CPUs often have millions of entries and are constructed with static RAM.

Masataka Ohta