Oliver Bartels:
However, a memory access cycle can be a lot longer than a CPU clock cycle.
With 10000 prefixes put into a TCAM, you get a result per pipeline cycle. This means 100M packet lookups per second, which is sufficient for >10G. The TCAM does the complete prefix lookup in one cycle.
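A back-of-the-envelope check of the 100 Mpps claim (the link and frame-size figures below are standard Ethernet numbers, not taken from the post): the worst case for a lookup engine is a 10G link carrying minimum-size frames.

```python
# Worst-case packet rate on a 10GbE link: minimum-size Ethernet frames
# generate the most lookups per second.
LINK_BPS = 10e9            # 10 Gbit/s
MIN_FRAME_BYTES = 64       # minimum Ethernet frame
OVERHEAD_BYTES = 20        # preamble (8) + inter-frame gap (12)

wire_bits = (MIN_FRAME_BYTES + OVERHEAD_BYTES) * 8   # 672 bits per packet
worst_case_pps = LINK_BPS / wire_bits                # ~14.88 Mpps

print(f"worst-case 10G packet rate: {worst_case_pps / 1e6:.2f} Mpps")
print(f"100 Mpps covers {100e6 / worst_case_pps:.1f} such links")
```

So one lookup per pipeline cycle at 100 MHz indeed covers several fully loaded 10G links, consistent with the "sufficient for >10G" claim.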
I know. Do you know that the current Internet backbone is operating with parallel 10G links?
Its limit, set by price, is the table size (typically 64K to 256K entries).
So, large memory is costly. Do you also know that the access speed of memory (including, but not limited to, TCAM) degrades in proportion to the log or square root of the number of entries?
If it is combined with regular RAM (a per-cluster table), one can select among millions of prefixes *pipelined* within a few accesses, in a fully pipelined architecture in the >=10G and >=100Mpps range.
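The combined scheme described above can be sketched in software as a two-stage lookup (an illustration of the idea, not real hardware; all prefixes, cluster names, and interfaces below are made up, and matching on dotted-decimal string prefixes stands in for bit-level prefix matching). A small TCAM-like first stage matches a coarse prefix and returns a pointer to a per-cluster table in ordinary RAM, where the remaining bits select the next hop. In hardware the two stages run as pipeline stages, so one result completes per cycle.

```python
# Stage 1: coarse prefixes, longest first (a priority-ordered TCAM
# returns the first, i.e. most specific, matching entry).
TCAM_STAGE = [
    ("10.1.", "cluster_A"),
    ("10.",   "cluster_B"),
]

# Stage 2: per-cluster tables held in regular RAM.
RAM_TABLES = {
    "cluster_A": {"10.1.2": "if0", "10.1.3": "if1"},
    "cluster_B": {"10.9":   "if2"},
}

def lookup(addr):
    for prefix, cluster in TCAM_STAGE:      # stage 1: one TCAM "access"
        if addr.startswith(prefix):
            table = RAM_TABLES[cluster]     # stage 2: one RAM access
            # longest match within the cluster table
            best = max((p for p in table if addr.startswith(p)),
                       key=len, default=None)
            return table[best] if best is not None else None
    return None

print(lookup("10.1.2.7"))   # "if0" via cluster_A
print(lookup("10.9.0.1"))   # "if2" via cluster_B
```

The point of the split is capacity: the expensive TCAM only needs enough entries to name the clusters, while the millions of specific prefixes live in cheap RAM.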
I know. And it costs a lot.
On typical modern chips, tens of registers can be accessed within a CPU cycle. An on-chip primary cache with thousands of entries needs about two or three times longer than that. An off-chip cache needs about ten, twenty, or maybe a hundred times longer to access.
Modern routers no longer use traditional CPU/cache architectures. Either fast static RAM together with trie structures (e.g. a Patricia/radix tree) or TCAM is used, together with highly pipelined processors.
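The static-RAM-plus-trie alternative can be illustrated with a minimal binary trie for longest-prefix match (the radix/Patricia idea without path compression; prefixes and interface names below are invented for the example). Each lookup walks at most one node per prefix bit, and hardware implementations pipeline one trie level per stage.

```python
class TrieNode:
    __slots__ = ("child", "next_hop")
    def __init__(self):
        self.child = [None, None]   # 0-bit and 1-bit branches
        self.next_hop = None        # set if a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.child[i] is None:
            node.child[i] = TrieNode()
        node = node.child[i]
    node.next_hop = next_hop

def longest_prefix_match(root, addr_bits):
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop    # remember the deepest match so far
        node = node.child[int(b)]
        if node is None:
            break
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "if_a")      # prefix 10/2
insert(root, "1011", "if_b")    # prefix 1011/4, more specific

print(longest_prefix_match(root, "10110000"))  # "if_b"
print(longest_prefix_match(root, "10000000"))  # "if_a"
```

The trade-off versus TCAM: the trie needs several RAM accesses per lookup (one per level) instead of one, but plain static RAM scales to far more prefixes at far lower cost, which is exactly the tension this exchange is about.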
Both modern routers and modern CPUs are highly pipelined, which means there is some performance loss if a TCAM or primary cache miss occurs. The secondary or third-level caches of modern CPUs often have millions of entries and are constructed with static RAM.

Masataka Ohta