Re: [db-wg] NRTM replication inefficiencies

9 Nov 2017

On Thu, Nov 09, 2017 at 04:28:03PM +0100, Tim Bruijnzeels wrote:
...
...
On 7 Nov 2017, at 23:11, Job Snijders via db-wg <db-wg@ripe.net> wrote:
I would also welcome an investigation into alternative approaches, (some
not-via-WHOIS replication mechanisms), perhaps something over HTTPS can
be done? Either way, something more robust would be useful.
We recently developed and implemented a standard for something similar for RPKI:
https://datatracker.ietf.org/doc/rfc8182/
I believe this approach can be useful here as well. Without going into all the RPKI specifics, it works a little something like this:
Starting points:
= The state of the rpki repository (or whois) at a given point in time can be represented by a ‘snapshot’
    - This snapshot is “immutable” - therefore it may be cached indefinitely and we can give it a unique URL and deliver it through a distributed CDN
= The delta between two consecutive snapshots is also “immutable” data - so again we can cache it and give it a unique URL and distribute
= We can publish a notification file (which should NOT be cached) that points to:
   - the CURRENT snapshot
   - a list of deltas (each for 1 increment) - total size of deltas MUST not exceed size of snapshot
Clients can then just poll the notification file and work out for
themselves whether a list of deltas is available to them, or that they
need to get the latest snapshot instead.
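
A minimal sketch of that client-side decision, in Python. It assumes the notification file has already been fetched and parsed into the fields below; the names (Notification, Delta, plan_update) are hypothetical and only loosely modelled on RFC 8182, not taken from any existing whois or RPKI client.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Delta:
    serial: int  # the serial this delta brings the client up to
    uri: str     # immutable, cacheable URL of the delta file


@dataclass(frozen=True)
class Notification:
    session_id: str     # changes when the server starts a new session
    serial: int         # serial of the current snapshot
    snapshot_uri: str   # immutable, cacheable URL of the current snapshot
    deltas: List[Delta]


def plan_update(
    note: Notification,
    local_session: Optional[str],
    local_serial: Optional[int],
) -> Tuple[str, List[str]]:
    """Decide what to fetch: nothing, a chain of deltas, or the snapshot."""
    snapshot = ("snapshot", [note.snapshot_uri])

    # No local state, or the server started a new session: start over.
    if local_serial is None or local_session != note.session_id:
        return snapshot

    if local_serial == note.serial:
        return ("noop", [])

    # A local serial ahead of the server means our state cannot be trusted.
    if local_serial > note.serial:
        return snapshot

    # Deltas are usable only if every increment we are missing is listed.
    by_serial = {d.serial: d.uri for d in note.deltas}
    needed = range(local_serial + 1, note.serial + 1)
    if all(s in by_serial for s in needed):
        return ("deltas", [by_serial[s] for s in needed])

    return snapshot
```

All of the work happens on the client; the server only has to publish static files, which is what makes the files easy to hand to a CDN.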
Yes, we use a session_id and hashes of referenced files for additional
checks (details in the RFC).
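
The hash check could look roughly like the sketch below. It assumes the notification file lists a hex-encoded SHA-256 digest for each referenced file, as RFC 8182 does for RPKI; the function name verify_hash is hypothetical.

```python
import hashlib
import hmac


def verify_hash(body: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 of the fetched file matches the notification."""
    actual = hashlib.sha256(body).hexdigest()
    # Compare case-insensitively, since hex digests may be published in
    # either case; compare_digest avoids an early-exit string comparison.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A client would discard any snapshot or delta that fails this check and fall back to fetching the notification file again.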
The idea behind this design was that we wanted to minimise the impact
on the server. In a chatty protocol (like rsync which is still used in
RPKI) the server and client need to work out their differences
together to determine what needs to be transferred. This is fine in
one on one relations, but when a server needs to serve a multitude of
clients this doesn’t scale. We want to be able to parallelise as much as
we can (Amdahl’s law), so we push the computational burden to the
clients. The server just needs a one-off investment to create the
snapshot and delta and latest notification which it can then offload.
Using HTTPS allows us to leverage one of the many, many CDNs out
there. This problem has been solved in the industry. So we do not need
to invent our own infrastructure for this.
Note that in the case of RPKI the protocol is XML based. This made
sense because it leveraged existing definitions in the RPKI space that
were also XML based. For whois it may make more sense to look at JSON
and/or RDAP.
Please let me know if you see merit in this kind of ‘delta’ protocol
in the whois space.
yes, I think there may be merit to replacing NRTM, and DELTA would
certainly be a good source of inspiration.

Would it be fair to ask for a two-pronged approach? DELTA-WHOIS +
WHOIS-END-OF-BLURP markings?

How much work (or complexity?) is involved for RIPE NCC to develop a
marking that is sent to the client at the end of a '-g' query that also
had '-k' enabled?

Kind regards,

Job