On Thu, Feb 18, 2021 at 10:37 AM Randy Bush <randy@psg.com> wrote:
To refresh the stack, can you give me an instance please?
then you need to fix operational deployment.
That's work in progress. We were hoping to move forward on a process design to get there while we finish that deployment. Almost all children NOT in hosted are RRDP active. I would be very surprised if the majority use case now is not RRDP active.
then you can measure the net to be sure everybody is serving rrdp properly.
That sounds like a fine activity for somebody ELSE to do, to me.
see our imc 2020 paper
The data is from January-April 2020. I think it would be interesting to see how the landscape has changed by April 2021, for two reasons: the publishing side may well have changed, and the RP side has definitely changed in some ways. Not that it invalidates the IMC paper: far from it. The point would be to see whether it can help show that there has been a substantive change in the system overall. Do you think a re-measure is achievable as a low(ish) cost activity?
but we have had this discussion before.
Yeah, I know, but the problem is that we now need to boost resiliency against scale, and rsync is a really poor fit for the problem because most CDN choices are tuned for HTTP and not for arbitrary TCP protocols.
your emergency due to lack of planning and action does not motivate me
I think this is a poor characterisation of what should be done, and of what the cost/benefit issues are. Suffice to say we have plans, and we are acting. The "emergency", such as it is, is that during the planning and deployment, service levels are going to continue to be open to question. I have this work timed for Q3/4 in 2021 because I have a larger body of unrelated work in Q1/2.
The distribution of service into self-hosted raises concerns for me that no amount of work in the RIR will fix. We have been promoting "publish in parent" because it helps to reduce the points of connection, which are going to tend to be single points of failure for many self-hosted people until they also put their publication states into a resilient fabric.
We're improving our own resiliency all the time. I discussed some RTT outcomes today with Job: in RRDP he can see 300ms drop to 5ms from the CDN/DNS solution we use, which is a significant improvement in RTT and in load sharing. I cannot achieve that in the non-web protocol because nobody can offer a cache for the datastream in question. I can do better than a 300ms delay (which RobA frequently pointed out made APNIC look particularly egregious on the long-haul datapath, because rsync is an innately serialised read/write function) if I can get enough points of presence behind rsync, but then I get a coherency problem, which the CDN-for-web guys solved. It's just hard to fix this in rsync. You know this, and it's one of the reasons I wanted to promote deprecation.
It might help if publication-as-a-service were a thing, and we all decided to put the publication burden onto prime agents that we paid to do this under SLA. That has problems of its own in terms of governance; maybe it needs to be a market. But that's kind of how the DNS works. There's a label, it's served by different people, sometimes they administer the boxes directly, sometimes they use intermediaries, we measure their effectiveness against load, and it mostly works.
I wouldn't have a problem if there were a declared market price for running the publication protocol into AWS, Cloudflare, Fastly, or GCP: same protocol endpoint, and they do the rest once you write objects in. It might well be significantly more resilient than what we're trying to do now. As for hosting the TA function and the HSM-bound functions, I don't think we've hit significant stresses yet. RIPE are looking at dual-redundant signer models. There are cloud HSM services.
-G
randy