Proposal: Measure well-known CDNs [CDN-HTTP]
Dear RIPE Atlas users,

We recently published a RIPE Labs article containing a few proposals: https://labs.ripe.net/author/kistel/five-proposals-for-a-better-ripe-atlas/. We'd like to encourage you to express your comments about this proposal (if you'd like to share them) here.

Regards,
Robert Kisteleki, for the RIPE Atlas team
On Wed, Dec 14, 2022 at 03:02:46PM +0100, Robert Kisteleki <robert@ripe.net> wrote a message of 15 lines which said:
We recently published a RIPE Labs article containing a few proposals: https://labs.ripe.net/author/kistel/five-proposals-for-a-better-ripe-atlas/. We'd like to encourage you to express your comments about this proposal (if you'd like to share them) here.
(In the Cons) "Possible arguments about which provider to include in this set and which to refuse."

There is a larger problem here, a more strategic one: such a feature would contribute to the centralisation of the Internet, which is already too great. Tagging some targets as "important" and "worthy of measurements" would mean that we consider some HTTP servers to be more useful than others. That would be a bad message from RIPE.
On Dec 16, 2022, at 23:29, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:
There is a larger problem here, a more strategic one: such a feature would contribute to the centralisation of the Internet, which is already too great. Tagging some targets as "important" and "worthy of measurements" would mean that we consider some HTTP servers to be more useful than others. That would be a bad message from RIPE.
We’ve come full circle - we started with centralized PTTs - moved to a decentralized ASN/Paul Baran model - now re-centralized based on marketing domination.

+1 to Stephane’s observation. The selection of who to measure is a statement.
On 17/12/2022 00:19, Barry Raveendran Greene wrote:
On Dec 16, 2022, at 23:29, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:
There is a larger problem here, a more strategic one: such a feature would contribute to the centralisation of the Internet, which is already too great. Tagging some targets as "important" and "worthy of measurements" would mean that we consider some HTTP servers to be more useful than others. That would be a bad message from RIPE.
We’ve come full circle - we started with centralized PTTs - moved to a decentralized ASN/Paul Baran model - now re-centralized based on marketing domination.
+1 to Stephane’s observation. The selection of who to measure is a statement.
+1

Also, while the data would be useful, I don't think the role of the RIPE NCC is to grade commercial services. Let other companies or individual researchers do that.
On 17.12.22 03:00, Massimo Candela wrote:
On 17/12/2022 00:19, Barry Raveendran Greene wrote:
On Dec 16, 2022, at 23:29, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:
There is a larger problem here, a more strategic one: such a feature would contribute to the centralisation of the Internet, which is already too great. Tagging some targets as "important" and "worthy of measurements" would mean that we consider some HTTP servers to be more useful than others. That would be a bad message from RIPE.
We’ve come full circle - we started with centralized PTTs - moved to a decentralized ASN/Paul Baran model - now re-centralized based on marketing domination.
+1 to Stephane’s observation. The selection of who to measure is a statement.
+1
Also, while the data would be useful, I don't think the role of the RIPE NCC is to grade commercial services. Let other companies or individual researchers do that.
On the one hand I agree with Stephane, but on the other hand, it is a fact that a few large providers/CDNs hold a significant share of specific web services. I don't like that, but I can't deny it either.

I agree that manually selecting those providers/CDNs for such a measurement could be understood as a statement. But maybe there is another way to select which of those providers/CDNs to measure? Instead of picking them manually, there could be some sort of "threshold" which a provider/CDN has to reach to be part of this kind of measurement. This way, it wouldn't be a statement, but a static delimitation. I am unsure what kind of threshold it could be, and how to detect it. It should probably be some sort of technical value.

BR,
Simon
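For illustration, this is roughly what such a threshold rule would reduce to; the provider names, shares and cut-off below are invented, and choosing the underlying metric and the cut-off value is exactly the selection problem being discussed (Python sketch):

# Hypothetical data: share of web requests served, in percent.
TRAFFIC_SHARE = {
    "cdn-a.example": 18.0,
    "cdn-b.example": 11.5,
    "cdn-c.example": 2.3,
}

THRESHOLD = 5.0   # the arbitrary cut-off under discussion

# Providers above the threshold would be eligible for the measurement set.
selected = sorted(name for name, share in TRAFFIC_SHARE.items() if share >= THRESHOLD)
print(selected)   # ['cdn-a.example', 'cdn-b.example']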
Hello,

On Tue, 20 Dec 2022 at 14:38, <ripe.net@toppas.net> wrote:
But maybe there's another way, to select which one of those providers/CDNs to measure? Instead of manually selecting them, there could be some sort of "threshold" which a provider/CDN has to reach
One layer of indirection doesn't solve that. You are just selecting CDNs indirectly then, by picking an arbitrary "threshold" that works well for group A and doesn't for group B. I also really can't come up with a single publicly verifiable value that would work well for this.

I doubt that the availability of favicon.ico on a certain CDN is a useful health check; lots of things can go wrong at a CDN which do not impair favicon.ico delivery. I think the entire idea of CDN-HTTP is flawed and I would much rather see RIPE invest its resources into improving Atlas generally. Atlas is not downdetector.com or Cloudflare Radar, and it isn't a tool for end-users. I don't see a lot of value in CDN-HTTP, but lots of problems.

I'm suggesting dropping CDN-HTTP altogether and focusing the effort on GENERIC-HTTP instead, which can be very useful for everyone. However, it needs some discussion:

- Where have those security concerns been previously discussed? The arguments sound a little hand-wavy to me. Is this really that big of a deal that we need an opt-in approach? Are you suggesting that people deploy Atlas probes in security-sensitive internal parts of corporate networks? And do those security concerns affect only GENERIC-HTTP and not other currently available measurements like DNS? NLNOG RING provides *root access* to Ubuntu VMs in a large number of organizations across the globe; we are talking about small HTTP HEAD and GET requests here. I agree that with an opt-in approach this feature is likely useless, for questionable security concerns.

- I think HTTPS would be very useful. Considering that we are already talking about STARTTLS (which seems more complicated to me), is HTTPS support for GENERIC-HTTP really that long down the road? I'm talking about HTTP/1.1 over TLS.

- Data retention policies can always be modified at any point in time based on actual data; I don't think this is a big deal.

--
cheers, lukas
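For reference, an HTTP/1.1 HEAD request over TLS of the kind described above is conceptually small; here is a minimal Python sketch, where the target host, path and User-Agent string are placeholders and nothing here reflects the actual Atlas probe implementation:

import http.client

def head_over_tls(host, path="/", timeout=5.0):
    # One HTTP/1.1 HEAD request over TLS; record only the status and headers.
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path, headers={"User-Agent": "atlas-sketch/0.1"})  # placeholder UA
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders())
    finally:
        conn.close()

if __name__ == "__main__":
    status, headers = head_over_tls("www.example.org")   # placeholder target
    print(status, headers.get("Server"))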
On Tue, Dec 20, 2022 at 05:48:08PM +0100, Lukas Tribus <lukas@ltri.eu> wrote a message of 60 lines which said:
- where have those security concerns been previously discussed?
Several times on this list. This is a recurring discussion, for many years.
Are you suggesting that people deploy ATLAS probes in security sensitive inside parts of corporate networks?
I believe that the concerns were more about the security of the server than the security of the probe. Nobody wants Atlas to be used as a botnet against unsuspecting HTTP servers.
And those security concerns affect only GENERIC-HTTP not other currently available measurements like DNS?
For a typical DNS server, the "cost" does not depend on the request (at least for authoritative DNS servers). For HTTP, by contrast, the cost can vary immensely, from a static favicon.ico to a request involving many SQL statements.
we are talking about small HTTP HEAD and GET requests here.
The GET can be small but incur a huge cost for the server.
Hello,

On Thu, 29 Dec 2022 at 15:15, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:
On Tue, Dec 20, 2022 at 05:48:08PM +0100, Lukas Tribus <lukas@ltri.eu> wrote a message of 60 lines which said:
- where have those security concerns been previously discussed?
Several times on this list. This is a recurring discussion, for many years.
Are you suggesting that people deploy ATLAS probes in security sensitive inside parts of corporate networks?
I believe that the concerns were more about the security of the server than the security of the probe. Nobody wants Atlas to be used as a botnet against unsuspecting HTTP servers.
I was specifically addressing the decision to make this an "opt-in" option for the probe owner. I fully agree with the proposal to limit the request to HEAD and GET with a limited response size.

Given the proposed limitations, the fact that measurements are public, and that Atlas credits have a non-zero cost, I don't have huge concerns about denial of service or load on the measurement destination (HTTP) servers. However, this is orthogonal to the concerns about load/traffic on the probe itself, which is why I believe the proposal includes an opt-in option per probe.
And those security concerns affect only GENERIC-HTTP not other currently available measurements like DNS?
For a typical DNS server, the "cost" does not depend on the request (at least for authoritative DNS servers). For HTTP, by contrast, the cost can vary immensely, from a static favicon.ico to a request involving many SQL statements.
Additional limitations could be global, destination-based rate limiting, or robots.txt parsing (if the User-Agent/request URI is disallowed by robots.txt, stop there), although I believe that with the proposed limitations botnets are already an order of magnitude more efficient for an attacker. However, this is unrelated to whether or not HTTP measurements are opt-in for probe owners, which is what my entire point was based on.

thanks,
lukas
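A minimal sketch of the two safeguards mentioned above, robots.txt parsing and a capped response size, assuming plain Python; the User-Agent string, byte limit and target URL are placeholders rather than actual Atlas parameters, and error handling is omitted:

import urllib.parse
import urllib.request
from urllib import robotparser

USER_AGENT = "atlas-sketch/0.1"   # placeholder, not a real Atlas identifier
MAX_BYTES = 4096                  # arbitrary response-size cap for the example

def allowed_by_robots(url):
    # Stop before measuring if robots.txt disallows this User-Agent/URI.
    parts = urllib.parse.urlsplit(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(USER_AGENT, url)

def capped_get(url):
    # Plain GET that never reads more than MAX_BYTES of the body.
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status, resp.read(MAX_BYTES)

if __name__ == "__main__":
    target = "https://www.example.org/"      # placeholder target
    if allowed_by_robots(target):
        status, body = capped_get(target)
        print(status, len(body))
    else:
        print("disallowed by robots.txt; skipping")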
On Dec 20, 2022, at 8:38 AM, ripe.net@toppas.net wrote:
On the one hand I agree with Stephane, but on the other hand, it is a fact that a few large providers/CDNs hold a significant share of specific web services. I don't like that, but I can't deny it either. I agree that manually selecting those providers/CDNs for such a measurement could be understood as a statement. But maybe there is another way to select which of those providers/CDNs to measure? Instead of picking them manually, there could be some sort of "threshold" which a provider/CDN has to reach to be part of this kind of measurement. This way, it wouldn't be a statement, but a static delimitation. I am unsure what kind of threshold it could be, and how to detect it. It should probably be some sort of technical value.
I’m also thinking of those of us who have multiple CDN planes, etc., and the fact that if we give out a test-point, those customers tend to get billed for the usage. There’s also a lot of regionality to content: it’s unlikely you would see the same performance in a far-flung geolocation to a primarily US property, or vice versa.

Even if you were to measure $employer.com, that doesn’t mean that content is hosted on all our servers; they have various functions and roles on our side, so you end up with all the measurement bias that would occur from there. I know that things are similar at other CDNs.

- Jared
Hello!

Disclaimer: I work for the Wikimedia Foundation.

I see some of the pushback on this proposal is about running measurements towards commercial services. One middle ground here might be to run such measurements towards not-for-profit entities that run their own CDN. Obviously I'm thinking of Wikipedia (and we would be happy to support such a goal), but there might be others as well (e.g. the Internet Archive).

Thanks

On Tue, Dec 20, 2022 at 6:00 PM Jared Mauch <jared@puck.nether.net> wrote:
On Dec 20, 2022, at 8:38 AM, ripe.net@toppas.net wrote:
On the one hand I agree with Stephane, but on the other hand, it is a fact that a few large providers/CDNs hold a significant share of specific web services. I don't like that, but I can't deny it either. I agree that manually selecting those providers/CDNs for such a measurement could be understood as a statement. But maybe there is another way to select which of those providers/CDNs to measure? Instead of picking them manually, there could be some sort of "threshold" which a provider/CDN has to reach to be part of this kind of measurement. This way, it wouldn't be a statement, but a static delimitation. I am unsure what kind of threshold it could be, and how to detect it. It should probably be some sort of technical value.
I’m also thinking of those of us who have multiple CDN planes, etc., and the fact that if we give out a test-point, those customers tend to get billed for the usage. There’s also a lot of regionality to content: it’s unlikely you would see the same performance in a far-flung geolocation to a primarily US property, or vice versa.
Even if you were to measure $employer.com, that doesn’t mean that content is hosted on all our servers; they have various functions and roles on our side, so you end up with all the measurement bias that would occur from there. I know that things are similar at other CDNs.
- Jared
-- Arzhel
participants (8)
- Arzhel Younsi
- Barry Raveendran Greene
- Jared Mauch
- Lukas Tribus
- Massimo Candela
- ripe.net@toppas.net
- Robert Kisteleki
- Stephane Bortzmeyer