Hi Cynthia
I don't take criticism personally. I am known for my wild ideas...occasionally I come up with a good one :) But let's take another look at this geofeed.
Any service offered by or through the RIPE Database should be a high quality and reliable service. It doesn't matter if it is an essential service or not. The reputation of the RIPE Database itself rests on the quality of the services derived from it.
You say most consumers of this geofeed data will be software capable of validating the csv file. What will this software do when it finds invalid data? Just ignore it? Will this software know who to report data errors to? Will it have any means to follow up on reported errors? Will anyone notice if the quality of data deteriorates over time? What will be done with data that is considered to be inaccurate (by mistake or deliberate intention)? Will it be reported? Will there be any follow up...or will it just be discarded?
Services like geofeed are good ideas. But if the data quality or accessibility deteriorates over time it becomes anything from useless to misleading. That is why I believe centralised validating, testing and reporting are helpful. I think the RIRs are well positioned for doing these tasks and should do more of them.
Abuse contacts and geofeed are different things...but they are both secondary services facilitated by the RIPE Database and should be trusted.
cheers
denis
co-chair DB-WG
On Wed, 7 Apr 2021 at 08:20, Cynthia Revström <me@cynthia.re> wrote:
Hi Denis,
Apologies if this email comes off as rude or harsh, that isn't my intention, but I am not quite sure how else to phrase it.
So when I say things like this, it's not because of cost or anything like that. I say it because I don't think validating the CSV is something that would be a benefit.
Abuse contacts are validated and required (except for some legacy resources iirc) because they are important in order to report abuse.
Having a geofeed service is not a requirement, and additionally I would think that the data consumer would almost always be software dedicated to this. If so, then that software can easily validate the CSV data itself.
I see abuse contacts and geofeed as very different things considering those 2 things.
-Cynthia
On Wed, Apr 7, 2021, 00:45 denis walker <ripedenis@gmail.com> wrote:
Hi guys
I've changed the subject as it goes a bit off topic and becomes more general and reaches out beyond just the DB-WG. I've been going to say this for a while but never got round to it until now. Apologies for saying it in response to your email Job but it's not directed at you.
There are two phrases that frustrate me every time I see them used:
"The RIPE NCC is not the 'xyz' police"
"It's not the job of the RIPE NCC to do 'abc'"
These are just dramatised ways of saying no to something. But the drama doesn't really add anything. No one is expecting the RIPE NCC to investigate any crimes or arrest anyone. They are not the 'geoip police', the 'internet police', the 'abuse police'. So what are they? I think everyone would agree that what the RIPE NCC does today is not the same as what they did when they first started in business. So the job that they do has changed. Their role or mandate has grown, expanded, contracted, moved sideways, diversified, etc. Every time they started to do something different or new, it could have been said (and maybe was said) that it was not their job to do that. But they are doing it now anyway. So I would rather turn these infamous statements round and be positive instead of negative. Let's stop saying what it's not their job to do and ask whether it is, or should or could be, their job to do something helpful or beneficial.
The internet technical infrastructure is like a whole ecosystem now. Lots of different elements all working together and managed or controlled by large numbers of organisations. If anyone wants to have a good life in this cyber world, all parts of this ecosystem need to be operating well. Many of these elements have no checks or monitoring. They run on trust. Trust is hard to build and easy to lose. Once people lose trust in one element they start to call it a swamp, say it's inaccurate, useless, needs to be replaced. These comments have often been made about the RIPE Database as a whole, often by people partly responsible for its content. It's also been said about parts of the content like abuse contacts. It could end up being said about geofeed data.
One of the reasons people use to justify these infamous statements is the cost or complexity of doing something. They think doing checks needs FTEs sitting behind desks doing laborious tasks. That costs money for the members. They forget this is the 21st century. We have learned now how to use computers to do these tasks for us. Abuse contact checking is a good example. Every proposal to do anything in this area is repeatedly hit with these infamous statements and more. Perhaps that is because the technical checks now being done are done the wrong way. If an email address fails the checks it triggers manual intervention requiring an FTE to schedule an ARC with the resource holder and follow up discussions. This should be fully automated. If a monthly check fails, software should send an email to the registered contact for the resource holder. If n monthly checks fail the ORGANISATION object in the RIPE Database should be tagged as having an invalid abuse contact. That information should be available for anyone to see. Public disclosure can be the penalty for failing to handle abuse. People can then make informed decisions.
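The escalation described above (email the contact after a failed monthly check, tag the object after n failures) could be sketched roughly as follows. This is only an illustration: the function name, the action labels, and the choice to count consecutive failures (rather than any n failures) are assumptions, not anything the RIPE NCC has implemented.

```python
def abuse_contact_action(monthly_results: list[bool], n: int) -> str:
    """Decide what to do after the latest monthly abuse-contact check.

    monthly_results is ordered oldest first; True means the check passed.
    The returned labels are hypothetical names used only in this sketch.
    """
    # Count how many of the most recent checks failed in a row.
    failures = 0
    for passed in reversed(monthly_results):
        if passed:
            break
        failures += 1

    if failures == 0:
        return "ok"             # latest check passed, nothing to do
    if failures >= n:
        return "tag-invalid"    # publicly tag the ORGANISATION object
    return "email-contact"      # nudge the registered contact by email
```

With n set to 3, a single failed month would only produce an email, and the public tag would appear after the third failure in a row.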
How does this affect geofeed? The same principles apply here. What we have now is a handful of companies providing geolocation data. I am sure they put a lot of effort into ensuring their data is accurate. This geofeed attribute will delegate this information process out to thousands of organisations. Some of these will put a lot of effort into ensuring their data is valid and accurate. Some may put less effort in, especially over time. If a proportion of this data starts to degrade over time, or is shown to be inaccurate or syntactically invalid, trust in the whole system dies. If checks and tests can be done to validate the data in any way it may help to keep it up to date and accurate. If each RIR maintains a list of geofeed URLs in a file on the FTP site, each RIR can check availability of those URLs each month for all the RIRs' lists. I don't know if checks from 5 locations are enough. Maybe a third party system can be used for the 'is it up' check? Any repeated failures can be notified to the resource holders' contact. If each RIR downloads the files for their region they can check the syntax, check for conflicting data in multiple files within a hierarchy, etc. Any failures can be reported to the contact. All of this can be automated. If any repeated errors are not fixed the geofeed data in the RIPE Database can again be tagged as invalid or suspect. When anyone accesses this data it comes with a red flag. It is up to them if they will trust any of that data file.
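To give an idea of how small an automated syntax check can be, here is a minimal sketch in Python. It assumes the geofeed file follows the RFC 8805 layout (prefix, country code, region, city, postal code, with '#' comment lines). The function name and the wording of the messages are made up for this example, and it only covers the prefix, the country code and repeated prefixes within one file, not conflicts across a hierarchy of files.

```python
import csv
import ipaddress


def check_geofeed(text: str) -> list[str]:
    """Return a list of human-readable problems found in geofeed CSV text."""
    problems = []
    seen = {}  # prefix -> line number where it first appeared
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no data
        fields = next(csv.reader([line]))

        # First field must be a valid IPv4 or IPv6 prefix with no host bits set.
        try:
            prefix = ipaddress.ip_network(fields[0].strip(), strict=True)
        except ValueError:
            problems.append(f"line {lineno}: invalid prefix {fields[0]!r}")
            continue

        # Second field, if present and non-empty, should be a two-letter code.
        country = fields[1].strip() if len(fields) > 1 else ""
        if country and not (len(country) == 2 and country.isascii() and country.isalpha()):
            problems.append(f"line {lineno}: invalid country code {country!r}")

        # The same prefix listed twice in one file is treated as a conflict.
        if prefix in seen:
            problems.append(
                f"line {lineno}: prefix {prefix} already listed on line {seen[prefix]}"
            )
        else:
            seen[prefix] = lineno
    return problems
```

The returned list could be mailed to the resource holder's contact as is, and an empty list means nothing was found by these particular checks.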
For both abuse contacts and geofeed, a system can be set up for (trusted) users to report problems. Maybe abuse contacts that are valid but never resolve any reported issues. Or geofeed data that is known to be inaccurate. By adding appropriate tags to the meta data in the RIPE Database which can be publicly viewed this becomes a reputational system. Overall it would improve the quality of data available in or through the RIPE Database, which improves the value of the services. There may be other elements in the database that could benefit from this type of tagging and reporting.
I see the RIPE NCC as being in a good position to do these types of checks and tests. It would not be the RIPE Database software doing the checks, but an additional RIPE NCC service. Minimal costs with fully automated checks can give added benefits. I think it is their job to do this for the good of the internet.
cheers
denis
co-chair DB-WG
On Tue, 6 Apr 2021 at 19:50, Job Snijders <job@sobornost.net> wrote:
Thanks for the extensive note Denis, thanks Cynthia for being first-responder. I wanted to jump in on a specific subthread.
On Tue, Apr 06, 2021 at 06:38:29PM +0200, Cynthia Revström via db-wg wrote:
Questions:
-Should the database software do any checks on the existence/reachability of the url as part of the update with an error if the check fails?
I would say yes as this is not a new concept to the DB as I believe this is already done with domain objects.
I disagree on this one point: what is the RIPE DB supposed to do when it discovers one state or another? Should the URIs be probed from many vantage points to compare? Once you try to monitor if something is up or down it can quickly become complicated.
The 'geofeed:' attribute value references something outside the RIPE DB; this means the RIPE DB software should not be crawling it.
All RIPE NCC's DB software needs to check is whether the string's syntax conforms to the HTTPS URI scheme.
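A syntax-only check of this kind needs no network access. As a minimal sketch (the function name is hypothetical, and this is not the actual RIPE Database code, which may apply stricter rules), it could look like this in Python:

```python
from urllib.parse import urlsplit


def looks_like_https_url(value: str) -> bool:
    """Return True if value parses as an https URL with a host part.

    Purely syntactic: nothing is fetched and no DNS lookup is made.
    """
    value = value.strip()
    if not value or any(ch.isspace() for ch in value):
        return False  # empty, or whitespace inside the string
    try:
        parts = urlsplit(value)
    except ValueError:
        return False  # e.g. malformed IPv6 literal in the host part
    return parts.scheme == "https" and bool(parts.hostname)
```

Anything beyond this, such as whether the URL resolves or returns a file, is deliberately left out of the check.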
-Should the RIPE NCC do any periodic repeat checks on the continued existence/reachability of the url?
I would say that checking once a month or so could be fine, as long as it just results in a nudge email. Like don't enforce it, but nudge people if it is down.
It seems an unnecessary burden for RIPE NCC's business to check whether a given website is up or down. What is such nudging supposed to accomplish? It might end up being busy work if done by an individual RIR.
-Should the RIPE NCC do any periodic checks on the content structure of the csv file referenced by the url?
I don't have a strong opinion either way here but I feel like that is not really something the NCC is responsible for checking. But if the NCC should check then my comments about the repeat reachability checks above apply here too.
The RIPE NCC should not check random URIs, they are not the GeoIP police ;-)
Kind regards,
Job