The Geofeed: field is a URL. It points to a resource. The semantic content of the resource should not be checked; what matters is that the URL is not a 404 at the time of publication.

If you want to check it isn't a 404 after that, it's like lame delegation checks: good to do, but not strictly essential to the role of Whois/RPSL. If you want to check the semantic intent of the .csv geo data, that's not db-wg.

This work is important. It substantially improves the steerage to find the delegate's assertions about geo for their INR. That is sufficiently high value in itself that it is worth doing. Checking the integrity of what they say goes beyond the role of a steerage/directory function. (my opinion)

cheers

-G

On Thu, Apr 8, 2021 at 8:19 AM Cynthia Revström via db-wg <db-wg@ripe.net> wrote:
Hi,
I just wanted to clarify my stance on validation a bit more.
I am totally against trying to validate the data itself; that is not what the NCC is supposed to do. Validating the format of the CSV might be okay, but honestly anything beyond checking that the URL does not return a 404 Not Found is a bit too much in my opinion.
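For illustration, a check of that kind could be as small as this (a rough sketch in Python, assuming the third-party "requests" library; the URL is made up):

    import requests

    def geofeed_url_resolves(url):
        # Return True if the URL answers with anything other than 404.
        try:
            response = requests.head(url, allow_redirects=True, timeout=10)
            return response.status_code != 404
        except requests.RequestException:
            # Timeouts, DNS failures, TLS errors etc. count as unresolved.
            return False

    print(geofeed_url_resolves("https://example.net/geofeed.csv"))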
I also agree with Leo's points with regard to fixing the data; I believe the data publishers have a pretty strong incentive to keep the data accurate. And as Leo also mentions, the tech-c and/or admin-c contacts are published as well, so finding a reporting mechanism for issues would not be very difficult.
And with regard to misformatted data: if I were writing a parser, I would probably just ignore that entry, log the error, and report it to an engineer who can then forward it to the admin contact if they determine it to be a real issue.
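To sketch what I mean (assuming the RFC 8805 CSV layout of prefix,country,region,city,postal; the feed lines below are invented):

    import csv
    import ipaddress
    import logging

    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger("geofeed")

    def parse_geofeed(lines):
        # Yield (network, country, region, city), skipping malformed rows.
        for lineno, row in enumerate(csv.reader(lines), start=1):
            if not row or row[0].startswith("#"):
                continue  # blank lines and comments are allowed
            try:
                network = ipaddress.ip_network(row[0].strip())
            except ValueError as err:
                log.warning("line %d ignored: %s", lineno, err)
                continue
            country, region, city = (row + ["", "", ""])[1:4]
            yield network, country.strip(), region.strip(), city.strip()

    feed = [
        "192.0.2.0/24,SE,SE-AB,Stockholm,",
        "not-a-prefix,SE,,,",  # malformed: logged and skipped
    ]
    for entry in parse_geofeed(feed):
        print(entry)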
In order not to delay this indefinitely: while it shouldn't be rushed, I am not sure how realistic this issue is in practice or how much harm it would actually cause anyone.
Also, how much validation is done could be changed in the future if this is shown to be an actual real-world problem.
-Cynthia
On Wed, Apr 7, 2021 at 10:58 PM Leo Vegoda <leo@vegoda.org> wrote:
Hi Denis,
This message is in response to several messages in the discussion.
In brief: I have seen network operators distraught because their network was misclassified as being in the wrong geography for the services their customers needed to access and they had no way to fix that situation. I feel that publishing geofeed data in the RIPE Database would be a good thing to do as it helps network operators share data in a structured way and should reduce the overall amount of pain from misclassified networks.
I personally would like to see an agreement on your draft problem statement and some feedback from the RIPE NCC before focusing on some of the more detailed questions you raised.
I also agree with you that accurate and reliable data is important. But...
On Wed, Apr 7, 2021 at 7:19 AM denis walker via db-wg <db-wg@ripe.net> wrote:
[...]
You say most consumers of this geofeed data will be software capable of validating the csv file. What will this software do when it finds invalid data? Just ignore it? Will this software know who to report data errors to? Will it have any means to follow up on reported errors?
I would have thought that anyone implementing a parser for this data would also be able to query the database for a tech-c and report validation failures. Based on my previous interactions with the network operators who have suffered misclassification, I am confident that there is a strong incentive for networks to publish well formatted accurate data and to fix any errors quickly.
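For example, a lookup along these lines would do it (a sketch against the RIPE Database REST API as I understand it; check the current API documentation before relying on the exact endpoint and JSON layout):

    import requests

    def tech_c_for(prefix):
        # Return the tech-c NIC handles on inetnum objects covering the prefix.
        response = requests.get(
            "https://rest.db.ripe.net/search",
            params={"query-string": prefix, "type-filter": "inetnum"},
            headers={"Accept": "application/json"},
            timeout=10,
        )
        response.raise_for_status()
        handles = []
        for obj in response.json()["objects"]["object"]:
            for attr in obj["attributes"]["attribute"]:
                if attr["name"] == "tech-c":
                    handles.append(attr["value"])
        return handles

With a handle in hand, the parser operator can look up the contact and report the validation failure.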
That said, there are many possible ways to reduce the risk of badly formatted data. For instance, the RIPE NCC could offer a tool to create the relevant files to be published through the LIR Portal or as a standalone tool. This is why I'd like to see feedback from the RIPE NCC ahead of an implementation discussion.
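As a purely illustrative example of what such a tool might emit, the RFC 8805 geofeed CSV layout is simply prefix,country,region,city,postal (the prefixes and locations below are invented):

    import csv
    import sys

    assignments = [
        ("192.0.2.0/24", "NL", "NL-NH", "Amsterdam"),
        ("198.51.100.0/25", "SE", "SE-AB", "Stockholm"),
    ]

    writer = csv.writer(sys.stdout)
    for prefix, country, region, city in assignments:
        writer.writerow([prefix, country, region, city, ""])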
Services like geofeed are good ideas. But if the data quality or accessibility deteriorates over time, it becomes anything from useless to actively misleading. That is why I believe centralised validation, testing and reporting are helpful. I think the RIRs are well positioned to do these tasks and should do more of them.
I agree with you that defining what data means and keeping it accurate is important. But in the case of geo data, could the RIPE NCC validate the content as well as the data structures? I'd have thought that the publishers and the users of the data would be in the best position to do that. Am I wrong?
Kind regards,
Leo