Hi Denis, I have CCed Randy Bush as I thought he might be able to clarify what was meant by the following:
To minimize the load on RIR whois [RFC3912] services, use of the RIR's FTP [RFC0959] services SHOULD be the preferred access. This also provides bulk access instead of fetching with a tweezers.
I think one of the most important things here is working out what is within the scope of the db-wg to decide and what should probably be defined by the IETF spec, and also trying to get some kind of implementation out as quickly as possible while still doing it properly. As you mention, the remarks format appears to be used quite a bit, and I imagine its use is growing at a decent rate. (I have no data to back this up though, it is just a guess.) I have responded to some of the questions below.
Problem statement
Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.
The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries.
The IETF is currently discussing an update to RPSL to add a new attribute "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via the "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
Solution definition
Implement a new "geofeed:" attribute according to the IETF's definition. Although the IETF has not yet concluded discussions on this attribute we can still implement it in the RIPE Database RPSL data definition. The RIPE Database already has many local differences to the RPSL standard. As expressed in the Problem statement, users are already using the geofeed data by overloading the "remarks:" attribute. That is a dirty hack which should be avoided.
An invalidly formatted url will be a syntax error.
The RIPE NCC will perform a one time conversion of the existing data to convert "remarks: geofeed: url" to "geofeed: url".
I think this might need more careful consideration as all software consuming this data might not support this instantly as it is still a draft. Converting while still keeping both is totally fine with me. (as in adding the geofeed: url based on the remarks: geofeed: url)
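As a rough illustration of "converting while still keeping both", here is a minimal Python sketch. It assumes the object is handled as plain RPSL text and that the existing convention looks like "remarks: geofeed: url"; the function name add_geofeed_attribute and the exact pattern are hypothetical, not anything from the RIPE Database software.

```python
import re

# Assumed shape of the existing convention: "remarks: geofeed: <url>"
# (the colon after "geofeed" is treated as optional here).
REMARKS_GEOFEED = re.compile(r"^remarks:\s*geofeed:?\s+(\S+)\s*$", re.IGNORECASE)


def add_geofeed_attribute(rpsl_object: str) -> str:
    """Add a "geofeed:" line next to a "remarks: geofeed:" line, keeping both."""
    lines = rpsl_object.splitlines()
    # Nothing to do if the object already has a real geofeed: attribute.
    if any(line.lower().startswith("geofeed:") for line in lines):
        return rpsl_object
    matches = [
        (index, match.group(1))
        for index, line in enumerate(lines)
        if (match := REMARKS_GEOFEED.match(line))
    ]
    # Leave the object untouched if there is no URL or more than one candidate.
    if len(matches) != 1:
        return rpsl_object
    index, url = matches[0]
    lines.insert(index + 1, f"geofeed:        {url}")
    return "\n".join(lines)
```

Software that only understands the remarks form keeps working, and software that understands "geofeed:" can start using the new attribute right away.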
If an update then contains a "remarks: geofeed: url" attribute, the update will be successful and the response should include an appropriate Warning message. At some point in the (near) future, this could be changed to an update failure as a syntax error.
An update containing a "geofeed:" and a "remarks: geofeed:" attribute or more than one "remarks: geofeed:" will be a syntax error.
I don't agree with this, I don't think there should be any checking on syntax of remarks. If it is just a warning, that is fine, but I am not too positive on the idea of having syntax errors on free form data like remarks. Only one geofeed attribute should be allowed, but other than warnings, I am not too positive on the syntax checking of remarks.
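To make the distinction concrete, here is a small Python sketch of an update check where only the structured attribute can cause an error and anything found in remarks results in a warning at most. The function name check_update, the (name, value) input shape and the message texts are all assumptions made for illustration.

```python
def check_update(attributes):
    """Check a list of (name, value) attribute pairs.

    Returns (errors, warnings). Only the structured "geofeed" attribute
    can produce an error; free-form remarks only ever produce warnings.
    """
    errors, warnings = [], []
    geofeed_count = sum(1 for name, _ in attributes if name == "geofeed")
    if geofeed_count > 1:
        errors.append("only one geofeed: attribute is allowed")
    for name, value in attributes:
        if name == "remarks" and value.strip().lower().startswith("geofeed"):
            warnings.append("remarks: looks like geofeed data; consider using geofeed:")
    return errors, warnings
```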
The resource holder should be able to create, modify, delete the "geofeed:" attribute in allocation objects.
I feel this might be a bit vague. I would just clarify that it is inet(6)num objects (and potentially organisation objects) that can have this attribute, and that it is set by the maintainer of the object, provided that it fulfills the syntax requirement and any reachability requirements, just like domain objects. This is just to clarify that it would be a database thing and wouldn't be done via the LIR portal or similar. And also that if the mnt-by was delegated, that maintainer would have the auth to change it.
Questions:
-Should the database software do any checks on the existence/reachability of the url as part of the update with an error if the check fails?
I would say yes as this is not a new concept to the DB as I believe this is already done with domain objects.
-Should the RIPE NCC do any periodic repeat checks on the continued existence/reachability of the url?
I would say that checking once a month or so could be fine, as long as it just results in a nudge email. Like don't enforce it, but nudge people if it is down.
-Should the RIPE NCC do any periodic checks on the content structure of the csv file referenced by the url?
I don't have a strong opinion either way here but I feel like that is not really something the NCC is responsible for checking. But if the NCC should check then my comments about the repeat reachability checks above apply here too.
-Should the Solution definition define how this will be adopted into RDAP or should we simply ask the RIPE NCC to define this in their impact assessment?
I don't mind either way.
-The RIPE Database contains hierarchical address space objects. Should it be acceptable for "geofeed:" attributes to exist at multiple levels within a specific hierarchy?
In my opinion, yes. I have not thought too deeply about this yet but currently I can't think of a good reason not to.
-Suppose a geofeed csv file referenced by a /16 INETNUM object contains location data for the whole /16. Then a more specific /24 INETNUM object references another geofeed csv file that contains conflicting location data for this /24. Should this be a concern for the RIPE Database?
The most specific geofeed should be returned in my opinion.
-Should geofeed data be inherited? If you query for a /24 that does not contain a "geofeed:" attribute, but a less specific /16 does contain a "geofeed:" attribute, should this data be returned? In other words could it be used in a similar way to "abuse-c:"?
I think it should be handled like abuse-c. I have not thought too much about what complications this might have yet though but currently I can't see any issues with it.
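A minimal Python sketch of how "most specific wins, otherwise inherit from the less specific" could behave. It assumes the geofeed references are available as a simple mapping from prefix to URL; the function name find_geofeed and that mapping are hypothetical and say nothing about how the database would actually store or query this.

```python
import ipaddress


def find_geofeed(prefix, geofeeds):
    """Return the geofeed URL of the most specific object covering `prefix`.

    `geofeeds` maps prefix strings to URLs. Returns None if no covering
    object carries a geofeed reference.
    """
    target = ipaddress.ip_network(prefix)
    covering = []
    for candidate, url in geofeeds.items():
        network = ipaddress.ip_network(candidate)
        # subnet_of() raises TypeError across IP versions, so check first.
        if network.version == target.version and target.subnet_of(network):
            covering.append((network, url))
    if not covering:
        return None
    # The longest prefix length is the most specific covering object.
    return max(covering, key=lambda item: item[0].prefixlen)[1]
```

With a /16 and a more specific /24 both carrying a reference, a query for the /24 gets the /24's URL, while a query for any other /24 inside the /16 falls back to the /16's URL.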
-Thinking ahead to how people will actually deploy this data and what short cuts they could make. It is said that when reading a geofeed csv file, consumers of the data should ignore all data within that file not directly concerning the address space queried in the RIPE Database. Could you therefore create a single csv file with location data for all your address space and reference the same file in all your RIPE Database address objects? The address space owner could rely on the data consumer to pick out the correct piece of data for the relevant address space. The manager of the csv file then only has to work with one file. If this is possible and does happen (which the IETF doc 'Finding geofeeds' seems to suggest is possible for unsigned geofeed data), would it therefore make sense to apply "geofeed:" hierarchically as with "abuse-c:"? Allow a single, default "geofeed:" attribute in the ORGANISATION object to be applied to all that organisation's address space, with the option of specific localised "geofeed:" attributes in address space objects. That could be a neater solution, and easier to set up, than applying thousands of references to the same geofeed file at a more specific level in the database.
I am well aware that this kind of stuff happens in practice even with IP space between different RIRs (like ARIN and RIPE NCC space in the same CSV). I feel like the NCC shouldn't really be concerned about the data in the CSV file but rather just about publishing the URL to it.
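For the consumer side of a shared file, here is a minimal Python sketch of picking out only the rows that concern the address space of the object that referenced the file. It assumes the RFC 8805 layout with the prefix in the first column and "#" comment lines; the function name rows_for_prefix is hypothetical.

```python
import csv
import io
import ipaddress


def rows_for_prefix(csv_text, inetnum_prefix):
    """Keep only geofeed rows whose prefix lies within `inetnum_prefix`."""
    scope = ipaddress.ip_network(inetnum_prefix)
    kept = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and comment lines.
        if not row or row[0].lstrip().startswith("#"):
            continue
        try:
            network = ipaddress.ip_network(row[0].strip())
        except ValueError:
            continue  # not a valid prefix, ignore the row
        if network.version == scope.version and network.subnet_of(scope):
            kept.append(row)
    return kept
```

Rows for other address space in the same file (another RIR's space, for example) are simply ignored by the consumer.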
-Relating to the above 3 questions, should geofeed data only be considered applicable if returned by a specific geofeed locator application which takes into account the hierarchical nature of address space in the database? Otherwise do the standard database query mechanisms have to take into account this hierarchy and locate the most specific "geofeed:" attribute from the less specific objects?
I don't quite understand the question here? Could you please clarify?
-Should/could the RIPE Database return the csv file as part of the query? If so should the file be cached (for how long?) to avoid too many downloads?
I don't see a reason for this, especially as I could imagine these lists being huge in some cases. It feels like it might be opening the NCC up to unnecessary liability considering the privacy concerns. (clarification: I don't think the database should return the CSV)
-Should we only allow HTTPS urls? (Which the IETF doc 'Finding geofeeds' seems to suggest)
The current draft seems pretty clear on this to me, and it also makes sense to me, so I would say yes. And if not, then it should be restricted to only http and https (aka not ftp or any other protocols).
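A minimal Python sketch of the kind of scheme check this implies: HTTPS only by default, with plain HTTP as the only possible fallback. The function name is_acceptable_geofeed_url and the allow_http flag are made up for the example, and a real syntax check would likely be stricter than this.

```python
from urllib.parse import urlsplit


def is_acceptable_geofeed_url(url, allow_http=False):
    """Accept only https URLs (optionally http) that include a host."""
    allowed = {"https", "http"} if allow_http else {"https"}
    try:
        parts = urlsplit(url)
        # urlsplit lowercases the scheme; hostname is None if no host is given.
        return parts.scheme in allowed and bool(parts.hostname)
    except ValueError:
        return False  # malformed URL
```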
-Should the RIPE NCC go ahead and implement this now, with our own set of RIPE rules? Or should we try to coordinate this and agree a set of common rules between all the RIRs before any deployment in the RIPE Database?
I would say that the RIPE NCC should implement this quickly as the remarks method already has quite a bit of use and it will probably keep growing quickly. Especially as this seems sort of basic in terms of the frontend, like how the URLs are validated could change in the future without the WHOIS format changing. With regards to the potential list in the RIR FTP, that should maybe be coordinated with different RIRs or specified in the IETF spec.
-For the legal review, there are 2 statements in the IETF doc 'Finding geofeeds' which may be of concern:
* [RFC8805] geofeed data may reveal the approximate location of an IP address, which might in turn reveal the approximate location of an individual user. Unfortunately, [RFC8805] provides no privacy guidance on avoiding or ameliorating possible damage due to this exposure of the user. In publishing pointers to geofeed files as described in this document the operator should be aware of this exposure in geofeed data and be cautious. All the privacy considerations of [RFC8805] Section 4 apply to this document.
* It is significant that geofeed data may have finer granularity than the inetnum: which refers to them.
It is clear that the RIPE NCC cannot prevent this data being referenced by objects in the RIPE Database. It is already being referenced from "remarks:" attributes. Perhaps the RIPE NCC should require (as part of their service agreement) that its members obtain written consent from their customers to publish this location data, or at least inform the customers in writing that it will be published.
Also, although RFC8805 says postcode is deprecated it is still provided for in the csv files. So anyone can still enter location data to this detail.
I am not a lawyer by any means, but I don't see this necessarily being an issue as long as the NCC just links to URLs provided by resource holders. And with regards to it being part of the service agreement, I feel like that would be very complicated when you consider PI resources etc. I think trying to get written consent from resource holders' customers should absolutely be avoided if possible. I don't see why anyone would actually publish data this specific (as in putting it down to a very specific place), and it seems like it would probably be rare for that to happen without the customer's consent.
-The IETF doc 'Finding geofeeds' suggests that geofeed information 'will be' available in bulk accessed whois data. In view of the privacy concerns above, is this likely?
Can you clarify where this is mentioned? Is it part of the quote below?
-The IETF doc 'Finding geofeeds' says "To minimize the load on RIR whois [RFC3912] services, use of the RIR's FTP [RFC0959] services SHOULD be the preferred access." Is the RIPE NCC expected to download all the geofeed files and make them available through their FTP service?
I don't quite interpret it like that, I rather interpret it as the RIPE NCC (and other RIRs) publishing a list of all prefixes and their geofeed URL. I imagine it like the delegated file, but just prefixes and geofeed URLs. But I will say it seems a bit unclear, maybe Randy Bush or one of the other authors could comment on the intention here.
-The IETF doc 'Finding geofeeds' states that consumers of the geofeed data MUST NOT access this data in real time via the RPSL servers 'too frequently' or at 'magic times like midnight'. Some users will do whatever they want to do if they are able to do, regardless of any statements to the contrary. Should the RIPE NCC enforce such access rules by some means?
For access via WHOIS, I would say no as that would probably be way too complicated. If a list like I suggested above was to be implemented, then I guess some kind of limit could be added to make sure people don't pull it down every 5 minutes, as it would probably be a pretty large file.
References
The IETF doc 'Finding geofeeds': https://datatracker.ietf.org/doc/draft-ietf-opsawg-finding-geofeeds/?include...
geofeed file format: https://www.rfc-editor.org/rfc/rfc8805.html
-Cynthia