Re: [db-wg] New NWI for geofeed?

6 Apr 2021

Hi Denis,

I have CCed Randy Bush as I thought he might be able to clarify what
was meant by the following:
...
To minimize the load on RIR whois [RFC3912] services, use of the
RIR's FTP [RFC0959] services SHOULD be the preferred access.  This
also provides bulk access instead of fetching with a tweezers.
I think one of the most important things here, in general, is working
out what is within the scope of the db-wg to decide and what should
probably be defined by the IETF spec.
The other is getting some kind of implementation out as quickly as
possible while still doing it properly, because, as you mention, the
remarks format appears to be used quite a bit and I imagine its use is
growing at a decent rate. (I have no data to back this up, though; it
is just a guess.)

I have responded to some of the questions below.
...
Problem statement
Associating an approximate physical location with an IP address has
proven to be a challenge to solve within the current constraints of
the RIPE Database. Over the years the community has chosen to consider
addresses in the RIPE Database to relate to entities in the assignment
process itself, not the subsequent actual use of IP addresses after
assignment.
The working group is asked to consider whether the RIPE Database can
be used as a springboard for parties wishing to correlate geographical
information with IP addresses by allowing structured references in the
RIPE Database towards information outside the RIPE Database which
potentially helps answer Geo IP Location queries.
The IETF is currently discussing an update to RPSL to add a new
attribute "geofeed: url". The url will reference a csv file containing
location data. Some users have already started to make use of this
feature via the "remarks: geofeed: url". It is never a good idea to
try to overload structured data into the free format "remarks:"
attribute. This has been done in the past, for example with abuse
contact details before we introduced the "abuse-c:" attribute. There
is no way to regulate what database users put into "remarks:"
attributes. So even if the new "geofeed:" attribute is not agreed, the
url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects
containing a "remarks: geofeed: url" attribute in the database. These
have 7,731 distinct values in the INETNUMs and 1,045 distinct values
in the INET6NUMs.
Solution definition
Implement a new "geofeed:" attribute according to the IETF's
definition. Although the IETF has not yet concluded discussions on
this attribute we can still implement it in the RIPE Database RPSL
data definition. The RIPE Database already has many local differences
to the RPSL standard. As expressed in the Problem statement, users are
already using the geofeed data by overloading the "remarks:"
attribute. That is a dirty hack which should be avoided.
An invalidly formatted URL will be a syntax error.
...
The RIPE NCC will perform a one time conversion of the existing data
to convert "remarks: geofeed: url" to "geofeed: url".
I think this might need more careful consideration, as not all software
consuming this data may support the new attribute right away while it
is still a draft.
Converting while keeping both is totally fine with me (as in adding
"geofeed: url" based on the existing "remarks: geofeed: url").
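A rough sketch of that conversion, assuming an RPSL object is held as a list of `(attribute, value)` pairs. The representation and the function name `add_geofeed_from_remarks` are hypothetical, not anything in the RIPE Database software. The original remark is kept, and objects with zero or several geofeed remarks are skipped rather than guessed at.

```python
import re

# Matches a remarks: value such as "geofeed: https://example.com/geofeed.csv".
REMARKS_GEOFEED = re.compile(r"^\s*geofeed:\s*(\S+)\s*$", re.IGNORECASE)


def add_geofeed_from_remarks(attributes):
    """Return a copy of `attributes` with a geofeed: attribute added.

    `attributes` is a hypothetical list of (name, value) pairs. The
    remarks: geofeed: line is kept, so both forms coexist after conversion.
    """
    if any(name == "geofeed" for name, _ in attributes):
        return list(attributes)  # already converted, leave untouched
    urls = [REMARKS_GEOFEED.match(value).group(1)
            for name, value in attributes
            if name == "remarks" and REMARKS_GEOFEED.match(value)]
    if len(urls) != 1:
        return list(attributes)  # none, or ambiguous: leave for manual review
    converted = list(attributes)
    # Place the new attribute directly after the remark it was derived from.
    index = next(i for i, (name, value) in enumerate(converted)
                 if name == "remarks" and REMARKS_GEOFEED.match(value))
    converted.insert(index + 1, ("geofeed", urls[0]))
    return converted
```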
...
If an update then contains a "remarks: geofeed: url" attribute, the
update will be successful and the response should include an
appropriate Warning message. At some point in the (near) future, this
could be changed to an update failure as a syntax error.
An update containing a "geofeed:" and a "remarks: geofeed:" attribute
or more than one "remarks: geofeed:" will be a syntax error.
I don't agree with this; I don't think there should be any syntax
checking of remarks.
If it is just a warning, that is fine, but I am not too keen on the
idea of raising syntax errors on free-form data like remarks.
Only one geofeed: attribute should be allowed, but beyond warnings,
I am not too keen on syntax checking of remarks.
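A minimal sketch of the checking described here, assuming the same hypothetical `(attribute, value)` representation: a second `geofeed:` attribute is an error, while a `remarks: geofeed:` line only ever produces a warning. This follows the position in this reply rather than the NWI text, which proposes a syntax error for mixed or repeated geofeed remarks.

```python
import re

# Detects remarks: values that start with "geofeed:" (case-insensitive).
GEOFEED_REMARK = re.compile(r"^\s*geofeed:", re.IGNORECASE)


def check_geofeed_attributes(attributes):
    """Return (errors, warnings) for an object's geofeed-related attributes.

    More than one geofeed: attribute is an error; remarks: content is
    free-form and only ever yields a warning, never a syntax error.
    """
    errors, warnings = [], []
    geofeeds = [value for name, value in attributes if name == "geofeed"]
    if len(geofeeds) > 1:
        errors.append("only one geofeed: attribute is allowed")
    if any(name == "remarks" and GEOFEED_REMARK.match(value)
           for name, value in attributes):
        warnings.append("remarks: geofeed: is deprecated, use geofeed:")
    return errors, warnings
```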
...
The resource holder should be able to create, modify, delete the
"geofeed:" attribute in allocation objects.
I feel this might be a bit vague. I would clarify that it is
inet(6)num objects (and potentially organisation objects) that can
have this attribute,
and that it is set by the maintainer of the object, provided that it
fulfils the syntax requirement and any reachability requirements,
just like domain objects.
This is to clarify that it would purely be a database matter and
wouldn't be done via the LIR Portal or similar,
and also that if the mnt-by: was delegated, that maintainer would have
the authority to change it.
...
Questions:
...
-Should the database software do any checks on the
existence/reachability of the url as part of the update with an error
if the check fails?
I would say yes, as this is not a new concept for the DB; I believe
this is already done with domain objects.
...
-Should the RIPE NCC do any periodic repeat checks on the continued
existence/reachability of the url?
I would say that checking once a month or so could be fine, as long as
it just results in a nudge email.
That is, don't enforce it, but nudge people if the URL is down.
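One way such a periodic check could look, as a sketch only: a HEAD request that treats any 2xx response as reachable, plus a hypothetical policy that sends a nudge only after two consecutive failed monthly checks. The `opener` parameter exists so the network call can be replaced in tests; servers that reject HEAD requests would show up as failures in this simplified version.

```python
import http.client
import urllib.error
import urllib.request


def url_is_reachable(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` gets a 2xx response."""
    try:
        # Request() raises ValueError for URLs without a usable scheme.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError (4xx, 5xx) are OSError subclasses, as are
        # DNS failures, refused connections and timeouts.
        return False


def should_nudge(recent_results, threshold=2):
    """Hypothetical policy: nudge after `threshold` consecutive failures.

    `recent_results` holds booleans from past checks, oldest first.
    """
    if len(recent_results) < threshold:
        return False
    return not any(recent_results[-threshold:])
```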
...
-Should the RIPE NCC do any periodic checks on the content structure
of the csv file referenced by the url?
I don't have a strong opinion either way here, but I feel that this is
not really something the NCC is responsible for checking.
But if the NCC should check then my comments about the repeat
reachability checks above apply here too.
...
-Should the Solution definition define how this will be adopted into
RDAP or should we simply ask the RIPE NCC to define this in their
impact assessment?
I don't mind either way.
...
-The RIPE Database contains hierarchical address space objects. Should
it be acceptable for "geofeed:" attributes to exist at multiple levels
within a specific hierarchy?
In my opinion, yes.
I have not thought too deeply about this yet but currently I can't
think of a good reason not to.
...
-Suppose a geofeed csv file referenced by a /16 INETNUM object
contains location data for the whole /16. Then a more specific /24
INETNUM object references another geofeed csv file that contains
conflicting location data for this /24. Should this be a concern for
the RIPE Database?
The most specific geofeed should be returned in my opinion.
...
-Should geofeed data be inherited? If you query for a /24 that does
not contain a "geofeed:" attribute,  but a less specific /16 does
contain a "geofeed:" attribute, should this data be returned? In other
words could it be used in a similar way to "abuse-c:"?
I think it should be handled like abuse-c.
I have not thought much yet about what complications this might have,
but currently I can't see any issues with it.
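A sketch of the lookup these two answers imply, assuming for simplicity that objects are CIDR prefixes (real inetnum: ranges need not be) held in a dictionary from prefix to geofeed URL, with `None` for objects lacking the attribute. The most specific covering object that has a `geofeed:` wins, so a /24 without one falls back to its /16, much as abuse-c: is resolved. The function name is hypothetical.

```python
import ipaddress


def find_geofeed(query, objects):
    """Return the geofeed URL of the most specific object covering `query`.

    `objects` maps prefix strings to a URL, or None when the object has no
    geofeed: attribute. Returns None if nothing in the hierarchy has one.
    """
    target = ipaddress.ip_network(query)
    best = None
    for prefix, url in objects.items():
        network = ipaddress.ip_network(prefix)
        # Skip objects without the attribute and the other address family.
        if url is None or network.version != target.version:
            continue
        if target.subnet_of(network):
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, url)
    return best[1] if best else None
```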
...
-Thinking ahead to how people will actually deploy this data and what
shortcuts they could make. It is said that when reading a geofeed csv
file, consumers of the data should ignore all data within that file
not directly concerning the address space queried in the RIPE
Database. Could you therefore create a single csv file with location
data for all your address space and reference the same file in all
your RIPE Database address objects? The address space owner could rely
on the data consumer to pick out the correct piece of data for the
relevant address space. The manager of the csv file then only has to
work with one file. If this is possible and does happen (which the
IETF doc 'Finding geofeeds' seems to suggest is possible for unsigned
geofeed data), would it therefore make sense to apply "geofeed:"
hierarchically as with "abuse-c:"? Allow a single, default "geofeed:"
attribute in the ORGANISATION object to be applied to all that
organisation's address space, with the option of specific localised
"geofeed:" attributes in address space objects. That could be a neater
solution, and easier to setup, than applying thousands of references
to the same geofeed file at a more specific level in the database.
I am well aware that this kind of stuff happens in practice even with
IP space between different RIRs (like ARIN and RIPE NCC space in the
same CSV).
I feel like the NCC shouldn't really be concerned about the data in
the CSV file but rather just about publishing the URL to it.
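On the consumer side, picking the relevant rows out of a shared file is straightforward. This sketch reads RFC 8805 CSV text (prefix, country, region, city, postal code) and keeps only the rows inside the queried inetnum; the function name is hypothetical, and fetching the file is left out.

```python
import csv
import io
import ipaddress


def rows_for_prefix(csv_text, inetnum_prefix):
    """Yield RFC 8805 rows whose prefix lies within `inetnum_prefix`.

    Rows for other address space (including other RIRs' space in the same
    file), comment lines and rows with an unparseable prefix are ignored.
    """
    scope = ipaddress.ip_network(inetnum_prefix)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0].lstrip().startswith("#"):
            continue
        try:
            network = ipaddress.ip_network(row[0].strip())
        except ValueError:
            continue
        if network.version == scope.version and network.subnet_of(scope):
            yield [field.strip() for field in row]
```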
...
-Relating to the above 3 questions, should geofeed data only be
considered applicable if returned by a specific geofeed locater
application which takes into account the hierarchical nature of
address space in the database? Otherwise do the standard database
query mechanisms have to take into account this hierarchy and locate
the most specific "geofeed:" attribute from the less specific objects?
I don't quite understand the question here.
Could you please clarify?
...
-Should/could the RIPE Database return the csv file as part of the
query? If so should the file be cached (for how long?) to avoid too
many downloads?
I don't see a reason for this, especially as I could imagine these
lists being huge in some cases.
It feels like it might be opening the NCC up to unnecessary liability
considering the privacy concerns.
(clarification: I don't think the database should return the CSV)
...
-Should we only allow HTTPS urls? (Which the IETF doc 'Finding
geofeeds' seems to suggest)
The current draft seems pretty clear on this to me, and it also makes
sense to me, so I would say yes.
And if not, then it should be restricted to http and https only (i.e.
not FTP or any other protocol).
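A sketch of the URL syntax check using the standard library's `urllib.parse.urlsplit`: HTTPS-only by default, with `allowed_schemes` widened to `("http", "https")` for the fallback mentioned above. The function name and error strings are illustrative.

```python
from urllib.parse import urlsplit


def validate_geofeed_url(url, allowed_schemes=("https",)):
    """Return an error message for an unacceptable geofeed: URL, else None."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return "malformed URL"
    # urlsplit lowercases the scheme; anything not listed (ftp, etc.) fails.
    if parts.scheme not in allowed_schemes:
        return f"scheme {parts.scheme!r} is not allowed"
    if not parts.hostname:
        return "URL has no host"
    return None
```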
...
-Should the RIPE NCC go ahead and implement this now, with our own set
of RIPE rules? Or should we try to coordinate this and agree a set of
common rules between all the RIRs before any deployment in the RIPE
Database?
I would say that the RIPE NCC should implement this quickly, as the
remarks method already has quite a bit of use and it will probably
keep growing quickly.
Especially as this seems fairly basic in terms of the front end: for
example, how the URLs are validated could change in the future without
the WHOIS output format changing.
With regards to the potential list in the RIR FTP, that should maybe
be coordinated with different RIRs or specified in the IETF spec.
...
-For the legal review, there are 2 statements in the IETF doc 'Finding
geofeeds' which may be of concern:
*[RFC8805] geofeed data may reveal the approximate location of an IP
address, which might in turn reveal the approximate location of an
individual user.  Unfortunately, [RFC8805] provides no privacy
guidance on avoiding or ameliorating possible damage due to this
exposure of the user. In publishing pointers to geofeed files as
described in this document the operator should be aware of this
exposure in geofeed data and be cautious. All the privacy
considerations of [RFC8805] Section 4 apply to this document.
*It is significant that geofeed data may have finer granularity than
the inetnum: which refers to them.
It is clear that the RIPE NCC cannot prevent this data being
referenced by objects in the RIPE Database. It is already being
referenced from "remarks:" attributes. Perhaps the RIPE NCC should
require (as part of their service agreement) that its members obtain
written consent from their customers to publish this location data, or
at least inform the customers in writing that it will be published.
Also, although RFC8805 says postcode is deprecated it is still
provided for in the csv files. So anyone can still enter location data
to this detail.
I am not a lawyer by any means, but I don't see this necessarily being
an issue as long as the NCC just links to URLs provided by resource
holders.
And with regard to it being part of the service agreement, I feel that
would be very complicated when you consider PI resources etc.
I think trying to get written consent from resource holders' customers
should absolutely be avoided if possible.
I don't see why anyone would actually do this kind of thing, and it
seems likely to be rare for it to happen without the customers'
consent (as in narrowing it down to a very specific place).
...
-The IETF doc 'Finding geofeeds' suggests that geofeed information
'will be' available in bulk accessed whois data. In view of the
privacy concerns above, is this likely?
Can you clarify where this is mentioned?
Is it part of the quote below?
...
-The IETF doc 'Finding geofeeds' says "To minimize the load on RIR
whois [RFC3912] services, use of the RIR's FTP [RFC0959] services
SHOULD be the preferred access." Is the RIPE NCC expected to download
all the geofeed files and make them available through their FTP
service?
I don't quite interpret it like that; rather, I interpret it as the
RIPE NCC (and other RIRs) publishing a list of all prefixes and their
geofeed URLs.
I imagine it like the delegated file, but with just prefixes and geofeed URLs.

But I will say it seems a bit unclear, maybe Randy Bush or one of the
other authors could comment on the intention here.
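To illustrate the kind of list described above, a sketch that renders prefix/URL pairs as a pipe-separated text file, loosely modelled on the delegated stats files. The layout, header line and function name are all hypothetical; no such format has been defined.

```python
import ipaddress


def _sort_key(entry):
    # Order by address family, then address, then prefix length.
    network = ipaddress.ip_network(entry[0])
    return (network.version, network.network_address, network.prefixlen)


def build_geofeed_list(objects, registry="ripencc"):
    """Render (prefix, url) pairs as a hypothetical bulk geofeed list.

    `objects` comes from inet(6)num objects carrying a geofeed: attribute.
    """
    lines = [f"# {registry} geofeed list (hypothetical format): prefix|url"]
    for prefix, url in sorted(objects, key=_sort_key):
        lines.append(f"{prefix}|{url}")
    return "\n".join(lines) + "\n"
```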
...
-The IETF doc 'Finding geofeeds' states that consumers of the geofeed
data MUST NOT access this data in real time via the RPSL servers 'too
frequently' or at 'magic times like midnight'. Some users will do
whatever they want to do if they are able to do, regardless of any
statements to the contrary. Should the RIPE NCC enforce such access
rules by some means?
For access via WHOIS, I would say no as that would probably be way too
complicated.
If a list like I suggested above was to be implemented, then I guess
it could be implemented to make sure people didn't pull it down every
5 minutes as it would probably be a pretty large file.
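A minimal sketch of such a limit, assuming an in-memory record per client (for example keyed by source IP address) and a hypothetical one-hour minimum interval between downloads; a real deployment would need state shared across servers.

```python
import time


class MinimumIntervalLimiter:
    """Refuse repeat downloads by the same client within `interval` seconds.

    The one-hour default is a hypothetical value; `clock` can be replaced
    for testing.
    """

    def __init__(self, interval=3600.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last_allowed = {}

    def allow(self, client):
        now = self.clock()
        last = self._last_allowed.get(client)
        if last is not None and now - last < self.interval:
            return False
        # Only successful downloads reset the client's window.
        self._last_allowed[client] = now
        return True
```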
...
References
The IETF doc 'Finding geofeeds':
https://datatracker.ietf.org/doc/draft-ietf-opsawg-finding-geofeeds/?include...
geofeed file format:
https://www.rfc-editor.org/rfc/rfc8805.html
-Cynthia

Cynthia Revström