Colleagues

[Apologies for the length of this email...]

The chairs would like to suggest creating a new NWI for the "geofeed:" attribute, and suggest the following draft Problem statement and Solution definition. If there is agreement on this, the RIPE NCC will do an impact assessment, including a legal review and a summary of what the other RIRs are doing. There has already been quite a discussion in the WG on this issue, with a lot of support. So hopefully we can reach a consensus quickly, at least on setting up the NWI and starting the impact assessment. Having read the latest draft IETF docs, there are some outstanding questions. Comments and changes are welcome...

cheers
denis
co-chair DB-WG

Problem statement

Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.

The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses, by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries.

The IETF is currently discussing an update to RPSL to add a new attribute, "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.

Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.

Solution definition

Implement a new "geofeed:" attribute according to the IETF's definition. Although the IETF has not yet concluded discussions on this attribute, we can still implement it in the RIPE Database RPSL data definition. The RIPE Database already has many local differences to the RPSL standard. As expressed in the Problem statement, users are already using the geofeed data by overloading the "remarks:" attribute. That is a dirty hack which should be avoided.

An invalidly formatted url will be a syntax error.

The RIPE NCC will perform a one-time conversion of the existing data, converting "remarks: geofeed: url" to "geofeed: url".

If an update then contains a "remarks: geofeed: url" attribute, the update will be successful and the response should include an appropriate Warning message. At some point in the (near) future, this could be changed to an update failure as a syntax error.

An update containing both a "geofeed:" and a "remarks: geofeed:" attribute, or more than one "remarks: geofeed:", will be a syntax error.

The resource holder should be able to create, modify and delete the "geofeed:" attribute in allocation objects.
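For illustration, an address object carrying the proposed attribute, and the RFC8805 csv file it references, might look something like this (the prefix, url and locations are made up; object abridged):

    inetnum:    192.0.2.0 - 192.0.2.255
    netname:    EXAMPLE-NET
    geofeed:    https://example.com/geofeed.csv

    # geofeed.csv (RFC8805 layout: prefix,country,region,city,postal_code)
    192.0.2.0/25,NL,NL-NH,Amsterdam,
    192.0.2.128/25,NL,NL-ZH,Rotterdam,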
Questions:

-Should the database software do any checks on the existence/reachability of the url as part of the update, with an error if the check fails?

-Should the RIPE NCC do any periodic repeat checks on the continued existence/reachability of the url?

-Should the RIPE NCC do any periodic checks on the content structure of the csv file referenced by the url?

-Should the Solution definition define how this will be adopted into RDAP, or should we simply ask the RIPE NCC to define this in their impact assessment?

-The RIPE Database contains hierarchical address space objects. Should it be acceptable for "geofeed:" attributes to exist at multiple levels within a specific hierarchy?

-Suppose a geofeed csv file referenced by a /16 INETNUM object contains location data for the whole /16. Then a more specific /24 INETNUM object references another geofeed csv file that contains conflicting location data for this /24. Should this be a concern for the RIPE Database?

-Should geofeed data be inherited? If you query for a /24 that does not contain a "geofeed:" attribute, but a less specific /16 does contain a "geofeed:" attribute, should this data be returned? In other words, could it be used in a similar way to "abuse-c:"?

-Thinking ahead to how people will actually deploy this data and what shortcuts they could make. It is said that when reading a geofeed csv file, consumers of the data should ignore all data within that file not directly concerning the address space queried in the RIPE Database. Could you therefore create a single csv file with location data for all your address space and reference the same file in all your RIPE Database address objects? The address space owner could rely on the data consumer to pick out the correct piece of data for the relevant address space. The manager of the csv file then only has to work with one file. If this is possible and does happen (which the IETF doc 'Finding geofeeds' seems to suggest is possible for unsigned geofeed data), would it therefore make sense to apply "geofeed:" hierarchically as with "abuse-c:"? Allow a single, default "geofeed:" attribute in the ORGANISATION object to be applied to all that organisation's address space, with the option of specific localised "geofeed:" attributes in address space objects. That could be a neater solution, and easier to set up, than applying thousands of references to the same geofeed file at a more specific level in the database.

-Relating to the above 3 questions, should geofeed data only be considered applicable if returned by a specific geofeed locator application which takes into account the hierarchical nature of address space in the database? Otherwise, do the standard database query mechanisms have to take into account this hierarchy and locate the most specific "geofeed:" attribute from the less specific objects?

-Should/could the RIPE Database return the csv file as part of the query? If so, should the file be cached (for how long?) to avoid too many downloads?

-Should we only allow HTTPS urls? (Which the IETF doc 'Finding geofeeds' seems to suggest)

-Should the RIPE NCC go ahead and implement this now, with our own set of RIPE rules? Or should we try to coordinate this and agree a set of common rules between all the RIRs before any deployment in the RIPE Database?
-For the legal review, there are 2 statements in the IETF doc 'Finding geofeeds' which may be of concern:

* [RFC8805] geofeed data may reveal the approximate location of an IP address, which might in turn reveal the approximate location of an individual user. Unfortunately, [RFC8805] provides no privacy guidance on avoiding or ameliorating possible damage due to this exposure of the user. In publishing pointers to geofeed files as described in this document the operator should be aware of this exposure in geofeed data and be cautious. All the privacy considerations of [RFC8805] Section 4 apply to this document.

* It is significant that geofeed data may have finer granularity than the inetnum: which refers to them.

It is clear that the RIPE NCC cannot prevent this data being referenced by objects in the RIPE Database. It is already being referenced from "remarks:" attributes. Perhaps the RIPE NCC should require (as part of their service agreement) that its members obtain written consent from their customers to publish this location data, or at least inform the customers in writing that it will be published.

Also, although RFC8805 says postcode is deprecated, it is still provided for in the csv files. So anyone can still enter location data at this level of detail.

-The IETF doc 'Finding geofeeds' suggests that geofeed information 'will be' available in bulk accessed whois data. In view of the privacy concerns above, is this likely?

-The IETF doc 'Finding geofeeds' says "To minimize the load on RIR whois [RFC3912] services, use of the RIR's FTP [RFC0959] services SHOULD be the preferred access." Is the RIPE NCC expected to download all the geofeed files and make them available through their FTP service?

-The IETF doc 'Finding geofeeds' states that consumers of the geofeed data MUST NOT access this data in real time via the RPSL servers 'too frequently' or at 'magic times like midnight'. Some users will do whatever they want to do if they are able to, regardless of any statements to the contrary. Should the RIPE NCC enforce such access rules by some means?

References

The IETF doc 'Finding geofeeds':
https://datatracker.ietf.org/doc/draft-ietf-opsawg-finding-geofeeds/?include...

geofeed file format:
https://www.rfc-editor.org/rfc/rfc8805.html
Hi Denis, I have CCed Randy Bush as I thought he might be able to clarify what was meant by the following:
To minimize the load on RIR whois [RFC3912] services, use of the RIR's FTP [RFC0959] services SHOULD be the preferred access. This also provides bulk access instead of fetching with a tweezers.
I think one of the most important things in general here is seeing what is within the scope of the db-wg to decide and what should probably be defined by the IETF spec. And also to try to get some kind of implementation out as quickly as possible while still doing it properly. Because as you mention, the remarks format appears to be used quite a bit and I imagine it probably grows in use at a decent rate. (I have no data to back this up though, it is just a guess) I have responded to some of the questions below.
Problem statement
Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.
The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries
The IETF is currently discussing an update to RPSL to add a new attribute "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via the "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
Solution definition
Implement a new "geofeed:" attribute according to the IETF's definition. Although the IETF has not yet concluded discussions on this attribute we can still implement it in the RIPE Database RPSL data definition. The RIPE Database already has many local differences to the RPSL standard. As expressed in the Problem statement, users are already using the geofeed data by overloading the "remarks:" attribute. That is a dirty hack which should be avoided.
An invalidly formatted url will be a syntax error.
The RIPE NCC will perform a one-time conversion of the existing data, converting "remarks: geofeed: url" to "geofeed: url".
I think this might need more careful consideration as all software consuming this data might not support this instantly as it is still a draft. Converting while still keeping both is totally fine with me. (as in adding the geofeed: url based on the remarks: geofeed: url)
If an update then contains a "remarks: geofeed: url" attribute, the update will be successful and the response should include an appropriate Warning message. At some point in the (near) future, this could be changed to an update failure as a syntax error.
An update containing both a "geofeed:" and a "remarks: geofeed:" attribute, or more than one "remarks: geofeed:", will be a syntax error.
I don't agree with this. I don't think there should be any syntax checking on remarks. If it is just a warning, that is fine, but I am not keen on the idea of having syntax errors on free-form data like remarks. Only one "geofeed:" attribute should be allowed, but beyond warnings, remarks should not be syntax-checked.
The resource holder should be able to create, modify, delete the "geofeed:" attribute in allocation objects.
I feel this might be a bit vague. It would be worth clarifying that it is inet(6)num objects (and potentially organisation objects) that can have this attribute, and that it is set by the maintainer of the object, provided that it fulfils the syntax requirement and any reachability requirements, just like domain objects. This is just to clarify that it would be purely a database thing and wouldn't be done via the LIR Portal or similar. And also that if the mnt-by was delegated, that maintainer would have the authorisation to change it.
Questions:
-Should the database software do any checks on the existence/reachability of the url as part of the update with an error if the check fails?
I would say yes as this is not a new concept to the DB as I believe this is already done with domain objects.
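Just to make this concrete, a rough sketch of the kind of check I have in mind (purely illustrative, not how the existing domain object checks are implemented):

    # Sketch: probe a geofeed URL with an HTTP HEAD request.
    import urllib.request
    import urllib.error

    def geofeed_reachable(url, timeout=10.0):
        # Some servers reject HEAD; a real check might fall back to GET.
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return 200 <= resp.status < 300
        except (urllib.error.URLError, ValueError):
            return False

An update could then fail (or just warn) when geofeed_reachable() returns False.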
-Should the RIPE NCC do any periodic repeat checks on the continued existence/reachability of the url?
I would say that checking once a month or so could be fine, as long as it just results in a nudge email. Don't enforce it, but nudge people if it is down.
-Should the RIPE NCC do any periodic checks on the content structure of the csv file referenced by the url?
I don't have a strong opinion either way here, but I feel like that is not really something the NCC is responsible for checking. If the NCC should check, then my comments about the repeated reachability checks above apply here too.
-Should the Solution definition define how this will be adopted into RDAP or should we simply ask the RIPE NCC to define this in their impact assessment?
I don't mind either way.
-The RIPE Database contains hierarchical address space objects. Should it be acceptable for "geofeed:" attributes to exist at multiple levels within a specific hierarchy?
In my opinion, yes. I have not thought too deeply about this yet but currently I can't think of a good reason not to.
-Suppose a geofeed csv file referenced by a /16 INETNUM object contains location data for the whole /16. Then a more specific /24 INETNUM object references another geofeed csv file that contains conflicting location data for this /24. Should this be a concern for the RIPE Database?
The most specific geofeed should be returned in my opinion.
-Should geofeed data be inherited? If you query for a /24 that does not contain a "geofeed:" attribute, but a less specific /16 does contain a "geofeed:" attribute, should this data be returned? In other words could it be used in a similar way to "abuse-c:"?
I think it should be handled like abuse-c. I have not thought too much about what complications this might have yet, but currently I can't see any issues with it.
-Thinking ahead to how people will actually deploy this data and what shortcuts they could make. It is said that when reading a geofeed csv file, consumers of the data should ignore all data within that file not directly concerning the address space queried in the RIPE Database. Could you therefore create a single csv file with location data for all your address space and reference the same file in all your RIPE Database address objects? The address space owner could rely on the data consumer to pick out the correct piece of data for the relevant address space. The manager of the csv file then only has to work with one file. If this is possible and does happen (which the IETF doc 'Finding geofeeds' seems to suggest is possible for unsigned geofeed data), would it therefore make sense to apply "geofeed:" hierarchically as with "abuse-c:"? Allow a single, default "geofeed:" attribute in the ORGANISATION object to be applied to all that organisation's address space, with the option of specific localised "geofeed:" attributes in address space objects. That could be a neater solution, and easier to set up, than applying thousands of references to the same geofeed file at a more specific level in the database.
I am well aware that this kind of stuff happens in practice even with IP space between different RIRs (like ARIN and RIPE NCC space in the same CSV). I feel like the NCC shouldn't really be concerned about the data in the CSV file but rather just about publishing the URL to it.
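To make the 'pick out the correct piece of data' part concrete, the filtering on the consumer side could be as simple as this sketch (file name and queried prefix are hypothetical; assumes the RFC8805 csv layout):

    # Sketch: keep only the geofeed rows inside the queried address space.
    import csv
    import ipaddress

    def relevant_rows(path, queried_prefix):
        queried = ipaddress.ip_network(queried_prefix)
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if not row or row[0].startswith("#"):
                    continue  # skip blank lines and comments
                try:
                    entry = ipaddress.ip_network(row[0].strip())
                except ValueError:
                    continue  # skip malformed rows
                # Ignore everything not covered by the queried inet(6)num.
                if entry.version == queried.version and entry.subnet_of(queried):
                    yield row

    for row in relevant_rows("geofeed.csv", "192.0.2.0/24"):
        print(row)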
-Relating to the above 3 questions, should geofeed data only be considered applicable if returned by a specific geofeed locator application which takes into account the hierarchical nature of address space in the database? Otherwise, do the standard database query mechanisms have to take into account this hierarchy and locate the most specific "geofeed:" attribute from the less specific objects?
I don't quite understand the question here? Could you please clarify?
-Should/could the RIPE Database return the csv file as part of the query? If so should the file be cached (for how long?) to avoid too many downloads?
I don't see a reason for this, especially as I could imagine these lists being huge in some cases. It feels like it might be opening the NCC up to unnecessary liability considering the privacy concerns. (clarification: I don't think the database should return the CSV)
-Should we only allow HTTPS urls? (Which the IETF doc 'Finding geofeeds' seems to suggest)
The current draft seems pretty clear on this to me, and it also makes sense to me, so I would say yes. And if not, then it should be restricted to only http and https (aka not ftp or any other protocols).
-Should the RIPE NCC go ahead and implement this now, with our own set of RIPE rules? Or should we try to coordinate this and agree a set of common rules between all the RIRs before any deployment in the RIPE Database?
I would say that the RIPE NCC should implement this quickly, as the remarks method already has quite a bit of use and it will probably keep growing quickly. Especially as this seems fairly basic in terms of the frontend; how the URLs are validated could change in the future without the WHOIS format changing. With regard to the potential list on the RIR FTP, that should maybe be coordinated between the RIRs or specified in the IETF spec.
-For the legal review, there are 2 statements in the IETF doc 'Finding geofeeds' which may be of concern:

* [RFC8805] geofeed data may reveal the approximate location of an IP address, which might in turn reveal the approximate location of an individual user. Unfortunately, [RFC8805] provides no privacy guidance on avoiding or ameliorating possible damage due to this exposure of the user. In publishing pointers to geofeed files as described in this document the operator should be aware of this exposure in geofeed data and be cautious. All the privacy considerations of [RFC8805] Section 4 apply to this document.

* It is significant that geofeed data may have finer granularity than the inetnum: which refers to them.
It is clear that the RIPE NCC cannot prevent this data being referenced by objects in the RIPE Database. It is already being referenced from "remarks:" attributes. Perhaps the RIPE NCC should require (as part of their service agreement) that its members obtain written consent from their customers to publish this location data, or at least inform the customers in writing that it will be published.
Also, although RFC8805 says postcode is deprecated, it is still provided for in the csv files. So anyone can still enter location data at this level of detail.
I am not a lawyer by any means, but I don't see this necessarily being an issue as long as the NCC just links to URLs provided by resource holders. And with regards to it being part of the service agreement, I feel like that would be very complicated when you consider PI resources etc. I think trying to get written consent from resource holders' customers should absolutely be avoided if possible. I don't see why anyone would actually be doing this kind of stuff, and it seems like it would probably be rare for it to happen without the customer's consent. (as in putting it down to a very specific place)
-The IETF doc 'Finding geofeeds' suggests that geofeed information 'will be' available in bulk accessed whois data. In view of the privacy concerns above, is this likely?
Can you clarify where this is mentioned? Is it part of the quote below?
-The IETF doc 'Finding geofeeds' says "To minimize the load on RIR whois [RFC3912] services, use of the RIR's FTP [RFC0959] services SHOULD be the preferred access." Is the RIPE NCC expected to download all the geofeed files and make them available through their FTP service?
I don't quite interpret it like that, I rather interpret it as the RIPE NCC (and other RIRs) publishing a list of all prefixes and their geofeed URL. I imagine it like the delegated file, but just prefixes and geofeed URLs. But I will say it seems a bit unclear, maybe Randy Bush or one of the other authors could comment on the intention here.
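Something like this is what I picture for that list (an entirely hypothetical layout, just to illustrate the idea):

    # prefix|geofeed_url -- one line per inet(6)num with a geofeed reference
    192.0.2.0/24|https://example.com/geofeed.csv
    2001:db8::/32|https://example.net/v6-geofeed.csv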
-The IETF doc 'Finding geofeeds' states that consumers of the geofeed data MUST NOT access this data in real time via the RPSL servers 'too frequently' or at 'magic times like midnight'. Some users will do whatever they want to do if they are able to do, regardless of any statements to the contrary. Should the RIPE NCC enforce such access rules by some means?
For access via WHOIS, I would say no as that would probably be way too complicated. If a list like I suggested above was to be implemented, then I guess it could be implemented to make sure people didn't pull it down every 5 minutes as it would probably be a pretty large file.
References
The IETF doc 'Finding geofeeds': https://datatracker.ietf.org/doc/draft-ietf-opsawg-finding-geofeeds/?include...
geofeed file format: https://www.rfc-editor.org/rfc/rfc8805.html
-Cynthia
I have CCed Randy Bush as I thought he might be able to clarify what was meant by the following:
To minimize the load on RIR whois [RFC3912] services, use of the RIR's FTP [RFC0959] services SHOULD be the preferred access. This also provides bulk access instead of fetching with a tweezers.
I think one of the most important things in general here is seeing what is within the scope of the db-wg to decide and what should probably be defined by the IETF spec. And also to try to get some kind of implementation out as quickly as possible while still doing it properly. Because as you mention, the remarks format appears to be used quite a bit and I imagine it probably grows in use at a decent rate. (I have no data to back this up though, it is just a guess)
massimo is more aware of the details of current deployment and how that affects the rirs. i suspect the load of the remarks: versions would be the same as that of geofeed:. the suggestion to use bulk fetch as opposed to object-by-object seems simple and the benefits should be pretty obvious i would think. see massimo's existing tooling (cited in the internet-draft) for implementation.

>> -The IETF doc 'Finding geofeeds' says "To minimize the load on RIR whois [RFC3912] services, use of the RIR's FTP [RFC0959] services SHOULD be the preferred access." Is the RIPE NCC expected to download all the geofeed files and make them available through their FTP service?
I don't quite interpret it like that, I rather interpret it as the RIPE NCC (and other RIRs) publishing a list of all prefixes and their geofeed URL. I imagine it like the delegated file, but just prefixes and geofeed URLs.
But I will say it seems a bit unclear, maybe Randy Bush or one of the other authors could comment on the intention here.
again, see the actual implementation referenced in the internet-draft
For access via WHOIS, I would say no as that would probably be way too complicated.
yep
The IETF doc 'Finding geofeeds': https://datatracker.ietf.org/doc/draft-ietf-opsawg-finding-geofeeds/
randy
---
randy@psg.com
`gpg --locate-external-keys --auto-key-locate wkd randy@psg.com`
signatures are back, thanks to dmarc header butchery
Hi all, On 06/04/2021 19:23, Randy Bush wrote:
I have CCed Randy Bush as I thought he might be able to clarify what was meant by the following:
To minimize the load on RIR whois [RFC3912] services, use of the RIR's FTP [RFC0959] services SHOULD be the preferred access. This also provides bulk access instead of fetching with a tweezers.
I think one of the most important things in general here is seeing what is within the scope of the db-wg to decide and what should probably be defined by the IETF spec. And also to try to get some kind of implementation out as quickly as possible while still doing it properly. Because as you mention, the remarks format appears to be used quite a bit and I imagine it probably grows in use at a decent rate. (I have no data to back this up though, it is just a guess)
massimo is more aware of the details of current deployment and how that affects the rirs. i suspect the load of the remarks: versions would be the same as that of geofeed:. the suggestion to use bulk fetch as opposed to object-by-object seems simple and the benefits should be pretty obvious i would think. see massimo's existing tooling (cited in the internet-draft) for implementation.
At the moment there are 1350 prefixes with geofeeds coming from remarks/comments, mostly RIPE and ARIN. This number more than doubled in the last ~4 months.

If a service is interested in importing all the geofeeds, the generic daily whois dumps produced by the RIRs are perfect, including the public anonymized ones, since we just need the remarks. The implementation [1] uses such dumps. To the best of my knowledge, the geo providers supporting our draft are also using such dumps (apparently some were already familiar with them).

Ciao,
Massimo

[1] https://github.com/massimocandela/geofeed-finder
Hi Massimo

Your data does not match the data I got from the RIPE NCC...

On Wed, 7 Apr 2021 at 01:37, Massimo Candela via db-wg <db-wg@ripe.net> wrote:
Hi all,
At the moment there are 1350 prefixes with geofeeds coming from remarks/comments, mostly RIPE and ARIN. This number more than doubled in the last ~4 months.
From the RIPE NCC:
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.

cheers
denis
co-chair DB-WG
If a service is interested in importing all the geofeeds, the generic daily whois dumps produced by the rirs are perfect---including the public anonymized ones, since we just need the remarks. The implementation [1] uses such dumps. To the best of my knowledge, the geo providers supporting our draft are also using such dumps (apparently some were already familiar with them).
Ciao, Massimo
Hi Denis, On 07/04/2021 02:02, denis walker wrote:
Your data does not match the data I got from the RIPE NCC...
From the RIPE NCC:
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
I cannot reproduce what you did. Even if I just "grep -i geofeed" in ripe.db.inetnum.gz from the ripe ncc ftp [1], I obtain only 132 items. And 39 in ripe.db.inet6num.gz. The same if I use the complete dump [2].

Is the data in the FTP wrong? Am I doing something wrong?

Ciao,
Massimo

[1] https://ftp.ripe.net/ripe/dbase/split/
[2] https://ftp.ripe.net/ripe/dbase/ripe.db.gz
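PS: in case it helps to reproduce the numbers, this is roughly the Python equivalent of that grep (it counts matching lines, like grep | wc -l; assumes the dumps are Latin-1 encoded):

    # Sketch: count lines mentioning geofeed in a RIPE split dump.
    import gzip

    count = 0
    with gzip.open("ripe.db.inetnum.gz", "rt", encoding="latin-1") as f:
        for line in f:
            if "geofeed" in line.lower():
                count += 1
    print(count)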
Hi Massimo

I just checked the numbers Ed gave me and I misread the message. These are the numbers of objects with a "geoloc:" attribute, not geofeed :(

cheers
denis
co-chair DB-WG

On Wed, 7 Apr 2021 at 02:56, Massimo Candela <massimo@us.ntt.net> wrote:
Hi Denis,
On 07/04/2021 02:02, denis walker wrote:
Your data does not match the data I got from the RIPE NCC...
From the RIPE NCC:
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
I cannot reproduce what you did. Even if I just "grep -i geofeed" in ripe.db.inetnum.gz from the ripe ncc ftp [1], I obtain only 132 items. And 39 in ripe.db.inet6num.gz. The same if I use the complete dump [2].
Is the data in the FTP wrong? Am I doing something wrong?
Ciao, Massimo
[1] https://ftp.ripe.net/ripe/dbase/split/ [2] https://ftp.ripe.net/ripe/dbase/ripe.db.gz
Colleagues

The chairs agree that there is a consensus to set up an NWI to create the "geoloc:" attribute in the RIPE Database. We therefore ask the RIPE NCC to set up "NWI-13 Create a "geoloc:" attribute in the RIPE Database", using the 'Problem statement' below. After the RIPE NCC completes its impact analysis we can finalise the 'Solution definition'. The RIPE NCC can address any of the questions raised in this discussion that they feel are relevant to the basic creation of this attribute.

cheers
denis
co-chair DB-WG

Problem statement

Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.

The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses, by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries.

The IETF is currently discussing an update to RPSL to add a new attribute, "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.

Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "geoloc:" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs. There are about 150 objects in the RIPE Database with a "remarks: geoloc url" attribute.

On Wed, 7 Apr 2021 at 04:29, denis walker <ripedenis@gmail.com> wrote:
Hi Massimo
I just checked the numbers Ed gave me and I misread the message. These are the numbers of objects with a "geoloc:" attribute not geofeed :(
cheers denis co-chair DB-WG
On Wed, 7 Apr 2021 at 02:56, Massimo Candela <massimo@us.ntt.net> wrote:
Hi Denis,
On 07/04/2021 02:02, denis walker wrote:
Your data does not match the data I got from the RIPE NCC...
From the RIPE NCC:
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
I cannot reproduce what you did. Even if I just "grep -i geofeed" in ripe.db.inetnum.gz from the ripe ncc ftp [1], I obtain only 132 items. And 39 in ripe.db.inet6num.gz. The same if I use the complete dump [2].
Is the data in the FTP wrong? Am I doing something wrong?
Ciao, Massimo
[1] https://ftp.ripe.net/ripe/dbase/split/ [2] https://ftp.ripe.net/ripe/dbase/ripe.db.gz
Colleagues

** corrected version getting the attribute names right **

The chairs agree that there is a consensus to set up an NWI to create the "geofeed:" attribute in the RIPE Database. We therefore ask the RIPE NCC to set up "NWI-13 Create a "geofeed:" attribute in the RIPE Database", using the 'Problem statement' below. After the RIPE NCC completes its impact analysis we can finalise the 'Solution definition'. The RIPE NCC can address any of the questions raised in this discussion that they feel are relevant to the basic creation of this attribute.

cheers
denis
co-chair DB-WG

Problem statement

Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.

The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses, by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries.

The IETF is currently discussing an update to RPSL to add a new attribute, "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.

Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "geoloc" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs. There are about 150 objects in the RIPE Database with a "remarks: geoloc url" attribute.

On Mon, 12 Apr 2021 at 17:56, denis walker <ripedenis@gmail.com> wrote:
Colleagues
The chairs agree that there is a consensus to set up an NWI to create the "geoloc:" attribute in the RIPE Database. We therefore ask the RIPE NCC to set up "NWI-13 Create a "geoloc:" attribute in the RIPE Database", using the 'Problem statement' below. After the RIPE NCC completes its impact analysis we can finalise the 'Solution definition'. The RIPE NCC can address any of the questions raised in this discussion that they feel are relevant to the basic creation of this attribute.
cheers denis co-chair DB-WG
Problem statement
Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.
The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries
The IETF is currently discussing an update to RPSL to add a new attribute "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via the "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "geoloc:" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs. There are about 150 objects in the RIPE Database with a "remarks: geoloc url" attribute.
On Wed, 7 Apr 2021 at 04:29, denis walker <ripedenis@gmail.com> wrote:
Hi Massimo
I just checked the numbers Ed gave me and I misread the message. These are the numbers of objects with a "geoloc:" attribute not geofeed :(
cheers denis co-chair DB-WG
On Wed, 7 Apr 2021 at 02:56, Massimo Candela <massimo@us.ntt.net> wrote:
Hi Denis,
On 07/04/2021 02:02, denis walker wrote:
Your data does not match the data I got from the RIPE NCC...
From the RIPE NCC:
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
I cannot reproduce what you did. Even if I just "grep -i geofeed" in ripe.db.inetnum.gz from the ripe ncc ftp [1], I obtain only 132 items. And 39 in ripe.db.inet6num.gz. The same if I use the complete dump [2].
Is the data in the FTP wrong? Am I doing something wrong?
Ciao, Massimo
[1] https://ftp.ripe.net/ripe/dbase/split/ [2] https://ftp.ripe.net/ripe/dbase/ripe.db.gz
Hi Denis,

I've added NWI-13 to the Numbered Work Items page, with a link to the Problem statement below:
https://www.ripe.net/manage-ips-and-asns/db/numbered-work-items

I'll get to work on an impact analysis.

Regards
Ed Shryane
RIPE NCC
On 12 Apr 2021, at 17:59, denis walker via db-wg <db-wg@ripe.net> wrote:
Colleagues
** corrected version getting the attribute names right **
The chairs agree that there is a consensus to set up an NWI to create the "geofeed:" attribute in the RIPE Database. We therefore ask the RIPE NCC to set up "NWI-13 Create a "geofeed:" attribute in the RIPE Database", using the 'Problem statement' below. After the RIPE NCC completes its impact analysis we can finalise the 'Solution definition'. The RIPE NCC can address any of the questions raised in this discussion that they feel are relevant to the basic creation of this attribute.
cheers denis co-chair DB-WG
Problem statement
Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.
The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries
The IETF is currently discussing an update to RPSL to add a new attribute "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via the "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "geoloc" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs. There are about 150 objects in the RIPE Database with a "remarks: geoloc url" attribute.
On Mon, 12 Apr 2021 at 17:56, denis walker <ripedenis@gmail.com> wrote:
Colleagues
The chairs agree that there is a consensus to set up an NWI to create the "geoloc:" attribute in the RIPE Database. We therefore ask the RIPE NCC to set up "NWI-13 Create a "geoloc:" attribute in the RIPE Database", using the 'Problem statement' below. After the RIPE NCC completes its impact analysis we can finalise the 'Solution definition'. The RIPE NCC can address any of the questions raised in this discussion that they feel are relevant to the basic creation of this attribute.
cheers denis co-chair DB-WG
Problem statement
Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.
The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries
The IETF is currently discussing an update to RPSL to add a new attribute "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via the "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "geoloc:" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs. There are about 150 objects in the RIPE Database with a "remarks: geoloc url" attribute.
On Wed, 7 Apr 2021 at 04:29, denis walker <ripedenis@gmail.com> wrote:
Hi Massimo
I just checked the numbers Ed gave me and I misread the message. These are the numbers of objects with a "geoloc:" attribute not geofeed :(
cheers denis co-chair DB-WG
On Wed, 7 Apr 2021 at 02:56, Massimo Candela <massimo@us.ntt.net> wrote:
Hi Denis,
On 07/04/2021 02:02, denis walker wrote:
Your data does not match the data I got from the RIPE NCC...
From the RIPE NCC:
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
I cannot reproduce what you did. Even if I just "grep -i geofeed" in ripe.db.inetnum.gz from the ripe ncc ftp [1], I obtain only 132 items. And 39 in ripe.db.inet6num.gz. The same if I use the complete dump [2].
Is the data in the FTP wrong? Am I doing something wrong?
Ciao, Massimo
[1] https://ftp.ripe.net/ripe/dbase/split/ [2] https://ftp.ripe.net/ripe/dbase/ripe.db.gz
Hello Denis, Colleagues,

Following is the impact analysis for the implementation of the "geofeed:" attribute in the RIPE database, based on the problem statement below and the draft RFC:
https://tools.ietf.org/html/draft-ymbk-opsawg-finding-geofeeds

I will ask our Legal team to conduct a full impact analysis of the implementation plan.

Please reply with corrections or suggestions.

Regards
Ed Shryane
RIPE NCC

Impact Analysis for Implementing the "geofeed:" Attribute
============================================================

"geoloc:" Attribute
----------------------
Implementing the "geofeed:" attribute does not affect the "geoloc:" attribute. No decision has been taken on the future of the "geoloc:" attribute, a review can be done at a later date.

"remarks:" Attribute
-----------------------
Existing "remarks:" attributes in INETNUM or INET6NUM object types containing a "geofeed: url" value will not be automatically converted to a "geofeed:" attribute.

The implementation will validate that an INETNUM or INET6NUM object may contain at most a single geofeed reference, either a "remarks:" attribute *or* a "geofeed:" attribute. More than one will result in an error on update.

Any "remarks:" attributes in other object types will not be validated for geofeed references.

"geofeed:" Attribute
-----------------------
The "geofeed:" attribute will be added to the INETNUM and INET6NUM object types. It will be an optional, singly occurring attribute.

The attribute value must consist only of a well-formed URL. Any other content in the attribute value will result in a syntax error.

"geofeed:" URL
-----------------
The URL in the "geofeed:" attribute will be validated that it is well-formed (i.e. syntactically correct).

The URL must use the ASCII character set only (in the RIPE database, non-Latin-1 characters will be substituted with a '?' character).

Non-ASCII characters in the URL host name must be converted to ASCII using Punycode in advance (before updating the RIPE database).

Non-ASCII characters in the URL path must be converted using Percent-encoding in advance.

Only the HTTPS protocol is supported in the URL, otherwise an error is returned.

The reachability of the URL will not be checked. The content of the URL will not be validated.

Database dump and Split files
----------------------------------
The "geofeed:" attribute will be included in the nightly database dump and split files.

NRTM
--------
The "geofeed:" attribute will be included in INETNUM and INET6NUM objects in the NRTM stream.

Whois Queries
-----------------
The "geofeed:" attribute will appear by default in (filtered) INETNUM and INET6NUM objects in Whois query responses, no additional query flag will be needed.

RDAP
-------------
The "geofeed:" attribute will not appear in RDAP responses. A separate RDAP profile will be needed to extend the response format to include geofeed. This can be implemented at a later date.

Documentation
---------------
The RIPE database documentation will be updated, including the inet(6)num object templates and attribute description (with a reference to the IETF draft document).

Other RIRs
-------------
There is currently no coordinated plan to implement "geofeed:" across regions. Other RIRs may implement "geofeed:" at a later date.

Legal Review
---------------
An initial review by the RIPE NCC Legal team found that geofeed data may qualify as personal data, and before introducing the "geofeed:" attribute a full impact analysis of its implementation would have to be conducted by the RIPE NCC.

-----
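As an illustration of the encoding rules above, this is roughly the preparation a maintainer would do before submitting a URL (a sketch with made-up values, not the Whois implementation itself):

    # Sketch: pre-encode a geofeed URL so it is HTTPS and pure ASCII.
    from urllib.parse import urlsplit, urlunsplit, quote

    def prepare_geofeed_url(url):
        parts = urlsplit(url)
        if parts.scheme != "https":
            raise ValueError("only HTTPS URLs are accepted")
        # Non-ASCII host name -> Punycode; non-ASCII path -> percent-encoding.
        # (Assumes a host with no port/userinfo; query/fragment left as-is.)
        host = parts.hostname.encode("idna").decode("ascii")
        path = quote(parts.path, safe="/")
        prepared = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
        prepared.encode("ascii")  # must now be pure ASCII
        return prepared

    # "https://bücher.example/städte.csv"
    # -> "https://xn--bcher-kva.example/st%C3%A4dte.csv"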
On 12 Apr 2021, at 17:59, denis walker via db-wg <db-wg@ripe.net> wrote:
Colleagues
** corrected version getting the attribute names right **
The chairs agree that there is a consensus to set up an NWI to create the "geofeed:" attribute in the RIPE Database. We therefore ask the RIPE NCC to set up "NWI-13 Create a "geofeed:" attribute in the RIPE Database", using the 'Problem statement' below. After the RIPE NCC completes its impact analysis we can finalise the 'Solution definition'. The RIPE NCC can address any of the questions raised in this discussion that they feel are relevant to the basic creation of this attribute.
cheers denis co-chair DB-WG
Problem statement
Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.
The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries
The IETF is currently discussing an update to RPSL to add a new attribute "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via the "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "geoloc" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs. There are about 150 objects in the RIPE Database with a "remarks: geoloc url" attribute.
On Mon, 12 Apr 2021 at 17:56, denis walker <ripedenis@gmail.com> wrote:
Colleagues
The chairs agree that there is a consensus to set up an NWI to create the "geoloc:" attribute in the RIPE Database. We therefore ask the RIPE NCC to set up "NWI-13 Create a "geoloc:" attribute in the RIPE Database", using the 'Problem statement' below. After the RIPE NCC completes its impact analysis we can finalise the 'Solution definition'. The RIPE NCC can address any of the questions raised in this discussion that they feel are relevant to the basic creation of this attribute.
cheers denis co-chair DB-WG
Problem statement
Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.
The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries
The IETF is currently discussing an update to RPSL to add a new attribute "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via the "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "geoloc:" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs. There are about 150 objects in the RIPE Database with a "remarks: geoloc url" attribute.
On Wed, 7 Apr 2021 at 04:29, denis walker <ripedenis@gmail.com> wrote:
Hi Massimo
I just checked the numbers Ed gave me and I misread the message. These are the numbers of objects with a "geoloc:" attribute not geofeed :(
cheers denis co-chair DB-WG
On Wed, 7 Apr 2021 at 02:56, Massimo Candela <massimo@us.ntt.net> wrote:
Hi Denis,
On 07/04/2021 02:02, denis walker wrote:
Your data does not match the data I got from the RIPE NCC...
From the RIPE NCC:
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
I cannot reproduce what you did. Even if I just "grep -i geofeed" in ripe.db.inetnum.gz from the ripe ncc ftp [1], I obtain only 132 items. And 39 in ripe.db.inet6num.gz. The same if I use the complete dump [2].
Is the data in the FTP wrong? Am I doing something wrong?
Ciao, Massimo
[1] https://ftp.ripe.net/ripe/dbase/split/ [2] https://ftp.ripe.net/ripe/dbase/ripe.db.gz
Hi Edward,

Perfect! Thanks.

On 04/05/2021 22:35, Edward Shryane via db-wg wrote:
Hello Denis, Colleagues,
Following is the impact analysis for the implementation of the "geofeed:" attribute in the RIPE database, based on the problem statement below and the draft RFC: https://tools.ietf.org/html/draft-ymbk-opsawg-finding-geofeeds
I will ask our Legal team to conduct a full impact analysis of the implementation plan.
Please reply with corrections or suggestions.
Regards Ed Shryane RIPE NCC
Impact Analysis for Implementing the "geofeed:" Attribute ============================================================
"geoloc:" Attribute ---------------------- Implementing the "geofeed:" attribute does not affect the "geoloc:" attribute. No decision has been taken on the future of the "geoloc:" attribute, a review can be done at a later date.
"remarks:" Attribute ----------------------- Existing "remarks:" attributes in INETNUM or INET6NUM object types containing a "geofeed: url" value will not be automatically converted to a "geofeed:" attribute.
The implementation will validate that an INETNUM or INET6NUM object may contain at most a single geofeed reference, either a "remarks:" attribute *or* a "geofeed:" attribute. More than one will result in an error on update.
Any "remarks:" attributes in other object types will not be validated for geofeed references.
"geofeed:" Attribute ----------------------- The "geofeed:" attribute will be added to the INETNUM and INET6NUM object types. It will be an optional, singly occurring attribute.
The attribute value must consist only of a well-formed URL. Any other content in the attribute value will result in a syntax error.
"geofeed:" URL ----------------- The URL in the "geofeed:" attribute will be validated that it is well-formed (i.e. syntactically correct).
The URL must use the ASCII character set only (in the RIPE database, non-Latin-1 characters will be substituted with a '?' character).
Non-ASCII characters in the URL host name must be converted to ASCII using Punycode in advance (before updating the RIPE database).
Non-ASCII characters in the URL path must be converted using Percent-encoding in advance.
Only the HTTPS protocol is supported in the URL, otherwise an error is returned.
The reachability of the URL will not be checked. The content of the URL will not be validated.
Database dump and Split files ---------------------------------- The "geofeed:" attribute will be included in the nightly database dump and split files.
NRTM -------- The "geofeed:" attribute will be included in INETNUM and INET6NUM objects in the NRTM stream.
Whois Queries ----------------- The "geofeed:" attribute will appear by default in (filtered) INETNUM and INET6NUM objects in Whois query responses, no additional query flag will be needed.
RDAP ------------- The "geofeed:" attribute will not appear in RDAP responses. A separate RDAP profile will be needed to extend the response format to include geofeed. This can be implemented at a later date.
Documentation --------------- The RIPE database documentation will be updated, including the inet(6)num object templates and attribute description (with a reference to the IETF draft document).
Other RIRs ------------- There is currently no coordinated plan to implement "geofeed:" across regions. Other RIRs may implement "geofeed:" at a later date.
Legal Review --------------- An initial review by the RIPE NCC Legal team found that geofeed data may qualify as personal data, and before introducing the "geofeed:" attribute a full impact analysis of its implementation would have to be conducted by the RIPE NCC.
-----
On 12 Apr 2021, at 17:59, denis walker via db-wg <db-wg@ripe.net> wrote:
Colleagues
** corrected version getting the attribute names right **
The chairs agree that there is a consensus to set up an NWI to create the "geofeed:" attribute in the RIPE Database. We therefore ask the RIPE NCC to set up NWI-13, "Create a 'geofeed:' attribute in the RIPE Database", using the 'Problem statement' below. After the RIPE NCC completes its impact analysis we can finalise the 'Solution definition'. The RIPE NCC can address any of the questions raised in this discussion that they feel are relevant to the basic creation of this attribute.
cheers denis co-chair DB-WG
Problem statement
Associating an approximate physical location with an IP address has proven to be a challenge to solve within the current constraints of the RIPE Database. Over the years the community has chosen to consider addresses in the RIPE Database to relate to entities in the assignment process itself, not the subsequent actual use of IP addresses after assignment.
The working group is asked to consider whether the RIPE Database can be used as a springboard for parties wishing to correlate geographical information with IP addresses by allowing structured references in the RIPE Database towards information outside the RIPE Database which potentially helps answer Geo IP Location queries
The IETF is currently discussing an update to RPSL to add a new attribute "geofeed: url". The url will reference a csv file containing location data. Some users have already started to make use of this feature via the "remarks: geofeed: url". It is never a good idea to try to overload structured data into the free format "remarks:" attribute. This has been done in the past, for example with abuse contact details before we introduced the "abuse-c:" attribute. There is no way to regulate what database users put into "remarks:" attributes. So even if the new "geofeed:" attribute is not agreed, the url data will still be included in the RIPE Database.
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "geoloc:" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs. There are about 150 objects in the RIPE Database with a "remarks: geoloc url" attribute.
On Wed, 7 Apr 2021 at 04:29, denis walker <ripedenis@gmail.com> wrote:
Hi Massimo
I just checked the numbers Ed gave me and I misread the message. These are the numbers of objects with a "geoloc:" attribute not geofeed :(
cheers denis co-chair DB-WG
On Wed, 7 Apr 2021 at 02:56, Massimo Candela <massimo@us.ntt.net> wrote:
Hi Denis,
On 07/04/2021 02:02, denis walker wrote:
Your data does not match the data I got from the RIPE NCC...
From the RIPE NCC:
Currently there are 24,408 INETNUM and 516,354 INET6NUM objects containing a "remarks: geofeed: url" attribute in the database. These have 7,731 distinct values in the INETNUMs and 1,045 distinct values in the INET6NUMs.
I cannot reproduce what you did. Even if I just "grep -i geofeed" in ripe.db.inetnum.gz from the ripe ncc ftp [1], I obtain only 132 items. And 39 in ripe.db.inet6num.gz. The same if I use the complete dump [2].
Is the data in the FTP wrong? Am I doing something wrong?
Ciao, Massimo
[1] https://ftp.ripe.net/ripe/dbase/split/ [2] https://ftp.ripe.net/ripe/dbase/ripe.db.gz
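For anyone who wants to reproduce such a count themselves, a rough Python sketch along the lines of Massimo's grep (counting matching lines in the split inetnum dump, not distinct objects) might be:

import gzip
import urllib.request

# Count "remarks:" lines mentioning geofeed in the split inetnum dump.
# Note: this counts matching lines, much like "grep -i geofeed",
# not distinct objects. The dump file is large.
URL = "https://ftp.ripe.net/ripe/dbase/split/ripe.db.inetnum.gz"

path, _ = urllib.request.urlretrieve(URL)
count = 0
with gzip.open(path, "rt", encoding="latin-1") as f:
    for line in f:
        if line.lower().startswith("remarks:") and "geofeed" in line.lower():
            count += 1
print(count)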
Hi Ed, This looks good to me :) -Cynthia
On 05/05/2021 21:11, Cynthia Revström via db-wg wrote:

I'd like to ask for a clarification to section 4 specifically:

"Both the address ranges of the signing certificate and of the inetnum: MUST cover all prefixes in the geofeed file; and the address range of the signing certificate must cover that of the inetnum:. An address range A 'covers' address range B if the range of B is identical to or a subset of A. 'Address range' is used here because inetnum: objects and RPKI certificates need not align on CIDR prefix boundaries, while those of geofeed lines must."

What if you have a /16 as recorded by inetnum: as well as an RPKI certificate for that /16 but within the /16 there is a /24 that has been assigned to some other ASN? Can you publish a geofeed file for the /16?

What if there is no inetnum: listed for that /24 yet in the global BGP tables there is an announcement of that /24 from a different ASN - would you still accept the geofeed announcement for the /16 based on inetnum: and RPKI cert?

Regards, Hank
Hi Hank, On 06/05/2021 07:18, Hank Nussbacher via db-wg wrote:
What if you have a /16 as recorded by inetnum: as well as an RPKI certificate for that /16 but within the /16 there is a /24 that has been assigned to some other ASN? Can you publish a geofeed file for the /16?
What if there is no inetnum: listed for that /24 yet in the global BGP tables there is an announcement of that /24 from a different ASN - would you still accept the geofeed announcement for the /16 based on inetnum: and RPKI cert?
ASNs and BGP announcements do not come into play. If a /16 inetnum has a geofeed link, the file it points to can specify entries covering the /16. If a /24 inetnum with a geofeed link exists, this takes priority for that /24 portion. The (optional) RPKI signature can be used -after- the inetnum hierarchy is resolved, to verify ownership of the prefix. Ciao, Massimo
Hi Massimo

Does this mean geofeed values are only meaningful when used as a collection and not individually? Take this example with a /16 and a more specific /24 both having a geofeed attribute. If I query for an address within the /16 but outside of the more specific /24 I will get the INETNUM object for the /16 with its geofeed attribute. But the data contained in the referenced file may not be correct for the more specific /24 range which has its own referenced file.

So for anyone using this data, do you have to query for all objects containing a geofeed attribute and download all the referenced files to be sure of having correct information? Having downloaded all the data you would then have to correlate the data to check for overlapping values, discarding the less specific data for the /24 in this example. Is this how it works or have I missed something?

cheers denis co-chair DB-WG
Hi Denis, On 06/05/2021 23:17, denis walker wrote:
Does this mean geofeed values are only meaningful when used as a collection and not individually? Take this example with a /16 and a more specific /24 both having a geofeed attribute. If I query for an address within the /16 but outside of the more specific /24 I will get the INETNUM object for the /16 with its geofeed attribute. But the data contained in the referenced file may not be correct for the more specific /24 range which has its own referenced file.
The fact that you queried a specific IP doesn't allow you to conclude anything about the entire prefix; the same goes for an inetnum and its more specifics. This is also the case if you provide geofeed files to a geolocation provider directly without going through the whois. In that case they will have to validate the boundaries. In the address space there is no way to assign a property to an entire prefix without checking for more specifics.
So for anyone using this data do you have to query for all objects containing a geofeed attribute and download all the referenced files to be sure of having correct information? Having downloaded all the data you would then have to correlate the data to check for overlapping values, discarding the less specific data for the /24 in this example. Is this how it works or have I missed something?
The process is quite simple. If you are interested in a specific IP, you can just query for the IP, retrieve the geofeed and read the location. If you want to get everything:

(1) parse the bulk whois data and get all the inetnums having a geofeed;
(2) download all the geofeed files and remove all the prefixes not contained in the parent inetnum;
(3) accept all geofeed entries which are coming from the most specific parent inetnum.

The draft provides more details. Example of point 3:

parent-inetnum: 1.2.0.0/16
geofeed-entry: 1.2.3.5,IT,IT-RM,Rome,

parent-inetnum: 1.2.3.0/24
geofeed-entry: 1.2.3.5,IT,IT-MI,Milan,

1.2.3.5 is in Milan due to the more specific parent inetnum.

Ciao, Massimo
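A minimal Python sketch of this most-specific-wins rule, using Massimo's hypothetical /16 and /24 example (the data structure is invented for illustration; a real consumer would build it from the bulk whois data and the downloaded geofeed files):

import ipaddress

# Hypothetical data built from steps 1 and 2: each inetnum prefix maps to
# the geofeed entries its referenced file contains for addresses it covers.
geofeeds = {
    ipaddress.ip_network("1.2.0.0/16"): {"1.2.3.5": "IT,IT-RM,Rome"},
    ipaddress.ip_network("1.2.3.0/24"): {"1.2.3.5": "IT,IT-MI,Milan"},
}

def locate(ip_str):
    """Step 3: the most specific covering inetnum with an entry wins."""
    ip = ipaddress.ip_address(ip_str)
    candidates = [
        (net.prefixlen, entries[ip_str])
        for net, entries in geofeeds.items()
        if ip in net and ip_str in entries
    ]
    # Longest prefix (most specific parent inetnum) wins.
    return max(candidates)[1] if candidates else None

print(locate("1.2.3.5"))  # IT,IT-MI,Milan -- the /24 overrides the /16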
On 06/05/2021 21:27, Massimo Candela via db-wg wrote:
Hi Hank,
On 06/05/2021 07:18, Hank Nussbacher via db-wg wrote:
What if you have a /16 as recorded by inetnum: as well as an RPKI certificate for that /16 but within the /16 there is a /24 that has been assigned to some other ASN? Can you publish a geofeed file for the /16?
What if there is no inetnum: listed for that /24 yet in the global BGP tables there is an announcement of that /24 from a different ASN - would you still accept the geofeed announcement for the /16 based on inetnum: and RPKI cert?
ASNs and BGP announcements do not come into play.
That is what I too would have thought. But Google, which wrote RFC8805, sees it differently. I have submitted to Google via their ISP Portal the following geo-feed file: http://noc.ilan.net.il/GGC/iucc-geo-feed-for-google.csv

Google accepted 15 of 16 prefixes from AS378 but *rejected* 128.139.0.0/16 since there is a BGP announcement from AS8551 for 128.139.194.0/24. Google's solution of "split the ranges in the feed to not include the subranges announced by other ASNs" seems to interpret RFC8805 differently than previous RFCs. From RFC8805 section 3.2:

"A consumer should only trust geolocation information for IP addresses or prefixes for which the publisher has been verified as administratively authoritative. All other geolocation feed entries should be ignored and logged for further administrative review."

AS8551 cannot be administratively authoritative for 128.139.194.0/24 simply because they find such a prefix in the global BGP table. Common whois checks (whois.ripe.net) determine who is authoritative for 128.139.0.0/16 as well as for 128.139.194.0/24. It would appear, based on Google's interpretation and implementation of RFC8805, that they nullify the RIR registry mechanism and base administrative authority solely on what appears in the BGP table. I believe this is incorrect and I have emailed the Google authors of RFC8805 and await a response from them.

I am wondering whether others who will implement RFC8805 will also use BGP routing tables as authoritative for geo-location checks.

Regards, Hank
If a /16 inetnum has a geofeed link, the file it points to can specify entries covering the /16. If a /24 inetnum with a geofeed link exists, this takes priority for that /24 portion. The (optional) RPKI signature can be used -after- the inetnum hierarchy is resolved, to verify ownership of the prefix.
Ciao, Massimo
Hi Hank, On 07/05/2021 07:17, Hank Nussbacher via db-wg wrote:
From RFC8805 section 3.2:
A consumer should only trust geolocation information for IP addresses or prefixes for which the publisher has been verified as administratively authoritative.
I am wondering whether others who will implement RFC8805 will also use BGP routing tables as authoritative for geo-location checks.
You started a good topic. The main goal of RFC8805 is to describe a file format; the validation of resource ownership is up to the consumer. At least that's my interpretation. At the moment, this is done in various ways (including some "original" ones). However, what we are proposing builds on top of RFC8805 exactly with the goal of providing a defined and easy way to correctly retrieve such data. Please see https://tools.ietf.org/html/draft-ietf-opsawg-finding-geofeeds-06 Open implementations, such as [1], will further ease this process. Ciao, Massimo [1] https://github.com/massimocandela/geofeed-finder
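As a rough sketch of what a consumer does with an RFC8805 file once it has found the URL (the URL here is a placeholder, and real tools such as geofeed-finder also perform boundary and signature validation):

import csv
import io
import urllib.request

# Placeholder URL; a real consumer would take it from the "geofeed:"
# attribute (or "remarks: geofeed:") of the covering inetnum.
URL = "https://example.net/geofeed.csv"

def fetch_geofeed(url):
    """Yield (prefix, country, region, city) tuples from an RFC8805 CSV,
    skipping comment and blank lines."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].lstrip().startswith("#"):
            continue  # comment or empty line
        prefix, country, region, city = (row + [""] * 4)[:4]
        yield prefix.strip(), country.strip(), region.strip(), city.strip()

for entry in fetch_geofeed(URL):
    print(entry)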
I am using https://apps.db.ripe.net/db-web-ui/fulltextsearch to do a full text search to return just the inetnum records, but if I get 800 hits I see only 10 entries per screen. How can I see all 800 hits at once? Thanks for any clue provided. Regards, Hank
Hi Hank, The webpage is restricted to a maximum of 10 pages with 10 matches on each page, but if you don't mind reading XML or JSON you can make the call directly:

XML: https://apps.db.ripe.net/db-web-ui/api/rest/fulltextsearch/select?facet=true&hl=true&q=(TEST)&start=0&rows=800
JSON: https://apps.db.ripe.net/db-web-ui/api/rest/fulltextsearch/select.json?facet=true&hl=true&q=(TEST)&start=0&rows=100

In this example I'm searching for "TEST". Add a "rows" query parameter to limit matches, and ".json" to the end of the path to specify a JSON response.

The upcoming Whois 1.101 release exposes this API directly via https://rest.db.ripe.net/ and we will document it fully. Also we plan to improve the query page in the coming months; we will look at integrating full text search and allowing more results.

Regards Ed Shryane RIPE NCC
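For example, the JSON call above could be scripted like this (the response structure is not asserted here, only fetched; note Ed's follow-up below that the backend currently caps results at 100):

import json
import urllib.parse
import urllib.request

# Sketch of the JSON full text search call described above. The backend
# currently caps results at 100, whatever "rows" says.
params = urllib.parse.urlencode(
    {"facet": "true", "hl": "true", "q": "(TEST)", "start": 0, "rows": 100}
)
url = ("https://apps.db.ripe.net/db-web-ui/api/rest/fulltextsearch/"
       "select.json?" + params)
with urllib.request.urlopen(url) as resp:
    result = json.load(resp)
print(list(result))  # inspect the top-level structure of the response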
Hi Hank, I spoke too soon, Whois still restricts the maximum results to 100. I'll investigate how to retrieve more results and get back to you. Regards Ed
Hi Hank,
Apologies for my delay in replying, I wanted to be sure, but there is currently a hard limit of 100 results in the backend API. We introduced this limit because the full text search engine is much more resource intensive than the database, and it was possible to crash Whois by querying for too many objects. We set a relatively low limit as we found that most users only requested the first page of results. The full text search API is currently only used by the DB web application, and we haven't opened it yet for general use. When we improve the query page in the upcoming months, we will also review the backend API, open it for general use and increase the query limit. Regards Ed Shryane RIPE NCC
On 18/06/2021 00:38, Edward Shryane wrote:

This helps as a start. I had played with the standard search: http://rest.db.ripe.net/search?type-filter=inetnum&query-string=TEST

but as per the documentation on github: https://github.com/RIPE-NCC/whois/wiki/WHOIS-REST-API-search

"type-filter Optional. If specified the results will be filtered by object-type, multiple type-filters can be specified."

it doesn't seem to work, since I am getting object-types other than inetnum (such as role or person). Unless I am doing something wrong. Can you clue me in to the proper syntax? Will type-filter=inetnum work on fulltextsearch as well?

Incidentally, the first and second examples on that RIPE github page: http://rest.db.ripe.net/search?inverse-attribute=org&type-filter=inetnum&source=ripe&query-string=ORG-NCC1-RIPE

result in:

ERROR:101: no entries found
No entries found in source %s.

Seems like source=ripe is causing issues.

Thanks, Hank
Hi Hank,
On 18 Jun 2021, at 08:32, Hank Nussbacher <hank@interall.co.il> wrote:
On 18/06/2021 08:36, Hank Nussbacher wrote:
As I delve deeper I am more confused. I went to: https://www.ripe.net/manage-ips-and-asns/db/support/querying-the-ripe-database#4--query-multiple-databases-with-the-global-resource-service and wanted to search not just RIPE but all RIRs.
This section of the documentation is not clear. There are two ways to search all databases: (1) The "--resource" flag, you don't need to specify all database names, but the search key must be a resource (i.e., an IP prefix/range or an AS number). (2) The "--sources" flag, you do need to specify each database name, and the search key can be a text string (e.g. an organisation name).
So I started with a simple search like: https://apps.db.ripe.net/db-web-ui/query?bflag=true&dflag=false&rflag=true&searchtext=marlink&source=RIPE&types=organisation
1 hit. Very nice. Easy peasy.
The query page searches the RIPE database only by default (the "--sources RIPE" query flag is always specified), and doesn't use the "--resource" flag.
So I now try all RIRs: https://apps.db.ripe.net/db-web-ui/query?bflag=true&dflag=false&rflag=true&searchtext=marlink&source=GRS&types=organisation

No hits. Shouldn't I have had at least 1 hit from RIPE for this search?
Choosing the "Search resource objects in all available databases" radio button on the query page, forces the "--resource" flag, and not the "--sources" flag. So you can't search all databases for a text string, only for a resource, because the "--resources" flag is used for all databases, and not the ""--sources" flag. As a workaround, you can include the "--sources" flag yourself in the input field, e.g. https://apps.db.ripe.net/db-web-ui/query?searchtext=marlink&types=organisation&rflag=true&source=RIPE&source=APNIC-GRS&source=AFRINIC-GRS&source=ARIN-GRS&source=LACNIC-GRS&bflag=true To be clear, we should improve this behaviour to allow searching all databases for text, not only for a resource. Regards Ed Shryane RIPE NCC
Regards, Hank
Hi Hank, You can also use the "-a" or "--all-sources" flag instead of enumerating all possible sources with multiple "--sources" flags. Regards Ed
Hi Hank,
On 18 Jun 2021, at 07:36, Hank Nussbacher <hank@interall.co.il> wrote:
On 18/06/2021 00:38, Edward Shryane wrote:
This helps as a start. I had played with standard search: http://rest.db.ripe.net/search?type-filter=inetnum&query-string=TEST but as per the documentation in github: https://github.com/RIPE-NCC/whois/wiki/WHOIS-REST-API-search with: type-filter Optional. If specified the results will be filtered by object-type, multiple type-filters can be specified.
doesn't seem to work since I am getting object-types other than inetnum (such as role or person). Unless I am doing something wrong. Can you clue me in to the proper syntax? Will type-filter=inetnum work on fulltextsearch as well?
Related objects are returned by default (this is also the behaviour on port 43: "-T inetnum TEST" will return related person/role objects). You need to add the "-r" flag to switch *off* referenced object lookup: "-T inetnum -r TEST" Or using the REST API, add the "flags=r" query parameter: http://rest.db.ripe.net/search?type-filter=inetnum&query-string=TEST&flags=r
Incidentally, the first and second examples on that RIPE github page: http://rest.db.ripe.net/search?inverse-attribute=org&type-filter=inetnum&source=ripe&query-string=ORG-NCC1-RIPE results in: ERROR:101: no entries found No entries found in source %s. Seems like source=ripe is causing issues.
The example uses the RIPE NCC organisation with org-type: RIR, that isn't referenced from any inetnum resources, which isn't very useful! A better example is RIPE NCC with org-type: LIR (ORG-RIEN1-RIPE) that *does* return inetnums: http://rest.db.ripe.net/search?inverse-attribute=org&type-filter=inetnum&source=ripe&query-string=ORG-RIEN1-RIPE I'll correct the examples, thanks for pointing it out. Regards Ed Shryane RIPE NCC
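A small script exercising the two corrected queries above might look like this (assuming only the documented parameters shown in Ed's URLs):

import urllib.parse
import urllib.request

# Exercise the two corrected queries above against the REST API.
BASE = "http://rest.db.ripe.net/search"

def search(params: dict) -> str:
    url = BASE + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# Only inetnum objects, with referenced person/role lookup switched off:
inetnums_only = search(
    {"type-filter": "inetnum", "query-string": "TEST", "flags": "r"}
)

# Inetnums referencing the RIPE NCC LIR organisation:
by_org = search({
    "inverse-attribute": "org",
    "type-filter": "inetnum",
    "source": "ripe",
    "query-string": "ORG-RIEN1-RIPE",
})
print(inetnums_only[:200])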
We are working on a PoC in regards to DR with AWS. We are doing BYOIP and were asked to create an ROA record, which I can easily understand. But AWS also requests an X.509 certificate as per: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-byoip.html which needs to be added to a new "descr:" tag.

They state in their page: "When you provision an address range for use with AWS, you are confirming that you control the address range and are authorizing Amazon to advertise it. We also verify that you control the address range through a signed authorization message. This message is signed with the self-signed X.509 key pair that you used when updating the RDAP record with the X.509 certificate. AWS requires a cryptographically signed authorization message that it presents to the RIR. The RIR authenticates the signature against the certificate that you added to RDAP, and checks the authorization details against the ROA."

Why isn't creating an ROA proof enough that I control the address range? Why are 2 forms of authentication needed (ROA & X.509)? What will happen to the pollution of the "descr:" tag if others like Azure and GCP decide on something similar? Should the community form a standard rather than let the "descr:" field become polluted?

Regards, Hank
Dear Hank, On Tue, Apr 12, 2022 at 06:23:45PM +0300, Hank Nussbacher via db-wg wrote:
Why isn't creating an ROA proof enough that I control the address range?
The extra step is needed because the ROA could also have been created by another entity attempting to BYOIP some space into a provider! Imagine there being two separate accounts (unrelated to each other), each instructs the cloud provider "10.0.0.0/24 is my prefix, can you originate it on my behalf, I create a ROA". The cloud provider has to figure out which of the two accounts is truthful, and which is attempting to trick the cloud provider into announcing that space and routing it to the fraudulent account holder's virtual resources.
Why 2 forms of authentication needed (ROA & X.509)? What will happen to the pollution of the descr tag if others like Azure and GCP decide on something similar?
I'm not concerned about pollution, but indeed it doesn't look super pretty.
Should the community form a standard rather than let the descr field become polluted?
I have good news! Work already is in progress to help with challenges like the above one! :-)

A concept known as "RSC" (RPKI Signed Checklists) might be useful to restructure cloud onboarding procedures for BYOIP:

1) request the IP holder to create a ROA. This way the cloud provider knows they are authorized to originate space.
2) tell the IP holder a secret random string.
3) request the IP holder to create a RSC (this probably would happen via the RIR RPKI dashboard) in which the IP space and the random string are bound to each other.
4) the RSC is uploaded/emailed to the cloud provider, who can verify the signature. This way the cloud provider knows that whoever tried to initiate the BYOIP process also has access to a keypair capable of making attestations.

RSC objects are *not* distributed through the global RPKI repository system; this is neat because this way the RSC concept does not impose a burden on it.

The Internet-Draft is heading towards IETF "Working Group Last Call" https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-rpki-rsc

Kind regards, Job
Thanks for the extensive note Denis, thanks Cynthia for being first-responder. I wanted to jump in on a specific subthread. On Tue, Apr 06, 2021 at 06:38:29PM +0200, Cynthia Revström via db-wg wrote:
Questions:
-Should the database software do any checks on the existence/reachability of the url as part of the update with an error if the check fails?
I would say yes as this is not a new concept to the DB as I believe this is already done with domain objects.
I disagree on this one point: what is the RIPE DB supposed to do when it discovers one state or another? Should the URIs be probed from many vantage points to compare? Once you try to monitor if something is up or down it can quickly become complicated. The 'geofeed:' attribute value references content outside the RIPE DB, which means the RIPE DB software should not be crawling it. All RIPE NCC's DB software needs to check is whether the string's syntax conforms to the HTTPS URI scheme.
-Should the RIPE NCC do any periodic repeat checks on the continued existence/reachability of the url?
I would say that checking once a month or so could be fine, as long as it just results in a nudge email. Like don't enforce it, but nudge people if it is down.
It seems an unnecessary burden for RIPE NCC's business to check whether a given website is up or down. What is such nudging supposed to accomplish? It might end up being busy work if done by an individual RIR.
-Should the RIPE NCC do any periodic checks on the content structure of the csv file referenced by the url?
I don't have a strong opinion either way here but I feel like that is not really something the NCC is responsible for checking. But if the NCC should check then my comments about the repeat reachability checks above apply here too.
The RIPE NCC should not check random URIs, they are not the GeoIP police ;-) Kind regards, Job
Hi Job,

I just want to clarify my stance here. With regards to the verification of the URL, my opinion was that it might be helpful to prevent typos etc., and unless the geofeed attribute is updated, it doesn't need to be validated imo.

This is also not a very "strong opinion", it was more my initial thoughts on the thing, but I don't really care that much if there are reasons to do it another way.

-Cynthia
Hi guys

I've changed the subject as it goes a bit off topic, becomes more general and reaches out beyond just the DB-WG. I've been going to say this for a while but never got round to it until now. Apologies for saying it in response to your email Job, but it's not directed at you.

There are two phrases that frustrate me every time I see them used: "The RIPE NCC is not the 'xyz' police" and "It's not the job of the RIPE NCC to do 'abc'". These are just dramatised ways of saying no to something. But the drama doesn't really add anything. No one is expecting the RIPE NCC to investigate any crimes or arrest anyone. They are not the 'geoip police', the 'internet police', the 'abuse police'. So what are they?

I think everyone would agree that what the RIPE NCC does today is not the same as what they did when they first started in business. So the job that they do has changed. Their role or mandate has grown, expanded, contracted, moved sideways, diversified, etc. Every time they started to do something different or new, it could have been said (and maybe was said) that it was not their job to do that. But they are doing it now anyway. So I would rather turn these infamous statements round and be positive instead of negative. Let's stop saying what it's not their job to do and ask if it is, or should/could it be, their job to do something helpful or beneficial.

The internet technical infrastructure is like a whole ecosystem now. Lots of different elements all working together and managed or controlled by large numbers of organisations. If anyone wants to have a good life in this cyber world, all parts of this ecosystem need to be operating well. Many of these elements have no checks or monitoring. They run on trust. Trust is hard to build and easy to lose. Once people lose trust in one element they start to call it a swamp, say it's inaccurate, useless, needs to be replaced. These comments have often been made about the RIPE Database as a whole, often by people partly responsible for its content. It's also been said about parts of the content like abuse contacts. It could end up being said about geofeed data.

One of the reasons people use to justify these infamous statements is the cost or complexity of doing something. They think doing checks needs FTEs sitting behind desks doing laborious tasks. That costs money for the members. They forget this is the 21st century. We have learned now how to use computers to do these tasks for us.

Abuse contact checking is a good example. Every proposal to do anything in this area is repeatedly hit with these infamous statements and more. Perhaps because the technical checks now being done are done the wrong way. If an email address fails the checks it triggers manual intervention requiring an FTE to schedule an ARC with the resource holder and follow up discussions. This should be fully automated. If a monthly check fails, software should send an email to the registered contact for the resource holder. If n monthly checks fail, the ORGANISATION object in the RIPE Database should be tagged as having an invalid abuse contact. That information should be available for anyone to see. Public disclosure can be the penalty for failing to handle abuse. People can then make informed decisions.

How does this affect geofeed? The same principles apply here. What we have now is a handful of companies providing geolocation data. I am sure they put a lot of effort into ensuring their data is accurate.
This geofeed attribute will delegate this information process out to thousands of organisations. Some of these will put a lot of effort into ensuring their data is valid and accurate. Some may put less effort in, especially over time. If a proportion of this data starts to degrade over time, is shown to be inaccurate or syntactically invalid, trust in the whole system dies. If checks and tests can be done to validate the data in any way, it may help to keep it up to date and accurate.

If each RIR maintains a list of geofeed urls in a file on the FTP site, each RIR can check availability of those urls each month for all the RIRs' lists. I don't know if checks from 5 locations is enough. Maybe a third party system can be used for the 'is it up' check? Any repeated failures can be notified to the resource holder's contact. If each RIR downloads the files for their region they can check the syntax, check for conflicting data in multiple files within a hierarchy, etc. Any failures can be reported to the contact. All of this can be automated. If any repeated errors are not fixed, the geofeed data in the RIPE Database can again be tagged as invalid or suspect. When anyone accesses this data it comes with a red flag. It is up to them if they will trust any of that data file.

For both abuse contacts and geofeed, a system can be set up for (trusted) users to report problems. Maybe abuse contacts that are valid but never resolve any reported issues. Or geofeed data that is known to be inaccurate. By adding appropriate tags to the meta data in the RIPE Database, which can be publicly viewed, this becomes a reputational system. Overall it would improve the quality of data available in or through the RIPE Database, which improves the value of the services. There may be other elements in the database that could benefit from this type of tagging and reporting.

I see the RIPE NCC as being in a good position to do these types of checks and tests. It would not be the RIPE Database software doing the checks, but an additional RIPE NCC service. Minimal costs with fully automated checks can give added benefits. I think it is their job to do this for the good of the internet.

cheers denis co-chair DB-WG
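A minimal Python sketch of the kind of automated monthly check denis describes; the failure threshold, the structural check and the notification step are all assumptions for illustration, not an agreed design:

import csv
import io
import urllib.request

# Illustrative only: the threshold and notification are assumptions.
FAILURE_THRESHOLD = 3  # consecutive monthly failures before tagging

def check_geofeed(url: str) -> bool:
    """Return True if the geofeed URL is reachable and parses as CSV."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            text = resp.read().decode("utf-8", errors="replace")
        list(csv.reader(io.StringIO(text)))  # structural check only
        return True
    except Exception:
        return False

def monthly_run(feeds: dict) -> None:
    """feeds maps geofeed URL -> consecutive failure count so far."""
    for url, failures in feeds.items():
        if check_geofeed(url):
            feeds[url] = 0
        else:
            feeds[url] = failures + 1
            if feeds[url] >= FAILURE_THRESHOLD:
                # In this proposal: notify the resource holder's contact
                # and tag the data as suspect in the database.
                print("tag as suspect and notify holder:", url)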
Thanks for the extensive note Denis, thanks Cynthia for being first-responder. I wanted to jump in on a specific subthread.
On Tue, Apr 06, 2021 at 06:38:29PM +0200, Cynthia Revström via db-wg wrote:
Questions:
-Should the database software do any checks on the existence/reachability of the url as part of the update with an error if the check fails?
I would say yes; this is not a new concept to the DB, as I believe this is already done with domain objects.
I disagree on this one point: what is the RIPE DB supposed to do when it discovers one state or another? Should the URIs be probed from many vantage points to compare? Once you try to monitor whether something is up or down, it can quickly become complicated.
The 'geofeed:' attribute value references content outside the RIPE DB, which means the RIPE DB software should not be crawling it.
All the RIPE NCC's DB software needs to check is whether the string's syntax conforms to the HTTPS URI scheme.
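[As a concrete illustration, such a syntax-only check could be as small as the following Python sketch. Exactly which rules "conforms to the HTTPS URI scheme" implies in practice is an assumption here, not something specified in the thread.]

    from urllib.parse import urlparse

    def is_valid_geofeed_url(value):
        # Syntax only: a well-formed URL using the https scheme with a
        # host part. No network access, no content check.
        try:
            parts = urlparse(value)
        except ValueError:
            return False
        return parts.scheme == "https" and bool(parts.netloc)

    # is_valid_geofeed_url("https://example.net/geofeed.csv") -> True
    # is_valid_geofeed_url("http://example.net/geofeed.csv")  -> False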
-Should the RIPE NCC do any periodic repeat checks on the continued existence/reachability of the url?
I would say that checking once a month or so could be fine, as long as it just results in a nudge email. Don't enforce it, but nudge people if it is down.
It seems an unnecessary burden on the RIPE NCC's business to check whether a given website is up or down. What is such nudging supposed to accomplish? It might end up being busy work if done by an individual RIR.
-Should the RIPE NCC do any periodic checks on the content structure of the csv file referenced by the url?
I don't have a strong opinion either way here, but I feel that this is not really something the NCC is responsible for checking. If the NCC should check, though, then my comments about the repeat reachability checks above apply here too.
The RIPE NCC should not check random URIs, they are not the GeoIP police ;-)
Kind regards,
Job
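[A hypothetical sketch of the fully automated monthly check loop Denis describes above, in Python using only the standard library. The threshold, the notification and the tagging are placeholders for illustration, not anything the RIPE NCC actually runs.]

    import urllib.request

    FAILURE_THRESHOLD = 3  # 'n' consecutive monthly failures before tagging

    def notify_contact(url):
        print("nudge email to registered contact about", url)  # placeholder

    def tag_as_suspect(url):
        print("publicly tag geofeed data for", url, "as suspect")  # placeholder

    def monthly_check(url, failures):
        # One monthly probe: reset the counter on success; on failure,
        # nudge the contact and, after n failures in a row, tag the data.
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                ok = resp.status < 400
        except OSError:
            ok = False
        if ok:
            failures[url] = 0
            return
        failures[url] = failures.get(url, 0) + 1
        notify_contact(url)
        if failures[url] >= FAILURE_THRESHOLD:
            tag_as_suspect(url)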
Hi Denis,

Apologies if this email comes off as rude or harsh; that isn't my intention, but I am not quite sure how else to phrase it.

So when I say things like this, it's not because of cost or anything like that. I say it because I don't think validating the CSV is something that would be a benefit.

Abuse contacts are validated and required (except for some legacy resources, iirc) because they are important in order to report abuse. Having a geofeed service is not a requirement, and additionally I would think that the data consumer would almost always be software dedicated to this. If so, then that software can easily validate the CSV data itself.

Considering those two points, I see abuse contacts and geofeed as very different things.

-Cynthia

On Wed, Apr 7, 2021, 00:45 denis walker <ripedenis@gmail.com> wrote:
[...]
Hi Cynthia

I don't take criticism personally. I am known for my wild ideas... occasionally I come up with a good one :) But let's take another look at this geofeed. Any service offered by or through the RIPE Database should be a high-quality and reliable service. It doesn't matter if it is an essential service or not. The reputation of the RIPE Database itself rests on the quality of the services derived from it.

You say most consumers of this geofeed data will be software capable of validating the csv file. What will this software do when it finds invalid data? Just ignore it? Will this software know who to report data errors to? Will it have any means to follow up on reported errors? Will anyone notice if the quality of data deteriorates over time? What will be done with data that is considered to be inaccurate (by mistake or deliberate intention)? Will it be reported? Will there be any follow-up, or will it just be discarded?

Services like geofeed are good ideas. But if the data quality or accessibility deteriorates over time it becomes useless, or even misleading. That is why I believe centralised validation, testing and reporting are helpful. I think the RIRs are well positioned to do these tasks and should do more of them. Abuse contacts and geofeed are different things... but they are both secondary services facilitated by the RIPE Database and both need to be trustworthy.

cheers
denis
co-chair DB-WG

On Wed, 7 Apr 2021 at 08:20, Cynthia Revström <me@cynthia.re> wrote:
[...]
Hi Denis,

This message is in response to several messages in the discussion.

In brief: I have seen network operators distraught because their network was misclassified as being in the wrong geography for the services their customers needed to access, and they had no way to fix that situation. I feel that publishing geofeed data in the RIPE Database would be a good thing to do, as it helps network operators share data in a structured way and should reduce the overall amount of pain from misclassified networks.

I personally would like to see agreement on your draft problem statement, and some feedback from the RIPE NCC, before focusing on some of the more detailed questions you raised.

I also agree with you that accurate and reliable data is important. But...

On Wed, Apr 7, 2021 at 7:19 AM denis walker via db-wg <db-wg@ripe.net> wrote:
[...]
You say most consumers of this geofeed data will be software capable of validating the csv file. What will this software do when it finds invalid data? Just ignore it? Will this software know who to report data errors to? Will it have any means to follow up on reported errors?
I would have thought that anyone implementing a parser for this data would also be able to query the database for a tech-c and report validation failures. Based on my previous interactions with network operators who have suffered misclassification, I am confident that there is a strong incentive for networks to publish well-formatted, accurate data and to fix any errors quickly. That said, there are many possible ways to reduce the risk of badly formatted data. For instance, the RIPE NCC could offer a tool to create the relevant files to be published, either through the LIR Portal or as a standalone tool. This is why I'd like to see feedback from the RIPE NCC ahead of an implementation discussion.
Services like geofeed are good ideas. But if the data quality or accessibility deteriorates over time it becomes useless, or even misleading. That is why I believe centralised validation, testing and reporting are helpful. I think the RIRs are well positioned to do these tasks and should do more of them.
I agree with you that defining what data means and keeping it accurate is important. But in the case of geo data, could the RIPE NCC validate the content as well as the data structures? I'd have thought that the publishers and the users of the data would be in the best position to do that. Am I wrong? Kind regards, Leo
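[Leo's point about a parser querying the database for a tech-c could look roughly like this in practice. A sketch in Python against the RIPE DB REST search interface at rest.db.ripe.net; the endpoint and the JSON response layout are assumptions to verify against the current API documentation.]

    import json
    import urllib.parse
    import urllib.request

    def tech_c_handles(resource):
        # Query the search interface for the objects covering 'resource'
        # and collect any tech-c handles, so a geofeed consumer knows who
        # to report validation failures to.
        url = ("https://rest.db.ripe.net/search?query-string="
               + urllib.parse.quote(resource))
        req = urllib.request.Request(url, headers={"Accept": "application/json"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            data = json.load(resp)
        return [attr["value"]
                for obj in data.get("objects", {}).get("object", [])
                for attr in obj.get("attributes", {}).get("attribute", [])
                if attr.get("name") == "tech-c"]

    # e.g. tech_c_handles("193.0.0.0/21") might return a list of role handles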
Hi,

I just wanted to clarify my stance on validation a bit more.

I am totally against trying to validate the data itself; that is not what the NCC is supposed to do. Validating the format of the CSV might be okay, but honestly anything beyond validating that the URL does not return a 404 Not Found is a bit too much in my opinion.

I also agree with Leo's points with regards to fixing the data; I believe that the data publishers have a pretty strong incentive for the data to be accurate. And as Leo also mentions, the tech-c and/or admin-c contacts are also published, so finding a reporting mechanism for issues would not be very difficult.

With regards to misformatted data: if I were writing a parser I would probably just ignore that entry, log the error and report it to an engineer, who can then forward it to the admin contact if they determine it to be a real issue.

In order not to delay this indefinitely: while it shouldn't be rushed, I am not sure how realistic this issue is or how much harm it would cause anyone. Also, how much validation is done could be changed in the future if this is shown to be an actual real-world problem.

-Cynthia

On Wed, Apr 7, 2021 at 10:58 PM Leo Vegoda <leo@vegoda.org> wrote:
[...]
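[To make the skip-and-log behaviour Cynthia describes concrete, a consumer-side parser might look like the sketch below, in Python. The five-field line layout (prefix, country, region, city, postal code) follows the geofeed CSV format in the draft and should be checked against the final RFC.]

    import csv
    import ipaddress
    import logging

    def parse_geofeed(text):
        # Parse geofeed lines: prefix,country,region,city[,postal].
        # Invalid entries are logged and skipped rather than failing
        # the whole file.
        entries = []
        for row in csv.reader(text.splitlines()):
            if not row or row[0].lstrip().startswith("#"):
                continue  # blank line or comment
            try:
                prefix = ipaddress.ip_network(row[0].strip())
            except ValueError:
                logging.warning("skipping malformed entry: %r", row)
                continue
            tail = [f.strip() for f in row[1:5]]
            tail += [""] * (4 - len(tail))  # pad missing optional fields
            entries.append((prefix, *tail))
        return entries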
The Geofeed: field is a URL. It points to a resource. The semantic content of the resource should not be checked; what matters is that the URL is not a 404 at the time of publication. If you want to check it isn't a 404 after that, it's like lame-delegation checks: good to do, not strictly essential in the role of Whois/RPSL. If you want to check the semantic intent of the .csv geo data, that's not db-wg.

This work is important. It substantially improves the STEERAGE to find the delegates' assertions about geo for their INRs. This is sufficiently high value in itself that it's worth doing. Checking the integrity of what they say goes beyond the role of a steerage/directory function. (my opinion)

cheers
-G

On Thu, Apr 8, 2021 at 8:19 AM Cynthia Revström via db-wg <db-wg@ripe.net> wrote:
[...]
Hi Denis,

I have so far not seen anyone (other than you) suggest doing anything more than checking that the URL is valid and doesn't 404. The people who have so far commented on this are: me, Job, George Michaelson, Leo Vegoda.

Could we consider creating an NWI with a reduced scope?

Let me start by phrasing a question like this: does anyone object to validating the value as a valid HTTPS URL that returns a 200 status code upon creation? If yes, is it too much validation or too little?

-Cynthia

On Thu, Apr 8, 2021 at 12:29 AM George Michaelson <ggm@algebras.org> wrote:
[...]
I'd say rather than requiring a 200, accept any 2xx, allowing for 30x redirection, HTTP->HTTPS uplift and other things. And gzip compression. So, basically, completion of a data exchange. Probably in the spirit of what you meant. As long as that's what "200" means, I'd be fine!

cheers
-G

On Thu, Apr 8, 2021 at 8:42 AM Cynthia Revström via db-wg <db-wg@ripe.net> wrote:
[...]
Yeah, that's a good point. I guess "non-error status code" rather than "200 status code".

-Cynthia

On Thu, Apr 8, 2021 at 12:47 AM George Michaelson <ggm@algebras.org> wrote:
[...]
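[The check Cynthia and George converge on (follow redirects, accept completion of a data exchange rather than a literal 200) could be expressed like this. A sketch in Python: urllib follows redirects by default and raises on 4xx/5xx. HEAD is used here to avoid downloading the file, which is an assumption; some servers may insist on GET.]

    import urllib.error
    import urllib.request

    def url_completes(url):
        # Follow 30x redirects (urllib's default), advertise gzip, and
        # treat any non-error final status as a completed exchange.
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"Accept-Encoding": "gzip"})
        try:
            with urllib.request.urlopen(req, timeout=10):
                return True
        except urllib.error.URLError:
            return False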
Could we consider creating an NWI with a reduced scope?
as an exercise, how minimal can we get?

randy
Hi Randy,
On 8 Apr 2021, at 13:54, Randy Bush via db-wg <db-wg@ripe.net> wrote:
Could we consider creating an NWI with a reduced scope?
as an exercise, how minimal can we get?
randy
Given the draft RFC: https://datatracker.ietf.org/doc/draft-ietf-opsawg-finding-geofeeds/?include...

I suggest the following minimal Solution Definition for an NWI:

- Implement support for an optional, single "geofeed:" attribute in inetnum and inet6num object types.
- Validate that there is a maximum of either one "remarks: Geofeed" or one "geofeed:" attribute per object.
- Validate that the "geofeed:" URL is well-formed and specifies the HTTPS protocol.
- Include the "geofeed:" attribute in database dumps and split files.

And inversely, what we could leave out (to simplify the implementation):

- Do not support non-ASCII values in URL domain names or paths (these must be converted beforehand).
- Do not migrate (re-write) "remarks: Geofeed" values as "geofeed:" attributes.
- Do not validate that the URL is reachable (available) and do not validate the content.

Is this enough to satisfy the draft requirements? Is it enough to be useful for the DB-WG?

Regards
Ed Shryane
RIPE NCC
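[Ed's second bullet, at most one "remarks: Geofeed" or "geofeed:" per object, is simple to express as a sketch in Python. The exact "remarks: Geofeed" spelling is taken from the draft, and the matching rule here is an assumption.]

    def count_geofeed_refs(rpsl_object):
        # Count geofeed references: either a proper "geofeed:" attribute
        # or a "remarks:" attribute whose value starts with "Geofeed ".
        n = 0
        for line in rpsl_object.splitlines():
            if line.startswith("geofeed:"):
                n += 1
            elif line.startswith("remarks:"):
                value = line[len("remarks:"):].strip()
                if value.startswith("Geofeed "):
                    n += 1
        return n

    def at_most_one_geofeed(rpsl_object):
        return count_geofeed_refs(rpsl_object) <= 1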
Hi Ed,

This seems like a good implementation to me. However, I don't think it's a good idea to limit the values on the "remarks:" attribute in this way, as this could cause unwanted side effects with, for example, messages left on objects for other network operators.

Also:

"Do not support non-ASCII values in URL domain names or paths (these must be converted beforehand)"

Do you mean by this not supporting non-ASCII entirely? Or having, for example, the web interface convert IDNs to punycode, and having this listed on the object? If the latter, and remarks can remain free-form, I'd say let's implement.

Cheers,
Jori Vanneste
FOD11-RIPE

-------- Original Message --------
On Apr 8, 2021, 2:27 PM, Edward Shryane via db-wg <db-wg@ripe.net> wrote:
[...]
Hi Jori,
On 8 Apr 2021, at 14:42, Tyrasuki <tyrasuki@pm.me> wrote:
Hi Ed,
This seems like a good implementation to me.
However, I don't think it's a good idea to limit the values on the "remarks:" attribute in this way, as this could cause unwanted side effects with, for example, messages left on objects for other network operators.
Given that the draft states "Any particular inetnum: object MAY have, at most, one geofeed reference, whether a remarks: or a proper geofeed: attribute when one is defined.", do we enforce this by validating that there is only one "remarks: Geofeed" value (or "geofeed:") in the object?
Also:
"Do not support non-ASCII values in URL domain names or path (these must be converted beforehand)"
Do you mean by this not supporting non-ASCII entirely? Or having, for example, the web interface convert IDNs to punycode, and having this listed on the object?
The RIPE Database uses the Latin-1 character set, so non-ASCII characters in IDN domain names or in the URL path will be substituted with a '?' character by default. We could support non-ASCII values by automatically converting them (as we do with non-ASCII domains in email addresses).
If the latter, and remarks can remain free-form, I'd say let's implement.
Cheers, Jori Vanneste FOD11-RIPE
Regards Ed
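[Ed's suggested conversion could work roughly like this for hostnames. A sketch using Python's built-in 'idna' codec (IDNA 2003); how the NCC actually converts non-ASCII email domains today is not specified in the thread, so treat the details as assumptions. Percent-encoding of a non-ASCII path is left out.]

    from urllib.parse import urlsplit, urlunsplit

    def to_ascii_url(url):
        # Convert an IDN hostname to punycode so the URL fits within the
        # database's Latin-1/ASCII constraints.
        parts = urlsplit(url)
        host = parts.hostname.encode("idna").decode("ascii")
        netloc = host if parts.port is None else f"{host}:{parts.port}"
        return urlunsplit((parts.scheme, netloc, parts.path,
                           parts.query, parts.fragment))

    # to_ascii_url("https://bücher.example/geofeed.csv")
    #   -> "https://xn--bcher-kva.example/geofeed.csv"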
Hi Ed,

On 4/8/2021 3:36 PM, Edward Shryane via db-wg wrote:

Given that the draft states "Any particular inetnum: object MAY have, at most, one geofeed reference, whether a remarks: or a proper geofeed: attribute when one is defined.", do we enforce this by validating that there is only one "remarks: Geofeed" value (or "geofeed:") in the object?

My apologies, I think I missed this section in the draft. Thanks for clarifying the reason to me.

The RIPE Database uses the Latin-1 character set, so non-ASCII characters in IDN domain names or in the URL path will be substituted with a '?' character by default. We could support non-ASCII values by automatically converting them (as we do with non-ASCII domains in email addresses).

That sounds like a good approach to me, thanks for clarifying. :)

I think this is indeed a good starting ground for a minimal NWI, and would like to see where this goes.

Best regards,
Jori Vanneste
FOD11-RIPE
Hello Jori, Edward and all.

I apologise for resurrecting this very old thread.

We are using the files in the RIPE DB for creating our own geolocation DB. It's straightforward to get country-level geo-ip classification. We are now looking into city-level geo-ip information, and I have just come across this old thread about "geofeed".

It would be great to know whether this geofeed is already implemented in the data on the RIPE FTP site.

Thank you very much.

With kind regards.
Arcadius,

On Thu, 8 Apr 2021 at 15:18, Jori Vanneste via db-wg <db-wg@ripe.net> wrote:
[...]
--
Arcadius Ahouansou
Menelic Ltd | Applied Knowledge Is Power
Office: +441444702101
Mobile: +447908761999
Menelic Ltd: menelic.com
SmartLobby: SmartLobby.co <https://smartlobby.co/>
Hosted Apache Solr Services: solrfarm.com
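[For context, the country-level approach Arcadius describes can be done with a few lines over the inetnum split file. A sketch in Python; the file name and the blank-line-between-objects layout are assumptions, and the caveats Denis raises below about what "country:" actually means still apply.]

    import gzip
    import ipaddress

    def country_ranges(path="ripe.db.inetnum.gz"):
        # Walk the split file object by object, pairing each inetnum
        # range with the first country: attribute seen in that object.
        ranges = []
        start = end = country = None
        with gzip.open(path, "rt", encoding="latin-1") as f:
            for line in f:
                if line.startswith("inetnum:"):
                    value = line.split(":", 1)[1].strip()
                    start, _, end = value.partition(" - ")
                    country = None
                elif line.startswith("country:") and country is None:
                    country = line.split(":", 1)[1].strip()
                elif not line.strip() and start:
                    ranges.append((ipaddress.ip_address(start),
                                   ipaddress.ip_address(end), country))
                    start = None
        return ranges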
Hi Arcadius

If you download the inetnum split file from the RIPE FTP site you will see there are already some "geofeed:" attributes in there, as well as some still using the earlier "remarks: geofeed" option.

denis$ zgrep "^geofeed:" ~/Desktop/ripe.db.inetnum.gz | wc -l
289
denis$ zgrep "^remarks:\s*geofeed" ~/Desktop/ripe.db.inetnum.gz | wc -l
13

Although among those 289 there are only 161 unique organisations. There are still a lot of people using the (old) "geoloc:" attribute.

denis$ zgrep "^geoloc:" ~/Desktop/ripe.db.inetnum.gz | wc -l
35453

You mentioned getting country-level geo-ip: are you using the "country:" attribute? If so, which one? The ones in the INET(6)NUM objects are undefined to anyone other than the resource holder. The only one that is defined is the "country:" attribute in the referenced ORGANISATION object for allocations and PI assignments. This is the country the resource holder is legally based in.

cheers
denis
co-chair DB-WG

On Tue, 12 Apr 2022 at 19:09, Arcadius Ahouansou via db-wg <db-wg@ripe.net> wrote:
[...]
Hello Denis and All.

Thank you very much for your reply.

For instance, with an entry like the one shown below, we will be using NL as the country. As you rightly said, that is the country of the organisation. So we are working on moving to something more accurate, at the city level.

    % Tags relating to '156.150.0.0 - 156.150.255.255'
    % RIPE-REGISTRY-RESOURCE
    inetnum:        192.68.230.0 - 192.68.230.255
    netname:        Atos-MEV
    country:        NL
    org:            ORG-OB2-RIPE
    admin-c:        DUMY-RIPE
    tech-c:         DUMY-RIPE
    status:         LEGACY
    notify:         Global-IPcoordinator@atos.net
    mnt-by:         RIPE-NCC-LEGACY-MNT
    mnt-routes:     MNT-VALUESOLUTIONS
    mnt-by:         GIPC-ORIGIN-MNT
    created:        1970-01-01T00:00:00Z
    last-modified:  2017-08-24T08:32:00Z
    source:         RIPE
    remarks:        ****************************
    remarks:        * THIS OBJECT IS MODIFIED
    remarks:        * Please note that all data that is generally regarded as personal
    remarks:        * data has been removed from this object.
    remarks:        * To view the original object, please query the RIPE Database at:
    remarks:        * http://www.ripe.net/whois
    remarks:        ****************************

Thank you very much.

With best regards
Arcadius
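For illustration, country-level extraction of the kind Arcadius describes might be sketched like this in Python, assuming a local copy of the gzipped inetnum split file (the file path is a placeholder):

    import gzip
    from collections import Counter

    def inetnum_countries(path="ripe.db.inetnum.gz"):
        """Yield (inetnum range, country) pairs from an RPSL split file.

        Objects in the dump are separated by blank lines; this minimal
        sketch takes only the first "country:" attribute per object and
        ignores RPSL continuation lines and comments.
        """
        rng, country = None, None
        with gzip.open(path, "rt", encoding="latin-1") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:  # blank line ends the current object
                    if rng and country:
                        yield rng, country
                    rng, country = None, None
                elif line.startswith("inetnum:"):
                    rng = line.split(":", 1)[1].strip()
                elif line.startswith("country:") and country is None:
                    country = line.split(":", 1)[1].strip()
            if rng and country:  # last object if the file lacks a trailing blank line
                yield rng, country

    # Example: count objects per country code
    counts = Counter(c for _, c in inetnum_countries())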
--
Arcadius Ahouansou
Menelic Ltd | Applied Knowledge Is Power
Office: +441444702101 Mobile: +447908761999
Menelic Ltd: menelic.com
SmartLobby: SmartLobby.co
Hosted Apache Solr Services: solrfarm.com
On Thu, Apr 08, 2021 at 02:27:13PM +0200, Edward Shryane via db-wg wrote:
On 8 Apr 2021, at 13:54, Randy Bush via db-wg <db-wg@ripe.net> wrote:
Could we consider creating an NWI with a reduced scope?
as an exercise, how minimal can we get?
Given the draft RFC: https://datatracker.ietf.org/doc/draft-ietf-opsawg-finding-geofeeds/?include...
I suggest the following minimal Solution Definition for an NWI:
- Implement support for an optional, single "geofeed:" attribute in inetnum and inet6num object types.
- Validate there is a maximum of either one "remarks: Geofeed" or one "geofeed:" attribute per object.
- Validate the "geofeed:" URL is well-formed and specifies the HTTPS protocol.
- Include the "geofeed:" attribute in database dumps and split files.
And inversely, what we could leave out (to simplify the implementation):
- Do not support non-ASCII values in URL domain names or path (these must be converted beforehand).
- Do not migrate (re-write) "remarks: Geofeed" values as "geofeed:" attributes.
- Do not validate that the URL is reachable (available) and do not validate the content.
Is this enough to satisfy the draft requirements? Is it enough to be useful for the DB-WG?
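For illustration, the two validation rules in Ed's first list (at most one geofeed reference per object, and a well-formed HTTPS URL) might be sketched like this, assuming object attributes are available as (name, value) pairs:

    from urllib.parse import urlparse

    def validate_geofeed(attributes):
        """Check the two proposed syntax rules on a parsed object.

        `attributes` is a list of (name, value) pairs; returns a list of
        error strings, empty if the object passes. Matching on the
        "remarks:" value is done case-insensitively here, which is an
        assumption rather than something the draft spells out.
        """
        errors = []
        geofeeds = [v for n, v in attributes if n == "geofeed"]
        remarks = [v for n, v in attributes
                   if n == "remarks" and v.lower().startswith("geofeed:")]
        if len(geofeeds) + len(remarks) > 1:
            errors.append("at most one geofeed reference is allowed")
        for value in geofeeds:
            url = urlparse(value)
            if url.scheme != "https" or not url.netloc:
                errors.append(f"not a well-formed HTTPS URL: {value}")
        return errors

A real implementation would hook a check like this into the update pipeline and report the errors in the update response.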
I think the above is a great start to get a 'minimal viable geofeed' going! The first priority should be to move things over from 'remarks:' to a dedicated attribute.

Kind regards,
Job
Guys

I don't see the issue of what, if anything, should be validated as a show stopper for introducing the "geofeed:" attribute. This is my idea of utilising the RIRs to improve the value of services with increased validation. That's why I changed the subject line and started it as a different thread. We can come back to this later. Apologies for taking you down a side road, but at least I got some initial feelings from you on this more general issue.

cheers
denis
co-chair DB-WG

On Thu, 8 Apr 2021 at 00:28, George Michaelson via db-wg <db-wg@ripe.net> wrote:
The Geofeed: field is a URL.
It points to a resource.
The semantic content of the resource should not be checked; what matters is that the URL is not a 404 at the time of publication.
If you want to check it isn't a 404 after that, it's like Lame checks: good to do, not strictly essential in the role of Whois/RPSL.
If you want to check the semantic intent of the .csv geo data, that's not db-wg.
This work is important. It substantially improves the STEERAGE to find the delegates' assertions about geo for their INR. This is sufficiently high value in itself that it's worth doing. Checking the integrity of what they say goes beyond the role of a steerage/directory function.
(my opinion)
cheers
-G
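For illustration, the publication-time check George describes (the URL answers with something other than a 404; the content is deliberately not inspected) might be sketched like this:

    import urllib.request
    import urllib.error

    def geofeed_reachable(url, timeout=10):
        """Return True if the geofeed URL answers with a non-404 status.

        A HEAD request is used so the CSV body is never downloaded and
        its semantic content is never examined.
        """
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status != 404
        except urllib.error.HTTPError as e:
            return e.code != 404
        except urllib.error.URLError:
            return False  # an unreachable host counts as a failed check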
On Thu, Apr 8, 2021 at 8:19 AM Cynthia Revström via db-wg <db-wg@ripe.net> wrote:
Hi,
I just wanted to clarify my stance on validation a bit more.
I am totally against trying to validate the data itself, that is not what the NCC is supposed to do. Validating the format of the CSV might be okay but honestly anything beyond validating that it is not a 404 not found is a bit too much in my opinion.
I also agree with Leo's points with regards to fixing the data, I believe that the data publishers have a pretty strong incentive to have the data be accurate. And as Leo also mentions, the tech-c and/or admin-c contacts are also published so finding a reporting mechanism for issues would not be very difficult.
And with regards to misformatted data, yeah I would probably just ignore that entry if I was writing a parser and log the error and report it to an engineer who can then forward it to the admin contact if they determine it to be a real issue.
In order not to delay this indefinitely: while it shouldn't be rushed, I am also not sure how realistic this issue is, or how much harm it would cause anyone.
Also, how much validation is done could be changed in the future if this is shown to be an actual real-world problem.
-Cynthia
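A sketch of the tolerant parsing Cynthia describes, assuming the RFC 8805 geofeed CSV layout (prefix, country, region, city, postal code); malformed entries are logged and skipped rather than failing the whole file:

    import csv
    import ipaddress
    import logging

    log = logging.getLogger("geofeed")

    def parse_geofeed(lines):
        """Yield (network, country, region, city) from geofeed CSV lines."""
        for lineno, row in enumerate(csv.reader(lines), start=1):
            if not row or row[0].lstrip().startswith("#"):
                continue  # blank lines and '#' comments are allowed
            try:
                net = ipaddress.ip_network(row[0].strip())
                # Pad short rows so country/region/city unpack cleanly
                country, region, city = (row + ["", "", ""])[1:4]
                yield net, country.strip(), region.strip(), city.strip()
            except ValueError as e:
                log.error("line %d: skipping malformed entry (%s)", lineno, e)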
On Wed, Apr 7, 2021 at 10:58 PM Leo Vegoda <leo@vegoda.org> wrote:
Hi Denis,
This message is in response to several messages in the discussion.
In brief: I have seen network operators distraught because their network was misclassified as being in the wrong geography for the services their customers needed to access and they had no way to fix that situation. I feel that publishing geofeed data in the RIPE Database would be a good thing to do as it helps network operators share data in a structured way and should reduce the overall amount of pain from misclassified networks.
I personally would like to see an agreement on your draft problem statement and some feedback from the RIPE NCC before focusing on some of the more detailed questions you raised.
I also agree with you that accurate and reliable data is important. But...
On Wed, Apr 7, 2021 at 7:19 AM denis walker via db-wg <db-wg@ripe.net> wrote:
[...]
You say most consumers of this geofeed data will be software capable of validating the csv file. What will this software do when it finds invalid data? Just ignore it? Will this software know who to report data errors to? Will it have any means to follow up on reported errors?
I would have thought that anyone implementing a parser for this data would also be able to query the database for a tech-c and report validation failures. Based on my previous interactions with the network operators who have suffered misclassification, I am confident that there is a strong incentive for networks to publish well formatted accurate data and to fix any errors quickly.
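As an illustration, the tech-c lookup Leo describes might be sketched like this, assuming the standard whois service on whois.ripe.net port 43 (the example prefix in the comment is a placeholder):

    import socket

    def tech_c_contacts(resource, server="whois.ripe.net", port=43):
        """Query whois for a resource and return its tech-c handles."""
        with socket.create_connection((server, port), timeout=10) as s:
            s.sendall((resource + "\r\n").encode("ascii"))
            chunks = []
            while data := s.recv(4096):  # server closes when done
                chunks.append(data)
        text = b"".join(chunks).decode("latin-1")
        return [line.split(":", 1)[1].strip()
                for line in text.splitlines()
                if line.startswith("tech-c:")]

    # e.g. tech_c_contacts("192.0.2.0/24")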
That said, there are many possible ways to reduce the risk of badly formatted data. For instance, the RIPE NCC could offer a tool to create the relevant files to be published through the LIR Portal or as a standalone tool. This is why I'd like to see feedback from the RIPE NCC ahead of an implementation discussion.
Services like geofeed are good ideas. But if the data quality or accessibility deteriorates over time, it becomes anything from useless to misleading. That is why I believe centralised validation, testing and reporting are helpful. I think the RIRs are well positioned to do these tasks and should do more of them.
I agree with you that defining what data means and keeping it accurate is important. But in the case of geo data, could the RIPE NCC validate the content as well as the data structures? I'd have thought that the publishers and the users of the data would be in the best position to do that. Am I wrong?
Kind regards,
Leo
Hi Denis, On Thu, Apr 08, 2021 at 12:55:32AM +0200, denis walker via db-wg wrote:
I don't see the issue of what, if anything, should be validated as a show stopper for introducing the "geofeed:" attribute. This is my idea of utilising the RIRs to improve the value of services with increased validation. That's why I changed the subject line and started it as a different thread. We can come back to this later. Apologies for taking you down a side road, but at least I got some initial feelings from you on this more general issue.
I recognize value in your suggestions on RIR-driven validation, and definitely agree on the desired outcomes, but I'm not convinced it's the RIRs that should take on (as a permanent task) doing outreach about 'broken geofeeds'. We should keep in mind that 'geofeed:' is a new utility for this industry and we (as a collective of producers & consumers) have yet to see how things will work out exactly.

One thing stood out to me in what Denis wrote: "This geofeed attribute will delegate this information process out to thousands of organisations." (src: https://www.ripe.net/ripe/mail/archives/db-wg/2021-April/006893.html)

While indeed globally thousands of organizations are now being guided and enabled towards populating geofeeds, this in itself is not an indicator that a decentralized approach will 'overtake' the current market for "GeoIP information". It doesn't strike me as unfeasible that the likes of MaxMind, IPinfo.info, or any other GeoIP aggregators will take on the role of 'patrolling' the feeds and providing tools/notifications to those who appear to publish broken information.

The (commercial) 'GeoIP market' is far more advanced than a mere IP Address <> Geographical location mapping. Another layer of advancement that exists: customers of GeoIP information oftentimes will correlate the purchased GeoIP information with their own internal records on fraud and other activities of interest, and also acquire GeoIP from multiple (possibly even non-database) sources. Some companies measure latency to help approximate geography.

To me it seems unlikely the 'geofeed' mechanism will 'wipe out' the existing market; rather, geofeeds might be a (significant!) enhancement for existing practises. I do think that 'geofeed:' in some ways democratizes the market, in the sense that an industry-standard publication mechanism and easier access to this type of data means that more people can cheaply acquire GeoIP information, which means that existing GeoIP providers will have to step up their game ... which is a positive! :)

I think we should only ask the RIRs 'to do something' when it has become clear the industry itself is unable to organize it themselves. RIPE Atlas probably is a great example of something only an RIR could've pulled off.

Kind regards,
Job

ps. An example where data quality urgently needed to increase were RPKI ROAs in the 2019-2020 time frame. To help the BGP Default-Free Zone get rid of 'RPKI Invalid' BGP announcements, it was not the RIRs that 'did the cleanup'; it was the efforts of individuals such as 'nusenu_', Anurag Bhatia, Massimo Candela's BGPAlerter, and hundreds of network operators actually deploying RPKI ROV that resulted in significant industry-wide cleanup. RIPE NCC indeed does have a 'wrong-ROA' alerting mechanism, but it only applies to RIPE-managed space and (imho) is too crude an alerting mechanism to be useful in most corporate contexts.

pps. Another example of global cleanup led by individuals is Jared Mauch's "Open Resolver Project", for which he was awarded the M3AAWG J.D. Falk Award.
participants (12)

- Arcadius Ahouansou
- Cynthia Revström
- denis walker
- Edward Shryane
- George Michaelson
- Hank Nussbacher
- Job Snijders
- Jori Vanneste
- Leo Vegoda
- Massimo Candela
- Randy Bush
- Tyrasuki