Hi denis,

I agree that this might be an issue (although personally I haven't looked into it), but I am not quite sure what we can do about it beyond just removing the field.

Free-form text fields are complicated, and maybe this should be considered separately, as I think we first want a good picture of how big the problem is and what kind of data is in there.

Some kind of analysis to get an idea of how big the problem is might not be too difficult: we could try to detect common elements of addresses and possibly person names.
A first easy step could be to simply look for country names in the "descr:" field, as that might imply an address.
This isn't perfect of course and will pull in things like "Big Corp Germany GmbH" (just as an example), so a lot of fine tuning would be necessary.
However, I think this could get good enough to give a picture of the scale of the issue, even if it won't be good enough to use as an input filter.
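
As a very rough sketch of what I mean (Python; the country and company-suffix lists are short made-up examples, the function names are hypothetical, and I am assuming the "descr:" values have already been extracted from a dump as plain strings):

```python
import re
from collections import Counter

# Illustrative subset only; a real run would need a full list of country
# names, probably including local-language spellings.
COUNTRY_NAMES = ["Germany", "Netherlands", "France", "Sweden", "Italy"]

# Suffixes that hint at a legal entity rather than a natural person.
# Also just an illustrative subset.
COMPANY_SUFFIXES = ["GmbH", "AG", "Ltd", "B.V.", "AB", "S.A.", "Inc"]

COUNTRY_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in COUNTRY_NAMES) + r")\b",
    re.IGNORECASE,
)
# Case-sensitive on purpose, so that short suffixes like "AB" match less noise.
COMPANY_RE = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(s) for s in COMPANY_SUFFIXES) + r")(?!\w)"
)


def classify_descr(value):
    """Put one "descr:" value into a coarse bucket."""
    if not COUNTRY_RE.search(value):
        return "no_country"
    if COMPANY_RE.search(value):
        # e.g. "Big Corp Germany GmbH": country name, but likely a company
        return "country_with_company_suffix"
    # Country name and no company hint: possibly part of an address
    return "country_only"


def summarise(values):
    """Count how many "descr:" values fall into each bucket."""
    return Counter(classify_descr(v) for v in values)
```

The "country_only" count would only be an upper-bound style estimate for a human to sample and check, not something to act on automatically.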

These are just my initial thoughts on the topic and there are probably many other things that I haven't considered.

-Cynthia 

On Mon, Jul 4, 2022, 13:47 denis walker via db-wg <db-wg@ripe.net> wrote:
Colleagues

I know this has been a long discussion with several elements to
consider. But with just over a week of this discussion period
remaining, are there any comments or thoughts on the "descr:"
attribute containing personal data?

cheers
denis
Proposal author

---------- Forwarded message ---------
From: denis walker <ripedenis@gmail.com>
Date: Wed, 29 Jun 2022 at 13:10
Subject: 2022-01 personal data in the "descr:" attribute
To: Database WG <db-wg@ripe.net>


Colleagues

Up to now we have focused mostly on members' personal data. But this
isn't only about members. The database contains lots of personal data
for end users as well. The INET(6)NUM objects have a "descr:"
attribute. This has been used extensively for (personal) data about
end users, mostly name and address, but maybe also phone numbers.
There may well be more of these who are natural persons than members.
There are 4.2m INETNUM objects in the database. So the scale of the
problem may be far higher here.

This is also a more complicated problem. The data is entered by
resource holders and possibly multiple levels of sub-allocation
holders. The issue of informed consent is hard to judge. Consent may
be in the small print of a contract and it may not be clear to the
data subject just what they are consenting to. The RIPE Database may
contain the home address of many end users without them knowing what
this database is or what the consequences are of publishing their
personal data in it. This data is also unstructured, unverified,
undefined free text.

As with members' data there are pros and cons regarding the usefulness
of this data. Also the defined purposes of the database cover adding
the name and phone number of an end user as part of the public
registry, but again not their home address. With the data being
undefined, it is not clear what any address relates to. We don't know
what address the resource holder asked for or what address the end
user supplied. The address, therefore, has no meaning to anyone
reading this data.

So the question we need to discuss is what do we do with this data in
the "descr:" attributes? Perhaps I can start the discussion with a
suggestion.

First of all we should never overload free text attributes with
structured data. So if we want to add a name and address to resource
objects we should include user name and address attributes. If we are
going to add a new address field we should break it down into a
structured set of address attributes, as all addresses should have
been from the start. Maybe a phone and/or email attribute is also
needed, keeping in mind most end users don't usually have an
ORGANISATION object. Any new attributes can all be optional, as it is
by choice that resource holders currently add any such data into the
"descr:" attributes.

The same rules would apply to this data as we have discussed with the
members' data. Phone and email must be business data and not personal
and they will be verified. (I will explain more about verification in
response to Leo's email.) Name may be personal if the end user is a
natural person. In that case any optional address added must not be
more specific than country and region.

Another possible solution could be to optionally create an
ORGANISATION object for end users. Either way, this data should not be
in the "descr:" attribute.

Most of this can be discussed later with the implementation details.
What is really important as far as this policy proposal is concerned
is the recognition that the same principles of processing personal
data will be applied to end users as to members, regardless of
what objects the data is stored in.

cheers
denis
Proposal author
