Re: [db-wg] Removing personal data from bulk output from the RIPE Database

8 May 2013

      Denis,
...
We've received feedback from different users and researchers that we are
overdoing the dummification. For example, one can obtain all references
to personal objects without hitting any personal object result limits by
querying the live RIPE Database with proper flags (like -r). This makes
the "dummification" of these references in the data dumps meaningless.
I do not buy this argument.  We know that certain access restrictions
can eventually be circumvented by renting the ultimate botnet and doing a mass
harvest.  That doesn't render the restrictions useless.
One could argue that if certain access controls were implemented to achieve
a certain goal and other methods open a path around these controls, those other
methods (the -r flag in this case) ought to be reviewed instead.
...
In order to improve the usability of the data dumps and streams, we are
proposing to change the "dummification" algorithm to keep the actual
personal objects and all references to them and only obfuscate the
fields with personal data (for example real names, phone numbers and
addresses). The new algorithm will also try to preserve data that is
useful for researchers, while not revealing any data that might expose
the identity of the data subject. For example, we are proposing to keep
the first half of phone number digits or to keep the domain part of
email addresses.
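
To make the proposed obfuscation concrete, here is a minimal sketch in
Python. It is not the RIPE NCC's implementation. The function names, the
mask characters, and rounding down when the digit count is odd are all
assumptions for illustration; the proposal only states that the first
half of the phone number digits and the domain part of email addresses
would be kept.

```python
def obfuscate_email(address: str) -> str:
    """Mask the local part of an email address and keep the domain part.

    Hypothetical helper: the "***" mask is an assumption, not taken from
    the proposal.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@" present: nothing that looks like a domain, mask everything.
        return "***"
    return "***@" + domain


def obfuscate_phone(number: str) -> str:
    """Keep the first half of the digits and mask the remaining digits.

    Non-digit characters (plus sign, spaces) are passed through unchanged.
    With an odd number of digits the kept half is rounded down (assumption).
    """
    total_digits = sum(ch.isdigit() for ch in number)
    keep = total_digits // 2
    seen = 0
    out = []
    for ch in number:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= keep else ".")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(obfuscate_email("jane.doe@example.net"))
    print(obfuscate_phone("+31 20 535 4444"))
```

Note that in this sketch the domain and the leading phone digits pass
through unchanged, so whatever they reveal about the data subject is
still present in the output.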
I am missing a list of the data protection goals that the original
implementation was meant to meet, and a serious assessment of why they would
still be met by the proposed changed method.  I doubt that obfuscating
the local part of an email address is an adequate measure of anonymization
or pseudonymization.  Similar concerns hold for phone numbers.
On a meta level, mangled data is more of a threat to real data than
replaced data is. FWIW, I don't see the special case for 'abuse-mailbox'.

In optimizing the 'dummification algorithm' around fuzzy criteria,
it seems to me we're putting the cart before the horse.

-Peter

Peter Koch