Denis,
We've received feedback from different users and researchers that we are overdoing the dummification. For example, one can obtain all references to personal objects without hitting any personal object result limits by querying the live RIPE Database with proper flags (like -r). This makes the "dummification" of these references in the data dumps meaningless.
I do not buy this argument. We know that certain access restrictions can eventually be circumvented by renting the ultimate botnet and doing a mass harvest. That doesn't render the restrictions useless. One could argue that if certain access controls were implemented to achieve a certain goal and other methods open a path around these controls, those other methods (the -r flag in this case) ought to be reviewed instead.
In order to improve the usability of the data dumps and streams, we are proposing to change the "dummification" algorithm to keep the actual personal objects and all references to them and only obfuscate the fields with personal data (for example real names, phone numbers and addresses). The new algorithm will also try to preserve data that is useful for researchers, while not revealing any data that might expose the identity of the data subject. For example, we are proposing to keep the first half of phone number digits or to keep the domain part of email addresses.
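To make the two examples in the proposal concrete, here is a minimal Python sketch of what such field-level obfuscation might look like. It is not the RIPE NCC's implementation: the function names, the "X" and "XXXX" placeholders, and the rounding down when the digit count is odd are all assumptions made for illustration.

```python
def mask_phone(phone: str, placeholder: str = "X") -> str:
    """Keep the first half of the digits and mask the rest.

    Non-digit characters (+, spaces, dashes) are left in place.
    Rounding down for an odd digit count is an assumption.
    """
    total_digits = sum(ch.isdigit() for ch in phone)
    keep = total_digits // 2
    seen = 0
    out = []
    for ch in phone:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= keep else placeholder)
        else:
            out.append(ch)
    return "".join(out)


def mask_email(email: str, placeholder: str = "XXXX") -> str:
    """Replace the local part and keep the domain part.

    The placeholder string is hypothetical. Input without an "@"
    is replaced entirely.
    """
    local, at, domain = email.rpartition("@")
    if not at:
        return placeholder
    return f"{placeholder}@{domain}"


if __name__ == "__main__":
    print(mask_phone("+31 20 555 0123"))
    print(mask_email("j.doe@example.net"))
```

With these assumptions the phone number keeps its country and area code, and the email address keeps its full domain, so how much the result still reveals depends on how many people share that prefix or domain.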
I am missing a list of the data protection goals that the original implementation was meant to achieve, and a serious assessment of why they would still be met by the proposed changed method. I doubt that obfuscating the local part of an email address is an adequate measure of anonymization or pseudonymization. Similar concerns hold for phone numbers. On a meta level, mangled data is more of a threat to real data than replaced data is. FWIW, I don't see the special case for 'abuse-mailbox'. By optimizing the 'dummification algorithm' around fuzzy criteria, it seems to me we're putting the cart before the horse.

-Peter