Re: [db-wg] Removing personal data from bulk output from the RIPE Database
On Thu, 9 May 2013, denis@ripe.net wrote:
The "changed:" attribute is almost useless.
Disagree, it gave a good indication of how stale an object was likely to be. If an object is 15 years old, one wouldn't be too surprised if it's inaccurate. If it's from last month, then it's a lot more likely to be accurate.
Cheers
James
On May 9, 2013, at 5:52 PM, James A. T. Rice <james_r-ripelist@jump.org.uk> wrote:
Disagree, it gave a good indication of how stale an object was likely to be. If an object is 15 years old, one wouldn't be too surprised if it's inaccurate. If it's from last month, then it's a lot more likely to be accurate.
Hi James,
That holds when one sees a recent date on a "changed:" attribute, but if the date is old it is not possible to draw any conclusion about when the object was last updated.
Adding further "changed:" attributes has always been optional: one might have included one when creating the object and then never added or updated it on later updates. The "--list-versions" whois query option is much more accurate for that purpose.
That said, given the requirements Tore detailed, I think the "changed:" attribute is accurate enough for his purpose. As shown in the examples for the new dummification proposal (https://labs.ripe.net/Members/kranjbar/proposed-improvements-to-dummificatio...), with the new model we are proposing to keep the "changed:" lines and their dates while obfuscating the domain part of email addresses.
Regards, Kaveh.
--- Kaveh Ranjbar, RIPE NCC Database Group Manager
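To make the idea of keeping the "changed:" lines and dates while hiding the email domain concrete, here is a minimal sketch. It assumes a "changed:" line has the form `changed: <email> [YYYYMMDD]`. The function name `obfuscate_changed` and the placeholder domain `example.invalid` are hypothetical choices for illustration only, and the output format of the actual proposal may well differ.

```python
import re

# Assumed layout: "changed:", whitespace, local@domain, optional 8-digit date.
CHANGED_RE = re.compile(r"^(changed:\s+)(\S+)@(\S+)(\s+\d{8})?\s*$")


def obfuscate_changed(line, placeholder="example.invalid"):
    """Replace the domain part of the email on a "changed:" line.

    The local part and the date (if any) are kept as they are.
    Lines that do not look like a "changed:" line are returned unchanged.
    The placeholder domain is a hypothetical value, not the one used by
    the RIPE NCC.
    """
    match = CHANGED_RE.match(line)
    if match is None:
        return line
    prefix, local_part, _domain, date_part = match.groups()
    return f"{prefix}{local_part}@{placeholder}{date_part or ''}"
```

For example, `obfuscate_changed("changed: jane@example.net 20130509")` keeps `jane` and `20130509` and swaps only the domain.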
Hi All,
Everyone is correct in what they have said. But when it comes to the "changed:" attribute you have to rely on assumptions, estimates and 'good enough', because the dates are not precisely defined. It is possible that 90% of changed dates are spot on, but you don't know that.
No one would suggest that anyone would deliberately enter the wrong date. And of course if you don't add a date the software will generate the correct date for you. But if you wrote a template some time ago for creating assignments and that template has a date in it, then every time you fill in the template to create a new assignment, it will have the wrong date. Or if you 'change' the date on a single "changed:" attribute every time you update the object, it will only show the 'last updated' date. Or, as Kaveh pointed out, maybe the object was created 10 years ago but you updated the information last week without altering the changed date. Do you consider the object to be stale data? There are so many ways this date can represent something different from what you assume it to be.
As Kaveh also pointed out, you can query an object with the '--list-versions' option and get a more accurate, reliable list of dates for when an object was created and for each update. But if you want to find which PA assignments were created within a time period, you would have to query them all to get the accurate historical dates. There are 3.7 million assignment objects. For this type of analysis it might be easier if the split file had well-defined date(s) you could rely on.
Regards
Denis Walker
Business Analyst
RIPE NCC Database Group
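As a rough illustration of why these dates are only a hint, the sketch below pulls the dates out of the "changed:" lines of one object and reports the oldest and newest. It assumes the date, when present, is the last field on the line in YYYYMMDD form. The function names are hypothetical, and the result says nothing about updates that were made without touching a "changed:" line.

```python
from datetime import datetime


def changed_dates(rpsl_text):
    """Return the dates found on "changed:" lines of one object.

    Assumes the date, if present, is the last whitespace-separated
    field and has the form YYYYMMDD. Lines without a valid date are
    skipped.
    """
    dates = []
    for line in rpsl_text.splitlines():
        if not line.lower().startswith("changed:"):
            continue
        fields = line.split(":", 1)[1].split()
        if not fields or len(fields[-1]) != 8 or not fields[-1].isdigit():
            continue
        try:
            dates.append(datetime.strptime(fields[-1], "%Y%m%d").date())
        except ValueError:
            # Eight digits that do not form a real calendar date.
            continue
    return dates


def changed_date_range(rpsl_text):
    """Return (oldest, newest) "changed:" date, or None if there are none.

    This is only a heuristic: a template with a fixed date, a single
    line overwritten on each update, or an update made without adding
    a line all make these dates differ from the real history.
    """
    dates = changed_dates(rpsl_text)
    if not dates:
        return None
    return min(dates), max(dates)
```

A recent newest date suggests the object was touched recently. An old one proves nothing either way, which is the case where the version history is the more reliable source.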