Andreas Polyrakis wrote:
Denis Walker wrote:
Introduction
------------
According to our database consistency statistics program (dbconstat [1]) we currently have 460,573 unreferenced person/role objects [2]. Some of these may be maintained, but are still unreferenced. Any personal data not referenced by Internet resources do not fit within the purpose of the RIPE Database. They should not be stored in the RIPE Database beyond a reasonable 'work in progress' period.
The RIPE NCC has had a mandate to delete these unreferenced person/role objects since RIPE 40:
http://www.ripe.net/ripe/wg/db/minutes/ripe-40.html (2001)
At RIPE 41, the Database Working Group agreed that "maintained objects will now be removed" and "gave a mandate to the RIPE NCC to continue with the cleanup process."
http://www.ripe.net/ripe/wg/db/minutes/ripe-41.html (2002)
The 2003 cleanup process (http://www.ripe.net/db/news/unref-cleanup-200304.html) used a script to run periodic cleanups. The script was put in place but failed about 18 months ago. Because of other priorities, we have not had time to examine this issue again until now.
The graph of the increase in unreferenced person objects [3] indicates that the cleanup script has not been performing correctly. At the start of 2006, there were about 300,000 unreferenced person/role objects. Since then there has been a steady increase, with a large jump of almost 50,000 in February 2007. The graph also shows a slightly higher rate of increase since February 2007.
Because of the high number of unreferenced person objects, we want to raise the issue with the community again and propose the procedure below to clean them up.
Because redundant personal data is a serious data protection issue, we want to take a new approach to this in the future. Once the initial cleanup is in progress, we will draft a proposal for a new, regular cleanup procedure. This will be sent to the Data Protection Task Force [4] and then to the rest of the community.
One-time bulk cleanup procedure
-------------------------------
Month 1
* Select 80,000 unreferenced person/role objects.
Month 2
* Check selected person/role objects.
* Those still unreferenced:
  o Delete using normal update process.
  o 2000 objects per update message.
  o Run updates overnight (Saturday/Sunday).
  o One update every 15 minutes.
  o This should avoid any unnecessary load on the servers.
Hello,
I suppose you have already foreseen this, but just in case:
What about RIPE DB mirroring? Is there a chance that this mass deletion will cause anomalies in the NRTM mirroring?
We are using the normal update process, so these updates will be passed out on the mirror stream in the usual way. By spreading the updates over several hours each time, it should not cause too much load on the mirror servers.

Regards
denis
Regards, Andreas
* Select next 80,000 unreferenced person/role objects.
Month 3
* Repeat process until complete.
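For a sense of scale, the figures in the procedure above can be turned into a rough schedule for one monthly batch. What follows is only a back-of-the-envelope sketch in Python. The function name `batch_schedule` is made up for illustration and is not part of any RIPE NCC tooling. It assumes every update message is full and ignores the time the server needs to process each message.

```python
import math

BATCH_SIZE = 80_000          # objects selected per month (from the proposal)
OBJECTS_PER_MESSAGE = 2_000  # objects per update message (from the proposal)
INTERVAL_MINUTES = 15        # one update message every 15 minutes (from the proposal)


def batch_schedule(batch_size, per_message, interval_minutes):
    """Return (number of update messages, minutes from first to last message).

    Hypothetical helper: it only does the arithmetic and sends nothing.
    """
    messages = math.ceil(batch_size / per_message)
    span = (messages - 1) * interval_minutes if messages else 0
    return messages, span


messages, span = batch_schedule(BATCH_SIZE, OBJECTS_PER_MESSAGE, INTERVAL_MINUTES)
print(f"{messages} update messages, {span / 60:.2f} hours from first to last")
# prints: 40 update messages, 9.75 hours from first to last
```

Under these assumptions a full batch takes a little under ten hours, which appears to fit within one overnight run.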
Notes
-----
This procedure will take about 6 months to clear the current backlog, not including the extra time that may be necessary due to the increases over that period.
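The "about 6 months" figure can be checked with a small Python sketch that counts monthly deletion rounds. The starting backlog and batch size come from this proposal. The monthly growth value is purely hypothetical and is there only to show how new unreferenced objects would extend the schedule. The function name `months_to_clear` is likewise invented for illustration.

```python
def months_to_clear(backlog, batch_size, monthly_growth=0):
    """Count monthly deletion rounds until the backlog reaches zero.

    Each round deletes up to batch_size objects; if anything is left,
    monthly_growth new unreferenced objects are assumed to appear
    before the next round.
    """
    if monthly_growth >= batch_size:
        raise ValueError("backlog never clears if growth >= batch size")
    months = 0
    while backlog > 0:
        months += 1
        backlog = max(0, backlog - batch_size)
        if backlog > 0:
            backlog += monthly_growth
    return months


# 460,573 objects and 80,000 per month are from the proposal.
print(months_to_clear(460_573, 80_000))         # prints 6
# 5,000 new objects per month is an assumed, illustrative figure.
print(months_to_clear(460_573, 80_000, 5_000))  # prints 7
```

The sketch counts deletion rounds only. Since Month 1 is used for selection and the first deletions run in Month 2, the calendar time would be slightly longer.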
Because of the high numbers involved, we prefer NOT to send out individual e-mail notifications, either before or after deletion. This is to prevent a high load on our mail servers, especially in the event of a high number of bounced e-mails.
This means that there will be no individual announcements to listed e-mail addresses before the deletion and none of the usual update notifications. This also means that even if the objects are maintained, the maintainer will not be notified directly about the deletion of their unreferenced person/role object.
We will, however, announce the cleanup to the Working Group mailing lists and as a news item on our web site home page.
The worst problem that can occur is that someone will enter a reference to their person/role object just as we delete it. However, as we are only deleting unreferenced person/role objects, the time needed to re-create them is minimal. We suspect that a very large proportion of the unreferenced person/role objects that we will be deleting are abandoned objects that are no longer used.
References
----------
[1] dbconstat http://www.ripe.net/projects/dbconstat/index.html
[2] current unreferenced person/role objects http://www.ripe.net/projects/dbconstat/html/cons-current.html
[3] graph of unreferenced person object increase http://www.ripe.net/projects/dbconstat/cons-unrefpero.html
[4] Data Protection Task Force http://www.ripe.net/ripe/tf/dp/index.html