Clean up of unreferenced person/role objects
Introduction
------------

According to our database consistency statistics program (dbconstat [1]) we currently have 460,573 unreferenced person/role objects [2]. Some of these may be maintained, but are still unreferenced. Any personal data not referenced by Internet resources do not fit within the purpose of the RIPE Database. They should not be stored in the RIPE Database beyond a reasonable 'work in progress' period.

The RIPE NCC has had a mandate to delete these unreferenced person/role objects since RIPE 40:
http://www.ripe.net/ripe/wg/db/minutes/ripe-40.html (2001)

At RIPE 41, the Database Working Group agreed that "maintained objects will now be removed" and "gave a mandate to the RIPE NCC to continue with the cleanup process."
http://www.ripe.net/ripe/wg/db/minutes/ripe-41.html (2002)

The cleanup process of 2003 (http://www.ripe.net/db/news/unref-cleanup-200304.html) involved using a script to run periodic cleanups. This script was put in place. However, it failed about 18 months ago. Because of other priorities, we have not had the time to examine this issue again until now.

The graph showing the increase in these unreferenced person objects [3] indicates that the cleanup script appeared not to be performing correctly. At the start of 2006, there were about 300,000 unreferenced person/role objects. Since then there has been a steady increase with a large increase of almost 50,000 in February 2007. The graph also shows a slightly higher rate of increases in these objects since February 2007.

Because of the high number now of unreferenced person objects, we want to raise the issue with the community again and propose the below procedure to clean them up.

Because redundant personal data is a serious data protection issue, we want to take a new approach to this in the future. Once the initial cleanup is in progress, we will create a new proposal for a new, regular cleanup procedure. This will be sent to the Data Protection Task Force [4] and then to the rest of the community.

One time bulk cleanup procedure
-------------------------------

Month 1
  * Select 80,000 unreferenced person/role objects.

Month 2
  * Check selected person/role objects.
  * Those still unreferenced:
      o Delete using normal update process.
      o 2000 objects per update message.
      o Run updates overnight (Saturday/Sunday).
      o One update every 15 minutes.
      o This should avoid any unnecessary load on the servers.
  * Select next 80,000 unreferenced person/role objects.

Month 3
  * Repeat process until complete.

Notes
-----

This procedure will take about 6 months to clear the current backlog, not including the extra time that may be necessary due to the increases over that period.

Because of the high numbers involved, we prefer NOT to send out individual e-mail notifications, either before or after deletion. This is to prevent a high load on our mail servers, especially in the event of a high number of bounced e-mails.

This means that there will be no individual announcements to listed e-mail addresses before the deletion and none of the usual update notifications. This also means that even if the objects are maintained, the maintainer will not be notified directly about the deletion of their unreferenced person/role object.

We will, however, announce the cleanup to the Working Group mailing lists and as a news item on our web site home page.

The worst problem that can occur is that someone will enter a reference to their person/role object just as we delete it. However, as we are only deleting unreferenced person/role objects, the time needed to re-create them is minimal. We suspect that a very large proportion of the unreferenced person/role objects that we will be deleting are abandoned objects that are no longer used.

References
----------

[1] dbconstat
    http://www.ripe.net/projects/dbconstat/index.html
[2] current unreferenced person/role objects
    http://www.ripe.net/projects/dbconstat/html/cons-current.html
[3] graph of unreferenced person object increase
    http://www.ripe.net/projects/dbconstat/cons-unrefpero.html
[4] Data Protection Task Force
    http://www.ripe.net/ripe/tf/dp/index.html
Denis Walker wrote:
One time bulk cleanup procedure
-------------------------------

Month 1
  * Select 80,000 unreferenced person/role objects.

Month 2
  * Check selected person/role objects.
  * Those still unreferenced:
      o Delete using normal update process.
      o 2000 objects per update message.
      o Run updates overnight (Saturday/Sunday).
      o One update every 15 minutes.
      o This should avoid any unnecessary load on the servers.
Hello,

I suppose you have already foreseen this, but just in case:

What about RIPE DB mirroring? Is there a chance that this mass deletion will cause anomalies in the NRTM mirroring?

Regards,
Andreas
--
=========================================
Andreas Polyrakis
GRNET/NTUA NOC
apolyr@noc.ntua.gr
Office: +302107722409
Cell: +306972832445
=========================================
Andreas Polyrakis wrote:
Hello,
I suppose you have already foreseen this, but just in case:
What about RIPE DB mirroring? Is there a chance that this mass deletion will cause anomalies in the NRTM mirroring?
We are using the normal update process, so these updates will be passed out on the mirror stream in the usual way. Spreading the updates over several hours each time should avoid putting too much load on the mirror servers.

Regards
denis
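As a rough check on "several hours", the figures from the proposal (80,000 objects per monthly selection, 2000 objects per update message, one message every 15 minutes, 460,573 objects in total) can be worked through. This is only a sketch: the function name is hypothetical, and it assumes a monthly selection is sent back to back in a single run, which the proposal does not state.

```python
import math

def batch_schedule(selected, per_message=2000, minutes_between=15):
    """Return (update messages, hours from first to last message) for one selection."""
    messages = math.ceil(selected / per_message)
    # The first message goes out at t=0, so there are (messages - 1) gaps.
    hours = (messages - 1) * minutes_between / 60
    return messages, hours

# Figures taken from the proposal.
messages, hours = batch_schedule(80_000)
monthly_selections = math.ceil(460_573 / 80_000)

print(messages, hours, monthly_selections)  # 40 9.75 6
```

Under these assumptions one selection is 40 update messages spread over just under ten hours, and the backlog needs six monthly selections, which is consistent with the "about 6 months" estimate in the notes.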
Regards, Andreas
All,

On Wed, Apr 18, 2007 at 01:34:54PM +0200, Denis Walker wrote:
One time bulk cleanup procedure
-------------------------------

Month 1
  * Select 80,000 unreferenced person/role objects.

Month 2
  * Check selected person/role objects.
  * Those still unreferenced:
      o Delete using normal update process.
      o 2000 objects per update message.
      o Run updates overnight (Saturday/Sunday).
      o One update every 15 minutes.
      o This should avoid any unnecessary load on the servers.
  * Select next 80,000 unreferenced person/role objects.

Month 3
  * Repeat process until complete.
This makes sense to me, for a one-time cleanup.
The worst problem that can occur is that someone will enter a reference to their person/role object just as we delete it. However, as we are only deleting unreferenced person/role objects, the time needed to re-create them is minimal. We suspect that a very large proportion of the unreferenced person/role objects that we will be deleting are abandoned objects that are no longer used.
A bit of thinking can come up with some scenarios that might be worse(*).

I definitely think any minor problems are more than outweighed by the removal of unused personal data!

--
Shane

(*) For example, a worse problem would be referencing the wrong person/role object. This can happen like so:

    - Person object X created
    - Months pass...
    - Person object X deleted by this process
    - Person object Y created with same "nic-hdl:" as object X
    - User who created person object X decides to use it, but actually refers to object Y (since it has the same "nic-hdl:")

    The user who created person object X assumes it is still fine, because the usual notification was not received, and in any case it was probably protected by "mnt-by:".

    Of course, this is an unlikely corner case. :)
Shane Kerr wrote:
All,
On Wed, Apr 18, 2007 at 01:34:54PM +0200, Denis Walker wrote:
The worst problem that can occur is that someone will enter a reference to their person/role object just as we delete it. However, as we are only deleting unreferenced person/role objects, the time needed to re-create them is minimal. We suspect that a very large proportion of the unreferenced person/role objects that we will be deleting are abandoned objects that are no longer used.
A bit of thinking can come up with some scenarios that might be worse(*).
I definitely think any minor problems are more than outweighed by the removal of unused personal data!
-- Shane
(*) For example, a worse problem would be referencing the wrong person/role object. This can happen like so:
- Person object X created
- Months pass...
- Person object X deleted by this process
- Person object Y created with same "nic-hdl:" as object X
- User who created person object X decides to use it, but actually refers to object Y (since it has the same "nic-hdl:")
The user who created person object X assumes it is still fine, because the usual notification was not received, and in any case it was probably protected by "mnt-by:".
Of course, this is an unlikely corner case. :)
This case can already occur:

- Person A creates object X without "mnt-by:"
- Months pass...
- Person B thinks he creates object Y with same "nic-hdl:" as object X
- Person B actually modifies object X and changes it into object Y
- Person A who created person object X decides to use it, but actually refers to object Y (since it has the same "nic-hdl:")

Person A who created person object X assumes it is still fine, because he did not maintain his data and did not include a "notify:" attribute.

This is also an unlikely corner case, but the first part does occasionally happen.

denis
On Apr 19, 2007, at 5:22 PM, Shane Kerr wrote: [...]
(*) For example, a worse problem would be referencing the wrong person/role object. This can happen like so:
- Person object X created
- Months pass...
- Person object X deleted by this process
- Person object Y created with same "nic-hdl:" as object X
- User who created person object X decides to use it, but actually refers to object Y (since it has the same "nic-hdl:")
The user who created person object X assumes it is still fine, because the usual notification was not received, and in any case it was probably protected by "mnt-by:".
Of course, this is an unlikely corner case. :)
This case could also be avoided by making it impossible to re-use a nic-hdl. I believe that is already the case for organisation objects. Putting it in place for person and role objects would probably be a good thing. But I doubt it's an urgent requirement and I wouldn't want to delay this cleanup.

Regards,

--
Leo Vegoda
IANA Numbers Liaison
On 19 Apr 2007, at 16:22, Shane Kerr wrote:
The worst problem that can occur is that someone will enter a reference to their person/role object just as we delete it. However, as we are only deleting unreferenced person/role objects, the time needed to re-create them is minimal. We suspect that a very large proportion of the unreferenced person/role objects that we will be deleting are abandoned objects that are no longer used.
A bit of thinking can come up with some scenarios that might be worse(*).
I definitely think any minor problems are more than outweighed by the removal of unused personal data!
I personally think it would be a mistake not to at least attempt to notify the contact by email.

And sending 80,000 emails isn't exactly a massive amount to send in the grand scheme of things ... and whilst processing 80,000 bounces wouldn't be fun there are ways to mitigate the whole bounce process.

If there is concern about load on the current RIPE mail servers, then simply set up another mail server / MX specifically for this project where these mails can be sent from.

Personally I think the bigger problem isn't unreferenced objects, but the bigger issue is with "orphaned" objects (*). The "new" role objects overcome a lot of this, but every day I still see objects with contact details of people who left the company 5-10 years earlier and are still referenced .. and unravelling that mess becomes a whole lot harder.

--
Jon Morby
FidoNet Registration Services Ltd
tel: 0845 004 3050 / fax: 0845 004 3051
web: http://www.fido.net/
On Fri, Apr 20, 2007 at 09:36:14AM +0100, Jon Morby wrote:
I personally think it would be a mistake not to at least attempt to notify the contact by email.
For the record, I really don't think it matters much. These person or role objects are ones that are:

- unused for months, and
- trivially easy to recreate.
Personally I think the bigger problem isn't unreferenced objects, but the bigger issue is with "orphaned" objects (*). The "new" role objects overcome a lot of this, but every day I still see objects with contact details of people who left the company 5-10 years earlier and are still referenced .. and unravelling that mess becomes a whole lot harder.
This is a separate issue, and is IMHO a failure of both database design and policy.

The failure of database design is that data needs to be checked for correctness periodically. Objects in the RIPE Database are not. Such a mechanism can be simple. For example, each maintainer could get an e-mail each year pointing to a web page listing objects maintained asking "click to continue to use these values".

This points out the failure of policy... what happens if objects are not maintained properly? Right now, there is nothing that can be done. What I would like to see done is the resources made unavailable for use until the maintainer confirms that the objects about them are correct.

I think that LIRs will never accept such a policy. But I've been wrong before.

--
Shane
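The yearly check described above could start from something as small as the following. This is a sketch only: the maintainer names and dates are made up, the 365-day threshold is one reading of "each year", and the RIPE Database has no such confirmation record as far as this thread says.

```python
from datetime import date, timedelta

def maintainers_due(last_confirmed, today, max_age_days=365):
    """Return, sorted, the maintainers whose last confirmation is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, when in last_confirmed.items() if when < cutoff)

# Hypothetical maintainers and confirmation dates, for illustration only.
last_confirmed = {
    "EXAMPLE-A-MNT": date(2006, 1, 10),
    "EXAMPLE-B-MNT": date(2007, 3, 1),
}

due = maintainers_due(last_confirmed, today=date(2007, 4, 20))
print(due)  # ['EXAMPLE-A-MNT']
```

Each maintainer returned would get the "click to continue to use these values" e-mail; what happens when nobody clicks is the policy question.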
Hi,

On Fri, Apr 20, 2007 at 11:49:23AM +0200, Shane Kerr wrote:
This points out the failure of policy... what happens if objects are not maintained properly? Right now, there is nothing that can be done. What I would like to see done is the resources made unavailable for use until the maintainer confirms that the objects about them are correct.
Playing devil's advocate: how can the RIPE NCC make an IP address block "unavailable"? Or an AS number?

Gert Doering
        -- NetMaster
--
Total number of prefixes smaller than registry allocations: 113403

SpaceNet AG                    Vorstand: Sebastian v. Bomhard
Joseph-Dollinger-Bogen 14      Aufsichtsratsvors.: A. Grundner-Culemann
D-80807 Muenchen               HRB: 136055 (AG Muenchen)
Tel: +49 (89) 32356-444        USt-IdNr.: DE813185279
Gert,

On Fri, Apr 20, 2007 at 11:57:15AM +0200, Gert Doering wrote:
On Fri, Apr 20, 2007 at 11:49:23AM +0200, Shane Kerr wrote:
This points out the failure of policy... what happens if objects are not maintained properly? Right now, there is nothing that can be done. What I would like to see done is the resources made unavailable for use until the maintainer confirms that the objects about them are correct.
Playing devil's advocate: how can the RIPE NCC make an IP address block "unavailable"? Or an AS number?
How would *you* do it if someone asked you to set up a revocation procedure?

I imagine it would look something like:

- Flag the resources as possibly abandoned internally at the NCC.
- Try to contact the maintainers (or LIR if possible).
- After a time, flag the resources as possibly abandoned externally.
- Try harder to contact the maintainers (contact peers to try to get contact information, for instance).
- Move the resources to an "abandoned" status, removing them from public databases.
- After a time, do a debogonizing effort on the resources.
- Mark the resources available for use again.

Mind you, this is just a possibility. There are costs and benefits at each step (for example, publicly flagging a resource as unmaintained in the Whois gives hijackers an easy way to locate likely targets... but looking at the routing table can do this too).

The RIPE NCC's policies and procedures have done a fairly good job of handling the task of issuing new resources, and making sure that active LIRs keep their information accurate. But when resources go to non-LIRs, for both PI blocks and AS numbers, the system basically fails completely.

Maybe this will all solve itself in 2 or 3 years, when we run out of new IPv4 space. I imagine then there will be a lot of people hijacking this space, so this problem may disappear.

--
Shane
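The steps listed above can be read as a one-way progression through a handful of states. The sketch below is one possible encoding, not an existing RIPE NCC process: the stage names are invented here, and the rule that a maintainer's response during either flagging stage returns the resource to active use is an assumption that the list only implies.

```python
from enum import Enum

class Stage(Enum):
    ACTIVE = 0
    FLAGGED_INTERNALLY = 1   # possibly abandoned, known only inside the NCC
    FLAGGED_EXTERNALLY = 2   # possibly abandoned, visible to everyone
    ABANDONED = 3            # removed from public databases
    DEBOGONIZING = 4
    AVAILABLE = 5            # may be issued again

def next_stage(stage, maintainer_responded=False):
    """Advance one step; a response while flagged returns the resource to ACTIVE (assumed rule)."""
    if maintainer_responded and stage in (Stage.FLAGGED_INTERNALLY, Stage.FLAGGED_EXTERNALLY):
        return Stage.ACTIVE
    if stage is Stage.AVAILABLE:
        return stage
    return Stage(stage.value + 1)

# Walk a resource whose maintainer never answers through the whole procedure.
stage = Stage.ACTIVE
while stage is not Stage.AVAILABLE:
    stage = next_stage(stage)
    print(stage.name)
```

The contact attempts and waiting periods sit between the transitions and are deliberately left out.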
Denis Walker wrote:
Introduction
------------
According to our database consistency statistics program (dbconstat [1]) we currently have 460,573 unreferenced person/role objects [2]. Some of these may be maintained, but are still unreferenced. Any personal data not referenced by Internet resources do not fit within the purpose of the RIPE Database. They should not be stored in the RIPE Database beyond a reasonable 'work in progress' period. [...]
so, my 0.02EUR:

unreferenced person/role objects should not stay in the RIPE whois db forever and should be cleaned up on a regular basis, i agree since 2001 :-)

I don't really know how that vast amount of useless(?) data affects the database operation nowadays (lookup/update times, mirror synchronization, database sizes etc.), if at all with the current database machines, but the issue is more general. It just makes no real sense to keep them in there.
One time bulk cleanup procedure ------------------------------- [...]
But like some other people i'm not really happy about the "no e-mail notification" thing. Not even for the one-time procedure. I really really would like to have at least the regular update/delete-notifications.

It's just a bad thing to delete something without notifying the "owner". I wouldn't like that to happen to my objects - probably i keep one old Internic person object i copied to the RIPE db some decades ago alive but unreferenced since it's without the xyz-RIPE tag and i don't like to really use it anymore but .... at least i'd like to know if it gets deleted and the WHY in the delete: reason field.

Bottom line: If it's agreed on that it's not possible for the RIPE NCC to send out the notifications for whatever reason, i'd actually rather like to keep the objects there - or take smaller steps so it IS possible to send out notifications (hey, it took so long to re-activate that 'project', so it can't be that time-critical ;-).

--
========================================================================
= Sascha Lenz                  SLZ-RIPE                  slz@baycix.de =
= Network Operations                                                   =
= BayCIX GmbH, Landshut                * PGP public Key on demand *    =
========================================================================
On 18 Apr 2007, at 07:34, Denis Walker wrote:
According to our database consistency statistics program (dbconstat [1]) we currently have 460,573 unreferenced person/role objects [2]. Some of these may be maintained, but are still unreferenced. Any personal data not referenced by Internet resources do not fit within the purpose of the RIPE Database. They should not be stored in the RIPE Database beyond a reasonable 'work in progress' period.
Do you have any data on the age of these unreferenced people/objects? I don't get warm-and-fuzzies from the thought of deleting objects....

On 19 Apr 2007, at 13:11, Leo Vegoda wrote:
This case could also be avoided by making it impossible to re-use a nic-hdl.
I used to work for LIR-X and now I'm working for new LIR-Y. Whilst internal paperwork is done my person object is orphaned, but I

a. want to keep my handle
b. want to keep my maintainer stuff intact

Please let's try to understand why objects are orphaned before we take a bullet to them.

cheers
-a
On Apr 22, 2007, at 2:58 PM, Andy Davidson wrote:
On 19 Apr 2007, at 13:11, Leo Vegoda wrote:
This case could also be avoided by making it impossible to re-use a nic-hdl.
I used to work for LIR-X and now I'm working for new LIR-Y. Whilst internal paperwork is done my person object is orphaned, but I
a. want to keep my handle
b. want to keep my maintainer stuff intact
I don't think I wrote very clearly.

I didn't mean "make it impossible to use the same nic-hdl for two different organisations' objects". I meant that if a person object is deleted then its nic-hdl could not be reassigned to a different person object at a later date.

As long as you clean up as you go keeping your handle and maintainer shouldn't be an issue - if this feature was ever added.

Regards,

--
Leo Vegoda
IANA Numbers Liaison
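A minimal sketch of that rule, assuming the database kept a record of retired handles: once a nic-hdl has been deleted it can never be created again, so a later object Y cannot silently take over the handle of a deleted object X. The class, method names and example handle are hypothetical and do not describe the RIPE Database software.

```python
class HandleRegistry:
    """Toy registry in which a deleted nic-hdl is retired for good."""

    def __init__(self):
        self._live = {}        # nic-hdl -> owner of the current object
        self._retired = set()  # nic-hdls that were deleted and may not come back

    def create(self, nic_hdl, owner):
        if nic_hdl in self._live:
            raise ValueError(f"{nic_hdl} is already in use")
        if nic_hdl in self._retired:
            raise ValueError(f"{nic_hdl} was deleted and cannot be re-used")
        self._live[nic_hdl] = owner

    def delete(self, nic_hdl):
        del self._live[nic_hdl]
        self._retired.add(nic_hdl)

registry = HandleRegistry()
registry.create("XX1-EXAMPLE", "person X")
registry.delete("XX1-EXAMPLE")
try:
    registry.create("XX1-EXAMPLE", "person Y")
except ValueError as err:
    print(err)  # XX1-EXAMPLE was deleted and cannot be re-used
```

The cost is that the set of retired handles only ever grows.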
On 22 Apr 2007, at 15:33, Leo Vegoda wrote:
I used to work for LIR-X and now I'm working for new LIR-Y. Whilst internal paperwork is done my person object is orphaned, but I
I don't think I wrote very clearly.
Nor did I. :-)

My person object is orphaned during my move between the companies. But I don't want it deleting !

cheers
-a
On Apr 22, 2007, at 6:25 PM, Andy Davidson wrote:
I used to work for LIR-X and now I'm working for new LIR-Y. Whilst internal paperwork is done my person object is orphaned, but I
I don't think I wrote very clearly.
Nor did I. :-) My person object is orphaned during my move between the companies. But I don't want it deleting !
I suppose the thing to do is to stick it in a role or organisation object so it is no longer unreferenced. It seems a simple thing to do to distinguish an object that is wanted from one that is not.

Regards,

--
Leo Vegoda
IANA Numbers Liaison
Andy,

I think it was clearly stated that only objects that are orphaned _for a long time_ are deleted, so it won't hit you.

Cheers,
Florian
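In terms of the proposed procedure, the safeguard is the second check: objects are selected in one month and deleted the next only if they are still unreferenced. A small sketch of that rule, with made-up handles and a hypothetical function name:

```python
def objects_to_delete(selected_last_month, referenced_now):
    """Return, sorted, the handles selected a month ago that are still unreferenced today."""
    return sorted(set(selected_last_month) - set(referenced_now))

# Hypothetical handles: BB2-EXAMPLE was referenced again after being selected.
selected_last_month = ["AA1-EXAMPLE", "BB2-EXAMPLE"]
referenced_now = {"BB2-EXAMPLE"}

doomed = objects_to_delete(selected_last_month, referenced_now)
print(doomed)  # ['AA1-EXAMPLE']
```

An object that gains a reference between selection and the second check is left alone.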
participants (9)
- Andreas Polyrakis
- Andy Davidson
- Denis Walker
- Florian Frotzler
- Gert Doering
- Jon Morby
- Leo Vegoda
- Sascha Lenz
- Shane Kerr