about ripe.db on ftp.ripe.net
Hi, I'm doing statistical work on ripe.db for my university (Faculty of Science of the Lisbon University) and I downloaded ripe.db.inetnum (from the split files directory) so I can do statistics on several Portuguese-speaking countries and Portugal itself. The ripe.db.inetnum file suits my needs exactly, but the admin-c: and tech-c: attributes refer to person: objects which can't be found in ripe.db. I've googled a bit and found a reference (back in 1993) to a split of ripe.db named ripe.db.person, which probably had the additional information I need. Since I don't see ripe.db.person anywhere on the FTP server and the other .db files don't have person: objects, I was wondering if they were removed due to spam issues or something similar, since they're still accessible through normal whois.ripe.net queries (which are subject to strict query limits). Could someone shed some light on this matter?

Regards,
Francisco Guerreiro
On Wed, 2004-12-15 at 02:55 +0000, Francisco Guerreiro wrote:
Hi, I'm doing statistical work on ripe.db for my university (Faculty of Science of the Lisbon University) and I downloaded ripe.db.inetnum (from the split files directory) so I can do statistics on several Portuguese-speaking countries and Portugal itself. The ripe.db.inetnum file suits my needs exactly, but the admin-c: and tech-c: attributes refer to person: objects which can't be found in ripe.db. I've googled a bit and found a reference (back in 1993) to a split of ripe.db named ripe.db.person, which probably had the additional information I need. Since I don't see ripe.db.person anywhere on the FTP server and the other .db files don't have person: objects, I was wondering if they were removed due to spam issues or something similar, since they're still accessible through normal whois.ripe.net queries (which are subject to strict query limits). Could someone shed some light on this matter?
AFAIK they were removed because people used them for mass marketing (including spam). Nevertheless, for statistical analysis you can most likely do without that information. Usually the persons are roles and/or generic mailboxes anyway. If you want to see the number of 'same persons' you can also compare the handles themselves...

Greets,
Jeroen
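The handle comparison suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the split file holds objects as "attribute: value" lines separated by blank lines, with continuation lines starting with a space, tab or "+", and the sample ranges and handles (AB123-RIPE, CD456-RIPE) are invented, not taken from the real database.

```python
from collections import Counter

# Invented sample in the assumed layout of ripe.db.inetnum.
SAMPLE = """\
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-NET-1
country:        PT
admin-c:        AB123-RIPE
tech-c:         CD456-RIPE

inetnum:        198.51.100.0 - 198.51.100.255
netname:        EXAMPLE-NET-2
country:        PT
admin-c:        AB123-RIPE
tech-c:         AB123-RIPE
"""


def iter_objects(text):
    """Yield each object as a list of (attribute, value) pairs."""
    current = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current object.
            if current:
                yield current
                current = []
            continue
        if line.startswith(("#", "%")):
            continue  # skip comment lines
        if line[0] in " \t+" and current:
            # Continuation line: append to the previous attribute's value.
            key, value = current[-1]
            current[-1] = (key, (value + " " + line[1:].strip()).strip())
            continue
        key, sep, value = line.partition(":")
        if sep:
            current.append((key.strip().lower(), value.strip()))
    if current:
        yield current


def count_handles(text, attributes=("admin-c", "tech-c")):
    """Count how often each handle is referenced by the given attributes."""
    counts = Counter()
    for obj in iter_objects(text):
        for key, value in obj:
            if key in attributes:
                counts[value.upper()] += 1
    return counts


counts = count_handles(SAMPLE)
for handle, n in counts.most_common():
    print(handle, n)
```

For a real file, the text would be read from disk (or streamed line by line) instead of using SAMPLE. The counts say how often a handle is referenced, not how many distinct people stand behind the handles.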
On 15 Dec 2004, at 08:57, Jeroen Massar wrote:
If you want to see the number of 'same persons' you can also compare the handles themselves...
To some extent. A number of individuals have multiple handles. Possible reasons include:
- no housekeeping after someone moved to a new job,
- automatic introduction of new handles during bulk operations.

Best regards,
Niall O'Reilly
PGP key ID: AE995ED9 (see www.pgp.net)
Fingerprint: 23DC C6DE 8874 2432 2BE0 3905 7987 E48D AE99 5ED9
On Wed, 2004-12-15 at 11:09 +0000, Niall O'Reilly wrote:
On 15 Dec 2004, at 08:57, Jeroen Massar wrote:
If you want to see the number of 'same persons' you can also compare the handles themselves...
To some extent. A number of individuals have multiple handles. Possible reasons include:
- no housekeeping after someone moved to a new job,
- automatic introduction of new handles during bulk operations.
In that case you should also assume that the handle itself contains wrong information, and thus that it is useless either way ;) Especially now, with the free creation of maintainers, a lot more junk can and likely will be stored in the db, and because it is maintained the junk will stay there until the periodic email check is run again...

Greets,
Jeroen
Jeroen Massar wrote:
Nevertheless, for statistical analysis you can most likely do without that information. Usually the persons are roles and/or generic mailboxes anyway. If you want to see the number of 'same persons' you can also compare the handles themselves...
Well, if I'm asking for that information, it's because I need it. I don't care about the e-mail contact in the person: object; that's not useful information for my work. And I can't compare the handles since there can be several per person, hence the need for the person db.

Regards,
Francisco Guerreiro
On Wed, 2004-12-15 at 15:47 +0000, Francisco Guerreiro wrote:
Jeroen Massar wrote:
Nevertheless, for statistical analysis you can most likely do without that information. Usually the persons are roles and/or generic mailboxes anyway. If you want to see the number of 'same persons' you can also compare the handles themselves...
Well, if I'm asking for that information, it's because I need it. I don't care about the e-mail contact in the person: object; that's not useful information for my work.
I wonder what kind of research you are doing? Maybe indexing prefixes based on the administrative contact? Note that it is contact information, not the location of the prefix. Nor does it have any relation to the location, or anything else, of the prefix in the database, so I wonder what the value of that information could be, except for contact or direct marketing purposes...
And I can't compare the handles since there can be several per person, hence the need for the person db.
One person can have multiple handles, but one handle maps to one person/role. And even if you had the info you would need to apply a very smart filter: if a person has multiple handles you can't do a direct match, because an extra space here, a comma there, or a different address already breaks your statistical analysis with incorrect assumptions.

Greets,
Jeroen
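To illustrate why a direct match fails, and what a simple filter might look like, here is a minimal Python sketch. The handles, names and addresses are invented for illustration, and the normalisation rules (drop accents, lowercase, turn punctuation into spaces, collapse whitespace) are one possible choice, not an established method.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(text):
    """Reduce a free-text field to a crude comparison key."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, replace punctuation with spaces, collapse runs of whitespace.
    no_punct = re.sub(r"[^\w\s]", " ", stripped.lower())
    return " ".join(no_punct.split())


# Hypothetical person records: handle -> (name, address).
records = {
    "JS1-RIPE": ("João  Silva", "Rua Exemplo 1, Lisboa"),
    "JS2-RIPE": ("Joao Silva", "Rua Exemplo 1 Lisboa"),
    "MC3-RIPE": ("Maria Costa", "Av. Exemplo 2, Porto"),
}

# A direct comparison treats the first two records as different people.
direct_match = records["JS1-RIPE"] == records["JS2-RIPE"]

# Grouping on the normalised key puts them together.
groups = defaultdict(list)
for handle, (name, address) in records.items():
    groups[(normalise(name), normalise(address))].append(handle)

for key, handles in groups.items():
    print(key, handles)
```

This only absorbs spacing, punctuation, case and accent differences. A genuinely different address, or two different people with the same name and address, would still be grouped wrongly, which is the caveat raised above.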
Jeroen Massar wrote:
I wonder what kind of research you are doing? Maybe indexing prefixes based on the administrative contact? Note that it is contact information, not the location of the prefix. Nor does it have any relation to the location, or anything else, of the prefix in the database, so I wonder what the value of that information could be, except for contact or direct marketing purposes...
It's for a university paper; the value of the information is the information itself. Don't think about a practical use for the information, since the practical use is actually writing the paper :) The person(s) who are the admin-c/tech-c, or the company behind them: that's the main information I want.
One person can have multiple handles, but one handle maps to one person/role. And even if you had the info you would need to apply a very smart filter: if a person has multiple handles you can't do a direct match, because an extra space here, a comma there, or a different address already breaks your statistical analysis with incorrect assumptions.
Well, I had to apply a filter to ripe.db.inetnum too :) That's no biggie: the best approach is to read a little bit of the whole db, write a parser for the common entries and separate them from the different ones. In the person case, it's just a matter of stripping/converting some characters and applying tolower() to them. After that, I can write the regex I need to parse the data and store it in an SQL db :) Then it's just a matter of making conceptual relations between the collected data.

Internet contact information is of no use to me, since if any contact with any company should be made in the future for this or other papers, it would be through registered 'snail' mail, as one should always do (at least if you live in Portugal :D) in official communications between companies/universities. It's funny, because the actual contact information is of no use to me anyway: if any contact were to be made, I'd have to phone the company to find out which person deals with that kind of mail and address it to that person, or it would simply be discarded.

So, as I think I've made clear, my intentions aren't those of a spammer or anything alike. I just need legitimate information that's not (that) available anymore due to bad use of it (I guess).

Regards,
Francisco Guerreiro
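The pipeline described above (regex parsing, lowercasing, storing in an SQL db, then relating the data) could look roughly like the following Python sketch. It is not the poster's actual code: the table and column names, the regular expression and the sample objects are all assumptions made for illustration, and SQLite stands in for whichever SQL database is used.

```python
import re
import sqlite3

# Assumed attribute layout: "name: value" at the start of a line.
ATTRIBUTE = re.compile(r"^(?P<key>[A-Za-z0-9-]+):\s*(?P<value>.*)$")

# Invented sample objects; ranges and handles are not real data.
SAMPLE = """\
inetnum:        192.0.2.0 - 192.0.2.255
country:        PT
admin-c:        AB123-RIPE
tech-c:         CD456-RIPE

inetnum:        198.51.100.0 - 198.51.100.255
country:        PT
admin-c:        AB123-RIPE
tech-c:         AB123-RIPE

inetnum:        203.0.113.0 - 203.0.113.255
country:        BR
admin-c:        EF789-RIPE
tech-c:         EF789-RIPE
"""


def parse_objects(text):
    """Yield one dict per object, mapping attribute -> list of lowercased values."""
    for block in re.split(r"\n\s*\n", text.strip()):
        obj = {}
        for line in block.splitlines():
            match = ATTRIBUTE.match(line)
            if match:
                value = match["value"].strip().lower()
                obj.setdefault(match["key"].lower(), []).append(value)
        if obj:
            yield obj


def load(conn, text):
    """Store ranges and their contact handles in two hypothetical tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS inetnum (block TEXT PRIMARY KEY, country TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS contact (block TEXT, role TEXT, handle TEXT)")
    for obj in parse_objects(text):
        if "inetnum" not in obj:
            continue
        block = obj["inetnum"][0]
        country = obj.get("country", [None])[0]
        conn.execute("INSERT OR REPLACE INTO inetnum VALUES (?, ?)", (block, country))
        for role in ("admin-c", "tech-c"):
            for handle in obj.get(role, []):
                conn.execute("INSERT INTO contact VALUES (?, ?, ?)", (block, role, handle))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, SAMPLE)

# Example relation: distinct contact handles per country.
rows = conn.execute(
    "SELECT i.country, COUNT(DISTINCT c.handle) "
    "FROM inetnum i JOIN contact c ON c.block = i.block "
    "GROUP BY i.country ORDER BY i.country"
).fetchall()
print(rows)
```

Without the person: objects, the handle is the only key available, so a query like this counts handles rather than people. With person data, a third table keyed on the handle could carry the normalised name and be joined in the same way.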
participants (3): Francisco Guerreiro, Jeroen Massar, Niall O'Reilly