In message <B335DD85-CED0-41A3-A504-E0A7E6E41D2B@ripe.net>, Edward Shryane <eshryane@ripe.net> wrote:
Is it permitted to have internationalized domain names appear within the database?
Currently the RIPE database supports the Latin-1 (ISO-8859-1) character set only.
Yes. Please forgive me. I asked the Wrong Question entirely. See below.
I have found at least one specific case where an IDN does appear in the data base as a UTF-8 encoded string, but since I had never seen that before, I just wanted to know if that was an anomalous mistake or if it was considered normal, acceptable, and routine.
Mea culpa! I misspoke. What I found was *not* an internationalized domain name, per se. Well, maybe it was/is and maybe it wasn't/isn't. I'll let you all decide, and then you can tell me if I have used improper terminology to describe what I found.

The issue came up as I was performing some automated processing of certain abuse contact email addresses relating to certain RIPE ASNs. More specifically, one of my automated tools got rather badly confused by the abuse reporting addresses for AS5464 and AS42486, both of which consist of the email address:

    abuse@zürich.email

The domain name portion of this address may or may not be a proper sort of internationalized domain name. I am frankly not sure about that now, one way or the other. I just saw a character that was not a traditional 7-bit ASCII character and then I improperly leapt to the conclusion that this must be one of those internationalized domain names that have bedeviled some of my other home-brew tools in the past.

The problem, of course, is that one lower-case letter "u" with the associated umlaut above it. On my system here, the "od -c" command indicates that this one character is encoded NOT as any kind of UTF-8 sequence, but rather that it is simply encoded as a single byte with the value 374 (octal). As I now know, that byte value, when construed in accordance with ISO-8859-1, does in fact represent a lower-case "u" with an umlaut. So at least in this limited sense I now know what the person who put that domain name into the data base had intended.

However, I am not yet persuaded that simply using ISO-8859-1 encoding was either the best choice or even an entirely appropriate choice in this instance.
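For readers who want to reproduce the byte-level observation above, here is a minimal sketch in Python. It assumes the raw bytes of the address were obtained from a database query; the helper name `decode_field` is hypothetical and not part of any RIPE tooling. It tries UTF-8 first and falls back to ISO-8859-1 (Latin-1), reporting which interpretation succeeded.

```python
from typing import Tuple


def decode_field(raw: bytes) -> Tuple[str, str]:
    """Decode a raw field, returning (text, encoding_used).

    Hypothetical helper: tries strict UTF-8 first, then falls back to
    ISO-8859-1, in which every single byte maps to a character.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


# The byte reported by "od -c" as 374 (octal) is 0xFC (252 decimal).
raw_address = b"abuse@z\xfcrich.email"

text, encoding = decode_field(raw_address)
print(oct(raw_address[7]), encoding, text)
```

A lone 0xFC byte is not valid UTF-8, so the strict UTF-8 attempt fails and the Latin-1 fallback yields the "u" with an umlaut. Note that the fallback never fails, so a tool built this way cannot distinguish Latin-1 from other single-byte encodings.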
It was certainly convenient for the writer that a lower-case "u" with an umlaut could be represented within ISO-8859-1, thus making it unnecessary to resort to UTF-8 in this particular instance, but it does cause me to wonder a bit about what may transpire on the day when some RIPE member finds it appropriate and necessary to add to the data base some contact email address consisting in part of an IDN, where said IDN is, in its native form, something in Arabic, Farsi, Hebrew or Chinese.

For my own part, I am merely an out-of-date and ancient relic of a happier and simpler time, here in the United States, when 7-bit ASCII was sufficient for anything and everything. As such, I cannot help but long for a return to that level of simplicity, parochial as it might be. But since that is not going to happen anytime soon, I can only hope that RIPE and other regions will come to some agreement regarding the proper representation of IDNs within their respective data bases. If ISO-8859-1 is the standard chosen, I will certainly adjust my tools accordingly. If however some other standard is set, then I merely hope that I will be on the circulation list when that memo is issued.

Regards,
rfg

P.S. Not that anybody should really care, but for this one lone researcher it would be maximally convenient if all domain names represented within the data base were encoded as punycode, where necessary. In fact, it is my belief that 99.99% of them already are, which thus renders the "transition" to that standard essentially pain free.
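As an illustration of the punycode representation mentioned in the P.S., the following is a minimal sketch that converts the domain part of an email address to its ASCII-compatible form. The function name `to_ascii_address` is hypothetical. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules; a tool that needs IDNA 2008 behavior would have to use something else, and that choice is an assumption here, not anything the database specifies.

```python
def to_ascii_address(address: str) -> str:
    """Return the address with its domain part in punycode (ACE) form.

    Hypothetical helper: only the domain is converted; the local part
    (before the last "@") is left untouched.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("not an email address: %r" % address)
    # The built-in "idna" codec encodes each dot-separated label and
    # leaves pure-ASCII labels unchanged.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain


print(to_ascii_address("abuse@zürich.email"))  # abuse@xn--zrich-kva.email
```

An address whose domain is already plain ASCII passes through unchanged, so a tool could apply this normalization to every address without special-casing.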