Re: [db-wg] Internationalized domain names in the data base?

6 Nov 2019

In message <B335DD85-CED0-41A3-A504-E0A7E6E41D2B@ripe.net>,
Edward Shryane <eshryane@ripe.net> wrote:
...
...
>> Is it permitted to have internationalized domain names appear
>> within the database?
> Currently the RIPE database supports the Latin-1 (ISO-8859-1) character
> set only.
Yes.  Please forgive me.  I asked the Wrong Question entirely.  See below.
...
...
>> I have found at least one specific case where an IDN does appear
>> in the data base as a UTF-8 encoded string, but since I had
>> never seen that before, I just wanted to know if that was an
>> anomalous mistake or if it was considered normal, acceptable,
>> and routine.
Mea culpa!  I misspoke.

What I found was *not* an internationalized domain name, per se.  Well,
maybe it was/is and maybe it wasn't/isn't.  I'll let you all decide,
and then you can tell me if I have used improper terminology to
describe what I found.

The issue came up as I was performing some automated processing of the
abuse contact email addresses associated with certain RIPE ASNs.

More specifically, one of my automated tools got rather badly confused
by the abuse reporting addresses for AS5464 and AS42486, both of which
consist of the email address:

    abuse@zürich.email

The domain name portion of this address may or may not be a proper sort
of internationalized domain name.  I am frankly not sure about that now,
one way or the other.  I just saw a character that was not a traditional
7-bit ASCII character and then I improperly leapt to the conclusion that
this must be one of those internationalized domain names that have bedeviled
some of my other home-brew tools in the past.

The problem, of course, is that one lower-case letter "u" with the associated
umlaut above it.  On my system here, the "od -c" command indicates that this
one character is encoded NOT as any kind of UTF-8 sequence, but rather that
it is simply encoded as a single byte with the value 374 (octal).
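For anyone who wants to reproduce that observation without "od", here is a
minimal Python sketch.  It assumes the address is stored as exactly the bytes
described above; the variable names are illustrative only.

```python
# The address as raw bytes, with the "u umlaut" stored as the single
# byte 0xFC (octal 374), as reported by "od -c".
raw = b"abuse@z\xfcrich.email"

# Index 7 is the byte right after "abuse@z".
suspect = raw[7]
print(oct(suspect))            # 0o374

# Interpreted as ISO-8859-1, the byte is a lower-case u with an umlaut.
as_latin1 = raw.decode("iso-8859-1")
print(as_latin1)               # abuse@zürich.email

# The same character in UTF-8 would take two bytes, not one.
as_utf8 = as_latin1.encode("utf-8")
print(as_utf8)                 # b'abuse@z\xc3\xbcrich.email'

# The raw bytes are not valid UTF-8 at all, which is one way a tool
# that expects UTF-8 can get confused.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)              # False
```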

As I now know, that byte value, when construed in accordance with ISO-8859-1,
does in fact represent a lower-case "u" with an umlaut.  So at least in
this limited sense I now know what the person who put that domain name
into the data base had intended.  However I am not yet persuaded that
simply using ISO-8859-1 encoding was either the best choice or even an
entirely appropriate choice in this instance.  It was certainly convenient
for the writer that a lower-case "u" with an umlaut could be represented
within ISO-8859-1, thus making it unnecessary to resort to UTF-8 in this
particular instance, but it does cause me to wonder a bit about what may
transpire on the day when some RIPE member finds it appropriate and
necessary to add to the data base some contact email address consisting
in part of an IDN, where said IDN is, in its native form, something in
Arabic, Farsi, Hebrew or Chinese.
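
To make the concern concrete, the following sketch checks whether a given
label can be represented in ISO-8859-1 at all.  The helper name fits_latin1
and the sample labels are hypothetical, chosen only for illustration; none
of them is taken from the data base.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text has an ISO-8859-1 encoding."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample labels (each is the word "example" in its script),
# written as escapes so the source itself stays 7-bit ASCII.
samples = {
    "german": "z\u00fcrich",
    "arabic": "\u0645\u062b\u0627\u0644",
    "hebrew": "\u05d3\u05d5\u05d2\u05de\u05d4",
    "chinese": "\u4f8b\u5b50",
}

results = {name: fits_latin1(label) for name, label in samples.items()}
print(results)
```

Only the German label fits; the other three have no ISO-8859-1
representation whatsoever.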

For my own part, I am merely an out-of-date and ancient relic of a happier
and simpler time, here in the United States, when 7-bit ASCII was sufficient
for anything and everything.  As such, I cannot help but long for a return
to that level of simplicity, parochial as it might be.  But since that is
not going to happen anytime soon, I can only hope that RIPE and other
regions will come to some agreement regarding the proper representation of
IDNs within their respective data bases.  If ISO-8859-1 is the standard
chosen, I will certainly adjust my tools accordingly.  If however some
other standard is set, then I merely hope that I will be on the circulation
list when that memo is issued.

Regards,
rfg

P.S.  Not that anybody should really care, but for this one lone researcher
it would be maximally convenient if all domain names represented within the
data base were encoded as punycode, where necessary.  In fact, it is my
belief that 99.99% of them already are, which thus renders the "transition"
to that standard essentially pain free.
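
As a rough sketch of what such a normalization might look like, the snippet
below converts the domain part of an address to its punycode (ASCII) form.
It relies on Python's built-in "idna" codec, which implements the older
IDNA 2003 rules; the function name to_ascii_address is hypothetical, and
how the data base software would actually do this is not something I know.

```python
def to_ascii_address(address: str) -> str:
    """Return address with its domain part converted to punycode labels.

    The local part (before the last "@") is left untouched.
    """
    local, _, domain = address.rpartition("@")
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain

converted = to_ascii_address("abuse@z\u00fcrich.email")
print(converted)          # abuse@xn--zrich-kva.email

# Addresses that are already plain ASCII pass through unchanged.
unchanged = to_ascii_address("abuse@example.net")
print(unchanged)          # abuse@example.net

# The conversion is reversible, so nothing is lost for display purposes.
restored = "xn--zrich-kva.email".encode("ascii").decode("idna")
print(restored)           # zürich.email
```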

Ronald F. Guilmette