Re: [db-wg] NWI-11 Internationalised Domain Names
Hello,

I would like to suggest an improvement. In the whois database backend, each textual attribute (last name, etc.) could have an additional column in which the Punycode representation of the attribute value is stored. The Punycode representation would be written to that column only if the value entered by the user contains at least one non-ASCII character; otherwise the column stays empty. The Punycode representation could use the format [FIELD-NAME]--[punycode-ascii-characters] instead of xn--[punycode-ascii-characters], to avoid collisions with text values that already start with "xn--". There should be no Punycode-representation columns for email-related fields or for object-reference fields.

With this approach, old protocols will still be able to interact with the database as they do now; no ground-up redevelopment of the whole backend will be needed to support internationalization; the new Punycode-representation columns use an encoding similar to that of the current columns; end users will be able to use their internationalized names and other internationalized text; and the RIPE NCC will be able to display internationalized strings online and through supported services.

Regards,
Elad

________________________________
From: db-wg on behalf of Edward Shryane via db-wg
Sent: Friday, June 26, 2020 5:55 PM
To: db-wg
Cc: ripedenis@yahoo.co.uk
Subject: Re: [db-wg] NWI-11 Internationalised Domain Names

Dear Working Group,

I'd like to propose the following Solution Definition for NWI-11.

Introduction

Currently, the RIPE database supports email addresses encoded in the Latin-1 character set. However, an email address can have an Internationalised Domain Name (IDN), with characters outside Latin-1 (i.e. Unicode).
This causes interoperability problems, as non-ASCII characters in an email address may not be accepted by a mail server, and only a small subset of Unicode characters can be encoded as Latin-1.

Solution Definition

In order to support Internationalised Domain Names (IDN) in an email address in the RIPE database, I propose to automatically encode email addresses in the Punycode format. Punycode (as defined in RFC 3492) is a way to encode strings containing Unicode characters, such as internationalised domain names, into ASCII.

When updating the RIPE database, it is already possible to submit a Punycode-encoded value (i.e. with ASCII encoding) for an email address, but this change automates the conversion of any non-ASCII email address to Punycode. This change will only affect attributes with an email address syntax (i.e. abuse-mailbox, e-mail, irt-nfy, mnt-nfy, notify, ref-nfy, upd-to).

Automatic Punycode encoding will only be applied to the domain part of the email address. The local part of the address must only contain ASCII characters; if non-ASCII characters are found in the local part, the address is rejected as invalid.

When querying the RIPE database, any Punycode-encoded email address is returned in Punycode (i.e. it is not decoded).

I welcome feedback from the community on this proposal.

Regards
Ed Shryane
RIPE NCC

On 22 Jun 2020, at 22:49, ripedenis--- via db-wg <db-wg@ripe.net> wrote:

Colleagues

There has been some discussion recently, and many times over the years, about addressing this issue. The chairs believe enough support has been shown to move forward with this. We would therefore like to present this as 'NWI-11 Internationalised Domain Names'.

We propose a problem statement based on the text provided recently by Leo Vegoda, as shown below. The RIPE NCC has a proposal for a solution to this problem using Punycode. We would like to ask the RIPE NCC to present this proposal to the working group.
If anyone has any other proposals for a solution, we welcome a discussion on this matter.

cheers
denis
co-chair DB-WG

Problem Statement

The RIPE NCC service region includes countries whose languages are not written in Latin script. Many of the languages used in the RIPE NCC service region are written in Latin script but use diacritical marks that fall outside the US-ASCII character set. Internationalized Domain Names (IDNs) support the use of these scripts in the DNS.

ICANN began delegating IDN top-level domains as part of a test program in 2007, and the IETF updated the IDNA protocol in 2008. As of mid-2020, there were over 160 IDN TLDs in the root zone. The IETF published eight standards-track RFCs on using IDNs in e-mail (Email Address Internationalization, or EAI) in 2012 and 2013. It is reasonable that organizations communicating with people whose preferred script is not Latin-based would want to use an IDN domain for e-mail as well as for a web presence. It is also likely that the registry for an IDN TLD would want to use that TLD for its e-mail addresses.

RFC 3912 explicitly notes that the WHOIS protocol has not been internationalized, while recognizing that some servers attempt to do so. RDAP, which the RIPE NCC has deployed, explicitly supports internationalization by encoding all queries and responses in UTF-8.

The RIPE community could decide to ignore EAI by trying to require organizations to deploy a secondary e-mail address for use in the RIPE Database. This would reduce the effectiveness of the RIPE Database, as a secondary address is less likely to be monitored and used.
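The two encodings discussed in this thread can be compared with a short Python sketch. `normalise_email` follows Ed's Solution Definition: it touches only the listed email attributes, rejects a non-ASCII local part, and converts a non-ASCII domain part to ASCII. `punycode_column` follows Elad's suggestion of a `[FIELD-NAME]--` prefixed Punycode value that is filled in only for non-ASCII input. Both function names are hypothetical and are not RIPE NCC code. The sketch assumes each attribute value is a bare address, and it relies on Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) rather than the newer IDNA 2008 rules; a production implementation would more likely use an IDNA 2008 library.

```python
"""Hypothetical sketch of the two encodings discussed for NWI-11."""
from typing import Optional

# Attributes with email syntax, as listed in the Solution Definition.
EMAIL_ATTRIBUTES = frozenset({
    "abuse-mailbox", "e-mail", "irt-nfy", "mnt-nfy",
    "notify", "ref-nfy", "upd-to",
})


def normalise_email(attribute: str, value: str) -> str:
    """Return the value to store for an attribute (Ed's proposal).

    Assumption: the value is a bare address such as "user@example.net".
    """
    if attribute not in EMAIL_ATTRIBUTES:
        return value  # only email-syntax attributes are converted
    local, sep, domain = value.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {value!r}")
    if not local.isascii():
        # The local part is never converted; non-ASCII is rejected.
        raise ValueError(f"non-ASCII local part: {value!r}")
    if domain.isascii():
        return value  # already ASCII (possibly already Punycode)
    try:
        # Built-in "idna" codec (IDNA 2003): each non-ASCII label gets xn--.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"cannot encode domain {domain!r}") from exc
    return f"{local}@{ascii_domain}"


def punycode_column(field_name: str, value: str) -> Optional[str]:
    """Return the extra column value from Elad's suggestion, or None."""
    if value.isascii():
        return None  # column left empty for pure-ASCII input
    # Raw RFC 3492 Punycode, prefixed with the field name instead of xn--.
    return f"{field_name}--{value.encode('punycode').decode('ascii')}"


if __name__ == "__main__":
    print(normalise_email("e-mail", "info@bücher.de"))  # info@xn--bcher-kva.de
    print(punycode_column("address", "bücher"))         # address--bcher-kva
```

In Ed's scheme the stored domain keeps the `xn--` prefix that IDNA uses to mark an encoded label, so it remains a valid DNS name; Elad's scheme deliberately replaces that prefix with the field name, so the extra column is an internal representation rather than something a mail server could use.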
Elad Cohen