Re: [db-wg] To internationalise or not, that is the question?

6 Aug 2020

Hi Denis, Colleagues,

As you requested, the Database team will prepare a thorough investigation (impact analysis) of UTF-8 in the RIPE Database, as a starting point for further discussion.

Regards
Ed Shryane
RIPE NCC
...
On 5 Aug 2020, at 14:46, ripedenis@yahoo.co.uk wrote:
Colleagues
We have a problem with UTF-8. Many people keep saying you want it, we should have it, let's do it... But every time we get to these difficult, non-technical questions, everyone goes silent. This is why we have never implemented UTF-8 since it was first mentioned many years ago. No one in the community seems to know how to answer these questions.
So I have a suggestion. The RIPE NCC has the manpower with the expertise to investigate these issues. I propose we put a task on the RIPE NCC to do a thorough investigation of UTF-8 in the RIPE Database from all possible angles and report back to the community. This can be a starting point to a more meaningful discussion.
We need to know what impact having non-Latin-1 characters in different parts of the data set will have on the RIPE Registry, the RIPE NCC members and the different user groups of the RIPE Database, as well as the social, legal and political impact of such a change. Which parts of the data set can, should or should not be allowed to be in other character sets? Who really needs access to this data, and which parts of it need to be understood or interpreted? This brings into question the whole purpose of the RIPE Database and the data contained therein.
Thoughts???
cheers
denis
co-chair DB-WG
On Friday, 31 July 2020, 20:20:10 CEST, Leo Vegoda <leo@vegoda.org> wrote:
Hi Denis,
These are good questions. As so many of the answers lie with the RIPE
NCC or the NRO, I suppose we need input from them to proceed further.
Kind regards,
Leo
On Wed, Jul 29, 2020 at 3:47 PM ripedenis@yahoo.co.uk wrote:
...
Hi Leo
Some of the questions that need to be answered include:
- Who needs to be able to read, understand or interpret which parts of the data in the RIPE Database? (Maybe both the community and the NCC need to give input to answer this.)
- Is any of the data contained in the RIPE Database essential for the operation of the registry and not duplicated anywhere else? (Maybe the NCC and the NCC Services WG need to give input to answer this.)
- Is any of the data important to LEAs and governments? Is that a consideration? Do they have the resources to understand the data in any format? (Community and LEA input is needed for this one.)
- One of the mission statements of the NRO is "Providing and promoting a coordinated Internet number registry system", so if we are going to internationalise the public face of the registry, should it be coordinated? (Is that a community, RIR or NRO question?)
cheers
denis
co-chair DB-WG
On Wednesday, 29 July 2020, 21:09:55 CEST, Leo Vegoda <leo@vegoda.org> wrote:
Hi Denis,
I agree that this is a registry issue and not just a database issue,
which is why I sent the message I did on 8 July.
I'd like to understand how much of this work should be led by the RIPE
NCC versus the community. Also, because of the breadth of the issues,
should the discussion be here or on another list?
Kind regards,
Leo Vegoda
On Wed, Jul 29, 2020 at 10:45 AM ripedenis@yahoo.co.uk wrote:
...
Hi Leo
As I have said many times, internationalising the RIPE Database is not a technical issue; it is a registry issue. I think it does need a separate process from the database requirements, especially if we consider it as a cross-registry issue.
Incidentally, I did suggest on this mailing list several months ago that the requirements task force consider the issue of UTF-8. No one from the task force has yet replied to me on that or any other comment I have made about the requirements.
cheers
denis
co-chair DB-WG
On Wednesday, 29 July 2020, 18:20:14 CEST, Leo Vegoda <leo@vegoda.org> wrote:
Hi,
Thanks for providing the impact analysis for this initial change.
What should the process be for introducing greater support for
internationalization in the RIPE Database? George, Cynthia and others
have made good points about the need to improve internationalization
of more than just e-mail addresses. Is that support something that
should be handled through the process that follows the final report of
the Database TF or does it need to be addressed separately?
Thanks,
Leo
On Wed, Jul 29, 2020 at 8:03 AM Edward Shryane via db-wg <db-wg@ripe.net> wrote:
...
Dear Colleagues,
Here is the impact analysis for the NWI-11 implementation.
The Database team plans to implement NWI-11 as per the Solution Definition:
https://www.ripe.net/ripe/mail/archives/db-wg/2020-June/006525.html
(1) Impact to Whois Update
The implementation will automatically apply Punycode encoding (as per RFC 5891) to the domain part of an email address during Whois update.
The encoding is only applied to an IDN domain name, and changes the current behaviour as follows:
- ASCII encoded values will not be affected (as before).
- Non-ASCII but Latin-1 encoded values will be encoded as Punycode.
- Non-Latin-1 encoded values (e.g. UTF-8) will also be encoded as Punycode. Previously, these values were transformed to Latin-1, with a '?' substitution.
The local part of an email address must only contain ASCII characters. If non-ASCII characters are found in the local part, the address is rejected as invalid.
This change will only affect attributes with an email address syntax (i.e. abuse-mailbox, e-mail, irt-nfy, mnt-nfy, notify, ref-nfy, upd-to).
If an email address is converted to Punycode, a warning will be included in the update response.
Any Punycode conversion failure will result in the attribute value being rejected as invalid. A workaround in this case is to encode the value before submitting the update.
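The update behaviour described in (1) can be sketched in a few lines of Python. This is an illustration only, not the Whois implementation: the function name normalise_email is hypothetical, and Python's built-in "idna" codec follows the older IDNA 2003 rules (RFC 3490) rather than RFC 5891, so results may differ for some domain names.

```python
from typing import Tuple


def normalise_email(address: str) -> Tuple[str, bool]:
    """Return (address, converted) for an email attribute value.

    Hypothetical helper: `converted` stands in for the warning that
    would be added to the update response.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address: %r" % address)

    # The local part must only contain ASCII characters.
    if not local.isascii():
        raise ValueError("non-ASCII local part: %r" % address)

    # ASCII-only domains are left untouched (as before).
    if domain.isascii():
        return address, False

    # Any other domain is encoded as Punycode; a conversion failure
    # means the attribute value is rejected as invalid.
    try:
        encoded = domain.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError("Punycode conversion failed: %r" % address) from exc

    return local + "@" + encoded, True
```

For example, normalise_email("user@münchen.de") returns ("user@xn--mnchen-3ya.de", True), while an address that is already ASCII is returned unchanged with False.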
(2) Impact to Whois Query
When querying the RIPE Database, any Punycode encoded email address is returned as-is (i.e. it is not decoded).
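Since query output is not decoded, a client that wants to show the Unicode form would have to decode it locally. A minimal sketch, assuming Python's built-in "idna" codec (IDNA 2003) is acceptable for display purposes; the function name display_form is hypothetical and is not part of any RIPE tooling.

```python
def display_form(address: str) -> str:
    """Decode a Punycode domain in an email address for display only."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        # Not an email address; return it unchanged.
        return address
    try:
        decoded = domain.encode("ascii").decode("idna")
    except UnicodeError:
        # If the domain cannot be decoded, show the value as returned.
        return address
    return local + "@" + decoded
```

Here display_form("user@xn--mnchen-3ya.de") gives "user@münchen.de"; the value stored in the database is not changed.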
(3) Impact to Existing Data
We will perform a cleanup to convert any existing non-ASCII (but Latin-1 encoded) IDN domain names to Punycode in attributes with an email address syntax. This affects very few objects. The maintainer(s) will be notified by email beforehand.
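Maintainers who want to check their own objects ahead of the cleanup could scan for affected attributes along the following lines. This is a rough sketch, not the cleanup procedure itself: the function name is hypothetical, the attribute list is the one given in (1), and the parsing assumes simple "key: value" lines without continuation lines or end-of-line comments.

```python
from typing import List, Tuple

# Attributes with an email address syntax, as listed in (1).
EMAIL_ATTRIBUTES = {
    "abuse-mailbox", "e-mail", "irt-nfy", "mnt-nfy",
    "notify", "ref-nfy", "upd-to",
}


def find_non_ascii_email_domains(object_text: str) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs whose email domain is not ASCII."""
    hits = []
    for line in object_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in EMAIL_ATTRIBUTES:
            continue
        value = value.strip()
        # Only the domain part (after the last '@') is of interest here.
        _, at, domain = value.rpartition("@")
        if at and not domain.isascii():
            hits.append((key, value))
    return hits
```

Attributes outside the list, such as remarks, are ignored even if they contain non-ASCII text.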
(4) Impact to Whois Documentation
We will update the database documentation with details of this behaviour change.
(5) Release Timeline
We expect the NWI-11 implementation to take about 1 month (including code changes and testing), and will include the feature in the Whois 1.98 release.
As usual, we will deploy the release to the Release Candidate environment for 2 weeks before production, to allow for testing.
Regards
Ed Shryane
RIPE NCC
On 23 Jul 2020, at 12:00, ripedenis@yahoo.co.uk wrote:
Hi Ed
The chairs see there is a consensus to move forward with implementing Punycode. Can you present an impact analysis explaining what changes you propose, what effect those changes will have on updates and queries (by all the different methods), and whether anyone needs to modify their software interacting with the database?
cheers
denis
co-chair DB-WG