It's nice to see you here, Marco.

I don't think it will be possible to change the default without potentially causing problems for clients that don't expect it. Adding that -C option to the server seems appropriate. However, unless you expect that at some point every client will be sending it, you will face the issue that if the default changes from iso-8859-1 to utf-8, old clients will be wrong, as you can't get every client updated at the same time.

The closest alternative would be a predefined epoch at which both server and clients change the encoding. But that still requires all clients to have been updated, and the date to have been agreed far in advance (and not postponed later!).

Something that could help would be for the server to mark the encoding so that the client can recognise that the output is no longer iso-8859-1 but utf-8, for example by including a BOM (although it may be preferable to include it inside a comment rather than at the beginning; I see benefits both ways). Old-but-not-ancient clients could then autodetect the switch.

Most likely, the server default will not change, though.

Best regards

-------- Original message --------
From: Marco d'Itri via db-wg <db-wg@ripe.net>
Date: 12/18/23 00:44 (GMT+01:00)
To: Piotr Strzyzewski <piotr@internetsailor.net>
Cc: db-wg <db-wg@ripe.net>
Subject: Re: [db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes

On Dec 03, Piotr Strzyzewski via db-wg <db-wg@ripe.net> wrote:
> As the UTF-8 topic was briefly discussed during the DB-WG session at RIPE87 in Rome, I would like to propose moving forward with it. If that means a topic for the first (?) interim meeting, let it be. Please let me know if this works for you. Thanks in advance.

In Rome I talked a bit with Edward about this.

Background: I am the author of the whois client used by all Linux distributions.
I fully agree that switching to UTF-8 is desirable, but we cannot just change the encoding of port 43 without major side effects. Since version 5.5.4 (December 2019), the client assumes that the output of whois.ripe.net is Latin-1 and transcodes it to the system encoding. Receiving unexpected UTF-8 would cause mojibake.

My suggestion is to add a new query "command line" option to specify the desired encoding (limited to either ISO-8859-1 or UTF-8), as supported by other whois servers. -C is the most common choice, but maybe it would be better to use --charset so as not to waste a single-letter option. See https://github.com/rfc1036/whois/blob/next/servers_charset_list .

In a few years it will then be much easier to switch the default from Latin-1 to UTF-8.

-- 
ciao,
Marco
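A minimal Python sketch of the client-side behaviour discussed in this thread, under stated assumptions. It is hypothetical throughout: `--charset` is only the query option proposed here (-C being the alternative), not an existing whois.ripe.net flag at the time; the UTF-8 BOM marker (at the start or inside a `%` comment line) is the autodetection idea from the reply; and `build_query`/`decode_response` are illustrative names, not part of any real whois client. It shows why UTF-8 read as Latin-1 turns into mojibake, and how a marker would let a newer client detect the switch.

```python
"""Hypothetical sketch of the encoding handling discussed in this thread.

Assumptions (not current whois.ripe.net behaviour):
  * "--charset" is only the proposed query option (-C is the alternative).
  * A UTF-8 BOM, at the start or inside a '%' comment line, marks UTF-8 output.
  * build_query/decode_response are illustrative names only.
"""
from typing import Optional

BOM = "\ufeff".encode("utf-8")  # b'\xef\xbb\xbf'
ALLOWED_CHARSETS = {"ISO-8859-1", "UTF-8"}  # the two encodings the proposal allows


def build_query(search_key: str, charset: Optional[str] = None) -> bytes:
    """Build a port 43 query line, optionally prefixed with the proposed option."""
    if charset is None:
        # Today's behaviour: plain query, server answers in Latin-1.
        return (search_key + "\r\n").encode("iso-8859-1")
    cs = charset.upper()
    if cs not in ALLOWED_CHARSETS:
        raise ValueError(f"unsupported charset: {charset}")
    # Python's codec lookup accepts both "ISO-8859-1" and "UTF-8" as names.
    return f"--charset {cs} {search_key}\r\n".encode(cs)


def decode_response(raw: bytes) -> str:
    """Decode server output as Latin-1 unless a UTF-8 BOM marks the switch."""
    if raw.startswith(BOM):
        # Marker at the very beginning of the output.
        return raw[len(BOM):].decode("utf-8")
    for line in raw.split(b"\n"):
        # Marker inside a RIPE-style '%' comment line.
        if line.startswith(b"%") and BOM in line:
            return raw.replace(BOM, b"").decode("utf-8")
    # No marker: keep the historical Latin-1 assumption.
    return raw.decode("iso-8859-1")


if __name__ == "__main__":
    person = "person: Müller\n"
    # A client that assumes Latin-1 but receives unmarked UTF-8 shows mojibake.
    print(decode_response(person.encode("utf-8")), end="")  # person: MÃ¼ller
    # A BOM in a comment line lets a newer client detect UTF-8.
    marked = b"% encoding marker " + BOM + b"\n" + person.encode("utf-8")
    print(decode_response(marked), end="")
```

Keeping Latin-1 as the fallback means a server that never emits the marker is handled exactly as today. The trade-off is that a genuine Latin-1 comment line containing the bytes EF BB BF ("ï»¿") would be misdetected as UTF-8, which seems unlikely in practice.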