Pre-PDP discussion: "All RIPE documents should be plain text"
Dear all,
this is the first in a series of emails which I hope will advance into PDPs and then updated policies. Some of them are closely related, but I submit them individually so that if there are any hold-ups with a specific one, the other ones can still progress.
My first suggestion is:
The canonical and binding format of all documents is plain text in UTF-8 encoding. If a document can not reasonably be stored as plain text, PDF/A-1a[1] will be used. If any document exists in both plain text and PDF/A-1a, plain text is binding. Any other form is considered non-binding as soon as a plain text or PDF/A-1a variant exists. All legacy documents which are still valid or relevant to current policies will be transformed within a year of this proposal becoming policy.
The reasoning behind this should be obvious: Ensure that all documents will be in the most simple-to-parse format now and forever.
Richard
[1] https://en.wikipedia.org/wiki/PDF/A
* Richard Hartmann
this is the first in a series of emails which I hope will advance into PDPs and then updated policies. Some of them are closely related, but I submit them individually so that if there are any hold-ups with a specific one, the other ones can still progress.
My first suggestion is:
The canonical and binding format of all documents is plain text in UTF-8 encoding. If a document can not reasonably be stored as plain text, PDF/A-1a[1] will be used. If any document exists in both plain text and PDF/A-1a, plain text is binding. Any other form is considered non-binding as soon as a plain text or PDF/A-1a variant exists. All legacy documents which are still valid or relevant to current policies will be transformed within a year of this proposal becoming policy.
The reasoning behind this should be obvious: Ensure that all documents will be in the most simple-to-parse format now and forever.
Having recently authored a rather intrusive policy proposal that touches a lot of text, I can only strongly agree with this. The pre-publication procedure appears to be to send complete versions of the proposed new policy document back and forth by e-mail (sometimes in different document formats), with no easy way to pinpoint exactly what has changed between the various versions. This makes the entire process needlessly time-consuming, and increases the risk that some typo or mistake accidentally sneaks into the proposal and makes it all the way through the PDP.
So there should definitely be an authoritative, easy to collaborate on source format for all RIPE documents. Plain text is one obvious alternative, but would preclude text formatting and inclusion of figures, so a document such as ripe-500 can not be converted into plain text without loss of information.
I do not like the idea of using PDF for documents like ripe-500 though. PDFs are hard to collaborate on; standard tools like diff cannot meaningfully represent the differences between two versions of a document. I would prefer something like HTML, where you could download an archive file containing the main policy text as a HTML file and any figure or image files referred to by it.
I'd also suggest that a conventional 80-character line length limit would be used in the source format, as limiting the line length makes reading diffs easier. That doesn't mean the published document's web page cannot be reformatted to have longer lines of course (something that would happen automatically with HTML).
--
Tore Anderson
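To illustrate the point about diff: with a plain-text source, pinpointing what changed between two versions of a proposal is a one-liner. The following is only a minimal sketch using Python's standard difflib module; the two policy snippets and the file names policy-v1.txt and policy-v2.txt are made up for the example.

```python
import difflib

# Two hypothetical versions of a policy text, one sentence per list entry.
old = [
    "The canonical format of all documents is plain text.\n",
    "Legacy documents will be transformed within a year.\n",
]
new = [
    "The canonical format of all policy documents is plain text.\n",
    "Legacy documents will be transformed within a year.\n",
]

# unified_diff yields header lines, a hunk marker, and then the changed
# lines prefixed with "-" / "+" and unchanged context prefixed with " ".
diff = list(
    difflib.unified_diff(old, new, fromfile="policy-v1.txt", tofile="policy-v2.txt")
)
print("".join(diff), end="")
```

The same comparison on two PDFs would require extracting the text first, which is where formatting differences tend to drown out the actual change.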
On Fri, Mar 15, 2013 at 10:01 PM, Tore Anderson <tore@fud.no> wrote:
Plain text is one obvious alternative, but would preclude text formatting and inclusion of figures, so a document such as ripe-500 can not be converted into plain text without loss of information.
I honestly can't see how that picture is really needed to understand the process. Especially as RIPE would be free to keep a version with a picture around if it wanted to. But that diagram would work in text, as well.
I do not like the idea of using PDF for documents like ripe-500 though. PDFs are hard to collaborate on; standard tools like diff cannot meaningfully represent the differences between two versions of a document. I would prefer something like HTML, where you could download an archive file containing the main policy text as a HTML file and any figure or image files referred to by it.
Why not collaborate on a text file and export the result into PDF if it really can not be stored as text? WRT HTML, can we be certain that it will still render the same in ten years?
I'd also suggest that a conventional 80-character line length limit would be used in the source format, as limiting the line length makes reading diffs easier. That doesn't mean the published document's web page cannot be reformatted to have longer lines of course (something that would happen automatically with HTML).
Very good point, though 78 or even the email default of 72 may make more sense, especially once people start quoting in emails, etc. Unified diff already eats one more character, add a single level of quoting and 78 is too long.
There's a standard for email hidden somewhere that basically says that trailing whitespace equals word wrap whereas no trailing whitespace means proper end of line. That eats another column, but it would be easy to parse.
Though if we go that far, why not retain that width for everything new? Old documents should not be changed for obvious reasons.
Alternatively, "one sentence per line" would make diffing easier on the eyes, as well.
Richard
Hi,
* Richard Hartmann
On Fri, Mar 15, 2013 at 10:01 PM, Tore Anderson <tore@fud.no> wrote:
Plain text is one obvious alternative, but would preclude text formatting and inclusion of figures, so a document such as ripe-500 can not be converted into plain text without loss of information.
I honestly can't see how that picture is really needed to understand the process. Especially as RIPE would be free to keep a version with a picture around if it wanted to. But that diagram would work in text, as well.
It was just one example. Another would be ripe-566, which has a couple of tables. Sure, you can create ASCII art tables, but doing so while keeping everything below 72 characters of width might not be the easiest thing in the world. A future document might want to include some other figures or other external content which cannot be (easily) represented in plain text.
Why not collaborate on a text file and export the result into PDF if it really can not be stored as text?
Because once that PDF is the authoritative version it's hard for someone else to come along and make changes to it later.
WRT HTML, can we be certain that it will still render the same in ten years?
Reasonably, if you keep the amount of markup to a minimum. The point after all isn't to get RIPE documents that are glitzy AJAX Web2.0 HTML5 interactive stuff, just to allow for the inclusion of basic figures, tables, images and a minimum of formatting.
Alternatively, "one sentence per line" would make diffing easier on the eyes, as well.
I have a nasty tendency to write very long lines, so I think perhaps if you do "max one sentence per line" you want to do it *in addition* to a max line length of N chars. The result will probably look weird in plain text though, but not so in HTML (as newline characters don't get rendered as such).
Best regards,
--
Tore Anderson
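The combination discussed here, at most one sentence per line plus a maximum line length of N characters, can be sketched as follows. This is an illustration only: the function name sentence_lines is hypothetical, the default width of 72 is taken from the numbers mentioned in this thread, and the sentence splitting is deliberately naive.

```python
import re
import textwrap


def sentence_lines(text, width=72):
    """Start every sentence on a new line and wrap sentences longer than width."""
    # Naive split on whitespace that follows '.', '!' or '?'. Abbreviations
    # such as "e.g. foo" would be split incorrectly; this is only a sketch.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lines = []
    for sentence in sentences:
        if sentence:
            # textwrap.wrap breaks one long sentence into several lines.
            lines.extend(textwrap.wrap(sentence, width=width))
    return lines


sample = (
    "Plain text is binding. Any other form is considered non-binding "
    "as soon as a plain text or alternative variant exists."
)
wrapped = sentence_lines(sample, width=40)
print("\n".join(wrapped))
```

With this layout a change inside one sentence touches only the lines of that sentence, so the diff stays local instead of rippling through a re-wrapped paragraph.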
On Fri, Mar 15, 2013 at 10:43 PM, Tore Anderson <tore@fud.no> wrote:
It was just one example. Another would be ripe-566, which has a couple of tables. Sure, you can create ASCII art tables, but doing so while keeping everything below 72 characters of width might not be the easiest thing in the world. A future document might want to include some other figures or other external content which cannot be (easily) represented in plain text.
The various TeX flavors can generate PDF/A. I understand that this would add complexity for people who need to create PDFs. Which may be a benefit overall as this would result in a stronger incentive to use plain text. All this while still allowing easily diffed collaboration on text files in all cases.
Because once that PDF is the authoritative version it's hard for someone else to come along and make changes to it later.
Fair point; see above.
Reasonably, if you keep the amount of markup to a minimum. The point after all isn't to get RIPE documents that are glitzy AJAX Web2.0 HTML5 interactive stuff, just to allow for the inclusion of basic figures, tables, images and a minimum of formatting.
Even then, fonts may get lost over time; that will not happen with PDF/A which embeds all fonts. I am not saying we need eternal docs that always look the same, but it would be nice to have and as we are debating redoing things anyway.... Richard
Hi,
It was just one example. Another would be ripe-566, which has a couple of tables. Sure, you can create ASCII art tables, but doing so while keeping everything below 72 characters of width might not be the easiest thing in the world. A future document might want to include some other figures or other external content which cannot be (easily) represented in plain text.
The various TeX flavors can generate PDF/A. I understand that this would add complexity for people who need to create PDFs. Which may be a benefit overall as this would result in a stronger incentive to use plain text. All this while still allowing easily diffed collaboration on text files in all cases.
*if* plain text becomes the authoritative format, the RIPE NCC can then use whatever tools to create whatever formats of policy documents that they deem useful.
No need to get dirty and bring TeX into the discussion ;)
Alex Le Heux
Kobo Inc
On Fri, Mar 15, 2013 at 11:11 PM, Alex Le Heux <aleheux@kobo.com> wrote:
No need to get dirty and bring TeX into the discussion ;)
I am sure watching some poor MS Office person write TeX right before a deadline could be fun. Richard
* Richard Hartmann
The various TeX flavors can generate PDF/A. I understand that this would add complexity for people who need to create PDFs. Which may be a benefit overall as this would result in a stronger incentive to use plain text. All this while still allowing easily diffed collaboration on text files in all cases.
In this case, the TeX format should be the authoritative one, not PDF. (You can convert HTML to PDF, too.)
It's been a while since I wrote anything in TeX, but IIRC it's rather forbidding for someone who's never seen it before. HTML is more approachable.
But, never mind about that, if we do it Sander's way and define my preferences as non-specific requirements instead:
- For documents that are all text, plaintext is preferred
- For documents that are not all text or cannot easily be represented in plaintext, the alternate format should be chosen based on the following criteria:
  - It should be open and platform-independent
  - It should be easy for a complete beginner to understand and modify
  - It should keep any text parts of the document as similar to plaintext as possible (to allow for easy diffing and collaboration)
  - It should be easy to convert into other formats
Even then, fonts may get lost over time; that will not happen with PDF/A which embeds all fonts.
I don't see any problems with fonts changing. Plaintext doesn't have any concept of fonts either... -- Tore Anderson
Hi Richard,
The various TeX flavors can generate PDF/A.
/shiver
I was wondering when someone would start about (LA)TeX ... and it didn't even take very long ...
I have to agree with Sander, this is something that should start with actual requirements.
Currently what I see is someone who is very happy using CLI, GIT and such to see the difference between all these versions .. and wishing that he would have the output in (LA)TeX, where I'm more a regular Latex kind of guy (fun intended) .. more of a wysiwyg and clicker ...
So if the actual end-results need to be published in multiple types of output and that could be done without costing the community an arm and a leg, feel free to write a requirement document (in txt :).
One of the things that we would need to avoid imho is that people would have to start publishing PDP proposals in some kind of TeX dialect or anything alike ... or that people would have to get familiar with a CLI and all great GIT options before they would follow what is going on.
I'm sure that would work for a select few, but the interaction within the community as is, is already shockingly low. Especially if we want to keep the community open for non-ISP tech savvy people (think about LEAs or regular enterprises that start to deal with RIPE as their ISPs are running out of v4 and now they need to become an LIR), simplicity of use is key in my opinion, and additional interfaces to export / filter / diff etc. through the data are a nice bonus for those that want it.
Regards,
Erik Bais
Hi,
The canonical and binding format of all documents is plain text in UTF-8 encoding. If a document can not reasonably be stored as plain text, PDF/A-1a[1] will be used. If any document exists in both plain text and PDF/A-1a, plain text is binding. Any other form is considered non-binding as soon as a plain text or PDF/A-1a variant exists. All legacy documents which are still valid or relevant to current policies will be transformed within a year of this proposal becoming policy.
The reasoning behind this should be obvious: Ensure that all documents will be in the most simple-to-parse format now and forever.
For policy documents:
+1
Other RIPE documents may be graphics-heavy, although most probably won't be. It would be good to have a provision for non-policy documents that just can't be represented in plain text.
Alex
On Fri, Mar 15, 2013 at 11:01 PM, Alex Le Heux <aleheux@kobo.com> wrote:
For policy documents:
+1
Other RIPE documents may be graphics-heavy, although most probably won't be. It would be good to have a provision for non-policy documents that just can't be represented in plain text.
Wouldn't PDF/A work for them? I mainly care about policy documents, though. If we can start with those, it would be an incredible win for all of RIPE. Richard
On 03/15/2013 08:12 PM, Richard Hartmann wrote:
The canonical and binding format of all documents is plain text in UTF-8 encoding. If a document can not reasonably be stored as plain text, PDF/A-1a[1] will be used. If any document exists in both plain text and PDF/A-1a, plain text is binding. Any other form is considered non-binding as soon as a plain text or PDF/A-1a variant exists. All legacy documents which are still valid or relevant to current policies will be transformed within a year of this proposal becoming policy.
I would be very happy if this suggestion would turn into a policy, I can agree with the way it is formulated above. That doesn't mean I can't have some remarks which could still be up for discussion.
* I feel it would be even better to have a plain text version be mandatory and authoritative for each and every document. We could allow for a PDF/A appendix to documents (or even a full PDF/A version of said document) which really can't do without.
* The RFC style guide [1] could be a good base. Replace ASCII with UTF-8, remove the paging requirement and perhaps some other bits and pieces we don't need and I think we could be there. Having very similar guidelines as other major organizations does seem like a big plus to me.
As a relative newcomer to all-things-RIPE, I would have expected there to be a policy about document formats already. Either I can't find it or I'm rightfully surprised there currently isn't one at all.
Gerry
[1] http://www.rfc-editor.org/rfc-style-guide/rfc-style
On Mar 16, 2013, at 2:20 PM, Gerry Demaret <ml+ripe-list@x-net.be> wrote:
* The RFC style guide [1] could be a good base. Replace ASCII with UTF-8, remove the paging requirement and perhaps some other bits and pieces we don't need and I think we could be there. Having very similar guidelines as other major organizations does seem like a big plus to me.
[...]
FYI (and since it is fun to sign with a specific job-title),
Note that the RFC Editor has just finished gathering requirements on how to evolve the series to deal with i18n issues and inclusion of diagrams (the document was approved for publication as an RFC last week).
http://tools.ietf.org/html/draft-iab-rfcformatreq-03
The RFC community is probably a little conservative as the RFC Series is an 'archival series'. But that draft documents some pros and cons. The issues that Gerry lists above can be found in the requirements draft.
--Olaf Kolkman
former ARSE (Acting RFC Series Editor)
NLnet Labs
Olaf M. Kolkman
www.NLnetLabs.nl
olaf@NLnetLabs.nl
Science Park 400, 1098 XH Amsterdam, The Netherlands
On Mon, Mar 18, 2013 at 3:16 PM, Olaf Kolkman <olaf@nlnetlabs.nl> wrote:
Note that the RFC Editor has just finished gathering requirements on how to evolve the series to deal with i18n issues and inclusion of diagrams (the document was approved for publication as an RFC last week).
http://tools.ietf.org/html/draft-iab-rfcformatreq-03
The RFC community is probably a little conservative as the RFC Series is an 'archival series'. But that draft documents some pros and cons. The issues that Gerry lists above can be found in the requirements draft.
We would see if we can find consensus on this, but personally, I wouldn't mind following the stricter rules of RFCs in policy documents. This would probably mean using nroff or similar as source format, but the actual documents would be text, if paginated inline.
Pagination is a problem though. RFCs do not change very often whereas RIPE NCC policy documents do change quite frequently. Diffs between RFCs are uncommon and mostly useless whereas diffs between policy documents are very useful. This inline pagination would result in horribly mangled diffs, every time a line gets added/removed. I see that this point is being addressed in the draft, but there is no conclusion either way, yet.
Other than that, after a quick glance, this draft sounds like a very interesting base for discussion and raises all major points. I know from personal experience how sluggish things are working with IDs and RFCs so I guess there is no ETA for releasing actual results?
Richard
PS: As noted in this thread, trailing whitespaces would allow reflowing quite easily. A line has a trailing whitespace? Reflow at will. It does not? It's static.
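The trailing-whitespace rule from the PS is simple enough to sketch in a few lines. It resembles the format=flowed convention for email (RFC 3676), but the code below is only an illustration of the rule as stated in this thread, not an implementation of that RFC; the function name reflow and the default width of 72 are assumptions made for the example.

```python
import textwrap


def reflow(lines, width=72):
    """Re-wrap 'soft' lines (trailing space); leave 'static' lines untouched."""
    out = []
    pending = []  # soft lines collected so far for the current paragraph

    def flush():
        text = " ".join(part for part in pending if part)
        out.extend(textwrap.wrap(text, width=width))
        pending.clear()

    for line in lines:
        if line.endswith(" "):
            # Trailing whitespace: this line may be joined with the next one.
            pending.append(line.strip())
        elif pending:
            # No trailing whitespace: this line ends the flowed paragraph.
            pending.append(line.strip())
            flush()
        else:
            # Static line with nothing pending (blank line, table row, ...).
            out.append(line)
    if pending:
        flush()
    return out


source = ["The canonical format ", "is plain ", "text.", "", "| a | b |"]
print("\n".join(reflow(source)))
```

Static lines are passed through verbatim, so an ASCII table or a deliberately short line survives reflowing even when the chosen width changes.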
Dear all,
the updated, and very belated, version looks like:
The canonical and binding format of all policy documents is plain text in ISO-8859-1 encoding. Author names and references may be encoded in UTF-8. If a policy document can not reasonably be stored as plain text, ALTERNATIVE will be used. If any policy document exists in both plain text and ALTERNATIVE, plain text is binding. Any other form is considered non-binding as soon as a plain text or ALTERNATIVE variant exists. All legacy policy documents which are still valid or relevant to current policies will be transformed within a year of this proposal becoming policy. All non-policy documents will be published in plain text or PDF/A-1a with no requirement to try to produce ASCII art or similar.
ALTERNATIVE may equal PDF/A-1a or "text file with references to BMPs". Personally, I think I prefer plain text as that's easier to deal with on CLI, but as non-policy documents are in PDF (and if this gets accepted PDF/A-1a) already, it may make more sense to keep both in the same format.
Feedback either way would be appreciated.
Richard
you're a few days early
Richard, on Fri, Mar 29, 2013 at 05:06:05PM +0100 you wrote the following:
The canonical and binding format of all policy documents is plain text in ISO-8859-1 encoding. Author names and references may be encoded in UTF-8.
ISO-8859-1 *and* UTF-8 content seems a tad ridiculous and backwards. Kind regards Philipp Kern
On 30/03/2013 11:06, Philipp Kern wrote:
ISO-8859-1 *and* UTF-8 content seems a tad ridiculous and backwards.
Would be cool to have them in the same document, though. I love this idea. It's so ... European.
But having said that, if Richi wants to aim towards IETF style document processing with text and revisions and all that, could he at least go the whole way and start out with a properly formulated problem statement so that we can look at creating a BoF at RIPE66, which can be used as the basis for creating a working group, which can then aim towards a framework for solving the entire problem of document processing for RIR policy publication.
I'm very much in favour of this multiple-incompatible-encoding-per-document idea, but it's clear to me at this stage that we can only solve the problem properly by handling it with a framework specification. Lots of XML too, because we all know that without a functional markup system, free text has plenty of irksome limitations which are frankly a pain in the bum. Structure is a necessity.
Honestly, I see scope for an entire stream of RFC style documents, which has the added advantage that we could look at getting research funding for the project and maybe get a couple of uni postgrads in on the act to make sure we're fully buzzword compliant. There's so much scope for argument that college professors will fall over themselves in the rush to get involved.
And I've no doubt either that we'll end up having to rewrite git to improve the front-end and ensure that we don't end up with the sort of back-end problems that recently almost trashed the KDE distribution. Reliability is king here. These are important documents which should be enshrined for all eternity, full revision history included, and it would be a gross abnegation of our duty to posterity if we were to aim for anything other than a 100% solution.
This is serious stuff we're discussing, no doubt about it. Careers could be made or broken on the basis of these suggestions. Lives could be lost and someone might not think of the children.
Or alternatively, as Randy and Jim suggested, we could drop the whole goddamned thing and concentrate on policy that's important rather than pointlessly wasting time on window dressing, zomg.
Nick
side note: the ietf currently allows other formats, e.g. pdf. and the ietf is considering further and more extreme heresies. randy
On Sat, Mar 30, 2013 at 09:59:16PM +0900, Randy Bush wrote:
side note: the ietf currently allows other formats, e.g. pdf. and the ietf is considering further and more extreme heresies.
The IETF also adopted utterly nonsensical marketing terms like "Carrier Grade NAT", so... what to expect? :) SCNR, Daniel -- CLUE-RIPE -- Jabber: dr@cluenet.de -- dr@IRCnet -- PGP: 0xA85C8AA0
On Sat, Mar 30, 2013 at 12:06 PM, Philipp Kern <philipp.kern@kit.edu> wrote:
ISO-8859-1 *and* UTF-8 content seems a tad ridiculous and backwards.
I should have explained the rationale behind this suggestion, sorry. What it boils down to is that normal text would use the standard ASCII chars and only names and references would be allowed to use UTF-8. A different, and maybe better, way to put this is "UTF-8 for the document, but use only ASCII chars within the main text". Richard
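The rule "UTF-8 for the document, but use only ASCII chars within the main text" could be checked mechanically. The sketch below is an assumption-laden illustration: it treats lines starting with "Author:" or with a reference marker such as "[1]" as the exempt names and references, which is a made-up convention for this example and not something the proposal defines.

```python
import re

# Hypothetical convention: author lines and numbered references are exempt.
EXEMPT = re.compile(r"^(Author:|\[\d+\])")


def non_ascii_body_lines(text):
    """Return (line number, line) pairs for body lines containing non-ASCII."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        if EXEMPT.match(line):
            continue  # names and references may use the full UTF-8 range
        if not line.isascii():
            offenders.append((number, line))
    return offenders


document = (
    "Author: Jörg Müller\n"
    "The naïve approach.\n"
    "Plain ASCII line.\n"
    "[1] Ünïcode reference\n"
)
print(non_ascii_body_lines(document))
```

A check like this could run before publication, flagging only the body lines that need attention while leaving names and references alone.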
participants (10)
- Alex Le Heux
- Daniel Roesen
- Erik Bais
- Gerry Demaret
- Nick Hilliard
- Olaf Kolkman
- Philipp Kern
- Randy Bush
- Richard Hartmann
- Tore Anderson