HTML

These are my notes on authoring notes in HTML, authored in HTML. As usual, I am following my web design checklist, but here I am also targeting the semantic web.

History

As with other technologies, it helps to look at the history in order to understand it better. Useful sources include A history of HTML, the www-talk archive, the CERN 2019 WorldWideWeb Rebuild, the initial HTML tags, the HTML specification draft, RFC 1866 (HTML 2.0), and the HTML history in Wikipedia. Unfortunately, the period when HTML was taking shape (particularly between 1992 and 1995, when images and forms were introduced) is missing from the www-talk archive.

As a rant about its more recent history: XHTML was XML-based, machine-friendly, well integrated with XSLT, and easy to manipulate. Then HTML 5 incorporated it as its XML syntax, then HTML LS replaced that and discouraged use of the XML syntax, and now we are back to regular messy HTML. RDFa similarly seemed promising, embedding machine-readable semantic data into regular web documents, but it went a similar way. The state of web publishing is not great, and maybe other markup languages could work better, but this is what we have, and authoring in HTML (or with the aid of XSLT and a little custom XML) provides more control.

I used to publish this website as valid HTML 5 in the XML syntax, then switched to the non-XML syntax when the former was retired. Now it fails validation because of stray closing tags (I have not found a proper, non-hacky way to get rid of those) and some xmlns attributes. Yet this way it is probably more compatible with common tools than an error-free XML syntax version would be.

Metadata

RDF

One should be careful with CURIEs: that is, actually read the section on them. I spent some time debugging a document, mostly because I had skipped it. Other than that, it is pretty simple, as can be seen in this page's source: multiple vocabularies can be used rather easily, and there is a choice of terms. Document metadata goes into the head element; the rest is embedded via RDFa attributes.
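As a minimal sketch of that layout: the example below assumes the RDFa 1.1 prefix attribute (rather than xmlns declarations), and the vocabularies, the date, and the author name are placeholders chosen for illustration, not necessarily what this page uses.

```html
<!DOCTYPE html>
<!-- Prefixes for CURIEs are declared once; both vocabularies are then
     usable anywhere below. -->
<html lang="en"
      prefix="dc: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/">
  <head>
    <meta charset="utf-8">
    <title>HTML</title>
    <!-- Document metadata: the subject is the document itself. -->
    <meta property="dc:title" content="HTML">
    <meta property="dc:created" content="2020-01-01">
  </head>
  <body>
    <!-- Embedded data: "about" sets a new subject, "typeof" its class,
         and "property" takes the element text as a literal value. -->
    <p about="#me" typeof="foaf:Person">
      Written by <span property="foaf:name">Example Author</span>.
    </p>
  </body>
</html>
```

Running such a document through an RDFa distiller is a quick way to check that the prefixes expand as intended.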

HTML

Some of the HTML-specific metadata (see standard metadata names) is redundant when RDFa is present, yet some software may rely on it, so it may be worth including.
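A small sketch of such duplication, with placeholder values, and assuming the dc prefix is mapped to Dublin Core terms (either declared explicitly or taken from the RDFa initial context):

```html
<head>
  <title>HTML</title>
  <!-- Standard HTML metadata names, for software that ignores RDFa. -->
  <meta name="author" content="Example Author">
  <meta name="description" content="Notes on authoring notes in HTML">
  <meta name="keywords" content="HTML, RDFa, semantic web">
  <!-- Roughly the same information as RDFa, kept on separate elements. -->
  <meta property="dc:creator" content="Example Author">
  <meta property="dc:description" content="Notes on authoring notes in HTML">
</head>
```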

Mixing attributes with overlapping purposes, such as rel and property, leads to strange results, so it is perhaps better avoided. In some cases, though, HTML attributes have to be used together with RDFa ones. For instance, link elements are the primary way to set document metadata and must have href attributes, which take only a single plain URI, without prefixes; the property attribute is still handy there for naming the relation.
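To illustrate the link case, here is a hedged fragment; the license URL is only a placeholder, and the dc prefix is assumed to be mapped to Dublin Core terms.

```html
<head>
  <title>HTML</title>
  <!-- "property" accepts a CURIE and names the relation; "href" is an
       HTML attribute and takes exactly one plain URI as the value. -->
  <link property="dc:license"
        href="https://creativecommons.org/licenses/by-sa/4.0/">
  <!-- A prefixed value in href is not expanded: something like
       href="dc:license" would be read as a URI with the scheme "dc". -->
</head>
```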

DOCTYPE and DTDs

DOCTYPE is reduced to a "legacy string" now, but it is still there. And I have not found DTDs for HTML 5.
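For reference, these are the two forms that can open a document now; they are alternatives, and a real document would contain only one of them. The second carries the legacy string, which the specification allows for generators that cannot emit the short form.

```html
<!-- The usual short form: -->
<!DOCTYPE html>

<!-- The alternative with the DOCTYPE legacy string: -->
<!DOCTYPE html SYSTEM "about:legacy-compat">
```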

Sectioning

Explicit and semantic sectioning (section elements) looks neat, but I have not seen it actually being used by clients, and it complicates editing, so I ceased adding it.

The header, footer, and nav elements also look neat at first sight: one can put creation and modification dates there, license information, and navigational links (just a "home" link would be sufficient for this website). But those things are common enough for client software to handle them based on metadata; otherwise it amounts to bloating the documents while marking the bloat so that it can be removed.
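For comparison, this is roughly what a page with explicit sectioning and those elements would look like. It is only a sketch: the dates, the link targets, and the license are placeholders.

```html
<body>
  <header>
    <!-- Navigational links; a single "home" link in this case. -->
    <nav><a href="/">Home</a></nav>
  </header>
  <main>
    <h1>HTML</h1>
    <!-- Each section wraps a heading together with its content. -->
    <section>
      <h2>History</h2>
      <p>Section text goes here.</p>
    </section>
  </main>
  <footer>
    <!-- Dates and license, all of which could also live in the head
         as metadata. -->
    <p>Created: <time datetime="2020-01-01">2020-01-01</time></p>
    <p>License:
      <a rel="license"
         href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>
    </p>
  </footer>
</body>
```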

Editing and preprocessing

Hyperlinks make HTML editing awkward, so I composed the html-wysiwyg minor mode.

There is duplicate data in the documents, too much to write manually each time. A skeleton document can be used, but then it may get tricky to introduce global changes into the resulting documents (though it is still possible to do reliably, since the data is structured). So I have composed an XSLT to translate a simpler XML into the resulting files, and published it in my homepage repository, along with XSLTs to produce indexes and Atom feeds. Working with file paths gets a bit awkward with those.

Paragraphs are annoying to compose, but I am not sure whether there is a reliable way to detect and mark them automatically. Inserting them is easy in the Emacs html-mode, though: likely because of the annoyance, their insertion is bound to C-c RET by default. "Skeleton commands" in general are handy when there is repetition.

Setting fill-column to 80 in .dir-locals.el helps to compensate for the nesting-caused indentation.