These are my notes on authoring notes in HTML, authored in HTML. As usual, I am following my web design checklist, but I am also targeting the semantic web here.
As with other technologies, it helps to look into its history in order to understand it better. A history of HTML, the www-talk archive, the CERN 2019 WorldWideWeb Rebuild, initial HTML tags, the HTML specification draft, RFC 1866 (HTML 2.0), and HTML history in Wikipedia are helpful for that. Unfortunately, the period when HTML was taking shape (particularly when images and forms were introduced, between 1992 and 1995) is missing from the www-talk archive. The history of the adjacent Gopher protocol provides more context, e.g., The Internet Gopher from Minnesota.
As a rant about its more recent history: XHTML used to be XML-based, machine-friendly, well integrated with XSLT, and easy to manipulate. Then HTML 5 incorporated it as its XML syntax, then HTML LS (the Living Standard) replaced that and discouraged use of the XML syntax, and now we are back to regular messy HTML. RDFa similarly seemed promising, embedding machine-readable semantic data into regular web documents, but it went a similar route. The state of web publishing is not great, and maybe other markup languages would work better, but this is what we have, and authoring in HTML (or with the aid of XSLT and a little custom XML) provides more control.
I used to publish this website as valid HTML 5 in the XML syntax, then switched to the non-XML syntax when the former was retired. Now it fails validation due to stray closing tags (I have not found a proper, non-hacky way to get rid of those) and some xmlns attributes. Still, it is probably more compatible with common tools this way than it was with the XML syntax and no validation errors.
One should be careful with CURIEs, which in practice means reading that section: I have spent some time debugging a document, mostly because I had skipped it. Other than that, it is pretty simple, as can be seen in this page's source: multiple vocabularies can be used rather easily, and there is a choice of terms. Document metadata goes into the head element; the rest gets embedded via RDFa attributes.
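To illustrate why CURIEs deserve that care, here is a simplified Python sketch of how an RDFa 1.1 processor roughly resolves a property value. A value with a colon is treated as a CURIE if its prefix is declared (e.g. via the prefix attribute), and otherwise falls back to being an absolute IRI, so an undeclared or mistyped prefix produces no error, just a bogus IRI. The function names and the prefix table are hypothetical, and the sketch deliberately leaves out the initial context, safe CURIEs, blank nodes, and default-prefix CURIEs such as `:next`.

```python
from typing import Optional


def parse_prefix_attr(value: str) -> dict:
    """Parse an RDFa 'prefix' attribute value made of 'name: IRI' pairs.

    Assumes well-formed input with whitespace after each 'name:'.
    """
    tokens = value.split()
    return {
        name[:-1]: iri
        for name, iri in zip(tokens[0::2], tokens[1::2])
        if name.endswith(":")
    }


def expand(value: str, prefixes: dict, vocab: Optional[str] = None) -> Optional[str]:
    """Roughly resolve a 'property' value the way an RDFa 1.1 processor would.

    Simplified sketch: no initial context, no safe CURIEs, no blank nodes,
    no default-prefix CURIEs such as ':next'.
    """
    if ":" not in value:
        # A term: resolved against @vocab if one is in effect, else ignored.
        return vocab + value if vocab else None
    prefix, _, reference = value.partition(":")
    if prefix in prefixes:
        # A valid CURIE: expand it via the prefix mapping.
        return prefixes[prefix] + reference
    # Undeclared prefix: the value is taken as an absolute IRI as-is,
    # so a typo silently yields a bogus IRI instead of an error.
    return value


# Hypothetical declarations, as they might appear in a prefix attribute.
PREFIXES = parse_prefix_attr(
    "dc: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/"
)
```

With these declarations, `expand("dc:title", PREFIXES)` gives `http://purl.org/dc/terms/title`, while the mistyped `expand("dcc:title", PREFIXES)` comes back unchanged as `dcc:title`, which is syntactically a perfectly good IRI.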
Some of the HTML-specific metadata (see standard metadata names) is redundant when RDFa is present, yet some software may rely on it, so it may be worth including as well.
Mixing attributes with overlapping roles, such as rel and property, leads to strange results, so it is probably better to avoid that. In some cases, though, the HTML attributes have to be used together with the RDFa ones: for instance, link elements must have href attributes, which limits one to a single plain URI per element, without CURIE prefixes, while those elements are the primary way to set document metadata. The property attribute is still handy to use there.
The DOCTYPE is now reduced to a "legacy string", but it is still there, and I have not found any DTDs for HTML 5.
Explicit and semantic sectioning (section elements) looks neat, but I have not seen clients actually make use of it, and it complicates editing, so I stopped adding it.
The header, footer, and nav elements also look neat at first sight: one can put creation and modification dates, license information, and navigational links there (just a "home" link would be sufficient for this website). But those are common enough for client software to handle based on metadata alone; otherwise, using these elements amounts to bloating the documents while marking the bloat so that it can be removed.
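The "marked bloat" point can be made concrete: once such content sits in header, footer, and nav elements, a client can drop it mechanically. Here is a minimal sketch using Python's standard html.parser module; the class and function names are hypothetical, and it strips every such element, including header elements nested inside article, which a real tool might prefer to keep.

```python
from html.parser import HTMLParser

# Elements treated as removable "marked bloat".
SKIP = {"header", "footer", "nav"}


class BloatStripper(HTMLParser):
    """Re-emit HTML, dropping header/footer/nav elements and their content."""

    def __init__(self) -> None:
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # nesting depth of skipped elements

    def _emit(self, text: str) -> None:
        # Only write output while outside any skipped element.
        if self.depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit the raw tag text once.
        if tag not in SKIP:
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.depth = max(self.depth - 1, 0)
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_bloat(document: str) -> str:
    """Return the document with header/footer/nav elements removed."""
    parser = BloatStripper()
    parser.feed(document)
    parser.close()
    return "".join(parser.out)
```

Passing `convert_charrefs=False` makes the parser report references such as `&amp;` separately, so they can be written back verbatim instead of being decoded into text that would then need re-escaping.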
Hyperlinks make HTML editing awkward, so I composed the html-wysiwyg minor mode.
There is duplicate data in the documents, too much to write manually each time. A skeleton document could be used, but then it may get tricky to introduce global changes into the resulting documents (though it is still possible to do reliably, since the data is structured). So I've composed an XSLT stylesheet to translate a simpler XML format into the resulting files, and published it in my homepage repository, along with XSLTs to produce indexes and Atom feeds. Working with file paths gets a bit awkward in those.
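The idea of generating the repeated parts from a simpler source can be sketched in Python with xml.etree.ElementTree instead of XSLT, purely for illustration. The source format below (a `page` element with `title`, `author`, and `body`) and the output layout are hypothetical, not the format used by the XSLT mentioned above. The title and author are written once and repeated wherever needed, including both a standard `meta name="author"` and an RDFa `dc:creator` property.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal source format, for illustration only.
SOURCE = """<page lang="en">
  <title>Notes on HTML</title>
  <author>Jane Doe</author>
  <body><p>Some text.</p></body>
</page>"""


def render(source: str) -> str:
    """Expand a minimal <page> document into a full HTML page.

    Data written once in the source (title, author) is repeated by the
    generator wherever the output needs it.
    """
    page = ET.fromstring(source)
    title = page.findtext("title", default="")
    author = page.findtext("author", default="")

    html = ET.Element(
        "html", lang=page.get("lang", "en"), prefix="dc: http://purl.org/dc/terms/"
    )
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "meta", charset="utf-8")
    ET.SubElement(head, "title").text = title
    # Standard metadata name and RDFa property, kept on separate elements.
    ET.SubElement(head, "meta", name="author", content=author)
    ET.SubElement(head, "meta", property="dc:creator", content=author)

    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    # Copy the source body content over unchanged.
    source_body = page.find("body")
    if source_body is not None:
        body.extend(list(source_body))
    return "<!DOCTYPE html>\n" + ET.tostring(html, encoding="unicode", method="html")
```

A global change, such as adding another meta element to every page, then becomes a one-line edit to the generator rather than to each output file.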
Paragraphs are annoying to compose, but I am not sure there is a reliable way to detect and mark them automatically. Inserting them is easy in the Emacs html-mode, though: likely because of that annoyance, paragraph insertion is bound to C-c RET by default. "Skeleton commands" in general are handy when there is repetition.
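One naive heuristic for marking paragraphs automatically is to treat blank-line-separated blocks of plain text as paragraphs. The Python sketch below (the function name is hypothetical) does just that, and in doing so shows why such detection is unreliable: nothing in it distinguishes a paragraph from a list item, a heading, or preformatted text, so they all get wrapped alike.

```python
import html
import re


def mark_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks of plain text in <p> elements.

    A naive heuristic: lists, headings, and preformatted text are
    wrapped too, since nothing here tells them apart from prose.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n".join(
        f"<p>{html.escape(block.strip())}</p>" for block in blocks if block.strip()
    )
```

For example, `"- item one\n\n- item two"` comes out as two paragraphs starting with dashes rather than as a list.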
Setting fill-column to 80 in .dir-locals.el helps to compensate for the indentation caused by nesting.