Semantic Web

For a while I kept seeing references to the Semantic Web, RDF (see also the RDF article on Wikipedia), and related projects (such as OWL, Wikidata, DBpedia, Dublin Core, WebID). I used to mostly ignore them: I didn't see an immediate application, was a bit appalled by the XML around it, and was unsure whether RDF was expressive enough. The "web" part itself was off-putting as well, being associated with bloat, poor accessibility, and everything being broken and poorly reinvented: from <marquee> and frames to JS frameworks and SPAs. But "web" in "semantic web" stands for the concept of interlinked documents, not for the mess that currently dominates HTTP. There are plenty of potential applications, too: some are similar to the ones I've tried (e.g., the "Semantic UI" note), some to the ones I thought would be handy for various tasks (knowledge representation in general), and finally there's FOAF, potentially useful, among other things, for search in distributed systems. It seems that I had missed quite a large chunk of nice technologies, as happens from time to time, but it's nice to finally discover them.

While RDF is a framework, there are a few formats for its serialisation: some are more readable (I quite like Turtle); some are simple and handy for streaming (particularly N-Triples); RDFa (or RDFa Lite) allows embedding into other XML/HTML documents; and there are a few other formats. Documents in these formats can, of course, be linked as alternate versions of HTML documents. The variety is actually a bit of an issue, too, since software that works with RDF then has to support multiple formats (I've already stumbled upon that while adding RDF support to pancake) – but that's the price of having a choice.
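To illustrate why N-Triples is handy for streaming, here is a minimal sketch of a serialiser: each triple becomes one self-contained line, so a document can be written or read line by line. The function names and the example.org IRI are made up for illustration, only the most common literal escapes are handled, and blank nodes, language tags and datatypes are left out.

```python
# A minimal sketch, not a complete N-Triples writer.
# escape_literal and to_ntriples_line are hypothetical helper names.

def escape_literal(text):
    """Escape the characters that may not appear raw inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def to_ntriples_line(subject, predicate, obj, obj_is_iri=True):
    """Render one triple as a single N-Triples line.

    Subject and predicate are assumed to be absolute IRIs; the object is
    either an IRI or a plain string literal.
    """
    obj_term = f"<{obj}>" if obj_is_iri else f'"{escape_literal(obj)}"'
    return f"<{subject}> <{predicate}> {obj_term} ."

line = to_ntriples_line(
    "http://example.org/me",           # assumed example subject
    "http://xmlns.com/foaf/0.1/name",  # FOAF "name" predicate
    "Alice",
    obj_is_iri=False,
)
print(line)
```

Since every line stands on its own, such output can be concatenated, split, or filtered with ordinary line-oriented tools, which is where the format is convenient compared to Turtle.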

RDF is indeed not as expressive as some languages, but in exchange it is very simple: every statement is a subject-predicate-object triple.
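The simplicity can be shown with a toy model: an RDF graph is just a set of (subject, predicate, object) statements, and basic querying is pattern matching over that set. The sketch below is an assumption-laden illustration, not how any particular library stores data: the example.org IRIs and the match function are invented, and IRIs and literals are both kept as plain strings, which a real implementation would distinguish.

```python
# A toy in-memory graph: a set of (subject, predicate, object) tuples.
FOAF = "http://xmlns.com/foaf/0.1/"

triples = {
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# All names in the graph, roughly what a one-pattern SPARQL query would return.
names = [o for _, _, o in match(triples, p=FOAF + "name")]
print(names)
```

Anything more elaborate (joins over several patterns, inference) builds on the same primitive, which is what query languages such as SPARQL provide.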

To try it out, one could grab librdf-based CLI tools and libraries (packages in system repositories may be called raptor2 and rasqal; rapper is handy to just parse/convert the documents, roqet is for SPARQL experimentation, and rdfproc – for storage/retrieval/processing), Apache Jena for Java, rdf4h and swish (and a few more) packages for Haskell (I've also started writing librdf bindings), OpenLink Virtuoso as a database. There are online/web-based tools for search and browsing, too, such as FOAF search and Semantic forms. There's also sparql-mode for Emacs, and OpenLink Structured Data Sniffer add-on for Firefox.

Unfortunately, some of the links in specifications and other related documents are dead, and generally this whole thing is neither very popular nor well maintained. But SWIG has an active and helpful channel on Freenode/Libera.chat (#swig), and there are well-written, apparently thought-through, materials, specifications, and standards.

When composing a new RDF document, it's apparently suggested to link to the corresponding DBpedia articles, and there's a list of popular RDF namespace prefixes that helps to find additional common predicates.
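Namespace prefixes are only shorthand: a prefixed name such as foaf:knows stands for the namespace IRI with the local part appended. A minimal sketch of that expansion follows; the table holds a few widely used namespaces, "dbr" is assumed here as the conventional prefix for DBpedia resources, and the expand function is a hypothetical helper rather than part of any library.

```python
# A few widely used namespace prefixes; a real prefix list is much longer.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dbr": "http://dbpedia.org/resource/",  # assumed DBpedia resource prefix
}

def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name like 'foaf:knows' into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local

print(expand("foaf:knows"))
print(expand("dbr:Semantic_Web"))
```

Linking a document to a DBpedia article then amounts to using such an expanded IRI as the object of a statement.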

Some of the common vocabularies are included in the RDFa Core Initial Context. In addition to those, there are ones like The Music Ontology and Food Ontology around.

On the same day that I finished writing the initial version of this note, ActivityPub (one of the Social Web Protocols) got standardised; it is based on (or perhaps merely compatible with) RDF as well. The protocols on which it is based are used for federated microblogging, in which I wasn't interested, but it looks fine, with some reservations: I'm not a fan of using HTTP for everything; JSON-LD is awkward, and SemWeb integration was not a priority there (see the "JSON-LD and Why I Hate the Semantic Web" article by a JSON-LD creator); there's no FOAF integration; authentication is not specified, with OAuth 2.0 being the suggested way; and there are a few other strange/unpleasant bits.

An irritating issue I've noticed is that many web pages which provide RDF metadata do so using ontologies oriented towards (that is, developed by, and rather specific to) centralised services.

Tim Berners-Lee's "Design Issues" include interesting Semantic Web posts, too.