Distributed systems

There are quite a few definitions of "distributed" and "decentralized" in use; in this note I'm using the following ones:

Centralized
Clients interacting with a single server (either physical or controlled by the same entity).
Decentralized
Clients interacting with multiple servers (controlled by different entities), which often build a federated network.
Distributed
Clients interacting with other clients directly, acting as servers themselves.

A "system" may also mean different things; here I focus on network protocols, on systems of network-connected independent actors.

Distributed systems are useful for various purposes, but the commonly considered and achievable niceties are:

These are mostly shared with federated systems, but take it further.

The common advantages of centralized systems over these seem to be search/discovery, often sort-of-free hosting for end users, greater UI uniformity in some cases, easier/faster introduction of new features.

Usable systems

Actually usable (reliably working, specified, having users and decent software) systems so far are usually federated/decentralized; those can, in principle, be quite close to distributed systems (simply by setting up their servers on user machines). So it generally seems more useful to focus on those if the intention is to get things done: SMTP (email, possibly with OpenPGP), NNTP (Usenet), XMPP (Jabber), and HTTP (World Wide Web; possibly together with RSS or Atom, and their aggregators, and/or RDF) are relatively well-supported, standardized, and usable for various kinds of communication.

Sometimes even centralized but non-commercial projects and services are okay: OpenStreetMap, The Internet Archive, Wikimedia Foundation projects (Wikipedia, Wiktionary, Wikidata, Wikibooks, etc), arXiv, FLOSS projects, possibly LibGen and Sci-Hub (though they infringe copyright), possibly Libera.Chat (but they had issues arising out of centralization, which is why it is not Freenode anymore). As long as they are easy (and legal, and free) to fork and aren't in a position to extort users, centralization can be fine. Conversely, there can be technically distributed systems effectively controlled by a single entity (e.g., a distributed PKI with a single root, or anything legally restricted). While this note is mostly about distributed network protocols, they are neither necessary nor sufficient for community control over a system, but rather may just be a useful tool to achieve it.

Existing systems

There are quite a few of them; I am going to write mostly about those that work over the Internet. There's also the "Distributed computing architecture" Wikipedia category, including things like cluster computing, grid computing, etc.

Generic networks

Tor and I2P: both support "hidden services", on top of which many regular protocols can be used, but it is more about privacy (and a bit about routing) than about decentralisation: they provide NAT traversal, encryption, and static addresses. Tor documentation is relatively nice, and there are I2P docs. Tor provides a nice C client, I2P uses Java.

Mesh networks

Some mesh networks, like Telehash, provide routing as well, though advantages for decentralisation seem to be similar to those of Tor and I2P; just better in that they extend it beyond the existing networks, aiming to build more. Telehash documentation is also pretty nice and full of references.

Cjdns (or its name, at least) seems to be relatively well-known, but it relies on node.js. Netsukuku and B.A.T.M.A.N. are two more protocols the names of which are known.

One of the large Wi-Fi mesh networking projects is Freifunk, but apparently it's only widespread in DACH countries.

Those would be nice to get someday, but they would require quite a lot of users to function, and various government restrictions seem to complicate their usage (this varies from jurisdiction to jurisdiction and from year to year, but seems to be pretty bad in Russia in 2018, and even worse by 2023).

And then there are the ones working over the Internet, building overlay networks, usually with technologies similar to those used for VPNs (though yet again, in Russia by 2023 they seem to be about to start blocking protocols used for VPNs, with occasional outages/likely testing reported). Yggdrasil is like that. There is an overview of similar mesh networks: "Easily Accessing All Your Stuff with a Zero-Trust Mesh VPN".

IM and other social services

See also: Distributed state and network topologies in chat systems.

File sharing and websites

Web crawling

YaCy and a few more (some of which are dead by now) distributed search engines exist. I have only tried YaCy, and it works, though I haven't managed to find its technical documentation – so it's not clear how it works.

Other information

These networks include search for files, but by their names, not content-addressable (so they can't be easily verified, which brings additional challenges).
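For contrast, content addressing lets a client check what it received against the address it asked for, which name-based search cannot do. A minimal sketch, assuming SHA-256 as the hash function (an assumption for illustration; real systems differ in hash choice and encoding, and the function names here are hypothetical):

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an address from the content itself (hex SHA-256, by assumption)."""
    return hashlib.sha256(data).hexdigest()


def matches_address(data: bytes, address: str) -> bool:
    """Check that received data is what the address refers to."""
    return content_address(data) == address


# A file is published under its content address...
original = b"some shared file contents"
address = content_address(original)

# ...and any recipient can verify a download without trusting the sender.
print(matches_address(original, address))            # genuine copy
print(matches_address(b"tampered contents", address))  # altered copy
```

With name-based search, the second check has no equivalent: two different files can carry the same name, and the name says nothing about which one was received.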

Related papers:

Cryptocurrencies

Plenty of those popped up recently. Bitcoin-like ones (usually with a proof of work and block chaining) look like quite a waste of resources (and perhaps a pyramid scheme) to me, though the idea itself is interesting. I was rather interested in "digital cash" payment systems before, but those didn't quite take off so far.

As of 2021, Bitcoin-like cryptocurrencies seem to be eating other distributed projects: many of those are merged with their custom cryptocurrencies, or occasionally piggyback on existing ones, but either way they become more complicated and commercialized. As of 2022, the "crypto" clipping seems to be associated more widely with cryptocurrencies and related technologies than with cryptography in general. But as of 2024, it seems that the hype wave is mostly over, with "AI" (generative stuff) filling up all the hype slots.

General P2P networking tools

GNUnet

Not sure how to classify it, but here are some links: gnunet.org, GNUnet article in Wikipedia, "A Secure and Resilient Communication Infrastructure for Decentralized Networking Applications". Seems promising, but tricky to build, to figure out how it all works, and to do anything with it now (a lack of documentation seems to be the primary issue, though probably there are others). Apparently it is also being blocked in Russia by 2024, at least the gnunet.org website is (via TSPU, it seems), which makes it yet harder to debug. Apparently it is easier to set up in a single-user mode, but none of the retrieved bootstrap peer addresses seem to be available. An up-to-date hostlist can be found (having to use some proxying to access lists.gnu.org from Russia, where it is blocked as well), and then bootstrapping works.

Taler and secushare (using PSYC) are getting built on top of it, but it's not clear how it's going, how abandoned or alive it is, etc. Their documentation also seems to be obsolete/outdated/abandoned/incomplete. Update (January 2018): apparently the secushare prototype won't be released this year.

libp2p

libp2p apparently provides common primitives needed for peer-to-peer networking in the presence of NATs and other obstructions. At the time of writing there's no C API (so it's only usable from a few languages) and its website is quite broken. At the same time, worldwide IPv6 adoption has reached more than 32%, so NATs may disappear before the workarounds become usable.

General tools useful for P2P networking

Many networking-related tools can be used for peer-to-peer networking. socat(1) is among the particularly flexible tools for relaying, and it can be combined with many other Unix tools for ad hoc networking: openssl, gnutls-cli, and netcat for data encryption and transmission; sox, opusenc, rec, play, pw-record, pw-play, and ffplay for audio capture, encoding, decoding, and playback.

Generic protocols

There are more or less generic network protocols that may be used, possibly together with Tor, to get working and secure peer-to-peer services.

SSH is quite nice and layered. Apparently its authentication is not designed for distributed systems (such as distributed IMs or file sharing), its connection layer looks rather bloated, and generally it's not particularly simple. Those are small bits of a large protocol, but they seem to make it not quite usable for peer-to-peer communication.

TLS may provide mutual authentication, and there are readily available tools to work with it.

IPsec's uses are similar to those of TLS, but it is generally a better way to solve the same problems. Individual addresses (which IPv6 should bring) are needed to use it widely for P2P, though. IPv6 is being adopted, but slowly. Once computers become individually addressable (again), and transport layer encryption is there by default, it may render plenty of the contemporary higher-level network protocols obsolete.

Pretty much every distributed IM tries to reinvent everything, and virtually none are satisfactory, but at least some of the problems are already solved separately: one can use dynamic DNS, Tor, or a VPN to obtain reachable addresses (even if the involved IP addresses change, and/or are behind NAT), and then use any basic/common communication protocol on top. Or even set up a VM and rely on SSH access, communicating inside that system.

Search, FOAF, and the rest of RDF

Some kind of a distributed search/directory may connect small peer-to-peer islands into a usable network. While it is hard to decide on an algorithm, lists of known and/or somewhat trusted nodes are common for both structured and unstructured networks, as well as for use of social graphs: if those would be provided by peers, a client may decide by itself which algorithm to apply. This reduces the task to just including known nodes into local directory entries, which can be shipped over any other protocols (e.g., HTTP, possibly over Tor).
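As a rough illustration of that reduction, a client that is handed lists of known nodes can pick its own traversal strategy. The sketch below uses a breadth-first walk limited by hop count, which is only one possible choice; the directory contents, node names, and the `fetch_known` function are all hypothetical (in practice each list would be retrieved from the peer, e.g. over HTTP).

```python
from collections import deque

# Hypothetical directory entries: each node lists the nodes it knows.
DIRECTORY = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["alice", "dave"],
    "dave": ["erin"],
}


def fetch_known(node):
    """Stand-in for fetching a peer's published list of known nodes."""
    return DIRECTORY.get(node, [])


def discover(start, fetch, max_depth=2):
    """Breadth-first walk over known-node lists, up to max_depth hops.

    Returns a mapping from discovered node to its hop distance from start.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # do not follow lists of nodes at the depth limit
        for peer in fetch(node):
            if peer not in seen:
                seen[peer] = depth + 1
                queue.append(peer)
    return seen


print(discover("alice", fetch_known, max_depth=2))
```

A different client could apply trust weights or a structured lookup to the same lists; the peers only need to publish them.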

Knowledge representation, which is needed for a generic directory structure, is tricky, but there is RDF (Resource Description Framework) already. There is FOAF (the "friend of a friend" ontology), specifically for describing persons, their relationships (including linking the persons they know), and other social things. A basic FOAF search engine should be fairly straightforward to set up: basically a triple store filled with FOAF data. See also: Semantic Web.
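To make the "triple store filled with FOAF data" idea concrete, here is a toy in-memory sketch. The `TripleStore` class, its methods, and the example.org identifiers are hypothetical; only the FOAF namespace and the `name` and `knows` properties come from the FOAF vocabulary. A real setup would more likely use an existing RDF store and SPARQL.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


class TripleStore:
    """A toy set of (subject, predicate, object) triples with pattern matching."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
alice = "https://example.org/alice#me"  # hypothetical identifiers
bob = "https://example.org/bob#me"
store.add(alice, FOAF + "name", "Alice")
store.add(bob, FOAF + "name", "Bob")
store.add(alice, FOAF + "knows", bob)


def find_by_name(store, name):
    """Search: which persons carry this foaf:name?"""
    return [s for s, _, _ in store.match(predicate=FOAF + "name", obj=name)]


def known_by(store, person):
    """Whom does this person claim to know (foaf:knows)?"""
    return [o for _, _, o in store.match(subject=person, predicate=FOAF + "knows")]


print(find_by_name(store, "Alice"))
print(known_by(store, alice))
```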

Hubs and addressing

As mentioned in the "usable systems" section above, the systems relying on peering seem to fare better in practice: they are still distributed, on the level of servers (or hubs generally), which then take care of tricky parts on behalf of the users. This is also how postal systems, telephone ones, and the Internet itself are organized. And some of those federated systems can be quite close to distributed ones: for instance, it is easy and viable to set up an XMPP or a WWW server on one's personal machine, although normally addressing is centralized in those cases.

The Magnet URI scheme combines content addressing, which is not centralized, with a list of addresses to bootstrap from. Perhaps one can similarly use public keys, with claims signed by those, which would be very similar to certificates and key servers. No nice and human-readable addresses that way, as usually is the case with distributed addressing, but this creates a decentralized identity, decoupled from any particular nodes.
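The two parts of a Magnet link can be seen by taking one apart. A minimal sketch, assuming the commonly used `xt` (content identifier), `dn` (display name), and `tr` (tracker address) parameters; the hash, file name, and tracker in the example are made up, and `parse_magnet` is a hypothetical helper.

```python
from urllib.parse import parse_qs


def parse_magnet(uri):
    """Split a magnet URI into content identifiers and bootstrap addresses."""
    scheme, _, query = uri.partition("?")
    if scheme != "magnet:":
        raise ValueError("not a magnet URI")
    params = parse_qs(query)  # also percent-decodes the values
    return {
        "content_ids": params.get("xt", []),   # what to fetch (not centralized)
        "name": params.get("dn", [None])[0],   # human-readable hint only
        "trackers": params.get("tr", []),      # where to start looking
    }


# Made-up example link.
link = (
    "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
    "&dn=example.iso"
    "&tr=udp%3A%2F%2Ftracker.example.org%3A6969"
)
print(parse_magnet(link))
```

The content identifier stays valid even if every listed tracker disappears; the tracker list is only a starting point for finding peers.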

There is the similar concept of self-sovereign identity, with decentralized identifiers (DIDs) as a fairly generic framework. Similarly to Activity Streams, they are based on the awkward (but RDF-compatible) JSON-LD. See DID Methods for more specific specifications, though many of those are blockchain-based (probably because DID appeared when those were particularly hyped/popular).

GNUnet's GNS (RFC 9498) has a DID method defined. It combines local "pet names" (aliases) and memorable labels (subdomains), with public keys as unique zones (identifiers). For DID identifiers, they simply use GNS zone keys, and store DID documents as records of type DID_DOCUMENT under the "apex label". Zone delegation is similar to that of regular DNS. Both GNS and R5N (GNUnet's DHT) look fine. But TLSA records don't seem to work with its dns2gns, and even if they did, they would not be trusted without DNSSEC, while CAs do not support GNS. So the software would have to support GNS explicitly, at which point it could as well use GNUnet's CADET instead of TLS. But the main GNUnet implementation is under AGPL, which is not likely to help wide adoption via embedding into existing software.

Another effort to organize name lookups not dependent on ICANN is OpenNIC, but there is an alternative DNSSEC hierarchy, including the keys at root, which breaks usual validation for ICANN domains. And it is still a centralized system. Maybe memorizable and human-readable addresses are not that important anyway: it seems that people rarely remember those, do not operate those directly (using non-unique nicknames instead), and happily use phone numbers, sometimes even preferring those over memorizable addresses.

But back to more practical (readily usable) systems, OpenPGP certificates actually are quite similar to Magnet links, in that they ship a public key, along with one or more identities, which usually are email addresses, and those can be retrieved by various means (DANE, WKD, various key servers, manual exchange, etc). I think it keeps being "pretty good", for many use cases.

Weather data

Apart from common messaging and file sharing, one of the distributed (or at least federated) system applications I keep considering is weather data sharing: it'd be useful, and it's quite different from those other applications.

Weather data is commonly of interest to people, and it's right out there, not encumbered by patents or copyright laws; it just has to be measured and distributed. But commercial organizations working on that try to extract some profit, so they don't simply share that data with anyone for free. There are state agencies too, paid out of taxes, but at least in Russia apparently you can't easily get weather data out of them either -- only a lot of bureaucracy, and even if it were possible, there are many awkward custom formats and ways to access the data, which won't make a reliable system. People sharing this data with each other would solve that problem.

Though there is at least one nice exception: the Norwegian Meteorological Institute shares weather data freely and for the whole globe. Germany has Deutscher Wetterdienst: API, and the US has weather.gov. Also open-meteo.com appeared recently.

The challenges/requirements also differ from those with messaging or file sharing, since there's a lot of data regularly updated by many people, and potentially being requested many times, but confidentiality isn't needed. There already are protocols somewhat suitable for that: NNTP (which is occasionally used for weather broadcasts, just in a free form), DNS, and IRC explicitly aim at relaying; SMTP (with mailing lists) and XMPP (with pubsub) may be suitable too, possibly with ad hoc relaying.

For reference, as of 2022 there are about 1200 cities with a population of more than 500 thousand people; individual hourly measurements from each of those would constitute a message per 3 seconds. Wouldn't harm to have more than one weather station per city, to cover smaller cities, and so on, but the order seems to be manageable even with modest resources and without much of caching or relaying, assuming that there are not too many clients receiving all the data just as it arrives.
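The estimate is simple enough to check and to rerun with other numbers. In the sketch below, the 1200 cities and hourly reports come from the paragraph above; the 200-byte message size is purely an assumption for the sake of a volume figure, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def seconds_between_messages(stations, reports_per_hour=1):
    """Average gap between incoming messages across all stations."""
    return SECONDS_PER_HOUR / (stations * reports_per_hour)


def bytes_per_day(stations, reports_per_hour=1, message_bytes=200):
    """Daily volume; message_bytes=200 is an assumed size, not a measured one."""
    return stations * reports_per_hour * 24 * message_bytes


# One station per city, hourly reports: a message every 3 seconds.
print(seconds_between_messages(1200))    # 3.0

# Ten stations per city still leaves a modest rate.
print(seconds_between_messages(12000))

# 1200 * 24 = 28800 messages per day, 5760000 bytes at the assumed size.
print(bytes_per_day(1200))               # 5760000
```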

The links/peering can be set manually, and/or data can be signed (DNSSEC, OpenPGP, etc) and verified by end users with a PKI/WOT; the former may just be simpler, and appears to work in practice.

Collaboration/coordination/organization is likely to be tricky, though possible: plenty of people contribute their computing resources to BOINC projects, OONI, file sharing networks, and so on. But weather collection is different in requiring special equipment (at least a temperature sensor) being set outside, complicating contribution.

Post-quantum cryptography

Many of the protocols mentioned here rely on asymmetric cryptography, which is particularly vulnerable to attacks by a quantum computer, and it seems that at this rate we may have usable quantum computers before widely used distributed systems. Use of symmetric cryptography, or at least cryptographic agility of the protocols, is needed to mitigate that.
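One way to picture cryptographic agility combined with symmetric primitives is a message format that names its algorithm, so that a weakened one can be dropped without redesigning the protocol. This is only a sketch under stated assumptions: a pre-shared key, authentication without encryption, and a made-up message layout and algorithm identifiers; it is not how any of the protocols above actually work.

```python
import hashlib
import hmac

# Supported MAC algorithms, keyed by the identifier carried in each message.
# The identifiers are made up for this sketch.
MAC_ALGORITHMS = {
    "hmac-sha256": hashlib.sha256,
    "hmac-sha512": hashlib.sha512,
}


def _tag(key, alg, payload):
    # The algorithm name is authenticated too, so it cannot be swapped silently.
    data = alg.encode() + b"\0" + payload
    return hmac.new(key, data, MAC_ALGORITHMS[alg]).hexdigest()


def seal(key, alg, payload):
    """Produce an authenticated (not encrypted) message using a shared key."""
    return {"alg": alg, "payload": payload, "tag": _tag(key, alg, payload)}


def check(key, message, accepted=frozenset(MAC_ALGORITHMS)):
    """Verify a message, accepting only algorithms the local policy allows."""
    alg = message["alg"]
    if alg not in accepted or alg not in MAC_ALGORITHMS:
        return False
    expected = _tag(key, alg, message["payload"])
    return hmac.compare_digest(expected, message["tag"])


key = b"pre-shared secret"  # assumed to be exchanged out of band
msg = seal(key, "hmac-sha256", b"hello")
print(check(key, msg))                            # accepted under default policy
print(check(key, msg, accepted={"hmac-sha512"}))  # rejected once sha256 is retired
```

Retiring an algorithm then becomes a policy change on the receiving side, at the cost of carrying and authenticating an algorithm identifier in every message.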

Beyond technologies

Primarily technologies are covered here, but non-technical means may be quite helpful as well. Social skills and connections may be more useful to stay connected, and to actually engage in social activities. And a decent government is supposed to help people, rather than to be a threat actor, both online and offline. Throw in good ISPs, and a few centralized systems maintained by well-meaning and competent people, and one wouldn't even need any channel encryption for most tasks.

People don't quite work that way though, with governments apparently trying to turn into autocracies, any non-awful ISPs being acquired by awful ones, people in general being prone to mischief, and some of them engaging in crime, so some technical measures are needed, but some social and organizational help is important as well.

Additionally, the combination of social connections and relatively basic technologies allows building friend-to-friend networks, reducing network abuse.

Yet another approach to consider is the focus on a more delayed communication, through years or centuries, via books and similar larger works: near-real-time communication can be blocked or otherwise disrupted relatively easily, but if the delays are already longer than the regimes that impose network blocking, or than transient network issues, such communication is unaffected. Related notes: personal data storage.

Users

Distributed systems, particularly when used for social activities, require users – so that there would be somebody to send messages to in case of an IM. That's quite a problem, since even by sticking to federated protocols it is easy to lose or decrease contact with people.

People in general are capable of dealing with even more complicated and less sensible systems, as digital bureaucracies demonstrate, but apparently not motivated enough. I am somewhat interested and motivated myself, yet occasionally after looking at software with many dependencies, reinventing many parts, and generally going against what I view as good practices, I do not feel motivated enough to try those.

Search in particular is tricky in such systems, though usually some form of communication with strangers and self-organization (e.g., via multi-user chats, web pages) is possible, so that people can find groups with shared interests. Perhaps being sociable is easier and more useful than technical solutions there, too.

See also

Not quite about collaborative protocols like those listed above, but just about distributed computing (including software design aiming at multiple servers controlled by a single entity), there's a nice "A Note on Distributed Computing" paper.