Here are some notes on providing online services such as email, XMPP, shell accounts, DVCS, and general file/rsync hosting; the focus is on properly set up software and decent public services. I don't have much experience with public ones, but the notes are mostly on technologies rather than practices, primarily targeting GNU/Linux systems. For private ones, see the notes on private server setup and simpler server setup.
Service providers are usually obliged to assist governments with surveillance and/or censorship, and possibly to follow additional laws on handling user information. That is not necessarily bad, but it is worse in some cases than in others (getting servers confiscated, engaging in mass surveillance and/or censorship, or setting up backdoors to enforce laws that don't make sense would be less desirable than rarely helping with actual criminal investigations, once a warrant is provided and only targeting individual users), so the applicable laws should be investigated. Apparently the corresponding Russian law is such that it's better to keep services as far away from it as possible. Estonia provides "e-residency", which may help with providing services under its laws.
Perhaps even under an oppressive government it is possible (and acceptable) to provide a service to those whom it does not affect (i.e., pretty much anyone outside the country), while following all the regulations and clearly presenting what they are, as opposed to the common practices of not mentioning them at all, of being security- and privacy-oriented but staying under the radar, in a grey area, and/or (partially) blocked, or of boasting security and privacy while in fact following regulations that oppose them. A slightly sarcastic presentation picturing benevolent supervisors providing a useful service by filtering "extremist content" and suchlike may also be quite fun. Among the features, in place of strict privacy laws it could list some of the local nonsense, but from the point of view of a hypothetical happy citizen (though to keep it light, one would have to pick something that sounds silly or peculiar, yet not particularly bigoted). Maybe also presenting uncertainty and instability as excitement. However, as of 2021, many foreign mail servers are being blocked in Russia, so the delivery failure rate would seem to be unacceptable for a mail service there. Then, in 2022, Russia's "special operation" happened, at which point having anything to do with Russia became an edgy choice in much of the world. Money transfers became inconvenient and limited, too; it is better to be in a sane jurisdiction, after all.
A relevant discussion (though probably there are plenty more around): "Ask HN: What is the best jurisdiction for internationally distributed teams?".
A user agreement should be prepared carefully, yet remain readable.
Payment processors tend to be an issue as well, though some of their issues are just inherited from bank cards (and most of the others from attempts to mitigate those with fraud detection). The available options (e.g., PayPal) are bad, but they work sometimes, more or less.
Service abuse is what brings up some of the legal issues (and even when it doesn't, it's highly undesirable), but apparently it can be mitigated by requiring a small payment for account confirmation, which is straightforward with regular billing but viable with donations as well (e.g., as sdf.org does).
From the perspective of someone reporting network abuse, though, it already seems pretty good if an abuse reporting email address exists, is checked, and something is done about the abuse at least once it is reported. Probably those who don't care much just go ahead and run services without handling abuse, while those who care too much don't even try to run such services; a good balance is needed.
SSH is one of the most widespread protocols with good authentication and good software implementations. It is useful for both regular shell accounts and accounts restricted to specific functionality (email and DVCS, for instance), and it is needed for pubnix-style systems.
Systems shared among strangers call for better isolation and stricter restrictions than regular file permissions provide. Some of the ways to set up such restrictions can be observed in the "security" task of hashbang/shell-server, and here is the list I have collected:
- sshd(8) can (and does by default on Debian, see sshd_config(5)) use pam(8), including session management modules such as pam_limits(8) (which sets ulimit and nice values, see limits.conf(5)) and pam_namespace(8) (which sets up polyinstantiated directories, "polydirs", such as per-user tmp directories, see namespace.conf(5)). These are user-space and not necessarily reliable: "PAM escape" via certain programs is possible, so those programs should be limited too.
- Mount options: hidepid=2 for proc(5), newinstance for devpts (documented in mount(8)), etc.
- Mounting /tmp/ into memory (tmpfs) and avoiding swap can be useful for both performance and security. Disk partition encryption with LUKS/dm-crypt would also be useful to reduce the risk of compromising user data, though that applies to computing in general.
- systemctl(1) can be used to set cgroup-based resource limits (see systemd.resource-control(5)) and to limit resource usage for PAM sessions. I wonder why hashbang.sh only seems to set that for non-interactive sessions.
- In iptables-extensions(8) there's the owner extension, which allows matching outbound packets by local user and group. This seems useful for limiting user network capabilities without limiting system services.
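A few of the measures above can be checked on a running system. The following is a minimal Python sketch, not a complete audit: it parses /proc/mounts to see whether /proc is mounted with hidepid=2 (kernels since 5.8 report that as hidepid=invisible) and whether /tmp is a tmpfs, and it prints the current process's soft resource limits, which, when run inside a login session, should reflect whatever pam_limits applied. The choice of checks and limits is my assumption.

```python
from __future__ import annotations

import resource

# /proc options that hide other users' processes: older kernels report
# "hidepid=2", kernels since 5.8 report "hidepid=invisible".
HIDEPID_OPTIONS = {"hidepid=2", "hidepid=invisible"}


def parse_mounts(text: str) -> dict[str, tuple[str, set[str]]]:
    """Map mount point -> (filesystem type, set of options) from /proc/mounts text."""
    mounts: dict[str, tuple[str, set[str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        # Later entries win, as the last mount on a point is the visible one.
        mounts[mount_point] = (fs_type, set(options.split(",")))
    return mounts


def audit(mounts: dict[str, tuple[str, set[str]]]) -> list[str]:
    """Return warnings for the mount-related measures discussed above."""
    warnings = []
    proc = mounts.get("/proc")
    if proc is None or not proc[1] & HIDEPID_OPTIONS:
        warnings.append("/proc is not mounted with hidepid=2")
    tmp = mounts.get("/tmp")
    if tmp is None or tmp[0] != "tmpfs":
        warnings.append("/tmp is not a tmpfs")
    return warnings


def current_limits() -> dict[str, int]:
    """Soft limits of this process (e.g. as set by pam_limits for the session)."""
    names = {"nproc": resource.RLIMIT_NPROC, "nofile": resource.RLIMIT_NOFILE}
    return {name: resource.getrlimit(rl)[0] for name, rl in names.items()}


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    except OSError:
        mounts_text = ""
    for warning in audit(parse_mounts(mounts_text)):
        print("warning:", warning)
    for name, soft in current_limits().items():
        shown = "unlimited" if soft == resource.RLIM_INFINITY else soft
        print(f"soft {name} limit: {shown}")
```

Running it as an ordinary user over SSH, rather than as root from a console, is what shows whether the PAM session modules actually took effect.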
For more restricted services, there may be no need for shell access, or for system users altogether, but other SSH uses may still be desired. There are SSH server libraries for that (e.g., libssh, or a Haskell ssh library; libssh2 may be better avoided, given its rather bad track record and regularly found vulnerabilities; though years after writing this, I used libssh2 for an SFTP client and ran into memory leaks with versions 1.7.0 and 1.9.0, then used libssh and ran into an infinite loop in sftp_open with version 0.7.3, though apparently not in 0.9.8; so no feature-complete SSH library seems to have a particularly good track record). With OpenSSH, many per-key restrictions, including command restrictions, can be defined in authorized_keys files or encoded into certificates (see sshd(8) for the documentation). A fixed command may be too restrictive for some programs (where the arguments should be dynamic), but wrappers could be used for those.
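As a hypothetical sketch of such a wrapper (not Gitea's or anyone else's actual code): with an authorized_keys entry like restrict,command="/usr/local/bin/git-wrapper alice" ssh-ed25519 AAAA..., sshd runs the wrapper instead of whatever the client asked for, and passes the requested command in the SSH_ORIGINAL_COMMAND environment variable. The wrapper path, the user argument and REPO_ROOT are made up; the sketch only lets through git-upload-pack and git-receive-pack (what git runs on the remote side for fetch and push) on paths under REPO_ROOT.

```python
from __future__ import annotations

import os
import posixpath
import shlex
import sys

# Commands git runs on the server side for fetch/clone and push.
ALLOWED_COMMANDS = {"git-upload-pack", "git-receive-pack"}
# Hypothetical directory holding all repositories.
REPO_ROOT = "/srv/git"


def build_argv(original_command: str) -> list[str] | None:
    """Turn SSH_ORIGINAL_COMMAND into a safe argv, or None to deny it."""
    try:
        words = shlex.split(original_command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    if len(words) != 2 or words[0] not in ALLOWED_COMMANDS:
        return None
    # Resolve the requested path under REPO_ROOT and refuse anything that
    # escapes it via "..". Note: symlinks inside REPO_ROOT are not resolved.
    path = posixpath.normpath(posixpath.join(REPO_ROOT, words[1].lstrip("/")))
    if not path.startswith(REPO_ROOT + "/"):
        return None
    return [words[0], path]


def main() -> int:
    """Forced-command entry point; the user name comes from argv[1]."""
    user = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    argv = build_argv(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    if argv is None:
        print(f"{user}: only git fetch/push is allowed here", file=sys.stderr)
        return 1
    # A per-user repository access check would go here.
    os.execvp(argv[0], argv)  # replaces this process; raises OSError on failure
```

An actual forced-command script would end with sys.exit(main()). Parsing the requested command into words and rebuilding the argv, rather than passing the string to a shell, is what keeps tricks like "git-upload-pack x; rm -rf ~" from working.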
Gitea, for instance, forces execution of its own command (via command in ~git/.ssh/authorized_keys for each added user) and disallows everything but command execution (as used by git), manually ensuring that the commands are git ones and checking repository access privileges using its own rules. rsync, in turn, provides the rrsync script, also to be set via command, which only allows rsync to be used and restricts it to a certain directory. rssh similarly restricts the commands available over SSH, mostly to file transfer ones.
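Setting up such restricted keys mostly amounts to generating authorized_keys lines. A small Python sketch, assuming OpenSSH 7.2 or later (for the restrict option, which disables forwarding, PTY allocation and so on) and rrsync installed as /usr/bin/rrsync; its location varies between distributions and rsync versions, so that path is an assumption.

```python
from __future__ import annotations

import shlex

RRSYNC = "/usr/bin/rrsync"  # assumed location; check your distribution


def authorized_keys_line(public_key: str, forced_command: str) -> str:
    """Build an authorized_keys entry that forces forced_command.

    "restrict" turns off port/agent/X11 forwarding, PTY allocation and
    ~/.ssh/rc execution for this key; command= overrides whatever the client
    asks to run (sshd keeps the request in SSH_ORIGINAL_COMMAND).
    """
    # Keep quoting simple by refusing characters that would need escaping.
    if '"' in forced_command or "\n" in forced_command:
        raise ValueError("forced command must not contain double quotes or newlines")
    return f'restrict,command="{forced_command}" {public_key.strip()}'


def rrsync_entry(public_key: str, directory: str, read_only: bool = True) -> str:
    """Allow only rsync for this key, confined to directory (read-only by default)."""
    args = [RRSYNC, "-ro", directory] if read_only else [RRSYNC, directory]
    return authorized_keys_line(public_key, shlex.join(args))
```

The resulting line would be appended to the serving account's ~/.ssh/authorized_keys, one line per user key; a Gitea-style setup differs only in the forced command, which then identifies the key (or user) for the application's own access checks.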
VPNs (IPsec, WireGuard, etc.) usually provide both encryption and authentication, which is convenient for running simple protocols (unencrypted, maybe with host-based authentication) on top of them. Additionally, they may be convenient for connections between users.
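With WireGuard specifically, each peer is identified by its public key, and packets arriving through the tunnel are only accepted from a peer if their source address lies within that peer's AllowedIPs; that is what makes address-based (host-based) authentication of the inner protocols reasonable. A minimal Python sketch rendering a wg-quick(8) style configuration; Address is a wg-quick addition rather than a wg(8) setting, and the keys, addresses and port used with it are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Peer:
    public_key: str   # the peer's WireGuard public key (base64)
    allowed_ips: str  # tunnel addresses this key may use, e.g. "10.8.0.2/32"
    endpoint: str | None = None  # "host:port"; only needed to initiate contact


def render_wg_config(private_key: str, address: str, listen_port: int,
                     peers: list[Peer]) -> str:
    """Render a wg-quick(8) style configuration file."""
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = {address}",  # wg-quick extension, not a wg(8) key
        f"ListenPort = {listen_port}",
    ]
    for peer in peers:
        lines += ["", "[Peer]", f"PublicKey = {peer.public_key}",
                  f"AllowedIPs = {peer.allowed_ips}"]
        if peer.endpoint:
            lines.append(f"Endpoint = {peer.endpoint}")
    return "\n".join(lines) + "\n"
```

Real keys would come from wg genkey and wg pubkey; giving each user a single /32 in AllowedIPs keeps the mapping from tunnel address to key one-to-one.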
PAM authentication may be nice to reuse for everything (possibly via SASL), especially if shell access is provided, but unfortunately it's only usable for plaintext authentication.
SASL is nice for uniform authentication across services. Usually it is not tied to system users, and can be used with LDAP (and so can PAM). See the "user authentication" note for more on the topic.
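For illustration, this is what the SASL PLAIN mechanism (RFC 4616) exchanges: an authorization identity, an authentication identity and a password, separated by NUL bytes and, in protocols such as IMAP or SMTP AUTH, base64-encoded. The server ends up with the cleartext password, which is what a PAM (or LDAP bind) check needs, and also why such mechanisms should only be used over TLS. A Python sketch of both sides:

```python
from __future__ import annotations

import base64


def encode_plain(authcid: str, password: str, authzid: str = "") -> str:
    """Client side: build a base64-encoded PLAIN initial response."""
    message = f"{authzid}\0{authcid}\0{password}".encode("utf-8")
    return base64.b64encode(message).decode("ascii")


def decode_plain(response: str) -> tuple[str, str, str]:
    """Server side: return (authzid, authcid, password).

    Apart from base64 the password arrives in the clear, so it can be handed
    to a PAM or LDAP check as is.
    """
    raw = base64.b64decode(response, validate=True)  # binascii.Error is a ValueError
    parts = raw.split(b"\0")
    if len(parts) != 3 or not parts[1] or not parts[2]:
        raise ValueError("malformed PLAIN response")
    authzid, authcid, password = (p.decode("utf-8") for p in parts)
    return authzid, authcid, password
```

Challenge-response mechanisms such as SCRAM avoid sending the password, but the server then has to store mechanism-specific verifiers instead of deferring to PAM.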
To detach users from the underlying operating system (that is, to avoid using system users), possibly using a shared user directory across multiple servers, LDAP is a common option.
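When services look users up in a shared directory, user-supplied names end up in LDAP search filters, and they need escaping (RFC 4515) so that characters such as *, ( and ) can't change the filter's meaning. A sketch in Python; the posixAccount object class (from the RFC 2307 schema) and the uid attribute are assumptions about the directory layout, and python-ldap already provides ldap.filter.escape_filter_chars for the same purpose.

```python
# Characters RFC 4515 requires to be escaped in assertion values.
SPECIAL = "*()\\\0"


def escape_filter_value(value: str) -> str:
    """Escape a string for use as an LDAP filter assertion value."""
    return "".join(f"\\{ord(ch):02x}" if ch in SPECIAL else ch for ch in value)


def posix_user_filter(username: str) -> str:
    """Filter matching one POSIX account by uid (assumed schema)."""
    return f"(&(objectClass=posixAccount)(uid={escape_filter_value(username)}))"
```

Without the escaping, a "username" like *)(uid=* would turn a lookup of one account into a match on every account.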
The applicability of different backup and replication methods depends on the kinds of data stored. Some of the common ones are rsync, database replication and other built-in or specialized backup/synchronization methods, mirroring with RAID (RAID 1 in particular), and DRBD (see the section on HA).
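As an example of the rsync approach, incremental snapshots can be made with --link-dest: each run creates a new directory, and files unchanged since the previous snapshot become hard links to it instead of copies. A Python sketch that only builds the command (the paths and the directory naming scheme are assumptions); the result could then be run with subprocess.run(argv, check=True).

```python
from __future__ import annotations

import datetime
import posixpath


def snapshot_argv(source: str, backup_root: str, previous: str | None,
                  when: datetime.datetime) -> tuple[list[str], str]:
    """Return (rsync argv, new snapshot directory) for one snapshot run."""
    target = posixpath.join(backup_root, when.strftime("%Y-%m-%d_%H%M%S"))
    # -a preserves permissions, times, symlinks etc.; --numeric-ids keeps
    # UIDs/GIDs as numbers, so restores don't depend on name mappings.
    argv = ["rsync", "-a", "--numeric-ids"]
    if previous is not None:
        # A relative --link-dest is resolved against the destination
        # directory, so an absolute path is less surprising.
        argv.append(f"--link-dest={previous}")
    # Trailing slash on the source: copy its contents, not the directory itself.
    argv += [source.rstrip("/") + "/", target + "/"]
    return argv, target
```

Every snapshot then looks like a full copy while only changed files take extra space; removing old snapshots is a plain recursive delete.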
A decent service shouldn't trap users, so horizontal scaling should be as easy as setting up identical systems that rely on federated protocols for interoperation. Configuration management systems such as Ansible are useful for that. High availability (see the next section), in turn, usually involves redundancy, which in some cases can easily provide scaling as well.
There are nice tools for highly available (HA) clusters around: pgpool-II (for PostgreSQL), DRBD + GFS2/OCFS2 (for a distributed filesystem), and Pacemaker (for general resource management/failover, including services and automated setup of load balancing via IP multicast). All of those are available from the Debian repositories, and seem to be maintained and fairly widely used.
It is rather hard to be certain that a complex system would function properly under unexpected loads. Stress testing should be performed, and other iptables extensions could be useful here, such as hashlimit to set per-IP limits.
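The owner extension and hashlimit can both end up in a firewall setup. The sketch below (Python, building iptables argument lists without running them) rejects outbound SMTP connections from processes of a hypothetical shellusers group and drops new SSH connections from any single IP above a rate limit; the group name, ports and numbers are arbitrary examples. Applying the rules means running the argv lists as root, e.g. with subprocess.run(argv, check=True).

```python
from __future__ import annotations


def owner_reject_rule(group: str, port: int) -> list[str]:
    """Reject outbound TCP to `port` from processes running as `group`.

    The owner match only applies to locally generated packets (OUTPUT and
    POSTROUTING chains). How supplementary groups are treated is described in
    iptables-extensions(8); newer versions offer --suppl-groups for that.
    """
    return ["iptables", "-A", "OUTPUT", "-p", "tcp", "--dport", str(port),
            "-m", "owner", "--gid-owner", group, "-j", "REJECT"]


def hashlimit_drop_rule(port: int, rate: str, burst: int, name: str) -> list[str]:
    """Drop new inbound connections to `port` from source IPs exceeding `rate`.

    hashlimit keeps a separate token bucket per source address (mode srcip),
    stored in a table called `name`; `rate` looks like "10/minute".
    """
    return ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", str(port),
            "-m", "conntrack", "--ctstate", "NEW",
            "-m", "hashlimit", "--hashlimit-name", name,
            "--hashlimit-mode", "srcip",
            "--hashlimit-above", rate, "--hashlimit-burst", str(burst),
            "-j", "DROP"]
```

For example, owner_reject_rule("shellusers", 25) keeps shell users from sending mail directly while the system's own MTA keeps working, and hashlimit_drop_rule(22, "10/minute", 5, "ssh-per-ip") limits how fast one address can open SSH connections.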
Monitoring (with munin, Zabbix, or something along those lines) should be helpful for capacity planning.
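For Munin, a plugin is just an executable that prints the graph configuration when called with the config argument and the current values otherwise. A hypothetical Python plugin reporting /tmp usage (which matters when /tmp is kept in memory); the category and the warning and critical thresholds are arbitrary choices.

```python
import shutil
import sys

PATH = "/tmp"


def config_text() -> str:
    """Graph description printed for "plugin config"."""
    return "\n".join([
        "graph_title /tmp usage",
        "graph_vlabel percent",
        "graph_category disk",
        "graph_args --lower-limit 0 --upper-limit 100",
        "used.label used",
        "used.warning 80",
        "used.critical 95",
    ])


def value_text(path: str = PATH) -> str:
    """Current value line, as percent of the filesystem in use."""
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return f"used.value {percent:.1f}"


if __name__ == "__main__":
    # munin-node runs the plugin with "config" to learn about the graph,
    # and without arguments to fetch values.
    print(config_text() if sys.argv[1:] == ["config"] else value_text())
```

The plugin would be placed (or symlinked) into munin-node's plugin directory and made executable; trends in such graphs are what capacity planning needs.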