Work-related voice conferences never seemed handy to me: they are strictly real-time (quite an issue if you don't keep a regular sleep schedule, or if any of the participants have anything else scheduled), effectively half-duplex (only one person can talk at a time for the speech to be intelligible), there is no reliable and easy way to get greppable logs (transcriptions), and unless they are combined with textual chat, there is no way to copy and paste text, or to share links or program output.
They are probably good for multiplayer games, when you are busy controlling a character but also need to coordinate actions in a timely manner. They may also be nice for those who are not used to reading and typing, for a casual chat, and for communicating from less convenient devices (e.g., phones).
Here are my notes on nicer protocols and software for those.
An unpleasant thing about voice communication is speaker recognition: coupled with unencrypted or unknown protocols, surveillance, and data breaches, it can be quite uncomfortable to use. So my initial requirements are end-to-end encryption, an open protocol, the existence of at least one open source (preferably libre) client for GNU/Linux, and preferably a FLOSS server that is easy to deploy (if the protocol needs a server).
Apparently the requirements imposed by the majority of users, which should also be taken into account in order to actually use such a protocol, are that it should be extremely easy to set up and to use on various systems: not more than a few mouse clicks or touchscreen taps. Perhaps being well-known is another thing that is important to inexperienced users, since the lesser-known software they come across tends to be malware, even by the relaxed, non-RMS definition of malware.
And the obvious requirement for it is to work well: acceptable sound/video quality (no perceivable noise, pauses, or delays) even over a poor connection, perhaps NAT traversal, etc.
There is a comparison of VoIP software and a few more lists in Wikipedia, and in the YBTI map. Apparently casual users mostly think in terms of client software that implements those protocols, so the clients are even more widely known.
WebRTC looks like bloat in web browsers, but it is handy: it includes NAT traversal (ICE, STUN, TURN), encryption (DTLS, which is end-to-end between directly connected peers), and voice and video conferences, and it has been supported by common web browsers for a while now, making it relatively easy to use: a single mouse click to get into a conference. It is not perfect, but it is open and standardized, and reuses other standards.
I found it quite painful to use with public servers sometime around 2015: UDP hole punching did not seem to work well, random ports made it harder to fix manually, and there were no relevant IRC channels or XMPP conferences in sight. I ultimately failed to actually use it then, but possibly things have improved since.
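To make the STUN part of that NAT traversal less opaque, here is a minimal Python sketch of a STUN Binding request and of reading the XOR-MAPPED-ADDRESS attribute from a success response, following the RFC 5389 message layout. The function names are made up for illustration, `fake_response` exists only to exercise the parser offline, and the sketch handles IPv4 only, with no retransmission, authentication, or fingerprint checks. It shows the wire format and is not a usable ICE implementation.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed constant defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding request without attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: message type, attributes length (0 here), magic cookie.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(packet, transaction_id):
    """Return (ip, port) from XOR-MAPPED-ADDRESS, or None if absent or invalid."""
    if len(packet) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if (msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE
            or packet[8:20] != transaction_id):
        return None
    pos, end = 20, min(20 + length, len(packet))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[pos:pos + 4])
        value = packet[pos + 4:pos + 4 + attr_len]
        # Family 0x01 is IPv4; IPv6 is not handled in this sketch.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = struct.pack("!I", xaddr ^ MAGIC_COOKIE)
            return socket.inet_ntoa(addr), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def fake_response(transaction_id, ip, port):
    """Build a synthetic success response (hypothetical helper, for offline tests)."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr
```

Against a real server, the request would be sent with a UDP socket's `sendto`, and the reply passed to `parse_binding_response`. The address it returns is how the server sees the client from outside the NAT, which is what peers then try to use for hole punching.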
After 2020, I observed that Jitsi Meet uses WebRTC (and Jitsi Videobridge bridges it to Jitsi's regular SIP), and works fine. As of 2024, it is not in Debian repositories (because its many JVM-based dependencies complicate the packaging, and those would be quite heavy anyway), but there is Janus, a WebRTC server, and Jangouts to go with it (along with coturn, nginx, etc). They are relatively lightweight for WebRTC software, but there are awkward bugs (e.g., I noticed the Jangouts issue #439), Jangouts looks abandoned, and WebRTC in a web browser seems unnecessarily complicated for such a task.
Opus is a good codec, though even PCM would work fine in many cases. RTP and Ogg are both fine for streaming, but normally require external negotiation to pass metadata, for which SIP is sometimes employed. Though, just as when reading from files, sometimes the identification data can be retrieved from stream packets directly (such as Opus headers, magic signatures). RTP can be secured with SRTP, while anything can be run over more general (and preferably UDP-based) encryption protocols: DTLS, IPsec, WireGuard. Custom container formats sometimes replace RTP, and those are fairly simple. Hacks for NAT traversal commonly have to be employed.
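As an illustration of retrieving identification data from the stream itself, here is a small Python sketch that looks at the first bytes of a stream, checks for the Ogg capture pattern ("OggS"), and, if the first page carries an Opus identification header ("OpusHead", as laid out in RFC 7845), reads the channel count, pre-skip, and input sample rate from it. The function name and the returned dictionary keys are made up for this example. It does not verify the page checksum, and it assumes that the whole first page is available in the buffer.

```python
import struct


def identify_stream(data):
    """Guess what a stream is from its first bytes; return a dict or None."""
    # An Ogg page header is 27 bytes, followed by a segment table.
    if len(data) < 27 or data[:4] != b"OggS":
        return None
    n_segments = data[26]                 # number of entries in the segment table
    payload = data[27 + n_segments:]      # page data starts after the table
    if len(payload) >= 19 and payload[:8] == b"OpusHead":
        # Little-endian: version, channels, pre-skip, input rate, gain, mapping.
        version, channels, pre_skip, rate, gain, mapping = struct.unpack(
            "<BBHIhB", payload[8:19])
        return {
            "container": "ogg",
            "codec": "opus",
            "channels": channels,
            "pre_skip": pre_skip,
            "input_sample_rate": rate,
        }
    return {"container": "ogg", "codec": "unknown"}
```

A receiver that gets such a page first can configure a decoder without any SIP or SDP exchange, while bare RTP packets carry only a numeric payload type, which is why they usually do need the external negotiation.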
But much of that is not easily applicable, given that software choices are often limited by the users' systems, experience, time availability and willingness to experiment and to tinker with it.
In addition to the protocols (and related software) covered here, one should set up a microphone (see computer hardware notes), and noise and/or echo cancellation (see, for instance, the CentOS 7 workstation notes).
As of 2024, perhaps the only FLOSS option for voice conferencing that I have observed working fine in practice (smoothly, even with casual computer users) is Jitsi Meet with web browsers, which uses WebRTC, even though it is awkward to deploy. Mumble looks like it should work, too, but I had no opportunity to try it with others in a non-test setting.
The problem does not look hard, and its parts are somewhat solved, yet actually having a voice conference is still challenging. Perhaps even more so than file transfer between users.