What do you want from the picture in a video call? At a minimum, it should not pixelate, freeze, or bring down the whole call. These are the basics of real-time video, and they are not that hard to achieve.
The fun starts when you want a call to hold as many people as you like, with everyone able to turn on their video rather than just watch; when you want screen sharing in 4K that stays crisp on any internet connection; and when calls must work on all platforms and devices, including over unstable mobile internet.
Read on to learn how we achieve all this in VKontakte calls: which hacks we use in the settings, how we save traffic and CPU, how we fight for latency, and where we had to bypass WebRTC.
Clearly, things can be different on clients. But the delays on our backend are on the order of the range in which desynchronization is undetectable.
That’s why we gave up lipsync. The server transmits audio and video independently, and the client does not synchronize them in any special way: we expect the differences to be insignificant and to go unnoticed by users. If we find that it bothers users, we can turn lipsync on. But so far we are doing fine without it and, most importantly, with less lag.
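The reasoning above can be illustrated with a small check: compare the audio and video path delays and ask whether the difference stays inside a window that viewers do not notice. This is only a sketch under stated assumptions. The window limits below (audio up to 45 ms early or 125 ms late) are assumed values of the kind usually quoted for lip-sync detectability; the article itself gives no numbers, and `PathDelay` and `needs_lipsync` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed detectability window in milliseconds (not taken from the article).
MAX_AUDIO_LEAD_MS = 45.0   # audio arriving earlier than video
MAX_AUDIO_LAG_MS = 125.0   # audio arriving later than video


@dataclass(frozen=True)
class PathDelay:
    """End-to-end delay of each media path, in milliseconds."""
    audio_ms: float
    video_ms: float


def av_skew_ms(delay: PathDelay) -> float:
    """Positive: audio arrives later than video. Negative: audio arrives earlier."""
    return delay.audio_ms - delay.video_ms


def needs_lipsync(delay: PathDelay) -> bool:
    """True when the skew leaves the assumed undetectable window."""
    skew = av_skew_ms(delay)
    return skew > MAX_AUDIO_LAG_MS or skew < -MAX_AUDIO_LEAD_MS


if __name__ == "__main__":
    # Audio is 30 ms ahead of video: inside the window, no sync needed.
    print(needs_lipsync(PathDelay(audio_ms=30, video_ms=60)))
    # Audio is 150 ms behind video: outside the window.
    print(needs_lipsync(PathDelay(audio_ms=200, video_ms=50)))
```

As long as measurements like these stay inside the window, skipping synchronization avoids the extra buffering that aligning the two streams would require.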
Pipeline for working with video
As a result, our video pipeline lets us support 4K, including for screencasting, while saving more than 60% of client-server traffic and 10-20% of client and server CPU. Soon we will be able to remove the limit on the number of participants in a call completely, and all of them will be able to connect with video.
The pipeline includes:
adaptive codec selection depending on network and device: VP9 for high-quality video at low bitrates and H.264 at high bitrates for less load on the device;
SFU topology for group calls with server-side transcoding of video to lower resolutions, which saves client resources and combats the reference frame storm;
quality on demand, that is, encoding outgoing video only in the quality that is actually requested, which also reduces the load on clients and saves traffic;
screen sharing that bypasses WebRTC, which lowers the FPS but keeps the image sharp and supports 4K.
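The first item in the list, adaptive codec selection, could look roughly like the sketch below. It is an illustration, not the production logic: the 1000 kbps cut-over, the `ClientState` fields, and the `choose_codec` function are all hypothetical, and the only rule taken from the text is "VP9 at low bitrates, H.264 at high bitrates or when the device needs a lighter load".

```python
from dataclasses import dataclass

# Hypothetical cut-over point; the article does not give a number.
VP9_MAX_BITRATE_KBPS = 1000


@dataclass(frozen=True)
class ClientState:
    available_bitrate_kbps: int  # current estimate of uplink bandwidth
    supports_vp9: bool           # device can encode VP9 at all
    cpu_constrained: bool        # e.g. an older phone that is already busy


def choose_codec(state: ClientState) -> str:
    """Pick a codec name from network and device conditions."""
    # A device that cannot afford VP9 always falls back to H.264.
    if not state.supports_vp9 or state.cpu_constrained:
        return "H264"
    # On a narrow channel VP9 gives better quality per bit.
    if state.available_bitrate_kbps <= VP9_MAX_BITRATE_KBPS:
        return "VP9"
    # With plenty of bandwidth, prefer the cheaper-to-encode codec.
    return "H264"


if __name__ == "__main__":
    print(choose_codec(ClientState(400, supports_vp9=True, cpu_constrained=False)))
    print(choose_codec(ClientState(3000, supports_vp9=True, cpu_constrained=False)))
```

In a real client the decision would be re-evaluated as the bandwidth estimate changes, ideally with some hysteresis so the codec does not flip back and forth around the threshold.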
The pipeline does not include audio and video synchronization, which allows us to have the lowest latency among popular video conferencing services.
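Quality on demand from the list above can also be sketched in a few lines. The assumption here, which the article does not spell out, is that the sender encodes only at the highest resolution some viewer currently asks for, and the server produces the lower resolutions. The resolution ladder and the `outgoing_height` function are hypothetical.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical ladder of frame heights the sender is able to encode.
LADDER: Sequence[int] = (180, 360, 720, 1080)


def outgoing_height(
    requested_heights: Iterable[int],
    ladder: Sequence[int] = LADDER,
) -> Optional[int]:
    """Return the height to encode at, or None if nobody is watching.

    The result is the smallest ladder step that covers the largest
    request, capped at the top of the ladder.
    """
    wanted = max(requested_heights, default=0)
    if wanted <= 0:
        return None  # no viewers: the sender can stop encoding video
    for height in ladder:
        if height >= wanted:
            return height
    return ladder[-1]


if __name__ == "__main__":
    # Two viewers show the speaker in small tiles: 360p is enough.
    print(outgoing_height([120, 300]))
    # Nobody displays this participant right now.
    print(outgoing_height([]))
```

The saving comes from the common case in large calls: most participants are shown as small tiles or not shown at all, so most senders never need to encode at full resolution.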
In the next article, we will bring together all this knowledge about working with real-time audio and video and build the architecture of a video calling service with no limit on the number of participants.
Acknowledgements: Ivan Grigoriev (@Ivan_A), Andrey Petukhov, Nikita Tkachenko, Andrey Morzhukhin helped to collect data and prepare this article.
P. S. Today is a big day: we are releasing our call SDK to the market. We have packed into it all our technologies for implementing group video calls. Previously, this library could only be embedded into VKontakte by Mail.ru Group services, such as Uchi.ru, Sphereum, Odnoklassniki and Mail.ru Calls. Now that all the features have been tested on these large-scale platforms, we offer them to other IT businesses as well.