Empowering Yandex Cloud Video with AI: Unveiling Our Latest Service Enhancements

The year was 2025. After enduring an unflattering comparison to the human brain, struggling with the XOR problem, languishing in complete obscurity, competing against 120 dog breeds in ImageNet, and navigating memes about Optimus Prime along with assorted revolutions, neural networks came into the world as part of that power which eternally wills evil, as various prompts attest, and yet eternally works good. It's no surprise that nearly every service wanted a piece of that goodness.

Yandex Cloud Video is no exception: it's a platform for managing video content that our Yandex Infrastructure team launched within Yandex Cloud a year ago. As we approached the new release, though, simply bolting neural networks on as an extra feature felt too mundane. We wanted this release to make the service better at understanding human tasks, in every sense. So we:

— Introduced smart features to enhance content usability for users without adding extra work for editors or administrators;
— Factored in information security needs for those hosting videos on the platform;
— Offered monetization options for users who earn from their videos;
— Implemented player customization and various other useful enhancements.

Along the way, we made friends with neural networks, taught the service to monetize through the Yandex Advertising Network and AdFox, paid special attention to client-side latency, and made the video management platform available to all cloud users. I'm Alexey Gusev, and I lead the development team for the Yandex media platform, which includes video platforms, telephony, WebRTC, and CDN. Today, I'll share some of the most interesting developments in the revamped Cloud Video.

Cloud Video is used by many very different teams, from Yandex services such as Kinopoisk and Yandex TV to external users within Yandex Cloud (even Habr users can embed the player in their articles). Since testing began, we've accumulated a wealth of scenarios that helped us polish aspects that weren't always visible from any single vantage point.

Many of the new neural features come from the Yandex Browser team, who previously showed how they learned to generate subtitles for any video and preserve the speaker's tone and intonation during neural translation. But since Cloud Video users are not only viewers but also content editors, our primary focus was on adding summarization and automatic chapter tagging with timestamps.

Here’s a clear example of how this functionality works.

Clickable timestamps can still be set manually, but our goal is to save the editor's time, so the service generates them automatically. By default, the source language is also detected automatically; it can be specified explicitly via the API to improve recognition accuracy.
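For illustration, here's a minimal TypeScript sketch of passing that language hint when registering a video. The endpoint path, field names (`sourceLanguage` in particular), and request shape are assumptions rather than the exact Cloud Video API contract; the service documentation has the real one.

```typescript
// Sketch: hint the speech-recognition language instead of relying on
// auto-detection. Endpoint path and body fields are illustrative assumptions.
const API_BASE = "https://video.api.cloud.yandex.net"; // assumed base URL

async function createVideoWithLanguageHint(
  channelId: string,
  iamToken: string,
): Promise<unknown> {
  const response = await fetch(`${API_BASE}/video/v1/videos`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${iamToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      channelId,
      title: "Live lesson, part 1",
      sourceLanguage: "ru-RU", // hypothetical field: skip auto-detection
    }),
  });
  if (!response.ok) throw new Error(`Cloud Video API error: ${response.status}`);
  return response.json();
}
```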

Now, let's complicate things and have the characters in the video speak several languages at once. We tasked the service with generating subtitles on the fly, translating the video into 11 languages, and overlaying dubbed audio:

To hear the video translated into Russian, select the corresponding audio track in the player settings.

Audio track labels and subtitle names can be configured through the API.
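A sketch of what that might look like follows; the PATCH route and payload shape are again illustrative assumptions, not the exact API.

```typescript
// Sketch: give the generated audio tracks and subtitles human-readable names
// for the player menu. Route and payload shape are illustrative assumptions.
async function labelTracks(videoId: string, iamToken: string): Promise<void> {
  const response = await fetch(
    `https://video.api.cloud.yandex.net/video/v1/videos/${videoId}`,
    {
      method: "PATCH",
      headers: {
        Authorization: `Bearer ${iamToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        audioTracks: [
          { index: 0, label: "Original" },
          { index: 1, label: "Russian (neural dub)" }, // hypothetical fields
        ],
        subtitles: [{ language: "en", label: "English (auto)" }],
      }),
    },
  );
  if (!response.ok) throw new Error(`Cloud Video API error: ${response.status}`);
}
```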

Now, let’s delve into what has changed. In the previous version, our video processing pipeline looked like this:

When a video file is uploaded to the service, it is processed by a transcoder whose job is to prepare the video and audio tracks for subsequent streaming in the video player. To generate automatic subtitles and translations, we needed to integrate calls to Yandex’s neural network models and pass intermediate data to them.

The output from these models is then fed through the transcoder again to prepare it for distribution to users. The result is content with additional audio tracks and subtitles, all synchronized with one another. Since these tracks have the same standing as the original content, we can seamlessly add them to the playlists served by the platform and show them in the player as if the content owner had uploaded them.
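To make that flow concrete, here's a self-contained TypeScript sketch of the two-pass pipeline. Every function below is a stub invented for illustration; only the shape of the data flow mirrors the description above.

```typescript
// A self-contained sketch of the two-pass pipeline. Stage functions are stubs
// standing in for the real transcoder and neural models.

interface Track {
  kind: "video" | "audio" | "subtitles";
  language: string;
  uri: string;
}

const TARGET_LANGUAGES = ["ru", "en", "es"]; // the service supports 11 in total

// Stubs for the real services (invented for illustration).
async function transcode(tracks: Track[]): Promise<Track[]> {
  return tracks; // pretend the tracks are now packaged for streaming
}
async function recognizeSpeech(audio: Track): Promise<string> {
  return `transcript of ${audio.uri}`;
}
async function translateAndDub(transcript: string, lang: string): Promise<Track[]> {
  return [
    { kind: "audio", language: lang, uri: `dub-${lang}.aac` },
    { kind: "subtitles", language: lang, uri: `subs-${lang}.vtt` },
  ];
}

async function processUpload(source: Track[]): Promise<Track[]> {
  // Pass 1: prepare the original video/audio tracks for streaming.
  const original = await transcode(source);
  const audio = original.find((t) => t.kind === "audio");
  if (!audio) throw new Error("no audio track to recognize");

  // Neural stage: recognition, then per-language translation and dubbing.
  const transcript = await recognizeSpeech(audio);
  const generated = (
    await Promise.all(TARGET_LANGUAGES.map((l) => translateAndDub(transcript, l)))
  ).flat();

  // Pass 2: run the generated tracks through the transcoder so they are
  // packaged and time-aligned like the original, then serve them together.
  return [...original, ...(await transcode(generated))];
}
```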

Live broadcasts matter a great deal to our users. Not every broadcast is a world championship, but stream latency is still worth worrying about. Say a Cloud Video client runs a live lesson: a delay of even 15 seconds can turn it into torment. The teacher asks a question, it takes 15 seconds to reach the students, and their answer takes another 15 seconds to come back, so every exchange carries an effective round-trip latency of 30 seconds, double the one-way delay.

To stabilize these metrics at no more than four seconds, we leveraged existing Yandex technologies to optimize content delivery.

Dmitry Kravtsov and I have already discussed low latency streaming for Yandex services during our talk at VideoTech.

What differentiates our implementation from that at Kinopoisk is the greater flexibility offered by Cloud Video. While Yandex services employ a somewhat complex toolkit that necessitates ongoing adjustments on the player side, our cloud service allows users to integrate their own player. Therefore, we essentially have two primary mechanisms to combat delays:

— Regardless of the player, we shorten the transmitted video segments, which caps latency even when it's not our player on the other end.

— If a client uses the Cloud Video player, they gain additional capabilities, such as a mechanism that keeps the viewer at the live edge of the stream with minimal delay. It works like this: if a viewer's internet connection falters during a live broadcast, then once the connection is restored, the player discreetly speeds up playback (say, from 1x to 1.05x). The viewer gradually catches up to the live edge of the stream and falls back in sync with everyone else.
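That catch-up behavior can be sketched with standard browser APIs: measure the distance to the live edge and nudge `playbackRate` while the viewer lags behind. The thresholds below, and the way the live-edge position is obtained, are illustrative assumptions rather than the Cloud Video player's actual tuning.

```typescript
// Sketch: keep the viewer near the live edge by nudging playbackRate.
// Thresholds and the liveEdgeTime source are illustrative assumptions.
const TARGET_LATENCY_S = 4; // desired distance from the live edge
const CATCH_UP_RATE = 1.05; // barely perceptible speed-up

function keepAtLiveEdge(
  video: HTMLVideoElement,
  liveEdgeTime: () => number, // current live-edge position, in media time
): number {
  return window.setInterval(() => {
    const latency = liveEdgeTime() - video.currentTime;
    // Speed up slightly while behind; return to 1x once caught up.
    video.playbackRate = latency > TARGET_LATENCY_S ? CATCH_UP_RATE : 1.0;
  }, 1000);
}
```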

Thanks to these mechanisms, we achieve our target latency metrics across various service usage scenarios.

For content protection, we’ve implemented two mechanisms:

— Generation of short-lived links. Imagine you want to sell access to your videos via a time-limited subscription: users can reach the content only through a signed link with a set expiration time (see the sketch after this list).

— Restricting video embedding to a whitelist of domains. Useful when, for example, it's crucial that the educational videos from your course can't be carried off your website. You define a list of allowed sites in the channel settings, and the Cloud Video backend checks whether the domain of the page embedding the player is on that list. If it isn't, the viewer sees an error message instead of the player.
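To give a feel for how the short-lived links mentioned above typically work, here's a minimal sketch assuming an HMAC-based scheme: the backend signs the video path together with an expiry timestamp, so the service can verify the signature and reject expired links. The parameter names and signing details are assumptions, not the service's actual scheme.

```typescript
// Sketch: an HMAC-signed link that stops working after a set time.
// Query parameter names and signing details are illustrative assumptions.
import { createHmac } from "node:crypto";

function signVideoUrl(baseUrl: string, secret: string, ttlSeconds: number): string {
  const url = new URL(baseUrl);
  const expires = Math.floor(Date.now() / 1000) + ttlSeconds;
  // Sign the path together with the expiry so neither can be tampered with.
  const signature = createHmac("sha256", secret)
    .update(`${url.pathname}:${expires}`)
    .digest("hex");
  url.searchParams.set("expires", String(expires));
  url.searchParams.set("signature", signature);
  return url.toString();
}

// Example: a link for a subscriber that expires in one hour.
const link = signVideoUrl("https://example.com/videos/lesson-1", "server-secret", 3600);
```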

Protection against unauthorized internal access is ensured through Identity and Access Management capabilities, allowing granular rights to be set both for the channel and its analytics.

During testing, one of the most frequent requests we received was the ability to customize the player to match the customer's website design. We've since implemented this, allowing design adjustments through embedding parameters.
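For example, here's a sketch of embedding the player in an iframe with design parameters in the URL. The player URL format and the parameter names (`autoplay`, `color`) are hypothetical placeholders; the documentation lists the actual supported set.

```typescript
// Sketch: embed the player in an iframe with design parameters in the URL.
// The player URL format and parameter names are hypothetical placeholders.
function embedPlayer(container: HTMLElement, videoId: string): void {
  const src = new URL(
    `https://runtime.video.cloud.yandex.net/player/video/${videoId}`, // assumed URL
  );
  src.searchParams.set("autoplay", "0"); // hypothetical parameter
  src.searchParams.set("color", "1f6feb"); // hypothetical accent-color parameter

  const iframe = document.createElement("iframe");
  iframe.src = src.toString();
  iframe.allow = "autoplay; fullscreen";
  iframe.style.border = "none";
  iframe.width = "640";
  iframe.height = "360";
  container.appendChild(iframe);
}
```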

Lastly, a word on monetization. A common Cloud Video scenario is hosting entertainment videos on media sites that earn advertising revenue. Sites with daily traffic above 10,000 visitors can now connect to the Yandex Advertising Network from their personal account and embed ads into video content hosted via Cloud Video. To enable ad display, you register an advertising account, provide your service ID, and pass moderation. The updates are already covered in the documentation.

I hope this overview has been beneficial for everyone working with video. I would love to hear what topics you would like to learn more about in future articles and presentations.