Video-on-demand (VoD) streaming has benefited from Content-Adaptive Encoding (CAE), i.e., the adaptation of resolution and/or quantization parameters for each scene based on convex hull optimization. Unlike traditional VoD services, where buffering can ensure smooth playback with high coding efficiency, interactive game streaming (IGS) requires real-time encoding and delivery to support the interaction between the user's controls and the cloud server running the game.
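To make the convex hull idea concrete, here is a minimal sketch of per-scene resolution selection: each candidate resolution is probe-encoded at several bitrates, and for a given target bitrate the resolution lying on the upper envelope (convex hull) of the rate-quality curves is chosen. The (kbps, VMAF) numbers below are illustrative, not taken from the paper.

```python
# Illustrative rate-quality points per resolution: (bitrate kbps, VMAF).
rd_points = {
    1080: [(2000, 72.0), (4000, 84.0), (8000, 93.0)],
    720:  [(2000, 76.0), (4000, 85.0), (8000, 90.0)],
    540:  [(2000, 78.0), (4000, 83.0), (8000, 86.0)],
}

def quality_at(points, bitrate):
    """Linearly interpolate quality at the target bitrate."""
    points = sorted(points)
    for (b0, q0), (b1, q1) in zip(points, points[1:]):
        if b0 <= bitrate <= b1:
            t = (bitrate - b0) / (b1 - b0)
            return q0 + t * (q1 - q0)
    # Clamp outside the probed range.
    return points[0][1] if bitrate < points[0][0] else points[-1][1]

def best_resolution(rd_points, bitrate):
    """Pick the resolution on the upper hull at this bitrate."""
    return max(rd_points, key=lambda r: quality_at(rd_points[r], bitrate))

print(best_resolution(rd_points, 2000))  # low rates favour downscaling: 540
print(best_resolution(rd_points, 8000))  # high rates favour native: 1080
```

In VoD this selection can be done offline per scene; the point of CAE-IGS is to predict the same choice in real time, without probe encodes.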

We propose CAE-IGS, the first CAE approach for resolution adaptation in IGS, based on compact encoding metadata from past frames. Specifically, we train a convolutional neural network (CNN) to infer the best resolution, from the options available for the upcoming scene, based on a running window of aggregated coding block statistics from the current scene. Our proposal makes the following contributions:

  • Lightweight model that runs on a CPU within 1 ms: the CNN is trained to infer the best resolution for the upcoming scene based on the optimal resolution per bitrate (selected offline from the available ones). Inference takes under 1 ms on a single CPU core, so the model adds no latency overhead.
  • Utilizing past-scene encoding statistics: instead of using video frames as input (which is computationally infeasible in IGS), the proposed CNN infers the resolution to use for the next scenes by ingesting block-line-aggregated statistics, i.e., coding tree block (CTB) stats for High Efficiency Video Coding (HEVC), which the encoder produces for past frames without incurring additional overhead.
  • Preserving temporal smoothness: resolution adaptation takes place at scene changes, where an IDR (instantaneous decoder refresh) frame is de facto used, thereby avoiding any extra IDRs that could cause adverse effects in Rate-Quality (RQ) control (such as frame drops or re-buffering).
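The second contribution above can be sketched as follows: each encoded frame yields per-CTB side information, which is aggregated along CTB rows and kept in a running window over past frames, forming a compact tensor for the CNN. The specific statistics (bits, QP), grid size, and window length here are hypothetical placeholders, not values from the paper.

```python
import numpy as np
from collections import deque

# Hypothetical geometry: 1080p with 64x64 CTBs -> 17 rows x 30 columns;
# window length of 16 past frames is likewise illustrative.
ROWS, COLS, WINDOW = 17, 30, 16

window = deque(maxlen=WINDOW)  # running window over past frames

def push_frame(ctb_bits, ctb_qp):
    """Aggregate per-CTB stats along each CTB row, append to the window."""
    row_feats = np.stack([
        ctb_bits.sum(axis=1),   # bits spent per CTB row
        ctb_qp.mean(axis=1),    # average QP per CTB row
    ], axis=-1)                 # shape: (ROWS, 2)
    window.append(row_feats)

def cnn_input():
    """Stack the window into a (WINDOW, ROWS, 2) tensor; zero-pad if short."""
    frames = list(window)
    pad = [np.zeros((ROWS, 2))] * (WINDOW - len(frames))
    return np.stack(pad + frames)

rng = np.random.default_rng(0)
for _ in range(20):  # simulate 20 encoded frames; oldest 4 roll off
    push_frame(rng.integers(0, 500, (ROWS, COLS)),
               rng.uniform(20.0, 40.0, (ROWS, COLS)))
print(cnn_input().shape)  # (16, 17, 2)
```

Because the encoder already computes these statistics as a by-product of encoding, building this input costs only the row-wise reductions, which is what keeps the overall pipeline within the sub-millisecond budget.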

CAE-IGS is evaluated with challenging top-tier 1080p-60fps gaming content and HEVC encoding. Our test conditions incorporate the occurrence of frame drops due to the strict congestion control of ultra-low-latency game streaming. This scenario corresponds to IGS for portable devices, such as the Sony PlayStation Portal Remote Player, which experience a range of RQ conditions as users move between environments.

Under the same bitrate targets, the results demonstrate a 2.3-point improvement in VMAF visual quality versus the static-ladder method, without degrading (and even slightly improving) frame-drop statistics. The improvement is concentrated in the most active bitrate zones. Finally, we note that this methodology can be customised to different streaming use cases and generalises to any hybrid video codec with small adjustments.

For more details on the methodology and results, check out the paper presented at PCS2025: https://www.arxiv.org/abs/2511.22327