The subjective quality of video has been studied for decades in both academia and industry. Key players such as video-on-demand providers and television broadcasters devote considerable effort to researching how to provide the best possible experience for their users while optimizing the use of physical resources. To this end, international expert forums such as the Video Quality Experts Group have been created to discuss and analyze objective metrics that provide accurate, versatile, and simple ways to estimate how users perceive certain video impairments. However, the problem is far from solved.
Video delivered over the Internet was estimated at 120 exabytes in 2019, which represented more than 75% of total IP traffic. This percentage is expected to increase in the coming years. Image quality is also increasing (due, among other causes, to the appearance of televisions with very advanced features), which raises the cost of transporting video over the network. High-efficiency compression schemes such as H.264/AVC or H.265/HEVC can help lower the cost of transporting these signals. This reduction, however, should not degrade the quality of experience (QoE) of users when they watch a video.
The cost of video distribution is borne by three parties: the Internet provider bears the direct cost of transporting the data over the network, which raises expenses for the service provider, which in turn passes them on to end users. Moreover, the higher the quality demanded, the more expensive the delivery. However, it is not always necessary to increase the objective quality, since perceived quality can reach a saturation point. In other words, beyond a certain point, no matter how much the bit rate or video resolution is increased, the human eye has reached the limit of what it can resolve. The point at which this limit is reached depends on multiple factors, from the user's particular physiology and the viewing distance from the screen to the complexity of the image or its motion. Since some of these parameters are controllable (such as the number of bits used in the video encoding), it is worth analyzing these characteristics to provide optimal video quality in any context. Applying the solution proposed in Mateo Cámara's Final Master's Thesis (TFM) as a preliminary step makes it possible to tune the objective video parameters so that only the quality actually needed is transmitted. Costs fall immediately, which contributes one more step towards a responsible interconnected society and reduces the carbon footprint of the electricity consumed in transmission.
Let's take an example. A user has a 100 Mbps optical fiber connection and wants to watch videos at maximum quality. We, as a service provider, must determine which bit rate meets the user's maximum quality standards. Without making any effort, we could send the video encoded at the highest rate the network allows, 100 Mbps, and we would certainly have overestimated the bit rate. At what bit rate should we transcode, then? Let's look at the content being requested. If it is a very simple image, such as a cartoon, it will most likely require far fewer bits than a very detailed documentary. Someone could argue that encoders already take care of this, and they would be right, but there is still room to save bit rate by attending to the subjective perception of users (as demonstrated in a related investigation). In doing so, we would have made the most of the video's capabilities and achieved an intelligent reduction in costs.
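The reasoning in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the saturating quality curve, the `complexity_kbps` values for each type of content, the bit-rate ladder, and the function names are all hypothetical stand-ins, not the model or figures from the TFM. The idea is to pick the lowest bit rate whose predicted quality is within a small tolerance of the best quality available on the ladder.

```python
import math
from typing import Callable, Sequence


def toy_quality(bitrate_kbps: float, complexity_kbps: float) -> float:
    """Hypothetical saturating quality curve on a 1-5 scale.

    Quality approaches 5 as the bit rate grows; complex content
    (larger complexity_kbps) needs more bits to get close to it.
    """
    return 1.0 + 4.0 * (1.0 - math.exp(-bitrate_kbps / complexity_kbps))


def lowest_saturating_bitrate(
    ladder_kbps: Sequence[float],
    predict_quality: Callable[[float], float],
    tolerance: float = 0.1,
) -> float:
    """Return the lowest bit rate whose predicted quality is within
    `tolerance` of the best predicted quality on the ladder."""
    if not ladder_kbps:
        raise ValueError("the bit-rate ladder must not be empty")
    scores = {rate: predict_quality(rate) for rate in ladder_kbps}
    best = max(scores.values())
    return min(rate for rate, score in scores.items() if score >= best - tolerance)


# Hypothetical ladder, capped well below the 100 Mbps the user's line allows.
LADDER_KBPS = [500, 1000, 2000, 4000, 8000, 16000, 32000]

# Assumed complexity values: a simple cartoon versus a detailed documentary.
cartoon_rate = lowest_saturating_bitrate(LADDER_KBPS, lambda r: toy_quality(r, 800))
documentary_rate = lowest_saturating_bitrate(LADDER_KBPS, lambda r: toy_quality(r, 4000))

print(cartoon_rate, documentary_rate)
```

In practice, `predict_quality` would be replaced by a model trained on subjective scores; the selection step itself stays the same.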
In the TFM, neural networks trained on subjective video quality databases were used. In particular, three families of architectures were developed: two- and three-dimensional convolutional networks (in which filtering is performed at the image or video level, respectively), recurrent networks (which incorporate the time variable), and sequential networks based on the prior extraction of characteristic parameters of the video.
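To give a sense of what extracting a characteristic parameter of a video can look like, the following sketch computes a temporal activity measure in the spirit of the temporal perceptual information (TI) defined in ITU-T Recommendation P.910: the maximum over time of the spatial standard deviation of the difference between consecutive frames. The source does not state which features were used in the TFM, so this choice is an assumption made purely for illustration, as are the function name and the tiny hand-made frames.

```python
from statistics import pstdev
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # rows of luminance values


def temporal_information(frames: Sequence[Frame]) -> float:
    """Maximum over time of the spatial standard deviation of the
    pixel-wise difference between consecutive frames.

    Assumes all frames have the same size and uses the population
    standard deviation over all pixels of each difference image.
    """
    if len(frames) < 2:
        raise ValueError("at least two frames are required")
    per_step: List[float] = []
    for previous, current in zip(frames, frames[1:]):
        differences = [
            cur_px - prev_px
            for prev_row, cur_row in zip(previous, current)
            for prev_px, cur_px in zip(prev_row, cur_row)
        ]
        per_step.append(pstdev(differences))
    return max(per_step)


# Two illustrative 2x2 clips: one static, one with changing pixels.
static_clip = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
moving_clip = [[[0, 0], [0, 0]], [[10, 0], [0, 10]]]

ti_static = temporal_information(static_clip)
ti_moving = temporal_information(moving_clip)
print(ti_static, ti_moving)
```

A scalar such as this, computed per clip alongside other descriptors, is the kind of input a feature-based network could consume instead of raw pixels.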
Mateo Cámara's TFM was awarded the ERICSSON Prize for the best final thesis in Innovation for a Responsible Connected Society by the Official Association of Telecommunication Engineers (COIT) and the Spanish Association of Telecommunication Engineers (AEIT).