Multimedia Technology Lab, XJTLU (www.mmtlab.com)










Currently, research in the Multimedia Technology Lab focuses on three topics: error-resilient image/video coding, 3D image/video processing and representation, and medical image processing.


Error-Resilient image/video coding

Error-resilient video coding with end-to-end rate-distortion optimized at macroblock level

This research topic focuses on video coding and transmission over unreliable networks, e.g., the Internet and wireless networks, where the delivery of video packets is not guaranteed. Tailoring existing video coding techniques, such as H.264/AVC, to such lossy networks is therefore essential. Our lab has done extensive work on this topic and has published several papers in top journals, such as IEEE T-IP and IEEE T-CSVT.

To give a better picture of our recent research, one of our latest works is presented here. This work is called "Error resilient video coding with end-to-end rate-distortion optimized at macroblock level". Intra macroblock refreshment is an effective approach for error-resilient video coding. In this work, in addition to intra coding, we propose two additional macroblock coding modes to enhance the transmission robustness of the coded bitstream: inter coding with a redundant macroblock and intra coding with a redundant macroblock. The selection of the coding mode and the quantization parameter (QP) for coding the redundant version of the macroblock are determined by rate-distortion optimization; consequently, each macroblock will typically have a different QP for its redundant version. The end-to-end distortion, which takes the channel conditions into account, is employed in the optimization procedure for determining both the mode selection and the redundant QP. The following figure compares the results of this work with two previous approaches, RS-MDC and Optimal Intra. RS-MDC is a redundancy-based coding approach, authored by Tammam Tillo and published in T-CSVT in 2008, while Optimal Intra is a classical error-resilient method that is widely used in video transmission systems.

Reference paper "Jimin Xiao, Tammam Tillo, Chunyu Lin, Yao Zhao, Error resilient video coding with end-to-end rate-distortion optimized at macroblock level, EURASIP Journal on Advances in Signal Processing, 2011:80, doi:10.1186/1687-6180-2011-80"

[Figure: PSNR comparison of the proposed approach with RS-MDC and Optimal Intra]
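As a rough illustration of this kind of macroblock-level decision, the sketch below picks, for each macroblock, the mode minimizing the Lagrangian cost J = D_e2e + λ·R, where the end-to-end distortion weighs the encoder distortion against the concealment distortion by the loss probability. The mode list, distortion and rate numbers, and λ are made-up stand-ins, not the paper's actual model.

```python
# Sketch of per-macroblock mode selection by end-to-end
# rate-distortion optimization. All numbers are illustrative.

def expected_distortion(d_enc, d_loss, p):
    """End-to-end distortion: encoder distortion if the packet
    arrives, concealment distortion if it is lost."""
    return (1 - p) * d_enc + p * d_loss

def select_mode(candidates, p, lmbda):
    """Pick the mode minimizing J = D_e2e + lambda * R.
    candidates: list of (name, d_enc, d_loss, rate)."""
    best = None
    for name, d_enc, d_loss, rate in candidates:
        j = expected_distortion(d_enc, d_loss, p) + lmbda * rate
        if best is None or j < best[1]:
            best = (name, j)
    return best[0]

# Hypothetical numbers for one macroblock:
modes = [
    ("inter",           20.0, 400.0, 100),  # cheap, fragile
    ("intra",           18.0, 150.0, 300),  # robust, costly
    ("inter+redundant", 20.0, 120.0, 180),  # redundant copy limits loss impact
    ("intra+redundant", 18.0,  60.0, 380),
]
print(select_mode(modes, p=0.10, lmbda=0.5))  # light loss
print(select_mode(modes, p=0.30, lmbda=0.5))  # heavy loss
```

With these toy numbers, the redundant-macroblock mode starts to win only once the loss rate is high enough to justify its extra rate.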


Dynamic Sub-GOP Forward Error Correction Code for Real-time Video Applications

Reed-Solomon erasure codes are commonly studied as a method to protect video streams transmitted over unreliable networks. As Reed-Solomon is a block-based error-correcting code, enlarging the block size can, on one hand, enhance its performance; on the other hand, a large block size leads to a long delay, which is not tolerable for real-time video applications.

In this paper, a Dynamic Sub-GOP FEC Coding (DSGF) approach is proposed, in which a systematic Reed-Solomon erasure code protects the video packets in real-time mode while providing an error-free version of the reference frame to stop error propagation.

In order to enlarge the RS coding block size, all frames in one Sub-GOP are used as one RS coding block. The length of the Sub-GOP is dynamically tuned according to the Sub-GOP position, the packet loss probability, and other encoding parameters, so as to minimize the expected total distortion of the GOP.

On the encoder side, since the RS code is systematic, the data is left unchanged and the parity packets are simply appended; therefore, there is no encoding delay. Meanwhile, at the receiver end, to decode and display one frame of the Sub-GOP, the video decoder only needs the packets belonging to that frame.

If some packets of this frame are lost during transmission, error concealment is applied to conceal the lost packets. In this manner, the decoder does not need to wait for all the packets belonging to the Sub-GOP, so there is no delay on the decoder side either. Later, when the transmission of all packets of the Sub-GOP has finished, the systematic RS decoder tries to recover the lost packets. If enough packets have been received, the RS decoder is able to recover all the lost packets of the Sub-GOP, and the video decoder re-decodes the Sub-GOP with all the received and recovered packets, updating the reference frame so that the concealment distortion does not propagate to later frames.
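The no-encoding-delay property of a systematic code can be illustrated with a heavily simplified stand-in: a single XOR parity packet (a systematic code that can repair one loss) instead of the Reed-Solomon code over GF(256) used in the actual scheme. The data packets go out untouched; only the parity is computed and appended.

```python
# Minimal illustration of why a *systematic* erasure code adds no
# encoding delay: data packets are emitted unchanged and parity is
# appended afterwards. One XOR parity packet stands in for the
# Reed-Solomon parity of the paper (it can repair one lost packet).

def encode_systematic(data_packets):
    """Return the data packets unchanged plus one XOR parity packet."""
    parity = bytes(len(data_packets[0]))
    for pkt in data_packets:
        parity = bytes(a ^ b for a, b in zip(parity, pkt))
    return data_packets + [parity]

def recover_single_loss(received, lost_index):
    """Rebuild one lost data packet from the survivors and the parity."""
    rebuilt = bytes(len(received[0]))
    for i, pkt in enumerate(received):
        if i != lost_index and pkt is not None:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, pkt))
    return rebuilt

data = [b"pkt0", b"pkt1", b"pkt2"]
block = encode_systematic(data)
block[1] = None                        # simulate losing packet 1
print(recover_single_loss(block, 1))   # the lost packet is rebuilt
```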

RS allocation example with the greedy algorithm; packet loss rate p = 5%; one GOP has 30 P-frames; RS parity packet rate µ = 20%; each frame includes S slices; (a) S = 5, (b) S = 10.


The figure shows two practical examples of how to divide the P-frames into Sub-GOPs and allocate the RS parity packets among all the Sub-GOPs with the greedy algorithm. These results have been obtained by assuming that one GOP has 30 P-frames, each P-frame includes 5 or 10 slices, the value of α is 0.95, the packet loss rate is 5% with an i.i.d. model, and the parity packet rate is 20%. It is interesting to observe some regular patterns behind the allocations.

Firstly, in general, the P-frames at the beginning of the GOP have more RS parity packets than those at the end of the GOP. In figure (a), the first 2 Sub-GOPs have 4 RS parity packets each, the subsequent Sub-GOP has 3 RS parity packets, and so on, whereas no RS parity packets are allocated to the last frame of the GOP. This is because any distortion in the front frames will propagate to the following frames, so losing one packet in a front frame usually leads to more distortion for the whole GOP than losing one at the end. Therefore, it is reasonable to allocate more RS parity packets to the frames at the beginning of the GOP.

Secondly, it is important to note that at the beginning of the GOP, one Sub-GOP usually contains more frames than a Sub-GOP at the end of the GOP. In figure (a), the first 8 Sub-GOPs include 3 frames, the 9th Sub-GOP contains 2 frames, while the 10th and 11th Sub-GOPs contain only one frame. This is also because the distortion propagation paths of the frames at the beginning of a GOP are long.

Putting more frames into one Sub-GOP makes the value of K large, which means that the RS code can recover the lost packets with higher probability and thus effectively cut down error propagation. Thirdly, comparing the results in figure (a) with figure (b), the average Sub-GOP length in figure (a) is larger than that in figure (b). This is because the number of slices in each frame, S, is larger in figure (b), so there is no need to put as many frames into one Sub-GOP as in figure (a).
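The greedy allocation idea behind these patterns can be sketched as follows: each parity packet in the budget is granted to whichever position currently yields the largest drop in expected distortion. The gain model below is a made-up stand-in for the paper's expected-distortion computation, chosen only so that earlier frames matter more and extra parity has diminishing returns.

```python
# Sketch of a greedy parity-packet allocator. The toy_gain() model
# is an illustrative assumption, not the paper's distortion model.

def greedy_allocate(num_groups, budget, gain):
    """gain(g, r) -> distortion reduction of giving group g
    its (r+1)-th parity packet; allocate greedily."""
    alloc = [0] * num_groups
    for _ in range(budget):
        best = max(range(num_groups), key=lambda g: gain(g, alloc[g]))
        alloc[best] += 1
    return alloc

# Toy model: earlier groups matter more (errors propagate forward),
# and each extra parity packet has diminishing returns.
def toy_gain(g, r):
    return (10 - g) / (1 + r)

print(greedy_allocate(num_groups=10, budget=6, gain=toy_gain))
```

Even with this toy model, the front positions receive more parity packets than the tail, mirroring the pattern observed in figure (a).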

Video quality versus frame number in one GOP with length 30; packet loss rate is 5%, parity packet rate µ is 20%; (a) CIF Foreman sequence, QP = 26; bitrate for the proposed approach and the Evenly FEC approach is 707.9 Kbps, for RS-MDC 746.5 Kbps; (b) CIF Stefan sequence, QP = 32; bitrate for the proposed approach and the Evenly FEC approach is 845.4 Kbps, for RS-MDC 870.5 Kbps.



In this figure, the PSNR of each frame in one GOP is plotted for the Foreman and Stefan sequences. Within each Sub-GOP, the video quality gradually degrades frame by frame because of the random packet loss. However, at the end of each Sub-GOP, with high probability the RS parity packets are able to recover all the lost packets of the Sub-GOP, so the PSNR of the last frame of each Sub-GOP is higher than that of the other frames in the Sub-GOP. These factors make the frame PSNR fluctuate with a period equal to the Sub-GOP length.

Nevertheless, for the majority of the frames in one GOP, the PSNR of the proposed approach is higher than that of the Evenly FEC approach and RS-MDC. In fact, among the 30 frames, only 6 and 3 frames have lower PSNR than the Evenly FEC approach for the Foreman and Stefan sequences, respectively, whereas almost all frames have better video quality than RS-MDC, even though RS-MDC uses some extra bitrate. It is worth noticing that for some video frames, the PSNR of the proposed approach is more than 3 dB higher than that of the Evenly FEC approach, and for the second half of the GOP, our approach outperforms the Evenly FEC approach and RS-MDC significantly. Note that for the first frame of the GOP, which is an I-frame, the video quality of the proposed approach and the Evenly FEC approach is the same, and more than 0.8 dB better than RS-MDC. This is because the number of slices in the I-frame is large, which makes the RS code efficient, thereby providing higher PSNR than RS-MDC. Similar results have been obtained for the Bus sequence. With the proposed approach, although the video quality rises and falls, this does not lead to inferior visual perception.


Reference paper "Jimin Xiao, Tammam Tillo, Chunyu Lin, Yao Zhao, Dynamic Sub-GOP Forward Error Correction Code for Real-time Video Applications, to appear in IEEE Transactions on Multimedia".


Real-Time Video Streaming Using Randomized Expanding Reed-Solomon Code

Forward error correction (FEC) codes are widely studied to protect streamed video over unreliable networks. Typically, enlarging the FEC coding block size improves the error-correction performance. For video streaming applications, this can be implemented by grouping more than one video frame into one FEC coding block; however, doing so introduces decoding delay, which is not tolerable for real-time video streaming applications. In this paper, to solve this dilemma, a real-time video streaming scheme using a randomized expanding Reed-Solomon code is proposed. In this scheme, the Reed-Solomon coding block includes not only the video packets of the current frame, but may also include all the video packets of the previous frames in the current group of pictures. At the decoding side, the parity-check equations of the current frame are jointly solved with all the parity-check equations of the previous frames. Since the video packets of the following frames are not encompassed in the RS coding block, no delay is caused by waiting for the video or parity packets of the following frames at either the encoding or the decoding side. Experimental results show that the proposed scheme significantly outperforms other real-time error-resilient video streaming approaches; specifically, for the Foreman sequence, the proposed scheme provides a 1.5 dB average gain over the state-of-the-art approach for a 10% i.i.d. packet loss rate, whereas for the burst-loss case the average gain is more than 3 dB.

The key novelty of this work is that the parity-check equations of the current frame are jointly solved with all the parity-check equations of the previous frames. To ensure that these equations are linearly independent, a randomization process is applied before the normal RS encoding. It was demonstrated in the paper that the randomization process works well.
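A toy demonstration of this joint-decoding idea: the lost packets are treated as unknowns, and the accumulated parity-check equations are solved together by Gaussian elimination. Here the symbols live in the prime field GF(257) as a convenient stand-in for the GF(256) arithmetic of a real Reed-Solomon code, and random coefficients play the role of the paper's randomization step (making the equations linearly independent with high probability).

```python
# Toy joint solving of parity-check equations over GF(257).
# Random coefficients stand in for the randomization process;
# this is an illustration, not the paper's RE-RS construction.
import random

P = 257  # prime field stand-in for GF(256)

def solve_mod_p(A, b):
    """Gaussian elimination of A x = b over GF(P)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % P)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], P - 2, P)          # modular inverse
        M[col] = [v * inv % P for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % P for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]

def random_invertible(n):
    """Draw random coefficient matrices until one is invertible
    (singular draws are rare, mimicking the randomization step)."""
    while True:
        A = [[random.randrange(P) for _ in range(n)] for _ in range(n)]
        try:
            solve_mod_p(A, [0] * n)   # raises StopIteration if singular
            return A
        except StopIteration:
            continue

random.seed(1)
lost = [42, 7, 200]                   # three lost packet symbols
A = random_invertible(len(lost))      # randomized parity coefficients
b = [sum(c * x for c, x in zip(row, lost)) % P for row in A]
print(solve_mod_p(A, b))              # the lost symbols are recovered
```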


(a) Evenly FEC

(b) Sub-GOP based FEC

(c) proposed RE-RS scheme


Examples of different FEC schemes, where each frame has 4 video packets and the redundant packet rate is 0.5: (a) Evenly FEC (allocating parity packets evenly among frames); (b) Sub-GOP based FEC (our previous work published in TMM); (c) the proposed RE-RS scheme.


Video quality comparison for the proposed RE-RS scheme; the average packet loss rate is 10% with an average burst length of 2, and the redundancy is 20%. The left image is the original video frame (error free), the middle one is the Evenly FEC approach, and the right one is the proposed RE-RS scheme. The whole video sequence is available for download at:
Chinese web: http://pan.baidu.com/s/18AzfE
English web: https://www.dropbox.com/s/ozo3e8usbqid2gb/foreman_compare.rar
Please note that the resolution of the YUV file is 1056×288.


Reference paper "Jimin Xiao, Tammam Tillo, Yao Zhao, Real-Time Video Streaming Using Randomized Expanding Reed-Solomon Code, accepted in IEEE Transactions on Circuits and Systems for Video Technology". The accepted version of this work and the MATLAB code are available on the download page.



3D image/video processing and representation

Current 3-D video (3DV) technology is based on stereo systems. These systems use stereo video coding for the pictures delivered by two input cameras. Typically, such stereo systems only reproduce the two camera views at the receiver, and stereoscopic displays for multiple viewers require wearing special 3-D glasses. On the other hand, emerging autostereoscopic multiview displays emit a large number of views to enable 3-D viewing for multiple users without requiring 3-D glasses. For representing a large number of views, a multiview extension of stereo video coding is used, as shown in the following figure, typically requiring a bit rate proportional to the number of views.

[Figure: multiview extension of stereo video coding]

However, since the quality improvement of multiview displays will be governed by an increase in the number of emitted views, a format is needed that allows the generation of an arbitrary number of views at a constant transmission bit rate. Such a format is the combination of video signals and associated depth maps. The depth maps provide a disparity for every sample of the video signal, which can be used to render an arbitrary number of additional views via view synthesis. For illustration, the following figure shows two video frames and their associated depth maps.

[Figure: two video frames and their associated depth maps]
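The depth-to-disparity warping behind view synthesis can be sketched in one dimension: each pixel is shifted horizontally by a disparity derived from its depth via the usual pinhole-camera relation disparity = focal × baseline / depth, a z-buffer lets nearer samples win where shifts collide, and `None` marks disocclusion holes. The row, depth values, and camera parameters below are illustrative.

```python
# Sketch of depth-image-based rendering on a single image row.
# All numbers are illustrative assumptions.

def synthesize_view(row, depth, focal, baseline):
    """Warp one image row into a virtual view; None marks holes
    (disoccluded positions with no source sample)."""
    out = [None] * len(row)
    zbuf = [float("inf")] * len(row)
    for x, (v, z) in enumerate(zip(row, depth)):
        xt = x + round(focal * baseline / z)   # disparity shift
        if 0 <= xt < len(out) and z < zbuf[xt]:  # nearer sample wins
            out[xt] = v
            zbuf[xt] = z
    return out

row   = [10, 20, 30, 40, 50]
depth = [100, 100, 50, 100, 100]   # pixel 2 is closer -> larger shift
print(synthesize_view(row, depth, focal=100, baseline=1))
```

The holes left in the output are exactly the disocclusions that view-synthesis algorithms must subsequently fill by inpainting.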


Virtual View Assisted Video Super-Resolution and Enhancement

3D multiview video provides users with an experience different from traditional video; however, it places a huge burden on limited bandwidth resources. Mixed-resolution multiview video can alleviate this problem by using different resolutions for different views. However, in order to reduce visual discomfort and to make this video format more suitable for free-viewpoint television, the low-resolution views need to be super-resolved to the target full resolution. In this paper, we propose a virtual-view-assisted super-resolution algorithm, where the inter-view similarity is used to determine whether each missing pixel in the super-resolved frame is filled by a virtual-view pixel or by a spatially interpolated pixel. The decision mechanism is steered by the texture characteristics of the neighbors of each missing pixel. Furthermore, the inter-view similarity is used, on one hand, to enhance the quality of the virtual-view-copied pixels by compensating for the luminance difference between views, and on the other hand, to enhance the original low-resolution pixels in the super-resolved frame by reducing their compression distortion. Thus, the proposed method can recover the details in regions with edges while maintaining good quality in smooth areas by properly exploiting the high-quality virtual-view pixels and the directional correlation of pixels. The experimental results demonstrate the effectiveness of the proposed approach, with a PSNR gain of up to 3.85 dB.
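A toy version of this per-pixel decision could look like the following, where a simple neighbor-difference test stands in for the paper's texture analysis, the threshold is an arbitrary assumption, and a 1-D row upscaled by 2× stands in for a full frame.

```python
# Sketch of the virtual-view-vs-interpolation decision for
# super-resolution. The texture test and threshold are simplified
# assumptions, not the paper's actual rule.

def upsample_row(low, virtual, threshold=15):
    """Upscale a 1-D row by 2x; odd positions are missing and are
    filled from `virtual` near edges, by interpolation elsewhere."""
    out = [0] * (2 * len(low))
    out[::2] = low                        # known low-resolution pixels
    for i in range(1, len(out) - 1, 2):
        left, right = out[i - 1], out[i + 1]
        if abs(left - right) > threshold:   # textured/edge region:
            out[i] = virtual[i]             #   copy warped virtual view
        else:                               # smooth region:
            out[i] = (left + right) // 2    #   spatial interpolation
    out[-1] = out[-2]                       # replicate the border pixel
    return out

low     = [100, 102, 180, 182]          # an edge between 102 and 180
virtual = [99, 101, 101, 150, 181, 181, 183, 183]
print(upsample_row(low, virtual))
```

Only the missing pixel sitting on the edge is taken from the virtual view; the smooth positions are simply interpolated, matching the intuition that virtual-view pixels pay off where interpolation would blur detail.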

The framework of the proposed SR method.



Reference paper "Zhi Jin, Tammam Tillo, Chao Yao, Jimin Xiao, Yao Zhao, Virtual View Assisted Video Super-Resolution and Enhancement, accepted in IEEE Transactions on Circuits and Systems for Video Technology". The accepted version of this work and the .exe file are available on the download page.


ScrewTurn Wiki version 3.0.4.560. Some of the icons created by FamFamFam. 苏ICP备11062770号