Digital 3D Video
Chapter 11: Advanced Video Coding: H.264/AVC/SVC/MVC
Jar-Ferr Yang (楊家輝)
Institute of Computer and Communication Engineering, Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan

ITU-T Video Coding Standards
• H.261: for ISDN, p x 64 kbps
• H.263: for PSTN (very low bit-rate video, < 64 kbps)
  – Four optional modes
• H.263 Version 2 (H.263+): an extension of H.263
  – 12 more negotiable modes
  – Scalable bit streams
  – Enhanced performance over packet-switched networks
  – Support for custom picture sizes and clock frequencies
  – Supplemental display and external usage capabilities
• H.263 Version 3 (H.263++): almost done, backward compatible with H.263+
• H.26L: not backward compatible with H.263+
• H.264/AVC: Joint Video Team (JVT)

Comparison to MPEG-2, H.263, MPEG-4
[Figure: rate-distortion curves for the Tempete sequence (CIF, 30 Hz). Vertical axis: quality, Y-PSNR [dB], 25 to 38; horizontal axis: bit-rate [kbit/s], 0 to 3500. Curves: JVT/H.264/AVC, MPEG-4, MPEG-2, H.263.]

Profiles & Levels Concepts
• Many standards contain different configurations of capabilities, often based on "profiles" and "levels"
  – A profile defines a set of algorithmic features
  – A level defines a degree of capability (e.g., pixel resolution, decoding speed, number of macroblocks per second)
• H.264/AVC currently specifies three profiles
  – Baseline (good for applications up through D-Cinema)
  – Main (adds interlace, B-slices, and CABAC efficiency gains)
  – Profile X (the so-called streaming profile)
• H.264/AVC defines many (11) levels
  – Built to match popular international production and emission formats
  – From QCIF to D-Cinema

Baseline Profile in H.264/AVC
• I and P picture types
• In-loop deblocking filter [with dithering removed]
• Progressive-scan pictures
• Interlaced-scan pictures (only for decoders supporting Level 2.1 and above): frame/field adaptive at picture level only (GI)
• 1/4-sample motion compensation
• Tree-structured motion segmentation down to 4x4 blocks
• VLC-based entropy coding (Real Networks)
• Arbitrary Slice Order (ASO)
• Flexible Macroblock Ordering (FMO) (max 8 slice groups)
• Chrominance format 4:2:0
• Redundant slices (D101) (Nokia)

Main Profile in H.264/AVC
• All features included in the Baseline Profile, except Arbitrary Slice Order (ASO), Flexible Macroblock Ordering (FMO), and redundant slices
• B pictures
• CABAC
• Weighted prediction
• Alternate scan
• Adaptive block-size transforms (ABT)
• Adaptive Bi-Prediction (ABP) (JVT-D122)
• Interlaced pictures (only for decoders supporting Level 2.1 and above): frame/field adaptive at picture level and macroblock level

Profile X of H.264/AVC
• All features included in the Baseline Profile
• B pictures
• SP and SI pictures
• Data partitioning
• Adaptive Bi-Prediction (ABP)
All video decoders supporting the X profile shall also support the Baseline profile. The level number corresponding to the Baseline profile will not be less than the level number supported for the X profile.

H.264/AVC Profile Relationship
[Figure: diagram relating the feature sets of the Baseline, Main, and X profiles. Features shown: I and P picture types, in-loop deblocking filter, frame pictures, 1/4-sample MC, VLC-based entropy coding, tree-structured motion segmentation down to 4x4 block size, 4:2:0, bi-predictive slices, weighted prediction, interlace, CABAC, ABT, ABP, ASO, FMO, redundant slices, SP and SI pictures, data partitioning.]
Abbreviations: CABAC = Context-based Adaptive Binary Arithmetic Coding; ABT = Adaptive Block-size Transforms; ABP = Adaptive Bi-Prediction; ASO = Arbitrary Slice Order; FMO = Flexible Macroblock Ordering; SI = Switching Intra picture; SP = Switching Prediction picture.

Levels in H.264/AVC
Level | Max sample processing rate (MB/s) | Max frame size MaxFS (MBs) | Decoded picture buffer memory MaxKB (1024 bytes) | Max video bitrate (1000 bits/s); Max CPB size (1000 bits) | Horizontal MV range (full samples) | Vertical MV range (full samples) | Minimum luma bi-predictive block size
1 | 1 485 | 99 | 148.5 | 64; 175 | [-2048, 2047.75] | [-64, +63.75] | 8x8
1.1 | 2 970 | 396 | 891.0 | 128; 325 | [-2048, 2047.75] | [-128, +127.75] | 8x8
1.2 | 5 940 | 396 | 891.0 | 768; 2 000 | [-2048, 2047.75] | [-128, +127.75] | 8x8
2 | 11 880 | 396 | 891.0 | 2 000 | [-2048, 2047.75] | [-128, +127.75] | 8x8
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | [-2048, 2047.75] | [-256, +255.75] | 8x8
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | [-2048, 2047.75] | [-256, +255.75] | 8x8
3 | 40 500 | 1 620 | 3 037.5 | 8 000 | [-2048, 2047.75] | [-256, +255.75] | 8x8
3.1 | 108 000 | 3 600 | 6 750.0 | 20 000; 14 000 | [-2048, 2047.75] | [-512, +511.75] | 8x8
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | [-2048, 2047.75] | [-512, +511.75] | 8x8
4 | 245 760 | 8 192 | 12 288.0 | 20 000; 25 000 | [-2048, 2047.75] | [-512, +511.75] | 8x8
5 | 1 786 432 | 32 768 | 61 440.0 | 150 000; 450 000 | [-2048, 2047.75] | [-512, +511.75] | 8x8

Basic Macroblock Coding Structure
[Figure: encoder block diagram. The input video signal is split into macroblocks of 16x16 pixels. Blocks shown: coder control, transform/scaling/quantization, scaling and inverse transform, intra-frame prediction, motion compensation, intra/inter selection, motion estimation, de-blocking filter, and entropy coding of the control data, quantized transform coefficients, and motion data. The embedded decoder produces the output video signal.]
New key features:
• Enhanced motion compensation
• Small blocks for transform coding
• Improved de-blocking filter
• Enhanced entropy coding

Motion Compensation
• Various block sizes and shapes for motion compensation
• 1/4-sample accuracy (sort of per MPEG-4, Pt. 2 V. 2)
  – 6-tap filtering to 1/2-sample accuracy
  – Simplified filtering to 1/4-sample accuracy
  – Special position with heavier filtering
• Multiple reference pictures (per H.263++ Annex U)
• Temporally-reversed motion and generalized B-frames
• B-frame prediction weighting
1/4-pixel accuracy and 6-tap filtering greatly improve quality.

Motion Compensation Accuracy
[Figure: encoder block diagram with the motion-compensation partitions. Macroblock types: 16x16 (block 0), 16x8 (blocks 0-1), 8x16 (blocks 0-1), 8x8 (blocks 0-3). 8x8 types: 8x8 (block 0), 8x4 (blocks 0-1), 4x8 (blocks 0-1), 4x4 (blocks 0-3).]
• Motion vector accuracy: 1/4 sample (6-tap filter)

Variable Blocks for Motion Compensation
• Seven modes with different block sizes for motion prediction: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
[Figure: macroblock partitions for Modes 1 to 7 with the sub-blocks numbered. Mode 1: block 0; Modes 2 and 3: blocks 0-1; Mode 4: blocks 0-3; Modes 5 and 6: blocks 0-7; Mode 7: blocks 0-15.]

H.264: Motion Vector Accuracy
1/4-pixel accuracy:
– Generation of 1/2-pixel positions:
  • Using a 6-tap filter: (1, -5, 20, 20, -5, 1)/32
  • First generate horizontal pixels and then vertical pixels
– Generation of 1/4-pixel positions:
  • Bilinear interpolation from the half-pixel grid
  • First generate horizontal pixels and then vertical pixels
– At the decoder, direct interpolation can be used to generate a specific 1/4-pixel value, if needed.
[Figure: illustration of interpolation for fractional-pel accuracy at frame resolutions 1:1, 2:1, and 4:1; the 1/4-pel interpolation is labelled Filter 1 and Filter 2.]
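The half- and quarter-sample rules above can be sketched in one dimension as follows. This is a minimal illustration, not the normative process: the six taps are assumed to be (1, -5, 20, 20, -5, 1), and the rounding offsets, the clipping to [0, 255], and the function names are assumptions added for the example.

```python
# 1-D sketch of fractional-sample interpolation (assumed rounding and clipping).
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap half-sample filter, normalised by 32


def clip255(x):
    """Clip a sample to the 8-bit range."""
    return max(0, min(255, x))


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1].

    Needs the integer samples row[i - 2] .. row[i + 3].
    """
    acc = sum(t * row[i - 2 + k] for k, t in enumerate(TAPS))
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding (assumed)


def quarter_pel(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right,
    obtained by bilinear averaging of the two."""
    return (row[i] + half_pel(row, i) + 1) >> 1


edge = [0, 0, 0, 0, 100, 100, 100, 100]
print(half_pel(edge, 3), quarter_pel(edge, 3))
```

On a step edge the negative taps sharpen the transition slightly compared with a plain two-tap average, which is one reason the longer filter helps prediction quality.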

AVC/H.264 Fractional Pel Accuracy
[Figure: integer samples (shaded blocks with upper-case letters) and fractional sample positions (un-shaded blocks with lower-case letters) for quarter-sample luma interpolation.]

H.264 Median Prediction of MVs
• Median prediction is used to predict motion vectors for all square motion-compensation blocks (16x16, 8x8, 4x4).
• Normally, Predictor(E) = Median(A, B, C)
[Figure: current block E with neighbouring blocks A (left), B (above), C (above right), and D (above left).]
• Exceptions:
  – If A and D are outside the picture, A = D = 0.
  – If D, B, C are outside the picture, Predictor(E) = A.
  – If C is outside the picture or not available, C = D.
  – If only one of A, B, C has the same reference frame as E, it is used to predict E.
• Chroma vectors are derived from the luma vectors.
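A minimal sketch of the median rule and of the "only one neighbour shares the reference frame" exception. It assumes the median is taken separately on the horizontal and vertical components, and the data layout (tuples of motion vector and reference index) and function names are hypothetical; the picture-boundary exceptions are left out.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(neighbours, ref_e):
    """Predict the motion vector of block E.

    neighbours: three (mv, ref) pairs for blocks A, B, C, where mv = (x, y).
    ref_e: reference-frame index used by block E.
    """
    same_ref = [mv for mv, ref in neighbours if ref == ref_e]
    if len(same_ref) == 1:
        # Exactly one neighbour uses E's reference frame: use it directly.
        return same_ref[0]
    xs = [mv[0] for mv, _ in neighbours]
    ys = [mv[1] for mv, _ in neighbours]
    return (median3(*xs), median3(*ys))  # component-wise median (assumed)


abc = [((4, -2), 0), ((1, 3), 0), ((6, 0), 1)]
print(predict_mv(abc, 0))  # two neighbours share ref 0, so the median is used
print(predict_mv(abc, 1))  # only C uses ref 1, so C's vector is used
```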

Directional Segmentation Prediction of MVs
• Directional segmentation prediction is used to predict MVs for rectangular blocks (16x8, 8x16, 8x4, 4x8).
• Chroma vectors are derived from the luma vectors.
[Figure: 8x16, 16x8, 8x4, and 4x8 partitions with their predictors taken from the neighbouring blocks A, B, and C; some sub-blocks are shaded and some are white.]
• If in the same reference frame, A, B, and C are used as predictors; otherwise, Pred(E) = Median(A, B, C).
• For white blocks: median prediction.
• For shaded blocks: predicted as above.

AVC/H.264 B-Pictures
• Advantages:
  – Improve coding efficiency
  – Provide temporal scalability
• 5 modes:
  – Direct mode: derived forward and backward MVs, none transmitted
  – Forward mode: prediction from a previous reference frame
  – Backward mode: prediction from a subsequent reference frame
  – Bi-directional mode: separate forward and backward MVs
  – Intra prediction mode
• MVs in direct mode:
  – MVF = (TRB * MV) / TRD
  – MVB = (TRB - TRD) * MV / TRD
[Figure: time axis with a P/I picture, a B picture, and a P picture, showing MV, MVF, and the temporal distances TRB and TRD.]
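The direct-mode scaling can be checked numerically with a short sketch. Exact fractions are used because the slide does not state a rounding rule; the function name and the tuple representation of a motion vector are assumptions for the example.

```python
from fractions import Fraction


def direct_mode_mvs(mv, trb, trd):
    """Derive forward and backward vectors for direct mode.

    mv:  (x, y) motion vector of the reference block.
    trb: temporal distance from the previous reference to the B picture.
    trd: temporal distance between the two reference pictures.
    """
    mvf = tuple(Fraction(trb * c, trd) for c in mv)          # MVF = TRB * MV / TRD
    mvb = tuple(Fraction((trb - trd) * c, trd) for c in mv)  # MVB = (TRB - TRD) * MV / TRD
    return mvf, mvb


mvf, mvb = direct_mode_mvs((8, -4), trb=1, trd=2)
print(mvf, mvb)
```

With the B picture halfway between its references (TRB = 1, TRD = 2), the forward vector is half of MV and the backward vector is its negative, so MVF - MVB reproduces MV.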

Intra Prediction
[Figure: encoder block diagram, together with a 4x4 block of samples a to p, its neighbouring samples A to P, and the prediction directions labelled 0 to 8.]
• Directional spatial prediction (9 types for luma, 1 chroma)

Intra Prediction for 4x4 Luma Blocks
Mode | Name
0 | DC
1 | Vertical
2 | Horizontal
3 | Diagonal down/right
4 | Diagonal down/left
5 | Vertical-right
6 | Vertical-left
7 | Horizontal-up
8 | Horizontal-down
[Figure: the prediction directions labelled with mode numbers 1 to 8.]

Intra Prediction for 4x4 Luma Blocks
Mode 0: DC Prediction
• If all samples A, B, C, D, I, J, K, L are available, a = b = c = ... = p = (A + B + C + D + I + J + K + L + 4) / 8.
• If A, B, C, and D are not available and I, J, K, and L are available, a = b = c = ... = p = (I + J + K + L + 2) / 4.
• If I, J, K, and L are not available and A, B, C, and D are available, a = b = c = ... = p = (A + B + C + D + 2) / 4.
• If none of the eight samples is available, a = b = c = ... = p = 128.
[Figure: 4x4 block of samples a to p with neighbouring samples A to H above and I to L to the left, and the prediction directions.]
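The four DC cases translate directly into code. The sketch below assumes integer division for the "/" in the formulas and represents an unavailable neighbour row or column as None; both conventions, and the function name, are assumptions for illustration.

```python
def dc_predict_4x4(above, left):
    """Mode 0 (DC) prediction for a 4x4 luma block.

    above: [A, B, C, D] or None if those samples are not available.
    left:  [I, J, K, L] or None if those samples are not available.
    Returns a 4x4 list of identical predicted values.
    """
    if above is not None and left is not None:
        dc = (sum(above) + sum(left) + 4) // 8
    elif left is not None:
        dc = (sum(left) + 2) // 4
    elif above is not None:
        dc = (sum(above) + 2) // 4
    else:
        dc = 128  # no neighbours: mid-grey
    return [[dc] * 4 for _ in range(4)]


print(dc_predict_4x4([100, 102, 104, 106], [90, 92, 94, 96])[0])
```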

Intra Prediction for 4x4 Luma Blocks
Mode 1: Vertical Prediction
This mode shall be used only if A, B, C, D are available. The prediction in this mode shall be as follows:
• a, e, i, m are predicted by A,
• b, f, j, n are predicted by B,
• c, g, k, o are predicted by C,
• d, h, l, p are predicted by D.
[Figure: 4x4 block of samples a to p with corner sample Q, neighbouring samples A to H above and I to L to the left, and the prediction directions.]

Intra Prediction for 4x4 Luma Blocks
Mode 3: Diagonal Down/Right Prediction
This mode is used only if all of A, B, C, D, I, J, K, L, Q are inside the picture. This is a "diagonal" prediction.
• m is predicted by: (J + 2K + L + 2) / 4
• i, n are predicted by: (I + 2J + K + 2) / 4
• e, j, o are predicted by: (Q + 2I + J + 2) / 4
• a, f, k, p are predicted by: (A + 2Q + I + 2) / 4
• b, g, l are predicted by: (Q + 2A + B + 2) / 4
• c, h are predicted by: (A + 2B + C + 2) / 4
• d is predicted by: (B + 2C + D + 2) / 4
[Figure: 4x4 block of samples a to p with corner sample Q, neighbouring samples A to H above and I to L to the left, and the prediction directions.]
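All the diagonal down/right formulas are the same three-tap filter (1, 2, 1)/4 slid along the L-shaped border L, K, J, I, Q, A, B, C, D, so they can be generated by one loop. In this sketch the block is laid out row by row as a b c d / e f g h / i j k l / m n o p; integer division stands in for "/", and the value for d, (B + 2C + D + 2) / 4, is taken from the same pattern, which is an assumption if it is not listed above.

```python
def diag_down_right_4x4(above, left, corner):
    """Mode 3 (diagonal down/right) prediction for a 4x4 luma block.

    above = [A, B, C, D], left = [I, J, K, L], corner = Q.
    Returns pred[row][col] with row 0 = a b c d and row 3 = m n o p.
    """
    # Neighbours along the L-shaped border, bottom-left to top-right.
    edge = list(reversed(left)) + [corner] + list(above)  # L K J I Q A B C D
    pred = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            k = 4 + c - r  # border index of the centre tap for this diagonal
            pred[r][c] = (edge[k - 1] + 2 * edge[k] + edge[k + 1] + 2) // 4
    return pred


for row in diag_down_right_4x4([1, 2, 3, 4], [8, 16, 32, 64], 0):
    print(row)
```

Samples on the same down/right diagonal (for example a, f, k, p) share one value of c - r and therefore one predicted value.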

Intra Prediction for 4x4 Chroma Blocks
• Only one mode: DC prediction
  – A, B, C, D are the four 4x4 blocks in an 8x8 chroma block.
  – S0, S1, S2, S3 are the sums of 4 neighbouring pixels.
[Figure: 8x8 chroma block divided into the 4x4 blocks A, B, C, D, with the neighbouring sums S0, S1, S2, S3.]
• If S0, S1, S2, S3 are all inside the frame:
  A = (S0 + S2 + 4) / 8, B = (S1 + 2) / 4, C = (S3 + 2) / 4, D = (S1 + S3 + 4) / 8
• If only S0 and S1 are inside the frame:
  A = (S0 + 2) / 4, B = (S1 + 2) / 4, C = (S0 + 2) / 4, D = (S1 + 2) / 4
• If only S2 and S3 are inside the frame:
  A = (S2 + 2) / 4, B = (S2 + 2) / 4, C = (S3 + 2) / 4, D = (S3 + 2) / 4
• If S0, S1, S2, S3 are all outside the frame:
  A = B = C = D = 128

16x16 Intra Prediction Mode
• Especially suitable for smooth areas
• Prediction modes
  – Mode 0 = Vertical prediction
  – Mode 1 = Horizontal prediction
  – Mode 2 = DC prediction
  – Mode 3 = Plane prediction
• Residual coding
  – Another 4x4 transform is applied to the 16 DC coefficients
  – Only single scan is used.

16x16 Intra Prediction Mode
• Mode 0 = Vertical prediction
  – Pred(i, j) = P(i, -1), i, j = 0, 1, ..., 15
• Mode 1 = Horizontal prediction
  – Pred(i, j) = P(-1, j), i, j = 0, 1, ..., 15
• Mode 2 = DC prediction
  – Pred(i, j) = (sum over k = 0..15 of (P(k, -1) + P(-1, k)) + 16) / 32, i, j = 0, 1, ..., 15
• Mode 3 = Plane prediction
  – Pred(i, j) = max(0, min(255, (a + b x (i - 7) + c x (j - 7) + 16) / 32)), where a = 16 x (P(-1, 15) + P(15, -1)), b = 5 x (H/4)/16, c = 5 x (V/4)/16

Transform Coding
[Figure: encoder block diagram.]
• 4x4 block integer transform
• Main Profile: Adaptive Block Size Transform (8x4, 4x8, 8x8)
• Repeated transform of DC coefficients for 8x8 chroma and 16x16 intra luma blocks

Multiple Reference Frames
[Figure: encoder block diagram.]
• Multiple reference frames for motion compensation

Residual Coding
[Figure: encoder block diagram.]
• Residual coding is based on 4x4 blocks
• Integer transform

Residual and Intra Coding
• EXACT MATCH simplified transform
  – Based primarily on a 4x4 transform (all prior standards: 8x8)
  – Requires only 16-bit arithmetic (including intermediate values)
  – Expanded to 8x8 for chroma by a 2x2 transform of the DC values
  – Easily extensible to 10-12 bits per component
• Adaptive block transform sizes for Main Profile
• Intra coding structure
  – Directional spatial prediction (10 types luma, 1 chroma)
  – Expanded to 16x16 for luma intra by a 4x4 transform of the DC values

4x4 Integer Transform
• New and simpler 4x4 integer transform:
[Equation: forward transform matrix]
• New 4x4 inverse transform:
[Equation: inverse transform matrix]
The appropriate normalization factors must be applied to the transform coefficients before quantization and after de-quantization. Such factors are absorbed by the quantization and de-quantization scaling factors.

4x4 Integer DCT
4x4 DCT:
[Equation: 4x4 DCT matrix and its integer factorization; the scaling part of the DCT is to be included in the quantization table.]

Inverse 4x4 DCT (Unitary Transform)
4x4 DCT:
[Equation: 4x4 DCT]
4x4 IDCT:
[Equation: 4x4 IDCT]

Residual Coding of 4x4 Blocks
• For each macroblock, the order of 4x4 blocks for residual coding is as follows:
Luma residual coding, 4x4 block order (the 4x4 DC block is numbered -1):
  0  1  4  5
  2  3  6  7
  8  9 12 13
 10 11 14 15
Chroma residual coding, 4x4 block order:
  2x2 DC: U = 16, V = 17
  AC, U: 18 19 / 20 21
  AC, V: 22 23 / 24 25

4x4 Transform of Luma DC Coeffs.
• Hadamard Transform (HT):
[Equation: forward 4x4 Hadamard transform of the luma DC coefficients]
where the symbol // denotes division with rounding to the nearest integer
• Inverse transform:
[Equation: inverse 4x4 Hadamard transform]

2x2 Transform of Chroma DC Coeffs.
• Somewhat in the spirit of the Hadamard transform
• The 2x2 transform maps the DC values
    DC0 DC1
    DC2 DC3
  to
    DCC(0, 0) DCC(1, 0)
    DCC(0, 1) DCC(1, 1)
• Forward transform:
  DCC(0, 0) = (DC0 + DC1 + DC2 + DC3) / 2
  DCC(1, 0) = (DC0 - DC1 + DC2 - DC3) / 2
  DCC(0, 1) = (DC0 + DC1 - DC2 - DC3) / 2
  DCC(1, 1) = (DC0 - DC1 - DC2 + DC3) / 2
• Inverse transform:
  DC0 = (DCC(0, 0) + DCC(1, 0) + DCC(0, 1) + DCC(1, 1)) / 2
  DC1 = (DCC(0, 0) - DCC(1, 0) + DCC(0, 1) - DCC(1, 1)) / 2
  DC2 = (DCC(0, 0) + DCC(1, 0) - DCC(0, 1) - DCC(1, 1)) / 2
  DC3 = (DCC(0, 0) - DCC(1, 0) - DCC(0, 1) + DCC(1, 1)) / 2
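A quick numerical check that the inverse formulas undo the forward ones. The sketch uses exact (floating-point) halving rather than any integer rounding, which is an assumption; the function names and the tuple ordering (DCC(0,0), DCC(1,0), DCC(0,1), DCC(1,1)) are chosen for the example only.

```python
def forward_2x2(dc):
    """Forward 2x2 transform of the four chroma DC values (DC0..DC3)."""
    dc0, dc1, dc2, dc3 = dc
    return (
        (dc0 + dc1 + dc2 + dc3) / 2,  # DCC(0, 0)
        (dc0 - dc1 + dc2 - dc3) / 2,  # DCC(1, 0)
        (dc0 + dc1 - dc2 - dc3) / 2,  # DCC(0, 1)
        (dc0 - dc1 - dc2 + dc3) / 2,  # DCC(1, 1)
    )


def inverse_2x2(dcc):
    """Inverse 2x2 transform, returning (DC0, DC1, DC2, DC3)."""
    c00, c10, c01, c11 = dcc
    return (
        (c00 + c10 + c01 + c11) / 2,  # DC0
        (c00 - c10 + c01 - c11) / 2,  # DC1
        (c00 + c10 - c01 - c11) / 2,  # DC2
        (c00 - c10 - c01 + c11) / 2,  # DC3
    )


coeffs = forward_2x2((10, 12, 7, 3))
print(coeffs, inverse_2x2(coeffs))
```

Each pass applies the same +/-1 pattern and a factor 1/2; two passes give 4 x (1/2) x (1/2) = 1, which is why the same structure serves as its own inverse.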

AVC Quantization/Dequantization
• Also takes care of the normalization for the transform.
• QP values range over [-12, 39], with [0, 32] similar to H.263.
• The step size increases by about 12% from one QP to the next (it doubles when QP increases by 6).
• The smallest step size is about four times smaller than in H.263, allowing for very high fidelity reconstruction.
• The largest step size is about 60% larger than in H.263, allowing for more rate-control flexibility.
• No dead zone
• QP for luma is signaled and QP for chroma is derived as
  QPluma = 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
  QPchroma = 17 18 19 20 20 21 22 22 23 23 24 24 25 25 25 26 26 26 27 27
  with QPluma = QPchroma if QPluma < 17.
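The "about 12% per step, doubling every 6" statement corresponds to a step size proportional to 2^(QP/6). A small sketch, with a hypothetical helper name, makes the numbers concrete:

```python
def relative_step(qp_delta):
    """Ratio of quantizer step sizes for two QP values that differ by qp_delta,
    assuming the step size is proportional to 2 ** (QP / 6)."""
    return 2.0 ** (qp_delta / 6.0)


print(round(relative_step(1), 4))  # one QP step, roughly a 12% increase
print(relative_step(6))            # six QP steps
```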

Quantization of 4x4 Coefficients
Since the step size doubles for every increment of 6 in QP, a periodic quantization table is used. The indices into the quantization coefficient tables therefore depend only on (QP+12) mod 6 = QP mod 6, and the quantization and de-quantization formulas depend on both QP mod 6 and (QP+12)/6. In that way, the table size is minimized.
Quantization is performed according to the following equation:
[Equation: quantization formula]
where Y are the transformed coefficients, YQ are the corresponding quantized values, Q(m, i, j) are the quantization coefficients listed below, and |f| is in the range 0 to 2^(14+(QP+12)/6)/2, with f having the same sign as the coefficient that is being quantized. Recall that QP+12 is the value signaled to the decoder. Note that while the intermediate value inside square brackets in the equation above has a 32-bit range, the final value YQ is guaranteed to fit in 16 bits.

Dequantization of 4x4 Coefficients
Dequantization is performed according to the following equation:
[Equation: dequantization formula]
where the R(m, i, j) are the dequantization coefficients listed below. After dequantization, the inverse transform specified above is applied. Then the final results are normalized by:
[Equation: normalization of the inverse-transform output]
The dequantization formula can be performed in 16-bit arithmetic, because the coefficients R(m, i, j) are small enough. Furthermore, the dequantized coefficients have the maximum dynamic range that still allows the inverse transform to be performed in 16-bit arithmetic. Thus, the decoder needs only 16-bit arithmetic for dequantization and inverse transform.

Scans of 4x4 Coefficients
Simple scan:
  0  1  5  6
  2  4  7 12
  3  8 11 13
  9 10 14 15
Double scan (to better match 1-bit EOB), used for luma with QP < 24:
  0 1 2 5
  0 2 3 6
  1 3 4 7
  4 5 6 7
• Use of a 2-D model (RUN, LEVEL) plus EOB for coefficient coding.

In-loop Deblocking Filter
[Figure: encoder block diagram.]
• Designed in the coding loop
• Adaptive deblocking filter

In-loop Deblocking Filter
[Figure: deblocking filter applied to a highly compressed decoded inter picture: 1) without filter, 2) with H.264/AVC deblocking.]

AVC/H.264 Deblocking Filter
• After the reconstruction of a macroblock, a conditional filtering of this macroblock takes place that affects the boundaries of the 4x4 block structure.
• Filtering is done on a macroblock level.
• In a first step, the 16 pels of the 4 vertical edges (horizontal filtering) of the 4x4 raster are filtered.
• After that, the 4 horizontal edges (vertical filtering) follow.
• This process also affects the boundaries of the already reconstructed macroblocks above and to the left of the current macroblock.
• Frame edges are not filtered.
• Note that intra prediction of the current macroblock takes place on the unfiltered content of the already decoded neighbouring macroblocks.
• Depending on the implementation, these values have to be stored before filtering.

AVC/H.264 Deblocking Filter
Content-dependent thresholds: a filtering boundary strength Bs is assigned for luma at the block boundary between blocks j and k:
• Is block j or k intra coded?
  – YES, and the block boundary is also a macroblock boundary: Bs = 4
  – YES, and the block boundary is not a macroblock boundary: Bs = 3
• Otherwise, are coefficients coded in block j or k?
  – YES: Bs = 2
• Otherwise, is R(j) ≠ R(k) or |V(j, x) - V(k, x)| ≥ 1 pixel or |V(j, y) - V(k, y)| ≥ 1 pixel?
  – YES: Bs = 1
  – NO: Bs = 0 (skip)
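The boundary-strength decision is a short cascade of tests, sketched below. The argument names are hypothetical; R is taken to be the reference frame and V the motion vector (in full-pixel units) of each block, which is an interpretation of the flowchart labels rather than something the slide spells out.

```python
def boundary_strength(intra_j, intra_k, on_mb_boundary,
                      coeffs_j, coeffs_k, ref_j, ref_k, mv_j, mv_k):
    """Boundary strength Bs for the luma edge between blocks j and k.

    intra_*:        True if the block is intra coded.
    on_mb_boundary: True if the edge is also a macroblock boundary.
    coeffs_*:       True if the block has coded coefficients.
    ref_*:          reference-frame index of the block.
    mv_*:           (x, y) motion vector in full-pixel units.
    """
    if intra_j or intra_k:
        return 4 if on_mb_boundary else 3
    if coeffs_j or coeffs_k:
        return 2
    if (ref_j != ref_k
            or abs(mv_j[0] - mv_k[0]) >= 1
            or abs(mv_j[1] - mv_k[1]) >= 1):
        return 1
    return 0  # edge is skipped


print(boundary_strength(False, False, True, False, False, 0, 0, (2.0, 0.0), (2.25, 0.0)))
```

In the printed example both blocks are inter coded without coefficients, share a reference frame, and their vectors differ by a quarter pixel, so the edge is left unfiltered.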

AVC/H.264 Deblocking Filter
The set of eight pixels across a 4x4 block horizontal or vertical boundary is denoted as
  p4, p3, p2, p1 | q1, q2, q3, q4
with the actual boundary between p1 and q1. Filtering across a certain 4x4 block boundary is skipped altogether if the corresponding Bs is equal to zero. Sets of pixels across this edge are only filtered if the following condition holds:
  Bs ≠ 0 AND |p1 - q1| < α AND |p2 - p1| < β AND |q2 - q1| < β
The QP-dependent thresholds α and β can be found as:
  QP 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  3 3 3 3 3 4 4 5 6 8 10 13
  0 0 0 0 4 4 4 5 5 5 7 7 7
  18 19 20 21 22 23 24 25 26 27 28 29 30 31
  17 21 26 30 35 43 48 55 64 77 96 128 192
  8 17 9 9 10 10 11 11 12 12 13 13 14 14 15 15

AVC/H.264 Deblocking Filtering Process
Two sorts of filter are defined. In the default case, the following filter is used for p1 and q1:
  Δ = Clip(-C, C, (((q1 - p1) << 2) + (p2 - q2) + 4) >> 3)
  P1 = Clip(0, 255, (p1 + Δ))
  Q1 = Clip(0, 255, (q1 - Δ))
The two intermediate threshold variables
  ap = |p3 - p1|
  aq = |q3 - q1|
are used to decide whether p2 and q2 are filtered. These pixels are only processed for luminance.
If (Luma && ap < β), p2 is filtered by:
  P2 = p2 + Clip(-C0, C0, (p3 + P1 - (p2 << 1)) >> 1)
If (Luma && aq < β), q2 is filtered by:
  Q2 = q2 + Clip(-C0, C0, (q3 + Q1 - (q2 << 1)) >> 1)
where Clip( ) denotes a clipping function with the parameters Clip(Min, Max, Value), and the clipping threshold is C0 = ClipTable[QP, Bs]. C is set to C0 and then incremented by one if p2 will be filtered and again by one if q2 will be filtered.
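The default filter for p1 and q1 can be sketched as below. The correction term is called delta here, the shift is assumed to apply to (q1 - p1) only, and the clipping bound C is passed in directly because the ClipTable values are not reproduced on the slide; all names are chosen for the example.

```python
def clip(lo, hi, value):
    """Clip(Min, Max, Value) as defined on the slide."""
    return max(lo, min(hi, value))


def filter_p1_q1(p2, p1, q1, q2, c):
    """Default-case filtering of the two samples next to the edge.

    Returns the filtered pair (P1, Q1). c is the clipping bound C.
    """
    delta = clip(-c, c, (((q1 - p1) << 2) + (p2 - q2) + 4) >> 3)
    return clip(0, 255, p1 + delta), clip(0, 255, q1 - delta)


# A step of 10 grey levels across the edge, with two different clipping bounds.
print(filter_p1_q1(60, 60, 70, 70, c=4))
print(filter_p1_q1(60, 60, 70, 70, c=2))
```

A smaller bound C limits how far the two samples may move toward each other, which protects real image edges from being smoothed away.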

Entropy Coding
[Figure: encoder block diagram.]

Variable Length Coding
• Exp-Golomb code is used universally for all symbols except for transform coefficients
• Context-adaptive VLCs for coding of transform coefficients
  – No end-of-block, but the number of coefficients is decoded
  – Coefficients are scanned backwards
  – Contexts are built dependent on transform coefficients
Entropy coding:
1. Universal VLC (UVLC)
2. Context-based Adaptive VLC (CAVLC)
Will be discussed later.
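As an illustration of the Exp-Golomb code mentioned above, the sketch below encodes and decodes non-negative code numbers in the usual unsigned, order-0 form (a run of zeros, then the binary value of code number + 1). The mapping from syntax elements to code numbers is not shown, and the bit-string representation and function names are assumptions for the example.

```python
def exp_golomb_encode(code_num):
    """Unsigned order-0 Exp-Golomb code word for code_num >= 0, as a bit string."""
    bits = bin(code_num + 1)[2:]           # binary of code_num + 1, no '0b' prefix
    return "0" * (len(bits) - 1) + bits    # prefix with len - 1 zeros


def exp_golomb_decode(bitstring):
    """Decode one code word from the front of bitstring.

    Returns (code_num, remaining_bits).
    """
    zeros = len(bitstring) - len(bitstring.lstrip("0"))  # leading-zero count
    end = 2 * zeros + 1                                  # total code-word length
    return int(bitstring[zeros:end], 2) - 1, bitstring[end:]


for n in range(5):
    print(n, exp_golomb_encode(n))
```

Short code words go to small code numbers, so the scheme works best when frequent symbol values are mapped to the low numbers.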

CABAC: Context-based Adaptive Binary Arithmetic Coding
• Context modeling provides estimates of conditional probabilities of the coding symbols. Using suitable context models, a given inter-symbol redundancy can be exploited by switching between different probability models according to already coded symbols in the neighborhood of the current symbol to encode.
• Arithmetic codes permit a non-integer number of bits to be assigned to each symbol of the alphabet. Thus the symbols can be coded almost at their entropy rate.
• Adaptive arithmetic codes permit the entropy coder to adapt itself to non-stationary symbol statistics. Hence, an adaptive model taking into account the cumulative probabilities of already coded motion vectors leads to a better fit of the arithmetic codes to the current symbol statistics.
[Figure: CABAC block diagram: binarization, context modeling, probability estimation, and coding engine (adaptive binary arithmetic coder), with an update path back to the probability estimation.]
Average bit-rate saving over CAVLC: 10-15%

AVC/H.264 Syntax
[Figure: syntax diagram. Picture level: picture sync (TR, PQP, EOS), Ptype. Macroblock level: RUN, MB_Type, Intra_pred_mode, Ref_frame, MVD, CBP, Dquant, Tcoeff_luma, Tcoeff_chroma_DC, Tcoeff_chroma_AC. Paths labelled "Omit" and "Loop" mark optional and repeated elements.]

H.264 Macroblock Type (MB_Type)
• For intra pictures (signaled by Ptype)
  – Intra 4x4
  – Intra 16x16: (Imode, nc, AC), total 4 x 3 x 2 = 24 types
    • Imode: 16x16 intra prediction mode (0-3)
    • nc: CBP for chroma (0-2)
    • AC: 0 if no AC coefficients, 1 otherwise
• For inter pictures (signaled by Ptype)
  – Skip (signaled by RUN)
  – Motion compensation block size (NxM, e.g., 8x4): 7 types
  – Intra 4x4
  – Intra 16x16: 24 types

AVC/H.264 CBP and Dquant
• CBP: Coded Block Pattern
  – Signals which 8x8 blocks contain non-zero coefficients
  – CBPY: for luma blocks, 1 bit for each block
  – nc: for chroma coefficients
    • nc = 0: no chroma coefficients
    • nc = 1: there are 2x2 DC coefficients, but no AC coefficients
    • nc = 2: at least one non-zero AC coefficient
  – CBP (= CBPY + 16 x nc) is coded by UVLC.
• Dquant: difference of quantization parameter
  – Ranges from -16 to +16
  – Allows QP to be changed to any of 32 values at the macroblock level
  – QPnew = Modulo32(QPold + Dquant + 32)
TML-7: Miscellaneous
• Multiple reference frames, as in Annex U of H.263++
• Deblocking loop filter
• Context-based adaptive arithmetic coding: in the software but not in the document, only for a maximal-performance demo
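The two formulas on this slide, CBP = CBPY + 16 x nc and the modulo-32 QP update, can be exercised with a short sketch. It assumes CBPY is a 4-bit value (one bit per 8x8 luma block) and nc lies in 0..2, as stated for MB_Type; the function names are hypothetical.

```python
def make_cbp(cbpy, nc):
    """Combine the luma pattern (4 bits) and the chroma code nc (0..2)."""
    if not (0 <= cbpy < 16 and 0 <= nc <= 2):
        raise ValueError("cbpy must be in 0..15 and nc in 0..2")
    return cbpy + 16 * nc


def split_cbp(cbp):
    """Recover (cbpy, nc) from a combined CBP value."""
    return cbp % 16, cbp // 16


def update_qp(qp_old, dquant):
    """QPnew = (QPold + Dquant + 32) mod 32, so the result wraps into 0..31."""
    return (qp_old + dquant + 32) % 32


print(make_cbp(0b1010, 2), split_cbp(42))
print(update_qp(30, 5), update_qp(2, -5))
```

Because the update wraps around, any of the 32 QP values can be reached from any other with a Dquant in the allowed range.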

H.264 Encoder Issues
• Motion prediction mode (block size) selection
• Motion prediction mode selection for B-pictures
• Motion estimation
  – Reference frame selection (from multiple reference frames)
  – Integer-pixel search
  – Fractional-pixel search
• Residual coding mode selection
  – Intra/inter selection
  – Intra coding mode selection
• Rate/distortion optimization
  – Elimination of single coefficients in inter macroblocks
• Rate control

H.264 Data Partitioning
• Concatenate all VLC symbols of one data type and one slice.
• Each data type segment in a slice starts byte aligned.
• Purposes:
  – Error resilience
  – Interim file format for a clean interface with different networks
• Data types:
  0  TYPE_HEADER     All picture/slice header information
  1  TYPE_MBHEADER   Macroblock header information
  2  TYPE_MVD        Motion vector data
  3  TYPE_CBP        Coded block pattern
  4  TYPE_2x2DC      2x2 DC coefficients
  5  TYPE_COEFF_Y    Luma AC coefficients
  6  TYPE_COEFF_C    Chroma AC coefficients
  7  TYPE_EOS        End-of-stream symbol

H.264 Network Adaptation Layer
• For transport over IP networks using RTP
• Two packets per slice:
  – First packet (higher priority): TYPE_HEADER, TYPE_MBHEADER, TYPE_MVD, TYPE_EOS
  – Second packet (lower priority): TYPE_CBP, TYPE_2x2DC, TYPE_COEFF_Y, TYPE_COEFF_C
• Packet structure:
  – Packet header (32 bits):
    • Bit 31: 1 if the packet contains a picture header
    • Bit 30: 1 if the packet contains a slice header
    • Bits 29-25: reserved
    • Bits 24-10: StartMB
    • Bits 9-0: SliceID
  – Part-of-Partition structures (POPs):
    • POP header (16 bits): bits 15-12: data type; bits 11-0: length of the VLC-coded POP payload
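The two header layouts on the Network Adaptation Layer slide can be packed and unpacked with shifts and masks. A minimal Python sketch, assuming the reserved bits 29-25 are written as zero; the function names are hypothetical, and the length unit of the POP payload is left as the slide gives it:

```python
def pack_packet_header(has_picture_header, has_slice_header, start_mb, slice_id):
    """Build the 32-bit packet header: flags in bits 31 and 30,
    StartMB in bits 24-10, SliceID in bits 9-0 (reserved bits left at 0)."""
    if not 0 <= start_mb < (1 << 15):
        raise ValueError("StartMB must fit in 15 bits")
    if not 0 <= slice_id < (1 << 10):
        raise ValueError("SliceID must fit in 10 bits")
    return ((int(bool(has_picture_header)) << 31)
            | (int(bool(has_slice_header)) << 30)
            | (start_mb << 10)
            | slice_id)


def unpack_packet_header(word):
    """Return (has_picture_header, has_slice_header, start_mb, slice_id)."""
    return (bool((word >> 31) & 1), bool((word >> 30) & 1),
            (word >> 10) & 0x7FFF, word & 0x3FF)


def pack_pop_header(data_type, length):
    """Build the 16-bit POP header: data type in bits 15-12, length in bits 11-0."""
    if not 0 <= data_type < 16 or not 0 <= length < (1 << 12):
        raise ValueError("data type needs 4 bits, length needs 12 bits")
    return (data_type << 12) | length


def unpack_pop_header(word):
    """Return (data_type, length)."""
    return (word >> 12) & 0xF, word & 0xFFF
```

For example, a packet that carries a picture header and starts at macroblock 5 of slice 3 gets the header word 0x80001403.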

Digital 3D Video
H.264 Entropy Coding: UVLC and CAVLC
Entropy_coding_mode: 0: UVLC + CAVLC; 1: CABAC

UVLC and CAVLC
Parameters and their descriptions:
• Sequence-, picture- and slice-layer syntax elements: fixed- or variable-length binary codes
• Macroblock type (mb_type): prediction method for each coded MB
• Coded block pattern: indicates which blocks within a MB contain coded coefficients
• Quantizer parameter: delta value from the previous value of QP
• Reference frame index: identifies the reference frame(s) for inter prediction
• Motion vector: difference (MVD) from the predicted motion vector
• Residual data: coefficient data for each 4x4 or 2x2 block

Slice Syntax
[Figure: slice syntax. A slice consists of a slice header followed by slice data (a sequence of MBs); each MB carries skip_run, mb_type and mb_pred, coded with UVLC, and the coded residual, coded with CAVLC.]

UVLC (Exp-Golomb Coding)
Bit string   CodeNum
1            0
010          1
011          2
00100        3
00101        4
00110        5
00111        6
0001000      7
0001001      8
0001010      9
…            …
Codeword structure: M leading zeros (prefix bits), a 1, then the M suffix bits INFO = x_{M-1} … x_0:
  1
  0 1 x0
  0 0 1 x1 x0
  0 0 0 1 x2 x1 x0
  …
Encoding: $M = \lfloor \log_2(\text{CodeNum} + 1) \rfloor$, $\text{INFO} = \text{CodeNum} + 1 - 2^M$
Decoding: $\text{CodeNum} = \text{INFO} - 1 + 2^M$
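The UVLC formulas for M, INFO and CodeNum translate directly into code. A minimal Python sketch that works on bit strings for readability; `ue_encode` and `ue_decode` are hypothetical names, and a real codec would operate on a bit buffer instead:

```python
def ue_encode(code_num):
    """Exp-Golomb codeword: M zeros, a 1, then INFO written in M bits."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(CodeNum + 1))
    info = code_num + 1 - (1 << m)           # INFO = CodeNum + 1 - 2^M
    suffix = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + suffix


def ue_decode(bits):
    """Decode one codeword from the front of `bits`.
    Returns (code_num, number_of_bits_consumed)."""
    m = 0
    while bits[m] == "0":                    # count the leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1, 2 * m + 1    # CodeNum = INFO - 1 + 2^M
```

Every codeword has an odd length of 2M + 1 bits, so the decoder knows how many suffix bits to read as soon as it has counted the leading zeros.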

Unsigned and Signed Exp-Golomb Codes
Unsigned codes: used for macroblock type, reference index and others. If the syntax element is coded as ue(v), the value of the syntax element is equal to codeNum.
Signed codes: used for MVD, delta QP and others. Syntax element values are derived from the table:
codeNum   Syntax element value
0         0
1         1
2         -1
3         2
4         -2
5         3
6         -3
k         $(-1)^{k+1} \lceil k/2 \rceil$
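The signed mapping in the table is the rule $(-1)^{k+1} \lceil k/2 \rceil$. A minimal Python sketch of both directions; the function names are hypothetical:

```python
def se_from_code_num(k):
    """Map codeNum k to the signed value (-1)^(k+1) * ceil(k / 2)."""
    magnitude = (k + 1) // 2          # ceil(k / 2) for non-negative integers
    return magnitude if k % 2 else -magnitude


def code_num_from_se(value):
    """Inverse mapping: positive values take odd codeNums, the rest even ones."""
    return 2 * value - 1 if value > 0 else -2 * value
```

Small magnitudes get small code numbers, which suits motion vector differences and delta QP values that cluster around zero.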

Mapped Exp-Golomb Coding
Input: code number. Output: syntax element values.
Code-number assignment for coded_block_pattern (by macroblock prediction type) and for the (Level, Run) pairs of Tcoeff_chroma_DC, Tcoeff_chroma_AC and Tcoeff_luma (zig-zag scan):
Code     coded_block_pattern           Level  Run    Level  Run
number   Intra, SIntra   Pred, SPred
0        47              0             EOB    -      EOB    -
1        31              16            1      0      1      0
2        15              1             -1     0      -1     0
3        0               2             2      0      1      1
4        23              4             -2     0      -1     1
5        27              8             1      1      1      2
6        29              32            -1     1      -1     2
7        30              3             3      0      2      0
8        7               5             -3     0      -2     0

CAVLC for Transform Coefficients
CAVLC: Context-based Adaptive Variable Length Coding
• Used to encode residual, zig-zag ordered 4x4 (2x2) blocks of transform coefficients
• No end-of-block symbol; the number of coefficients is decoded instead
• Coefficients are scanned backwards
• Contexts are built dependent on the transform coefficients
4x4 block:
   0   3  -1   0
   0  -1   1   0
   1   0   0   0
   0   0   0   0
Reordered block data: 0, 3, 0, 1, -1, -1, 0, 1, 0, …
• TotalCoeffs = 5 (indexed from the highest frequency [4] to the lowest frequency [0])
• TotalZeros = 3
• TrailingOnes = 3 (in fact there are 4 trailing ones, but only 3 can be encoded as a "special case")
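The three counts for the example block can be reproduced in a few lines. A minimal Python sketch, assuming the usual 4x4 frame zig-zag scan order (written out as raster indices); `zigzag` and `cavlc_stats` are hypothetical helper names:

```python
# Raster index visited at each zig-zag position of a 4x4 block (assumed frame scan).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag(block):
    """Reorder a 4x4 block (list of 4 rows) into a 16-entry zig-zag list."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in ZIGZAG_4X4]


def cavlc_stats(coeffs):
    """Return (TotalCoeffs, TotalZeros, TrailingOnes) for a zig-zag ordered list."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)
    # Zeros that precede the last non-zero coefficient in scan order.
    total_zeros = nonzero[-1] + 1 - total_coeffs if nonzero else 0
    trailing_ones = 0
    for i in reversed(nonzero):           # scan backwards from the highest frequency
        if abs(coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1            # at most 3 are signalled as trailing ones
        else:
            break
    return total_coeffs, total_zeros, trailing_ones


example_block = [[0, 3, -1, 0],
                 [0, -1, 1, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0]]
```

Here `cavlc_stats(zigzag(example_block))` gives the slide's values: 5 coefficients, 3 zeros and 3 trailing ones. The fourth ±1 is left to be coded as an ordinary level.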

CAVLC Encoding Procedure
[Figure: parsing steps and the syntax elements they produce]
1. readSyntaxElement_NumCoeffTrailingOnes: TotalCoeffs, TrailingOnes
2. TrailingOneSign: sign
3. readSyntaxElement_Level_VLC0 / readSyntaxElement_Level_VLCN: levabs
4. readSyntaxElement_TotalZeros: total_zeros
5. readSyntaxElement_Run: run_before

CAVLC Encoding Procedure (1)
Quantized DCT (reordered): 0, 3, 0, 1, -1, -1, 0, 1, 0, …
Element         Value                        Code
Coeff_token     TotalCoeffs=5, T1s=3         0001 011
T1 sign (4)     +                            0
T1 sign (3)     -                            1
T1 sign (2)     -                            1
Level (1)       +1 (use Level_VLC0)          1
Level (0)       +3 (use Level_VLC1)          0010
TotalZeros      3                            111
Run_before(4)   ZerosLeft=3; run_before=1    10
Run_before(3)   ZerosLeft=2; run_before=0    1
Run_before(2)   ZerosLeft=2; run_before=0    1
Run_before(1)   ZerosLeft=2; run_before=1    01
Run_before(0)   ZerosLeft=1; run_before=1    No code required; last coefficient
Syntax elements, in order: TotalCoeffs and TrailingOnes, sign, levabs, total_zeros, run_before

Codes for Levels and Trailing. Ones (coeff_token ) National Cheng Kung University, Tainan, Taiwan T 1 s 13 -65 Total Coeff 0 < NC < 2 0 0 1 2 3 0 1 2 3 0 1 1 2 2 2 3 3 4 4 5 5 1 0000 11 01 0000 0111 0001 001 0000 0100 1 0000 0110 0001 1 0000 0100 0 0000 0101 0 0000 0010 1 0000 10 0001 11 0000 0101 0 0000 0010 0 0001 011 We have to find out NC to index the table!!! 2 < NC < 4 4 < NC < 8 8 < NC NC = -1 11 0000 10 0001 1 010 01 0010 00 0010 10 101 1000 0010 11 1001 01 0011 0000 0111 1000 010 0001 0 0011 0000 011 0010 0000 010 1011 10 1101 0000 11 1010 01 0101 10 1100 0000 10 1010 11 0100 01 1111 1011 01 1010 11 0100 00 1110 0000 11 0000 00 0000 01 00 0001 01 0001 10 00 0010 01 0010 10 0010 11 00 0011 01 0011 10 0011 11 0100 00 01 0100 10 0100 11 Department of Electrical Engineering, Institute of Computer and Communication Engineering 01 0001 1 0000 1 0011 1 01 0011 0 0000 01 0010 10 0010 0 0000 001 0000 0001 0010 11 -

Determination of Context Number, NC (for Predicting the Number of Coefficients)
[Figure: the current block with its neighboring blocks A and B]
• NA = the number of non-zero transform coefficient levels in block A
• NB = the number of non-zero transform coefficient levels in block B
• NC = (NA + NB + 1) >> 1 (NA and NB both available)
• NC = NA or NB (only one of NA and NB available)
• NC = 0 (neither NA nor NB available)
• NC = -1 (if the CAVLC parsing process is invoked for ChromaDCLevel)
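The NC rules can be written as a small function, followed by the choice of coeff_token table. A minimal Python sketch in which an unavailable neighbor is represented by `None`; the function names and the returned table labels are hypothetical, and the NC ranges follow the table headings used on the coeff_token slides:

```python
def context_number(na=None, nb=None, chroma_dc=False):
    """Compute NC from the neighbor counts NA and NB (None = not available)."""
    if chroma_dc:
        return -1                      # ChromaDCLevel uses its own table
    if na is not None and nb is not None:
        return (na + nb + 1) >> 1      # rounded average of both neighbors
    if na is not None:
        return na
    if nb is not None:
        return nb
    return 0


def coeff_token_table(nc):
    """Name the coeff_token table selected by NC."""
    if nc == -1:
        return "NC = -1"
    if nc < 2:
        return "0 <= NC < 2"
    if nc < 4:
        return "2 <= NC < 4"
    if nc < 8:
        return "4 <= NC < 8"
    return "8 <= NC"
```

Blocks whose neighbors carry many coefficients are thus decoded with tables that give short codes to large TotalCoeffs values.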

Codes for Levels and Trailing. Ones (1) 0≦ NC < 2 Num. CoefT 1 s 0 1 2 3 - - - 1 000011 01 - - 2 00000111 0001001 - 3 000001001 00000110 0001000 00011 4 000001000 000001011 000000101 000010 5 0000000111 000001010 000000100 0001011 6 0000111 0000000110 0000001101 00010101 7 00001001 0000110 0000001100 00010100 8 00001000 00000001001 00001010 000000111 9 00000111 00001011 00000101 0000000101 10 000001101 000001111 00000001000 11 00000011 000001100 000001110 00000100 12 000000100 000000110 00000101 13 000000101 000000111 0000010001 000001001 000000011 000000010 000001000000011 15 13 -67 1 14 National Cheng Kung University, Tainan, Taiwan 0 0000000 001 00000000 11 000000001 0 0000000101 Department of Electrical Engineering, Institute of Computer and Communication Engineering

Codes for Levels and Trailing. Ones (2) 2≦ NC < 4 Num. CoeffT 1 s National Cheng Kung University, Tainan, Taiwan 13 -68 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0 1 2 3 11 - - - 000011 - - 000010 00011 010 - 001001 001000 001010 101 1000001 001011 100101 0011 00000111 1000000 1000010 00000110 1000011 1001101 10001 000001001 10011100 100100 000001011 000000101 1001100 0000000111 000001010 000000100 10011111 0000000110 0000001101 0000001100 10011110 0000101 0000111 00000001001 000000111 0000100 0000110 00000001000 0000000101 0000011 00000100 00000111 00000011 00000101 00000010 000001101 00000001 0000000 000000111 000001100 000000000000100 0000001101 0000001 1 100 Department of Electrical Engineering, Institute of Computer and Communication Engineering

Codes for Levels and Trailing. Ones (3) 4≦ NC < 8 National Cheng Kung University, Tainan, Taiwan 13 -69 Num. CoeffT 1 s 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0 0011 0000011 0000010 000011 000010 101101 101100 101111 0110100 0110111 01111001 01111000 000000011 000010 000000 1 1 2 3 0010 101110 101001 101000 101011 101010 010101 010100 010111 0110110001 01111011 000000010 000000101 0000011 0000001 1101 010110 010001 010000 010011 010010 011101 011100 0110000 01111010 01100101 000000100 0000001101 0000010 0000000 1 1100 1111 1110 1001 1000 00011 00010 011111 0110011 01100100 000000111 0000001100 000001 0000000 0 Department of Electrical Engineering, Institute of Computer and Communication Engineering


Thresholds for VLC Tables
Thresholds for determining whether to increment the Level table number:
Current VLC table   Threshold to increment table
VLC0                0
VLC1                3
VLC2                6
VLC3                12
VLC4                24
VLC5                48
VLC6                N/A (highest table)
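The threshold table drives a simple adaptation loop: after each level is coded, the table number moves up by one if the magnitude just coded exceeds the threshold of the current table. A minimal Python sketch, assuming coding starts in VLC0 unless `start_table` says otherwise (the rule for choosing the starting table is not given on the slide); the names are hypothetical:

```python
LEVEL_TABLE_THRESHOLDS = [0, 3, 6, 12, 24, 48]   # VLC0..VLC5; VLC6 is the highest


def next_level_table(table, level):
    """Return the Lev-VLC table to use after coding `level` with `table`."""
    if table < 6 and abs(level) > LEVEL_TABLE_THRESHOLDS[table]:
        return table + 1
    return table


def tables_used(levels, start_table=0):
    """List the table used for each level, in coding order."""
    used = []
    table = start_table
    for level in levels:
        used.append(table)
        table = next_level_table(table, level)
    return used
```

In the worked example the levels are coded in the order +1, +3, so `tables_used([1, 3])` returns `[0, 1]`: Level_VLC0 for the first level and Level_VLC1 for the second.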

VLC Table - Level 0 (VLC-0)
Lev-VLC0
Code no.   Code                LevelCode (±1, ±2, ...)
0          1                   1
1          01                  -1
2          001                 2
3          0001                -2
...        ...                 ...
13         00000001            -7
14-29      00000001xxxx        ±8 to ±15
30->       000000001xxxxxx     ±16 ->

VLC Table - Level 1 (VLC-1)
Lev-VLC1
Code no.   Code                 LevelCode (±1, ±2, ...)
0-1        1s                   ±1
2-3        01s                  ±2
4-5        001s                 ±3
...        ...                  ...
28-29      00000001s            ±15
30->       000000001xxxxxxs     ±16 ->

VLC Table - Level 2&3 (VLC-2&3)
Lev-VLC2
Code no.   Code                 LevelCode (±1, ±2, ...)
0-3        1xs                  ±1 to ±2
4-7        01xs                 ±3 to ±4
...        ...                  ...
56-59      00000001xs           ±29 to ±30
60->       000000001xxxxxxs     ±31 ->
Lev-VLC3
Code no.   Code                 LevelCode (±1, ±2, ...)
0-7        1xxs                 ±1 to ±4
8-15       01xxs                ±5 to ±8
...        ...                  ...
112-119    00000001xxs          ±57 to ±60
120->      000000001xxxxxxs     ±61 ->

VLC Table - Level 4&5 (VLC-4&5)
Lev-VLC4
Code no.   Code                 LevelCode (±1, ±2, ...)
0-15       1xxxs                ±1 to ±8
16-31      01xxxs               ±9 to ±16
...        ...                  ...
224-239    00000001xxxs         ±113 to ±120
240->      000000001xxxxxxs     ±121 ->
Lev-VLC5
Code no.   Code                 LevelCode (±1, ±2, ...)
0-31       1xxxxs               ±1 to ±16
32-63      01xxxxs              ±17 to ±32
...        ...                  ...
448-479    00000001xxxxs        ±225 to ±240
480->      000000001xxxxxxs     ±241 ->

VLC Table - Level 6 (VLC-6)
Lev-VLC6
Code no.   Code                 LevelCode (±1, ±2, ...)
0-63       1xxxxxs              ±1 to ±32
64-127     01xxxxxs             ±33 to ±64
...        ...                  ...
896-959    00000001xxxxxs       ±449 to ±480
960->      000000001xxxxxxs     ±481 ->

Parsing Process for Level Information
• Level_code is obtained from the tables.
• To save bits, the magnitude of the first coded level is reduced by one when fewer than 3 trailing ones are present; the decoder restores it:
    if (first_coefficient && trailing_ones() < 3)
        coeff_level = (|level_code| + 1) * sign(level_code)
    else
        coeff_level = level_code
    end
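The first rows of the Lev-VLC tables follow one pattern: a unary prefix, then N-1 magnitude bits and a sign bit for table N >= 1, while VLC0 interleaves positive and negative levels in a purely unary code. A minimal Python sketch of that pattern and of the first-level rule above. It assumes the elided rows continue the unary prefix, leaves the escape codes out, and uses hypothetical function names:

```python
def encode_level(level, table):
    """Bit string for a non-escape level in Lev-VLC `table` (0..6)."""
    if level == 0 or not 0 <= table <= 6:
        raise ValueError("level must be non-zero and table in 0..6")
    magnitude = abs(level)
    if table == 0:
        # Code numbers 0, 1, 2, 3, ... stand for +1, -1, +2, -2, ...
        index = 2 * (magnitude - 1) + (1 if level < 0 else 0)
        if index > 13:
            raise ValueError("escape code needed (not covered by this sketch)")
        return "0" * index + "1"
    shift = table - 1                          # number of 'x' bits in the table
    prefix = (magnitude - 1) >> shift
    if prefix > 14:
        raise ValueError("escape code needed (not covered by this sketch)")
    low = (magnitude - 1) & ((1 << shift) - 1)
    suffix = format(low, "b").zfill(shift) if shift else ""
    sign = "1" if level < 0 else "0"
    return "0" * prefix + "1" + suffix + sign


def coeff_level_from_code(level_code, first_coefficient, trailing_ones):
    """Undo the magnitude reduction applied to the first coded level."""
    if first_coefficient and trailing_ones < 3:
        sign = 1 if level_code > 0 else -1
        return (abs(level_code) + 1) * sign
    return level_code
```

For the worked example, `encode_level(1, 0)` gives `1` and `encode_level(3, 1)` gives `0010`, the two level codes listed in the encoding table.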


Table of Total_Zeros total_ zeros 0 National Cheng Kung University, Tainan, Taiwan 1 2 3 4 5 6 7 8 9 10 11 12 13 14 13 -78 Total. Coeff( coeff_token ) 1 1 010 0011 0010 0001 1 0001 0 0000 11 0000 10 0000 011 0000 010 0000 0011 0000 0010 0001 1 0000 0001 0 2 111 110 101 100 011 0100 0011 0010 0001 1 0001 0 0000 11 0000 10 0000 01 3 0101 110 101 0100 0011 100 011 0010 0001 1 0001 0 0000 01 0000 00 4 0001 1 111 0100 110 101 100 0011 0010 0001 0 0000 1 0000 0 5 0101 0100 0011 110 101 100 011 0010 0000 1 0000 0 6 0000 01 0000 1 110 101 100 011 010 0001 0000 00 Department of Electrical Engineering, Institute of Computer and Communication Engineering 7 0000 01 0000 1 100 011 11 010 0001 0000 00

Table of Total_Zeros Total_ Zeros National Cheng Kung University, Tainan, Taiwan 13 -79 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Total. Coeff( coeff_token ) 8 9 10 11 12 13 14 15 101000 101001 1011 110 00 111001 1111 00 01 10000 10001 101 01 11 11000 11001 111 0 10 1001 101 0 11 100 101 11 0 - 00 01 1 - 0 1 - 111 01 100 - 10 110 - 00 - - - Department of Electrical Engineering, Institute of Computer and Communication Engineering

CAVLC Encoding Procedure (4)
Element         Value                        Code
Coeff_token     TotalCoeffs=5, T1s=3         0000100
T1 sign (4)     +                            0
T1 sign (3)     -                            1
T1 sign (2)     -                            1
Level (1)       +1 (use Level_VLC0)          1
Level (0)       +3 (use Level_VLC1)          0010
TotalZeros      3                            111
Run_before(4)   ZerosLeft=3; run_before=1    10
Run_before(3)   ZerosLeft=2; run_before=0    1
Run_before(2)   ZerosLeft=2; run_before=0    1
Run_before(1)   ZerosLeft=2; run_before=1    01
Run_before(0)   ZerosLeft=1; run_before=1    No code required; last coefficient
Syntax elements, in order: TotalCoeffs and TrailingOnes, sign, levabs, total_zeros, run_before

Initialization of ZerosLeft
• zerosLeft = 0 when TotalCoeff == maxNumCoeff
• otherwise zerosLeft = total_zeros

Table of ZerosLeft
Codes for run_before, selected by ZerosLeft:
run_before   ZerosLeft
             1     2     3     4     5     6     >6
0            1     1     11    11    11    11    111
1            0     01    10    10    10    000   110
2            -     00    01    01    011   001   101
3            -     -     00    001   010   011   100
4            -     -     -     000   001   010   011
5            -     -     -     -     000   101   010
6            -     -     -     -     -     100   001
7            -     -     -     -     -     -     0001
8            -     -     -     -     -     -     00001
9            -     -     -     -     -     -     000001
10           -     -     -     -     -     -     0000001
11           -     -     -     -     -     -     00000001
12           -     -     -     -     -     -     000000001
13           -     -     -     -     -     -     0000000001
14           -     -     -     -     -     -     00000000001
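The ZerosLeft initialization and the run_before table combine into one backward pass over the non-zero coefficients. A minimal Python sketch; the code strings in `RUN_BEFORE` are written from the run_before table of the H.264 standard as I understand it, so they should be checked against the standard before any real use, and `encode_runs` is a hypothetical name:

```python
# run_before codes indexed by ZerosLeft (1..6), then by run_before.
RUN_BEFORE = {
    1: ["1", "0"],
    2: ["1", "01", "00"],
    3: ["11", "10", "01", "00"],
    4: ["11", "10", "01", "001", "000"],
    5: ["11", "10", "011", "010", "001", "000"],
    6: ["11", "000", "001", "011", "010", "101", "100"],
}
# ZerosLeft > 6: run_before 0..6 use 3-bit codes, 7..14 use a growing unary code.
RUN_BEFORE_GT6 = (["111", "110", "101", "100", "011", "010", "001"]
                  + ["0" * k + "1" for k in range(3, 11)])


def encode_runs(coeffs, max_num_coeff=16):
    """Return (ZerosLeft, run_before, code) for each coded run, highest frequency first."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)
    total_zeros = nonzero[-1] + 1 - total_coeffs if nonzero else 0
    zeros_left = 0 if total_coeffs == max_num_coeff else total_zeros
    coded = []
    # The run of the last (lowest-frequency) coefficient is implied, so stop at k = 1.
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break                       # no zeros remain, nothing more to signal
        run = nonzero[k] - nonzero[k - 1] - 1
        table = RUN_BEFORE.get(zeros_left, RUN_BEFORE_GT6)
        coded.append((zeros_left, run, table[run]))
        zeros_left -= run
    return coded
```

For the example sequence 0, 3, 0, 1, -1, -1, 0, 1, 0, … this yields the codes 10, 1, 1 and 01, and the last coefficient needs no code.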

Comparisons of Video Standards Feature/ Standard National Cheng Kung University, Tainan, Taiwan Macroblock size MPEG-1 MPEG-2 MPEG-4 part 10 H. 264 16 x 16 (frame mode) 16 x 8 (field mode) 16 x 16 8 x 8, 8 x 16, 16 x 8, 16 x 16, 4 x 8, 8 x 4, 4 x 4 Block Size 8 x 8 16 x 16, 8 x 8, 16 x 8 [20] Transform DCT Wavelet transform ABT 8 x 8 Variable – depends on the block size Quantized step size Increases with constant increment Vector quantizatio n used step sizes that increase at the rate of 12. 5%. Entropy coding VLC (different VLC tables for Intra & Inter modes) VLC and CABAC Motion Esti. & MC 13 -83 Transform size Yes Yes More flexible Department of Electrical Engineering, Institute of Computer and Communication Engineering

Comparisons of Video Standards MPEG-2 MPEG-4 part 10/H. 264 Integer ½-pel Integer ¼-pel, ½-pel Integer ¼ -pel, 1/8 - pel Profiles No 5 profiles Several levels within a profile 8 profiles Several levels within a profile 3 profiles Several levels within a profile Reference frame Yes One frame Yes Multiple frames (as many as 5 frames allowed) Picture Types I, P, B, D I, P, B, S Playback & Random Access Yes Yes Error robustness Synchron. & concealment Data partition, redundancy, FEC for important packet transmission Synchron. , Data partitioning, Header extension, Reversible VLCs Deals with packet loss and bit errors in error-prone wireless networks Transmission rate Up to 1. 5 Mbps 2 -15 Mbps 64 kbps - ~2 Mbps Encoder complexity 13 -84 MPEG-1 Pel accuracy National Cheng Kung University, Tainan, Taiwan Feature/Standard Low Medium High Compatible with previous standards Yes Yes No Department of Electrical Engineering, Institute of Computer and Communication Engineering 64 kbps– 150 Mbps

Digital 3D Video
Rate-Distortion Optimization
• Siwei Ma, Wen Gao, and Yan Lu, "Rate-Distortion Analysis for H.264/AVC Video Coding and its Application to Rate Control," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1533-1544, December 2005.
• Gary J. Sullivan and Thomas Wiegand, "Rate-Distortion Optimization for Video Compression," IEEE Signal Processing Magazine, pp. 74-90, Nov. 1998.

Typical Video Coder
[Figure: block diagram of a typical video coder; the dotted box shows the decoder. Encoder path: input frame, inter/intra estimation and mode decision, inter/intra compensated prediction, DCT, quantization, entropy coding. Decoder path: entropy decoding, inverse quantization, inverse DCT, approximated input frame, and a buffer (delay) holding the prior coded approximated frame. Coded outputs: residual data (R_R), motion vector and prediction mode data (R_M), header information (R_H). D is the distortion between the input frame and the approximated input frame.]

Rate and Distortion for Video Data
• Rate, $R = R_R + R_M + R_H$
  – The coding bits, including header, mode information, inter/intra information, and residual data after inter/intra prediction
• Objective distortion measure, $D(I_n(s), \tilde{I}_n(s))$
  – Mean square error (MSE)
  – Peak signal-to-noise ratio (PSNR)
  – Measures the fidelity to the original video
• Subjective distortion measure
  – Human visual system (HVS) based
  – Emphasizes visual quality rather than fidelity
For closed-form expressions and derivations, the mean square error (MSE) is used:
$D = \frac{1}{|S|} \sum_{s \in S} \left( I_n(s) - \tilde{I}_n(s) \right)^2$, where S is the set of pixel positions in frame n.
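The two objective measures are easy to compute. A minimal Python sketch, assuming frames are given as flat lists of 8-bit samples, so the PSNR peak value of 255 is an assumption about the sample depth; `mse` and `psnr` are hypothetical helper names:

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally long sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, peak=255):
    """PSNR in dB; infinite when the two frames are identical."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

PSNR is a logarithmic rescaling of the MSE, which is why it is reported in plots while the MSE is preferred in derivations.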

Shannon RD Curves
RDO goal: small rate and small distortion
RDO dilemma: distortion D versus rate R
[Figure: distortion D versus rate (bps) curves for Coder A, Coder B and Coder C]
For a Gaussian source $N(0, \sigma^2)$, the RD curve can be expressed by
$R(D) = \frac{1}{2} \log_2 \frac{\sigma^2}{D}$ for $0 < D \le \sigma^2$, and $R(D) = 0$ for $D > \sigma^2$.

Constrained Optimization
[Figure: contours f(x, y) = d1 and f(x, y) = d2 together with the constraint curve g(x, y) = c]
• Minimize f(x, y) subject to g(x, y) < c
• Minimize f(x, y) subject to g(x, y) = c

Rate Distortion Optimization (RDO)
RDO concept: minimize the distortion D, subject to a constraint $R_d$ on the number of bits used:
  min {D}, subject to $R < R_d$.
The rate-distortion optimization that minimizes the distortion under a given target rate can be written with the Lagrangian method as
  $\min \{ J \}$, where $J = D + \lambda R$.
The optimal solution is at the same RD slope:
  $\frac{\partial D}{\partial R} = -\lambda$.
The optimal mode in each MB is such that the resulting RD slope is the same among different MBs.
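One common way to apply the Lagrangian formulation is to choose, for each macroblock, the option with the smallest cost J = D + λR. A minimal Python sketch, assuming the distortion and rate of every candidate have already been measured; the candidate names and numbers below are made-up illustration values, and `best_option` is a hypothetical helper:

```python
def lagrangian_cost(distortion, rate, lam):
    """J = D + lambda * R."""
    return distortion + lam * rate


def best_option(candidates, lam):
    """Pick the candidate name with the smallest Lagrangian cost.
    `candidates` maps a name to a (distortion, rate_in_bits) pair."""
    return min(candidates, key=lambda name: lagrangian_cost(*candidates[name], lam))


# Made-up (distortion, bits) pairs for one macroblock.
example_candidates = {
    "SKIP": (100.0, 1),
    "INTER": (40.0, 30),
    "INTRA": (20.0, 120),
}
```

A small λ favors low distortion (INTRA wins at λ = 0.1), a large λ favors low rate (SKIP wins at λ = 10), and INTER wins in between at λ = 1. Using the same λ for every macroblock is what places them all at the same RD slope.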

RDO Problems
The design and operation involve the optimization of a number of decisions, including:
1. How to segment each picture into areas (block sizes)
2. Whether or not to replace each area of the picture with completely new INTRA-picture content
3. If not INTRA content:
   a. How to do motion estimation?
   b. How to do DFD coding?
4. If new INTRA content, what approximation to send as the replacement content
What part of the image should be coded using what method?

Motion Estimation for Coding Efficiency
Conventional motion estimation
  – ME criterion: minimize the prediction residuals, ignoring the motion vector bit-rate cost
Motion estimation in low-bit-rate video
  – In low-bit-rate video coding, motion vectors may occupy a significant portion of the total bit rate.
  – Rate-constrained motion estimation: the motion vector rate is weighted by a Lagrange multiplier and added to the prediction error.
  – However, this is not yet the ultimate rate-distortion optimization for the best overall coding performance.

Experimental Results
[Figure: experimental results from G. Sullivan and L. Baker, "Rate-distortion optimized motion compensation for video compression using fixed or variable size blocks," Globecom 1991]

Lagrange Multiplier Optimization
Motion estimation: in considering each possible MV (including possible fractional pixels) to send for a picture area (including possible variable block sizes), an encoder should perform an optimized coding of the residual error and measure the resulting bit usage and distortion.
• Case 1: INTER coding using regions of size 16x16 samples (choosing between the SKIP mode signaled with "1" and the INTER mode with codeword "0" followed by a MV for the 16x16 region)
• Case 2: INTER coding using blocks of size 8x8 samples (choosing between the SKIP mode signaled with "1" and the INTER+4V mode with codeword "0" followed by four MVs for the 8x8 regions)
• Case 3: combining Cases 1 and 2 using a rate-constrained encoding strategy, which adapts the frequency of using the various region sizes by Lagrange multiplier optimization (choosing between "1" for SKIP, "01" for INTER with a MV for the 16x16 region, and "00" for INTER+4V with four MVs for the 8x8 regions)

RDO for Motion Estimation

RDO for Mode Selection
In the mode-selection criterion, A denotes the search range, M ∈ {INTRA, SKIP, INTER, INTER+4V} indicates the mode chosen for the MB, and Q is the selected quantizer step size. Ideally, the choice of the quantizer step size Q should be optimized in a rate-distortion sense.

RDO Problems

RDO for Selection of Step Size, Q
Quantization parameter selection:

Selection of Lagrange Multiplier
Quantization parameter vs. λ:

Selection of Lagrange Multiplier
Rate curve: $R(D) = a \ln\left(\frac{\sigma^2}{D}\right)$, where a depends on the source probability density function.
Taking the derivative of the rate curve with respect to D, we get $\frac{dR}{dD} = -\frac{a}{D}$.
Hence, the optimal Lagrange multiplier becomes $\lambda = -\frac{dD}{dR} = \frac{D}{a} = c\,Q^2$ with $c = 4/(12a)$, since the uniform quantizer tells us $D = \frac{(2Q)^2}{12}$.

Relationship between QSTEP and QP
In H.264/AVC, the relation between QSTEP and QP is
  $\text{QSTEP} = 2^{(QP-4)/6}$.
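The QSTEP relation means the step size doubles every 6 QP steps. A minimal Python sketch that evaluates the formula as stated on the slide; `qstep` is a hypothetical helper name:

```python
def qstep(qp):
    """Quantizer step size from QP: QSTEP = 2 ** ((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)
```

For example, QP = 4 gives a step of 1, QP = 10 gives 2 and QP = 28 gives 16, and each single QP increment scales the step by 2^(1/6), roughly 12%.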

PSNR versus QP
The relation between PSNR and the quantization parameter QP is
  $\text{PSNR} = l \times QP + b$,
where l and b are constants.

Rate Estimation versus QP
1. The model gives the estimated number of coded bits of a macroblock.
2. SAD_i is the SAD of a motion-compensated macroblock.
3. t = I, P, B (picture type).
4. The first term reflects the bits used to code the transform coefficients.
5. The second term is the bits used to code the header information of a macroblock.

Some Rate Control Models
TM5: $R(QP) = X / QP$, where X is a constant
VM8: $R_i = X_1 \cdot MAD_i / QP + X_2 \cdot MAD_i / QP^2$
TMN8: $R_i = A \left( K \sigma_i^2 / QP^2 + C \right)$
where
  QP: quantization parameter
  MAD_i: mean absolute difference of a residual MB
  X_1, X_2: model parameters
  A, K and C: constants
  $\sigma_i^2$: variance of the residual coefficients in a macroblock

MPEG-4 Scalable Rate Control
• MPEG-4 annex, based on VM8
• Appropriate for a single video object (a rectangular VO that covers the entire frame) and a range of bit rates and spatial/temporal resolutions
• Target bit rate is set for a certain number of frames (at the cost of a long delay)
  $R = X_1 \cdot S / Q + X_2 \cdot S / Q^2$
  Q: quantizer step size
  S: mean absolute difference of a residual frame

TMN8 Algorithm
Rate control process:
1) Measure $\sigma_i^2$, the variance of the residual coefficients in a macroblock.
2) Calculate QP_i.
3) Encode MB_i.
4) Update the model parameters K and C based on the actual number of bits generated for MB_i.
TMN8 is effective at maintaining good visual quality with a small encoder output buffer, keeping coding delay to a minimum.

H.264 RD Optimization
For a block in an inter frame, the rate-constrained motion estimation is done first to find the optimal motion vector m by minimizing
  $J(m, \lambda_{motion}) = SAD(s, c(m)) + \lambda_{motion} \cdot R(m - p)$
where s is the original video signal, c is the coded video signal, and p is the predicted motion vector.
Multiple-reference prediction: the cost additionally accounts for the choice of reference frame.

H.264 R-D Optimization
• Afterwards, the rate-constrained mode selection is performed to choose the optimal coding mode by minimizing a Lagrangian cost.
• The Lagrange multipliers λ are related to QP through a constant m.
• With a different QP, different motion vectors and modes might be selected.

Digital 3D Video
H.264/SVC and H.264/MVC

Introduction
• Why scalable video coding:
  – Fluctuations in the available bandwidths
  – Multiple video streams are needed for heterogeneous clients
• Scalable video coding:
  – A video bit stream is called scalable if part of the stream can be removed in such a way that the resulting bit stream is still decodable
[Figure: clients served at 2, 4, 6, 8 and 10 Mb/s]

SVC Principle – Single Encoding
[Figure courtesy of "Scalable Video Coding: Scalable extension of H.264/AVC", Vincent Botreau, Thomson]

SVC Principle – Multiple Decoding
[Figure courtesy of "Scalable Video Coding: Scalable extension of H.264/AVC", Vincent Botreau, Thomson]

Scalable Video Coding (SVC)
• Heterogeneous media delivery and devices:
  – Different users
  – Different needs
  – Different processors and displays
  – Different links and various bandwidths
• Flexible source coding with scalability is needed
  – Simple adaptation to different bit rates, frame rates or spatial resolutions of the video content at the bit-stream level
• Realization of a fully scalable video coding scheme as an extension of H.264/AVC
  – HHI proposed an SVC scheme that incorporates the concept of Motion Compensated Temporal Filtering (MCTF) into the H.264/AVC framework.
  – This approach outperforms all competing approaches, which used spatial wavelets, by far.
  – HHI's proposal has been selected as the basis for MPEG-4 SVC.

Requirements for SVC Standards
• Superior coding efficiency compared to simulcasting the supported resolutions in separate bit streams
• Similar coding efficiency compared to single-layer coding for each subset of the bit stream
• Minimum increase in decoding complexity
• Support for a backward-compatible base layer
• Support for simple bit-stream adaptations after encoding
• Partial decoding of the bit stream allows:
  – Graceful degradation in case part of the bit stream is lost
  – Bit-rate adaptation
  – Format adaptation
  – Power adaptation

Basics of Scalable Video Coding
• Straightforward extension to H.264 with very limited added complexity
• Layered approach
  – One base layer
  – One or more enhancement layers
• The base layer is H.264/AVC compliant, so the base layer of an SVC stream can be decoded by an H.264 decoder.
• Enhancement layers enable temporal, spatial or quality (SNR) scalability.
• Region-of-interest scalability is also required, wherein the subsets of the bit stream represent spatially contiguous regions of the original picture area.
• Multiple scalability features can be combined to support various spatio-temporal resolutions and bit rates within a single bit stream.

Structure of SVC Coder
[Figure: SVC coder structure. The input and its spatially decimated version are each processed by temporal scalable coding, prediction, base layer coding and SNR scalable coding, and the resulting layers are multiplexed into one bit stream.]

Temporal Scalability (Dyadic Prediction Structure)
[Figure: dyadic hierarchical prediction over one GOP between two key pictures. Temporal layers in display order: T0 T3 T2 T3 T1 T3 T2 T3 T0. Frame rates: 3.75, 7.5, 15 and 30 fps.]
• Tx: temporal layer identifier
• Structural delay = 7 frames
• Group of Pictures (GOP)
  – Key picture: typically intra-coded
  – Hierarchically predicted B pictures: motion-compensated prediction

Hierarchical B-pictures
• Hierarchical prediction structures
[Figure: three prediction structures, each labeled with the coding order of the pictures listed in display order.]
  – Hierarchical B pictures (one GOP marked): 0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
  – Non-dyadic hierarchical prediction: 0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
  – Hierarchical prediction with zero delay: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
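Assuming the numbers under the first structure are coding-order positions listed in display order, they can be generated by coding each key picture first and then repeatedly coding the middle picture of every remaining interval. The sketch below is illustrative only; `hierarchical_coding_order` is a hypothetical name and the GOP size is assumed to be a power of two.

```python
def hierarchical_coding_order(gop_size: int, num_gops: int) -> list[int]:
    """Return order[d] = coding position of the picture at display position d."""
    total = gop_size * num_gops + 1
    order = [-1] * total
    order[0] = 0                        # the first key picture is coded first
    next_pos = 1

    def code_between(lo: int, hi: int) -> None:
        """Code the middle picture of (lo, hi), then both halves, left first."""
        nonlocal next_pos
        mid = (lo + hi) // 2
        if mid == lo:                   # no picture left between lo and hi
            return
        order[mid] = next_pos
        next_pos += 1
        code_between(lo, mid)
        code_between(mid, hi)

    for g in range(num_gops):
        lo, hi = g * gop_size, (g + 1) * gop_size
        order[hi] = next_pos            # key picture closing this GOP
        next_pos += 1
        code_between(lo, hi)
    return order


order = hierarchical_coding_order(8, 2)
print(order)
```

With a GOP size of 8 and two GOPs, the result matches the first row of numbers on the slide.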

Hierarchical B-pictures
• IPP: GOP size 1
  – No temporal scalability
  – Only temporal level 0
• IBP: GOP size 2
  – Temporal levels 0, 1
• GOP size 4
  – Temporal levels 0, 1, 2
• GOP size 8
  – Temporal levels 0, 1, 2, 3

Spatial Scalability
The base layer contains a reduced-resolution version of each coded frame. Decoding the base layer alone produces a low-resolution output sequence; decoding the base layer together with the enhancement layer(s) produces a higher-resolution output.
Encoding process:
  – Subsample the original and encode it to form the base layer.
  – Decode the base layer and upsample it to the original resolution to form the prediction.
  – Subtract the prediction from the original to get the residue.
  – Encode the residue to form the enhancement layer.

Spatial Scalability
• A single-layer decoder decodes only the base layer to produce a reduced-resolution output sequence.
• A multi-layer decoder can reconstruct a full-resolution sequence.
• Decoding process
  – Decode the base layer and up-sample it to the original resolution.
  – Decode the enhancement layer.
  – Add the decoded residual from the enhancement layer to the up-sampled base layer to form the output frame.
[Figure: decoder flow. Decode the base layer to get low-resolution (LS) video; upsample the LS video to the original resolution; decode the enhancement layer to recover the residue; add the residue to form the output frame.]
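The encode and decode steps for two spatial layers can be sketched in plain Python. Several things here are assumptions made for illustration: 2x2 averaging for subsampling, sample repetition for upsampling, and no transform, quantization, or entropy coding, so the enhancement layer restores the original exactly. All function names are hypothetical.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 block (integer division)."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample(frame):
    """Double the resolution by repeating every sample horizontally and vertically."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode(original):
    """Return (base layer, enhancement-layer residue)."""
    base = downsample(original)
    predicted = upsample(base)
    residue = [[o - p for o, p in zip(orow, prow)]
               for orow, prow in zip(original, predicted)]
    return base, residue


def decode(base, residue=None):
    """Base layer only gives low resolution; adding the residue gives full resolution."""
    if residue is None:
        return base
    predicted = upsample(base)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, residue)]


original = [[10, 12, 20, 22],
            [14, 16, 24, 26],
            [30, 32, 40, 42],
            [34, 36, 44, 46]]
base, residue = encode(original)
low_res = decode(base)              # what a single-layer decoder outputs
full_res = decode(base, residue)    # what a multi-layer decoder outputs
```

In a real codec both layers are quantized, so the full-resolution output only approximates the original.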

Spatial Scalability
• In each spatial layer, motion compensation and intra prediction are employed as in single-layer coding.
• To improve coding efficiency, inter-layer motion and residual prediction mechanisms are employed.
• Inclusion of inter-layer prediction modes:
  – Inter-layer motion prediction
  – Inter-layer residual prediction

Inter-layer Prediction in Spatial Scalability
• Enables the use of lower-layer information to improve the coding efficiency of the enhancement layers.
• The prediction is formed from the up-sampled reconstructed lower-layer signal, or by averaging that up-sampled signal with the temporal prediction.
• Inter-layer prediction does not work as well as temporal prediction for sequences with slow motion and high spatial detail.
• To improve coding efficiency for spatial scalable coding, two additional inter-layer prediction concepts are added:
  – Prediction of macroblock modes and associated motion parameters
  – Prediction of the residual signal
• One more mode, "inter-layer intra prediction", is added to handle the case in which the co-located lower-layer macroblock is intra coded.

Use of "base_mode_flag"
• For spatial enhancement layers, SVC includes a new macroblock mode, which is signaled by "base_mode_flag".
• For this macroblock type, only a residual signal is transmitted (no additional side information such as intra prediction modes or motion parameters).
• When base_mode_flag = 1:
  – The macroblock is predicted by the "inter-layer intra prediction" mode if the co-located 8x8 sub-block lies inside an intra-coded macroblock (intra_BL).
  – The macroblock is predicted by the "inter-layer motion prediction" mode when the reference layer macroblock is inter coded (BL_skip).
• These modes are not used when the flag is zero.

Inter-layer Motion Prediction
• The partitioning data of the enhancement layer macroblock, together with the associated motion vectors, are derived from the corresponding data of the co-located 8x8 block in the reference layer.
• The macroblock partitioning is obtained by up-sampling the corresponding partitioning of the co-located 8x8 block in the reference layer.
• Each MxN sub-macroblock partition in the 8x8 reference block corresponds to a (2M)x(2N) macroblock partition in the enhancement layer.
• The motion vectors are derived by scaling the reference layer motion vectors by 2.
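For 2:1 spatial scalability, the derivation above amounts to doubling every partition dimension and every motion vector component. The following is a minimal sketch; the `Partition` type, its fields, and `derive_enhancement_partition` are hypothetical names, and sub-sample motion vector precision is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    x: int                      # offset inside the block, in luma samples
    y: int
    width: int
    height: int
    mv: tuple[int, int]         # motion vector (horizontal, vertical)


def derive_enhancement_partition(ref: Partition, scale: int = 2) -> Partition:
    """Map a partition of the co-located 8x8 reference block to the enhancement layer."""
    return Partition(
        x=ref.x * scale,
        y=ref.y * scale,
        width=ref.width * scale,
        height=ref.height * scale,
        mv=(ref.mv[0] * scale, ref.mv[1] * scale),
    )


# An 8x8 reference block split into two 4x8 sub-macroblock partitions.
reference_block = [
    Partition(0, 0, 4, 8, (3, -1)),
    Partition(4, 0, 4, 8, (2, 0)),
]
enhancement_mb = [derive_enhancement_partition(p) for p in reference_block]
for p in enhancement_mb:
    print(p)
```

The two 4x8 partitions become two 8x16 partitions that together cover one 16x16 enhancement layer macroblock.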

Inter-layer Intra Prediction
• The corresponding reconstructed intra signal of the reference layer is up-sampled.
• The luma component is up-sampled using one-dimensional 4-tap FIR filters in both the horizontal and vertical directions.
• The chroma components are up-sampled by simple bilinear filters.
• This avoids reconstructing the inter-coded macroblocks in the reference layer, so single-loop decoding is possible.

Inter-layer Residual Prediction
• Can be employed for all inter-coded macroblocks, irrespective of base_mode_flag.
• Uses the base layer prediction residual to predict the enhancement layer prediction residual.
• Permits an enhancement layer video stream to be decoded with only one motion compensation loop, at the enhancement layer; no motion compensation needs to be done at the base layer.
• Reduces decoder complexity.
• The up-sampled residual of the co-located reference layer block is subtracted from the enhancement layer residual, and only the resulting difference is encoded.

Inter-layer Residual Prediction
• Each of the EL macroblocks E, F, G, H is covered by only one of the up-sampled BL macroblocks A, B, C, D.
• Without residual prediction: EL macroblock G is predicted from EL macroblock E, written as $P_{EG}$:
  $E(G) = O(G) - P_{EG}$
• With residual prediction: the residual of BL macroblock C, i.e. $O(C) - P_{AC}$, is also used to form a prediction for G:
  $E(G) = O(G) - P'_{EG} - U(O(C) - P_{AC})$
• Notation:
  – $P'_{EG}$: prediction formed from macroblock E under residual prediction mode
  – $O(\cdot)$: original pixels
  – $E(\cdot)$: prediction residual
  – $U(\cdot)$: upsampling function
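A small numeric sketch of the two residual formulas follows. It rests on simplifying assumptions: one-dimensional blocks, sample repetition as the upsampling function U, the same prediction used for P_EG and P'_EG, and no quantization. The function names are hypothetical.

```python
def upsample(samples):
    """U(.): double the length by repeating each sample."""
    return [v for v in samples for _ in range(2)]


def residual_without_prediction(orig_g, pred_eg):
    """E(G) = O(G) - P_EG"""
    return [o - p for o, p in zip(orig_g, pred_eg)]


def residual_with_prediction(orig_g, pred_eg, bl_residual):
    """E(G) = O(G) - P'_EG - U(O(C) - P_AC)"""
    return [o - p - u for o, p, u in zip(orig_g, pred_eg, upsample(bl_residual))]


def reconstruct(el_residual, pred_eg, bl_residual):
    """Decoder side: add the prediction and the up-sampled BL residual back."""
    return [e + p + u for e, p, u in zip(el_residual, pred_eg, upsample(bl_residual))]


orig_c, pred_ac = [10, 20], [8, 21]                # BL macroblock C and its prediction
bl_residual = [o - p for o, p in zip(orig_c, pred_ac)]

orig_g = [11, 12, 19, 20]                          # EL macroblock G
pred_eg = [8, 9, 21, 20]                           # prediction of G from E

plain = residual_without_prediction(orig_g, pred_eg)
refined = residual_with_prediction(orig_g, pred_eg, bl_residual)
rebuilt = reconstruct(refined, pred_eg, bl_residual)
print(plain, refined, rebuilt)
```

In this made-up example the residual energy drops when the BL residual is used, which is the situation in which an encoder would choose residual prediction.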

Spatial + SNR Scalability Encoding
[Figure: encoder block diagram with four layers identified by (D, Q). The downsampled input feeds the SVC base layer (H.264 encoding; D=0, Q=0) and an SVC enhancement layer (D=0, Q=1). After upsampling, the full-resolution input feeds two further SVC enhancement layers (D=1, Q=0 and D=1, Q=1). Each layer applies ME, MC, intra prediction, and inter-layer prediction, followed by quantization, entropy coding, and, where shown, deblocking. All layers are multiplexed into the SVC bitstream.]

SNR Scalability
• Types of SNR scalability:
  – Coarse Grain Scalability (CGS)
  – Medium Grain Scalability (MGS)
  – Fine Grain Scalability (FGS): not supported by the SVC standard because of very poor enhancement layer coding efficiency
• Bit-rate adaptation at the same spatial/temporal resolution
• Provides graceful degradation of quality
• Error resilience

SNR (Quality) Scalability
[Figure: quality levels 0, 1, and 2 built up by adding SNR layer 1 and SNR layer 2.]
SVC supports up to 16 SNR layers for each spatial layer.

CGS SNR Scalability
• Coarse Grain Scalability
  – Can be considered a special case of spatial scalability in which the enhancement layer has the same picture size as the base layer.
  – The enhancement layer is coded with a lower quantization parameter.
  – Allows only a few selected bit rates to be supported in the scalable bit-stream.

MGS SNR Scalability
Medium Grain Scalability (MGS)
• Throwing away an entire SNR enhancement layer results in a rapid loss in quality.
• The enhancement layer SNR packets can be removed in any order to reduce the bit rate.
  – Removing the right packets can provide a graceful degradation in quality.
• Example:
  – The (dotted) blue packets could be removed first to achieve a slight reduction in bit rate.
  – If more reduction in bit rate is needed, the dotted red/green packets could also be removed.
[Figure: packets of SNR layer 0 and SNR layer 1, with removable packets drawn dotted.]
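Packet-level bit-rate adaptation of this kind can be sketched as follows. The packet list, the `priority` field, and the function name `adapt_bit_rate` are all hypothetical; the slide only states that enhancement packets may be removed in any order, so the ordering rule here (least important first) is an assumption.

```python
def adapt_bit_rate(packets, target_bits):
    """Drop enhancement-layer packets, least important first, until the stream fits.

    Base layer packets (layer 0) are never dropped.
    """
    kept = list(packets)
    total = sum(p["bits"] for p in kept)
    droppable = sorted((p for p in packets if p["layer"] > 0),
                       key=lambda p: p["priority"])
    for p in droppable:
        if total <= target_bits:
            break
        kept.remove(p)
        total -= p["bits"]
    return kept, total


packets = [
    {"id": "BL",     "layer": 0, "priority": 9, "bits": 400},
    {"id": "EL-key", "layer": 1, "priority": 3, "bits": 300},
    {"id": "EL-T1",  "layer": 1, "priority": 2, "bits": 200},
    {"id": "EL-T2",  "layer": 1, "priority": 1, "bits": 100},
]

kept, total = adapt_bit_rate(packets, target_bits=850)
kept_ids = [p["id"] for p in kept]
print(kept_ids, total)
```

Because only packets are discarded, the adaptation needs no re-encoding and can run in a network node.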

SNR Scalability and Drift
• Drift: the effect of a lack of synchronization between the motion-compensated prediction loops at the encoder and decoder.
  – The synchronization loss may occur when quality refinement packets are removed from the bit-stream before decoding.
• There is a tradeoff between enhancement layer coding efficiency and drift.

SNR Scalability and Drift
• Previously used concepts for trading off enhancement layer coding efficiency and drift:
• BL-only control
  – No drift propagation
  – Efficient BL, inefficient EL
  – MPEG-4 FGS
• EL-only control
  – Drift propagation in both BL and EL
  – Inefficient BL, efficient EL
  – MPEG-2 FGS
• Two-loop control
  – No drift in BL
  – Drift propagation in EL only
  – High complexity
  – Efficient BL, medium-efficient EL
  – H.262, H.263, MPEG-4

"Key Pictures" in SVC
• SVC can use a combination of the three schemes described earlier.
  – Key pictures are used to close the drift.
• Key pictures for containing the drift:
  – Normal pictures: use the highest quality level reconstruction for MCP.
  – Key pictures (closed-loop pictures): use the lowest quality level reconstruction for MCP.
  – Drift does not propagate beyond the key picture.
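A toy simulation can illustrate how key pictures contain drift. Everything here is a deliberate simplification and an assumption: each "picture" is a single integer, prediction is just the previous reconstruction, the base layer quantizes the residual with a coarse step, and the enhancement layer carries the exact remainder. Function names are hypothetical.

```python
def quantize(value: int, step: int) -> int:
    """Round value to the nearest multiple of step."""
    return step * ((value + step // 2) // step)


def encode(source, key_period: int, bl_step: int):
    """Return a list of (is_key, base layer residual, enhancement refinement)."""
    ref_hi = 0        # highest-quality reconstruction of the previous picture
    ref_key_lo = 0    # base-quality reconstruction of the previous key picture
    stream = []
    for t, x in enumerate(source):
        is_key = t % key_period == 0
        # Key pictures predict from base quality, normal pictures from highest quality.
        ref = ref_key_lo if is_key else ref_hi
        residual = x - ref
        bl = quantize(residual, bl_step)
        el = residual - bl
        stream.append((is_key, bl, el))
        if is_key:
            ref_key_lo = ref + bl
        ref_hi = ref + bl + el
    return stream


def decode(stream, keep_el: bool):
    """Decode with or without the enhancement (quality refinement) packets."""
    ref, ref_key_lo, out = 0, 0, []
    for is_key, bl, el in stream:
        refinement = el if keep_el else 0
        if is_key:
            ref_key_lo = ref_key_lo + bl      # always in sync with the encoder
            rec = ref_key_lo + refinement
        else:
            rec = ref + bl + refinement       # inherits any error already in ref
        out.append(rec)
        ref = rec
    return out


source = [3, 6, 9, 12, 15, 18, 21, 24, 27]
stream = encode(source, key_period=4, bl_step=8)
full = decode(stream, keep_el=True)
base_only = decode(stream, keep_el=False)
errors = [abs(s - r) for s, r in zip(source, base_only)]
print(base_only)
print(errors)
```

In this run the error of the base-only decoder grows inside each group of pictures and falls back to at most half the base layer step at every key picture (indices 0, 4, 8).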

"Key Pictures" in SVC
• Requires both the lowest quality and the highest quality to be reconstructed at key pictures.
• To limit the decoding overhead for key pictures, SVC does not allow motion parameters to change between the base and enhancement layer representations of key pictures.
• This means that enhancement quality levels are not allowed motion refinement for key pictures.
• Only one motion compensation is sufficient.
• Single-loop decoding is possible in key pictures too!

"Key Pictures" in SVC
[Figure: two examples, "Drift due to intermediate picture" and "Drift due to first EL picture itself".]
• The drift propagates only until the next key picture.
• The base layer key frame needs to be de-blocked twice:
  – The fully decoded base layer key frame is used as reference for the next key frame.
  – The partially decoded key frame is used for inter-layer prediction.

SVC Encoder

SVC: Combined Scalability
Spatio-Temporal-Quality Cube

Digital 3D Video: Multiview Video Compression

MPEG Standardization
• Call for Proposals (N7567, Oct. 2005)
• Proposal competition (M12969, Jan. 2006)
  – NTT and Nagoya University
  – Thomson and University of Southern California
  – KDDI Corp.
  – ETRI and Sejong University (M12871)
  – MERL (M12828)
  – KBS and Yonsei University (M12874)
  – Fraunhofer-HHI (M12945)
  – Technical University of Berlin

Multiview Video Capturing System
Multiview: multiple viewpoints
Why do we need multiview video coding?
  – Multiview LCD displays are available on the market
  – Increasing network bandwidth
  – Huge video redundancy
  – Future 3D video

Multiview Frame Structure
[Figure: grid of frames with time along one axis and view (1, 2, 3, ...) along the other.]

Disparity Compensated View Prediction (DCVP)
• Problem
  – High spatial correlation between different viewpoints
• Solution
  – Prediction between viewpoints
[Figure: picture types (I, P, B) of several views over time, with prediction between viewpoints.]

MVC Encoding Block Diagram
Predictions based on H.264/AVC
[Figure: encoder for view i. The input picture minus the prediction passes through T and Q to entropy coding, which outputs the bit stream. The reconstruction path runs through Q-1, T-1, and the deblocking filter into the reference picture store for view i. Mode decision selects among intra prediction, motion estimation/compensation from reference picture i, and disparity estimation/compensation from the reference picture store for other views.]

MVC Decoding Block Diagram
[Figure: decoder. The entropy decoder takes the bit stream and outputs the mode, the MVs, and the prediction residuals in DCT coefficients. The residuals pass through Q-1 and T-1 and are added to the intra or inter prediction; the sum passes through the deblocking filter to give the decoded image. Inter prediction selects its reference picture by reference picture index from View 1, View 2, View 3, ..., View N-1.]

View Synthesis Prediction
• Problem
  – Different viewpoints have different depth
    • Rotation, translation speed
• Solution
  – Synthesize virtual images before the real prediction
[Figure: views over time, with camera C and synthesized view C'. Panels: "View Synthesis Via View Warping" and "View Synthesis Via View Interpolation".]
Notation: R: rotation matrix; D: depth information; T: translation matrix; A: intrinsic matrix

How to Get Depths?
Depth information is obtained:
• From camera record
• From well-known computer vision algorithms
• Block-based depth search
  [equation omitted], where the error term denotes the average error between the block at (x, y) in camera c at time t

How to Get Depths?
• Depth maps:
  – Left: computer vision algorithm
  – Right: block-based depth search
• Compression result:
  – Depth information takes 5-10% of the total bit rate
  – The left and right depth maps give equal performance

Prediction Structure

Hierarchical B pictures
• Hierarchical B pictures
  – Fully compatible with the AVC Main profile
  – Non-dyadic decomposition is available
[Figure: one GOP between GOP boundaries. MCTF enhancement layer: L3 H1 H2 H1 H3 H1 H2 H1 L3. AVC Main profile compatible base layer: A B3 B2 B3 B1 B3 B2 B3 A.]

Hierarchical B pictures