Overview of H.264 / MPEG-4 Part 10


Number of slides: 138

Overview of H.264 / MPEG-4 Part 10
2004.10.20
Soon-kak Kwon, A. Tamhankar, K. R. Rao
Dongeui University, T-Mobile, University of Texas at Arlington

Contents
1. Introduction
2. Layered Structure
3. Video Coding Algorithm
4. Error Resilience
5. Comparison of Coding Efficiency
6. Conclusions

Introduction
▣ Scope of Image and Video Coding Standards
◈ Only the syntax and the decoder are standardized. This:
– Permits optimization beyond the obvious
– Permits complexity reduction for implementation
– Provides no guarantees of quality
[Figure: Input (image / video) → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Output (image / video), with the scope of the standard marked]

Introduction
▣ Video Coding Standards
Standard | Main applications | Year
JPEG, JPEG 2000 | Image | 1992-1999, 2000
JBIG | Fax | 1995-2000
H.261 | Video conferencing | 1990
H.262, H.262+ | DTV, SDTV | 1995, 2000
H.263, H.263++ | Videophone | 1998, 2000
MPEG-1 | Video CD | 1992
MPEG-2 | DTV, SDTV, HDTV, DVD | 1995
MPEG-4 | Interactive video | 2000
MPEG-7 | Multimedia Content Description Interface | 2001
MPEG-21 | Multimedia Framework | 2002
H.264 / MPEG-4 Part 10 | Advanced Video Coding | 2003
Fidelity Range Extensions (High profile) | Studio editing, post processing, digital cinema | August 2004

Introduction
▣ MPEG-1
◈ Formally ISO/IEC 11172-2 ('93), developed by ISO/IEC JTC 1 SC 29 WG 11 (MPEG)
– Use is fairly widespread, but mostly overtaken by MPEG-2
– Superior quality compared to H.261 when operated at higher bit rates (1 Mbps for CIF 352 x 288 resolution)
– Provides approximately VHS quality between 1-2 Mbps using SIF 352 x 240/288 resolution
– Additional technical features:
• Bi-directional motion prediction (B-pictures)
• Half-pel motion vector resolution
• Slice-structured coding
• DC-only "D" pictures

Introduction
▣ Predictive Coding with B Pictures
[Figure: sequence of I, B and P pictures]

Introduction
▣ MPEG-2 / H.262
◈ Formally ISO/IEC 13818-2 & ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC 29 WG 11 (MPEG)
– Now in wide use for DVD and standard- & high-definition DTV (the most commonly used video coding standard)
– Primary new technical feature: support for interlaced-scan pictures
– Also:
• Various forms of scalability (SNR, spatial, temporal and hybrid)
• I-picture concealment motion vectors
– Essentially the same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required
– Not especially useful below 2-3 Mbps (range ~2-5 Mbps SDTV broadcast, 6-8 Mbps DVD, 18 Mbps HDTV); picture skipping is not easy

Introduction
▣ H.263: The Next Generation
◈ ITU-T Rec. H.263 (v1: 1995): the next generation of video coding performance, developed by ITU-T
– The current premier ITU-T video standard (has overtaken H.261 as the dominant videoconferencing codec)
– Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
– Wins by a factor of two at very low rates
– Version 2 (late 1997 / early 1998) & version 3 (2000) later developed with a large number of new features
– Profiles defined early 2001
– H.263+ & H.263++ (extensions to H.263)

Introduction
▣ MPEG-4 Visual: Baseline H.263 and Many Creative Extras
◈ MPEG-4 Visual (formally 14496-2, v1: early 1999): contains the H.263 baseline design and adds essentially all prior features and many creative new extras:
– Segmented coding of shapes
– Scalable wavelet coding of still textures
– Mesh coding
– Face animation coding
– Coding of synthetic and semi-synthetic content
– 10 & 12-bit sampling
– More …
◈ v2 (early 2000) & v3 (early 2001) added later

Introduction
▣ Relationship to Other Standards
◈ Same design to be approved in both ITU-T / VCEG and ISO/IEC / MPEG
◈ In ITU-T / VCEG this is a new & separate standard
– ITU-T Recommendation H.264
– ITU-T Systems (H.32x) is modified to support it
◈ In ISO/IEC / MPEG this is a new "part" in the MPEG-4 suite
– Separate coded design from prior MPEG-4 Visual (Part 2)
– New Part 10 called "Advanced Video Coding" (AVC; similar to "AAC" in MPEG-2 as a separate audio codec)
– Not backward or forward compatible with prior standards
– MPEG-4 Systems / File Format being modified to support it
◈ H.222.0 | MPEG-2 Systems is also being modified to support it
◈ IETF working on RTP payload packetization

Introduction
▣ History of H.264 / MPEG-4 Part 10
◈ ITU-T Q.6/SG 16 started work on H.26L (L: Long Term)
◈ July 2001: H.26L demonstrated at the MPEG (Moving Picture Experts Group) call for technology
◈ December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
◈ May 2003: final approval from ISO/IEC and ITU-T
◈ The standard is named H.264 by ITU-T and MPEG-4 Part 10 by ISO/IEC
◈ Fidelity Range Extensions (August 2004): Amendment 1
◈ Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

Introduction
▣ Purpose of H.264 / MPEG-4 Part 10
◈ Higher coding efficiency than previous standards (MPEG-1, MPEG-2, MPEG-4 Part 2, H.261, H.263)
◈ Simple syntax specifications
◈ Seamless integration of video coding into all current protocols
◈ More error robustness
◈ Various applications such as video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
◈ Network friendliness
◈ Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

Introduction
▣ H.264 / MPEG-4 Part 10 Architecture
[Figure: architecture diagram]

Introduction
▣ Applications of H.264 / MPEG-4 Part 10: a broad range of applications for video content, including but not limited to the following:
◈ Video streaming over the Internet
◈ CATV: Cable TV on optical networks, copper, etc.
◈ DBS: Direct broadcast satellite video services
◈ DSL: Digital subscriber line video services
◈ DTTB: Digital terrestrial television broadcasting, cable modem, DSL
◈ ISM: Interactive storage media (optical disks, etc.)
◈ MMM: Multimedia mailing
◈ MSPN: Multimedia services over packet networks
◈ RTC: Real-time conversational services (videoconferencing, videophone, etc.)
◈ RVS: Remote video surveillance
◈ SSM: Serial storage media (digital VTR, etc.)
◈ D-Cinema: Content contribution, content distribution, studio editing, post processing

Introduction
▣ Profiles and Levels for particular applications
◈ Profile: a subset of the entire bitstream syntax; decoder designs differ according to the profile
– Four profiles: Baseline, Main, Extended and High
Profile | Applications
Baseline | Video conferencing, videophone
Main | Digital storage media, television broadcasting
Extended | Streaming video
High | Content contribution, content distribution, studio editing, post processing

Introduction
◈ Specific coding parts for the Profiles
[Figure: coding tools belonging to each profile]

Introduction
◈ Common coding parts for the Profiles
– I slice (intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
– P slice (predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
– CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

Introduction
◈ Coding parts for Baseline Profile
– Common parts: I slice, P slice, CAVLC
– FMO (flexible macroblock order): macroblocks need not be in raster scan order; a map assigns each macroblock to a slice group
– ASO (arbitrary slice order): the macroblock address of the first macroblock of a slice may be smaller than the macroblock address of the first macroblock of a preceding slice of the same coded picture
– RS (redundant slice): a slice carrying a redundant coded representation, at the same or a different coding rate, of data already coded in a primary slice

Introduction
◈ Coding parts for Main Profile
– Common parts: I slice, P slice, CAVLC
– B slice (bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
– Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

Introduction
◈ Coding parts for Extended Profile
– Common parts: I slice, P slice, CAVLC
– SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
– SI slice: a switching slice, coded similarly to an I slice
– Data partitioning: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
– Flexible macroblock order (FMO)
– Arbitrary slice order (ASO)
– Redundant slice (RS)
– B slice
– Weighted prediction

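Introduction
◈ Profile specifications (X: supported, -: not supported)
Feature | Baseline | Main | Extended | High
I & P slices | X | X | X | X
Deblocking filter | X | X | X | X
¼-pel motion compensation | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error resilience tools (flexible MB order, ASO, redundant slices) | X | - | X | -
Variable block size (16 x 16 to 4 x 4) | X | X | X | X
SP/SI slices | - | - | X | -
B slice | - | X | X | X
Interlaced coding | - | X | X | X
CABAC | - | X | - | X
Data partitioning | - | - | X | -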

Introduction
▣ Application requirements
Application | Requirements | H.264 profiles | MPEG-4 profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

Introduction
◈ Level: corresponds to the processing power and memory capability of a codec
Level number | Picture type & frame rate
1 | QCIF @ 15 fps
1.1 | QCIF @ 30 fps
1.2 | CIF @ 15 fps
1.3 | CIF @ 30 fps
2.1 | HHR @ 15 or 30 fps
2.2 | SDTV @ 15 fps
3 | SDTV: 720 x 480 x 30i, 720 x 576 x 25i; 10 Mbps (max)
3.1 | 1280 x 720 x 30p
3.2 | 1280 x 720 x 60p
4 | HDTV: 1920 x 1080 x 30i, 1280 x 720 x 60p, 2K x 1K x 30p; 20 Mbps (max)
4.1 | HDTV: 1920 x 1080 x 30i, 1280 x 720 x 60p, 2K x 1K x 30p; 50 Mbps (max)
4.2 | HDTV: 1920 x 1080 x 60i, 2K x 1K x 60p
5 | SHDTV/D-Cinema: 2.5K x 2K x 30p
5.1 | SHDTV/D-Cinema: 4K x 2K x 30p

Introduction
◈ Parameter set limits for each Level
Columns: Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
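As a rough illustration of how these limits are used, the sketch below picks the lowest level whose macroblock-rate and frame-size limits admit a given picture size and frame rate. It is a simplification: only the first two columns of the table are checked (bit rate, DPB and CPB limits are ignored), and the names `LEVEL_LIMITS`, `frame_size_in_mbs` and `lowest_level` are hypothetical helpers, not part of the standard.

```python
import math

# (level, max MB/s, max frame size in MBs), taken from the limits table on this slide
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396), ("1.3", 11880, 396),
    ("2", 11880, 396), ("2.1", 19800, 792), ("2.2", 20250, 1620), ("3", 40500, 1620),
    ("3.1", 108000, 3600), ("3.2", 216000, 5120), ("4", 245760, 8192),
    ("4.1", 245760, 8192), ("4.2", 491520, 8192), ("5", 589824, 22080),
    ("5.1", 983040, 36864),
]

def frame_size_in_mbs(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height luma picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)

def lowest_level(width, height, fps):
    """Return the first level whose MB-rate and frame-size limits are both met."""
    mbs = frame_size_in_mbs(width, height)
    for level, max_mbps, max_fs in LEVEL_LIMITS:
        if mbs <= max_fs and mbs * fps <= max_mbps:
            return level
    return None  # exceeds every level in the table

print(lowest_level(352, 288, 30))  # CIF at 30 fps
```

For example, a CIF picture is 22 x 18 = 396 macroblocks, so 30 fps needs 11 880 MB/s, which is exactly the Level 1.3 limit.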

Layered Structure
▣ Two layers: Network Abstraction Layer (NAL), Video Coding Layer (VCL)
◈ NAL
– Abstracts the VCL data, hence the name Network 'Abstraction' Layer
– Header information about the VCL format
– Appropriate for conveyance by the transport layers or storage media
– The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems
◈ VCL
– Core coding layer
– Concentrates on attaining maximum coding efficiency

Layered Structure
▣ Elements of VCL
[Figure: elements of the VCL]

Layered Structure
▣ Supported picture format: 4:2:0 chroma sampling
[Figure: Y, Cb and Cr sample grids for the CIF format (Y: 352 pels x 288 lines; Cb, Cr: 176 pels x 144 lines) and the QCIF format (Y: 176 pels x 144 lines; Cb, Cr: 88 pels x 72 lines); the figure also labels widths of 360, 180 and 90 pels]

Video Coding Algorithm
▣ Block diagram for H.264 encoder
[Figure: encoder block diagram with blocks Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Intra/Inter Mode Decision, Motion Compensation, Intra Prediction, Picture Buffering, Deblocking Filter, Motion Estimation]

Video Coding Algorithm
▣ Block diagram for H.264 decoder
[Figure: decoder block diagram with blocks Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Deblocking Filter, Video Output, Intra/Inter Mode Selection, Picture Buffering, Intra Prediction, Motion Compensation]

VC Algorithm: Intra Prediction
▣ Exploits spatial redundancy between adjacent macroblocks in a frame
▣ 4 x 4 luma block
◈ 9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down left: 3, diagonal down right: 4, vertical right: 5, horizontal down: 6, vertical left: 7, horizontal up: 8)
[Figure: 4 x 4 block of samples a to p with neighbors M (above left), A to H (above) and I to L (left), and the prediction direction of each mode]
Samples a, b, …, p are the predicted samples of the current block; the above and left samples A, B, …, M are previously reconstructed.

VC Algorithm: Intra Prediction
▣ Example of 4 x 4 luma block
◈ Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)
◈ Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)
[Figure: prediction directions for mode 4 and mode 8 over samples a to p and neighbors A to M]
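The two example predictions can be written in integer arithmetic. The sketch below assumes the usual rounding offsets, so that round(p/4 + q/2 + r/4) becomes (p + 2q + r + 2) >> 2 and round(p/2 + q/2) becomes (p + q + 1) >> 1; the function names and the neighbor values are illustrative only.

```python
def avg3(p, q, r):
    """round(p/4 + q/2 + r/4) using an integer rounding offset."""
    return (p + 2 * q + r + 2) >> 2

def avg2(p, q):
    """round(p/2 + q/2) using an integer rounding offset."""
    return (p + q + 1) >> 1

def predict_a_and_d(mode, n):
    """Predict samples a and d from neighbors n (dict keyed by 'A'..'M')."""
    if mode == 4:   # diagonal down right
        return avg3(n["I"], n["M"], n["A"]), avg3(n["B"], n["C"], n["D"])
    if mode == 8:   # horizontal up
        return avg2(n["I"], n["J"]), avg3(n["J"], n["K"], n["L"])
    raise ValueError("only modes 4 and 8 are sketched here")

# Hypothetical reconstructed neighbor values
neighbors = {"A": 10, "B": 20, "C": 30, "D": 40,
             "I": 50, "J": 60, "K": 70, "L": 80, "M": 90}
print(predict_a_and_d(4, neighbors), predict_a_and_d(8, neighbors))
```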

VC Algorithm: Intra Prediction
▣ 16 x 16 luma
◈ 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
◈ Plane: works well in areas of smoothly varying luminance. A linear 'plane' function is fitted to the upper (H) and left side (V) samples
▣ 8 x 8 luma (FRExt only)
◈ Similar to 4 x 4 luma, with low-pass filtering of the predictor to improve prediction performance
[Figure: plane prediction]

VC Algorithm: Intra Prediction
▣ Chroma always operates using full MB prediction
◈ 8 x 8 for 4:2:0 format, 8 x 16 for 4:2:2, 16 x 16 for 4:4:4
◈ Similar to the 16 x 16 luma block but with a different mode order
◈ 4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

VC Algorithm: Inter Prediction
▣ Exploits temporal redundancy
▣ Prediction of variable block sizes
▣ Sub-pel motion compensation
▣ Deblocking filter
▣ Management of multiple reference pictures

VC Algorithm: Inter Prediction
◈ Prediction of variable block size
– An MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas
– The two kinds of partition cannot be mixed, i.e. an MB cannot have both 16 x 8 and 4 x 8 partitions
– When the sub-MB partition (8 x 8) is selected, each 8 x 8 block can be further partitioned

VC Algorithm: Inter Prediction
▣ Sub-pel motion compensation
◈ Better compression performance than integer-pel MC
◈ Comes at the expense of increased complexity
◈ The gain is largest at high bit rates and high resolutions
[Figure: encoder block diagram highlighting motion estimation and compensation; MB partitions 16 x 16, 16 x 8, 8 x 16, 8 x 8 and sub-MB partitions 8 x 8, 8 x 4, 4 x 8, 4 x 4; motion vector accuracy 1/4 (6-tap filter)]

VC Algorithm: Inter Prediction
◈ Sub-pel accuracy
– A distinct MV can be sent for each sub-MB partition.
– ME can be based on multiple pictures that lie in the past or in the future in display order.
– The reference picture for ME is selected at the MB partition level.
– Sub-MB partitions within the same MB partition must use the same reference picture.

VC Algorithm: Inter Prediction
◈ Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
◈ Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples
b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
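A minimal sketch of the two interpolation formulas, assuming 8-bit samples, integer rounding offsets (+16 before the shift by 5, +1 before the shift by 1) and clipping of the half-pel value to [0, 255]. E to J are six consecutive integer-pel samples in a row, with b lying between G and H; the function names are illustrative.

```python
def clip8(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))

def half_pel(E, F, G, H, I, J):
    """b = round((E - 5F + 20G + 20H - 5I + J) / 32), clipped to 8 bits."""
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)

def quarter_pel(G, b):
    """a = round((G + b) / 2): bilinear average of an integer and a half-pel sample."""
    return (G + b + 1) >> 1

row = [10, 20, 30, 40, 50, 60]   # hypothetical samples E, F, G, H, I, J
b = half_pel(*row)
a = quarter_pel(row[2], b)
print(b, a)
```

On a flat area the filter returns the input value unchanged, since the weights sum to 32.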

VC Algorithm: Inter Prediction
▣ Adaptive deblocking filter
◈ Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
◈ Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)

VC Algorithm: Inter Prediction
▣ Management of multiple reference pictures
◈ Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer
[Figure: encoder block diagram highlighting picture buffering; management of multiple reference pictures (short term, long term)]

VC Algorithm: Transform & Quantization
▣ Transform
◈ Integer transform, multiplier free: additions and shifts in 16-bit arithmetic
◈ Hierarchical structure: 4 x 4 integer DCT + Hadamard transform
[Figure: assignment of the indices of DC (dark samples) to luma 4 x 4 blocks; the numbers 0, 1, …, 15 are the coding order for the 4 x 4 integer DCT; (0, 0), (0, 1), (0, 2), …, (3, 3) are the DC coefficients of each 4 x 4 block]
◈ The Hadamard transform is applied only when the 16 x 16 intra prediction mode is used with the 4 x 4 integer DCT
◈ Similarly for chroma: the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

VC Algorithm: Transform
▣ 4 x 4 integer DCT
◈ X: input pixels, Y: output coefficients
$Y = (C_f X C_f^{T}) \otimes E_f$
where $\otimes$ denotes element-by-element multiplication

VC Algorithm: Transform
▣ 4 x 4 inverse integer DCT
$X = C_i^{T} (Y \otimes E_i) C_i$
In both the forward and inverse transforms, the quantization step (set by QP) is embedded in the scaling matrices $E_f$ and $E_i$.
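The core part of the forward transform, C_f X C_f^T, can be sketched with plain integer arithmetic. The matrix C_F below is the 4 x 4 integer transform matrix shown in the transform block diagram of this deck. For the way back, the sketch does not use the standard's C_i and E_i; it instead relies on the rows of C_F being orthogonal with squared norms (4, 10, 4, 10), which is an equivalent exact inverse chosen here only to show that the transform is reversible. All function names are illustrative.

```python
from fractions import Fraction

C_F = [[1, 1, 1, 1],
       [2, 1, -1, -2],
       [1, -1, -1, 1],
       [1, -2, 2, -1]]
ROW_NORM_SQ = [4, 10, 4, 10]   # squared length of each row of C_F

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core(x):
    """W = C_f X C_f^T, integer only (scaling by E_f is left to quantization)."""
    return matmul(matmul(C_F, x), transpose(C_F))

def inverse_exact(w):
    """Undo forward_core exactly by dividing out the row norms of C_F."""
    scaled = [[Fraction(w[i][j], ROW_NORM_SQ[i] * ROW_NORM_SQ[j])
               for j in range(4)] for i in range(4)]
    x = matmul(matmul(transpose(C_F), scaled), C_F)
    return [[int(v) for v in row] for row in x]

block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
coeffs = forward_core(block)
print(coeffs[0][0], inverse_exact(coeffs) == block)
```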

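VC Algorithm: Transform
▣ Luma DC coefficients for an Intra 16 x 16 MB
◈ The 16 DC coefficients of the 16 (4 x 4) blocks are transformed using the Walsh-Hadamard transform
$Y_D = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} W_D \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) // 2$
where $W_D$ is the 4 x 4 block of DC coefficients and // denotes rounding to the nearest integer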
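A small sketch of the luma DC step, assuming the 4 x 4 Walsh-Hadamard matrix with ±1 entries and a final division by 2 with rounding. The rounding rule used here, (v + 1) // 2, rounds halves upward and is an assumption of this sketch; `hadamard_dc` is an illustrative name.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]

def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def hadamard_dc(w_d):
    """Y_D = (H4 * W_D * H4) / 2 with rounding (H4 is symmetric)."""
    t = matmul4(matmul4(H4, w_d), H4)
    return [[(v + 1) // 2 for v in row] for row in t]

# 16 equal DC values collapse into a single non-zero coefficient
print(hadamard_dc([[8] * 4 for _ in range(4)])[0])
```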

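VC Algorithm: Transform
▣ Chroma DC coefficients (intra prediction mode, 4 x 4 integer DCT)
◈ Walsh-Hadamard transform of the 2 x 2 DC coefficients (4:2:0):
$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} W_D \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$
[Figure: 4:2:0 chroma (U, V) coding order; 2 x 2 DC blocks 16 and 17, AC blocks 18 to 25]
◈ For the 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased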

VC Algorithm: Transform
▣ Block diagram emphasizing transform
◈ 4 x 4 integer DCT transform:
$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$
◈ Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
[Figure: encoder block diagram highlighting Transform & Quantization]

VC Algorithm: Quantization
▣ The multiplication needed for the exact transform is combined with the multiplication of scalar quantization
◈ Encoder: post-scaling and quantization
◈ Decoder: inverse quantization and pre-scaling
X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; Qstep doubles for every increment of 6 in QP. FRExt extends the QP range beyond 52 values by 6 for each additional bit of decoded sample
SF: scaling term
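The doubling rule can be made concrete with a small lookup. The six base step sizes below (for QP 0 to 5) are the values commonly tabulated for H.264 and are not given on the slide, so treat them as an assumption of this sketch. The `quantize` helper is a plain scalar quantizer that ignores the scaling term SF.

```python
# Assumed base step sizes for QP = 0..5; every +6 in QP doubles the step
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantization step size for QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range for 8-bit video")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))

def quantize(x, qp):
    """Simplified scalar quantization: round |x| / Qstep and keep the sign."""
    level = int(abs(x) / qstep(qp) + 0.5)
    return level if x >= 0 else -level

print(qstep(4), qstep(10), qstep(51), quantize(100, 28))
```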

VC Algorithm: Transform, Quantization, Rescale and Inverse Transform
[Figure: processing chain. Encoder part: input block, forward transform, post-scaling and quantization, plus a 2 x 2 or 4 x 4 DC transform (chroma or Intra 16 x 16 luma only). Encoder output / decoder input. Decoder part: 2 x 2 or 4 x 4 DC inverse transform (chroma or Intra 16 x 16 luma only), inverse quantization and pre-scaling, inverse transform, output block]

VC Algorithm: Entropy Coding
▣ All syntax elements other than residual transform coefficients are encoded with Exp-Golomb codes (UVLC)
▣ Scan order used to read the residual data (quantized transform coefficients): zig-zag, alternate
▣ Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
▣ Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile
[Figure: zig-zag scan and alternate scan]

VC Algorithm: Entropy Coding
▣ Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed, and are also called Universal Variable Length Codes (UVLC))

These are variable length codes with a regular construction:
[M zeroes] [1] [INFO]
INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO. Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on. The length of each Exp-Golomb codeword is (2M + 1) bits.
$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$
$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^{M}$

Decoding
1. Read in M leading zeroes followed by a 1
2. Read in the M-bit INFO field
3. $\mathrm{code\_num} = 2^{M} + \mathrm{INFO} - 1$ (for codeword 0, INFO and M are zero)
CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs
All other syntax elements are coded with the Exp-Golomb codes
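The construction and the three decoding steps translate directly into code. This sketch works on strings of '0' and '1' characters for readability rather than on packed bits; `exp_golomb_encode` and `exp_golomb_decode` are illustrative names.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeroes][1][INFO] as a bit string."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits

def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count leading zeroes
        m += 1
    start = pos + m + 1                      # skip the marker bit '1'
    info = int(bits[start:start + m], 2) if m > 0 else 0   # step 2
    return (1 << m) + info - 1, start + m    # step 3

for n in range(7):
    print(n, exp_golomb_encode(n))
```

Each codeword is 2M + 1 bits long, so code_num 0 costs one bit, 1 and 2 cost three bits, and 3 to 6 cost five bits.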

VC Algorithm: Entropy Coding
▣ CAVLC: treats zeros and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and of +/-1 values are coded; for the other coefficients, their levels are coded.
▣ Encoding steps
◈ Step 1: encode the total number of non-zero coefficients and the number of +/-1 (trailing ones) values
◈ Step 2: encode the sign of each trailing one in reverse order
◈ Step 3: encode the levels of the remaining non-zero coefficients in reverse order
◈ Step 4: encode the total number of zeros before the last coefficient
◈ Step 5: encode each run of zeros
H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients). These are adapted to the current stream or context (hence CAVLC).

VC Algorithm: Entropy Coding
▣ Example of CAVLC
order | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | … | 16
coeff. | c0 | c1 | c2 | 0 | 1 | 1 | 0 | -1 | 0 | 0 | … | 0
Step 1: encode the number of non-zero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
– number of non-zero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
– number of trailing ones = 3 (order 4, 5, 7)
Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
Step 3: encode the levels of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0
Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)
Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
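The five quantities of the example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols; it does not map them to VLC tables. It assumes at most three trailing ones are counted and that run coding stops once all zeros are accounted for. The values 7, -3 and 2 standing in for c0, c1 and c2 are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols of a scanned coefficient list."""
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    values = [coeffs[i] for i in positions]
    reversed_values = list(reversed(values))

    trailing_ones = 0                         # step 1: +/-1 values at the end
    for v in reversed_values:
        if abs(v) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    signs = ["+" if v > 0 else "-" for v in reversed_values[:trailing_ones]]
    levels = reversed_values[trailing_ones:]  # step 3: remaining levels
    total_zeros = positions[-1] + 1 - len(positions) if positions else 0

    runs, zeros_left = [], total_zeros        # step 5: runs, in reverse order
    for later, earlier in zip(reversed(positions), reversed(positions[:-1])):
        if zeros_left == 0:
            break
        run = later - earlier - 1
        runs.append(run)
        zeros_left -= run
    return {"total_coeffs": len(values), "trailing_ones": trailing_ones,
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}

example = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(example))
```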

VC Algorithm: Entropy Coding
▣ CABAC: uses arithmetic coding; to achieve good compression, the probability model for each symbol is updated as coding proceeds. Both MVs and residual transform coefficients are coded by CABAC.
▣ Encoding steps
◈ Step 1: context modeling: choose a suitable model
◈ Step 2: binarization: a symbol that is not binary valued is mapped into a sequence of binary decisions called bins
◈ Step 3: binary arithmetic coding using the probability estimates provided by context modeling

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

VC Algorithm: B Slice
▣ Generalized bidirectional prediction
◈ Supports not only the forward/backward prediction pair, but also forward/forward and backward/backward pairs
▣ Direct mode
◈ Derives reference picture, block size and motion vector data from the subsequent inter picture
▣ Weighted prediction
◈ Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

VC Algorithm: B Slice
▣ Generalized bidirectional prediction
◈ Multiple reference pictures mode
◈ Two forward references: suited to a region just before a scene change
◈ Two backward references: suited to a region just after a scene change

VC Algorithm: B Slice
▣ Direct mode
◈ Forward / backward pair of bi-directional prediction
◈ The prediction signal is calculated as a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures
$mv_{L0} = \dfrac{t_b \cdot mv_{Col}}{t_d}$
$mv_{L1} = -\dfrac{(t_d - t_b) \cdot mv_{Col}}{t_d}$
where $mv_{Col}$ is the MV used in the co-located MB of the subsequent picture
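The two scaling formulas can be checked with exact rational arithmetic. The sketch assumes tb is the temporal distance from the forward reference to the current picture and td the distance between the two references; a real codec uses fixed-point scaling with rounding rather than fractions, so this is an idealized illustration with a hypothetical function name.

```python
from fractions import Fraction

def direct_mode_mvs(mv_col, tb, td):
    """Scale the co-located MV into forward (L0) and backward (L1) MVs."""
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(-Fraction((td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1

# Current picture halfway between its two references
mv_l0, mv_l1 = direct_mode_mvs((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

Note that mv_L0 - mv_L1 always equals mv_Col, so the two derived vectors together span the motion of the co-located block.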

VC Algorithm: B Slice
▣ Weighted prediction
◈ Different weights for the reference signals handle gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
◈ Different weighted prediction methods apply to a macroblock of a P slice and of a B slice
◈ A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2:
$p = w_1 r_1 + w_2 r_2$
where $w_1$ and $w_2$ are weighting factors
◈ Implicit type: the factors are calculated from the temporal distance between the pictures
◈ Explicit type: the factors are transmitted in the slice header
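A simplified sketch of p = w1 r1 + w2 r2 on lists of 8-bit samples. The implicit weights here are plain linear interpolation by temporal distance (the nearer reference gets the larger weight), which is an assumption standing in for the standard's fixed-point derivation; the function names are illustrative.

```python
def clip8(v):
    return max(0, min(255, v))

def implicit_weights(tb, td):
    """Assumed linear weights: tb = distance from r1 to the current picture,
    td = distance between r1 and r2."""
    w2 = tb / td
    return 1.0 - w2, w2

def weighted_prediction(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2, rounded and clipped to the 8-bit range."""
    return [clip8(int(w1 * a + w2 * b + 0.5)) for a, b in zip(r1, r2)]

r1, r2 = [100, 200], [50, 100]
print(weighted_prediction(r1, r2, *implicit_weights(1, 2)))  # midway picture
print(weighted_prediction(r1, [0, 0], 0.5, 0.0))             # explicit fade
```

The second call mimics an explicit 'fade to black' step: the reference is scaled by 0.5, which motion compensation alone cannot express.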

VC Algorithm: SP and SI Slices (Extended profile only)
▣ Switched slice
◈ SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
◈ SI slice: a switching slice, coded similarly to an I slice
[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), linked by the switching picture S(3)]
Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

Error Resilience
▣ Parameter setting
▣ Flexible macroblock ordering (FMO)
▣ Redundant slice methods
▣ Switched slice (SP/SI), only in Extended Profile
▣ Data partitioning
▣ Arbitrary slice order (ASO)

Error Resilience: Data partitioning slices (Extended profile only)
1. The coded data of a slice is placed in three separate data partitions A, B and C.
2. A holds the slice header and the header data for each MB in the slice.
3. B holds the coded residual data for intra and SI slice MBs.
4. C holds the coded residual data for inter-coded MBs.
5. Each partition A, B and C is placed in a separate NAL unit and transported separately.

Error Resilience: Parameter setting
▣ The sequence parameter set contains all information related to a sequence of pictures
▣ A picture parameter set contains all information related to all the slices belonging to a single picture
▣ The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice
[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2 and 3; VCL data transfer with PS #3; reliable parameter set exchange. Parameter Set #3: video format NTSC, motion resolution ¼, Enc: CABAC, frame width: 11]

Error Resilience: FMO
▣ Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.
▣ Assume that every macroblock of the picture is allocated either to slice group 0 or to slice group 1, and that the macroblocks of each slice group are dispersed through the picture.
◈ If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.
▣ ASO is similar to FMO. It randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
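The two-group dispersed case can be illustrated with a checkerboard map, which is one possible dispersed pattern and is assumed here for simplicity (the standard defines several slice group map types). The sketch confirms that every macroblock's horizontal and vertical neighbors fall in the other slice group, which is what makes concealment possible when one group is lost.

```python
def checkerboard_map(width_mbs, height_mbs):
    """Slice group (0 or 1) of each macroblock, alternating like a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]

def neighbors_in_other_group(mb_map, x, y):
    """Count the 4-connected neighbors of MB (x, y) that are in the other group."""
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(mb_map) and 0 <= nx < len(mb_map[0]):
            count += mb_map[ny][nx] != mb_map[y][x]
    return count

qcif_map = checkerboard_map(11, 9)   # QCIF: 11 x 9 macroblocks
print(min(neighbors_in_other_group(qcif_map, x, y)
          for y in range(9) for x in range(11)))
```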

Error Resilience: Redundant Slice
▣ Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.
▣ For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but using fewer bits).
▣ A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. If the primary slice is missing, the redundant slice can be reconstructed instead.

Comparison of Coding Efficiency
▣ Subjective verification test
◈ Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.
◈ The H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.
Table (values in extraction order; columns are bitrates [kbps] for QCIF and for CIF: 24, 48, 96, 192, 384, 768):
Foreman | > 1x | 2x | 2x | T | 2x | > 2x | T | T
Paris | > 1x | 2x | 2x | T, 2x | T
Head | > 2x | 2x | 2x | T | T
Zoom | > 1x | 1x | 2x | 2x

Comparison of Coding Efficiency
▣ Subjective verification test
◈ Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.
◈ The H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.
Table (values in extraction order; columns are bitrates [kbps] for QCIF and for CIF: 24, 48, 96, 192, 384, 768):
Football | 2x / 1x | 2x | 2x | > 1x | 1x | > 1x
Mobile | 2x / 1x | 2x | 2x | > 2x | 4x | > 2x | T
Husky | 2x | 2x | > 1x | 2x | 2x | 2x
Tempete | 2x | 2x | > 2x | T | 2x | 2x | T, 2x | T

Comparison of Coding Efficiency
▣ Subjective verification test
◈ Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD)
◈ When compared to MPEG-2 HiQ (real-time High Quality), the H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.
◈ When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.
Table (values in extraction order; columns are bitrates [Mbps] for MPEG-2 HiQ and for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6):
Football | > 1.5x | > 1.3x | 1.5x | 2x | 1.8x | 1.3x | 1.5x
Mobile | 4x | 2.7x | 2x | T | T | > 4x | > 2.7x | > 2x | T | T
Husky | > 1.5x | 1.3x | 1x / 1.3x | 1.5x | 2.7x / 2x | 1.8x | 2x
Tempete | > 1.5x | T, 2x | T | T | T, 4x | T | T

Comparison of Coding Efficiency
▣ Subjective verification test
◈ Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD)
◈ When compared to MPEG-2 HiQ, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.
◈ When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.
Table (columns are bitrates [Mbps] for MPEG-2 HiQ and for MPEG-2 TM5: 6, 10, 20)
Formats and sequences: 720 (60p), 1080 (30i), 1080 (25p); Crew, Stockholm Pan, New Mobile & Calendar, River Bed, Harbour, Vintage Car
Values in extraction order: 1.7x | 2x | T | T, 3.3x | T | T | T, 1.7x | T | T | 1x | 2x | T, 2x | T | > 1.7x | > 1x | T | 1.7x | T, 2x | T

Comparison of Coding Efficiency
▣ Objective test
◈ PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence in the video streaming application
[Figure: PSNR and bitrate saving curves. Legend: HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile]
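For reference, PSNR between an original and a reconstructed picture is 10·log10(peak² / MSE). The sketch below assumes 8-bit samples (peak 255) passed as flat lists of equal length; `psnr` is an illustrative helper, not code from the test campaign.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf          # identical pictures
    return 10 * math.log10(peak ** 2 / mse)

print(round(psnr([10, 20, 30, 40], [11, 19, 31, 39]), 2))
```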

Comparison of Coding Efficiency
▣ Objective test
◈ PSNR and bitrate saving results of ‘Paris’ CIF 15 Hz sequence for the video conferencing application
CHC – Conversational High Compression
SP – Simple Profile
ASP – Advanced Simple Profile
H.26L – H.264 Baseline Profile
-72 -

Conclusions
▣ H.264 outperforms the previous standards
▣ Comparison of standards
Feature/Standard MPEG-1 MPEG-2 MPEG-4 part 2 (visual) H.264/MPEG-4 part 10 16 x 16 (frame mode) 16 x 8 (field mode) 16 x 16 Block Size 8 x 8 16 x 16, 16 x 8, 8 x 8 16 x 16, 8 x 16, 16 x 8, 8 x 8, 4 x 8, 8 x 4, 4 x 4 Transform 8 x 8 DCT/Wavelet 4 x 4, 8 x 8 Int DCT 4 x 4, 2 x 2 Hadamard Scalar quantization with step size of constant increment Vector quantization Scalar quantization with step size increase at the rate of 12.5% Entropy coding VLC VLC, CABAC Motion Estimation & Compensation Yes Yes, more flexible Up to 16 MVs per MB Playback & Random Access Yes Yes Macroblock size Quantization
-73 -

Conclusions ▣ Comparison of standards (continued) (visual) Feature/Standard Pel accuracy MPEG-1 MPEG-2 MPEG-4 part 2 H. 264/MPEG-4 part 10 Integer, ½-pel, ¼-pel Profiles No 5 8 4 Reference picture one one multiple forward/backward forward/forward/backward/backward I, P, B, D I, P, B Error robustness Synchronization & concealment Data partitioning, FEC for important packet transmission Transmission rate Integer, ½-pel Up to 1. 5 Mbps 2 -15 Mbps Compatibility with previous standards n/a Yes No Encoder complexity Low Medium High Bidirectional prediction mode Picture Types I, P, B, SP, SI Data partitioning, Synchronization, Parameter setting, Data partitioning, Flexible macroblock Header extension, Reversible VLCs ordering, Redundant slice, Switched slice 64 kbps - 2 Mbps 64 kbps -240 Mbps -74 -

Conclusions
▣ Several companies are currently developing commercial H.264 codecs to replace or complement existing products.
◈ Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com
-75 -

Conclusions
◈ Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft’s VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory in the Blu-ray Disc BD-ROM specification
-76 -

Conclusions
◈ Related group
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org
◈ Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
◈ Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences/
- http://trace.eas.asu.edu/yuv.html
-77 -

Conclusions
▣ H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms, decoder-encoder royalties for product manufacturers and participation fees for video streaming services regardless of Profile(s)
◈ MPEG LA website: http://www.mpegla.com
◈ Via Licensing: http://www.vialicensing.com
▣ FRExtensions: extension for various applications
◈ 4:2:2 and 4:4:4 chroma formats
◈ 12 bit resolution for medical imaging
◈ Scalable coding / lossless coding for digital cinema application
◈ High fidelity coding for the next generation optical discs
H. Schwarz, D. Marpe and T. Wiegand, “SNR–scalable extension of H.264/AVC”, ICIP 2004, vol. , pp. , Singapore, Oct. 2004.
▣ FINAL STAGES OF APPROVAL
◈ Standard systems and file format support specifications
◈ Standardizing reference software implementation
◈ Standardizing conformance bit streams and specifications
-78 -

Contacts for Further Information
▣ JVT documents and software on open ftp website: ftp://standards.polycom.com http://iphome.hhi.de/suehring
▣ JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
▣ JVT reflector e-mail: jvt-experts@mail.imtc.org
▣ JVT management team:
◈ Chair: Gary Sullivan (garysull@microsoft.com)
◈ Co-chair: Ajay Luthra (aluthra@motorola.com)
◈ Co-chair: Thomas Wiegand (wiegand@hhi.de)
▣ Dr. K. R. Rao, UTA: rao@uta.edu
▣ Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
▣ Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
▣ Karsten.suehring@hhi.fraunhofer.de
-79 -


[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., “Overview of error resiliency schemes in H.264/AVC standard”, JVCIR, Special Issue on H.264/AVC, vol. , pp. , June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept http://www.mainconcept.com/index_flash.shtml
b. Mpegable http://www.mpegable.com/show/home.html
c. Moonlight http://www.moonlight.co.il/cons_xmuxer.php
Moonlight’s codec is one of the popular ones in the industry and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.
-88 -

[70] ST Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).
[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at: http://compression.ru/video/codec_comparison/mpeg4_avc_h264_en.html
Some of the results and observations in the study may be interesting to the H.264/AVC community.
Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm
Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.
-89 -

▣ http://www.avc-alliance.org
▣ http://ftp3.itu.int/av-arch/jvt-site
▣ http://www.dvdforum.org/29cmtg-resolution.htm High Profile is now officially mandatory for HD DVD Video (DVD Forum).
▣ http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)
▣ http://tinyurl.com/6dnck (ISO/IEC 14496-10, the published MPEG-4 part 10 standard, costs CHF 260.00, i.e. Swiss francs.)
-90 -

Fidelity Range Extensions
Slices in a picture are compressed as follows:
♦ "Intra" spatial (block based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8 x 8 (FRExt-only) or 4 x 4 luma prediction – 9 modes (directions) for prediction
4:2:2, 4:4:4 formats
> 8 bit depths
(8 x 8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format]
Y Cg Co
High resolution
-91 -

♦ "Inter" temporal prediction – block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16 x 16, 16 x 8, 8 x 16, 8 x 8, 8 x 4, 4 x 8 & 4 x 4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video
-92 -

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertically adjacent pair of macroblocks
o Field scan
♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)
-93 -

♦ 8 x 8 (FRExt-only) or 4 x 4 Integer Inverse Transform (conceptually similar to the well-known DCT) ♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only) ♦ Scalar quantization ♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only) ♦ Logarithmic control of quantization step size as a function of quantization control parameter -94 -
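The logarithmic control of the quantization step size can be illustrated with a short Python sketch. It assumes the commonly described H.264 behaviour that the step size doubles for every increase of 6 in QP (roughly 12% per QP step) and takes 0.625 as the step size at QP = 0. The function name `qstep` is hypothetical, and the closed form only approximates the tabulated values in the standard.

```python
def qstep(qp: int, base: float = 0.625) -> float:
    """Approximate quantization step size for a given QP.

    Assumed model: the step doubles every 6 QP values, starting from
    `base` at QP = 0. QP is assumed to lie in 0..51 (8-bit video).
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51")
    return base * 2.0 ** (qp / 6.0)


if __name__ == "__main__":
    for qp in (0, 6, 12, 24, 51):
        print(qp, round(qstep(qp), 3))
```

Because the mapping is exponential, a fixed change in QP always scales the step size by the same factor, regardless of the starting QP.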

♦ Deblocking filter (within the motion compensation loop) ♦ Coefficient scanning o Zig-Zag (Frame) o Field (alternate scan) ♦ Lossless Entropy coding o Universal Variable Length Coding (UVLC) using Exp-Golomb codes o Context Adaptive VLC (CAVLC) o Context-based Adaptive Binary Arithmetic Coding (CABAC) -95 -

♦ Error Resilience Tools o Flexible Macroblock Ordering (FMO) o Arbitrary Slice Order (ASO) o Redundant Slices ♦ SP and SI synchronization pictures for streaming and other uses -96 -

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)
♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats
♦ Auxiliary pictures for alpha blending (FRExt-only)
Each slice need not use all these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.
-97 -

Slice
I (Intra)
P (Predicted)
B (Bidirectionally predicted) (reference for temporal prediction or non-reference)
SP (Switching P)
SI (Switching I)
-98 -

I – Slice (MB in I slice and intra MB in P and B slices)
Spatial intra prediction: 9 directional modes for (4 x 4) or (8 x 8) blocks.
Apply (4 x 4) or (8 x 8) Int DCT to intra prediction errors. Note: (8 x 8) Int DCT is FRExt-only.
After (8 x 8) Int DCT, HVS weighting is applied to coefficients (FRExt-only).
-99 -

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC).
PicAFF: field processing is similar to frame mode.
MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.
-100 -

I Slice (Spatial Prediction) • (16 x 16) Luma & Corresponding chroma block size for full MB prediction • (8 x 8) luma prediction (FRExt-only) • (4 x 4) Luma prediction -101 -

For (16 x 16) luma, full MB prediction has four modes
• Vertical: pels in the MB are predicted from the pels just above the MB
• Horizontal: pels in the MB are predicted from the pels just to the left of the MB
• DC: pels in the MB are predicted as the average value of the neighboring pels
• Planar: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
-102 -
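The three simpler full-MB modes (vertical, horizontal and DC) can be sketched in a few lines of Python. This is a minimal illustration and not the normative process: it assumes both the row above and the column to the left are available, uses a rounded average for DC, and omits the planar mode. All function names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def predict_vertical(above: List[int]) -> Block:
    """Each column repeats the reconstructed pel directly above the block."""
    n = len(above)
    return [list(above) for _ in range(n)]


def predict_horizontal(left: List[int]) -> Block:
    """Each row repeats the reconstructed pel directly to its left."""
    n = len(left)
    return [[left[r]] * n for r in range(n)]


def predict_dc(above: List[int], left: List[int]) -> Block:
    """Every pel is the rounded mean of the 2n neighbouring pels."""
    n = len(above)
    dc = (sum(above) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


if __name__ == "__main__":
    above = [100] * 16
    left = [120] * 16
    print(predict_dc(above, left)[0][0])
```

An encoder would typically compute the residual for each candidate mode and keep the one with the lowest cost; that selection step is outside this sketch.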

Chroma spatial prediction (operates on entire MB)
• 4:2:0 (8 x 8)
• 4:2:2 (8 x 16)
• 4:4:4 (16 x 16)
Similar to (16 x 16) luma MB prediction: Vertical, Horizontal, DC, Planar
-103 -
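The chroma block sizes per macroblock listed above translate directly into sample counts. The sketch below is illustrative only: the dictionary and function names are hypothetical, and sizes are given as (width, height) per chroma component alongside a 16 x 16 luma block.

```python
# Chroma block size (width, height) per component for one macroblock.
CHROMA_BLOCK_SIZE = {
    "4:2:0": (8, 8),
    "4:2:2": (8, 16),
    "4:4:4": (16, 16),
}

LUMA_SAMPLES_PER_MB = 16 * 16


def chroma_samples_per_mb(chroma_format: str) -> int:
    """Total Cb + Cr samples in one macroblock."""
    width, height = CHROMA_BLOCK_SIZE[chroma_format]
    return 2 * width * height


def total_samples_per_mb(chroma_format: str) -> int:
    """Luma plus both chroma components."""
    return LUMA_SAMPLES_PER_MB + chroma_samples_per_mb(chroma_format)


if __name__ == "__main__":
    for fmt in CHROMA_BLOCK_SIZE:
        print(fmt, total_samples_per_mb(fmt))
```

Relative to 4:2:0, a 4:4:4 macroblock carries twice as many samples in total, which is one reason the extended chroma formats target high-bit-rate applications.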

FRExt Only For (8 x 8) luma intra prediction Nine Intra_8 x 8 prediction modes similar to the nine modes for Intra_4 x 4 -104 -

FRExt Only Integer 8 x 8 Transform (luma only) -105 -

FRExt Only HVS Weighting Matrices § Matrix can be transmitted in SPS and PPS § Separate Matrix for 4 x 4 and 8 x 8 transforms Default matrices § Separate Matrix for Inter and Intra Encoder can design and use customized scaling matrices. These are to be sent to the decoder at the sequence or picture level. -106 -

HVS Weighting Matrices ▣ Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization. (This itself is a multiplication) Weighting matrices can be customized separately for 4 x 4 Intra Y 4 x 4 Intra Cb, Cr 4 x 4 Inter Y 4 x 4 Inter Cb, Cr 8 x 8 Intra Y 8 x 8 Inter Y -107 -

FRExt Only
Two scans (frame zig-zag and field), similar to those of the 4 x 4 transform, are switched for frame/field coding.
Coefficient scanning follows the order of decreasing coefficient variance, to maximize the number of consecutive zero-valued coefficients along the scan.
-108 -
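The frame zig-zag idea can be sketched as follows. The code generates the classic anti-diagonal zig-zag order for an n x n block; treating this as the frame scan is an assumption made for illustration, and the field scan tables, which are defined in the standard, are not reproduced. The function names are hypothetical.

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """Return (row, col) positions of an n x n block in classic zig-zag order.

    Positions are grouped by anti-diagonal (row + col). Odd diagonals are
    walked with increasing row, even diagonals with decreasing row.
    """
    positions = [(r, c) for r in range(n) for c in range(n)]
    positions.sort(
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0])
    )
    return positions


def scan(block: List[List[int]]) -> List[int]:
    """Serialize a square block of coefficients along the zig-zag path."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    block = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(scan(block))
```

Low-frequency coefficients, which tend to have the largest variance, come first, so the trailing part of the scanned sequence is mostly zeros after quantization.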

Examples of parameters to be encoded
Parameters: Description
Sequence-, picture- and slice-layer syntax elements: Headers and parameters
Macroblock type (mb_type): Prediction method for each coded macroblock
Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter: Transmitted as a delta value from the previous value of QP
Reference frame index: Identifies reference frame(s) for inter prediction
Motion vector: Transmitted as a difference (mvd) from the predicted motion vector
Residual data: Coefficient data for each 4 x 4 or 2 x 2 block
-109 -

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC)) -110 -

These are variable length codes with a regular construction:
[M zeros] [1] [INFO]
INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO. Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on. The length of each Exp-Golomb codeword is (2M + 1) bits.
M = floor(log2(code_num + 1))
INFO = code_num + 1 – 2^M
-111 -
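The construction above maps directly to code. The following Python sketch builds the codeword for a given code_num as a string of '0'/'1' characters; the string representation and the function name `exp_golomb_encode` are choices made here for readability, not part of the standard.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written as an M-bit field.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for k in range(8):
        print(k, exp_golomb_encode(k))
```

Small values of code_num get the shortest codewords, so the mapping from syntax element values to code_num is designed to give the most probable values the smallest numbers.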

Decoding
1. Read in M leading zeros followed by 1
2. Read M-bit INFO field
3. code_num = 2^M + INFO – 1
CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs
All other syntax elements are coded with the Exp-Golomb codes
-112 -
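The three decoding steps can be sketched in Python as below. The sketch assumes a well-formed bitstream given as a string of '0'/'1' characters; the function names are hypothetical and no error handling for truncated input is included.

```python
from typing import List, Tuple


def exp_golomb_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one codeword starting at `pos`.

    Returns (code_num, position of the next codeword).
    """
    # Step 1: count M leading zeros up to the terminating '1'.
    m = 0
    while bits[pos + m] == "0":
        m += 1
    start = pos + m + 1
    # Step 2: read the M-bit INFO field (empty when M == 0).
    info = int(bits[start:start + m], 2) if m else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, start + m


def decode_all(bits: str) -> List[int]:
    """Decode a concatenation of codewords."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    print(decode_all("1" + "010" + "00111" + "0001000"))
```

Because the number of leading zeros announces the length of the INFO field, the decoder needs no lookup table and can find codeword boundaries on its own.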

ADOPTIONS
♦ DVD Forum: High Profile is mandatory for HD DVD players.
♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory.
♦ The DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory.
♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the U.S. and in various designs for satellite and cable television.
-113 -

For applications such as content-contribution, content-distribution, and studio editing and post-processing:
▣ Use more than 8 bits per sample of source video accuracy
▣ Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to 4:2:0 chroma sampling format)
▣ Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)
-114 -

▣ Use very high bit rates ▣ Use very high resolution ▣ Achieve very high fidelity – even representing some parts of the video losslessly ▣ Avoid color-space transformation rounding error ▣ Use RGB color representation -115 -

High Profiles
♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy
♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample
♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and
-116 -

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error
All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.
-117 -
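The chroma format and bit depth limits of the four High profiles can be captured in a small lookup structure. The sketch below is only an illustration of how those limits nest: the class, table and function names are hypothetical, and other profile constraints (levels, tools, monochrome video) are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileLimits:
    max_chroma: str      # richest chroma sampling the profile supports
    max_bit_depth: int   # highest bits per sample


# Ordering of chroma formats from least to most chroma resolution.
CHROMA_RANK = {"4:2:0": 0, "4:2:2": 1, "4:4:4": 2}

# Listed from least to most capable, as in the slides.
HIGH_PROFILES = {
    "High": ProfileLimits("4:2:0", 8),
    "High 10": ProfileLimits("4:2:0", 10),
    "High 4:2:2": ProfileLimits("4:2:2", 10),
    "High 4:4:4": ProfileLimits("4:4:4", 12),
}


def supports(profile: str, chroma: str, bit_depth: int) -> bool:
    limits = HIGH_PROFILES[profile]
    return (CHROMA_RANK[chroma] <= CHROMA_RANK[limits.max_chroma]
            and bit_depth <= limits.max_bit_depth)


def smallest_profile(chroma: str, bit_depth: int) -> Optional[str]:
    """First (least capable) High profile that can carry the source."""
    for name in HIGH_PROFILES:
        if supports(name, chroma, bit_depth):
            return name
    return None


if __name__ == "__main__":
    print(smallest_profile("4:2:2", 10))
```

Since each profile's limits contain those of the previous one, a decoder for a later entry in the table can also handle sources that fit an earlier entry.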

FRExt Only
Figure: MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: 16 x 16 Y with 8 x 16 Cb and Cr. 4:4:4 MB: 16 x 16 Y, Cb and Cr.
-118 -

RGB → YCbCr
▣ Y = KR * R + (1 – KR – KB) * G + KB * B
▣ KR = 0.2126; KB = 0.0722 (ITU-R Rec. BT.709); KR + KB + KG = 1
▣ Y = 0.2126 R + 0.7152 G + 0.0722 B
▣ Cb = 0.5389 (B – Y); Cr = 0.6350 (R – Y)
▣ (ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
-119 -
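The luma equation and the colour-difference scaling can be checked numerically with a short Python sketch. It assumes R, G and B normalized to [0, 1] and uses the conventional normalization Cb = (B – Y) / (2(1 – KB)), Cr = (R – Y) / (2(1 – KR)), which keeps both chroma components within [-0.5, 0.5]. The function name is hypothetical, and offsets and integer quantization are left out.

```python
from typing import Tuple


def rgb_to_ycbcr(r: float, g: float, b: float,
                 kr: float = 0.2126, kb: float = 0.0722
                 ) -> Tuple[float, float, float]:
    """Convert normalized RGB to (Y, Cb, Cr) for given luma coefficients."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))                        # white
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                        # pure blue
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))    # red, BT.601
```

Passing `kr=0.299, kb=0.114` switches the same function to the BT.601 coefficients mentioned on the slide.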

Rounding error in RGB → YCbCr
▣ FRExt Only: YCgCo
Cg = Green Chroma; Co = Orange Chroma
To further avoid any rounding error, only one extra bit of precision is added to the chroma samples.
-120 -

In 4:4:4 video, FRExt has a residual color transform. Keep the RGB domain (same bit depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only. This eliminates color-space conversion error without significantly increasing the overall complexity of the system.
-121 -

Forward color space conversion
▣ Co = R – B
▣ t = B + (Co >> 1)
▣ Cg = G – t
▣ Y = t + (Cg >> 1)
where t is an intermediate temporary variable and “>>” denotes an arithmetic right shift operation.
Inverse color space conversion
▣ t = Y – (Cg >> 1)
▣ G = t + Cg
▣ B = t – (Co >> 1)
▣ R = B + Co
-122 -
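The reversibility of this lifting-style transform is easy to verify in code. The Python sketch below follows the forward and inverse steps, with G recovered as t + Cg in the inverse; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation described. The function names are hypothetical.

```python
from typing import Tuple


def rgb_to_ycgco(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Forward transform; returns (Y, Cg, Co) as integers."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int) -> Tuple[int, int, int]:
    """Inverse transform; exactly undoes rgb_to_ycgco."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

Each inverse step subtracts exactly what the matching forward step added, so no information is lost to rounding; the cost is that Cg and Co are signed and need one more bit than the input samples.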

SEI : Supplemental Enhancement Information Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI). Film grain characteristics SEI, which allow a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process. -123 -

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures. Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye. -124 -

New Profiles in the H. 264/AVC FRExt Amendment ▣ ‘Higher’ profile supports all capabilities of the lower ones ▣ Also capable of decoding all bit streams encoded for the lower nested profiles All high profiles support all features of the main profile -125 -

Levels in H.264/AVC
Level 1b was added in FRExt for some 3G wireless environments.
-126 -

Levels in H.264/AVC
1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.
2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8].
3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16.
-127 -
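Rules 1 and 2 can be turned into a small calculation. The sketch below is an assumption-laden illustration: it expresses frame size and processing rate in macroblocks, as level tables usually do, and the example limits (40500 macroblocks per second, 1620 macroblocks per frame) are illustrative values that should be checked against the level table. The function names are hypothetical.

```python
import math


def max_frame_rate(max_mb_per_sec: int, frame_size_mbs: int,
                   cap: float = 172.0) -> float:
    """Rule 1: smaller pictures allow higher frame rates, up to the cap."""
    return min(cap, max_mb_per_sec / frame_size_mbs)


def max_dimension(max_frame_size: int) -> int:
    """Rule 2: width and height are each bounded by sqrt(8 * frame size).

    The result is in the same unit as the input (macroblocks in, macroblocks
    out), rounded down to an integer.
    """
    return math.isqrt(8 * max_frame_size)


if __name__ == "__main__":
    MAX_MBPS, MAX_FS = 40500, 1620   # illustrative level limits (assumed)
    print(max_frame_rate(MAX_MBPS, MAX_FS))   # full-size picture
    print(max_frame_rate(MAX_MBPS, 99))       # QCIF-sized picture (99 MBs)
    print(max_dimension(MAX_FS))
```

The square-root bound prevents extreme aspect ratios: a picture may be wider or taller than the typical shape for the level, but only up to a factor tied to the level's maximum frame size.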

Compressed Bit Rate Multipliers for FRExt Profiles Multipliers for fourth column of table in page 125 To meet more demanding high fidelity applications -128 -

24 frames/sec film, 1920 x 1080 progressive
♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)
♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps.
[9] T. Wedi, Y. Kashiwagi, “Subjective quality evaluation of H.264/AVC FRExt for HD movie content”, JVT document JVT-L033, July 2004.
-129 -

Courtesy: Advanced Technology Group of Motorola BCS -130 -

Courtesy: Advanced Technology Group of Motorola BCS -131 -

MPEG-4 ASP yields a coding gain of 1.5 over MPEG-2. MPEG-4 AVC yields a coding gain of 2.0 over MPEG-2.
Fig. 7: (a) – (e) Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3).
Courtesy: Advanced Technology Group of Motorola BCS
-132 -

▣ High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps ▣ Nominally transparent video quality on 1080 p 24 at 16 Mbps -133 -

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt, thus the remark in the figure about potential future performance -134 -

High Profile Details: Deblocking Filter, CABAC, Signaling ▣ Deblocking Filter: • Only control of filter is adjusted: do not filter 4 x 4 blocks • No change to filter operation itself ▣ CABAC: • 61 new contexts and corresponding initialization values • No change to CABAC engine ▣ Signaling: • 8 x 8 transform on/off flag at PPS level • 8 x 8 transform on/off flag per macroblock allows adaptive use -135 -

High vs. Main Profile Summary ▣ High Profile contains: ◈ Main profile ◈ Adaptive MB level switching between 8 x 8 and 4 x 4 transform block sizes. ◈ Encoder specified perceptual based quantization scaling matrices ◈ Encoder specified separate control of each chroma component QP ▣ Coding efficiency impact (measured as average bit-rate reduction): ◈ HD Film: 12% ◈ HD Video (progressive): 12% ◈ HD Video (interlace): 4% (only 2 test clips) ◈ SD Video (interlace): 6% ▣ Complexity impact: ◈ Implementation beyond Main Profile affects Intra prediction, transform, deblocking filter control, CABAC decoding ◈ No increase in computational requirements ◈ Slight increase in memory requirements (CABAC, transform) -136 -

Licensing of H.264/AVC Technology
Licenses can be obtained from two patent pools:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com
These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.
-137 -

AUDIO coding & systems
▣ H.264 is limited to video
▣ Audio coder: bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB etc.)
▣ DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency)
▣ ATSC has selected AC-3 plus from Dolby
▣ ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2, AAC and AC-3.
-138 -