- Number of slides: 92
Outline
- Introduction on Multimedia Coding
- Motion Estimation
- Discrete Cosine Transform
- Video Coding Standards
Multimedia Concepts
- What is multimedia?
  - Combination of audio, video, image, graphic, and text.
  - Coverage of all human I/Os.
- Why does multimedia need to be coded?
Multimedia Coding for Different Applications
- Mobile devices
  - Low data rate, error resilience, scalability
- Streaming service
  - Scalability, low to medium data rate, interactivity
- On-disk distribution (DVD)
  - Interactivity
- Broadcast
  - On-demand services
System Architecture
- Compression Layer (media aware, delivery unaware): streams from as low as bps to Mbps
- System Layer (media aware, delivery aware): manages Elementary Streams, their synchronization and hierarchical relations
- Delivery Layer (media unaware, delivery aware): provides transparent access and delivery of content irrespective of delivery technologies
Coding of Audiovisual Objects
- An audiovisual scene is composed of "objects"
- Different objects are mixed on the screen
- Visual
  - Video
  - Animated face & body; 2D and 3D animated meshes
  - Text and graphics
- Audio
  - General audio: mono, stereo, and multichannel
  - Speech
  - Synthetic sounds ("Structured audio")
  - Environmental spatialization
Example of MPEG-4 Video Objects
[Figure: rectangular-shape video object; arbitrary-shape video object; animated face. From Olivier Avaro]
The Scene Tree
1. Composition
2. Description & Synchronization
3. Delivery of streaming data
4. Interaction with media objects
5. Management and identification of intellectual property
Major Components
Media Objects Scene Graph Composition Rendering
Adding or Removing Objects (1) – = +
Adding or Removing Objects (2) From Igor S. Pandžić
Adding or Removing Objects (3)
- Applications
  - Video conferencing
    - Real-time, automatic
    - Separate foreground (communication partner) from background
  - Object tracking in video
    - May allow off-line and semi-automatic operation
    - Separate moving object from others
Coding Techniques
- Video objects
  - Shape
  - Motion vectors
  - Texture
- Audio objects
  - MPEG AAC (Advanced Audio Coder)
  - TTS (Text-To-Speech)
- Face and Body
  - Animation parameters
- 2D Mesh
  - Triangular patches
  - Motion vector
Encoding of Visual Objects
- Binary alpha block
  - Motion vector
  - Context-based arithmetic encoding
- Texture
  - Motion vector
  - DCT
Natural Audio Coder
[Figure: quality (Cellular, Telephone, AM, FM, CD) versus bit rate (2, 4, 8, 16, 32, 64 kbit/s) for parametric speech (HVXC), parametric audio (HILN), high-quality speech (CELP), and general audio (AAC, TwinVQ). From Olivier Dechazal]
Facial Animation From Eine Übersicht
Object Mesh
- Useful for animation, content manipulation, content overlay, merging natural and synthetic video, ...
- Tessellate with triangular patches
Sprite Coding
- Represent the background as an image larger than the displayed frame.
- Useful for camera motion
Multiview Video
Outline
- Introduction on Multimedia Coding
- Discrete Cosine Transform
- Motion Estimation
- Video Coding Standards
Outline
- What Is DCT and Why Use DCT
- How to Compute DCT
- Program the DCT
- Conclusion
An Image-Transform Coding System
- Encoder: input samples, forward transform, quantizer (e.g. ÷ 10), binary encoder (e.g. Huffman coding, zip, RAR), network
- Decoder: network, binary decoder, inverse quantizer (e.g. × 10), inverse transform, output samples
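The "÷ 10" and "× 10" boxes in this diagram can be read as a uniform scalar quantizer and its inverse. The following is a minimal sketch under that reading; the names `QSTEP`, `quantize` and `dequantize` are hypothetical, and rounding to the nearest level is an assumption, since the slide does not say how the division is rounded.

```c
/* Uniform scalar quantizer with a fixed step (the slide's "÷ 10" / "× 10").
   QSTEP, quantize and dequantize are illustrative names, not from the deck. */
#define QSTEP 10

/* Forward quantization: divide by the step, rounding to the nearest level
   (rounding mode is an assumption). */
int quantize(int coeff)
{
    if (coeff >= 0)
        return (coeff + QSTEP / 2) / QSTEP;
    return -((-coeff + QSTEP / 2) / QSTEP);
}

/* Inverse quantization: multiply the level by the step. */
int dequantize(int level)
{
    return level * QSTEP;
}
```

The round trip is lossy: a coefficient of 57 becomes level 6 and comes back as 60. This loss is where most of the compression in a transform coder comes from.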
Introduction (1/5) - Representation of an Image
- How to code an image?
  1. Spatial domain (pixel-based)
  2. Transform domain
- Transformation methods
  - KLT, DFT, DWT, DCT, ...
Introduction (2/5) - Why Use DCT?
- Properties of DCT
  - Uses the cosine function as its basis function
  - Performance approaches that of the KLT
  - Fast algorithms exist
  - Most popular transform in image compression applications
  - Adopted in JPEG, M-JPEG, MPEG, H.26x
Introduction (3/5) - Does Transform Really Make Sense?
- Energy compaction
- De-correlation: dependency elimination
Introduction (4/5) - Examples 139 148 150 149 155 164 165 168 98 115 130 135 143 146 142 147 89 110 125 128 129 121 104 106 96 116 128 132 134 132 113 109 111 125 127 131 137 120 110 122 126 131 133 131 126 112 133 134 136 138 140 144 141 139 138 139 139 140 146 148 147 8 8
Introduction (5/5) - Examples
- Spatial domain: each pixel is expressed by its value
- Transform domain: each value is the coefficient of a basis vector, e.g. (0, 0)
- The DCT maps pixel values in the spatial domain to DCT coefficients in the transform domain; the IDCT maps them back
Definition of Basis Function
- Basis function of the 1-D N-point DCT
- For N = 8
Basic diagram of DCT Discrete cosine transform and Inverse DCT (1) (2)
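The equations on this slide did not survive extraction. A reconstruction of the 1-D 8-point pair, consistent with the constants used in the deck's own C code (`C[0] = 0.35355339`, `C[i] = 0.5`, and `cos((2*i+1)*j*PI/16)`), would be the following; the symbols $x$, $X$ and $c$ are chosen here for illustration.

```latex
% (1) Forward 1-D 8-point DCT
X(u) = c(u) \sum_{i=0}^{7} x(i) \cos\frac{(2i+1)\,u\,\pi}{16}, \qquad u = 0, \dots, 7

% (2) Inverse 1-D 8-point DCT
x(i) = \sum_{u=0}^{7} c(u)\, X(u) \cos\frac{(2i+1)\,u\,\pi}{16}, \qquad i = 0, \dots, 7

% Normalization factors
c(0) = \frac{1}{2\sqrt{2}} \approx 0.35355339, \qquad c(u) = \frac{1}{2} \quad (u \ge 1)
```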
The basis of 2 D-DCT with 8 x 8 block
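Again - Do You Know What DCT Means?
- Spatial domain: each pixel is expressed by its value
- Transform domain: each value is the coefficient of a basis vector, e.g. (0, 0)
- The DCT maps pixel values in the spatial domain to DCT coefficients in the transform domain; the IDCT maps them back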
How to Compute: 1-D vs. 2-D
- [1-D] For an M × N 2-D block, apply the 1-D N-point DCT in the row direction, then the 1-D M-point DCT in the column direction, to get the 2-D DCT
- [2-D] If 8 × 8 blocks are used, the 2-D DCT will be
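The 2-D equation is missing from the extracted slide. Read off the deck's "normal form" C routine, it would presumably be the following, where $f(i,j)$ is the input block (the code first subtracts 128 from each sample) and $c(\cdot)$ takes the value $1/(2\sqrt{2})$ at index 0 and $1/2$ otherwise.

```latex
F(u,v) = c(u)\, c(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j)\,
         \cos\frac{(2i+1)\,u\,\pi}{16}\, \cos\frac{(2j+1)\,v\,\pi}{16},
\qquad u, v = 0, \dots, 7
```

Because the kernel is a product of a factor in $(i,u)$ and a factor in $(j,v)$, the double sum can be evaluated as eight 1-D DCTs along one direction followed by eight along the other, which is the row-column method in the [1-D] bullet.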
DCT matrix is orthonormal
- The above equation is zero if u ≠ v: the basis vectors are orthogonal
- Each basis vector of the DCT has unit norm
- From these two properties, the DCT matrix is orthonormal
- The same applies to the 2-D DCT
Properties of Orthonormal Transforms
- Energy is conserved
- The transform matrix can be factored (it is separable)
Energy conservation of orthonormal transform
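The derivation on this slide was lost in extraction. A standard argument, given here as a sketch with $T$ denoting an orthonormal transform matrix and $x$ the input vector (both symbols chosen for illustration), is:

```latex
y = T x, \qquad T^{\mathsf T} T = I
\;\Longrightarrow\;
\|y\|^2 = y^{\mathsf T} y = x^{\mathsf T} T^{\mathsf T} T\, x = x^{\mathsf T} x = \|x\|^2
```

The sum of squared coefficients therefore equals the sum of squared samples, so the transform only redistributes energy among coefficients and does not create or destroy it.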
Separable Transform (1/2)
Separable Transform (2/2)
Fast DCT algorithm (1/2)
Fast DCT algorithm (2/2)
How to program (1/3) - Normal form
/**************************************/
/* 2D N*N DCT (N = 8)                 */
/* Input                              */
/*   int argSource[8][8]: one block in the original image */
/* Output                             */
/*   int argDCT[8][8]: the block in the frequency domain  */
/*   corresponding to argSource       */
/**************************************/
#include <math.h>
#define PI 3.14159265358979323846
void DCT(int argDCT[8][8], int argSource[8][8])
{
    float C[8], Cos[8][8];
    float temp;
    int i, j, u, v;
    /* Cos[i][j] = cos((2i+1) * j * PI / 16) */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
            Cos[i][j] = (float)cos((2*i+1)*j*PI/16);
    /* Normalization factors: 1/(2*sqrt(2)) for index 0, 1/2 otherwise */
    C[0] = 0.35355339f;
    for (i = 1; i < 8; i++)
        C[i] = 0.5f;
    for (u = 0; u < 8; u++)
        for (v = 0; v < 8; v++) {
            temp = 0.0f;
            for (i = 0; i < 8; i++)
                for (j = 0; j < 8; j++)
                    temp += Cos[i][u]*Cos[j][v]*(argSource[i][j]-128);
            temp *= C[u]*C[v];
            argDCT[u][v] = (int)temp;  /* fractional part is discarded */
        }
}
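How to program (2/3) - Fast algorithm -1
/**************************************/
/* 2D N*N DCT (N = 8), row-column form */
/* Input                              */
/*   int argSource[8][8]: one block in the original image */
/* Output                             */
/*   int argDCT[8][8]: the block in the frequency domain  */
/*   corresponding to argSource       */
/**************************************/
/* ROUND is not defined on the slide; round-to-nearest is assumed here. */
#define ROUND(x) ((int)((x) >= 0 ? (x) + 0.5f : (x) - 0.5f))
/* C is the 8x8 DCT matrix and Ct its transpose. The slide uses them   */
/* without defining them; they are assumed to be filled in before      */
/* DCT() is called.                                                    */
float C[8][8], Ct[8][8];
void DCT(int argDCT[8][8], int argSource[8][8])
{
    float temp[8][8], temp1;
    int i, j, k;
    /* temp = (argSource - 128) * Ct */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            temp[i][j] = 0.0f;
            for (k = 0; k < 8; k++)
                temp[i][j] += (argSource[i][k]-128)*Ct[k][j];
        }
    /* argDCT = C * temp */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            temp1 = 0.0f;
            for (k = 0; k < 8; k++)
                temp1 += C[i][k]*temp[k][j];
            argDCT[i][j] = ROUND(temp1);
        }
}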
How to program (3/3) - Algorithm suitable for hardware implementation #include
Conclusion
- The DCT gives another way to represent an image, one that exposes the properties of the image
- A fast algorithm suitable for hardware implementation is available.
Outline
- Introduction on Multimedia Coding
- Motion Estimation
- Discrete Cosine Transform
- Video Coding Standards
Outline
- What are motions in videos
- The importance of motions
- Motion representation
- How to find the motion of a block
- Block matching
- Residual
- Fast block matching algorithm
- Intra frame and inter frame
Motions in Video Clips
- Local motions
- Global motions (background)
The Importance of Motions
- Compress one frame independently
  - Each pixel has to be compressed.
    - DCT
    - Quantization
    - Binary coding
- Compress one frame depending on the previous frame.
  - Background can be ignored.
  - Only compress moving objects and new objects.
Example
1. Compress [image] and [image] in frame 1.
2. Compress the motion of [image] in the remaining frames.
   - Direction and magnitude
[Figure: frames 1 to 4]
Motion Representation
- Use arrows to represent motions of objects.
[Figure panels: global, region-based, block-based, pixel-based]
How to Find the Motion of a Block?
- Block matching
[Figure: a block in the current frame (frame i, to be encoded) is matched in the reference frame (frame i-1, already available); labels: motion vector, matched, occlusion]
Block Matching (1) current
Block Matching (2)
- Compare the difference between two blocks (one in the current frame, the other in the reference frame):
  $D_p = \sum |\text{candidate block} - \text{current block}|^p$
  - p = 1: sum of absolute differences
  - p = 2: mean square error
Block Matching (3)
- Integer pixel shift
- A rectangular array of pixels is selected as a measurement window
- The measurement window is compared with a shifted array of pixels in the other frame to determine the best match
[Figure labels: search range, block, minimum MSE]
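The block difference measure above can be sketched as one C function. The name `block_distortion` and its parameters are hypothetical; for p = 2 it returns the sum of squared differences, which would be divided by the number of pixels to obtain a mean square error.

```c
#include <stdlib.h>

/* Distortion between a candidate block and the current block.
   cand, cur : pointers to the top-left pixel of each block
   stride    : number of pixels per row in both buffers
   size      : block width and height
   p         : 1 for sum of absolute differences, 2 for sum of squared differences
   (function name and signature are illustrative, not from the deck) */
long block_distortion(const unsigned char *cand, const unsigned char *cur,
                      int stride, int size, int p)
{
    long sum = 0;
    int x, y;
    for (y = 0; y < size; y++)
        for (x = 0; x < size; x++) {
            int d = cand[y * stride + x] - cur[y * stride + x];
            sum += (p == 1) ? abs(d) : d * d;
        }
    return sum;
}
```

The sum of absolute differences is usually preferred in encoders because it avoids multiplications.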
Residual (1) occlusion motion Residual
Residual (2)
- Encoder: only the residual is coded (DCT, quantization, binary coding)
[Figure: encoder loop. ① MV = (dx, dy); ② residual; ③ DCT + Q, iDCT + iQ; ④ motion compensation from the previous frame buffer]
Residual (3)
- Decoder
[Figure: coded bitstream, VLD, IDCT, residual; MV, motion compensation from the previous frame memory; reconstructed frame]
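The residual path on these two slides reduces to a subtraction at the encoder and an addition at the decoder. The sketch below leaves out the DCT, quantization and entropy coding stages, so the residual is passed through unchanged; all function names are hypothetical.

```c
/* Clamp a reconstructed value to the 8-bit pixel range. */
static unsigned char clip_pixel(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (unsigned char)v;
}

/* Encoder side: residual = current block - motion-compensated prediction. */
void compute_residual(const unsigned char *cur, const unsigned char *pred,
                      int *residual, int count)
{
    int i;
    for (i = 0; i < count; i++)
        residual[i] = (int)cur[i] - (int)pred[i];
}

/* Decoder side: reconstruction = prediction + decoded residual, clipped. */
void reconstruct(const unsigned char *pred, const int *residual,
                 unsigned char *out, int count)
{
    int i;
    for (i = 0; i < count; i++)
        out[i] = clip_pixel((int)pred[i] + residual[i]);
}
```

In a real codec the decoded residual differs from the original one because of quantization, which is why the encoder also runs the inverse path and predicts from its own reconstructed frames.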
Block Matching Algorithm - Full Search Method 15 15
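A full search evaluates every integer displacement inside the search range and keeps the one with the smallest distortion. The sketch below uses the sum of absolute differences as the measure and takes the range as a parameter; the `MotionVector` type and the function names are hypothetical, and the current block is assumed to lie inside the frame.

```c
#include <limits.h>
#include <stdlib.h>

typedef struct { int dx, dy; long sad; } MotionVector;

/* SAD between the bs x bs block at (bx, by) in cur and the block displaced
   by (dx, dy) in ref. Both frames are width pixels wide. */
static long sad_at(const unsigned char *cur, const unsigned char *ref,
                   int width, int bx, int by, int dx, int dy, int bs)
{
    long sum = 0;
    int x, y;
    for (y = 0; y < bs; y++)
        for (x = 0; x < bs; x++)
            sum += abs(cur[(by + y) * width + bx + x] -
                       ref[(by + dy + y) * width + bx + dx + x]);
    return sum;
}

/* Exhaustive search over displacements in [-range, +range] in both directions. */
MotionVector full_search(const unsigned char *cur, const unsigned char *ref,
                         int width, int height, int bx, int by,
                         int bs, int range)
{
    MotionVector best = {0, 0, LONG_MAX};
    int dx, dy;
    for (dy = -range; dy <= range; dy++)
        for (dx = -range; dx <= range; dx++) {
            long sad;
            /* Skip candidates that would fall outside the reference frame. */
            if (bx + dx < 0 || by + dy < 0 ||
                bx + dx + bs > width || by + dy + bs > height)
                continue;
            sad = sad_at(cur, ref, width, bx, by, dx, dy, bs);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    return best;
}
```

With a range of ±15 this costs 31 × 31 = 961 block comparisons per block, which is what the fast methods on the following slides try to reduce.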
Block Matching Algorithm - Three Step Method
Block Matching Algorithm - Four Step Method
IEEE Transactions on Circuits and Systems for Video Technology, June 1996
Block Matching Algorithm - Diamond Method
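The slide's figure is not available, so the following is a sketch of the commonly described three-step search rather than the deck's exact variant: start at zero displacement, test the eight neighbours at step 4, move to the best point, then repeat with steps 2 and 1. The type and function names are hypothetical, and the current block is assumed to lie inside the frame.

```c
#include <stdlib.h>

typedef struct { int dx, dy; long sad; } MotionVector;

/* SAD between the bs x bs block at (bx, by) in cur and the block displaced
   by (dx, dy) in ref. Both frames are width pixels wide. */
static long sad_at(const unsigned char *cur, const unsigned char *ref,
                   int width, int bx, int by, int dx, int dy, int bs)
{
    long sum = 0;
    int x, y;
    for (y = 0; y < bs; y++)
        for (x = 0; x < bs; x++)
            sum += abs(cur[(by + y) * width + bx + x] -
                       ref[(by + dy + y) * width + bx + dx + x]);
    return sum;
}

/* Three-step search with steps 4, 2, 1 (maximum displacement of 7 pixels). */
MotionVector three_step_search(const unsigned char *cur,
                               const unsigned char *ref,
                               int width, int height,
                               int bx, int by, int bs)
{
    MotionVector best;
    int step, i, j;
    best.dx = 0;
    best.dy = 0;
    best.sad = sad_at(cur, ref, width, bx, by, 0, 0, bs);
    for (step = 4; step >= 1; step /= 2) {
        int cx = best.dx, cy = best.dy;   /* centre of this step */
        for (j = -1; j <= 1; j++)
            for (i = -1; i <= 1; i++) {
                int dx = cx + i * step, dy = cy + j * step;
                long sad;
                if (i == 0 && j == 0)
                    continue;             /* centre is already evaluated */
                if (bx + dx < 0 || by + dy < 0 ||
                    bx + dx + bs > width || by + dy + bs > height)
                    continue;             /* outside the reference frame */
                sad = sad_at(cur, ref, width, bx, by, dx, dy, bs);
                if (sad < best.sad) {
                    best.dx = dx;
                    best.dy = dy;
                    best.sad = sad;
                }
            }
    }
    return best;
}
```

This checks at most 25 candidates instead of the 225 a full search needs for the same ±7 range, at the price of possibly stopping in a local minimum.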
Fractional pixel accuracy
- e.g. half-pixel accuracy: (dx, dy) = (1.5, 1)
[Figure: integer-pixel and half-pixel positions; results for H.263, Foreman, QCIF, SKIP=2, Q=4, 5, 7, 10, 15, 25]
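Half-pixel samples do not exist in the reference frame and have to be interpolated. The sketch below uses bilinear averaging with upward rounding, as in H.263-style half-pixel prediction; coordinates are given in half-pixel units, so the slide's (1.5, 1) becomes (3, 2). The function name is hypothetical, and the caller is assumed to pass non-negative coordinates that keep the right and lower neighbours inside the frame.

```c
/* Sample a frame at a half-pixel position.
   hx, hy : coordinates in half-pixel units (hx = 3 means x = 1.5)
   stride : number of pixels per row
   (illustrative name and interface; no bounds checking is done) */
int sample_half_pel(const unsigned char *frame, int stride, int hx, int hy)
{
    int x = hx >> 1, y = hy >> 1;   /* integer part of the position */
    int fx = hx & 1, fy = hy & 1;   /* 1 if the position is between pixels */
    int a = frame[y * stride + x];
    int b = frame[y * stride + x + fx];
    int c = frame[(y + fy) * stride + x];
    int d = frame[(y + fy) * stride + x + fx];
    /* Reduces to a, (a+b+1)/2, (a+c+1)/2 or (a+b+c+d+2)/4 as appropriate. */
    return (a + b + c + d + 2) / 4;
}
```

A half-pixel motion search typically first finds the best integer vector and then tests the eight half-pixel positions around it.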
Encode a Frame with Motions
- Intra frame (I-frame)
  - Encoded/decoded without using motion information.
- Inter frame
  - Encoded/decoded using motion information.
    - Prediction frame (P-frame)
    - Bi-directional prediction frame (B-frame)
- Group of pictures (GOP)
  - Starting with an I-frame, followed by a series of inter frames.
    - Random access
    - Prevention of error propagation
[Figure: ... Intra, Inter, ..., Inter (one GOP), Intra ...]
I-Frame, P-Frame, and B-Frame
- P-frame
  - Find motions from the previous I- or P-frame.
- B-frame
  - Find motions from both the previous and the following I- and P-frame, or P- and P-frame.
  - Some objects may be found only in the following frame.
- Encoding order
  - 1 4 2 3 7 5 6
[Figure: display order 1 2 3 4 5 6 7, beginning I B B P]
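The encoding order 1 4 2 3 7 5 6 follows from one rule: a B-frame can be coded only after both of its reference frames, so each I- or P-frame is moved ahead of the B-frames that precede it in display order. The sketch below assumes the display-order pattern I B B P B B P, which matches the slide's numbers; the function name is hypothetical.

```c
#include <string.h>

/* Convert display order to coding order.
   types     : frame types in display order, e.g. "IBBPBBP"
   order_out : receives 1-based display indices in coding order
   returns   : number of frames written
   Trailing B-frames with no following I/P frame are not emitted, since
   their second reference is not available yet. (Illustrative interface.) */
int coding_order(const char *types, int *order_out)
{
    int n = (int)strlen(types);
    int count = 0, first_b = -1, i, k;
    for (i = 0; i < n; i++) {
        if (types[i] == 'B') {
            if (first_b < 0)
                first_b = i;              /* start of a pending run of B-frames */
        } else {
            order_out[count++] = i + 1;   /* the I- or P-frame is coded first */
            if (first_b >= 0) {
                for (k = first_b; k < i; k++)
                    order_out[count++] = k + 1;   /* then the held-back B-frames */
                first_b = -1;
            }
        }
    }
    return count;
}
```

This reordering is also why B-frames add delay: the encoder must wait for the next I- or P-frame before it can code them.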
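Video Encoder
[Figure: block diagram. Frame input; I-frame / P-frame switch; subtraction of the motion-compensated prediction gives the residual; DCT; Q; VLC; bitstream. Reconstruction loop: IDCT; clipping; frame memory (previous frame); motion estimation (MV); motion compensation (MV)]
Video Decoder
[Figure: block diagram. Coded bitstream; VLD; IDCT; I-frame / P-frame switch; motion compensation from the previous frame memory; reconstructed frame]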
Outline
- Introduction on Multimedia Coding
- Motion Estimation
- Discrete Cosine Transform
- Video Coding Standards
The Scope of Video Coding Standardization
- Only restrictions on the bitstream, syntax, and decoder are standardized:
  - Permits the optimization of encoding
  - Permits complexity reduction for implementability
  - Provides no guarantees on quality
Standards and Applications
International Telecommunication Union - Telecommunication Standardization (ITU-T)
- H.261
  - Videophone and video conferencing
  - p x 64 kbps (p = 1 ... 30)
  - Still in use
    - Low complexity, low latency
    - Mostly as a backward-compatibility feature
    - Overtaken by H.263
- H.263
  - PSTN and mobile network: 10 to 24 kbps
  - 1994: H.263, H.263+
- H.264
  - Double the coding efficiency in comparison to any other existing video coding standard
MPEG: Moving Picture Experts Group
- MPEG-1: CD-i, (VOD trials), ...
- MPEG-2: ... + TV, HDTV
- MPEG-3: HDTV, merged into MPEG-2
- MPEG-4: Coding of Audiovisual Objects
- MPEG-7: Multimedia Description Interface
- MPEG-21: Digital Multimedia Framework
Chronological Table of Video Coding Standards
- ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
- ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
- Joint: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)
[Figure: timeline axis from 1990 to 2003]
H.261: The Basis of Modern Video Compression
- The first widespread practical success
- Video format:
  - CIF (352 x 288, above 128 Kbps)
  - QCIF (176 x 144, 64 - 128 Kbps)
- Operated at 64 - 2048 Kbps (p x 64 Kbps)
- Still in use
  - Low complexity, low latency
  - Mostly as a backward-compatibility feature
  - Overtaken by H.263
MPEG-1: For Storage
- Five parts: System, Visual, Audio, Conformance, Reference Software
- Applications: VCD, VOD, digital camera
  - Maximum: 1.856 Mbps, 768 x 576 pels
- Superior quality to H.261 when operated at higher bit rates (≥ 1 Mbps for CIF 352 x 288 resolution)
- Provides approximately VHS quality between 1 and 2 Mbps using SIF 352 x 240/288 resolution
- Technical features: adds bi-directional motion prediction and half-pixel motion to the H.261 design
- Use is fairly widespread, but mostly overtaken by MPEG-2
- MP3 = MPEG-1 layer 3 audio
MPEG-2 / H.262: High Bit Rate, High Quality
- MPEG-2 Visual = H.262
- Not especially useful below 2 Mbps (range of use normally 2 - 20 Mbps)
- Applications: SDTV (2 - 5 Mbps), DVD (6 - 8 Mbps), HDTV (20 Mbps), VOD
- Support for interlaced scan pictures
- SNR, temporal, and spatial scalability
- Consists of various "Profiles" and "Levels"
- MPEG-2 audio
  - Supports 5.1 channels
  - MPEG-2 AAC: requires 30% fewer bits than MP3
H.263: The Next Generation
- Goal: improved quality at lower rates
- Has overtaken H.261 as the dominant video-conferencing codec
- Superior to H.261 at all bit rates
- Significantly better quality at lower rates
  - Better video at 18 - 24 Kbps than H.261 at 64 Kbps
  - Enables video phone over regular phone lines (28.8 Kbps) or wireless modem
- H.263+ (1998): supports all bit rates, more options
- H.263++ (2000): more options, emphasizing error resilience and scalability
MPEG-4: H.263 + Additions + Variable Shape Coding
- Goal: support for interactive multimedia
- Visual Object (VO), Audio Object (AO), and AVO
- 18 video coding profiles
- Roughly follows the H.263 design and adds all prior features and (most important) shape coding
- Includes zero-tree wavelet coding of still textured pictures, segmented coding of shapes, coding of synthetic content
- 2D & 3D mesh coding, face animation modeling
- 10-bit and 12-bit video
- Contains 9 parts. Part 10 will be H.264
A Note on Terminology of H.264
- The following terms are used interchangeably:
  - H.26L
  - The Work of the JVT or "JVT CODEC"
  - JM 2.x, JM 3.x, JM 4.x
  - The Thing Beyond H.26L
  - The "AVC" or Advanced Video Coding
- Proper terminology going forward:
  - MPEG-4 Part 10 (official MPEG term)
    - ISO/IEC 14496-10 AVC
  - H.264 (official ITU term)
Position of H. 264
New Features of H.264
- Multi-mode, multi-reference MC
- Motion vectors can point outside the image border
- 1/4-, 1/8-pixel motion vector precision
- B-frame prediction weighting
- 4×4 integer transform
- Multi-mode intra-prediction
- In-loop de-blocking filter
- UVLC (Universal Variable Length Coding)
- NAL (Network Abstraction Layer)
- SP-slices
Profiles and Levels
- Profiles: Baseline, Main, and X
  - Baseline: progressive, videoconferencing & wireless
  - Main: esp. broadcast
  - Extended: mobile network
- Baseline profile is the minimum implementation
  - No CABAC, 1/8 MC, B-frame, SP-slices
- 15 levels
  - Resolution, capability, bit rate, buffer, reference #
  - Built to match popular international production and emission formats
  - From QCIF to D-Cinema
Variable Block Sizes
- MB types: 16x16 (partition 0), 16x8 (0, 1), 8x16 (0, 1), 8x8 (0, 1, 2, 3)
- 8x8 types: 8x8 (0), 8x4 (0, 1), 4x8 (0, 1), 4x4 (0, 1, 2, 3)
- Various block sizes and shapes
Multiple Reference Frames
[Figure: multiple reference frames; frame to be encoded]
Integer Transform (1)
- 4×4 and 2×2 integer transform
  - Integer transform matrix
[Figure: transform block order within a macroblock (labels H1, H2, DCT). Luma (Y) blocks 0 to 15, with block -1 for INTRA_16×16; blocks 16 (Cb) and 17 (Cr); blocks 18 to 21 (Cb) and 22 to 25 (Cr)]
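The transform matrix itself is lost with the figure. For reference, the 4×4 forward core transform of H.264 uses the integer matrix shown in the sketch below and computes Y = Cf · X · Cfᵀ with additions, subtractions and shifts only; the scaling that would make it orthonormal is folded into quantization and is omitted here. The function name is hypothetical.

```c
/* H.264 4x4 forward core transform matrix (integer approximation of the DCT). */
static const int CF[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 }
};

/* out = CF * in * CF^T, without the post-scaling (illustrative interface). */
void forward_transform_4x4(int in[4][4], int out[4][4])
{
    int tmp[4][4];
    int i, j, k;
    /* tmp = CF * in */
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++) {
            tmp[i][j] = 0;
            for (k = 0; k < 4; k++)
                tmp[i][j] += CF[i][k] * in[k][j];
        }
    /* out = tmp * CF^T */
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++) {
            out[i][j] = 0;
            for (k = 0; k < 4; k++)
                out[i][j] += tmp[i][k] * CF[j][k];
        }
}
```

Because every operation is an exact integer operation, encoder and decoder cannot drift apart through floating-point mismatch, which was a known issue with the real-valued 8×8 DCT of earlier standards.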
In-loop De-blocking Filter
- Highly compressed decoded inter picture
- Significantly reduces prediction residuals
[Figure: without filter; with H.264/AVC de-blocking]
Comparison
Summary
- Video coding is based on hybrid video coding and is similar in spirit to other standards, but with important differences
- New key features are:
  - Enhanced motion compensation
  - Small blocks for transform coding
  - Improved de-blocking filter
  - Enhanced entropy coding
- Substantial bit-rate savings (up to 50%) relative to other standards for the same quality
- The improvement in perceptual quality seems larger than the improvement in PSNR
- The encoder is about three times as complex as prior ones
- The decoder is about twice as complex as prior ones
Applications Examples
- Shopping
  - "Try-on" clothes.
  - Decorate/furnish rooms.
- Interact with sporting events.
  - Multiple, simultaneous views.
  - Player/game statistics.
  - User-directed replay, freeze, etc.
- Field maintenance
  - Mobile audio-visual terminal
  - Remote audio-visual access.
- Security monitoring
  - Region-of-interest (e.g. face) isolation, enhancement.
  - Auto traffic, harbor traffic management.
- Networked video games.
  - Actual users as players in the scene
- Sign language
From Olivier Avaro
Reference Software
- H.264
  - http://iphome.hhi.de/suehring/tml/
- MPEG-4
  - http://www.xvid.org


