VIDEO COMPRESSION FUNDAMENTALS, part 2
Pamela C. Cosman

Extra flavors and refinements
- Many different variations/improvements are possible for motion compensation:
  - Increased accuracy of motion vectors
  - Unrestricted motion vectors
  - Multiple frame prediction
  - Variable-sized blocks
  - Motion compensation for objects

Accuracy of Motion Vectors
- Digital images are sampled on a grid. What if the actual motion does not move in whole grid steps?
- Solution: interpolating between grid points in the reference frame adds a half-pixel grid
- The reference frame then effectively has 4 times as many positions at which the best-match block can be found
[Figure: integer-position pixels A, B, C, D and interpolated half-pixel positions h, v, m]
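
A minimal sketch of half-pixel interpolation, assuming plain bilinear averaging with round-to-nearest in the style of MPEG-1/H.263 (the slide does not name the filter; H.264, for instance, uses a longer 6-tap filter for its half-pixel samples). It also assumes h lies halfway between A and B horizontally, v halfway between A and C vertically, and m at the centre of A, B, C, D. The function name is hypothetical.

```python
import numpy as np

def half_pel_upsample(ref):
    """Return a (2H-1) x (2W-1) grid holding integer and half-pixel samples.

    Even rows/columns hold the original pixels; odd positions hold
    bilinear averages, rounded to the nearest integer."""
    ref = ref.astype(np.int32)
    H, W = ref.shape
    out = np.zeros((2 * H - 1, 2 * W - 1), dtype=np.int32)
    out[::2, ::2] = ref                                        # integer positions (A, B, C, D)
    out[::2, 1::2] = (ref[:, :-1] + ref[:, 1:] + 1) // 2       # horizontal half-pel (h)
    out[1::2, ::2] = (ref[:-1, :] + ref[1:, :] + 1) // 2       # vertical half-pel (v)
    out[1::2, 1::2] = (ref[:-1, :-1] + ref[:-1, 1:]
                       + ref[1:, :-1] + ref[1:, 1:] + 2) // 4  # centre half-pel (m)
    return out
```

The upsampled grid has roughly 4 times as many positions as the original frame. A motion vector (dy2, dx2) expressed in half-pixel units then addresses sample out[2*y + dy2, 2*x + dx2] for the integer-grid pixel (y, x).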

Unrestricted Motion Vectors
- Suppose the camera is panning to the left
[Figure: reference frame and current frame, with the lower-left macroblock of the current frame marked]
- Now consider the lower-left macroblock in the current frame. What is its best match in the reference frame?

Unrestricted Motion Vectors
- If the macroblock were allowed to hang over the edge, the best match would look like this:
[Figure: reference frame and current frame, with the best match for the lower-left macroblock extending past the left edge of the reference frame]
- But then the motion vector points outside the frame!
- The encoder and decoder can agree on a standard rule for filling in the pixels outside the frame (extrapolation) to deal with this case

Unrestricted Motion Vectors
[Figure: reference frame with its edge pixels extended outward; current frame]
- The edge pixels of the reference frame are simply replicated outside the frame, for as many extra columns (and rows) as necessary
- In this way, a motion vector pointing outside the frame is acceptable, and better matches can be found!
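
One way to implement edge replication is to clamp sample coordinates to the frame, so any out-of-frame position reads the nearest edge pixel. The sketch below assumes a single-channel NumPy frame; the function name and the 16 x 16 default are illustrative. Padding the whole frame once with `np.pad(ref, pad_width, mode="edge")` gives the same samples and is a common alternative.

```python
import numpy as np

def fetch_block_unrestricted(ref, top, left, size=16):
    """Fetch a size x size block whose top-left corner may lie outside ref.

    Row and column indices are clamped to the frame, so out-of-frame
    samples take the value of the nearest edge pixel (edge replication)."""
    H, W = ref.shape
    rows = np.clip(np.arange(top, top + size), 0, H - 1)
    cols = np.clip(np.arange(left, left + size), 0, W - 1)
    return ref[np.ix_(rows, cols)]
```

For example, `fetch_block_unrestricted(ref, 0, -2, size=4)` returns a block whose first two columns repeat the frame's left-most column.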

Arbitrary Multiple Reference Frames
- In H.261, the reference frame for prediction is always the previous frame
- In MPEG and H.263, some frames are predicted from both the previous and the next frames (bi-prediction)
- In H.264, any frame may be designated for use as a reference:
  - The encoder and decoder maintain synchronized buffers of available (previously decoded) frames
  - The reference frame is specified as an index into this buffer
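
The index-into-a-buffer idea can be sketched as a small sliding-window store kept identically at encoder and decoder. This is a simplification under stated assumptions: the class name is hypothetical, and H.264's real decoded picture buffer also distinguishes short-term from long-term references and supports explicit memory-management commands, none of which are modelled here.

```python
from collections import deque

class ReferenceBuffer:
    """Hypothetical sliding-window buffer of previously decoded frames.

    Encoder and decoder each keep one and update it in the same way after
    every reference frame, so a small integer index is enough to name a
    reference. Index 0 is the most recently added frame."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest frame automatically when full
        self.frames = deque(maxlen=capacity)

    def add(self, frame):
        self.frames.appendleft(frame)

    def get(self, ref_idx):
        return self.frames[ref_idx]
```

Because both sides apply the same `add` calls in the same order, an index such as 2 refers to the same decoded frame at the encoder and the decoder.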

Multiple Frame Prediction
- H.264 allows multiple frames to be used as references

Some Advantages of Multiple References
- If an object leaves the scene and then comes back, there can be a reference for it in the long-term past
- Similarly, if the camera pans to the right and then back to the left, the scene that reappears has a reference
- If there is an error and the receiver sends feedback saying where the error is, the encoder can switch to another reference frame
  - Multiple references are helpful even if there is no feedback

Variable Block-Size MC
- Motivation: the size of moving/stationary objects is variable
  - Many small blocks may take too many bits to encode
  - A few large blocks give poor prediction
- Choices: in H.264, each 16 x 16 macroblock may be:
  - Kept whole, or
  - Divided horizontally (vertically) into two sub-blocks of size 16 x 8 (8 x 16), or
  - Divided into four 8 x 8 sub-blocks
- In the last case, each 8 x 8 sub-block may be divided once more into 2 or 4 smaller blocks (8 x 4, 4 x 8, or 4 x 4)

H.264 Variable Block Sizes
[Figure: tree-structured motion compensation. Macroblock partitions 16 x 16, 16 x 8, 8 x 16, 8 x 8; each 8 x 8 block can be further split into 8 x 4 or 4 x 4 sub-blocks]
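
The trade-off between "many small blocks" and "few large blocks" can be sketched as a recursive split decision. This is a simplified, quadtree-only version. It does not model H.264's rectangular 16 x 8, 8 x 16, 8 x 4, or 4 x 8 partitions or its real rate-distortion cost. The callback `cost_fn` (for example, the minimum SAD from a motion search) and the fixed per-vector overhead `mv_bits` are hypothetical assumptions.

```python
def best_partition(cost_fn, y, x, size, min_size=4, mv_bits=8.0):
    """Decide recursively whether to keep a square block whole or split it
    into four quadrants, whichever has the lower total cost.

    cost_fn(y, x, size) -> prediction error of the best match for that block
    mv_bits             -> assumed fixed cost of sending one motion vector
    Returns (total_cost, list of (y, x, size) blocks)."""
    whole = cost_fn(y, x, size) + mv_bits
    half = size // 2
    if half < min_size:                     # cannot split any further
        return whole, [(y, x, size)]
    split_cost, split_blocks = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            c, b = best_partition(cost_fn, y + dy, x + dx, half, min_size, mv_bits)
            split_cost += c
            split_blocks += b
    # Splitting pays for 4 vectors, so it must save enough prediction error
    if split_cost < whole:
        return split_cost, split_blocks
    return whole, [(y, x, size)]
```

If a 16 x 16 block matches badly as a whole but each 8 x 8 quadrant matches perfectly, the function returns the four 8 x 8 blocks. If the whole block already matches well, the extra motion-vector cost keeps it unsplit.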

Motion Scale Example
[Figure: frames at T=1 and T=2]

H.264 Variable Block Size Example
[Figure: frames at T=1 and T=2]

Variable Output Rate
- Suppose the control parameters of a video encoder are kept constant:
  - Quantization parameter
  - Motion estimation search window size, etc.
- Then the number of coded bits per macroblock (and per frame) will vary
- Typically, more bits are produced when there is high motion or fine detail
- Example: the number of bits per frame varies from 1300 to 9000 (32 to 225 kbit/s at 25 frames per second)
[Figure: bits per frame versus frame number, frames 0 to 200]

Rate Control
- Streams are usually coded for target rates, for example, 3 Mbit/s
- How are bits allocated among frames?
- Macroblocks in I-frames are all intra coded
- Macroblocks in P/B frames can be coded as:
  - Intra (DCT blocks)
  - Motion vectors only
  - Motion vectors and difference DCT blocks
  - Nothing at all (skipped)

Rate Control
- The frames will have differing numbers of bits
- This variation in bit rate can be a problem for many practical delivery and storage mechanisms:
  - A constant-bit-rate channel (such as a circuit-switched channel) cannot transport a variable-bit-rate data stream
  - Even a packet-switched channel is limited at any point in time by link rates and congestion

Constant Rate Channel
- The variable data rate produced by an encoder can be smoothed by buffering prior to transmission
[Figure: ENCODER (variable-bit-rate output) → buffer → constant-rate channel → buffer → DECODER (variable-bit-rate input)]
- A first-in/first-out (FIFO) buffer sits at the output of the encoder, and another at the input to the decoder
- The decoder buffer is filled at the constant channel rate and emptied by the decoder at a variable rate

Decoder Buffer Contents
[Figure: decoder buffer contents over time (0 to 9 seconds), marking when the first frame is decoded and when the decoder stalls]
- It takes 0.5 s before the first complete coded frame is received
- Then the decoder can extract and decode frames at the correct rate of 25 fps until…
- At about 4 s, the buffer empties and the decoder stalls (pauses decoding)
- Problem: the video clip freezes until more data arrives
- Partial solution: add a deliberate delay at the decoder (e.g., wait 1 s before decoding frame 1, allowing the buffer to reach a higher fullness)
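
The stall and the effect of a start-up delay can be reproduced with a small simulation. The model is a deliberate simplification, and its assumptions are hypothetical: bits arrive continuously at the channel rate from t = 0, each frame is removed instantly at its scheduled decode time, and the buffer never overflows.

```python
from itertools import accumulate

def first_stall(frame_bits, channel_bps, fps, start_delay_s):
    """Return the index of the first frame the decoder cannot decode on time,
    or None if playback never stalls.

    Frame i is scheduled for decoding at start_delay_s + i / fps and needs
    all bits of frames 0..i to have arrived by then."""
    for i, needed in enumerate(accumulate(frame_bits)):
        decode_time = start_delay_s + i / fps
        arrived = channel_bps * decode_time   # bits received from the channel so far
        if arrived < needed:                  # frame i incomplete: decoder stalls
            return i
    return None

# Hypothetical stream: a quiet scene, then a burst of high-motion frames
bits = [500] * 20 + [3000] * 10
print(first_stall(bits, 10_000, 10, 1.0))  # → 29
print(first_stall(bits, 10_000, 10, 2.0))  # → None
```

With a 1 s delay, the buffered surplus from the quiet frames runs out just before the last frame. An extra second of delay absorbs the whole burst, at the cost of more latency and buffer space.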

Variable Bit Rate
- The example shows that a variable coded bit rate can be adapted to a constant-bit-rate delivery medium using buffers. This entails:
  - The cost of buffer storage space
  - Delay
- This method cannot cope with arbitrary variation of bit rate unless the buffer size and decoding delay are allowed to grow arbitrarily large
- So the encoder needs to keep track of buffer fullness…

Rate Control
- Goal: with the transmission system running at the target rate for the video sequence, the fixed-size encoder and decoder buffers never overflow or underflow
- This is the problem of rate control
- MPEG does not specify how to achieve this
- In addition to preventing overflow/underflow, the rate control algorithm should also make the sequence look good

Choice of Rate Control Algorithm
- The choice of rate control depends on the application
1) Offline encoding of video for DVD storage
  - Processing time is not a constraint
  - A complex algorithm can be employed
  - Two-pass encoding:
    - The encoder collects statistics about the video in the 1st pass
    - The encoder encodes the video in the 2nd pass
  - The goal is to "fit" the video on the DVD while:
    - maximizing the overall quality of the video
    - preventing buffer overflow or underflow during decoding

Choice of Rate Control
2) Encoding of live video for broadcast
  - One encoder and multiple decoders
  - Decoder processing and buffering are limited
  - The encoder may use expensive, fast hardware
  - A delay of a few seconds is usually OK
  - Medium-complexity rate control algorithm
  - Perhaps two-pass encoding of each frame

Choice of Rate Control
3) Encoding for two-way videoconferencing
  - Each terminal does both encoding and decoding
  - Delay must be kept to a minimum (say < 0.5 s)
  - Low-complexity rate control
  - Buffering is minimized to keep delay small
  - The encoder must tightly control its output rate
  - This may cause the output quality to vary significantly, e.g., quality may drop when there is increased movement or detail in the scene

Rate Control
- There are various possible approaches to rate control
- For example, calculate a target number of bits Ri for frame i based on:
  - The number of frames in the group of pictures
  - The number of bits available for the remaining frames in the group
  - The maximum acceptable buffer contents
  - The estimated complexity of the frame
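
One way to combine the four factors listed above is sketched below. The formula is hypothetical: the complexity scaling and the 0.5 back-off weight are illustrative choices, not taken from any standard or from the slide.

```python
def frame_target_bits(bits_remaining, frames_remaining, complexity,
                      avg_complexity, buffer_fullness, buffer_size):
    """Hypothetical per-frame bit budget R_i.

    1. Start from an even share of the bits left in the group of pictures.
    2. Scale by the frame's estimated complexity relative to the average.
    3. Back off as the encoder buffer fills (assumed linear weight of 0.5).
    4. Never exceed the free space left in the buffer."""
    share = bits_remaining / frames_remaining
    target = share * (complexity / avg_complexity)
    fullness_ratio = buffer_fullness / buffer_size
    target *= 1.0 - 0.5 * fullness_ratio
    return max(0.0, min(target, buffer_size - buffer_fullness))
```

For a frame twice as complex as average, with 120,000 bits left for 12 frames, the budget is 20,000 bits with an empty buffer. It falls to 15,000 bits when the buffer is half full and is capped at the 5,000 bits of free space when the buffer is 95% full.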

Rate Control: Example Algorithm
- Let S be the mean absolute value of the difference frame after motion compensation (a measure of frame complexity)
- For each frame:
  - Calculate S for the frame
  - Compute the quantizer step size Q using a rate model with parameters X1 and X2
  - Encode the current frame using parameter Q
  - Update the model parameters X1 and X2 based on the actual number of bits generated for the frame
- There are also macroblock-level rate control algorithms for when "tight" rate control is needed
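
The slide's model equation did not survive extraction. This sketch assumes the quadratic rate model used in the MPEG-4 Visual verification model, R = X1·S/Q + X2·S/Q², where R is the bit budget for the frame. Q is found as the positive root of the quadratic. X1 and X2 are refit by least squares on R/S = X1·(1/Q) + X2·(1/Q²) over past frames. Function names are hypothetical. A practical controller would also clip Q to the codec's legal range and limit how fast it changes.

```python
import math

def quantizer_from_model(R, S, X1, X2):
    """Solve R = X1*S/Q + X2*S/Q**2 for the step size Q > 0.

    Multiplying through by Q**2 gives R*Q**2 - X1*S*Q - X2*S = 0."""
    if X2 == 0:                                  # first-order model: R = X1*S/Q
        return X1 * S / R
    a, b, c = R, -X1 * S, -X2 * S
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def update_model(history):
    """Least-squares refit of X1, X2 from past (Q, S, actual_bits) triples."""
    suu = suv = svv = suy = svy = 0.0
    for Q, S, bits in history:
        u, v, y = 1.0 / Q, 1.0 / Q ** 2, bits / S
        suu += u * u
        suv += u * v
        svv += v * v
        suy += u * y
        svy += v * y
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:          # e.g. every frame used the same Q: fit X1 only
        return suy / suu, 0.0
    X1 = (suy * svv - svy * suv) / det           # Cramer's rule on the 2x2 normal equations
    X2 = (svy * suu - suy * suv) / det
    return X1, X2
```

If two past frames with S = 4 produced 9,500 bits at Q = 4 and 2,875 bits at Q = 8, the refit gives X1 = 2000 and X2 = 30000. Asking for 9,500 bits again then returns Q = 4.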

Standards
- Standards groups (MPEG, VCEG)
- H.261: videophone/videoconferencing (1990)
- MPEG-1: low bit rates for digital storage (1992)
- MPEG-2: generic coding algorithms (1994)
- H.263: very low bit rate coding (1995)
- MPEG-4: flexibility and computer vision approaches (1998)
- H.264: recent improvements (2003)

Advantages/Disadvantages
- Disadvantages of standardization:
  - Improvements in price and performance come from the battle to create and own a proprietary approach
  - Proprietary codecs generally exhibit higher quality than a standard
  - Standards are slow-moving, developed by committee, and try to avoid patents
- Advantages of standardization:
  - Interoperability
  - Different platforms supported
  - Vendors can compete on improved implementations
  - The worldwide technical community can build on each other's work
  - Several standards have been hugely successful

H.261: Real-Time, Low Complexity, Low Delay
- Motivated by the definition and planned deployment of ISDN (Integrated Services Digital Network)
- Rate of p x 64 kbit/s, where p is an integer from 1 to 30
- For example, p = 2 → 128 kbit/s, with video coded at 112 kbit/s and audio at 16 kbit/s
- Applications: videophone, videoconferencing
- Videoconferencing compression must:
  - Operate in real time
  - Add little coding delay
  - Have low complexity
- There is no particular advantage to shifting the complexity onto the encoder or the decoder (each user requires both encoding and decoding capabilities)

H.261 Basics
- Standardization started in 1984 and finished in 1990
- Uncompressed CIF (4:2:0 chroma sampling, 15 frames per second) requires 18.3 Mbit/s
- Getting this down to p x 64 kbit/s requires compression from about 10:1 up to 300:1
- H.261 achieves compression using the same basic elements discussed before:
  - Motion compensation (for temporal redundancy)
  - DCT + quantization (for spatial redundancy)
  - Variable-length coding (run-length, Huffman)
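
The raw CIF rate and the required compression range can be checked with a few lines of arithmetic. The calculation assumes 8 bits per sample, which the slide does not state. It gives about 18.2 Mbit/s, consistent with the slide's figure up to rounding.

```python
def cif_raw_bps(fps=15, bits_per_sample=8):
    """Raw bit rate of CIF video with 4:2:0 chroma sampling."""
    luma = 352 * 288              # luminance samples per frame
    chroma = 2 * (176 * 144)      # two chroma planes, each subsampled 2:1 in both directions
    return (luma + chroma) * bits_per_sample * fps

raw = cif_raw_bps()               # 18,247,680 bit/s
for p in (1, 30):
    # p = 1 needs roughly 285:1; p = 30 (1920 kbit/s) needs roughly 9.5:1
    print(f"p = {p:2d}: {p * 64} kbit/s needs about {raw / (p * 64_000):.0f}:1")
```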

H.261 Motion Compensation
- Motion compensation is done on 16 x 16 macroblocks, the same as in MPEG-1 and MPEG-2
- However, consider the application fields: videoconferencing, videophone
  - A call is set up, conducted, and terminated; these events always occur together, in sequence
  - No random access into the video is needed
  - Low delay is needed
  - Objects are also expected to move slowly
- Question: what features should these facts lead to?

H.261 Motion Estimation
- Slow movement: for each block of pixels in the current frame, the search window extends only ±15 pixels in each direction
[Figure: ±15-pixel search window in the previous frame (T=1) around the block position in the current frame (T=2)]
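
A minimal full-search block matcher over the ±15-pixel window, using the sum of absolute differences (SAD) as the matching criterion. The SAD choice and the function name are assumptions. The slide does not say which metric an encoder should use. With integer displacements, each block requires 31 x 31 = 961 candidate comparisons.

```python
import numpy as np

def full_search(cur, ref, top, left, block=16, search=15):
    """Exhaustive block matching: try every integer displacement within
    ±search pixels and return ((dy, dx), sad) for the smallest SAD.
    Candidates falling outside the reference frame are skipped."""
    target = cur[top:top + block, left:left + block].astype(np.int64)
    H, W = ref.shape
    best = (None, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > H or x + block > W:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int64)
            sad = np.abs(target - cand).sum()
            if sad < best[1]:        # strict '<' keeps the first minimum found
                best = ((dy, dx), sad)
    return best
```

If the current frame is the reference shifted down by 3 pixels and left by 5, a block in the interior is found at displacement (-3, 5) with zero SAD.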

H.261 Motion Compensation
- No B pictures: the delay and complexity associated with them are not wanted
- H.261 uses forward motion compensation from the previous picture only
- The first frame is an intra frame. No frame after that has to be intra; every subsequent frame may use prediction from the one before
  - This means that to decode a particular frame in the sequence, we may have to decode from the very beginning. No random access.

ISO MPEG
- Originally set up in 1988, the committee had 3 work items:
  - MPEG-1: targeted at 1.5 Mbit/s
  - MPEG-2: targeted at 10 Mbit/s
  - MPEG-3: targeted at 40 Mbit/s
- Later, it became clear that the algorithms developed for MPEG-2 would accommodate higher rates, so the 3rd work item was dropped
- Later, MPEG-4 was added
- Goals:
  - MPEG-1: compression of video/audio for CD playback
  - MPEG-2: storage and broadcast of TV-quality audio and video
  - MPEG-4: coding of audio-visual objects
- There are also MPEG-7 and MPEG-21, which are about multimedia content and not compression

MPEG-1: Audiovisual Coder for Digital Storage Media
- Goal: coding full-motion video and associated audio at bit rates up to about 1.5 Mbit/s
- Brief history of MPEG-1:
  - October 1988: working group formed
  - September 1989: 14 proposals made
  - October 1989: video subjective tests performed
  - March 1990: simulation model
  - November 1992: international standard
- Solution to a specific problem:
  - Compress an audio-video source (~210 Mbit/s) to fit onto a CD-ROM originally designed to handle uncompressed audio alone (requiring aggressive compression, 200:1)

MPEG-1 Major Differences
- Unlike videoconferencing, digital storage media need random access capability
  - INTRA frames: to avoid a long delay between the frame a user is looking for and the frame where decoding starts, INTRA frames should occur frequently
  - But then coding efficiency goes down
  - Compression efficiency is improved by using B frames

B Frames
- Bidirectionally predicted blocks allow effective prediction of uncovered background
- Bidirectional prediction can reduce noise (if good predictions are available in both the past and the future)
- B pictures are not used for prediction, so their errors do not propagate → a substantial reduction in bits (I:P:B ≈ 5:3:1)
- B pictures are forward, backward, or interpolatively motion compensated from the previous/next I/P frames
- This increases motion estimation complexity in 2 ways:
  - Two frames must be searched
  - A bigger window must be searched if the anchor frame is farther away
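
Because a B frame is predicted from the *next* I/P anchor as well as the previous one, that anchor must be transmitted and decoded before the B frames that precede it in display order. The sketch below shows this reordering. It assumes a simple string of frame types, and the function name is hypothetical.

```python
def coding_order(display_types):
    """Reorder frames from display order to transmission/decoding order.

    display_types: frame types in display order, e.g. "IBBPBBP".
    Returns display indices in the order they must be sent."""
    order, pending_b = [], []
    for i, t in enumerate(display_types):
        if t == "B":
            pending_b.append(i)       # hold B frames until their future anchor is sent
        else:                         # I or P anchor: send it, then the Bs that needed it
            order.append(i)
            order.extend(pending_b)
            pending_b.clear()
    # Trailing Bs would really wait for the next GOP's I frame; here they are just appended
    order.extend(pending_b)
    return order
```

For "IBBPBBP" the transmission order is frames 0, 3, 1, 2, 6, 4, 5. The extra buffering this requires at both ends is part of the delay that H.261 avoids by not using B pictures.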

MPEG-2: Generic Coding Algorithms
- Goal: digital video transmission in the range 2 to 15 Mbit/s
- Generic coding algorithms to support:
  - Digital storage media, existing TV (PAL, SECAM, NTSC), cable, direct broadcast satellite, HDTV, computer graphics, video games
- Brief history:
  - July 1990: working group established
  - Nov 1991: subjective tests on 32 proposals
  - March 1993: technical contents of main level frozen
  - Nov 1994: international standard (parts 1 to 3)

Main Differences between MPEG-1 and MPEG-2
- MPEG-2 is aimed at higher bit rates
  - Can be used for larger picture formats
- MPEG-2 has a wider range of bit rates
  - A tool-kit approach allows the use of different subsets of algorithms
- MPEG-2 supports scalable coding
  - SNR scalable, spatially scalable
- MPEG-2 supports interlacing
  - This permeates everything: motion compensation, DCTs, zigzag ordering for variable-length coding

Overview of MPEG-4 Visual
- MPEG-4 Visual is meant to handle many types of data, including:
  - Moving video (rectangular frames)
  - Video objects (arbitrarily shaped regions of moving video)
  - 2D and 3D mesh objects (representing deformable objects)
  - Animated human faces and bodies
  - Static texture (still images)

Video Objects
- MPEG-4 moves away from the traditional view of video as a sequence of rectangular frames
- Instead, video is a collection of video objects
- A video object is a flexible entity that a user can access (seek, browse) and manipulate (cut, paste)
- A video object (VO) is an arbitrarily shaped area of a scene that may exist for an arbitrary length of time
- An instance of a VO at a particular time is called a video object plane (VOP)
- This definition encompasses the traditional view of rectangular frames too

MPEG-4: Object-Based Motion Compensation
[Figure: frames at T=1 and T=2]

Static Sprite Coding
- The background may be coded as a static sprite
- The sprite may be much larger than the visible area of the scene
Source: http://mpeg.telecomitalialab.com/standards/mpeg-4.htm

Global Motion Compensation
- The encoder sends up to 4 global motion vectors (GMVs) for each VOP, together with the location of each GMV in the VOP
- For each pixel position, an individual MV is calculated by interpolating between the GMVs, and the pixel is motion compensated according to this interpolated vector
[Figure panels: GMVs and an interpolated vector; GMC compensating for rotation; GMC compensating for camera zoom]
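
The per-pixel interpolation step can be sketched as follows, under a simplifying assumption: the 4 GMVs sit at the corners of the VOP and are blended bilinearly. MPEG-4's exact warping equations, which depend on how many points are sent, are not reproduced here. The dictionary layout is hypothetical.

```python
import numpy as np

def interpolate_gmv(gmv, width, height):
    """Per-pixel motion field from 4 corner GMVs by bilinear interpolation.

    gmv: dict with keys 'tl', 'tr', 'bl', 'br' mapping to (dx, dy) at the
    top-left, top-right, bottom-left and bottom-right corners.
    Returns arrays (dx, dy), each of shape (height, width)."""
    ys, xs = np.mgrid[0:height, 0:width]
    a = xs / (width - 1)       # 0 at the left edge, 1 at the right edge
    b = ys / (height - 1)      # 0 at the top edge, 1 at the bottom edge
    field = []
    for k in (0, 1):           # k = 0: horizontal component, k = 1: vertical
        top = (1 - a) * gmv['tl'][k] + a * gmv['tr'][k]
        bot = (1 - a) * gmv['bl'][k] + a * gmv['br'][k]
        field.append((1 - b) * top + b * bot)
    return field[0], field[1]
```

Corner vectors that all point outward from the centre, such as (-2, -2), (2, -2), (-2, 2), and (2, 2), produce a field that is zero at the centre and grows linearly toward the edges, which is the pattern a camera zoom creates.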

Global Motion Estimation
- Global motion estimated between 2 images, assuming 2D affine motion
- Compression example: error images before and after global motion compensation (Soccer sequence: global motion estimated between the 1st and 10th frames)

Coding Synthetic Visual Scenes
- Animated 2D mesh coding:
  - A 2D mesh is made up of triangular patches
  - Deformation or motion can be modelled by warping the triangles
  - Surface texture may be compressed as static texture
- Mesh and texture information might both be transmitted for key frames; for intermediate frames:
  - No texture is transmitted
  - Mesh parameters are transmitted
  - The decoder animates the mesh

Motion Vectors for Meshes
- A mesh is warped by transmitting vectors that displace its nodes
- Mesh MVs are predictively coded
- The texture residual can be coded with a very small number of bits
- MPEG-4 also allows 3D meshes
  - The vertices need not lie in one plane
  - A 3D mesh samples the surface of a solid body
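
Inside one triangular patch, moving the three nodes defines an affine warp. Each point keeps its barycentric coordinates and is carried along with the nodes. A minimal sketch for a single 2D point follows. The function name and argument layout are hypothetical, and the predictive coding of the node MVs is not shown.

```python
import numpy as np

def warp_point(p, tri, node_mvs):
    """Map point p inside triangle tri to its position after the three
    nodes are displaced by node_mvs (one (dx, dy) per node)."""
    A, B, C = (np.asarray(v, dtype=float) for v in tri)
    # Solve p = A + s*(B - A) + t*(C - A) for the barycentric weights s, t
    M = np.column_stack((B - A, C - A))
    s, t = np.linalg.solve(M, np.asarray(p, dtype=float) - A)
    w = np.array([1 - s - t, s, t])
    moved = np.array([A, B, C]) + np.asarray(node_mvs, dtype=float)
    return w @ moved            # same weights applied to the displaced nodes
```

If all three nodes move by the same vector, every point in the patch is translated by that vector. If only one node moves, points near it move the most and points on the opposite edge do not move at all.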

Shape-Adaptive DCT
- The shape-adaptive DCT uses one-dimensional DCTs, where the number of points in each transform matches the number of opaque values in each column (or row)
[Figure: shift vertically → 1-D column DCT → shift horizontally → 1-D row DCT → final coefficients]
- It is more complex than a normal 8 x 8 DCT, but improves coding efficiency for boundary MBs
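
The shift-and-transform steps can be sketched with variable-length 1-D DCTs. This sketch assumes an orthonormal DCT-II for every length N. The normalization used in MPEG-4's own SA-DCT may differ, and the function names are hypothetical. A useful property of this version is that the number of coefficients equals the number of opaque pixels, and the total energy is preserved.

```python
import numpy as np

def dct_1d(x):
    """Orthonormal N-point DCT-II of a 1-D array, N = len(x)."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    scale = np.full(N, np.sqrt(2.0 / N))
    scale[0] = np.sqrt(1.0 / N)
    return scale * (basis @ x)

def sa_dct(block, mask):
    """Shape-adaptive DCT of block, using only pixels where mask is True.

    Step 1: shift each column's opaque pixels to the top, then apply an
            N-point DCT to each column (N = opaque count in that column).
    Step 2: shift each row's coefficients to the left, then apply an
            N-point DCT to each row.
    Returns (coefficients, coefficient mask)."""
    h, w = block.shape
    tmp = np.zeros((h, w))
    tmask = np.zeros((h, w), dtype=bool)
    for c in range(w):
        vals = block[mask[:, c], c].astype(float)
        if len(vals):
            tmp[:len(vals), c] = dct_1d(vals)
            tmask[:len(vals), c] = True
    out = np.zeros((h, w))
    omask = np.zeros((h, w), dtype=bool)
    for r in range(h):
        vals = tmp[r, tmask[r]]
        if len(vals):
            out[r, :len(vals)] = dct_1d(vals)
            omask[r, :len(vals)] = True
    return out, omask
```

With a fully opaque mask, this reduces to an ordinary separable 2-D DCT of the block.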

Face and Body Animation
- Two basic steps:
  - Define the basic shape of the face or body model (typically carried out once at the start of a session)
  - Send animation parameters to animate the model
- The encoder has a choice of:
  - Generic facial definition parameters (FDPs)
  - Custom FDPs for a specific face
- In a similar way, a body object is rendered from a set of Body Definition Parameters (BDPs) and animated using Body Animation Parameters (BAPs)

Face Animation
- The generic face can be modified by Facial Definition Parameters (FDPs) into a particular face
- The FDP decoder creates a neutral face: one that carries no expression
- Expressions are changed by moving the vertices
- It is not necessary to transmit data for each vertex; instead, Facial Animation Parameters (FAPs) are used
- Some combinations of vertex movements are common, such as the mouth shapes of speech or expressions like a smile, so these are coded as high-level parameters (visemes and expressions)
  - Can be used alone
  - Can be used as predictions for more accurate FAPs
- The resulting data rate is small, e.g., 2 to 3 kbit/s

H.264 Brief History
- The work started in VCEG (in 1998) as a parallel activity with the final version of H.263
- The first test model was produced in 1999, followed by many small steps over the next 4 years:
  - Many tweaks to the integer transform and to the variable block size
  - 1/8-pixel-accurate MVs were added and then dropped
  - Many tweaks to the deblocking filter
  - Etc.
- Final version: March 2003
- Final results: a 2-fold improvement in compression (compared to H.263 and MPEG-2), and significantly better than MPEG-4 ASP

Comparison of H.264 and MPEG-4
| Comparison | MPEG-4 Visual | H.264 |
| Supported data types | Rectangular video frames and fields, arbitrary-shaped objects, still texture and sprites, synthetic objects, 2D and 3D mesh objects | Rectangular video frames and fields |
| Number of profiles | 19 | 3 |
| Compression efficiency | Medium | High |
| Support for video streaming | Scalable coding | Switching slices |
| Minimum motion compensation block size | 8 x 8 | 4 x 4 |
| MV accuracy | ½ or ¼ pixel | ¼ pixel |
| Transform | 8 x 8 DCT | 4 x 4 DCT approximation |
| Built-in deblocking filter | No | Yes |
| License payments required | Yes | Probably no for baseline |

Question
- A 16 by 16 MB to be motion compensated is shown above
- The search window is shown below
- Which block(s) in the search window will provide the best match:
  - With the MAE error metric?
  - With the MSE error metric?
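
The slide's figure is not recoverable, but a hypothetical pair of candidate blocks shows why the two metrics can disagree. MSE squares each error, so a few large errors cost more than many small ones. MAE weighs all errors linearly.

```python
import numpy as np

def mae(a, b):
    return np.mean(np.abs(a.astype(float) - b))

def mse(a, b):
    return np.mean((a.astype(float) - b) ** 2)

target = np.full((16, 16), 100)
uniform = target + 2                 # every pixel off by 2
spiky = target.copy()
spiky[:4, :4] += 20                  # 16 pixels off by 20, the rest exact
cands = {"uniform": uniform, "spiky": spiky}

best_mae = min(cands, key=lambda k: mae(target, cands[k]))  # "spiky": 1.25 < 2
best_mse = min(cands, key=lambda k: mse(target, cands[k]))  # "uniform": 4 < 25
```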

Question
- A sequence of frames is being coded by an MPEG-style coder that searches for best-match macroblocks using full search with an MSE criterion
- Frames are I, B, B, P, B, B, …
- The camera is moving horizontally by 10 pixels per frame during this sequence, so there are 30 pixels of offset between the I frame and the subsequent P frame
- Many macroblocks in the P frame might get coded using MV = (30, 0) with no difference block
- Why might some MBs not get coded precisely this way? List all the reasons you can think of.