
- Number of slides: 29
Chap 3: Encoding Video Content and Transportation

Learning Objectives
After completing this learning object, you should be able to:
- Explain compression goals and characteristics.
- Describe the technical architecture of the MPEG-2, H.264/AVC, and VC-1 compression standards.

Compression Goal
The goal of video compression is to reduce the quantity of data used to represent video content without substantially reducing the quality of the picture.
[Figure: processing chain from film or video camera (analog video sequence) through digitization (uncompressed digital bitstream) and compression (compressed digital bitstream) to transport, followed by decode for digital TV or NTSC/PAL encode for analog TV]

About MPEG
- An acronym for Moving Picture Experts Group
- Widely used by satellite, cable, and terrestrial TV systems
- The group has produced a family of major compression standards

MPEG Format: Description
- MPEG-1: Work on the MPEG-1 format began in 1988, and it was primarily used to compress video data at bit rates of about 1.5 Mbps. MPEG-1 content is used for such services as DAB (Digital Audio Broadcasting) and became a standard format on the Internet for quality video. MPEG-1 is also the basis of the MP3 standard, which is widely used for music on the Internet.
- MPEG-2: MPEG-2 builds on the powerful video compression capabilities of the MPEG-1 standard. It is widely used in the delivery of broadcast-quality television and for storing video content on DVDs. A number of international television standards are based on this compression format.
- MPEG-4 (Part 2): MPEG-4, whose formal designation is ISO/IEC 14496, was finalized in October 1998 and became an international standard in 2000. Part 2 of the standard is divided into a number of profiles that address the requirements of various video applications, ranging from mobile phones to surveillance cameras.
- MPEG-4 Part 10: MPEG-4 Part 10, also called H.264/AVC, has been designed to deliver broadcast and DVD-quality video at minimum data rates.

The group has also produced specifications for describing A/V content, delivery, and consumption:
- MPEG-7
- MPEG-21

Compression Key Concept: Remove Redundancy
Compression algorithms are able to reduce the size of a video bitstream significantly because video typically contains duplicate or redundant information both within and between frames. Image and video compression algorithms can take advantage of two primary types of redundancy to reduce the size of the resulting bitstream:
- Spatial redundancy
- Temporal redundancy
Spatial compression works on one image; temporal compression works on several images.

Spatial Compression: Overview
- Spatial compression (intra-frame compression):
  - Spatial compression is applied to a single picture.
  - Spatially compressed pictures are called I-frames or I-slices.
  - The spatial method takes advantage of the fact that the human eye is unable to distinguish small differences in color.
  - Neighboring pixels in an image often have similar values: the color or brightness of an object typically does not vary significantly over small areas.
  - Instead of encoding each pixel individually, a compression algorithm can save bits by encoding only the difference between neighboring pixels.
  - That difference is typically a smaller value than the full range of possible pixel values and can therefore be encoded with fewer bits.
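
The neighboring-pixel idea can be illustrated with a minimal sketch. It assumes a single row of hypothetical 8-bit brightness values and uses plain differencing; MPEG-2 itself encodes I-frames with a block transform rather than this scheme, so the code only shows why differences tend to be small numbers.

```python
def delta_encode(row):
    """Keep the first pixel, then store each pixel as the difference from its left neighbour."""
    if not row:
        return []
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]


def delta_decode(deltas):
    """Rebuild the original row by accumulating the differences."""
    row = []
    for d in deltas:
        row.append(d if not row else row[-1] + d)
    return row


# Hypothetical 8-bit brightness values from a smooth area of an image.
row = [120, 121, 121, 123, 124, 124, 125, 127]
deltas = delta_encode(row)

# Every value after the first is a small difference, not a full 0-255 value.
largest_delta = max(abs(d) for d in deltas[1:])
print(deltas)
print(largest_delta)
```

In this example the differences never exceed 2, so they could be stored in far fewer bits than the 8 bits needed for each raw pixel value.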

Temporal Compression: Introduction
- Temporal compression refers to the bit reduction between successive frames.
  - Typical frame rate: 25 or 30 frames per second
  - Time between two adjacent frames: 1/25 or 1/30 of a second
  - In most cases the bulk of the picture is unchanged, while only parts of the picture are moving. There is an obvious redundancy between frames.
  - Temporal compression encodes only the small changes, or the direction in which a part of the image moved between frames.
  - These changes typically require fewer bits than representing the whole image again.
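
As a rough illustration of inter-frame redundancy, the sketch below records only the pixels that changed between two frames. The frames are hypothetical one-dimensional lists, and the function names are made up for this example; real MPEG encoders work with motion vectors on blocks of pixels rather than per-pixel change lists.

```python
def changed_pixels(prev_frame, next_frame):
    """Return (position, new_value) pairs for pixels that differ between two frames."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(prev_frame, next_frame))
        if old != new
    ]


def apply_changes(prev_frame, changes):
    """Rebuild the next frame from the previous frame plus the recorded changes."""
    frame = list(prev_frame)
    for i, value in changes:
        frame[i] = value
    return frame


# Two hypothetical 16-pixel frames; a small bright object moves one pixel to the right.
frame_a = [10] * 16
frame_a[4:6] = [200, 200]
frame_b = [10] * 16
frame_b[5:7] = [200, 200]

changes = changed_pixels(frame_a, frame_b)
print(changes)
```

Only two of the sixteen pixels need to be transmitted to rebuild the second frame from the first.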

MPEG-2 Frame Types: I-Frames
- I-frame characteristics:
  - Starting point for a sequence
  - Undergo minimal compression
  - Independently encoded as a single image
  - No reference to any past or future frames
  - The encoding scheme is similar to JPEG compression (about 0.25 bits per pixel, or about 2.5 bits per pixel for higher quality)
  - Self-contained and used as a foundation to build other types of frames
  - Effectively a JPEG image
  - I-frames are typically large (hundreds to thousands of IP packets per frame)
  - An I-frame is approximately 64,000 bytes
  - One I-frame occurs approximately every 0.4 seconds of video runtime
  - More I-frames make an MPEG stream "more editable"
[Figure: frame sequence I B B P B]

MPEG-2 Frame Types: P-Frames
- Forward predicted frames (P-frames) characteristics:
  - Based on past I-frames or P-frames
  - Moderately compressed
  - Not actually an encoded image
  - Contain motion information (vectors) that allows the IPTVCD to rebuild the frame
  - Require less bandwidth than I-frames (important for IPTV deployments)
  - Typically much smaller than I-frames (tens to hundreds of packets per frame)

MPEG-2 Frame Types: B-Frames
- Bi-directional predicted frames (B-frames) characteristics:
  - Built from information in both past and future I-frames and P-frames
  - Encoding for B-frames is similar to P-frames, except that B-frames can specify two motion vectors (one to the past and one to the future)
  - Extensive compression: B-frames occupy less space than I-frames or P-frames (tens to hundreds of packets per frame)
  - A stream containing a high density of B-frames requires less bandwidth than a digital stream built with a high density of I-frames and P-frames.
  - When frame dropping occurs, B-frames are discarded first because they have the lowest impact on video quality compared to I-frames and P-frames.

Group of Pictures (GOPs)
- GOP characteristics:
  - I, P, and B images are combined to form a sequence of picture frames called a GOP.
  - Each GOP must begin with a full reference I-frame, which limits the propagation of errors to a single GOP.
  - All frames depend on the contents of the I-frame.
  - GOPs vary in size: the average GOP for IPTV deployments is between 12 and 15 frames in length. Some GOP configurations can, however, include 250 frames.
  - There are typically between 10 and 12 P-frames and B-frames between successive I-frames.
[Figure: Relative amounts of data for each frame type in a typical MPEG GOP (frame sequence I B B P B B I)]
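
The relationship between GOP length, frame rate, and I-frame spacing is simple arithmetic. The sketch below assumes a hypothetical 12-frame GOP pattern and the 25 and 30 frames-per-second rates mentioned earlier; at 30 frames per second it reproduces the figure of roughly 0.4 seconds between I-frames.

```python
from collections import Counter


def i_frame_interval_seconds(gop_length, frames_per_second):
    """Time between successive I-frames when every GOP starts with one I-frame."""
    return gop_length / frames_per_second


# Hypothetical 12-frame GOP pattern, written in display order.
gop = "IBBPBBPBBPBB"
counts = Counter(gop)

print(counts["I"], counts["P"], counts["B"])
print(i_frame_interval_seconds(len(gop), 30))
print(i_frame_interval_seconds(len(gop), 25))
```

With this pattern there are 11 P-frames and B-frames between successive I-frames, which falls inside the 10 to 12 range quoted above.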

MPEG-2 Shortcomings
- Although MPEG-2 has served the cable and satellite industries well for the past decade, it has shortcomings when deployed on networks that have limited bandwidth capacities:
  - A telephone network was not designed to carry MPEG-coded video.
- New advanced compression schemes with better capabilities have been developed in recent years for the purpose of delivering video content over bandwidth-constrained networks:
  - MPEG-4 Part 10 AVC/H.264
  - Microsoft's Windows Media Video (also known as VC-1)

Example MPEG-4 Compression
- A farmer herding cattle.
- The scene can be decomposed into a number of objects:
  - The field and houses in the background
  - The sky
  - The farmer, son, and cattle walking along the road
  - The farmer's voice
  - Noises emanating from the cows
- MPEG-4 treats each one of these objects separately.
- Compression is applied to each object.

Others

VC-1
- Standardized by the Society of Motion Picture and Television Engineers (SMPTE)
- Its highest-profile implementation has been its adoption by Microsoft's Windows Media Video (WMV) 9 multimedia coding platform
- A number of other international standards, including the high-definition DVD formats HD-DVD and Blu-ray, have also adopted VC-1

AVS
- China has developed a standard called AVS.
- Efficiency levels achieved by this standard are quite similar to the performance levels achieved by MPEG-4 Part 10.
- Also covers areas such as digital copyright and content management
Transportation Architecture

MPEG Streams 1/3
- Elementary Streams
- Packetized Elementary Streams
- Transport Streams
- Program Streams
[Figure: video and audio encoders produce elementary streams (ES); packetizers turn them into video and audio PES; a transport stream MUX combines the PES of one or multiple programs with PSIP data into a transport stream]
MPEG Streams 2/3

Program Streams 3/3
- A program stream carries a single program.
  - In MPEG, a program is a combination of video, audio, and related data.
  - All information in the program stream must have a common time base.
  - Aimed at error-free environments such as DVD files.
[Figure: Pack header, then Video PES + Audio PES 1 + Audio PES 2 = Program Stream 1]

PES Packet Overview
- In order for the audio, data, and video elementary streams to be transmitted over the digital network, each elementary stream is converted into an interleaved stream of time-stamped PES packets.
- PES packet characteristics:
  - A PES packet may be a fixed-size or variable-size block
  - Up to 65536 bytes per packet
  - Each PES packet has a header of 6 bytes
    - Includes a number that identifies the ES; this is important when combining audio and video sources
    - Also includes a time stamp for synchronization of the elementary streams
    - Due to networking difficulties, the order of video frames output from the IPTV data center can differ from the order in which they are received by the IPTVCD.
    - Thus, to help synchronization, MPEG-based systems often time stamp PES packets
  - The remainder of the PES packet is used for video content
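
A minimal sketch of reading the 6-byte PES header follows. It assumes the layout defined in MPEG-2 Systems: a 3-byte start code prefix (0x000001), a 1-byte stream ID that identifies the elementary stream, and a 2-byte packet length. In that layout the time stamps are carried in optional header fields that follow these six bytes, which this sketch does not parse. The function name and the sample values are hypothetical.

```python
import struct

PES_START_CODE_PREFIX = b"\x00\x00\x01"


def parse_pes_header(data):
    """Parse the first 6 bytes of a PES packet: start code prefix, stream ID, length."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes")
    if data[:3] != PES_START_CODE_PREFIX:
        raise ValueError("missing PES start code prefix")
    # Big-endian: one unsigned byte (stream ID), one unsigned 16-bit length.
    stream_id, packet_length = struct.unpack(">BH", data[3:6])
    return {"stream_id": stream_id, "packet_length": packet_length}


# Hypothetical header: stream ID 0xE0 (a video stream), 1000 bytes follow.
header = PES_START_CODE_PREFIX + bytes([0xE0]) + (1000).to_bytes(2, "big")
parsed = parse_pes_header(header)
print(parsed)
```

The 16-bit length field is what bounds the size of a PES packet to the order of 64 KB.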

Time Stamping PES Packets
- There are two types of time stamps that can be applied to each PES packet:
  - Presentation Time Stamp (PTS)
    - Its purpose is to define when and in what order the video should be presented to the viewer.
    - 33-bit time value
    - Set in the PES header field
  - Decode Time Stamp (DTS)
    - Its purpose is to instruct the IPTVCD decoder when to process the packets.
- These time stamps are based on the encoder's system time clock (STC).
- For synchronization to occur, the STC must also be available at the IPTVCD decoder.
- Thus, the encoder uses the STC to time stamp a program clock reference (PCR) in the MPEG-2 transport stream packets.
- These values are used by the decoder to synchronize its own 27 MHz clock to the encoder's STC.
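
The difference between decode order and presentation order can be sketched with a few hypothetical time stamps. The example assumes the MPEG-2 Systems convention that PTS and DTS count ticks of a 90 kHz clock (the 27 MHz system clock divided by 300); the frame names and stamp values are invented for illustration, using 3600 ticks per frame (25 frames per second).

```python
PTS_CLOCK_HZ = 90_000  # PTS/DTS tick rate: 27 MHz system clock divided by 300
PTS_BITS = 33


def ticks_to_seconds(ticks):
    """Convert a PTS or DTS value in 90 kHz ticks to seconds."""
    return ticks / PTS_CLOCK_HZ


def wraparound_hours():
    """How long a 33-bit counter at 90 kHz runs before it wraps to zero."""
    return (2 ** PTS_BITS) / PTS_CLOCK_HZ / 3600


# Hypothetical frames as (name, dts, pts). The P-frame is decoded before the
# B-frames that reference it, but displayed after them.
frames = [
    ("I0", 0, 3600),
    ("P3", 3600, 14400),
    ("B1", 7200, 7200),
    ("B2", 10800, 10800),
]

decode_order = [name for name, dts, pts in sorted(frames, key=lambda f: f[1])]
display_order = [name for name, dts, pts in sorted(frames, key=lambda f: f[2])]

print(decode_order)
print(display_order)
print(ticks_to_seconds(3600))
```

A 33-bit counter at this rate wraps roughly every 26.5 hours, which is why long-running receivers have to handle wraparound.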

Program Clock Reference (PCR)
- To assist the decoder in presenting programs on time, at the right speed, and with synchronization, programs usually provide a Program Clock Reference, or PCR, periodically on one of the PIDs in the program.
- The PCR is a 42-bit number.
- The PCR time stamp carried by each program is derived from the 27 MHz source clock (STC).
- This results in a resolution of 1/27,000,000 of a second, or about 37 ns.
- The adaptation field in the packet header is used periodically to include the PCR code that allows generation of a locked clock at the decoder.
- PCRs should be spaced no more than 40 ms apart according to some international standards.
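
The PCR arithmetic can be checked with a short sketch. It assumes the MPEG-2 Systems split of the 42-bit PCR into a 33-bit base counted in 90 kHz units and a 9-bit extension counted in 27 MHz units (0 to 299); the function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000


def pcr_value(base, extension):
    """Combine the 33-bit base (90 kHz units) and 9-bit extension (27 MHz units)."""
    if not 0 <= base < 2 ** 33:
        raise ValueError("base must fit in 33 bits")
    if not 0 <= extension < 300:
        raise ValueError("extension must be in the range 0-299")
    return base * 300 + extension


def pcr_to_seconds(pcr):
    """Convert a combined PCR value (27 MHz ticks) to seconds."""
    return pcr / SYSTEM_CLOCK_HZ


# Duration of one tick of the 27 MHz clock, in nanoseconds.
tick_ns = 1e9 / SYSTEM_CLOCK_HZ

print(round(tick_ns, 2))
print(pcr_to_seconds(pcr_value(90_000, 0)))
```

One tick of the 27 MHz clock lasts about 37 ns, which is the resolution quoted above, and a base value of 90,000 corresponds to exactly one second.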

MPEG TS Packets Overview
- A TS is formed by breaking up elementary streams, program streams, or both into MPEG TS packets.
- About MPEG TS packets:
  - Formed by breaking up the PES packets
  - Fixed size of 188 bytes
  - Each TS packet contains one of three media formats:
    1. Video
    2. Audio
    3. Data
- Thus, TS packets do not support a mix of media.
TS MPEG Packet Structure

Structure of an MPEG TS Packet
Field Name: Description of Functionality
- Synchronization byte: The header starts with a well-known synchronization byte (8 bits). This has the bit pattern 0x47 (0100 0111) and is used to detect the start of the MPEG TS packet.
- Transport error indicator:
  - This single-bit flag indicates an error in the associated transport stream.
  - It is set by the encoder when it detects corrupted source content.
  - It purely identifies a source content issue and is not an indicator of distribution network problems.
- Start indicator: This flag indicates the start of the video payload.
- Transport priority: When set, this flag identifies the priority level of the video payload.
- Program identifier (PID):
  - The most important field of the header is the 13 bits that define the program identifier.
  - This uniquely identifies the stream that the packet belongs to. All packets belonging to this stream have the same PID value.
  - This information is used by the demultiplexer in the IPTVCD to distinguish between different packet types. The bulk of the packets are video, followed by audio packets and null packets for unused space.
  - Null packets are always assigned a PID value of 8191.
  - Packets that have no PID values are typically discarded by the receiving IPTVCD.

Structure of an MPEG TS Packet (continued)
Field Name: Description of Functionality
- Transport scrambling control: This two-bit field indicates the encryption status of the transport stream packet payload.
- Adaptation field control: This two-bit field indicates whether the associated transport stream packet includes an adaptation field, a payload, or both.
- Continuity counter: The continuity counter increments by one each time a transport stream packet with the same PID value is passed through the MPEG system. This helps to identify lost or duplicate packets, which could affect the quality of the video being viewed by the IPTV subscriber.
- Adaptation field:
  - The adaptation field contains a variety of data used for timing and control, including the Program Clock Reference (PCR).
  - The PCR is used to synchronize the IPTVCD clock with the source encoder clock.
  - PCR values are 42 bits in length and increment according to a standard clock rate of 27 MHz. Once synchronization has taken place, decoding of the IPTV MPEG-2 stream can begin.
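
The header fields listed in the two tables above occupy the first four bytes of each 188-byte TS packet. The following sketch unpacks them using the bit layout from MPEG-2 Systems; the function name, the dictionary keys, and the sample null packet are assumptions made for this example.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 8191


def parse_ts_header(packet):
    """Decode the 4-byte header of a single MPEG TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a TS packet is always 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 synchronization byte")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),        # 1 bit
        "payload_unit_start": bool(b1 & 0x40),     # 1 bit (start indicator)
        "transport_priority": bool(b1 & 0x20),     # 1 bit
        "pid": ((b1 & 0x1F) << 8) | b2,            # 13 bits
        "scrambling_control": (b3 >> 6) & 0x03,    # 2 bits
        "adaptation_field_control": (b3 >> 4) & 0x03,  # 2 bits
        "continuity_counter": b3 & 0x0F,           # 4 bits, wraps after 15
    }


# Hypothetical null packet: PID 8191, payload only, continuity counter 5.
null_packet = bytes([0x47, 0x1F, 0xFF, 0x15]) + bytes(184)
fields = parse_ts_header(null_packet)
print(fields["pid"] == NULL_PID, fields["continuity_counter"])
```

A demultiplexer would typically read the PID first and skip the rest of the packet when the value is 8191 or belongs to a stream it is not decoding.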

RTP
- This optional layer is used by a wide variety of IPTV applications.
- It acts as an intermediary between the H.264/AVC, MPEG-2, or VC-1 encoded content in the higher layers and the lower layers of the IPTVCM.
- The Real-time Transport Protocol (RTP) is the foundation block of this layer.
- It was originally designed for real-time streaming of media content across an IP network.

IPTVCM Transport Layer
- RTP packets form the input to the transport layer.
- It is possible to map MPEG-TS packets directly into the transport layer protocol payload, avoiding the RTP layer.
- Transport layer characteristics:
  - Hides the intricacies of the IP network structure from the upper-layer processes
  - Provides for the reliability and integrity of the end-to-end communication link
  - If video data is not delivered to the IPTVCD correctly, the transport layer can initiate retransmission
  - Alternatively, it can inform the upper layers, which can then take the necessary corrective action
- Two primary transport protocols:
  - Transmission Control Protocol (TCP)
  - User Datagram Protocol (UDP)

IP Encapsulation
- IP encapsulation is the process of taking a data stream, formatting it into packets, and adding the headers and other data required.
- MPEG-over-IP transport streams consist of a series of MPEG TS packets packed inside UDP datagrams.
  - A typical IP video packet contains 7 TS packets (188 x 7 = 1316 bytes).
  - Adding the Ethernet, IP, and UDP headers (64 bytes) gives 1316 bytes + 64 bytes = 1380 bytes.
[Figure: IP packet with MPEG-2 TS video payload carried over Ethernet: Ethernet header, IP/UDP headers, 188-byte MPEG-2 TS video packets, CRC]
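
One way to see why seven TS packets per datagram is typical is to work out how many fit in a single Ethernet frame. The sketch below assumes a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header without options, and an 8-byte UDP header; these values are assumptions for the calculation and are not taken from the slide.

```python
TS_PACKET_SIZE = 188
ETHERNET_MTU = 1500   # assumption: standard Ethernet payload limit
IPV4_HEADER = 20      # assumption: IPv4 header without options
UDP_HEADER = 8


def max_ts_packets_per_datagram(mtu=ETHERNET_MTU):
    """Number of whole 188-byte TS packets that fit in one unfragmented UDP datagram."""
    room = mtu - IPV4_HEADER - UDP_HEADER
    return room // TS_PACKET_SIZE


packets = max_ts_packets_per_datagram()
payload_bytes = packets * TS_PACKET_SIZE
print(packets, payload_bytes)
```

Under these assumptions an eighth TS packet would push the datagram past the MTU and force IP fragmentation, so seven packets (1316 bytes) is the largest payload that avoids it.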

Traveling up the IPTVCM
- When data is received at the IPTVCD, the following occurs:
  - The encapsulation process is reversed.
  - Decapsulation at the data link layer involves:
    - Inspecting the packet
    - Removing the Ethernet header and the CRC fields
    - Examining the type code and determining that the packet needs to be processed by the IP protocol
    - Passing the packet upwards to the IP layer
  - Decapsulation at the IP layer involves:
    - Inspecting the packet
    - Removing the IP header
    - Passing the packet upwards to the UDP layer
- This process continues until the packets reach the top of the IPTVCM and the raw video is displayed on the viewer's TV screen.
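
The last decapsulation step before demultiplexing can be sketched as splitting a received UDP payload back into TS packets. The example assumes the TS packets were mapped directly into UDP without an RTP header, and the function name and test payload are hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def split_ts_packets(udp_payload):
    """Split a UDP payload into 188-byte TS packets, checking each sync byte."""
    if len(udp_payload) % TS_PACKET_SIZE != 0:
        raise ValueError("payload is not a whole number of TS packets")
    packets = [
        udp_payload[i:i + TS_PACKET_SIZE]
        for i in range(0, len(udp_payload), TS_PACKET_SIZE)
    ]
    for index, packet in enumerate(packets):
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"packet {index} does not start with 0x47")
    return packets


# Hypothetical payload: seven TS packets whose bodies are all zero bytes.
payload = (bytes([SYNC_BYTE]) + bytes(187)) * 7
ts_packets = split_ts_packets(payload)
print(len(ts_packets))
```

Each recovered packet can then be handed to the demultiplexer, which routes it by PID.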