497ce58797814d03f2fd200b052775ca.ppt
- Количество слайдов: 41
Introduction n Much of the information is in form of images Images are handled by machines as a matrix of digital picture elements, or pixels The appearance of an image depends on n image type n resolution Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 1
Types of images & Resolution n bilevel (black & white) n e. g. faxes n grayscale color n dot per inches (dpi) n n 600 x 600 – actual medium quality laser printer 1200 x 1200 – low cost phototypesetter 4800 x 4800 – high resolution phototypesetter Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 2
Bilevel images: CCITT fax standard n n fax: facsimile CCITT Comité Consultatif International Téléphonique et Télégraphique, it is part of the ITU International Telecommunication Union, one of the specialized agencies of the United Nations In the late 70 s CCITT starts thinking about a standard for fax transmission 1980 CCITT Group 3 standard n group 1 & 2 are earlier attempt, which use simpler encoding and modulations techniques, resulting in very slow transmissions Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 3
CCITT Group 3 - I n n n It is the most common standard for fax transmission It is accepted worldwide, almost every fax machine supports this standard It uses compression algorithms for bilevel images Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 4
CCITT Group 3 - II n n n Paper size: international A 4 (not US letter) standard resolution 204 x 98 dpi (200 x 100) high resolution 204 x 196 dpi (200 x 200) bilevel image 1 bit/pixel image size: 1728 x 1188 bits at standard resolution about 2 Mbit Transmission rate: 4. 8 Kbit/s 1728 bits/line 1188 lines/page today is usually higher, 14. 4 – 33. 6 Kbit/s At 4. 8 Kbit/s in std resolution one page would take about 430 sec, but only 1 minute on average with Group 3 algorithms n n 5
Run-length coding n n n Each scan line is composed by sequences of pixel of the same color Count the number of element of each run Example 3 w 4 b 9 w 2 b 2 w 6 b 5 w 2 b 5 w. . . Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 6
G 3 1 D n n Group 3 One-Dimensional coding (G 3 1 D) is called Modified Huffman (MH) as it encodes run lengths using a predefined Huffman code In order to maintain black/white syncronization, each line begins with a white run, eventually of zero length Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 7
G 3 1 D 1000 011 10100 11 0111 0010. . . n predefined Huffman codewords have been found from the probabilities of the runs in typical handwritten documents Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 8
G 3 1 D n n As one line has 1728 bits, we have to define a codeword for all 1728 black and white run lengths As shorter runs occur more frequently that longer runs, we code each run length in an additive form n n there is a terminating and makeup codeword Lengths form 0 to 63 are coded with a single terminating codeword Longer runs are coded with one or more makeup codewords and a terminating codeword Each line is terminated with a EOL symbol composed of eleven 0 and one 1 Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 9
G 3 2 D n n n Group 3 Two-Dimensional coding (G 3 2 D) is called Modified READ (MR) as it is a variant of a previously defined code, called READ (Relative Element Address Designate) Many images have a high degree of vertical coherence between consecutive lines changing elements are coded w. r. t. a “nearby” change position of the same color in the previous (reference) line 10
G 3 2 D n n n Nearby means within an interval of radius 3 pixels If there are changing elements in the current line without correspondents in the reference line switch to horizontal mode (1 D) On the opposite if the ref line has a run with no counterpart in the current line special pass code Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 11
G 3 2 D reference line current line vertical mode horizontal pass vertical mode code mode -1 0 from a Huffman table, generated with codewords for code -3, -2, -1, 0, +1, +2, +3 . . . +2 -2
G 3 2 D n Two dimensional coding is more prone to transmission errors n n n In the G 3 1 D an error may cause problems in the entire line, but syncronization is forced back by EOL codeword Here an error in the reference line is likely propagated in all the other lines For this reason there are 1 reference line for each k lines (i. e. k-1 are coded w. r. t. each ref line) standard resolution k=2 high resolution k=4 Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 13
CCITT fax standard compression performances n Standard resolution (~200 x 100 dpi) n n n High resolution (~200 x 200 dpi) n n n G 3 1 D 0. 13 bits/pixel 57 s. for A 4 at 4. 8 Kbps G 3 2 D (k=2) 0. 11 bits/pixel 47 s. for A 4 at 4. 8 Kbps G 3 2 D (k=4) 0. 09 bits/pixel 74 s. for A 4 at 4. 8 Kbps Compression is very good for office image where run lengths are long It would be very bad for bilevel natural images 14
Continuous-tone images: why lossless compression? n n lossy compression is often preferred to have remarkably more compressed images, with good quality However there are some situations in which using an approximation may not be adequate n n n medical images historical documents images with legal relevance Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 15
Continuous-tone images: lossless compression n GIF standard PNG standard JPEG-LS n n n It is a quite new standard. The original JPEG standard included a lossless mode, but its performances were not close to ‘state of the art’ extimation of pixel value using quite simple context: effective and low cost solution www. hpl. hp. com/loco 16
GIF image format - I n n Adopted by Compu. Serve to minimize the time required to download images over a modem link The most widely used lossless image format until 1995 8 -bit pixel description 256 color images, but it is possible to use a color map Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 17
GIF image format - II n The color map can be specified for each image or can be omitted n n n if specified, it is included as an header into image file, in uncompressed form color map is composed of 256 24 -bit entries, that specify 256 RGB colors Compression scheme used is LZW n Alphabet symbols are the 256 colors of the color map plus a “clear” code and an “end-of-information” code Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 18
GIF image format - III n n Even if this feature is not widely used, GIF files may contain more than one image, and it is possible to share the color map LZW-coded information is grouped into blocks preceded by a byte-count, in order to skip an image without decompressing it In 1995 Unisys announced that there would be royalties on GIF implementations due to an old patent they held on LZW This catalyzed the development of a new lossless image format, designed for public domain and with the last improvements 19
PNG image format - I n n Portable Network Graphics (pronounced “ping”) it uses gzip compression scheme through some improvements compression obtained is about 10 -30% better than GIF By default it encodes the pixels in raster scan order, but some other methods are available n n n it is possible to code horizontal difference, i. e. the difference between current pixel value and the previous one or vertical difference, i. e. the difference w. r. t. the above pixel average difference, the difference with the average of above and next pixel. . . Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 20
PNG image format - II n n n It is possible to use more than 256 colors, up to 16 bit grayscale and 48 bit color GIF uses one special pixel value to indicate transparency, PNG uses 256 different values per pixel, allowing for picture progressively fading into the background It seems inevitable that PNG format will gradually assume the role of standard lossless image format for the WWW, replacing GIF Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 21
Continuous-tone images: why lossy compression? n n n Digital images are yet an approximation of the real analog phenomenon lossy techniques allow to obtain very good compression with a modest lost of details This is useful for storing and trasmitting images Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 22
Continuous-tone images: lossy compression n n JPEG 2000 n n n a new image coding system that uses stateof-the-art compression techniques based on wavelet technology file extension. jp 2 With very compressed files, if image size is the same, perceived quality of JPEG 2000 images is better w. r. t. JPEG images Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 23
JPEG format - I n n n JPEG is a standard defined by the Joint Photographic Experts Group in 1992 It was conceived to transmit images at 64 Kbps It has a lossy mode and a lossless mode (not so much used, and today replaced by the JPEG -LS standard) With lossy mode it allows to obtain very good quality at about 1 bit/pixel Implementation complexity is reasonable Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 24
JPEG format - II n n n It could be used with graylevel and color images Each channel of the color space (RGB, YUV. . . ) is treated separately it allows progressive transmission (that is much better suited for WWW than raster transmission) Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 25
Raster vs. progressive transmission
JPEG Coder - I Discrete Cosine Transform Quantization Binary Encoder Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 27
JPEG Coder - II n n n Image is divided in 8 x 8 -pixel squares Preprocessing Apply Discrete Cosine Transform on each square Coefficient quantization Bit stream encoding Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 28
Preprocessing: color space transformation & downsampling n from RGB into YUV n n The Y component represents the brightness of a pixel, and the U and V components together represent the hue and saturation Human eye can see more detail in the Y component than in the U and V, that can be compressed more aggressively n n n 4: 4: 4 4: 2: 2 4: 2: 0 no downsampling horizontal downsampling of a factor 2 both horizontal and vertical downsampling Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 29
Discrete Cosine Transform - I n n The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers It is used in JPEG because it is fast and quite easy to implement efficiently Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 30
Discrete Cosine Transform - II where n n n the block is pixels (in JPEG, 8 x 8) A(i, j) is the value of pixel of position (i, j) is the DCT coefficient of position low values for corresponds to low vertical frequencies, low values for to low horizontal frequencies Generally higher frequencies have very low values Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 31
Discrete Cosine Transform - III n n DCT function basis each 8 x 8 square is reduced to 64 coefficient Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 32
Discrete Cosine Transform - IV n n Knowing with infinite precision the 64 DCT coefficient it is possible to reconstruct exactly the pixels of the square But n n n finite precision quantization of the coefficients (always) Some coefficient related to high frequency are not transmitted. This allows higher compression without sacrifying too much quality as human eye is less responsible Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 33
Quantization - I n n n The DCT matrix obtained is scaled differently in each component, dividing each by a diferent factor the factor for each component has been decided based on human sensitivity to changes at each frequency In practice the matrix of factor is usually 34
Quantization - II n n Next, all values are rounded to nearest integer This leads to a quite high number of 0 s in the high frequency zone, as factors are bigger Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 35
Zig-zag scan n n Low frequency coefficients are transmitted before higher frequency coefficients This allows for progressive visualization of this 8 x 8 block Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 36
Raster vs. progressive transmission n Raster transmission n n DCT coefficient of the upper left block, then those of all the others in the upper part of the image and so on Progressive transmission n first all (0, 0) coefficients, than all (0, 1) and so on, following zig-zag scan in each block Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 37
Binary coding n DCT(0, 0) has usually a very slow variation from one block to the next, as it is the mean value n n Tipically the bit stream is coded with Huffman n n For this reason it is convenient to encode the difference from the previous value It is possible to use arithmetic scheme, gaining some compression at cost of decoding speed Huffman codes are predefined, or it is possible to build optimal tables and insert them in the stream Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 38
JPEG Decoder Binary Decoder Some values are lost! Dequantization Inverse DCT Good quality, but reconstruction is not exact Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 39
JPEG performances - I Gabriele Monfardini - Corso di Basi di Dati Multimediali a. a. 2004 -2005 40
JPEG performances - II Original Quality factor 75 Quality factor 20 Quality factor 3 41