
ccc2f3f395306a396d2662e72655b42d.ppt
- Number of slides: 40

MPEG-7 Audio Overview
Beinan Li, MUMT 611, Week 2, 2005.1.20

Content
- MPEG-7 overview
  - What is…
  - Why?
  - Objectives and scope
  - Main elements and organization
- MPEG-7 Audio
  - Low-level features
  - High-level tools

What is MPEG-7
- “Multimedia Content Description Interface”
- ISO/IEC standard by MPEG (Moving Picture Experts Group)
- Provides metadata for multimedia
- MPEG-1, -2, -4 make content available; MPEG-7 makes content accessible, retrievable, filterable, and manageable (via device / computer)
- Multiple degrees of interpretation of the information’s meaning
- Supports as broad a range of applications as possible
- A compatible (with existing technology) and extensible standard

Why MPEG-7
- “The value of information often depends on how easy it can be found, retrieved, accessed, filtered and managed.”
- Past: scarcity of digital multimedia sources -> simple access mechanisms
- Now: growing amount of audiovisual information -> identifying and managing it efficiently is becoming more difficult, e.g. “record only news about sport.”

Why MPEG-7
- For future multimedia services, content representation and description may have to be addressed jointly.
  - Many services dealing with content representation will first have to deal with content description.
  - “a non-described content may be useless”
- Need for access only to the content description:
  - New original services (e.g. optimizing personal time)
  - Adaptation to network and terminal capabilities

Application domains (incomplete)
- Broadcast media selection (e.g., radio channel, TV channel)
- Digital libraries (e.g., film, video, audio and radio archives)
- E-Commerce (e.g., personalized advertising)
- Education (e.g., repositories of multimedia courses, multimedia search for support material)
- Home Entertainment (e.g., management of personal multimedia collections, including manipulation of content, e.g. karaoke)
- Journalism (e.g., searching speeches of a certain politician using his name, his voice or his face)
- Multimedia directory services (e.g., yellow pages, G.I.S.)
- Surveillance and remote sensing

MPEG-7 Objectives
- Standardize content-based description for various types of audiovisual information
- Independent of media support (encoding and storage)
- Different granularity:
  - Low-level features: shape, size, key, tempo changes
  - High-level semantic info: “scene with a barking brown dog on the left and with the sound of passing cars in the background.”
- Meaningful in the context of the application
  - Same material -> different types of features and combinations, e.g. timbre vs. loudness

MPEG-7 Objectives
- Information about the content:
  - The form: e.g. the coding format used
  - Conditions for accessing the material: e.g. intellectual property rights / price
  - Classification: e.g. parental rating
  - Links to other relevant materials
  - The context: e.g. “Olympic Games 1996, final of 200 meter hurdles, men”
- Information present in the content:
  - Combination of low-level and high-level descriptors

Scope of the Standard
- Processing chain: [Figure: graph by P. Salembier and O. Avaro]

An example of architecture
- Pull: client queries -> descriptions repository -> matched descriptions
- Push: filter descriptions -> programmed actions
- [Figure: graph by P. Salembier and O. Avaro]
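
The pull and push paths can be sketched in a few lines of Python. This is only a toy illustration, not part of the standard: the dictionaries stand in for MPEG-7 descriptions, and the names `REPOSITORY`, `pull`, and `push` are hypothetical.

```python
# Hypothetical toy descriptions: plain dicts standing in for MPEG-7 descriptions.
REPOSITORY = [
    {"id": "clip-1", "genre": "news", "topic": "sport"},
    {"id": "clip-2", "genre": "news", "topic": "politics"},
    {"id": "clip-3", "genre": "film", "topic": "sport"},
]


def pull(repository, **criteria):
    """Pull: a client query returns the descriptions matching every criterion."""
    return [d for d in repository
            if all(d.get(key) == value for key, value in criteria.items())]


def push(stream, predicate, action):
    """Push: filter incoming descriptions and run a programmed action on each match."""
    for description in stream:
        if predicate(description):
            action(description)


# Pull: ask the repository for sport news.
matched = pull(REPOSITORY, genre="news", topic="sport")

# Push: "record only news about sport" as descriptions arrive.
recorded = []
push(REPOSITORY,
     lambda d: d["genre"] == "news" and d["topic"] == "sport",
     lambda d: recorded.append(d["id"]))
```

In both cases only the descriptions are inspected; the audiovisual content itself is never touched.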

Workplan
- [Figure: graph by P. Salembier and O. Avaro]

Where are the descriptions from?
- Preservation of existing descriptive data (e.g. scripts) through production and delivery
- Generated automatically by capture devices (e.g. time or GPS location in a camera)
- Extracted automatically or semi-automatically (i.e. with some human assistance)
- Manually produced (e.g. for legacy material such as existing film archives)

Main Elements of MPEG-7
- Description Tools (textual / binary):
  - Descriptors (D): define the syntax and the semantics of each feature (metadata element)
  - Description Schemes (DS): specify the relationships between components
- Description Definition Language (DDL):
  - Defines the syntax of the MPEG-7 Description Tools
  - Allows creation, extension and modification of DSs
- System tools:
  - Storage and transmission, synchronization of descriptions with content, multiplexing of descriptions, etc.

Main Elements of MPEG-7
- Relationship among the elements introduced above
- [Figure: graph by P. Salembier and O. Avaro]

Description Tools
- Creation and production processes: (director, title)
- Usage: (broadcast schedule)
- Storage features
- Structural information: (spatial-temporal components)
  - Segmentations
- Low-level features: (sound timbres, melody description)
- Conceptual information: (objects and events, interactions)
- Navigation and access: (summaries, variations)
- Collections of objects
- User-content interactions: (user preferences, usage history)

Organization of Description Tools
- [Figure: graph by P. Salembier and O. Avaro]

Descriptions (further)
- MPEG-7 approaches the description of content from several viewpoints.
  - A set of methods and tools for the different viewpoints of the description (not a monolithic system)
  - Interrelated and can be combined in many ways
- Associated with the content itself: (searching, filtering)
- Location: (document vs. stream)
  - physically located with the material
  - somewhere else on the globe (maybe not)
- Interoperability with other metadata standards: (XML)

Use of Description Tools
- The description tools are presented on the basis of the functionality they provide.
- In practice, they are combined into meaningful sets of description units.
- Furthermore, each application will have to select a subset of descriptors and DSs.
  - Library of tools!
- The DDL can be used to handle specific needs of the application (like scripting in many current applications).

Major Functionalities
- MPEG-7 Systems
- MPEG-7 Description Definition Language
- MPEG-7 Visual
- MPEG-7 Audio
- MPEG-7 Multimedia Description Schemes (D.T.)
- Reference Software: the eXperimentation Model (test)
- MPEG-7 Conformance (syntax checking)
- MPEG-7 Extraction and use of descriptions (technical report)

MPEG-7 Audio
- MPEG-7 Audio provides structures, building upon some basic structures from the MDS, for describing audio content.
- Low-level Descriptors:
  - audio features that cut across many applications
- High-level Description Tools:
  - more specific to a set of applications

Low-level Features
- “MPEG-7 Audio Framework”:
  - Two low-level descriptor types (for sample and segment):
    - Scalar: (e.g. power or fundamental frequency)
    - Vector: (e.g. spectra)
  - Hierarchical, consistent interface
    - Any descriptor inheriting from these types can be instantiated, describing a segment with a single summary value or a series of sampled values, as the application requires.
- Scalable Series: (hierarchical re-sampling)
  - Progressively down-sample the data contained in a series (application-oriented)
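
The idea of progressively down-sampling a series can be sketched as follows. This is a minimal illustration that assumes mean-based rescaling by a fixed ratio; the function names are hypothetical and it does not reproduce the normative Scalable Series syntax.

```python
def downsample_mean(series, ratio=2):
    """Replace each run of `ratio` consecutive values with its mean."""
    chunks = [series[i:i + ratio] for i in range(0, len(series), ratio)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def scalable_series(series, ratio=2):
    """Return the original series followed by progressively coarser versions."""
    if ratio < 2:
        raise ValueError("ratio must be at least 2")
    levels = [list(series)]
    while len(levels[-1]) > 1:
        levels.append(downsample_mean(levels[-1], ratio))
    return levels


levels = scalable_series([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0])
for level in levels:
    print(level)
```

An application would keep only the level whose resolution it needs: the full series for detailed analysis, or the last level as a single summary value for the segment.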

Low-level Features (types)
- Basic
- Basic Spectral
- Signal Parameters
- Timbral Temporal
- Timbral Spectral
- Spectral Basis
- MPEG-7 Silence Descriptor

Low-level Features (graph)
- [Figure: graph by P. Salembier and O. Avaro]

Low-level Features (details)
- Basic: (temporally sampled scalar values for general use)
  - AudioWaveform Descriptor
    - waveform envelope (for display purposes)
  - AudioPower Descriptor
    - temporally smoothed instantaneous power (quick summary of a signal)
  - Applicable to all kinds of signals
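
Both Basic descriptors reduce each short frame of samples to a scalar or a pair of scalars. The sketch below assumes non-overlapping frames of `hop` samples and uses hypothetical function names; it illustrates the idea rather than the normative extraction procedure.

```python
def frames(samples, hop):
    """Split samples into consecutive, non-overlapping frames of `hop` samples."""
    return [samples[i:i + hop] for i in range(0, len(samples), hop)]


def audio_waveform(samples, hop):
    """Waveform envelope: (minimum, maximum) sample value in each frame."""
    return [(min(frame), max(frame)) for frame in frames(samples, hop)]


def audio_power(samples, hop):
    """Smoothed power: mean of the squared samples in each frame."""
    return [sum(x * x for x in frame) / len(frame)
            for frame in frames(samples, hop)]


signal = [0.5, -0.5, 1.0, -1.0]
envelope = audio_waveform(signal, hop=2)
power = audio_power(signal, hop=2)
```

The min/max pairs are enough to draw a waveform display at reduced resolution, while the power values give a quick loudness-like summary over time.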

Low-level Features (details)
- Basic Spectral: (single time-frequency analysis of the signal)
  - AudioSpectrumEnvelope: (base class)
    - the short-term power spectrum (display, synthesis, general-purpose search)
  - AudioSpectrumCentroid:
    - is the spectrum dominated by high or low frequencies?
  - AudioSpectrumSpread:
    - is the power spectrum centered near the spectral centroid, or spread out over the spectrum?
    - distinguishes pure-tone and noise-like sounds
  - AudioSpectrumFlatness: (the presence of tonal components)
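
The three summary measures can be sketched from a list of power-spectrum bins. The code assumes a log-frequency scale in octaves relative to 1 kHz for centroid and spread, and the ratio of geometric to arithmetic mean for flatness; band edges, low-frequency bin grouping and other details of the standard are left out, and the function names are hypothetical.

```python
import math


def spectrum_centroid(freqs_hz, powers):
    """Power-weighted mean of log2(f / 1 kHz): negative means low-frequency dominated."""
    total = sum(powers)
    return sum(math.log2(f / 1000.0) * p for f, p in zip(freqs_hz, powers)) / total


def spectrum_spread(freqs_hz, powers):
    """Power-weighted RMS deviation (in octaves) of the bins from the centroid."""
    centroid = spectrum_centroid(freqs_hz, powers)
    total = sum(powers)
    variance = sum((math.log2(f / 1000.0) - centroid) ** 2 * p
                   for f, p in zip(freqs_hz, powers)) / total
    return math.sqrt(variance)


def spectrum_flatness(powers):
    """Geometric mean over arithmetic mean: near 1 for noise-like, near 0 for tonal."""
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


centroid = spectrum_centroid([500.0, 2000.0], [1.0, 1.0])
spread = spectrum_spread([500.0, 2000.0], [1.0, 1.0])
flat = spectrum_flatness([1.0, 1.0, 1.0, 1.0])
peaky = spectrum_flatness([1.0, 4.0])
```

All power values are assumed to be strictly positive, since the geometric mean takes their logarithm.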

Low-level Features (details)
- Signal Parameters: (periodic or quasi-periodic signals)
  - AudioFundamentalFrequency:
    - “confidence measure”, replacing “pitch-tracking”
  - AudioHarmonicity:
    - distinction between sounds with a harmonic / inharmonic / non-harmonic spectrum

Low-level Features (details)
- Timbral Temporal: (temporal characteristics of segments of sounds; musical timbre)
  - LogAttackTime
  - TemporalCentroid
    - where in time the energy of a signal is focused
    - useful when attack times are identical
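
Both descriptors can be sketched from a sampled amplitude envelope. The code below assumes a non-negative envelope, takes the attack to run from the first point at 2% of the peak up to the peak itself (the `start_fraction` threshold is an assumption, not a value taken from the slides), and uses hypothetical function names.

```python
import math


def log_attack_time(envelope, sample_rate, start_fraction=0.02):
    """log10 of the time (seconds) between signal onset and the envelope peak."""
    peak = max(envelope)
    peak_index = envelope.index(peak)
    start_index = next(i for i, value in enumerate(envelope)
                       if value >= start_fraction * peak)
    attack_seconds = (peak_index - start_index) / sample_rate
    if attack_seconds <= 0:
        raise ValueError("attack has zero duration at this resolution")
    return math.log10(attack_seconds)


def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time (seconds): where the energy is focused."""
    total = sum(envelope)
    return sum((n / sample_rate) * value
               for n, value in enumerate(envelope)) / total


env = [0.0, 0.5, 1.0, 0.5, 0.0]          # toy envelope sampled at 10 Hz
lat = log_attack_time(env, sample_rate=10)
tc = temporal_centroid(env, sample_rate=10)
```

Two sounds with the same attack time but different decay lengths would share a LogAttackTime and differ in TemporalCentroid, which is why the second descriptor is useful when attack times are identical.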

Low-level Features (details)
- Timbral Spectral: (spectral features in a linear-frequency space)
  - SpectralCentroid:
    - power-weighted average of the frequency of the bins in the linear power spectrum
    - distinguishes musical instrument timbres
  - 4 Ds for harmonic, regularly spaced components of signals:
    - HarmonicSpectralCentroid
    - HarmonicSpectralDeviation
    - HarmonicSpectralSpread
    - HarmonicSpectralVariation
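
The power-weighted average named for SpectralCentroid is short enough to write out directly. This sketch assumes the bin frequencies and powers are already available; the function name is hypothetical.

```python
def spectral_centroid(freqs_hz, powers):
    """Power-weighted average of the bin frequencies, in Hz (linear frequency)."""
    return sum(f * p for f, p in zip(freqs_hz, powers)) / sum(powers)


# Same three bins, different power distribution: the second is "brighter".
mellow = spectral_centroid([100.0, 200.0, 300.0], [1.0, 2.0, 1.0])
bright = spectral_centroid([100.0, 200.0, 300.0], [1.0, 1.0, 4.0])
```

A higher centroid corresponds to more energy in the upper bins, which is one cue for telling instrument timbres apart.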

Low-level Features (details)
- Spectral Basis: (low-dimensional projections of a spectral space to aid compactness and recognition)
  - AudioSpectrumBasis:
    - a series of (time-varying / statistically independent) basis functions derived from the singular value decomposition of a normalized power spectrum
  - AudioSpectrumProjection:
    - features of a spectrum after projection upon a reduced-rank basis
    - low-dimensional independent subspaces of a spectrum correlate strongly with different sound sources
    - provides more salience using less space
  - Used with the Sound Classification and Indexing Description Tools

Low-level Features (details)
- Silence segment: (no significant sound)
  - aids further segmentation of the audio stream, or serves as a hint not to process a segment
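
One simple way to find candidate silence segments is to threshold the frame power. The sketch below assumes non-overlapping frames and a fixed power threshold chosen by the application; the function name and both parameters are hypothetical, and the standard does not prescribe this particular detector.

```python
def silence_segments(samples, frame_len, threshold):
    """Return (start, end) sample ranges whose frame power stays below threshold."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        power = sum(x * x for x in frame) / frame_len
        if power < threshold:
            if start is None:
                start = i * frame_len          # a silent run begins here
        elif start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:                      # signal ended while silent
        segments.append((start, n_frames * frame_len))
    return segments


samples = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
silent = silence_segments(samples, frame_len=2, threshold=0.01)
```

The resulting ranges can serve as cut points for further segmentation, or be skipped entirely by later processing stages.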

High-level audio Description Tools (Ds and DSs)
- Exchange some generality for descriptive richness:
  - a smaller set of audio features (as compared to visual features) that may canonically represent a sound without domain-specific knowledge
- Audio Signature (DS)
- Musical Instrument Timbre
- Melody
- General Sound Recognition and Indexing
- Spoken Content

High-level audio Description Tools (details)
- Audio Signature Description Scheme
  - SpectralFlatness Ds
  - a unique content identifier for the purpose of robust automatic identification
  - e.g. audio fingerprinting

High-level audio Description Tools (details)
- Musical Instrument Timbre Description Tools
  - HarmonicInstrumentTimbre Ds:
    - LogAttackTime Descriptor
  - PercussiveInstrumentTimbre Ds:
    - SpectralCentroid Descriptor

High-level audio Description Tools (details)
- Melody Description Tools:
  - efficient, robust, and expressive melodic similarity matching
  - MelodyContour Description Scheme:
    - terse, efficient melody contour / rhythm
  - MelodySequence Description Scheme:
    - verbose, complete, expressive melody / rhythm
    - interval encoding
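
A terse contour representation can be sketched by quantizing the interval between successive notes into a few steps. The five-level scale and the 50 and 250 cent boundaries used here are assumptions made for illustration, as are the MIDI note-number input and the function names; rhythm is ignored.

```python
def contour_step(cents):
    """Quantize a pitch interval in cents to one of five contour values."""
    if cents <= -250:
        return -2
    if cents <= -50:
        return -1
    if cents < 50:
        return 0
    if cents < 250:
        return 1
    return 2


def melody_contour(midi_notes):
    """Contour of a melody given as MIDI note numbers (one semitone = 100 cents)."""
    return [contour_step(100 * (b - a))
            for a, b in zip(midi_notes, midi_notes[1:])]


contour = melody_contour([60, 62, 62, 67, 66, 60])
```

Because only the rough direction and size of each step are kept, two renditions of a tune in different keys, or slightly out of tune, map to the same contour, which is what makes such a terse form useful for similarity matching.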

High-level audio Description Tools (details)
- General Sound Recognition and Indexing Description Tools:
  - SoundModel Description Scheme
  - SoundClassificationModel Description Scheme
    - a set of SoundModel DSs -> multi-way classifier
  - SoundModelStatePath Descriptor
    - indices to states generated by a SoundModel of a segment
  - immediately applicable to sound effects
  - automatically index and segment sound tracks
  - Low -> mid -> high level analyses
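
The step from a set of per-class models to a multi-way classifier amounts to scoring a segment under every model and keeping the best label. In the sketch below each model is a hypothetical one-dimensional Gaussian over a scalar feature, chosen only to keep the example short; it is a stand-in and not the model type the standard defines.

```python
import math


class ToySoundModel:
    """Hypothetical stand-in for one SoundModel: a 1-D Gaussian over a scalar feature."""

    def __init__(self, label, mean, std):
        self.label = label
        self.mean = mean
        self.std = std

    def log_likelihood(self, features):
        """Sum of Gaussian log-densities of the feature values under this model."""
        return sum(
            -0.5 * ((x - self.mean) / self.std) ** 2
            - math.log(self.std * math.sqrt(2 * math.pi))
            for x in features
        )


def classify(models, features):
    """Multi-way classification: label of the model that scores the segment highest."""
    return max(models, key=lambda m: m.log_likelihood(features)).label


models = [
    ToySoundModel("dog bark", mean=0.8, std=0.1),
    ToySoundModel("passing car", mean=0.3, std=0.1),
]
label = classify(models, [0.75, 0.82, 0.79])
```

Running the same classifier over consecutive segments of a sound track yields a sequence of labels, which is one way to index and segment it automatically.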

High-level audio Description Tools (details)
- Spoken Content Description Tools:
  - detailed description of words spoken within an audio stream
  - indexing into and retrieval of an audio stream
  - indexing of multimedia objects annotated with speech
- Recall of audio/video data by memorable spoken events
  - a character or person spoke a particular word
- Spoken Document Retrieval
  - separate spoken documents
- Annotated Media Retrieval
  - photograph retrieved using a spoken annotation

Development
- Currently under development:
  - MPEG-7 Audio COR.1 (currently at DCOR 1)
  - MPEG-7 Amendment 1 (currently at FPDAM 1)
- New Audio Description Tools specified (MPEG-7 version 2):
  - Spoken Content
  - Audio Signal Quality
  - Audio Tempo
- Currently proposed tools:
  - Low Level Descriptor for Audio Intensity
  - Low Level Descriptor for Audio Spectrum Envelope Evolution
  - Generic mechanism for data representation based on ‘modulation decomposition’
  - MPEG-7 Audio-specific binary representation of descriptors

MPEG-7 version 1 Schedule
- Call for Proposals: October 1998
- Evaluation: February 1999
- First version of Working Draft (WD): December 1999
- Committee Draft (CD): October 2000
- Final Committee Draft (FCD): February 2001
- Final Draft International Standard (FDIS): July 2001
- International Standard (IS): September 2001

MPEG-7 work plan:
- See: Annex A of MPEG-7 Overview (version 9), http://www.chiariglione.org/mpeg/standards/mpeg7/mpeg-7.htm

Annotated Link Page / References
- http://www.music.mcgill.ca/~damonli/611_w2.htm
- All pictures taken from:
  - P. Salembier and O. Avaro, “MPEG-7: Multimedia Content Description Interface”, http://gps-tsc.upc.es/imatge/_Philippe/demo/MPEG21_MPEG7.pdf