
  • Number of slides: 17

Robust Audio Identification for Commercial Applications
Matthias Gruhne, ghe@emt.iis.fhg.de
Fraunhofer IIS, AEMT, D-98693 Ilmenau, Germany
Fraunhofer Institut Integrierte Schaltungen

Overview
• What is AudioID?
• Requirements
• System Architecture
• MPEG-7
• Recognition Performance
• Applications
• Conclusions
• Demonstration

What is AudioID?

What is AudioID?
Purpose
• Identify audio material (artist, song, etc.) by analysis of the signal itself: “Content-Based Identification”
Conditions
• No associated information required (headers, ID3 tags)
• No embedded signals (e.g. watermark) are required
• Some knowledge available about music to be identified (reference database)

Requirements
Recognition rate
• High recognition rates (> 95%), even with distorted signals
Robustness
• Robust against various distortions:
  – volume change, equalization, noise addition, audio coding (e.g. MP3), ...
  – “analog” artifacts (e.g. D/A, A/D)
Compactness
• Small “signature” size
Scalability
• Extensibility of database (> 10^6 items) while keeping processing time low (few ms/item)

System Architecture - Overview

System Architecture
Feature Extractor
• Signal preprocessing
• Extract the “essence” of audio signal
Feature Processor
• Increase discriminance & efficiency
• Temporal grouping of features (super vector)
• Statistics calculation (mean, variance, etc.)
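The feature processor steps can be sketched as follows. This is only one possible reading of “temporal grouping” and “statistics calculation”, not the AudioID implementation: the function name `summarize_features`, the group size, and the choice to concatenate per-dimension means and variances are all assumptions made for illustration.

```python
from statistics import fmean, pvariance


def summarize_features(frames, group_size):
    """Group consecutive per-frame feature vectors and summarize each group.

    frames: list of equal-length lists, one feature vector per frame.
    group_size: number of consecutive frames per group (hypothetical value).
    Returns one vector per full group: the per-dimension means followed by
    the per-dimension (population) variances. A trailing partial group is
    dropped.
    """
    summaries = []
    for start in range(0, len(frames) - group_size + 1, group_size):
        group = frames[start:start + group_size]
        columns = list(zip(*group))  # one tuple per feature dimension
        means = [fmean(col) for col in columns]
        variances = [pvariance(col) for col in columns]
        summaries.append(means + variances)
    return summaries


# Five frames with two features each; groups of two frames.
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 6.0], [7.0, 6.0], [9.0, 9.0]]
vectors = summarize_features(frames, group_size=2)
print(vectors)
```

Summarizing groups of frames reduces the amount of data per item, which fits the stated goal of increasing efficiency.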

System Architecture
Class generator
• Clustering of processed feature vectors:
  – further reduce the amount of data
  – enhance robustness (against overfitting)
• Add class with associated metadata to database
Classification
• Compare feature vectors against classes in database by means of some metric
• Find class yielding the best approximation
• Retrieve associated metadata
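A minimal sketch of class generation and classification, under strong simplifying assumptions: each class is reduced to a single centroid (plain averaging stands in for real clustering) and the metric is Euclidean distance. The slides do not specify either choice, and the names `make_class` and `classify` as well as the metadata values are hypothetical.

```python
import math


def make_class(vectors, metadata):
    """Reduce an item's feature vectors to one centroid and attach metadata.

    Averaging is a stand-in for the clustering step; it is an assumption.
    """
    centroid = [sum(col) / len(col) for col in zip(*vectors)]
    return centroid, metadata


def classify(query, database):
    """Return (metadata, distance) of the class closest to the query."""
    best_metadata, best_distance = None, math.inf
    for centroid, metadata in database:
        distance = math.dist(query, centroid)  # Euclidean metric (assumed)
        if distance < best_distance:
            best_metadata, best_distance = metadata, distance
    return best_metadata, best_distance


database = [
    make_class([[0.0, 1.0], [2.0, 1.0]], {"title": "Song A"}),
    make_class([[3.0, 4.0], [5.0, 4.0]], {"title": "Song B"}),
]
metadata, distance = classify([3.5, 4.5], database)
print(metadata, distance)
```

A linear scan like this grows with the database size; a database of the scale named in the requirements would likely need a faster search structure.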

MPEG-7 - Elements for Robust Audio Matching
Low level data
• “AudioSpectrumFlatness” LLD
  – Derived from: Spectral Flatness Measure (SFM)
  – Describes “un/flatness” of spectrum in frequency bands (tonal vs. noise-like)
“Fingerprint”
• “AudioSignature” Description Scheme
  – Statistical data summarization of “AudioSpectrumFlatness” LLD
  – Textual description in XML syntax
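The Spectral Flatness Measure is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum coefficients in a band. The sketch below computes it per band; the band edges and example values are illustrative assumptions and do not reproduce the band layout of the MPEG-7 descriptor.

```python
import math


def spectral_flatness(power_values):
    """Geometric mean divided by arithmetic mean of positive power values.

    The result is close to 1 for a flat (noise-like) band and close to 0
    for a band dominated by a few strong (tonal) components.
    """
    n = len(power_values)
    log_mean = sum(math.log(p) for p in power_values) / n
    arithmetic_mean = sum(power_values) / n
    return math.exp(log_mean) / arithmetic_mean


def band_flatness(power_spectrum, band_edges):
    """Compute the flatness of each band [edge_i, edge_i+1) of a spectrum."""
    return [
        spectral_flatness(power_spectrum[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


flat_band = [1.0, 1.0, 1.0, 1.0]         # noise-like: equal power everywhere
tonal_band = [100.0, 0.01, 0.01, 0.01]   # tonal: one dominant coefficient
values = band_flatness(flat_band + tonal_band, [0, 4, 8])
print(values)
```

Because a global gain change scales the geometric and arithmetic means by the same factor, the ratio is unaffected by volume changes, one of the distortions listed in the requirements.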

MPEG-7 - Benefits
• Standardized Feature Format guarantees worldwide interoperability
• Published, open format: descriptive data can be produced easily
• Large MPEG-7 compliant databases expected to be available in near future (incl. “fingerprints”)
• Long term format stability / life time

Recognition Performance - Conditions
Conditions
• Training and test sets (mostly rock / pop):
  – 15,000 items
  – 90,000 items
Considered feature
• Spectral Flatness Measure (SFM)
Classification performance
• Number of correctly identified items (both “single best” and “within top 10”)
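The two performance figures (“single best” and “within top 10”) can be computed from ranked candidate lists as sketched below. The function name `top_k_rate` and the toy data are hypothetical; they only illustrate the counting and are not taken from the evaluation.

```python
def top_k_rate(ranked_results, true_ids, k):
    """Fraction of queries whose true item appears in the first k candidates.

    ranked_results: one list of candidate ids per query, best match first.
    true_ids: the correct id for each query, in the same order.
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_results, true_ids)
        if truth in ranked[:k]
    )
    return hits / len(true_ids)


# Four toy queries with three ranked candidates each.
ranked_results = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
    ["a", "c", "b"],
]
true_ids = ["a", "a", "a", "b"]

top1 = top_k_rate(ranked_results, true_ids, k=1)
top2 = top_k_rate(ranked_results, true_ids, k=2)
print(top1, top2)
```

With `k=1` this gives the “single best” rate; with `k=10` on real candidate lists it gives the “within top 10” rate.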

Recognition Performance - 15k items
Feature: SFM
Distortion          Top 1 / Top 10
Cropping            100.0% / 100.0%
MP3 @ 96 kbps       99.6% / 99.8%
Loudsp./Mic.        98.0% / 99.0%
• 16 bands
• Advanced matching with temporal tracking

Recognition Performance - 90k items
• 16 bands
• Advanced matching with temporal tracking

Applications
• Retrieve associated metadata by identifying audio content
• Automated search of audio content on the Internet
• Broadcast monitoring by logging the transmission of audio material
• Feature based indexing of audio databases (similarity search)
• ...

Conclusions
• High recognition rates (> 99%, tested with 90,000 items)
• Robust to “real world” signal distortions
• Fast and reliable extraction and classification
• Underlying feature specified in MPEG-7 standard ensures worldwide interoperability and licensing available for everyone

Real Time Demonstration:
• Demo running on laptop (Pentium III @ 500 MHz)
• Local database with 15,000 items (Rock / Pop genre)
• Acoustic transmission: MP3 -> D/A -> Speakers -> Noisy Environment -> Microphone -> A/D -> AudioID

Thanks for your attention!