IPAL at ImageCLEF 2007: Mixing Features, Models and Knowledge

Number of slides: 24

IPAL at ImageCLEF 2007: Mixing Features, Models and Knowledge
Sheng Gao
IPAL French-Singaporean Joint Lab
Institute for Infocomm Research, Singapore

IPAL at ImageCLEF'07
Team members:
- Jean-Pierre Chevallet, IPAL & CNRS, France (photo & medical)
- Thi Hoang Diem Le, I2R, Singapore (photo & medical)
- Trong Ton Pham, I2R, Singapore (photo)
- Joo Hwee Lim, I2R, Singapore (photo)

Outline
- Ad-hoc photographic image retrieval (ImageCLEFphoto)
  - Content based image retrieval (CBIR) system
  - Text based information retrieval (TBIR) system
  - Mix-modality retrieval system
  - Benchmark results
  - Summary
- Ad-hoc medical image retrieval (ImageCLEFmed)
  - UMLS based medical retrieval system
  - Benchmark results
  - Summary

ImageCLEFphoto'07 vs. ImageCLEFphoto'06
- Less text information is available in ImageCLEFphoto'07.
  - Image text annotations: notes are excluded this year.
  - Query: only the annotations in the title field are used.
- Visual information plays a more important role in ImageCLEFphoto'07 than in ImageCLEFphoto'06.
  - Query image samples are excluded from the image database.
  - 244 new images are added to the image database.

Similar text annotation, different visual representation
- 01/1311 Title: Accommodation Huanchaco - Exterior View
- 01/1310 Title: Accommodation Huanchaco - Interior View
- Visual evidence plays a critical role in this case.

Low-level visual features
- COR: auto color correlogram, 324 dimensions.
- HSV: 166-dimensional histogram: HSV (162 dimensions, 18*3*3) plus gray image (4 dimensions).
- GABOR: 48 dimensions, including means and variances at 2 scales and 12 orientations in 5x5 grids.
- SIFT: 128-dimensional appearance feature.
- EDGE: Canny edge histogram, 80 dimensions (6 orientations * 16 patches).
- HSV_UNI: 96-dimensional HSV histogram, 32 bins per channel.
- GABOR_global: 60 dimensions, including means and variances at 5 scales and 6 orientations over the whole image.
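The 18*3*3 quantization behind the 162 HSV dimensions can be sketched as follows. This is a minimal illustration, assuming uniform bins per channel and RGB pixels with components in [0, 1]. The slides do not say how the bins or the 4 gray-level dimensions were defined, so the gray part is left out, and the function name is hypothetical.

```python
import colorsys


def hsv_joint_histogram(pixels, h_bins=18, s_bins=3, v_bins=3):
    """Return a normalized joint HSV histogram (18*3*3 = 162 bins by default).

    `pixels` is an iterable of (r, g, b) tuples with components in [0, 1].
    Uniform quantization of each channel is an assumption, not the authors' spec.
    """
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)  # each value lies in [0, 1]
        # min(...) keeps the boundary value 1.0 inside the last bin
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    if count:
        hist = [x / count for x in hist]  # normalize so the bins sum to 1
    return hist
```

Normalizing by the pixel count makes histograms of images with different sizes comparable.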

Indexing and similarity measure
- Indexing
  - tf-idf: index the histogram, with each bin treated as a word.
  - SVD: index in the eigenspace (80% of the eigenvectors are kept).
  - ISM: supervised learning based on the Integrated Statistic Model (ISM) is used to learn the ranking function (see S. Gao et al., ACM Multimedia'07).
  - HME: supervised learning based on Hidden Maximum Entropy (HME) is used to learn the ranking function (see S. Gao et al., ICME'07 and ICIP'07).
  - WORD_SVD: index on intermediate concepts.
    - 200 frequent keywords serve as the intermediate concepts.
    - HME models are trained for the 200 concepts.
    - Images are indexed in the 200-dimensional concept space.
  - WORD_ISM: ISM is used as the ranking function on the 200-dimensional feature.
  - BoV: index with the bag-of-visterms (1000 visterms).
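Treating each histogram bin as a word means a bin can be weighted like a term in a text index. The sketch below is one plausible reading, not the authors' formula: it assumes tf is the raw bin value and idf is log(N / df), where df counts the images with a non-empty bin. The function name is hypothetical.

```python
import math


def tfidf_index(histograms):
    """Weight every histogram bin as if it were a word in a document.

    Assumed weighting: tf = bin value, idf = log(N / df).
    """
    n_docs = len(histograms)
    n_bins = len(histograms[0])
    # df[b] = number of images whose bin b is non-empty
    df = [sum(1 for h in histograms if h[b] > 0) for b in range(n_bins)]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[h[b] * idf[b] for b in range(n_bins)] for h in histograms]
```

With this weighting, a bin that is non-empty in every image gets weight 0, so only bins that discriminate between images contribute to the index.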

Indexing and similarity measure
- Similarity measure
  - Cosine distance, if the image is indexed by one feature vector (e.g. tf-idf, SVD).
  - Likelihood ratio between the positive and negative models, if supervised learning is used (e.g. ISM, HME).
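Both measures can be written in a few lines. This is a generic sketch: the cosine is the standard formula, while the likelihood-ratio function simply takes the two model likelihoods as inputs, because the slides do not describe how the ISM or HME models compute them. Taking the ratio in the log domain is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def log_likelihood_ratio(p_pos, p_neg):
    """log P(x | positive model) - log P(x | negative model); higher ranks first."""
    return math.log(p_pos) - math.log(p_neg)
```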

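CBIR system
- Example fusion structure
  [Diagram: visual features (COR, HSV, GABOR, SIFT) pass through indexing and ranking (tf-idf, SVD, ISM, HME, WORD_SVD, WORD_ISM). The results are fused per feature (Fusion cor, Fusion hsv, Fusion gabor, Fusion sift) and then combined into the fusion system.]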

TBIR system
- Process XML annotation files
  [Diagram: XML reader -> remove stop words -> stemming -> lexicon.]

TBIR system
- Language model based IR: a document-dependent language model (LM) is estimated for each document in the database.
  [Diagram: for each document D, estimate the LM P(w1|D), P(w2|D), ..., P(wn|D) over the lexicon.]
- Documents are ranked by the probability that the query Q = q1, q2, ..., qm is generated by the document-dependent LM.
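Ranking by the probability that a document LM generates the query can be sketched with a unigram model. The slides do not name a smoothing method, so Jelinek-Mercer smoothing against the whole collection (weight `lam`) is an assumption here, as are the function names and the toy data.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(Q | D) under a unigram LM with Jelinek-Mercer smoothing (assumed)."""
    doc_tf, col_tf = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_doc = doc_tf[w] / len(doc) if doc else 0.0
        p_col = col_tf[w] / len(collection)
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:
            return float("-inf")  # word unseen in the whole collection
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document names sorted by decreasing query likelihood."""
    collection = [w for d in docs.values() for w in d]
    scores = {name: query_log_likelihood(query, d, collection, lam)
              for name, d in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "a": ["hotel", "exterior", "view"],
    "b": ["mountain", "lake"],
}
ranking = rank(["hotel", "view"], docs)
```

Smoothing keeps a document from scoring zero just because one query word is missing from its annotation, which matters when annotations are as short as image titles.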

TBIR system
- Latent Semantic Indexing (LSI) based IR
  [Diagram: term-document matrix over the lexicon -> SVD -> eigenspace.]
- Documents are indexed in the eigenspace; cosine distance is used as the similarity measure in the LSI space.

TBIR system
- Wikipedia is accessed to extract external knowledge for query / document expansion.
  - 4,881,983 pages were downloaded (English Wikipedia, April 2, 2007).
  - 23,399 animal terms are extracted, which are useful for queries 5, 20 and 35.
  - 709 geographical terms are extracted, of which only "mountain" should be useful for queries 4 and 44.

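Mix-modality system
- Linear combination
  [Diagram: the CBIR and TBIR outputs are combined with weights w and 1-w into the mix-system.]
- Cross-modality pseudo-relevance feedback (PRF)
  [Diagram: query words -> query expansion -> TBIR, with the top N image documents from CBIR feeding the expansion.]
  - One scheme: CBIR to boost TBIR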
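The two mixing schemes can be sketched as below. Several details are assumptions rather than facts from the slides: that w applies to CBIR and 1-w to TBIR, that scores are already normalized to a comparable range, and that query expansion picks the most frequent annotation words of the top N CBIR results. All function and parameter names are hypothetical.

```python
from collections import Counter


def linear_fusion(cbir_scores, tbir_scores, w=0.5):
    """Mix-system score = w * CBIR + (1 - w) * TBIR for every document."""
    docs = set(cbir_scores) | set(tbir_scores)
    return {d: w * cbir_scores.get(d, 0.0) + (1.0 - w) * tbir_scores.get(d, 0.0)
            for d in docs}


def expand_query(query, cbir_ranking, annotations, top_n=2, n_terms=2):
    """CBIR-to-TBIR feedback: add frequent words from the top-N CBIR images."""
    counts = Counter(
        w
        for d in cbir_ranking[:top_n]
        for w in annotations[d]
        if w not in query  # only new words are candidates for expansion
    )
    return list(query) + [w for w, _ in counts.most_common(n_terms)]
```

The expanded query is then passed to the text retrieval system, so visually similar images contribute vocabulary that the original query lacked.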

Results
- 27 runs were submitted, including CBIR, TBIR and mix-modality runs.
- Best run MAP: 0.2833, 6th place among 476 runs and 2nd place among automatic runs.
  - IPAL_04V_12RUNS WEIGHT: combines 12 CBIR runs using empirically tuned weights.
  - IPAL_11TrV_LM_12RUNSVISUAL: LM-based TBIR plus the PRF.
- CBIR best run MAP: 0.1204 without PRF, 4th place among CBIR runs.
  - The 1st run is from INAOE (MAP: 0.1925) and the 2nd run from XRCE (MAP: 0.1890).

Results
- Our TBIR best run MAP: 0.1806 with automatic feedback, 7th place among TBIR runs.
  - 19TiV_WTmM S0M2D0.8C6T6: uses a very small thesaurus manually extracted from Wikipedia. Its terms do not come from info boxes but are relevant to this collection. It also uses a black-and-white image detector based on the HSV values of the image.
  - Run 15: LM, document expansion with automatic Wikipedia.
- Top TBIR runs' MAP: 0.2020 without feedback (Budapest) and 0.2075 with feedback (XRCE).

PRF analysis
- CBIR system
  - Feedback from the TBIR has little effect. MAP of the HSV-based CBIR only increases to 0.0693 (run 02) from 0.0684 (run 01).
  - Combining 12 CBIR PRF runs, MAP increases to 0.1358 (run 05) from 0.1204 (run 04).
- TBIR system
  - Feedback from CBIR significantly improves MAP. MAP of the LM-based TBIR is 0.1377 (run 08). With PRF (run 04), MAP reaches 0.2442 (run 11).
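The size of each effect is easier to compare as a relative gain. The short calculation below uses only the MAP values quoted on this slide, reading the TBIR figures as 0.1377 before and 0.2442 after feedback. The labels and the helper name are hypothetical.

```python
def relative_gain(before, after):
    """Relative MAP improvement, e.g. 0.10 means +10%."""
    return (after - before) / before


# (MAP before feedback, MAP after feedback), as quoted on the slide
map_pairs = {
    "HSV-based CBIR (run 01 -> run 02)": (0.0684, 0.0693),
    "12 combined CBIR runs (run 04 -> run 05)": (0.1204, 0.1358),
    "LM-based TBIR (run 08 -> run 11)": (0.1377, 0.2442),
}

for label, (before, after) in map_pairs.items():
    print(f"{label}: {relative_gain(before, after) * 100:+.1f}%")
```

Under that reading the gains are about +1.3%, +12.8% and +77.3%, which matches the slide's point that feedback from CBIR helps TBIR far more than the reverse.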

Summary on ImageCLEFphoto
- Combining rich visual content representations and indexing techniques significantly improves the CBIR system compared with any individual visual system.
- CBIR based pseudo-relevance feedback significantly boosts the text based search system.
- Exploiting external knowledge such as Wikipedia gives an extra bonus; however, it is less effective than expected. Its large size causes confusion due to the lack of disambiguation.

ImageCLEFmed - Bayesian network based approach
- Conceptualization
  - Knowledge base: UMLS Metathesaurus (NLM).
  [Diagram: images / texts in French, German and English are mapped to UMLS concepts with TreeTagger, XIotamap and MetaMap. Network nodes: query q, document d, concepts c1, c2, ..., cm and cj, ..., cn.]

ImageCLEFmed - Bayesian network based approach
- Retrieval process
  - Document D observed: P(D) = 1
  [Diagram: network nodes: query q, document d, concepts c1, c2, ..., cm and cj, ..., cn.]

ImageCLEFmed - Bayesian network based approach
- Inference via semantic links from document concept nodes to query concept nodes
  - L: maximum length of the UMLS taxonomy
  - l: minimal length of the path between 2 concepts
  - pa(c): document concept nodes which are parent nodes of c

ImageCLEFmed - Bayesian network based approach
- Relevance status value, RSV(q, d): belief at q

Results
Run | Isa, PARCHD, BRRN, RL, RQ | MAP | R-Prec
IPAL-TXT-BAYISA | 0.1, 0, 0 | 0.3057 | 0.332
IPAL-TXT-BAYALLREL2 | 0.01, 0, 0 | 0.3042 | 0.333
IPAL-TXT-BAYALLREL1 | 0.2, 0.01, 0.001 | 0.304 | 0.3338
IPAL-TXT-BAYISA | 0.2, 0, 0 | 0.3039 | 0.3255
IPAL-TXT-BAYISA | 0.3, 0, 0 | 0.2996 | 0.3212
IPAL-TXT-BAYISA | 0.4, 0, 0 | 0.2935 | 0.3177

Summary on ImageCLEFmed
- The Bayesian model approach exploits the semantic relationships between document concepts and query concepts in a unified framework.
- It enhances the VSM by using the semantic relatedness between concepts.
- Improving the relationship weighting and the performance of the model is our future work.