

• Number of slides: 115

Multimedia Retrieval

Outline
• Overview: Indexing Multimedia
• Generative Models & MMIR
  – Probabilistic Retrieval
  – Language models, GMMs
• Experiments
  – Corel experiments
  – TREC Video benchmark

Indexing Multimedia

A Wealth of Information: speech, audio, images, temporal composition, database

Associated Information: player profile (id, name, gender, country, history, biography, picture)

User Interaction (diagram): the user poses a query (text), gives examples, views results (video segments), and evaluates them; feedback flows back to the database.

Indexing Multimedia
• Manually added descriptions – ‘Metadata’
• Analysis of associated data – speech, captions, OCR, …
• Content-based retrieval
  – Approximate retrieval
  – Domain-specific techniques

Limitations of Metadata
• Vocabulary problem – dark vs. somber
• Different people describe different aspects – dark vs. evening

Limitations of Metadata
• Encoding specificity problem – a single person describes different aspects in different situations
• Many aspects of multimedia simply cannot be expressed unambiguously – processes in the left brain (analytic, verbal) vs. the right brain (aesthetic, synthetic, nonverbal)

Approximate Retrieval
• Based on similarity
  – Find all objects that are similar to this one
  – Distance function
  – Representations capture some (syntactic) meaning of the object
• ‘Query by Example’ paradigm

Feature extraction → N-dimensional space → Ranking → Display

Low-level Features

Low-level Features

Query image

So, … Retrieval?!

IR is about satisfying vague information needs provided by users (imprecisely specified in ambiguous natural language) by matching them approximately against information provided by authors (specified in the same ambiguous natural language). (Smeaton)

No ‘Exact’ Science!
• Evaluation is not done analytically, but experimentally
  – real users (specifying requests)
  – test collections (real document collections)
  – benchmarks (TREC: Text REtrieval Conference)
  – precision, recall, …
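The slide lists precision and recall as the standard experimental measures. As a minimal sketch (with made-up document ids, not from any real test collection), they can be computed from a retrieved set and a set of relevance judgments:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: 4 documents returned, 3 of the 5 relevant ones found.
p, r = precision_recall([1, 2, 3, 4], [2, 3, 4, 8, 9])  # p = 0.75, r = 0.6
```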

Known Item

Query

Results …

Query

Results …

Semantic gap… (diagram): raw multimedia data → features → ? → concepts

Observation
• Automatic approaches are successful under two conditions:
  – the query example is derived from the same source as the target objects
  – a domain-specific detector is at hand

1. Generic Detectors

Retrieval Process (diagram): query parsing (query type, nouns, adjectives) → detector / feature selection (camera operations, people, names, invariant color spaces, natural/physical objects) → filtering → ranking, against the database.

Parameterized detectors
• Example (Topic 41): query text + people detector <1, 2, 3, many> → results

Query Parsing
• Find the query type, nouns, adjectives, and names
• Example: ‘Other examples of overhead zooming in views of canyons in the Western United States’

Detectors (from ‘the universe and everything’ down, with increasing focus):
• Camera operations (pan, zoom, tilt, …)
• People (face based)
• Names (VideoOCR)
• Natural objects (color space selection)
• Physical objects (color space selection)
• Monologues (specifically designed)
• Press conferences (specifically designed)
• Interviews (specifically designed)
→ Domain-specific detectors

2. Domain knowledge

Player Segmentation: original image → initial segmentation → final segmentation

Advanced Queries
• Show clips from tennis matches, starring Sampras, playing close to the net

3. Get to know your users

Mirror Approach
• Gather users’ knowledge – introduce semi-automatic processes for selection and combination of feature models
• Local information – relevance feedback from a user
• Global information – thesauri constructed from all users

Feature extraction → N-dimensional space → Clustering → Concepts / Thesauri → Ranking → Display

Low-level Features

Identify Groups

Representation
• Groups of feature vectors are conceptually equivalent to words in text retrieval
• So, techniques from text retrieval can now be applied to multimedia data as if it were text!

Query Formulation
• Clusters are internal representations, not suited for user interaction
• Use automatic query formulation based on global information (thesaurus) and local information (user feedback)

Interactive Query Process
• Select relevant clusters from the thesaurus
• Search the collection
• Improve results by adapting the query
  – Remove clusters occurring in irrelevant images
  – Add clusters occurring in relevant images

Assign Semantics

Visual Thesaurus
• Glcm_47 – correct cluster, representing ‘Tree’, ‘Forest’
• Fractal_23 – ‘incoherent’ cluster
• Gabor_20 – mis-labeled cluster

Learning
• Short-term: adapt the query to better reflect this user’s information need
• Long-term: adapt the thesaurus and clustering to improve the system for all users

Thesaurus Only / After Feedback

4. Nobody is unique!

Collaborative Filtering
• Also: social information filtering
  – Compare user judgments
  – Recommend differences between similar users
• People’s tastes are not randomly distributed
• You are what you buy (Amazon)

Collaborative Filtering
• Benefits over the content-based approach
  – Overcomes problems with finding suitable features to represent e.g. art, music
  – Serendipity
  – Implicit mechanism for qualitative aspects like style
• Problems: large groups, broad domains
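The comparison of user judgments above can be sketched as user-based collaborative filtering: compute similarity between users’ rating vectors and recommend what the most similar user liked. The ratings dictionary and item names below are hypothetical; the slides name no concrete dataset or similarity measure, so cosine similarity is an assumed choice.

```python
from math import sqrt

# Hypothetical user-item ratings (users x items, sparse).
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m2": 1, "m4": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / norm

def recommend(user):
    """Recommend items the most similar other user rated but `user` did not."""
    _, best = max((cosine(ratings[user], ratings[o]), o)
                  for o in ratings if o != user)
    return sorted(set(ratings[best]) - set(ratings[user]))
```

Here bob’s ratings align most closely with alice’s, so alice would be recommended the item only bob has rated.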

5. Ask for help

Query Articulation: feature extraction → N-dimensional space. How to articulate the query?

What are the query semantics?

Details matter

Problem Statement
• Feature vectors capture ‘global’ aspects of the whole image
• Overall image characteristics dominate the feature vectors
• Hypothesis: users are interested in details

Irrelevant Background: query → result

Image Spots
• Image spots articulate desired image details
  – Foreground/background colors
  – Colors forming ‘shapes’
  – Enclosure of shapes by background colors
• Multi-spot queries define the spatial relations between a number of spots

Query Images / Results (table): image ids and ranks of the target image under the Hist 16, Hist, and Spot+Hist methods.

A: Simple Spot Query – ‘black sky’

B: Articulated Multi-Spot Query – ‘black sky’ above ‘monochrome ground’

C: Histogram Search in ‘Black Sky’ images – results 2–4 and 14

Complicating Factors
• What are good feature models?
• What are good ranking functions?
• Queries are subjective!

Probabilistic Approaches

Generative Models…
• A statistical model for generating data
  – A probability distribution over samples in a given ‘language’, aka ‘Language Modelling’: P(sample | M)
© Victor Lavrenko, Aug. 2002

… in Information Retrieval
• Basic question: what is the likelihood that this document is relevant to this query?
• P(rel | I, Q) = P(I, Q | rel) P(rel) / P(I, Q)
• P(I, Q | rel) = P(Q | I, rel) P(I | rel)

‘Language Modelling’
• Not just ‘English’
• But also, the language of
  – author
  – newspaper
  – text document
  – image
• Hiemstra or Robertson?
• ‘Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing.’

‘Language Modelling’
• Not just ‘English’
• But also, the language of
  – author
  – newspaper
  – text document
  – image
• Guardian or Times?

‘Language Modelling’
• Not just English!
• But also, the language of
  – author
  – newspaper
  – text document
  – image
• … or … ?

Unigram and higher-order models
• Unigram models: P(w1 w2 w3 w4) = P(w1) P(w2) P(w3) P(w4)
• N-gram models: condition each word on the previous words, e.g. bigram P(w1) P(w2 | w1) P(w3 | w2) P(w4 | w3)
• Other models
  – Grammar-based models, etc.
  – Mixture models
© Victor Lavrenko, Aug. 2002

The fundamental problem
• Usually we don’t know the model M
  – But we have a sample representative of that model
• First estimate a model M(sample) from the sample
• Then compute the observation probability P(observation | M(sample))
© Victor Lavrenko, Aug. 2002

Indexing: determine models (docs → models)
• Indexing
  – Estimate Gaussian Mixture Models from images using EM
  – Based on a feature vector with colour, texture, and position information from pixel blocks
  – Fixed number of components
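The indexing step above fits one fixed-size GMM per image with EM. The sketch below is a minimal diagonal-covariance EM in NumPy; the quantile-based initialisation, iteration count, and variance floor are implementation assumptions, not taken from the slides.

```python
import numpy as np

def fit_gmm(X, k=2, iters=50):
    """Minimal EM for a diagonal-covariance Gaussian mixture:
    X is an (n, d) array of per-block feature vectors for one image."""
    n, d = X.shape
    mu = np.quantile(X, np.linspace(0.1, 0.9, k), axis=0)  # spread-out init
    var = np.tile(X.var(axis=0), (k, 1))
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | sample i)
        logp = (np.log(pi)
                - 0.5 * (((X[:, None] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(axis=-1))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from soft counts
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ X ** 2) / nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var
```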

Retrieval: use query likelihood
• Query: which of the models is most likely to generate these 24 samples?

Probabilistic Image Retrieval?

Rank by P(Q|M): compute P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4) for the query

Topic Models (diagram): query likelihoods P(Q|Mi) for each document model feed a query model QM; documents are then ranked by P(D1|QM), P(D2|QM), P(D3|QM), P(D4|QM).

Probabilistic Retrieval Model
• Text – rank using the probability of drawing the query terms from the document models
• Images – rank using the probability of drawing the query blocks from the document models
• Multi-modal – rank using the joint probability of drawing the query samples from the document models

Text Models
• Unigram Language Models (LM) – urn metaphor
• P(query) ~ product of term probabilities = 4/9 · 2/9 · 4/9 · 3/9
© Victor Lavrenko, Aug. 2002

Generative Models and IR
• Rank models (documents) by the probability of generating the query Q:
  – P(Q | D1) = 4/9 · 2/9 · 4/9 · 3/9 = 96/9⁴
  – P(Q | D2) = 3/9 · 3/9 · 3/9 · 3/9 = 81/9⁴
  – P(Q | D3) = 2/9 · 3/9 · 2/9 · 4/9 = 48/9⁴
  – P(Q | D4) = 2/9 · 5/9 · 2/9 · 2/9 = 40/9⁴
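The urn arithmetic above (term frequency over document length, multiplied across query terms) can be sketched directly; the token names are placeholders standing in for the slide’s pictograms.

```python
from collections import Counter

def query_likelihood(query, doc):
    """Unigram urn model: P(Q | M_D) = product over query terms of tf(t, D) / |D|."""
    tf, n = Counter(doc), len(doc)
    p = 1.0
    for t in query:
        p *= tf[t] / n
    return p

# Toy 9-token document mirroring the slide's 4/9 * 2/9 * 4/9 * 3/9 example.
doc = ["red"] * 4 + ["blue"] * 2 + ["green"] * 3
query_likelihood(["red", "blue", "red", "green"], doc)  # 96 / 9**4
```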

The Zero-frequency Problem
• Suppose some event is not in our sample
  – The model will assign zero probability to that event
  – And to any set of events involving the unseen event
• Happens frequently with language
• It is incorrect to infer zero probabilities
  – Especially when dealing with incomplete samples

Smoothing
• Idea: shift part of the probability mass to unseen events
• Interpolation with the background (general English)
  – Reflects the expected frequency of events
  – Plays the role of IDF
  – λ · P(t | D) + (1 − λ) · P(t | C)
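Interpolation with a background model can be sketched as follows; the mixing weight λ = 0.8 is an assumed value, and documents and collection are modelled as plain token lists for simplicity.

```python
def smoothed_prob(t, doc, collection, lam=0.8):
    """Interpolated (Jelinek-Mercer style) estimate:
    lam * P(t | D) + (1 - lam) * P(t | C).
    Terms unseen in the document get collection probability instead of zero."""
    p_doc = doc.count(t) / len(doc)
    p_col = collection.count(t) / len(collection)
    return lam * p_doc + (1 - lam) * p_col

doc = ["a", "b"]
collection = ["a", "b", "c", "c"]
smoothed_prob("c", doc, collection)  # unseen in doc, yet nonzero: 0.2 * 0.5
```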

Hierarchical Language Model
• Smoothed over multiple levels:
  α · P(T | Shot) + β · P(T | ‘Scene’) + γ · P(T | Video) + (1 − α − β − γ) · P(T | Collection)
• Also common in XML retrieval
  – Element score smoothed with the containing article
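The multi-level mixture can be sketched directly from the formula; the weights below are arbitrary example values, and each level is again modelled as a token list.

```python
def hierarchical_lm(t, shot, scene, video, collection, w=(0.4, 0.3, 0.2)):
    """alpha * P(t|shot) + beta * P(t|scene) + gamma * P(t|video)
    + (1 - alpha - beta - gamma) * P(t|collection)."""
    a, b, c = w
    levels = [shot, scene, video, collection]
    weights = [a, b, c, 1 - a - b - c]
    return sum(wi * lvl.count(t) / len(lvl) for wi, lvl in zip(weights, levels))
```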

Image Models
• Urn metaphor not useful
  – Drawing pixels is useless: pixels carry no semantics
  – Drawing pixel blocks is not effective: the chances of drawing the exact query blocks from a document are slim
• Use Gaussian Mixture Models (GMM)
  – Fixed number of Gaussian components/clusters/concepts

Key-frame representation (figure): split the key frame into Y, Cb, and Cr colour channels; take samples of DCT coefficients plus position; estimate the query model with the EM algorithm.

Image Models
• Expectation-Maximisation (EM) algorithm – iteratively:
  – estimate component assignments
  – re-estimate component parameters

Expectation Maximization: E-step / M-step (components 1–3)

Expectation Maximization (animation): E-step / M-step (components 1–3)

VisualSEEk
• Querying by image regions and spatial layout
  – Joint content-based/spatial querying
  – Automated region extraction
  – Direct indexing of color features

Image Query Process

Color Similarity
• Color space similarity
  – Measures closeness in the HSV color space
• Color histogram distance
  – Minkowski metric between hq and ht
• Histogram quadratic distance
  – Used in the QBIC project
  – Measures the weighted similarity between histograms
  – Computes the cross-similarity between colors
  – Computationally expensive
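The histogram quadratic distance has the closed form d² = (hq − ht)ᵀ A (hq − ht), where A weights the cross-similarity between color bins. A minimal sketch (the similarity matrix A here is a made-up example, not QBIC’s actual weights):

```python
import numpy as np

def quadratic_distance(h_q, h_t, A):
    """Histogram quadratic distance d^2 = (hq - ht)^T A (hq - ht);
    A[i][j] weights the cross-similarity of color bins i and j."""
    d = np.asarray(h_q, float) - np.asarray(h_t, float)
    return float(d @ np.asarray(A, float) @ d)

# With A = identity this reduces to the squared Euclidean distance.
quadratic_distance([1, 0], [0, 1], np.eye(2))  # 2.0
```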

Color sets
• A compact alternative to color histograms
• Color set example
  – Transform RGB to HSV
  – Quantize the HSV color space to 2 hues, 2 saturations, and 2 values
  – Assign a unique index m to each quantized HSV color => an eight-dimensional binary space
  – A color set is a selection from the eight colors
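The 2×2×2 quantization can be sketched as follows. The half-range thresholds and the bit layout of the index m are assumptions for illustration; the slides do not specify where the quantization boundaries fall.

```python
import colorsys

def color_set(rgb_pixels):
    """Quantize HSV to 2 hues x 2 saturations x 2 values and return the
    8-dimensional binary color set: bit m is set iff quantized color m occurs."""
    present = set()
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Assumed index layout: m = (hue bit << 2) | (sat bit << 1) | value bit.
        m = (int(h >= 0.5) << 2) | (int(s >= 0.5) << 1) | int(v >= 0.5)
        present.add(m)
    return [1 if m in present else 0 for m in range(8)]
```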

Color sets and Back-Projection
• Processing stages
  – Color selection: 1 color, 2 colors, …, until salient regions are extracted
  – Back-projection onto the image: map I[x, y] to the most similar color in the color set
  – Thresholding
  – Labeling

Color Set Query Strategy
• Given the query color set
  – Perform several range queries on the query color set’s colors
  – Take the intersection of these lists
  – Minimize the sum of attributes in the intersection list

Single Region Query
• Region absolute location
  – Fixed query location: the Euclidean distance of the centroids
  – Bounded query location: if the target falls within the designated area, d(q, t) = 0; otherwise, the Euclidean distance of the centroids
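The two location modes can be sketched in one function; modelling the designated area as an axis-aligned box is an assumption made here for illustration.

```python
from math import dist

def region_location_distance(c_query, c_target, bound=None):
    """Absolute-location score: zero when the target centroid falls within
    the designated bounding box, else the Euclidean centroid distance."""
    if bound is not None:
        (x0, y0), (x1, y1) = bound
        if x0 <= c_target[0] <= x1 and y0 <= c_target[1] <= y1:
            return 0.0
    return dist(c_query, c_target)

region_location_distance((0, 0), (3, 4))                        # fixed: 5.0
region_location_distance((0, 0), (3, 4), bound=((2, 3), (4, 5)))  # bounded: 0.0
```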

Index Structure
• Centroid location spatial access – spatial quad-tree
• Rectangle (MBR) location spatial access – R-trees

Size Comparison
• Area – the absolute difference in area between two regions
• Spatial extent
  – Measures the distance between the widths and heights of the MBRs
  – Much simpler than shape information

Single Region Query Strategy

Multiple Regions Query

Region Relative Location
• Use 2D-strings

Spatial Invariance
• Provide scaling/rotation – approximate rotation invariance by providing different projections

Multiple Regions Query Strategy
• Relative locations
  – Perform the query on all attributes except location
  – Find the intersection of the region lists
  – For each candidate image, generate its 2D-string and compare it to the 2D-string of the query image

Query Formulation
• User tools
  – Sketch regions
  – Position them on the query grid
  – Assign colors, size, and absolute location
  – Optionally assign boundaries

Query Examples (1)

Query Examples (2)

Query Examples (3)

Blobworld
• The image is treated as a few “blobs” – image regions that are roughly homogeneous with respect to color and texture

Grouping Pixels into Regions
• For each pixel, assign a vector consisting of color, texture, and position features (8-D space)
• Model the distribution of pixels – use the EM algorithm to fit a mixture-of-Gaussians model to the data
• Perform spatial grouping – connect pixels belonging to the same color/texture cluster

Describe the Regions
• For each region, store its color histogram
• Match the color of two regions using the quadratic distance between their histograms x and y
  – d²hist(x, y) = (x − y)ᵀ A (x − y)
  – Matrix A: a symmetric matrix of weights between 0 and 1, representing the similarity between bin i and bin j

Querying in Blobworld
• The user composes a query by submitting an image
  – See its Blobworld representation
  – Select the relevant blobs to match
  – Specify the relative importance of the blob features
• Atomic query
  – Specify a particular blob with feature vector vi to match
  – For each blob vj in the database image, find the distance between vi and vj
  – Measure the similarity between bi and bj, and take the maximum value

Querying in Blobworld
• Compound query
  – Calculated using fuzzy-logic operators
  – The user may specify a weight for each atomic query
  – Rank the images according to the overall score
• Indicating which blobs provided the highest score helps the user refine the query
  – Change the weighting of blob features
  – Specify new blobs to match

Indexing of Blobworld
• Index the color feature vectors – to speed up atomic queries
  – Use R*-trees
  – Higher-dimensional data requires larger data entries => lower fanout
  – Need a low-dimensional approximation to the full color feature vectors
    • Use the SVD to find Ak, the best rank-k approximation to the weight matrix A
    • Project the feature space onto the subspace spanned by the rows of Ak
    • Index the low-dimensional vectors
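The rank-k approximation step can be sketched with a truncated SVD (by the Eckart-Young theorem this is the best rank-k approximation in the Frobenius norm); the rows of the truncated right factor span the low-dimensional indexing subspace. This is a generic sketch, not Blobworld’s exact code.

```python
import numpy as np

def rank_k_projection(A, k):
    """Best rank-k approximation Ak of the bin-similarity matrix A via SVD;
    also return the k right singular vectors, whose rows span the subspace
    used to index low-dimensional color vectors (project with h @ Vk.T)."""
    U, s, Vt = np.linalg.svd(A)
    Ak = U[:, :k] * s[:k] @ Vt[:k]   # truncated reconstruction
    return Ak, Vt[:k]
```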

Multimedia Cross-modal Correlation
• Case study: automatic image captioning
  – Each image has a set of extracted regions
  – Some of the regions have a caption

Feature Extraction
• Discover image regions
  – Extracted by a standard segmentation algorithm
  – Each region is mapped to a 30-dim feature vector
    • Mean and standard deviation of RGB values, average responses to various texture filters, position in the overall image layout, shape description
• How to capture cross-media correlations?

Mixed Media Graph (MMG)
• Graph construction
  – V(O): the vertex of object O
  – V(ai): the vertex of the attribute value A = ai
  – Add an edge if and only if the two token values are close enough
    • For each feature vector, choose its k nearest neighbors => add the corresponding edges
    • The nearest-neighbor relationship is not symmetric
  – NN-links: the nearest-neighbor links
  – OAV-links: the object–attribute-value links
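The NN-link construction above (each feature vector linked to its k nearest neighbors, a relation that need not be symmetric) can be sketched as follows; the brute-force distance matrix is an assumption for small examples, not how a real MMG would be built at scale.

```python
import numpy as np

def knn_edges(vectors, k=2):
    """Directed NN-links: connect each feature vector to its k nearest
    neighbors under Euclidean distance (not necessarily symmetric)."""
    V = np.asarray(vectors, float)
    d = np.linalg.norm(V[:, None] - V[None, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                            # no self-links
    return {i: [int(j) for j in np.argsort(d[i])[:k]] for i in range(len(V))}

# Node 2 links to node 1, but node 1 links back to node 0: asymmetric.
knn_edges([[0.0], [1.0], [10.0]], k=1)
```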

Example of MMG

Correlation Discovery
• Correlation discovery by random walk – random walk with restart (RWR)
• To compute the affinity of node “B” for node “A”:
  – A random walker starts from node “A”
  – At every step, it chooses randomly among the available edges
  – With probability c, it goes back to node “A” (restart)
  – The steady-state probability that the random walker is at node “B” is the affinity of “B” with respect to “A”

RWR Algorithm
• Given a query object Oq
  – Do an RWR from node q = V(Oq)
  – Compute the steady-state probability vector uq = (uq(1), …, uq(N))
• Example
  – Given an image I3
  – Estimate uI3 for all nodes in the MMG
  – Report the top few caption words for the image
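The steady-state vector can be sketched with power iteration on the column-normalized adjacency matrix; the restart probability c = 0.15 and iteration count are assumed example values.

```python
import numpy as np

def rwr(adj, start, c=0.15, iters=100):
    """Random walk with restart: steady-state probability vector u for a
    walker that restarts at `start` with probability c at every step."""
    A = np.asarray(adj, float)
    W = A / A.sum(axis=0, keepdims=True)       # column-stochastic transitions
    e = np.zeros(len(A))
    e[start] = 1.0                             # restart distribution
    u = e.copy()
    for _ in range(iters):
        u = (1 - c) * W @ u + c * e            # power iteration
    return u

# Chain graph 0 - 1 - 2: affinity decays with distance from the start node.
rwr([[0, 1, 0], [1, 0, 1], [0, 1, 0]], start=0)
```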