e553ddddea26dffa7e20497194e23b1b.ppt
- Количество слайдов: 113
Statistical Language Models for Information Retrieval
1 + +- - + + --- 0 + +- + ++ - - -- - + - -
Retrieval Engine Updated query Document collection top 10
Source Transmitter (encoder) Noisy Channel Receiver (decoder) Destination
Sampling
Estimation Total #words =100
Query = “data mining algorithms” Which model would most likely have generated this query?
Doc LM Query likelihood
Max. Likelihood Estimate Smoothed LM
TF weighting Words in both query and doc IDF-like weighting Ignore for ranking
long Verbose queries long short Keyword queries
Query = “the algorithms for data mining” p( “algorithms”|d 1) = p(“algorithm”|d 2) p( “data”|d 1) < p(“data”|d 2) p( “mining”|d 1) < p(“mining”|d 2) Query = “the algorithms for data mining”
Relevance
Ignored for ranking D
Document prior
An infinite mixture model Kernel-based density function
Document D Query Q
Background words Topic words
Empirical divergence Divergence minimization
Trec topic 412: “airport security” Mixture model approach Web database Top 10 docs
Estimate with a bilingual lexicon Or Parallel corpora
QUERY MODELING Query Language Model Query USER MODELING Retrieval Decision: Documents Document Language Models DOC MODELING Loss Function
Us er Sourc e Query Document
Boolean model Probabilistic relevance model Generative Relevance Theory Vector-space Model Two-stage LM KL-divergence model Subtopic retrieval model
Estimate Query Estimate Documents Query model parameters Query Language Model User model parameters Document Language Models Loss Function Set
Can LMs consistently (convincingly) outperform traditional methods without sacrificing efficiency? Can we do much better by going beyond unigram LMs?
How can we learn effectively from past relevance judgments? How can we break the document unit in a principled way?
How can we exploit user information and search context to improve search? What role can LMs play when combining text with relational data?
How can we develop an effective unified retrieval model for Web search? How can we exploit LMs to develop models for complex retrieval tasks?


