
Natural Language Processing CSE 592 Applications of AI Winter 2003 Information Retrieval Speech Recognition Syntactic Parsing Semantic Interpretation 1
Example Applications • Spelling and grammar checkers • Finding information on the WWW • Spoken language control systems: banking, shopping • Classification systems for messages, articles • Machine translation tools 2
The Dream 3
Information Retrieval (Thanks to Adam Carlson) 4
Motivation and Outline • Background – Definitions • The Problem – 100,000+ pages • The Solution – Ranking docs – Vector space – Probabilistic approaches • Extensions – Relevance feedback, clustering, query expansion, etc. 5
What is Information Retrieval • Given a large repository of documents, how do I get at the ones that I want – Examples: Lexis/Nexis, medical reports, AltaVista • Different from databases – Unstructured (or semi-structured) data – Information is (typically) text – Requests are (typically) word-based 6
Information Retrieval Task • Start with a set of documents • User specifies information need – Keyword query, Boolean expression, high-level description • System returns a list of documents – Ordered according to relevance • Known as the ad-hoc retrieval problem 7
Measuring Performance • Precision – Proportion of selected items that are correct: tp / (tp + fp) • Recall – Proportion of target items that were selected: tp / (tp + fn) • Precision-Recall curve – Shows the tradeoff between the two [Diagram: overlap between the documents the system returned and the actual relevant documents, defining true positives (tp), false positives (fp), false negatives (fn), and true negatives (tn)] 8
Basic IR System • Use word overlap to determine relevance – Word overlap alone is inaccurate • Rank documents by similarity to query • Computed using Vector Space Model 9
Vector Space Model • Represent documents as a matrix – Words are rows – Documents are columns – Cell i, j contains the number of times word i appears in document j – Similarity between two documents is the cosine of the angle between the vectors representing those documents 10
Vector Space Example a: System and human system engineering testing of EPS b: A survey of user opinion of computer system response time c: The EPS user interface management system d: Human machine interface for ABC computer applications e: Relation of user perceived response time to error measurement f: The generation of random, binary, ordered trees g: The intersection graph of paths in trees h: Graph minors IV: Widths of trees and well-quasi-ordering i: Graph minors: A survey 11
Vector Space Example cont. [Plot: documents a, b, and c positioned as vectors along axes for the terms "system", "user", and "interface"] 12
Similarity in Vector Space • Cosine similarity – Measures word overlap – Normalizes for different length vectors • Other metrics exist 13
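A reconstruction of the similarity measure this slide shows (the standard cosine of the angle between two term-count vectors):

```latex
\mathrm{sim}(d_1, d_2) = \cos\theta
  = \frac{\vec{d_1} \cdot \vec{d_2}}{\lVert \vec{d_1} \rVert\, \lVert \vec{d_2} \rVert}
  = \frac{\sum_i w_{i,1}\, w_{i,2}}{\sqrt{\sum_i w_{i,1}^2}\; \sqrt{\sum_i w_{i,2}^2}}
```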
Answering a Query Using Vector Space • Represent query as vector • Compute distances to all documents • Rank according to distance • Example – “computer system” 14
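A minimal sketch of this procedure on the first four documents from the vector-space example, using raw term counts and cosine similarity (tf·idf weighting, stemming, and stop-word removal are omitted for brevity):

```python
import math
from collections import Counter

docs = {
    "a": "system and human system engineering testing of eps",
    "b": "a survey of user opinion of computer system response time",
    "c": "the eps user interface management system",
    "d": "human machine interface for abc computer applications",
}

def vectorize(text):
    """Term-count vector: word -> number of occurrences."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

query = vectorize("computer system")
ranking = sorted(docs, key=lambda d: cosine(query, vectorize(docs[d])), reverse=True)
for d in ranking:
    print(d, round(cosine(query, vectorize(docs[d])), 3))
```

Document b, which contains both "computer" and "system", comes out on top.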
Common Improvements • The vector space model – Doesn’t handle morphology (eat, eats, eating) – Favors common terms • Possible fixes – Stemming • Convert each word to a common root form – Stop lists – Term weighting 15
Handling Common Terms • Stop list – List of words to ignore • “a”, “and”, “but”, “to”, etc. • Term weighting – Words which appear everywhere aren’t very good discriminators – give higher weight to rare words 16
tf * idf 17
Inverse Document Frequency • IDF provides high values for rare words and low values for common words • Example values for a collection of 10,000 documents (see below) 18
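A reconstruction of the weighting these two slides refer to, the standard tf·idf scheme, with worked idf values for a 10,000-document collection (base-10 logarithm assumed; any base gives the same ranking):

```latex
w_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i
        = \mathrm{tf}_{i,j} \times \log\frac{N}{n_i},
\qquad
\mathrm{idf}_i =
\begin{cases}
\log_{10}(10000/10000) = 0 & \text{word in every document}\\[2pt]
\log_{10}(10000/100) = 2 & \text{word in 100 documents}\\[2pt]
\log_{10}(10000/1) = 4 & \text{word in a single document}
\end{cases}
```

Here tf_{i,j} is the count of word i in document j, N is the collection size, and n_i is the number of documents containing word i.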
Probabilistic IR • Vector space model robust in practice • Mathematically ad-hoc – How to generalize to more complex queries? (intel or microsoft) and (not stock) • Alternative approach: model problem as finding documents with highest probability of being relevant to the query – Requires making some simplifying assumptions about underlying probability distributions – In certain cases can be shown to yield same results as vector space model 19
Probability Ranking Principle n For a given query Q, find the documents D that maximize the odds that the document is relevant (R): 20
Probability Ranking Principle n For a given query Q, find the documents D that maximize the odds that the document is relevant (R): Probability of document relevance to any query – i.e., the inherent quality of the document 21
Probability Ranking Principle n For a given query Q, find the documents D that maximize the odds that the document is relevant (R): Probability that if document is indeed relevant, then the query is in fact Q But where do we get that number? 22
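The equation these three slides annotate did not survive extraction; the following is a hedged reconstruction (the usual Bayes-rule decomposition of the odds of relevance):

```latex
O(R \mid Q, D)
  = \frac{P(R \mid Q, D)}{P(\neg R \mid Q, D)}
  = \frac{P(Q \mid R, D)\; P(R \mid D)}{P(Q \mid \neg R, D)\; P(\neg R \mid D)}
```

Here P(R | D) is the probability of document relevance to any query (the inherent quality of the document), and P(Q | R, D) is the probability that, if the document is indeed relevant, the query is in fact Q; documents are ranked by these odds.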
Bayesian nets for text retrieval [Diagram: a document network (documents d1, d2 linked to words w1, w2, w3) feeding a query network (concepts c1, c2, c3 combined by query operators q1, q2 (AND/OR/NOT) into the information need q0)] 23
Bayesian nets for text retrieval [Same diagram; the document network (documents d1, d2 linked to words w1, w2, w3) is computed once for the entire collection] 24
Bayesian nets for text retrieval [Same diagram; the query network (concepts c1, c2, c3, query operators q1, q2, and the information need q0) is computed for each query] 25
Conditional Probability Tables • P(d) = prior probability document d is relevant – Uniform model: P(d) = 1 / Number of docs – In general, document quality P(r | d) • P(w | d) = probability that a random word from document d is w – Term frequency • P(c | w) = probability that a given document word w has the same meaning as a query word c – Thesaurus • P(q | c1, c2, …) = canonical form of operators AND, OR, NOT, etc. 26
Example [Diagram: document network with Macbeth and Hamlet linked to the words reason, trouble, double, and two; the query network combines them with OR, NOT, and AND operators into the user query] 27
Details • Set head q0 of user query to “true” • Compute posterior probability P(D | q0) • “User information need” doesn’t have to be a query - can be a user profile, e.g., other documents the user has read • Instead of just words, can include phrases, inter-document links • Link matrices can be modified over time – User feedback – The promise of “personalization” 28
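A toy sketch of how such a network can be evaluated, using the Macbeth/Hamlet example from two slides back. It makes heavy simplifying assumptions that are not in the slides: concepts are identified with words, P(w | d) is just relative term frequency, AND is a product, OR is a noisy-OR, and the sample query "(reason OR trouble) AND NOT double" plus the snippets of text are invented for illustration:

```python
from collections import Counter

# Invented stand-in texts for the two documents.
docs = {
    "Macbeth": "double double toil and trouble",
    "Hamlet":  "trouble and reason of the mind",
}

def p_word_given_doc(word, text):
    """P(w | d): relative frequency of the word in the document."""
    counts = Counter(text.split())
    return counts[word] / sum(counts.values())

# Query operators over belief values in [0, 1].
def AND(*ps):                 # all argument concepts must be supported
    out = 1.0
    for p in ps:
        out *= p
    return out

def OR(*ps):                  # noisy-OR: at least one concept supported
    out = 1.0
    for p in ps:
        out *= (1.0 - p)
    return 1.0 - out

def NOT(p):
    return 1.0 - p

def score(text):
    """Belief in the information need (reason OR trouble) AND NOT double."""
    return AND(OR(p_word_given_doc("reason", text),
                  p_word_given_doc("trouble", text)),
               NOT(p_word_given_doc("double", text)))

for name, text in sorted(docs.items(), key=lambda kv: -score(kv[1])):
    print(name, round(score(text), 3))
```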
Extensions • Meet demands of web-based systems – modified ranking functions for the web • Relevance feedback • Query expansion • Document clustering • Latent Semantic Indexing • Other IR tasks 29
IR on the Web • Query AltaVista with “Java” – Almost 10^7 pages found • Avoiding latency – User wants (initial) results fast • Solution – Rank documents using word-overlap – Use a special data structure: an inverted index 30
Improved Ranking on the Web • Not just arbitrary documents • Can use HTML tags and other properties – Query term in a <title> or heading tag 31
PageRank • Idea: Good pages link to other good pages – Round 1: count in-links (problems?) – Round 2: sum weighted in-links – Round 3: and again, and again… • Implementation: Repeated random walk on a snapshot of the web – weight ∝ frequency visited 32
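A minimal power-iteration sketch of the "repeated random walk" idea (the damping factor of 0.85 and the three-page link graph are assumptions for illustration, not part of the slides):

```python
def pagerank(links, damping=0.85, iters=50):
    """links: page -> list of pages it links to. Returns page -> score."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            if not outs:                         # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / len(pages)
            else:                                # pass weight along out-links
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# Toy web: A and C both link to B, so B ends up with the highest score.
print(pagerank({"A": ["B"], "B": ["C"], "C": ["A", "B"]}))
```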
Relevance Feedback • System returns initial set of documents • User identifies relevant documents • System refines query to get documents more like those identified by user – Add words common to relevant docs – Reposition query vector closer to relevant docs • Lather, rinse, repeat… 33
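One standard way to "reposition the query vector closer to relevant docs" is the Rocchio update; the slide does not name a method, so treating it as Rocchio (and the alpha/beta/gamma values below) is an assumption:

```python
from collections import Counter

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant docs and away from non-relevant ones.
    All vectors are Counters mapping word -> weight."""
    new_q = Counter()
    for w, v in query.items():
        new_q[w] += alpha * v
    for doc in relevant:
        for w, v in doc.items():
            new_q[w] += beta * v / len(relevant)
    for doc in nonrelevant:
        for w, v in doc.items():
            new_q[w] -= gamma * v / len(nonrelevant)
    # Negative weights are usually clipped to zero.
    return Counter({w: v for w, v in new_q.items() if v > 0})

q = Counter({"computer": 1, "system": 1})
rel = [Counter({"computer": 2, "system": 1, "user": 1})]
nonrel = [Counter({"system": 1, "engineering": 2})]
print(rocchio(q, rel, nonrel))
```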
Query Expansion • Given query, add words to improve recall – Workaround for synonym problem • Example – boat → boat OR ship • Can involve user feedback or not • Can use thesaurus or other online source – WordNet 34
Document Clustering • Group similar documents – Similar means “close in vector space” • If a document is relevant, return whole cluster • Can be combined with relevance feedback • GROUPER http://www.cs.washington.edu/research/clustering 35
Clustering Algorithms • K-means – Initialize k cluster centers – Loop: assign all documents to the closest center; move cluster centers to better fit the assignment – Until little movement • Hierarchical Agglomerative Clustering – Initialize each document to a singleton cluster – Loop: merge the two closest clusters – Until k clusters exist – Many ways to measure distance between clusters [Diagram: documents, cluster centers, and the resulting clusters] 36
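A compact sketch of the K-means loop just described, run on 2-D points so the result is easy to eyeball; document clustering would instead use the high-dimensional term vectors and a cosine-based distance (the points and k=2 are illustrative assumptions):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=20):
    """points: list of (x, y) tuples. Returns the k cluster centers."""
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assign every point to its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[i].append(p)
        # Move each center to the mean of its assigned points.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(kmeans(pts, k=2))
```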
Latent Semantic Indexing • Creates modified vector space • Captures transitive co-occurrence information – If docs A & B don’t share any words with each other, but both share lots of words with doc C, then A & B will be considered similar • Simulates query expansion and document clustering (sort of) 37
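A sketch of the "modified vector space" idea using a truncated SVD of the term-document matrix, assuming NumPy is available; the tiny matrix is made up to mirror the A/B/C situation above:

```python
import numpy as np

# Rows = words, columns = documents (raw counts), as in the vector space model.
# Docs 1 and 2 share no words; both overlap with doc 0.
A = np.array([
    [2, 1, 0],   # "system"
    [1, 0, 2],   # "user"
    [1, 1, 0],   # "interface"
    [0, 0, 1],   # "response"
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                     # keep the k strongest latent "concepts"
docs_k = (np.diag(s[:k]) @ Vt[:k]).T      # each row: a document in concept space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Zero similarity in the raw space, but non-zero in the reduced concept space.
print("doc1 vs doc2 in concept space:", round(cos(docs_k[1], docs_k[2]), 3))
```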
Variations on a Theme • Text Categorization – Assign each document to a category – Example: automatically put web pages in Yahoo hierarchy • Routing & Filtering – Match documents with users – Example: news service that allows subscribers to specify “send news about high-tech mergers” 38
Speech Recognition TO BE COMPLETED 39
Syntactic Parsing Semantic Interpretation TO BE COMPLETED 40
NLP Research Areas • Morphology: structure of words • Syntactic interpretation (parsing): create a parse tree of a sentence. • Semantic interpretation: translate a sentence into the representation language. – Pragmatic interpretation: take the current situation into account. – Disambiguation: there may be several interpretations; choose the most probable. 41
Some Difficult Examples • From the newspapers: – Squad helps dog bite victim. – Helicopter powered by human flies. – Levy won’t hurt the poor. – Once-sagging cloth diaper industry saved by full dumps. • Ambiguities: – Lexical: meanings of ‘hot’, ‘back’. – Syntactic: I heard the music in my room. – Referential: The cat ate the mouse. It was ugly. 42
Parsing • Context-free grammars: EXPR -> NUMBER EXPR -> VARIABLE EXPR -> (EXPR + EXPR) EXPR -> (EXPR * EXPR) • (2 + X) * (17 + Y) is in the grammar. • (2 + (X)) is not. • Why do we call them context-free? 43
Using CFGs for Parsing • Can natural language syntax be captured using a context-free grammar? – Yes, no, sort of, for the most part, maybe. • Words: – nouns, adjectives, verbs, adverbs – Determiners: the, a, this, that – Quantifiers: all, some, none – Prepositions: in, onto, by, through – Connectives: and, or, but, while • Words combine together into phrases: NP, VP 44
An Example Grammar • S -> NP VP • VP -> V NP • NP -> NAME • NP -> ART N • ART -> a | the • V -> ate | saw • N -> cat | mouse • NAME -> Sue | Tom 45
Example Parse • The mouse saw Sue. 46
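A quick way to reproduce this parse, assuming the NLTK toolkit is available (NLTK is not part of the slides; the grammar string is the one from the previous slide, and the sentence is lower-cased to match the terminals):

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> NAME
NP -> ART N
ART -> 'a' | 'the'
V -> 'ate' | 'saw'
N -> 'cat' | 'mouse'
NAME -> 'Sue' | 'Tom'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the mouse saw Sue".split()):
    print(tree)
    # (S (NP (ART the) (N mouse)) (VP (V saw) (NP (NAME Sue))))
```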
Ambiguity • “Sue bought the cat biscuits” • S -> NP VP • VP -> V NP NP • NP -> N N • NP -> Det NP • Det -> the • V -> ate | saw | bought • N -> cat | mouse | biscuits | Sue | Tom 47
Example: Chart Parsing • Three main data structures: a chart, a key list, and a set of edges • Chart: a table indexed by starting point and length; each cell holds the name of a terminal or non-terminal spanning those positions [Diagram: 4×4 chart with axes "starting point" (1-4) and "length" (1-4)] 48
Key List and Edges • Key list: push-down stack of chart entries – “the” “box” “floats” • Edges: rules that can be applied to chart entries to build up larger entries [Diagram: chart with lexical entries “the” (det), “box”, and “floats” at starting points 1-3, each of length 1, plus a partially applied rule edge marked with “o”] 49
Chart Parsing Algorithm • Loop while entries in key list – 1. Remove entry from key list – 2. If entry already in chart, • Add edge list • Break – 3. Add entry from key list to chart – 4. For all rules that begin with entry’s type, add an edge for that rule – 5. For all edges that need the entry next, add an extended edge (see algorithm on right) – 6. If the edge is finished, add an entry to the key list with type, start point, length, and edge list • To extend an edge with chart entry c – – Create a new edge e’ Set start (e’) to start (e) Set end(e’) to end(e) Set rule(e’) to rule(e) with “o” moved beyond c. – Set the righthandside(e’) to the righthandside(e)+c 50
Try it • S -> NP VP • VP -> V • NP -> Det N • Det -> the • N -> box • V -> floats 51
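A runnable sketch of the chart-parsing algorithm from the previous slide, applied to this grammar and "the box floats". One small liberty: when a new edge is created it is also checked against entries already in the chart, so the outcome does not depend on the order entries come off the key list:

```python
def chart_parse(words, rules, lexicon, goal="S"):
    """Bottom-up chart parser following the key-list/edge scheme above.
    An entry is (category, start, length); an edge is (lhs, rhs, dot, start, end)."""
    chart, edges = set(), set()
    # Key list: stack of entries, seeded so words come off left to right.
    keys = [(lexicon[w], i + 1, 1) for i, w in reversed(list(enumerate(words)))]

    def add_edge(lhs, rhs, dot, start, end):
        edge = (lhs, rhs, dot, start, end)
        if edge in edges:
            return
        edges.add(edge)
        if dot == len(rhs):                        # finished edge -> new chart entry
            keys.append((lhs, start, end - start))
        else:                                      # extend with entries already in chart
            for cat, s, length in list(chart):
                if cat == rhs[dot] and s == end:
                    add_edge(lhs, rhs, dot + 1, start, end + length)

    while keys:
        entry = keys.pop()                         # 1. remove entry from key list
        if entry in chart:                         # 2. already known: skip
            continue
        chart.add(entry)                           # 3. add entry to chart
        cat, start, length = entry
        end = start + length
        for lhs, rhs in rules:                     # 4. rules that begin with this category
            if rhs[0] == cat:
                add_edge(lhs, rhs, 1, start, end)
        for lhs, rhs, dot, s, e in list(edges):    # 5. edges that need this entry next
            if dot < len(rhs) and rhs[dot] == cat and e == start:
                add_edge(lhs, rhs, dot + 1, s, end)
    return chart

rules = [("S", ("NP", "VP")), ("VP", ("V",)), ("NP", ("Det", "N"))]
lexicon = {"the": "Det", "box": "N", "floats": "V"}
chart = chart_parse("the box floats".split(), rules, lexicon)
print(("S", 1, 3) in chart)   # True: an S spanning the whole sentence was found
```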
Semantic Interpretation • Our goal: to translate sentences into a logical form. • But: sentences convey more than true/false: – It will rain in Seattle tomorrow. – Will it rain in Seattle tomorrow? • A sentence can be analyzed by: – propositional content, and – speech act: tell, ask, request, deny, suggest 52
Propositional Content • We develop a logic-like language for representing propositional content: – Word-sense ambiguity – Scope ambiguity • Proper names --> objects (John, Alon) • Nouns --> unary predicates (woman, house) • Verbs --> – transitive: binary predicates (find, go) – intransitive: unary predicates (laugh, cry) • Quantifiers: most, some • Example: John loves Mary: Loves(John, Mary) 53
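Illustrative translations following the mapping above; only Loves(John, Mary) comes from the slide, the other two sentences are added examples:

```latex
\begin{aligned}
\text{John loves Mary} &\;\Rightarrow\; \mathrm{Loves}(\mathrm{John}, \mathrm{Mary})\\
\text{Mary laughs} &\;\Rightarrow\; \mathrm{Laughs}(\mathrm{Mary})\\
\text{John found a house} &\;\Rightarrow\; \exists x.\ \mathrm{House}(x) \wedge \mathrm{Find}(\mathrm{John}, x)
\end{aligned}
```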
From Syntax to Semantics • ADD SLIDES ON SEMANTIC INTERPRETATION 54
Word Sense Disambiguation • ADD SLIDES! 55
Statistical NLP • Consider the problem of tagging part-of-speech: – “The box floats” – “The” Det; “Box” N; “Floats” V • Given a sentence w(1, n), where w(i) is the i-th word, we want to find tags t(i) assigned to each word w(i) 56
The Equations • Find the t(1, n) that maximizes – P[t(1, n) | w(1, n)] = P[w(1, n) | t(1, n)] P[t(1, n)] / P[w(1, n)] – Since P[w(1, n)] is fixed, only need to maximize P[w(1, n) | t(1, n)] P[t(1, n)] • Assume that – A word depends only on its own tag – A tag depends only on the previous tag – We have: • P[w(j) | w(1, j-1), t(1, j)] = P[w(j) | t(j)], and • P[t(j) | w(1, j-1), t(1, j-1)] = P[t(j) | t(j-1)] – Thus, want to maximize • P[w(n) | t(n)] * P[t(n) | t(n-1)] * P[w(n-1) | t(n-1)] * P[t(n-1) | t(n-2)] * … 57
Example • “The box floats”: given a corpus (a training set) – Assignment one: • T(1) = det, T(2) = V, T(3) = V • P(V | det) is rather low, and so is P(V | V); this assignment is less likely – Assignment two: • T(1) = det, T(2) = N, T(3) = V • P(N | det) is high, and P(V | N) is high, so this assignment is more likely! – In general, can use Hidden Markov Models to find the probabilities [Diagram: HMM with tag states det, N, V emitting the words “the”, “box”, “floats”] 58
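A minimal tagging sketch for "The box floats" using the equations above; the transition and emission probabilities are invented numbers chosen so that det-N-V wins, not values estimated from any corpus:

```python
# Toy P(tag | previous tag) and P(word | tag) tables, for illustration only.
trans = {("<s>", "det"): 0.8, ("<s>", "N"): 0.1, ("<s>", "V"): 0.1,
         ("det", "N"): 0.9, ("det", "V"): 0.1,
         ("N", "V"): 0.8, ("N", "N"): 0.2,
         ("V", "V"): 0.1, ("V", "N"): 0.3, ("V", "det"): 0.6}
emit = {("det", "the"): 1.0,
        ("N", "box"): 0.6, ("V", "box"): 0.4,
        ("V", "floats"): 0.7, ("N", "floats"): 0.3}
tags = ["det", "N", "V"]

def viterbi(words):
    """best[t] = (probability, tag sequence) of the best path ending in tag t."""
    best = {"<s>": (1.0, [])}
    for w in words:
        new = {}
        for t in tags:
            cands = [(p * trans.get((prev, t), 0.0) * emit.get((t, w), 0.0), seq + [t])
                     for prev, (p, seq) in best.items()]
            prob, seq = max(cands)
            if prob > 0:
                new[t] = (prob, seq)
        best = new
    return max(best.values())[1]

print(viterbi("the box floats".split()))   # ['det', 'N', 'V']
```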
Experiments • Charniak and colleagues did some experiments on a collection of documents called the “Brown Corpus”, where tags are assigned by hand. • 90% of the corpus is used for training and the other 10% for testing • They show they can get 95% correctness with HMMs. • A really simple algorithm: assign to w the tag t with the highest probability P(t | w) achieves 91% correctness! 59
Natural Language Summary • Parsing: – context-free grammars with features. • Semantic interpretation: – Translate sentences into a logic-like language – Use additional domain knowledge for word-sense disambiguation. – Use context to disambiguate references. 60