
Natural Language Processing
CSE 592 Applications of AI, Winter 2003
Information Retrieval, Speech Recognition, Syntactic Parsing, Semantic Interpretation

Example Applications
• Spelling and grammar checkers
• Finding information on the WWW
• Spoken language control systems: banking, shopping
• Classification systems for messages, articles
• Machine translation tools

The Dream

Information Retrieval (Thanks to Adam Carlson)

Motivation and Outline
• Background
  – Definitions
• The Problem
  – 100,000+ pages
• The Solution
  – Ranking docs
  – Vector space
  – Probabilistic approaches
• Extensions
  – Relevance feedback, clustering, query expansion, etc.

What is Information Retrieval?
• Given a large repository of documents, how do I get at the ones that I want?
  – Examples: Lexis/Nexis, medical reports, AltaVista
• Different from databases
  – Unstructured (or semi-structured) data
  – Information is (typically) text
  – Requests are (typically) word-based

Information Retrieval Task
• Start with a set of documents
• User specifies information need
  – Keyword query, Boolean expression, high-level description
• System returns a list of documents
  – Ordered according to relevance
• Known as the ad-hoc retrieval problem

Measuring Performance
• Precision
  – Proportion of selected items that are correct
• Recall
  – Proportion of target items that were selected
• Precision-recall curve
  – Shows the tradeoff between the two
(Diagram: overlap between the documents the system returned and the actual relevant documents, giving true/false positives and negatives (tp, fp, fn, tn); plus a precision-recall curve.)
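
A minimal sketch of these two measures in Python, assuming the returned set and the true relevant set are available as sets of document ids (the function and data are illustrative, not from the slides):

    def precision_recall(returned, relevant):
        """Precision and recall from the returned set and the true relevant set."""
        tp = len(returned & relevant)          # true positives: returned and relevant
        precision = tp / len(returned) if returned else 0.0
        recall = tp / len(relevant) if relevant else 0.0
        return precision, recall

    # Example: the system returns 4 documents, 3 of them among the 5 relevant ones.
    print(precision_recall({1, 2, 3, 4}, {2, 3, 4, 7, 9}))   # (0.75, 0.6)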

Basic IR System
• Use word overlap to determine relevance
  – Word overlap alone is inaccurate
• Rank documents by similarity to query
  – Computed using the Vector Space Model

Vector Space Model
• Represent documents as a matrix
  – Words are rows
  – Documents are columns
  – Cell i, j contains the number of times word i appears in document j
  – Similarity between two documents is the cosine of the angle between the vectors representing those documents

Vector Space Example
a: System and human system engineering testing of EPS
b: A survey of user opinion of computer system response time
c: The EPS user interface management system
d: Human machine interface for ABC computer applications
e: Relation of user perceived response time to error measurement
f: The generation of random, binary, ordered trees
g: The intersection graph of paths in trees
h: Graph minors IV: Widths of trees and well-quasi-ordering
i: Graph minors: A survey

Vector Space Example cont.
(Plot of documents a, b, and c as vectors along the word axes "system", "user", and "interface".)

Similarity in Vector Space
• Cosine similarity: sim(d1, d2) = (d1 · d2) / (|d1| |d2|)
  – The dot product measures word overlap
  – Dividing by the vector lengths normalizes for different length vectors
  – Other similarity metrics exist

Answering a Query Using Vector Space
• Represent query as a vector
• Compute distances to all documents
• Rank according to distance
• Example: "computer system"
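
A small sketch of this ranking procedure in Python, using raw term counts and cosine similarity over a few of the example documents from the earlier slide; the helper names and the lack of stemming or stop-listing are simplifications, not part of the original slides:

    from collections import Counter
    from math import sqrt

    # Toy corpus from the "Vector Space Example" slide (titles a-d only, for brevity).
    docs = {
        "a": "system and human system engineering testing of eps",
        "b": "a survey of user opinion of computer system response time",
        "c": "the eps user interface management system",
        "d": "human machine interface for abc computer applications",
    }

    def vectorize(text):
        return Counter(text.lower().split())      # raw term counts

    def cosine(u, v):
        dot = sum(u[w] * v[w] for w in u)
        norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    query = vectorize("computer system")
    ranked = sorted(docs, key=lambda d: cosine(query, vectorize(docs[d])), reverse=True)
    for d in ranked:
        print(d, round(cosine(query, vectorize(docs[d])), 3))   # a and b rank highest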

Common Improvements
• The vector space model
  – Doesn't handle morphology (eat, eats, eating)
  – Favors common terms
• Possible fixes
  – Stemming: convert each word to a common root form
  – Stop lists
  – Term weighting

Handling Common Terms
• Stop list
  – List of words to ignore: "a", "and", "but", "to", etc.
• Term weighting
  – Words which appear everywhere aren't very good discriminators; give higher weight to rare words

tf * idf
• Weight of word i in document j: weight(i, j) = tf(i, j) × idf(i), the term frequency times the inverse document frequency

Inverse Document Frequency
• IDF provides high values for rare words and low values for common words
• idf(w) = log(N / df(w)), where N is the collection size and df(w) is the number of documents containing w
• For a collection of 10,000 documents: a word appearing in all 10,000 documents gets idf = 0, while a word appearing in only one document gets the maximum idf
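
A short illustrative sketch of tf × idf weighting in Python, using the log(N / df) definition above; the function name and toy documents are made up for illustration:

    from collections import Counter
    from math import log10

    def tf_idf(docs):
        """docs: list of token lists. Returns one {term: weight} dict per document."""
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))     # document frequency
        weights = []
        for doc in docs:
            tf = Counter(doc)
            weights.append({t: tf[t] * log10(n / df[t]) for t in tf})
        return weights

    docs = [d.split() for d in ["human machine interface",
                                "computer system survey",
                                "user interface system"]]
    print(tf_idf(docs)[0])
    # 'human' and 'machine' (rare) get positive weight; a term in every doc would get 0.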

Probabilistic IR
• Vector space model robust in practice
• Mathematically ad-hoc
  – How to generalize to more complex queries? (intel or microsoft) and (not stock)
• Alternative approach: model problem as finding documents with highest probability of being relevant to the query
  – Requires making some simplifying assumptions about underlying probability distributions
  – In certain cases can be shown to yield same results as vector space model

Probability Ranking Principle
• For a given query Q, find the documents D that maximize the odds that the document is relevant (R):
  odds(R | D, Q) = P(R | D, Q) / P(¬R | D, Q) = [P(Q | D, R) P(R | D)] / [P(Q | D, ¬R) P(¬R | D)]

Probability Ranking Principle
• For a given query Q, find the documents D that maximize the odds that the document is relevant (R):
  odds(R | D, Q) = [P(Q | D, R) P(R | D)] / [P(Q | D, ¬R) P(¬R | D)]
• The P(R | D) term is the probability of document relevance to any query – i.e., the inherent quality of the document

Probability Ranking Principle
• For a given query Q, find the documents D that maximize the odds that the document is relevant (R):
  odds(R | D, Q) = [P(Q | D, R) P(R | D)] / [P(Q | D, ¬R) P(¬R | D)]
• The P(Q | D, R) term is the probability that if the document is indeed relevant, then the query is in fact Q
• But where do we get that number?

Bayesian nets for text retrieval
(Network diagram: a document network with document nodes d1, d2 linked to word nodes w1, w2, w3 and concept nodes c1, c2, c3; a query network with query-operator nodes q1, q2 (AND/OR/NOT) leading to the information-need node q0.)

Bayesian nets for text retrieval
• The document network (documents, words, concepts) is computed once for the entire collection
(Same network diagram as the previous slide.)

Bayesian nets for text retrieval
• The query network (query operators and information need) is computed for each query
(Same network diagram as the previous slide.)

Conditional Probability Tables
• P(d) = prior probability document d is relevant
  – Uniform model: P(d) = 1 / number of documents
  – In general, document quality P(r | d)
• P(w | d) = probability that a random word from document d is w
  – Term frequency
• P(c | w) = probability that a given document word w has the same meaning as a query word c
  – Thesaurus
• P(q | c1, c2, …) = canonical form of operators AND, OR, NOT, etc.

Example
(Diagram: a document network for "Macbeth" and "Hamlet" over the words "reason", "trouble", "double", and "two", connected through OR, NOT, and AND operators in the query network to the user query.)

Details
• Set head q0 of user query to "true"
• Compute posterior probability P(D | q0)
• "User information need" doesn't have to be a query – can be a user profile, e.g., other documents the user has read
• Instead of just words, can include phrases, inter-document links
• Link matrices can be modified over time
  – User feedback
  – The promise of "personalization"

Extensions
• Meet demands of web-based systems
• Modified ranking functions for the web
• Relevance feedback
• Query expansion
• Document clustering
• Latent Semantic Indexing
• Other IR tasks

IR on the Web
• Query AltaVista with "Java"
  – Almost 10^7 pages found
• Avoiding latency
  – User wants (initial) results fast
• Solution
  – Rank documents using word overlap
  – Use a special data structure: the inverted index
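
A minimal sketch of an inverted index in Python, assuming documents are plain strings keyed by id (the data and function name are illustrative):

    from collections import defaultdict

    def build_inverted_index(docs):
        """Map each word to the set of document ids that contain it."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for word in text.lower().split():
                index[word].add(doc_id)
        return index

    docs = {1: "Java programming on the web", 2: "coffee from Java", 3: "web search engines"}
    index = build_inverted_index(docs)
    print(index["java"])                      # {1, 2}
    print(index["java"] & index["web"])       # {1} -- a simple AND query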

Improved Ranking on the Web
• Not just arbitrary documents
• Can use HTML tags and other properties
  – Query term in the title tag
  – Query term in headings, bold text, etc.
  – Check date of document (prefer recent docs)
  – PageRank (Google)

PageRank
• Idea: good pages link to other good pages
  – Round 1: count in-links (problems?)
  – Round 2: sum weighted in-links
  – Round 3: and again, and again…
• Implementation: repeated random walk on a snapshot of the web
  – A page's weight is the frequency with which the walk visits it
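
A small power-iteration sketch of this repeated random walk in Python; the damping factor and iteration count are common defaults assumed here, not values given on the slide:

    def pagerank(links, damping=0.85, iterations=50):
        """links: {page: [pages it links to]}. Returns an approximate PageRank per page."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for p, outgoing in links.items():
                targets = outgoing if outgoing else pages      # dangling page: spread evenly
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            rank = new_rank
        return rank

    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    print(pagerank(links))    # C collects the most in-link weight, so it ranks highest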

Relevance Feedback
• System returns initial set of documents
• User identifies relevant documents
• System refines query to get documents more like those identified by user
  – Add words common to relevant docs
  – Reposition query vector closer to relevant docs
• Lather, rinse, repeat…
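
Repositioning the query vector toward relevant documents is usually done with Rocchio-style updating; the slide does not name a formula, so the weights alpha, beta, and gamma below are illustrative assumptions:

    def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
        """All vectors are {term: weight} dicts; returns the updated query vector."""
        terms = set(query) | {t for d in relevant for t in d} | {t for d in nonrelevant for t in d}
        updated = {}
        for t in terms:
            rel = sum(d.get(t, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
            non = sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
            updated[t] = max(0.0, alpha * query.get(t, 0.0) + beta * rel - gamma * non)
        return updated

    q = {"boat": 1.0}
    print(rocchio(q, relevant=[{"boat": 1.0, "ship": 2.0}], nonrelevant=[{"car": 1.0}]))
    # "ship" now gets positive weight, so the refined query also matches ship documents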

Query Expansion
• Given query, add words to improve recall
  – Workaround for the synonym problem
• Example: boat → boat OR ship
• Can involve user feedback or not
• Can use a thesaurus or other online source
  – WordNet

Document Clustering
• Group similar documents
  – Similar means "close in vector space"
• If a document is relevant, return the whole cluster
• Can be combined with relevance feedback
• GROUPER: http://www.cs.washington.edu/research/clustering

Clustering Algorithms
• K-means
  Initialize k cluster centers
  Loop
    Assign all documents to the closest center
    Move cluster centers to better fit the assignment
  Until little movement
• Hierarchical Agglomerative Clustering
  Initialize each document to a singleton cluster
  Loop
    Merge the two closest clusters
  Until k clusters exist
  (Many ways to measure distance between clusters)
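
A minimal K-means sketch in Python following the loop above, with a fixed iteration count standing in for the "until little movement" test; the toy points are illustrative:

    import random

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))   # squared Euclidean distance

    def kmeans(points, k, iterations=20):
        """points: list of numeric vectors. Returns (centers, assignment)."""
        centers = random.sample(points, k)               # initialize k cluster centers
        for _ in range(iterations):
            # assign every point to the closest center
            assignment = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
            # move each center to the mean of its assigned points
            for c in range(k):
                members = [p for p, a in zip(points, assignment) if a == c]
                if members:
                    centers[c] = [sum(dims) / len(members) for dims in zip(*members)]
        return centers, assignment

    docs = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1]]
    print(kmeans(docs, k=2))
    # the two points near the origin end up in one cluster, the two near (5, 5) in the other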

Latent Semantic Indexing
• Creates a modified vector space
• Captures transitive co-occurrence information
  – If docs A & B don't share any words with each other, but both share lots of words with doc C, then A & B will be considered similar
• Simulates query expansion and document clustering (sort of)
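
LSI is typically implemented as a truncated SVD of the term-document matrix; a small numpy sketch under that assumption (the matrix, word labels, and choice of k = 2 are illustrative):

    import numpy as np

    # Term-document matrix: rows are words, columns are documents (raw counts).
    A = np.array([[2, 0, 1],     # "system"
                  [0, 1, 1],     # "user"
                  [1, 1, 0],     # "computer"
                  [0, 0, 1]],    # "interface"
                 dtype=float)

    # Truncated SVD keeps only the k largest singular values; k = 2 here.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    docs_in_latent_space = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per document

    # Documents that never share a word can still end up close in this reduced space,
    # which is the "transitive co-occurrence" effect described above.
    print(docs_in_latent_space)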

Variations on a Theme
• Text Categorization
  – Assign each document to a category
  – Example: automatically put web pages in the Yahoo hierarchy
• Routing & Filtering
  – Match documents with users
  – Example: a news service that allows subscribers to specify "send news about high-tech mergers"

Speech Recognition
TO BE COMPLETED

Syntactic Parsing and Semantic Interpretation
TO BE COMPLETED

NLP Research Areas
• Morphology: structure of words
• Syntactic interpretation (parsing): create a parse tree of a sentence
• Semantic interpretation: translate a sentence into the representation language
  – Pragmatic interpretation: take the current situation into account
  – Disambiguation: there may be several interpretations; choose the most probable

Some Difficult Examples
• From the newspapers:
  – Squad helps dog bite victim.
  – Helicopter powered by human flies.
  – Levy won't hurt the poor.
  – Once-sagging cloth diaper industry saved by full dumps.
• Ambiguities:
  – Lexical: meanings of 'hot', 'back'.
  – Syntactic: I heard the music in my room.
  – Referential: The cat ate the mouse. It was ugly.

Parsing
• Context-free grammars:
  EXPR -> NUMBER
  EXPR -> VARIABLE
  EXPR -> (EXPR + EXPR)
  EXPR -> (EXPR * EXPR)
• (2 + X) * (17 + Y) is in the grammar.
• (2 + (X)) is not.
• Why do we call them context-free?

Using CFGs for Parsing
• Can natural language syntax be captured using a context-free grammar?
  – Yes, no, sort of, for the most part, maybe.
• Words:
  – Nouns, adjectives, verbs, adverbs
  – Determiners: the, a, this, that
  – Quantifiers: all, some, none
  – Prepositions: in, onto, by, through
  – Connectives: and, or, but, while
• Words combine together into phrases: NP, VP

An Example Grammar
  S -> NP VP
  VP -> V NP
  NP -> NAME
  NP -> ART N
  ART -> a | the
  V -> ate | saw
  N -> cat | mouse
  NAME -> Sue | Tom

Example Parse
• "The mouse saw Sue."
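
One parse of this sentence under the grammar above, written here as a bracketed tree since the original slide's drawing is not reproduced:

    (S (NP (ART the) (N mouse))
       (VP (V saw) (NP (NAME Sue))))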

Ambiguity
• "Sue bought the cat biscuits"
  S -> NP VP
  VP -> V NP NP
  NP -> N N
  NP -> Det NP
  Det -> the
  V -> ate | saw | bought
  N -> cat | mouse | biscuits | Sue | Tom

Example: Chart Parsing
• Three main data structures: a chart, a key list, and a set of edges
• Chart: a table indexed by starting point and length; each cell holds the name of a terminal or non-terminal found there
(Diagram: an empty chart with axes "starting points" 1-4 and "length" 1-4.)

Key List and Edges
• Key list: push-down stack of chart entries
  – "the" "box" "floats"
• Edges: rules that can be applied to chart entries to build up larger entries
(Diagram: the chart with entries for "the" (det), "box", and "floats" at positions 1-3, and a dotted edge built from the det entry.)

Chart Parsing Algorithm
• Loop while there are entries in the key list
  – 1. Remove an entry from the key list
  – 2. If the entry is already in the chart, add its edge list and break
  – 3. Add the entry from the key list to the chart
  – 4. For all rules that begin with the entry's type, add an edge for that rule
  – 5. For all edges that need the entry next, add an extended edge (see below)
  – 6. If the edge is finished, add an entry to the key list with its type, start point, length, and edge list
• To extend an edge e with chart entry c:
  – Create a new edge e'
  – Set start(e') to start(e)
  – Set end(e') to end(c)
  – Set rule(e') to rule(e) with "o" moved beyond c
  – Set righthandside(e') to righthandside(e) + c

Try it
  S -> NP VP
  VP -> V
  NP -> Det N
  Det -> the
  N -> box
  V -> floats
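
A compact Python sketch of the chart-parsing loop from the previous slide, applied to this grammar; it processes the key list left to right, and the data-structure names are illustrative rather than taken from the slides:

    from collections import deque

    GRAMMAR = {
        "S":   [["NP", "VP"]],
        "VP":  [["V"]],
        "NP":  [["Det", "N"]],
        "Det": [["the"]],
        "N":   [["box"]],
        "V":   [["floats"]],
    }

    def parse(words):
        chart = set()                 # completed constituents: (symbol, start, end)
        edges = []                    # partial rules: (lhs, rhs, dot, start, end)
        agenda = deque((w, i, i + 1) for i, w in enumerate(words))   # the "key list"
        while agenda:
            entry = agenda.popleft()
            if entry in chart:
                continue
            chart.add(entry)
            sym, start, end = entry
            # 4. for all rules that begin with this entry's type, add an edge
            for lhs, alternatives in GRAMMAR.items():
                for rhs in alternatives:
                    if rhs[0] == sym:
                        edges.append((lhs, tuple(rhs), 1, start, end))
            # 5. for all edges that need this entry next, add an extended edge
            for lhs, rhs, dot, e_start, e_end in list(edges):
                if e_end == start and dot < len(rhs) and rhs[dot] == sym:
                    edges.append((lhs, rhs, dot + 1, e_start, end))
            # 6. finished edges produce new chart entries via the key list
            for lhs, rhs, dot, e_start, e_end in edges:
                if dot == len(rhs) and (lhs, e_start, e_end) not in chart:
                    agenda.append((lhs, e_start, e_end))
        return ("S", 0, len(words)) in chart

    print(parse("the box floats".split()))   # True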

Semantic Interpretation
• Our goal: to translate sentences into a logical form.
• But: sentences convey more than true/false:
  – It will rain in Seattle tomorrow.
  – Will it rain in Seattle tomorrow?
• A sentence can be analyzed by:
  – propositional content, and
  – speech act: tell, ask, request, deny, suggest

Propositional Content
• We develop a logic-like language for representing propositional content:
  – Word-sense ambiguity
  – Scope ambiguity
• Proper names --> objects (John, Alon)
• Nouns --> unary predicates (woman, house)
• Verbs -->
  – transitive: binary predicates (find, go)
  – intransitive: unary predicates (laugh, cry)
• Quantifiers: most, some
• Example: "John loves Mary" --> Loves(John, Mary)

From Syntax to Semantics
• ADD SLIDES ON SEMANTIC INTERPRETATION

Word Sense Disambiguation
• ADD SLIDES!

Statistical NLP
• Consider the problem of tagging part-of-speech:
  – "The box floats"
  – "The": Det; "box": N; "floats": V
• Given a sentence w(1, n), where w(i) is the i-th word, we want to find the tags t(i) assigned to each word w(i)

The Equations
• Find the t(1, n) that maximizes
  – P[t(1, n) | w(1, n)] = P[w(1, n) | t(1, n)] P[t(1, n)] / P[w(1, n)]
  – Since P[w(1, n)] does not depend on the tags, we only need to maximize P[w(1, n) | t(1, n)] P[t(1, n)]
• Assume that
  – A word depends only on its own tag
  – A tag depends only on the previous tag
  – We have:
    P[w(j) | w(1, j-1), t(1, j)] = P[w(j) | t(j)], and
    P[t(j) | w(1, j-1), t(1, j-1)] = P[t(j) | t(j-1)]
  – Thus, we want to maximize the product over j of P[w(j) | t(j)] P[t(j) | t(j-1)]

Example
• "The box floats": given a corpus (a training set)
  – Assignment one: t(1) = Det, t(2) = V, t(3) = V
    P(V | Det) is rather low, and so is P(V | V), so this assignment is less likely compared to
  – Assignment two: t(1) = Det, t(2) = N, t(3) = V
    P(N | Det) is high, and P(V | N) is high, so this assignment is more likely!
  – In general, can use Hidden Markov Models to find the probabilities
(Diagram: an HMM with states Det, N, V emitting "the", "box", "floats".)
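
A tiny Viterbi sketch of this idea in Python; the transition and emission probabilities are hand-set illustrations (not estimates from any corpus), chosen so that Det-N-V wins for "the box floats":

    # All probabilities below are illustrative assumptions, not real corpus counts.
    tags = ["det", "N", "V"]
    start_p = {"det": 0.8, "N": 0.1, "V": 0.1}
    trans_p = {"det": {"det": 0.01, "N": 0.9, "V": 0.09},
               "N":   {"det": 0.1,  "N": 0.3, "V": 0.6},
               "V":   {"det": 0.4,  "N": 0.3, "V": 0.3}}
    emit_p  = {"det": {"the": 0.9},
               "N":   {"box": 0.6, "floats": 0.1},
               "V":   {"box": 0.2, "floats": 0.7}}

    def viterbi(words):
        """Most probable tag sequence: maximizes the product of P(w|t) * P(t|previous t)."""
        best = {t: (start_p[t] * emit_p[t].get(words[0], 1e-6), [t]) for t in tags}
        for w in words[1:]:
            new_best = {}
            for t in tags:
                prob, path = max(
                    (best[prev][0] * trans_p[prev][t] * emit_p[t].get(w, 1e-6), best[prev][1])
                    for prev in tags)
                new_best[t] = (prob, path + [t])
            best = new_best
        return max(best.values())[1]

    print(viterbi(["the", "box", "floats"]))   # ['det', 'N', 'V']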

Experiments
• Charniak and colleagues did some experiments on a collection of documents called the "Brown Corpus", where tags are assigned by hand.
• 90% of the corpus is used for training and the other 10% for testing.
• They show they can get 95% correctness with HMMs.
• A really simple algorithm: assign to each w the tag t with the highest probability P(t | w) – 91% correctness!

Natural Language Summary
• Parsing:
  – Context-free grammars with features.
• Semantic interpretation:
  – Translate sentences into a logic-like language
  – Use additional domain knowledge for word-sense disambiguation.
  – Use context to disambiguate references.