EECS 595 / LING 541 / SI 661&761 Natural Language Processing Fall 2005 Lecture Notes #2

Course logistics
• Instructor: Prof. Dragomir Radev (radev@umich.edu)
  Ph.D., Computer Science, Columbia University
  Formerly at IBM TJ Watson Research Center
• Times: Thursdays 2:40-5:25 PM, in 411, West Hall
• Office hours: TBA, 3080 West Hall Connector
• Course home page: http://www.si.umich.edu/~radev/NLP-fall2005

Linguistic Fundamentals

Syntactic categories
• Substitution test: Nathalie likes { black | Persian | tabby | small } cats.
• Open (lexical) and closed (functional) categories
  Open: no-fly-zone, yadda
  Closed: the, in

Morphology
The dog chased the yellow bird.
• Parts of speech: eight (or so) general types
• Inflection (number, person, tense…)
• Derivation (adjective-adverb, noun-verb)
• Compounding (separate words or single word)
• Part-of-speech tagging
• Morphological analysis (prefix, root, suffix, ending)

Part of speech tags
From Church (1991) - 79 tags
NN    /* singular noun */
IN    /* preposition */
AT    /* article */
NP    /* proper noun */
JJ    /* adjective */
,     /* comma */
NNS   /* plural noun */
CC    /* conjunction */
RB    /* adverb */
VB    /* un-inflected verb */
VBN   /* verb +en (taken, looked (passive, perfect)) */
VBD   /* verb +ed (took, looked (past tense)) */
CS    /* subordinating conjunction */

Jabberwocky (Lewis Carroll)
`Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.
"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!"

Nouns
• Nouns: dog, tree, computer, idea
• Nouns vary in number (singular, plural), gender (masculine, feminine, neuter), case (nominative, genitive, accusative, dative)
• Latin: filius (m), filia (f), filium (object)
  German: Mädchen
• Clitics ('s)

Pronouns
• Pronouns: she, ourselves, mine
• Pronouns vary in person, gender, number, case (in English: nominative, accusative, possessive, 2nd possessive, reflexive)
  Mary saw her in the mirror.
  Mary saw herself in the mirror.
• Anaphors: herself, each other

Determiners and adjectives
• Articles: the, a
• Demonstratives: this, that
• Adjectives: describe properties
• Attributive and predicative adjectives
• Agreement: in gender, number
• Comparative and superlative (derivative and periphrastic)
• Positive form

Verbs
• Actions, activities, and states (throw, walk, have)
• English: four verb forms
• Tenses: present, past, future
• Other inflection: number, person
• Gerunds and infinitive
• Aspect: progressive, perfective
• Voice: active, passive
• Participles, auxiliaries
• Irregular verbs
• French and Finnish: many more inflections than English

Other parts of speech
• Adverbs, prepositions, particles
• Phrasal verbs (the plane took off, take it off)
• Particles vs. prepositions (she ran up a bill/hill)
• Coordinating conjunctions: and, or, but
• Subordinating conjunctions: if, because, that, although
• Interjections: Ouch!

Phrase structure
• Constraints on word order
• Constituents: NP, PP, VP, AP
• Phrase structure grammars
[Parse tree: [S [NP [PN Spot]] [VP [V chased] [NP [Det a] [N bird]]]]]

Phrase structure
• Paradigmatic relationships (e.g., constituency)
• Syntagmatic relationships (e.g., collocations)
[Parse tree: [S [NP That man] [VP [VBD caught] [NP the butterfly] [PP [IN with] [NP a net]]]]]

Phrase-structure grammars
Peter gave Mary a book.
Mary gave Peter a book.
• Constituent order (SVO, SOV)
• Imperative forms
• Sentences with auxiliary verbs
• Interrogative sentences
• Declarative sentences
• Start symbol and rewrite rules
• Context-free view of language

Sample phrase-structure grammar
S   → NP VP          AT  → the
NP  → AT NNS         NNS → children
NP  → AT NN          NNS → students
NP  → NP PP          NNS → mountains
VP  → VP PP          VBD → slept
VP  → VBD            VBD → ate
VP  → VBD NP         VBD → saw
PP  → IN NP          IN  → in
                     IN  → of
                     NN  → cake
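A grammar this small can be checked mechanically. The sketch below is a minimal CKY-style recognizer in Python, assuming the rule set reads S → NP VP, NP → AT NNS | AT NN | NP PP, VP → VP PP | VBD | VBD NP, PP → IN NP, with the ten lexical entries from the slide. The function names (`close_unary`, `recognize`) are illustrative, not part of the lecture.

```python
# Binary rules, indexed by their right-hand side (assumed reading of the slide).
BINARY = {
    ("NP", "VP"): {"S"},
    ("AT", "NNS"): {"NP"},
    ("AT", "NN"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("VBD", "NP"): {"VP"},
    ("IN", "NP"): {"PP"},
}
# The single unary rule VP -> VBD, stored child -> parents.
UNARY = {"VBD": {"VP"}}
LEXICON = {
    "the": {"AT"}, "children": {"NNS"}, "students": {"NNS"},
    "mountains": {"NNS"}, "slept": {"VBD"}, "ate": {"VBD"},
    "saw": {"VBD"}, "in": {"IN"}, "of": {"IN"}, "cake": {"NN"},
}


def close_unary(labels):
    """Add every category reachable from `labels` through unary rules."""
    out = set(labels)
    agenda = list(labels)
    while agenda:
        child = agenda.pop()
        for parent in UNARY.get(child, ()):
            if parent not in out:
                out.add(parent)
                agenda.append(parent)
    return out


def recognize(sentence):
    """Return True if the grammar derives the sentence from S."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return False
    chart = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = close_unary(LEXICON.get(word, set()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):  # every split point of the span
                for left in chart[i, k]:
                    for right in chart[k, j]:
                        cell |= BINARY.get((left, right), set())
            chart[i, j] = close_unary(cell)
    return "S" in chart[0, n]
```

For instance, `recognize("the children ate the cake")` succeeds, while the scrambled string `"slept children the"` is rejected. A recognizer only answers yes or no; recovering the parse trees would additionally require backpointers in each chart cell.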

Phrase structure grammars
• Local dependencies
• Non-local dependencies
• Subject-verb agreement
  The women who found the wallet were given a reward.
• wh-extraction
  Should Peter buy a book?
  Which book should Peter buy?
• Empty nodes

Dependency: arguments and adjuncts
Sue watched the man at the next table.
• Event + dependents (verb arguments are usually NPs)
• Agent, patient, instrument, goal - semantic roles
• Subject, direct object, indirect object
• Transitive, intransitive, and ditransitive verbs
• Active and passive voice

Subcategorization
• Arguments: subject + complements
• Adjuncts vs. complements
• Adjuncts are optional and describe time, place, manner…
• Subordinate clauses
• Subcategorization frames

Subcategorization
(The bracketed phrase is the argument being illustrated.)
Subject: [The children] eat candy.
Object: The children eat [candy].
Prepositional phrase: She put the book [on the table].
Predicative adjective: We made the man [angry].
Bare infinitive: She helped me [walk].
To-infinitive: She likes [to walk].
Participial phrase: She stopped [singing that tune] at the end.
That-clause: She thinks [that it will rain tomorrow].
Question-form clauses: She asked me [what book I was reading].

Subcategorization frames
• Intransitive verbs: The woman walked
• Transitive verbs: John loves Mary
• Ditransitive verbs: Mary gave Peter flowers
• Intransitive with PP: I rent in Paddington
• Transitive with PP: She put the book on the table
• Sentential complement: I know that she likes you
• Transitive with sentential complement: She told me that Gary is coming on Tuesday
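One way to picture these frames is as a lookup from each verb to the complement sequences it licenses. The following Python sketch is a simplification under stated assumptions: each verb gets only the single frame shown on the slide, the label "S" for a sentential complement is an assumed shorthand, and `FRAMES` and `licenses` are hypothetical names.

```python
# Verb -> list of licensed complement sequences (subject is left implicit).
# One frame per verb, taken from the slide's examples; real verbs usually
# allow several frames.
FRAMES = {
    "walk": [()],             # intransitive
    "love": [("NP",)],        # transitive
    "give": [("NP", "NP")],   # ditransitive
    "rent": [("PP",)],        # intransitive with PP
    "put": [("NP", "PP")],    # transitive with PP
    "know": [("S",)],         # sentential complement
    "tell": [("NP", "S")],    # transitive with sentential complement
}


def licenses(verb, complements):
    """Return True if the verb's frames allow this complement sequence."""
    return tuple(complements) in FRAMES.get(verb, [])
```

Under this table, `licenses("put", ["NP", "PP"])` holds, while `licenses("walk", ["NP"])` does not. Adjuncts are deliberately absent: being optional, they are not part of a verb's frame.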

Selectional restrictions and preferences
• Subcategorization frames capture syntactic regularities about complements
• Selectional restrictions and preferences capture semantic regularities: bark, eat

Phrase structure ambiguity
• Grammars are used for generating and parsing sentences
• Parses
• Syntactic ambiguity
• Attachment ambiguity: Our company is training workers.
• The children ate the cake with a spoon.
• High vs. low attachment
• Garden path sentences: The horse raced past the barn fell. Is the book on the table red?

Ungrammaticality vs. semantic abnormality
* Slept children the.
# Colorless green ideas sleep furiously.
# The cat barked.

Semantics and pragmatics
• Lexical semantics and compositional semantics
• Hypernyms, hyponyms, antonyms, meronyms and holonyms (part-whole relationship, tire is a meronym of car), synonyms, homonyms
• Senses of words, polysemous words
• Homophony (bass).
• Collocations: white hair, white wine
• Idioms: to kick the bucket

Discourse analysis
• Anaphoric relations:
  1. Mary helped Peter get out of the car. He thanked her.
  2. Mary helped the other passenger out of the car. The man had asked her for help because of his foot injury.
• Information extraction problems (entity cross-referencing)
  Hurricane Hugo destroyed 20,000 Florida homes. At an estimated cost of one billion dollars, the disaster has been the most costly in the state's history.

Pragmatics
• The study of how knowledge about the world and language conventions interact with literal meaning.
• Speech acts
• Research issues: resolution of anaphoric relations, modeling of speech acts in dialogues

Other areas of NLP
• Linguistics is traditionally divided into phonetics, phonology, morphology, syntax, semantics, and pragmatics.
• Sociolinguistics: interactions of social organization and language.
• Historical linguistics: change over time.
• Linguistic typology
• Language acquisition
• Psycholinguistics: real-time production and perception of language

Word classes and part-of-speech tagging

Part of speech tagging
• Problems: transport, object, discount, address
• More problems: content
• French: est, président, fils
• "Book that flight" – what is the part of speech associated with "book"?
• POS tagging: assigning parts of speech to words in a text.
• Three main techniques: rule-based tagging, stochastic tagging, transformation-based tagging

Rule-based POS tagging
• Use dictionary or FST to find all possible parts of speech
• Use disambiguation rules (e.g., ART+V)
• Typically hundreds of constraints can be designed manually

Example in French
Word       Candidate tags        Reading in context
<S>        ^                     beginning of sentence
La         rf b nms u            article
teneur     nfs nms               noun feminine singular
moyenne    jfs nfs v1s v2s v3s   adjective feminine singular
en         p a b                 preposition
uranium    nms                   noun masculine singular
des        p r                   preposition
rivières   nfp                   noun feminine plural
,          x                     punctuation
bien_que   cs                    subordinating conjunction
délicate   jfs                   adjective feminine singular
à          p                     preposition
calculer   v                     verb

Sample rules
BS3 BI1: A BS3 (3rd person subject personal pronoun) cannot be followed by a BI1 (1st person indirect personal pronoun). In the example "il nous faut" ({it we need}), "il" has the tag BS3MS and "nous" has the tags [BD1P BI1P BJ1P BR1P BS1P]. The negative constraint "BS3 BI1" rules out "BI1P", and thus leaves only 4 alternatives for the word "nous".
N K: The tag N (noun) cannot be followed by a tag K (interrogative pronoun); an example in the test corpus would be "... fleuve qui ..." (... river, that ...). Since "qui" can be tagged both as an "E" (relative pronoun) and a "K" (interrogative pronoun), the "E" will be chosen by the tagger since an interrogative pronoun cannot follow a noun ("N").
R V: A word tagged with R (article) cannot be followed by a word tagged with V (verb): for example "l' appelle" (calls him/her). The word "appelle" can only be a verb, but "l'" can be either an article or a personal pronoun. Thus, the rule will eliminate the article tag, giving preference to the pronoun.
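The three negative constraints can be applied by a simple filtering loop. The Python sketch below is one possible reading, not the tagger described in the slides: it assumes a constraint "X Y" forbids any tag starting with X from directly preceding any tag starting with Y, and it never deletes a word's last remaining tag. The pronoun tag "BD3S" used for "l'" is a made-up placeholder, since the slides do not give it, and the function names are hypothetical.

```python
CONSTRAINTS = [("BS3", "BI1"), ("N", "K"), ("R", "V")]


def forbidden(left_tag, right_tag, constraints=CONSTRAINTS):
    """True if some negative constraint rules out left_tag followed by right_tag."""
    return any(left_tag.startswith(a) and right_tag.startswith(b)
               for a, b in constraints)


def apply_constraints(candidates, constraints=CONSTRAINTS):
    """Prune candidate tag sets until no constraint removes anything more."""
    tags = [set(c) for c in candidates]
    changed = True
    while changed:
        changed = False
        for i in range(len(tags) - 1):
            left, right = tags[i], tags[i + 1]
            # Drop a right-hand tag that every remaining left-hand tag forbids.
            for t in list(right):
                if len(right) > 1 and all(forbidden(l, t, constraints) for l in left):
                    right.discard(t)
                    changed = True
            # Drop a left-hand tag that is forbidden before every right-hand tag.
            for t in list(left):
                if len(left) > 1 and all(forbidden(t, r, constraints) for r in right):
                    left.discard(t)
                    changed = True
    return tags


# "il nous": BI1P is removed, four alternatives remain for "nous".
il_nous = apply_constraints([{"BS3MS"}, {"BD1P", "BI1P", "BJ1P", "BR1P", "BS1P"}])
# "fleuve qui": the interrogative reading K is removed.
fleuve_qui = apply_constraints([{"N"}, {"E", "K"}])
# "l' appelle": the article reading R is removed ("BD3S" is a placeholder tag).
l_appelle = apply_constraints([{"R", "BD3S"}, {"V"}])
```

Constraints of this kind only shrink the candidate sets. Words that remain ambiguous afterwards, such as "nous" here, need further rules or a statistical tie-breaker.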

Stochastic POS tagging
• HMM tagger
• Pick the most likely tag for this word
• P(word|tag) * P(tag|previous n tags) – find tag sequence that maximizes the probability formula
• A bigram-based HMM tagger chooses the tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i:
• t_i = argmax_j P(t_j | t_{i-1}, w_i)
• t_i = argmax_j P(t_j | t_{i-1}) P(w_i | t_j) : HMM equation for a single tag

Example
• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/ADV
• People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
• P(VB|TO)P(race|VB)
• P(NN|TO)P(race|NN)
• TO: to+VB (to sleep), to+NN (to school)

Example (cont'd)
• P(NN|TO) = .021
• P(VB|TO) = .34
• P(race|NN) = .00041
• P(race|VB) = .00003
• P(VB|TO)P(race|VB) = .00001
• P(NN|TO)P(race|NN) = .000007
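The comparison can be reproduced directly from the four probabilities on the slide. The Python sketch below applies the single-tag HMM equation to "race" after TO; the table names and `best_tag` are illustrative. Note that multiplying the quoted factors gives about .0000102 for VB and about .0000086 for NN, so the NN figure differs slightly from the rounded value on the slide, but the ranking is the same.

```python
# Probabilities quoted on the slide.
TRANSITION = {("TO", "VB"): 0.34, ("TO", "NN"): 0.021}          # P(tag | previous tag)
EMISSION = {("race", "VB"): 0.00003, ("race", "NN"): 0.00041}  # P(word | tag)


def best_tag(prev_tag, word, candidates):
    """Score each candidate as P(tag | prev_tag) * P(word | tag); return the winner."""
    scores = {
        tag: TRANSITION[(prev_tag, tag)] * EMISSION[(word, tag)]
        for tag in candidates
    }
    return max(scores, key=scores.get), scores


winner, scores = best_tag("TO", "race", ["VB", "NN"])
```

Here `winner` is "VB": although "race" is far more often a noun, the transition from TO to a verb is so much more likely that it outweighs the lexical preference.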

HMM Tagging
• T = argmax_T P(T|W), where T = t_1, t_2, …, t_n
• By Bayes' rule: P(T|W) = P(T)P(W|T)/P(W)
• Thus we are attempting to choose the sequence of tags that maximizes the rhs of the equation
• P(W) can be ignored, because it is the same for every candidate tag sequence
• P(T)P(W|T) = ∏_{i=1}^{n} P(w_i | w_1 t_1 … w_{i-1} t_{i-1} t_i) P(t_i | w_1 t_1 … w_{i-1} t_{i-1})
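With the bigram simplification, where each word depends only on its own tag and each tag only on the previous tag, the best tag sequence is usually found with the Viterbi algorithm. The sketch below is a minimal Python version under that assumption. The probabilities for "race" come from the earlier example; the start probability and the emission for "to" are invented placeholders so that the example runs.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the tag sequence maximizing the product of bigram HMM terms."""
    # best[i][t] = (probability of the best path ending in tag t at word i, backpointer)
    best = [{t: (start_p.get(t, 0.0) * emit_p.get((words[0], t), 0.0), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            column[t] = max(
                ((best[i - 1][p][0]
                  * trans_p.get((p, t), 0.0)
                  * emit_p.get((words[i], t), 0.0), p)
                 for p in tags),
                key=lambda pair: pair[0],
            )
        best.append(column)
    # Follow backpointers from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path


TAGS = ["TO", "VB", "NN"]
START = {"TO": 1.0}                                  # placeholder assumption
TRANS = {("TO", "VB"): 0.34, ("TO", "NN"): 0.021}    # from the slides
EMIT = {("to", "TO"): 1.0,                           # placeholder assumption
        ("race", "VB"): 0.00003, ("race", "NN"): 0.00041}  # from the slides

tagged = viterbi(["to", "race"], TAGS, START, TRANS, EMIT)
```

Multiplying raw probabilities is fine for two words; for real sentences an implementation would normally sum log probabilities to avoid numeric underflow.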

Transformation-based learning
• P(NN|race) = .98
• P(VB|race) = .02
• Change NN to VB when the previous tag is TO
• Types of rules:
  – The preceding (following) word is tagged z
  – The word two before (after) is tagged z
  – One of the two preceding (following) words is tagged z
  – One of the three preceding (following) words is tagged z
  – The preceding word is tagged z and the following word is tagged w
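A transformation-based tagger first gives every word its most likely tag and then rewrites tags with learned rules. The Python sketch below applies only the one rule from the slide ("change NN to VB when the previous tag is TO"). It is a sketch of rule application, not of rule learning, and apart from "race" the entries in `MOST_LIKELY` are assumptions copied from the tagged example sentence.

```python
# Most likely tag per word. Only "race" -> NN (P = .98) is stated on the slide;
# the other entries are assumed for the sake of a runnable example.
MOST_LIKELY = {
    "secretariat": "NNP", "is": "VBZ", "expected": "VBN",
    "to": "TO", "race": "NN", "tomorrow": "ADV",
}


def initial_tags(words):
    """Step 1: label every word with its most likely tag, ignoring context."""
    return [MOST_LIKELY[w.lower()] for w in words]


def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Step 2: rewrite from_tag as to_tag wherever the preceding tag is prev_tag."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


words = "Secretariat is expected to race tomorrow".split()
before = initial_tags(words)
after = apply_rule(before, from_tag="NN", to_tag="VB", prev_tag="TO")
```

Here `before` tags "race" as NN and `after` corrects it to VB. In the full method, rules like this are instantiated from the listed templates and kept when they reduce errors on a hand-tagged training corpus.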

Confusion matrix
[Table: confusion matrix over the tags IN, JJ, NN, NNP, RB, VBD, VBN]
Most confusing: NN vs. NNP vs. JJ, VBD vs. VBN vs. JJ

Readings
• J&M Chapters 1, 2, 3, 8
• "What is Computational Linguistics" by Hans Uszkoreit http://www.coli.uni-sb.de/~hansu/what_is_cl.html
• Lecture notes #1

Readings
• J&M Chapters 3, 8
• Lecture notes #2