
  • Number of slides: 100

Anaphora Resolution
Sobha Lalitha Devi
AU-KBC Research Centre, MIT Campus of Anna University, Chennai-44
[email protected] org
05/24/2011 Summer School, IIIT Hyderabad

Contents
  • Introduction to Anaphora and Anaphora Resolution
  • Types of Anaphora
  • Process of Anaphora Resolution
  • Tools
  • Applications
  • References

What is Cohesion
COHESION is the internal continuity, or network of points of continuity, within a text. A text is not just a string of sentences. It is not simply a large grammatical unit, “something of the same kind as a sentence, but differing from it in size, a sort of super-sentence”; it is a semantic unit. (Halliday & Hasan)

Cohesive Relationships
Cohesive relationships between words and sentences have certain definable qualities that allow us to recognize the super-sentence.

Nature of cohesive relation    Type of cohesion
Relatedness of form            Substitution and ellipsis; lexical collocation
Relatedness of reference       Reference; lexical reiteration
Semantic connection            Conjunction

Relatedness of Form
  • Substitution: "Nice teapots! I'll take one."
  • Ellipsis: "Turn on. Tune in. Drop out." ['you' is elided]
  • Collocation: "John went to the bank. He wanted to swim in the river." ['river' disambiguates 'bank']

Relatedness of Reference
  • Exophora (extra-linguistic reference; deictic markers such as this, that): "What is this?"
  • Anaphora: "I used to have the key. But I lost it."
  • Cataphora: "It is your turn, John."
  • Reiteration: "He speaks only to the Huxleys; the Huxleys speak only to the Darwins; and the Darwins speak only to God."

Semantic Connection
Conjunction: "You tell me that you've got ev'rything you want, and your bird can sing, but you don't get me, you don't get me! You say you've seen seven wonders, and your bird is green, but you can't see me, you can't see me! When your prized possessions, start to tear you down, then look in my direction, I'll be round." [Beatles -- Lee Campbell]

Comparison
Halliday & Hasan also classify comparison as a form of cohesion:
"She's more fun than a barrel of monkeys!"
"He's as tall as a six foot four inch tree."

CONTEXT DEPENDENCE
  • The interpretation of most expressions depends on the context in which they are used.
  • Developing methods for interpreting context-dependent expressions is useful in many applications.
  • We focus here on the dependence of nominal expressions on context introduced LINGUISTICALLY, for which we will use the term ANAPHORA.

Introduction
  • What is Anaphora
  • Antecedent
  • Anaphora Resolution
1. Sabeer Bhatia arrived at Los Angeles International Airport at 6 p.m. on September 23, 1998. His flight from Bangalore had taken 22 hrs and he was starving. [RD, NOV 2000]

Etymology of Anaphora
ANA: back, upstream, back upstream
PHORA: act of carrying
ANAPHORA: act of carrying back

What is Anaphora
Anaphora, in discourse, is a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and, thereby, determine the identity of the entity. (Hirst 1981)

Cataphora
  • When the “anaphor” precedes the antecedent:
Because she was going to the departmental store, Mary was asked to pick up the vegetables.

Relevance from the Linguistics point of view
  • Binding Theory is one of the major results of the principles and parameters approach developed in Chomsky (1981) and is one of the mainstays of generative linguistics.
  • The Binding Theory deals with the relations between nominal expressions and possible antecedents.
  • It attempts to provide a structural account of the complementarity of distribution between pronouns, reflexives and R-expressions.

Dichotomy Between Linguistics and NLP
  • The Binding Theory (and its various formulations) deals only with intra-sentential anaphora, a very small subset of the anaphoric phenomena that practical NLP systems are interested in resolving.
  • A much larger set of anaphoric phenomena involves the resolution of pronouns inter-sententially. This problem is dealt with by Discourse Representation Theory and more specifically by Centering Theory (Grosz et al., 1995).

Types of Anaphors
The Prime Minister is yet to arrive and he is expected at the central hall at any time. [The Times of India, Feb 2001]
This book is about Anaphora Resolution. The book is designed to help beginners in the field and its author hopes that it will be useful.
VP Anaphor
John screamed, as did Mary.

Pronominal anaphora
Vajpayee hits back forcefully when he told the opposition today “sometimes we fall prey to the media and sometimes you do.” [Indian Express 2001]
Possessive
Priyanka eats only chicken sandwiches before going to take any exam; nothing else goes down her gullet that day. [Indian Express, 13 March 2001]

Reflexive Pronoun
Finally, Danian heaved himself up and lay on a waiting stretcher.
Demonstrative Pronoun
John had lots of packing to do before he shifted his house. This was something he never liked…
Relative Pronoun
Stumper Sameer Dige, who made his test debut, failed to show fast reflexes when it mattered.

Non-Anaphoric Usage of Pronouns: Pleonastic It
Cognitive verbs
  a. It is believed that…
  b. It appears that…
Modal adjectives
  c. It is dangerous…
  d. It is important…
Temporal
  e. It is five o’clock.
  f. It is winter.
Weather verbs
  g. It is raining.
  h. It is snowing.
Distance
  i. How far is it to Chennai?
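
The categories above lend themselves to a simple pattern-based check. The following is a minimal sketch, assuming one hand-written regular expression per category; the pattern list and the name `is_pleonastic` are illustrative assumptions, not part of the lecture, and the word lists are far from exhaustive.

```python
import re

# Hypothetical patterns, one or two per category listed on the slide.
PLEONASTIC_PATTERNS = [
    r"\bit\s+is\s+(believed|thought|said|known)\s+that\b",      # cognitive verbs
    r"\bit\s+(appears|seems)\s+that\b",                         # cognitive verbs
    r"\bit\s+is\s+(dangerous|important|necessary|possible)\b",  # modal adjectives
    r"\bit\s+is\s+(\w+\s+o['’]clock|winter|summer)\b",          # temporal
    r"\bit\s+is\s+(raining|snowing)\b",                         # weather verbs
    r"\bhow\s+far\s+is\s+it\b",                                 # distance
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in PLEONASTIC_PATTERNS]


def is_pleonastic(sentence: str) -> bool:
    """Return True if the sentence matches any of the pleonastic-'it' patterns."""
    return any(pattern.search(sentence) for pattern in _COMPILED)
```

A rule list like this is easy to inspect but brittle; it only flags the constructions it was written for.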

Non-anaphoric uses of pronouns
He that plants thorns must never expect to gather roses.
He who dares wins.
Deictic
He seems remarkably bright for a child of his age.

Noun Phrase Anaphora
Definite descriptions and proper names
Roy Keane has warned Manchester United he may snub their pay deal. United’s skipper is even hinting that unless the future Old Trafford package meets his demands, he could quit the club in June 2000. Irishman Keane, 27, still has 17 months to run on his current 23,000 pound a week contract and wants to commit himself to United for life. Alex Ferguson’s No 1 player confirmed: “If it’s not the contract I want, I won’t sign”.

Coreference
Computational Linguists from many different countries attended the tutorial. The participants found it hard to cope with the speed of the presentation; nevertheless they managed to take extensive notes.

Coreference
Sophia Loren says she will always be grateful to Bono. The actress revealed that the U2 singer helped her calm down when she became scared by a thunderstorm while travelling by a plane.
she => Sophia Loren
the actress => Sophia Loren
the U2 singer => Bono
her => Sophia Loren
she => Sophia Loren

Coreference chain
Sophia Loren says she will always be grateful to Bono. The actress revealed that the U2 singer helped her calm down when she became scared by a thunderstorm while travelling by a plane.
Coreference chains
  • {Sophia Loren, she, the actress, her, she}
  • {Bono, the U2 singer}
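
One way to get from pairwise anaphor-antecedent links to chains like these is to merge linked mentions into sets. The sketch below uses a small union-find structure; the function name `build_chains` and the mention identifiers (for example `she_1` and `she_2`, used to tell the two occurrences of "she" apart) are assumptions made for illustration.

```python
def build_chains(mentions, links):
    """Group mentions into coreference chains from (anaphor, antecedent) links."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the chain representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for anaphor, antecedent in links:
        parent[find(anaphor)] = find(antecedent)

    chains = {}
    for m in mentions:  # keeps mentions in text order inside each chain
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


mentions = ["Sophia Loren", "she_1", "Bono", "the actress",
            "the U2 singer", "her", "she_2"]
links = [("she_1", "Sophia Loren"), ("the actress", "Sophia Loren"),
         ("the U2 singer", "Bono"), ("her", "Sophia Loren"),
         ("she_2", "Sophia Loren")]
chains = build_chains(mentions, links)
```

Because merging is transitive, a pronoun linked to "the actress" rather than directly to "Sophia Loren" would still end up in the same chain.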

Chains of object mentions in text
Toni Johnson pulls a tape measure across the front of what was once a stately Victorian home. A deep trench now runs along its north wall, exposed when the house lurched two feet off its foundation during last week's earthquake. Once inside, she spends nearly four hours measuring and diagramming each room in the 80-year-old house, gathering enough information to estimate what it would cost to rebuild it. While she works inside, a tenant returns with several friends to collect furniture and clothing. One of the friends sweeps broken dishes and shattered glass from a countertop and starts to pack what can be salvaged from the kitchen. (WSJ section of Penn Treebank corpus)

What is Anaphora Resolution
  • The process of finding the antecedent for an anaphor is anaphora resolution.
      • Anaphor: the reference that points to the previous item
      • Antecedent: the entity to which the anaphor refers

RESEARCH ON ANAPHORA RESOLUTION: A QUICK SUMMARY
1970-1995
  • Primarily theoretical
  • Emphasis: commonsense knowledge, salience
  • Exceptions: Hobbs 1977, Shalom Lappin
1995-2005
  • First annotated corpora to be used to develop, evaluate and compare systems (MUC, Ge and Charniak, ACE)
  • First robust systems
      • Heuristic-based: Mitkov
      • ML: Vieira & Poesio 1998, 2000; Soon et al 2001; Ng and Cardie 2002
  • Emphasis: surface features
  • Exceptions: Poesio & Vieira, Harabagiu, Markert
2005-present
  • More sophisticated ML techniques (global models, kernels)
  • Richer features, especially semantic information
  • First tools

Application of Anaphora Resolution
  • Tasks that require determining the coherence of (segments of) text
      • Segmentation
      • Post-hoc coherence check in summarization (Steinberger et al, 2007)
  • Tasks that require identifying the most important information in a text
      • Sentence selection in summarization (Steinberger et al 2005, 2007)
      • Indexing
  • Information extraction: recognize which expressions refer to objects in the domain
      • Relation extraction from biomedical text (Sanchez-Graillet and Poesio, 2006, 2007)
  • Multimodal interfaces: recognize which objects in the visual scene are being referred to

Different Approaches in Anaphora Resolution
  • Rule based
  • Statistical based
  • Machine learning based

Rule Based
  • Hobbs system

Hard Constraints on Coreference
  • Number agreement
  • Person and case
  • Gender agreement
  • Syntactic agreement
  • Selectional restrictions

Number agreement

Singular                      Plural                Unspecified
she, her, he, him, his, it    we, us, they, them    you

John and Mary loaned Sue a cup of coffee. Little did they know the magnitude of her addiction.

Person and Case Agreement

              First      Second    Third
Nominative    I, we      you       he, she, they
Accusative    me, us     you       him, her, them
Genitive      my, our    your      his, her, their

Gender Agreement
*John has a coffee machine. She loves it.

Syntactic Agreement
  • Reflexives (himself, herself…) have strong constraints on what syntactic positions they can appear in.
John bought himself a cup of coffee.
*John bought him a cup of coffee.

Selectional Constraints
Jim bought a coffee from the store. He drank it quickly.
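
Agreement constraints of this kind are usually applied as a filter that discards impossible candidates before any ranking takes place. The sketch below covers only number and gender; the `Mention` class, the feature codes ("sg", "pl", "m", "f", "n", "unspecified") and the function names are assumptions for illustration, and syntactic and selectional constraints are left out because they need parser and lexicon support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    number: str  # "sg", "pl" or "unspecified"
    gender: str  # "m", "f", "n" or "unspecified"


def compatible(a: str, b: str) -> bool:
    """Two feature values agree if they are equal or either is unspecified."""
    return a == b or "unspecified" in (a, b)


def agrees(pronoun: Mention, candidate: Mention) -> bool:
    return (compatible(pronoun.number, candidate.number)
            and compatible(pronoun.gender, candidate.gender))


def filter_candidates(pronoun: Mention, candidates: list) -> list:
    """Keep only the candidates that pass the hard agreement constraints."""
    return [c for c in candidates if agrees(pronoun, c)]


john = Mention("John", "sg", "m")
machine = Mention("a coffee machine", "sg", "n")
she = Mention("she", "sg", "f")
it = Mention("it", "sg", "n")
```

With the gender-agreement example from the slides, `she` is left with no candidate at all, which is why the sentence pair is starred.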

Also: Preferences
  • Recency
  • Grammatical role
  • Repeated mention
  • Parallelism
  • Verb semantics
All are based on salience.

Recency
John had a pop-tart. Bill had a jelly donut. Mary wanted it.
  • Recent entities are more salient.

Grammatical Role
“Sue bought a cup of coffee and a donut from Jane. She met John as she left.”
  • Entities in subject position are more salient.

Repeated Mention
John went to the store to buy coffee. He loves coffee. He drinks 5 cups a day. At the store, Bill sold him a cup. He was delighted.
  • Entities mentioned more frequently are more salient.

Parallelism
John bought coffee from Jim in the morning. Sue bought coffee from him in the evening.
  • Even with preferences to the contrary (grammatical role), the syntactic parallelism strongly prefers [him = Jim].

Verb Semantics
John telephoned Bill. He was jonesing for coffee.
John criticized Bill. He was jonesing for coffee.
  • Perhaps the salience of different elements in the sentence changes with respect to the verb used.

Algorithms: How to integrate these preferences?
  • Constraints are easy to use: reject all hypotheses which violate the hard constraints (if you can accurately detect the constraints!).
  • Preferences are more difficult: how can one integrate these different preferences?

Hobbs Tree Search Algorithm
  • Given parse trees, search them in a specific order to find the most likely referent.

Hobbs in Detail
1. Begin at the NP node immediately dominating the pronoun.
2. Go up the tree to the first NP or S. Call this X, and the path p.
3. Traverse all branches below X to the left of p. Propose as antecedent any NP that has an NP or S between it and X.
4. If X is the highest S in the sentence, traverse the parse trees of the previous sentences in order of recency. Traverse left-to-right, breadth-first. When an NP is encountered, propose it as antecedent. If X is not the highest node, go to step 5.

Hobbs cont.
5. From node X, go up the tree to the first NP or S. Call it X, and the path p.
6. If X is an NP and the path to X did not pass through the nominal that X dominates, propose X as antecedent.
7. Traverse all branches below X to the left of the path, in a left-to-right, breadth-first manner. Propose any NP encountered as the antecedent.
8. If X is an S node, traverse all branches of X to the right of the path, but do not go below any NP or S encountered. Propose any NP as the antecedent.
9. Go to step 4.
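
The part of the search that walks previous sentences (step 4) is easy to sketch in isolation. The code below is not the full Hobbs algorithm: it only shows a left-to-right, breadth-first traversal that proposes every NP, with sentences visited most recent first. Parse trees are assumed to be nested tuples of the form `(label, child, ...)` with plain strings as leaves; that encoding and the function names are illustrative choices.

```python
from collections import deque


def propose_nps(tree):
    """Yield NP subtrees of one parse tree in left-to-right, breadth-first order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):  # a leaf word has no children
            continue
        label, *children = node
        if label == "NP":
            yield node
        queue.extend(children)     # children are queued left to right


def words(node):
    """Return the words covered by a node."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


def candidates_from_previous(sentences):
    """Propose NPs from earlier sentences, most recent sentence first."""
    return [" ".join(words(np))
            for tree in reversed(sentences)
            for np in propose_nps(tree)]


s1 = ("S", ("NP", "John"),
      ("VP", ("V", "had"), ("NP", ("Det", "a"), ("N", "pop-tart"))))
s2 = ("S", ("NP", "Bill"),
      ("VP", ("V", "had"), ("NP", ("Det", "a"), ("Adj", "jelly"), ("N", "donut"))))
```

Breadth-first order means a subject NP is proposed before an object NP nested deeper in the VP, which is how the search order favours subjects.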

Lappin and Leass (1994) Anaphora Resolution Algorithm
  • The Lappin and Leass (1994) anaphora resolution algorithm uses salience weights in determining the antecedent of pronominals.
  • It requires as input a fully parsed sentence structure and uses the hierarchy to identify the subject, object, etc.
  • The algorithm uses syntactic criteria to rule out noun phrases that cannot possibly corefer with the pronoun.
  • The antecedent is then chosen according to a ranking based on salience weights.

The Salience Factors and Weights
A pronoun P is non-coreferential with a (non-reflexive or non-reciprocal) noun phrase N if any of the following conditions hold:
1. P and N have incompatible agreement features.
2. P is in the argument domain of N.
3. P is in the adjunct domain of N.
4. P is an argument of a head H, N is not a pronoun, and N is contained in H.
5. P is in the NP domain of N.
6. P is a determiner of a noun Q, and N is contained in Q.

Examples
Condition 1: The woman said that he is funny.
Condition 2: She likes her. John seems to want to see him.
Condition 3: She sat near her.
Condition 4: He believes that the man is amusing. This is the man he said John wrote about.
Condition 5: John’s portrait of him is interesting.

Salience Factors and Weights
Salience factor types with initial weights

Factor type                                        Initial weight
Sentence recency                                   100
Subject emphasis                                   80
Existential emphasis                               70
Accusative emphasis                                50
Indirect object and oblique complement emphasis    40
Head noun emphasis                                 80
Non-adverbial emphasis                             50
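
A candidate's salience is the sum of the weights of the factors that apply to it. The sketch below uses the initial weights from the table; the factor keys, the function names and the `decay` parameter are illustrative assumptions. The halving of salience for each sentence of distance follows the published Lappin and Leass description rather than anything stated on the slide.

```python
WEIGHTS = {
    "sentence_recency": 100,
    "subject": 80,
    "existential": 70,
    "accusative": 50,
    "indirect_object_oblique": 40,
    "head_noun": 80,
    "non_adverbial": 50,
}


def salience(factors, sentences_back=0, decay=0.5):
    """Sum the initial weights, then degrade once per intervening sentence."""
    return sum(WEIGHTS[f] for f in factors) * (decay ** sentences_back)


def best_antecedent(candidates):
    """Pick the candidate with the highest salience (candidates already filtered)."""
    top = max(candidates, key=lambda c: salience(c["factors"], c["sentences_back"]))
    return top["text"]


subject_np = {"text": "John", "sentences_back": 0,
              "factors": ["sentence_recency", "subject", "head_noun", "non_adverbial"]}
object_np = {"text": "Bill", "sentences_back": 0,
             "factors": ["sentence_recency", "accusative", "head_noun", "non_adverbial"]}
```

Here the subject scores 310 against 280 for the object, so the subject is preferred when both pass the syntactic filter.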

Kennedy 1996
The linguistic analysis for anaphora resolution includes:
  • The output of a part-of-speech tagger,
  • Augmented with syntactic function annotations for each input token;
  • Using LINGSOFT

A set of patterns is used for identifying:
  • NP chunking, with the position of the NP in the text
  • Nominal sequencing in two subordinate syntactic environments:
      a. in an adverbial adjunct
      b. in an NP (i.e. containment in a prepositional or clausal complement of a noun, or containment in a relative clause)
  • Expletive “it”

Anaphora Resolution
  • Uses the Lappin and Leass algorithm

SENT-S    100    iff in the current sentence
CNTX-S    50     iff in the current context
SUBJ-S    80     iff GFUN = subject
EXST-S    70     iff in an existential construction
POSS-S    65     iff GFUN = possessive
ACC-S     50     iff GFUN = direct object
DAT-S     40     iff GFUN = indirect object
OBLQ-S    30     iff the complement of a preposition
HEAD-S    80     iff EMBED = NIL
ARG-S     50     iff ADJUNCT = NIL

Mitkov 1997
  • No parsing of the input sentence
  • Boosting indicators
      • First Noun Phrases: A score of +1 is assigned to the first NP in a sentence.
      • Indicating Verbs: A score of +1 is assigned to those NPs immediately following a verb which is a member of a predefined set (including verbs such as discuss, present, illustrate, identify, summarise, examine, describe, define, show, check, develop, review, …

MARS Cont.
      • Lexical Reiteration: A score of +2 is assigned to those NPs repeated twice or more in the paragraph in which the pronoun appears; a score of +1 is assigned to those NPs repeated once in that paragraph.
      • Section Heading Preference: A score of +1 is assigned to those NPs that also occur in the heading of the section in which the pronoun appears.

Boosting indicators contd.
      • Collocation Match: A score of +2 is assigned to those NPs that have an identical collocation pattern to the pronoun.
      • Immediate Reference: A score of +2 is assigned to those NPs appearing in constructions of the form “… (You) V1 NP … con (you) V2 it (con (you) V3 it)”, where con ∈ {and, or, before, after, …}.
      • Sequential Instructions: A score of +2 is applied to NPs in the NP1 position of constructions of the form “To V1 NP1, V2 NP2. (Sentence). To V3 it, V4 NP4”, where the noun phrase NP1 is the likely antecedent of the anaphor it.
      • Term Preference: A score of +1 is applied to those NPs identified as representing terms in the genre of the text.

Impeding indicators
      • Indefiniteness: Indefinite NPs are assigned a score of -1.
      • Prepositional Noun Phrases: NPs appearing in prepositional phrases are assigned a score of -1.
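
The boosting and impeding indicators combine by simple addition: each candidate NP collects the scores of the indicators that fire for it, and the highest total wins. The sketch below assumes the indicators have already been detected and are given as boolean flags plus a repetition count; the flag names and function names are hypothetical, and how ties are broken is not covered here.

```python
# Scores for indicators that are either present or absent on a candidate NP.
FLAG_SCORES = {
    "first_np": 1,
    "indicating_verb": 1,
    "section_heading": 1,
    "collocation_match": 2,
    "immediate_reference": 2,
    "sequential_instructions": 2,
    "term_preference": 1,
    "indefinite": -1,
    "prepositional_np": -1,
}


def reiteration_score(repeats: int) -> int:
    """+2 if repeated twice or more in the paragraph, +1 if repeated once."""
    if repeats >= 2:
        return 2
    return 1 if repeats == 1 else 0


def aggregate_score(candidate: dict) -> int:
    total = sum(score for flag, score in FLAG_SCORES.items() if candidate.get(flag))
    return total + reiteration_score(candidate.get("repeats", 0))


def select_antecedent(candidates: list) -> dict:
    """Return the candidate with the highest aggregate score."""
    return max(candidates, key=aggregate_score)


printer = {"text": "the printer", "first_np": True, "repeats": 2}
tray = {"text": "a tray", "indefinite": True, "prepositional_np": True}
```

In this made-up pair, "the printer" scores 3 and "a tray" scores -2.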

Indian Language
  • Types of anaphors
      • Dravidian type, with gender-marked pronouns
          • avan, aval, atu (example)
          • Telugu does not have all other Dravidian languages have.
      • Aryan language type, without gender marking in the pronouns
          • “us” as in Hindi (example)
FROM CHAPTER

<ANT 1, 2, 3> uutti </ANT> oru alakiya malai nakaram.
Ooty(N) one(Q) beautiful(ADJ) hill(N) town(N).
(Ooty is a beautiful hill station.)

ithu malaikalin raani.
This hill(N)+GEN queen(N).
(This is the Queen of hills.)

anku puungkaa, pataku cavaari, malai rayil untu.
There park(N), boating(VBN) hill train(N) present(V+PRESENT+3S)
(There are a park, boating and a hill train.)

athu oru cirantha currulaath thalam.
it(PN) one(Q) best(ADJ) tourist(N) place(N).
(It is an excellent tourist place.)

“Vasisth”, a Rule Based Anaphora Resolution System
1. mo:han(i) avanRe(i) kuttiye kantu.
   mohan he-poss child-acc see-pst
   (Mohan saw his child.)
2. mo:han(i) avanRe(i) kuttiye kantu ennu kRisnan paRannu.
   mohan he-poss child-acc see-pst compl krishnan say-pst
   (Krishnan said that Mohan saw his child.)
3. *mo:han(i) avane(i) aticcu.
   mohan he-acc beat-pst
   (Mohan beat him.)
4. mo:han avane(i) aticcu ennu kRisnan(i) paRannu.
   mohan he-acc beat-pst compl krishnan say-pst
   (Krishnan said that Mohan beat him.)

The Algorithm for Intra-sentential Anaphora
A pronoun P is coreferential with an NP iff the following conditions hold:
  a. P and NP have compatible P, N, G (person, number, gender) features.
  b. P does not precede NP.
  c. If P is possessive, then NP is the subject of the clause which contains P.
  d. If P is non-possessive, then NP is the subject of the immediate clause which does not contain P.

  • Vasisth is a multilingual anaphora resolution system
  • Rule based
  • With minimum parsing
  • Exploits the morphology of Indian languages

“VASISTH” Using Salience Measure for Indian Languages
  • No in-depth parsing
  • Exploits the rich morphology of the language
  • The analysis depends on the salience weight of the candidate (NP) for the antecedent-hood of an anaphor from a list of probable candidates.

The salience weight assignment
a) The current sentence gets a score of 50, and the score is reduced by 10 for each preceding sentence until the fifth sentence is reached. The system considers five sentences for identifying the antecedent.
b) The current clause gets a score of 75 if the pronoun present in the clause is a possessive pronoun; if it is a non-possessive pronoun, the clause gets a score of zero.
c) The immediate clause gets a score of 70 in the case of a possessive pronoun and a score of 75 for non-possessive pronouns.
d) For a non-immediate clause, the possessive pronoun gets a score of 30 and the non-possessive pronoun gets a score of 65.

e) The analysis showed that the subject is the most probable antecedent for the pronoun. The case markings that the subject of a sentence can take are nominative and dative. A nominative NP, a dative NP, or a possessive NP with a nominative or dative head can be the subject of a sentence.

f) The direct object of a sentence can be identified by its case marking; all case markings other than those of the subject are considered for the object. The next most probable NP for antecedent-hood is the direct object, and hence it gets a score of 40.
g) The third NP in a clause, which is not identified as the subject or object, is considered the indirect object and gets a low score of 30.

Salience factor weights for Indian Languages

Salience factors                  Weights
Current sentence                  50, reduced by 10 for each preceding sentence, up to the 5th sentence
Possessive
    Current clause                75
    Immediate clause              70
    Non-immediate clause          30
Non-Possessive
    Current clause                0
    Immediate clause              75
    Non-immediate clause          65
Possessive and Non-Possessive
    N.Nom                         80
    N.Poss                        50
    N.Dat                         50
    N.Acc, Loc, Instr…            40
    N.others (3rd NP)             30

How it works
The salience weight of an NP is assigned in the following way:
  • Identify the pronoun.
  • Consider four sentences above the sentence containing the pronoun.
  • Consider all the NPs preceding the pronoun. (This is the general rule.)

  • Here we also take some NPs which follow the pronoun, since Tamil and all Indian languages have relatively free word order.
  • Assign salience weights.
  • The NP which gets the maximum salience weight and agrees in PNG (person, number, gender) with the anaphor is considered the antecedent of the anaphor.
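
The weight assignment and selection steps can be sketched as follows, using the weights from the table of salience factors for Indian languages. The dictionary layout, the PNG codes such as "3sm" and the function names are assumptions made for illustration. The sketch also assumes that a clause weight is added only when a candidate carries a "clause" value, since the slides do not say how clause weights apply to candidates in earlier sentences.

```python
# Clause weights, keyed by whether the pronoun is possessive.
CLAUSE_WEIGHTS = {
    True:  {"current": 75, "immediate": 70, "non_immediate": 30},
    False: {"current": 0,  "immediate": 75, "non_immediate": 65},
}
CASE_WEIGHTS = {"nom": 80, "poss": 50, "dat": 50,
                "acc": 40, "loc": 40, "instr": 40, "other": 30}
MAX_SENTENCES_BACK = 4  # current sentence plus four preceding sentences


def salience(candidate: dict, possessive: bool) -> int:
    weight = 50 - 10 * candidate["sentences_back"]  # sentence recency
    weight += CASE_WEIGHTS[candidate["case"]]       # case marking of the NP
    clause = candidate.get("clause")
    if clause is not None:                          # assumption: optional factor
        weight += CLAUSE_WEIGHTS[possessive][clause]
    return weight


def resolve(pronoun: dict, candidates: list):
    """Return the text of the PNG-agreeing candidate with maximum salience."""
    eligible = [c for c in candidates
                if c["sentences_back"] <= MAX_SENTENCES_BACK
                and c["png"] == pronoun["png"]]
    if not eligible:
        return None
    return max(eligible, key=lambda c: salience(c, pronoun["possessive"]))["text"]


candidates = [
    {"text": "mohan", "png": "3sm", "case": "nom",
     "sentences_back": 0, "clause": "current"},
    {"text": "krishnan", "png": "3sm", "case": "nom",
     "sentences_back": 0, "clause": "immediate"},
]
```

On the Malayalam examples, a possessive pronoun (avanRe) picks mohan with 205 against 200, while a non-possessive pronoun (avane) picks krishnan with 205 against 130, matching the coindexing shown in examples 2 and 4.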

Machine Learning
  • CoNLL Task and its results

Introduction
  • Goal of the task
      • Automatically identify coreference chains in a document
      • The coreference chains can include
          • Names
          • Nominal mentions
          • Pronouns
          • Verbs that are coreferenced with a noun phrase
  • The data used is OntoNotes
      • English documents from OntoNotes
      • These cover five genres: Newswire (NW), Broadcast News (BN), Telephonic Conversation (TC), Broadcast Conversation (BC), Web blogs (WB)

Introduction
  • Coreferencing
      • Two referential entities are coreferent when they refer to the same entity in the real world
      • Coreference analysis determines whether or not two entities refer to the same entity
  • Coreferents are classified into two types
      • Pronominal referents
      • Non-pronominal referents
  • Pronominal referents are pronouns, which refer to other nouns in the text
  • Non-pronominal referents are names, nominal mentions and other noun phrases

A Sample Text
Eagle Clothes Inc. , which is operating under Chapter 11 of the Federal Bankruptcy Code , said it reached an agreement with its creditors. Under the accord , Albert Roth , chairman and chief executive officer , and Arthur Chase , Sam Beigel , and Louis Polsky will resign as officers and directors of the menswear retailer. Mr. Roth , who has been on leave from his posts , will be succeeded by Geoffrie D. Lurie of GDL Management Inc. , which is Eagle 's crisis manager. Mr. Lurie is currently co-chief executive.

Coreference Tagged Text
Eagle Clothes Inc. , which is operating under Chapter 11 of the federal Bankruptcy Code , said it reached an agreement with its creditors. Under the accord , Albert Roth , chairman and chief executive officer , and Arthur Chase , Sam Beigel , and Louis Polsky will resign as officers and directors of the menswear retailer. Mr. Roth , who has been on leave from his posts , will be succeeded by Geoffrie D. Lurie of GDL Management Inc. , which is Eagle 's crisis manager . Mr. Lurie is currently co-chief executive.

Our Approach
- Two different approaches are used for the two types of coreferents
- Mainly two modules
  - Pronominal resolution module
  - Non-pronominal resolution module
- Pronominal resolution identifies the noun phrase (NP) that a pronoun refers to
- Non-pronominal resolution identifies an NP that refers to another NP

Pronominal Resolution Module
- Pronominal resolution is done using a refined salience measure
- Not all pronouns refer back to an entity
  - For example, in the sentence "It will rain today"
    - The pronoun "It" does not refer to any entity
    - Such an instance of "it" is called pleonastic "it"
  - Before we do pronominal resolution, we need to identify such non-anaphoric pronouns

Pronominal Resolution Module (Contd.)
- The first step in the pronominal resolution module is to
  - identify the non-anaphoric pronouns, and
  - filter the non-anaphoric pronouns from the text
- The identification of non-anaphoric pronouns is done using Conditional Random Fields (CRFs), a machine learning approach

Pronominal Resolution Module (Contd.)
- Using CRFs we train a model for detecting non-anaphoric pronouns
  - The features used for training are
    - Word
    - Part-of-speech (POS) tag
    - Window of five words
- Non-anaphoric pronoun filtering has been observed to improve the results
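
The feature set above can be sketched as a per-token feature dictionary. This is only an illustration: it assumes that "window of five words" means the target token plus two tokens on either side, and the function name `token_features`, the `<PAD>` marker and the POS tags in the example are hypothetical choices, not taken from the original system.

```python
def token_features(words, pos_tags, i):
    """Build features for token i: its word, its POS tag,
    and the words/tags in a five-token window (two on each side)."""
    feats = {"word": words[i].lower(), "pos": pos_tags[i]}
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(words):
            feats[f"word[{offset:+d}]"] = words[j].lower()
            feats[f"pos[{offset:+d}]"] = pos_tags[j]
        else:
            # Positions outside the sentence get a padding marker.
            feats[f"word[{offset:+d}]"] = "<PAD>"
            feats[f"pos[{offset:+d}]"] = "<PAD>"
    return feats


# The pleonastic "it" example from the slides, with illustrative POS tags.
words = ["It", "will", "rain", "today"]
pos_tags = ["PRP", "MD", "VB", "NN"]
features = token_features(words, pos_tags, 0)
```

Each pronoun token would then carry a label such as anaphoric or non-anaphoric, and the feature dictionaries would be handed to whichever CRF toolkit is in use.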

Pronominal Resolution Module (Contd.)
- The second step is the core task of the pronoun resolution module
  - the identification of the antecedent for a pronoun
- It uses a refined salience measure based approach
- WordNet and NE tag information are used to match the category of the pronoun and the antecedent

Pronominal Resolution Module (Contd.)
- The table below shows the refined salience factors and the weights assigned to them

Salience factor | Weight
Current sentence (sentence in which the pronoun occurs) | 100
Preceding sentences, up to four sentences from the current sentence | Reduce sentence score by 10
Current clause (clause in which the pronoun occurs) | 100 for possessive pronouns; 50 for non-possessive pronouns
Immediate clause (clause preceding or following the current clause) | 50 for possessive pronouns; 100 for non-possessive pronouns
Non-immediate clause (neither the current nor the immediate clause) | 50
Possessive NP | 65
Existential NP | 70
Subject | 80
Direct object | 50
Indirect object | 40
Complement of PP | 30
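
A minimal sketch of how the weights in the table might be combined into a score for one candidate antecedent. It assumes that the factors are summed (as in earlier salience-based resolvers) and that the sentence score drops by 10 for each sentence of distance; the slides state neither explicitly. All function and key names are hypothetical.

```python
# Weights for the grammatical role of the candidate NP, copied from the table.
ROLE_WEIGHTS = {
    "possessive_np": 65,
    "existential_np": 70,
    "subject": 80,
    "direct_object": 50,
    "indirect_object": 40,
    "pp_complement": 30,
}


def sentence_score(distance):
    """100 for the current sentence, minus 10 per preceding sentence,
    for up to four preceding sentences (assumed reading of the table)."""
    if not 0 <= distance <= 4:
        return 0
    return 100 - 10 * distance


def clause_score(clause, possessive_pronoun):
    """Clause weight depends on whether the pronoun is possessive."""
    if clause == "current":
        return 100 if possessive_pronoun else 50
    if clause == "immediate":
        return 50 if possessive_pronoun else 100
    return 50  # non-immediate clause


def salience(distance, clause, role, possessive_pronoun):
    """Sum the sentence, clause and role weights for one candidate."""
    return (sentence_score(distance)
            + clause_score(clause, possessive_pronoun)
            + ROLE_WEIGHTS.get(role, 0))


# A subject in the pronoun's own clause, non-possessive pronoun.
score_a = salience(0, "current", "subject", False)
# A direct object two sentences back, possessive pronoun.
score_b = salience(2, "non_immediate", "direct_object", True)
```

Under these assumptions the candidate with the highest score that also passes the WordNet and NE category check would be chosen as the antecedent.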

Non-Pronominal Resolution Module
- Uses CRFs, a machine learning method
- Preparing the training data for CRF learning
  - All positive pairs and negative pairs of NPs are taken for training
  - A positive pair consists of an antecedent NP and its anaphor NP
  - All NPs between an anaphor and its antecedent are the negative NPs
  - NPs containing pronouns are not considered
  - The NP on the left is taken as the antecedent NP and the NP on the right as the anaphor NP
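
The pair-generation scheme above can be sketched as follows. The sketch assumes that the antecedent of an anaphor is its closest preceding NP in the same gold chain, and that each intervening NP is paired with the anaphor as a negative example; the chain ids in the example are illustrative and are not the gold annotation of the sample text.

```python
def make_training_pairs(mentions):
    """mentions: list of (np_text, chain_id) in document order, with
    chain_id None for NPs that belong to no chain.
    Returns (antecedent, anaphor, label) triples, label 1 or 0."""
    pairs = []
    for j, (anaphor, chain) in enumerate(mentions):
        if chain is None:
            continue
        # Walk leftwards to the closest NP in the same chain.
        for i in range(j - 1, -1, -1):
            if mentions[i][1] == chain:
                pairs.append((mentions[i][0], anaphor, 1))
                # Every NP in between yields a negative pair.
                for k in range(i + 1, j):
                    pairs.append((mentions[k][0], anaphor, 0))
                break
    return pairs


# NPs from the sample text with hypothetical chain ids (pronouns excluded).
mentions = [
    ("Eagle Clothes Inc.", 1),
    ("an agreement", 2),
    ("the accord", 2),
    ("Albert Roth", 3),
    ("the menswear retailer", 1),
    ("Mr. Roth", 3),
]
pairs = make_training_pairs(mentions)
```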

Features: Non-pronominal Resolution Module
- The features used for training are
  - Distance feature
    - The possible values are 0, 1, 2, ...
    - Calculated based on the number of sentences between the NPs
  - Definite NP
    - If the antecedent NP is a definite NP, the value is 1, else 0
  - Demonstrative NP
    - If the antecedent NP is demonstrative, the value is 1, else 0
  - String match
    - The possible values are between 0 and 1
    - Calculated as the ratio of the number of words matched between the NPs to the total number of words of the anaphor NP
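
A rough sketch of the four features above, written under several assumptions: NPs are non-empty strings tokenized on whitespace, matching is case-insensitive, the distance is the difference between sentence indices, and definiteness and demonstrativeness are read off the first word. None of these details are given in the slides, and the function names are hypothetical.

```python
DEMONSTRATIVES = {"this", "that", "these", "those"}


def sentence_distance(antecedent_sentence, anaphor_sentence):
    """0 if both NPs are in the same sentence, 1 if adjacent, and so on."""
    return anaphor_sentence - antecedent_sentence


def is_definite(antecedent):
    """1 if the antecedent NP starts with 'the' (crude heuristic)."""
    return int(antecedent.lower().split()[0] == "the")


def is_demonstrative(antecedent):
    """1 if the antecedent NP starts with a demonstrative."""
    return int(antecedent.lower().split()[0] in DEMONSTRATIVES)


def string_match(antecedent, anaphor):
    """Share of the anaphor's words that also occur in the antecedent."""
    antecedent_words = set(antecedent.lower().split())
    anaphor_words = anaphor.lower().split()
    matched = sum(1 for word in anaphor_words if word in antecedent_words)
    return matched / len(anaphor_words)


ratio = string_match("Albert Roth", "Mr. Roth")
```

Because the ratio is taken over the anaphor's words only, a short anaphor such as "Eagle" fully matches "Eagle Clothes Inc." and scores 1.0.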

Features: Non-pronominal Resolution Module (Contd.)
- Number agreement
  - Uses the gender data file (Bergsma and Lin, 2006) provided by CoNLL
  - Also uses POS information
- Gender agreement
  - Uses the gender data file (Bergsma and Lin, 2006) provided by CoNLL
- Alias feature
  - The possible values are 0 or 1
  - This is obtained using three methods
    - Comparing the heads of the NPs: if both are the same, it is scored as 1
    - If both NPs start with NNP or NNPS POS tags and they are the same, it is scored as 1
    - Acronym match: if one NP is an acronym of the other, it is scored as 1
- Proper NP
  - If both NPs are proper NPs, the value is 1, else 0
- NE tag info
  - If an NE tag is present, the value is 1, else 0
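
The three alias checks can be sketched as below. Several simplifications are assumed here: the head of an NP is taken to be its last word, the second check is read as "both NPs begin with a proper-noun tag and share the same first word", and an acronym is formed from capitalized initials. These are one possible interpretation, not the original implementation.

```python
PROPER_TAGS = ("NNP", "NNPS")


def head(np):
    """Crude head finder: the last word of the NP, lower-cased."""
    return np.split()[-1].lower()


def is_acronym(short, full):
    """True if `short` spells the capitalized initials of `full`."""
    initials = "".join(w[0] for w in full.split() if w[0].isupper())
    letters = short.replace(".", "")
    return len(letters) > 1 and letters.upper() == initials


def alias(np1, pos1, np2, pos2):
    """Return 1 if any of the three alias checks succeeds, else 0.
    pos1 and pos2 are the POS tag sequences of the two NPs."""
    if head(np1) == head(np2):
        return 1
    if (pos1[0] in PROPER_TAGS and pos2[0] in PROPER_TAGS
            and np1.split()[0] == np2.split()[0]):
        return 1
    if is_acronym(np1, np2) or is_acronym(np2, np1):
        return 1
    return 0


same_head = alias("Mr. Roth", ["NNP", "NNP"], "Albert Roth", ["NNP", "NNP"])
```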

Non-Pronominal Resolution Module (Contd.)
- The semantic class information (noun category) obtained from WordNet is used for filtering
  - Pairs whose semantic features do not match are filtered out

Complete Coreference Chain Building
- The coreferring pairs obtained from the pronominal resolution system and the non-pronominal system are merged to generate the complete coreference chains
- The merging is done as follows:
  - A member of a coreference pair is compared with all the members of the other identified coreference pairs
  - If it occurs in any one of those pairs, the two pairs are grouped
  - This process is repeated for all the members of the identified pairs
  - The members in each group are ordered by their position in the document to form the chain
- We show sample outputs of chain building in the next few slides
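
The merging procedure above amounts to grouping pairs that share a member. One way to sketch it is with a union-find structure, which is a swapped-in technique here rather than something the slides specify. Mentions are represented as hypothetical `(token_offset, text)` tuples so that each group can be ordered by document position.

```python
def build_chains(pairs):
    """Group coreferring pairs that share a member into chains,
    each chain sorted by the mentions' position in the document."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Link the two members of every pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each group.
    groups = {}
    for mention in list(parent):
        groups.setdefault(find(mention), []).append(mention)

    # Order members inside each chain, then the chains themselves.
    return sorted((sorted(g) for g in groups.values()), key=lambda c: c[0])


# Pairs from both modules; the offsets are illustrative.
pairs = [
    ((0, "Eagle Clothes Inc."), (9, "it")),
    ((20, "Albert Roth"), (45, "Mr. Roth")),
    ((9, "it"), (14, "its")),
    ((45, "Mr. Roth"), (53, "his")),
]
chains = build_chains(pairs)
```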

Evaluation
- The data used for training and testing is the English portion of OntoNotes
- The training and development data consisted of 1876 files from different genres, viz. Newswire, Broadcast News, Broadcast Conversation, Web blogs and magazine articles
- The metrics used for evaluating the complete system are
  - MUC
  - B-Cubed
  - CEAFE

Evaluation
- We evaluated the pronominal resolution module separately using the development data
- We also evaluated the non-anaphor detection engine
  - In the evaluation of the pronominal resolution module we studied how non-anaphoric pronoun detection improves the results

Results: Pronominal Resolution Module

Evaluation of the non-anaphoric pronoun detection component

Type of pronoun | Actual (gold standard) | System identified correctly | Accuracy (%)
Anaphoric pronouns | 939 | 908 | 96.6
Non-anaphoric pronouns | 160 | 81 | 50.6
Total | 1099 | 989 | 89.9

Evaluation of the pronominal resolution module

System type | Total anaphoric pronouns | System identified pronouns | System correctly resolved pronouns | Prec (%)
Without non-anaphoric pronoun detection | 939 | 1099 | 693 | 63.1
With non-anaphoric pronoun detection | 939 | 987 | 693 | 70.2
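
The precision figures in the second table can be reproduced from the counts in the first. The sketch below rests on one reading that is consistent with the numbers but not stated in the slides: with filtering, the system attempts the 908 anaphoric pronouns it kept plus the 79 non-anaphoric pronouns the detector missed. Variable names are illustrative.

```python
def percent(correct, total):
    """Percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


anaphoric = 939               # gold anaphoric pronouns
non_anaphoric = 160           # gold non-anaphoric pronouns
anaphoric_kept = 908          # anaphoric pronouns kept by the detector
non_anaphoric_filtered = 81   # non-anaphoric pronouns correctly removed
correctly_resolved = 693      # pronouns resolved correctly in both settings

# Without filtering, every pronoun is passed to the resolver.
attempted_without = anaphoric + non_anaphoric
# With filtering, the missed non-anaphoric pronouns still slip through.
attempted_with = anaphoric_kept + (non_anaphoric - non_anaphoric_filtered)

precision_without = percent(correctly_resolved, attempted_without)
precision_with = percent(correctly_resolved, attempted_with)
```

The number of correctly resolved pronouns stays at 693, so the gain in precision comes entirely from the smaller number of pronouns the resolver attempts.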

Results: Complete System

Metric | Mention Detection (Recall / Precision / F1) | Coreference Resolution (Recall / Precision / F1)
MUC | 68.1 / 61.5 / 64.6 | 52.1 / 49.9 / 50.9
B-CUBED | 68.1 / 61.5 / 64.6 | 66.6, 67.1 (third value missing)
CEAFE | 68.1 / 61.5 / 64.6 | 42.8 / 44.9 / 43.8

Results Discussion
- On analysis of the output we found mainly three types of errors. They are
  - Newly invented chains
    - The system identifies new chains that are not found in the gold standard annotation
    - This reduces the precision of the system
    - This is caused by using string match as one of the features

Results Discussion (Contd.)
- Only head nouns in the chain
  - When the system selects a pair for identifying coreference, the pair contains only the head noun instead of the full phrase
  - In the phrase "the letters sent in recent days", the system identifies "the letters" instead of the whole phrase
  - This affects both the precision and the recall of the system

Conclusion
- We presented a coreference resolution system that combines pronominal resolution using a refined salience based approach with non-pronominal resolution using CRFs, a machine learning approach
- Identification of non-anaphoric pronouns improves the precision
- In the non-pronominal resolution algorithm, string match is an effective feature
  - But this feature is also found to introduce errors
  - We need additional contextual and semantic features to reduce these errors
- The results on the development set are encouraging

Problems in Anaphora Resolution
- Ambiguous sentences
  - We gave the bananas to the monkeys because they were hungry.
  - We gave the bananas to the monkeys because they were ripe.
  - We gave the bananas to the monkeys because they were here.
- Complement anaphora
  - (1) Only a few of the children ate their ice-cream. They ate the strawberry flavour first.
  - (2) Only a few of the children ate their ice-cream. They threw it around the room instead.

The Prime Minister of New Zealand visited us yesterday. The visit was the first time she had come to New York since 1998.
- If the second sentence is quoted by itself, it is necessary to resolve the anaphor:
  - The visit was the first time the Prime Minister of New Zealand had come to New York since 1998.
- However, "the Prime Minister of New Zealand" is an office of state, while "she" refers to the person currently occupying that office. It could easily be that a Prime Minister of New Zealand had visited New York after 1998 and before the present day, while the present incumbent had not.

Corpus Creation for ML Method
- Annotation guidelines
  - PALinkA
  - MUC
  - GATE
- Give examples

Tools
- GATE
- Java-RAP (pronouns)
- GUITAR (Poesio & Kabadjov, 2004; Kabadjov, 2007)
- BART (Versley et al., 2008)

Tasks and Conferences
- CoNLL shared task
- Conference: Discourse Anaphora and Anaphor Resolution Colloquium (DAARC)

Where is it required?
- Machine Translation
- Information Extraction
- Summarization
- And in almost all NLU applications

References
- Massimo Poesio, slides: "Anaphora resolution for Practical task"
- Ruslan Mitkov: "MARS a Knowledge Poor anaphora resolution system"
- More from other sources

Thank You