CS 626-449: NLP, Speech and Web Topics in AI
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 37: Semantic Role Extraction (obtaining Dependency Parse)
Vauquois Triangle
- Interlingua based: do deep semantic processing (full analysis) before entering the target language
- Transfer based: partial analysis, then transfer to the target language
- Direct: enter the target language immediately, through a dictionary
Analysis climbs the source-language side of the triangle; generation descends the target-language side.
Vauquois: an eminent French Machine Translation researcher, originally a physicist
Universal Networking Language
- Universal Words (UWs)
- Relations
- Attributes
- Knowledge Base
UNL Graph
He forwarded the mail to the minister.
forward(icl>send).@entry.@past
  --agt--> he(icl>person)
  --gol--> minister(icl>person)
  --obj--> mail(icl>collection).@def
AGT / AOJ / OBJ
- AGT (Agent). Definition: Agt defines a thing which initiates an action.
- AOJ (Thing with attribute). Definition: Aoj defines a thing which is in a state or has an attribute.
- OBJ (Affected thing). Definition: Obj defines a thing in focus which is directly affected by an event or state.
Examples
- John broke the window.            agt(break.@entry.@past, John)
- This flower is beautiful.         aoj(beautiful.@entry, flower)
- He blamed John for the accident.  obj(blame.@entry.@past, John)
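The relation notation above maps naturally onto labelled triples. Below is a minimal sketch (not from the original slides) of holding and querying such UNL relations in Python; the UNLGraph class and its method names are illustrative assumptions, not an actual UNL toolkit API.

```python
# Minimal sketch: a UNL expression as a set of labelled relation triples.
# Class and method names are illustrative, not a standard UNL library.
from collections import defaultdict

class UNLGraph:
    def __init__(self):
        # relation label -> list of (head UW, dependent UW) pairs
        self.relations = defaultdict(list)

    def add(self, rel, head, dep):
        self.relations[rel].append((head, dep))

    def arguments(self, rel):
        """Return all (head, dependent) pairs linked by a given relation."""
        return self.relations.get(rel, [])

# "John broke the window."  ->  agt(break.@entry.@past, John)
g = UNLGraph()
g.add("agt", "break(icl>do).@entry.@past", "John(icl>person)")
g.add("obj", "break(icl>do).@entry.@past", "window(icl>concrete thing).@def")
print(g.arguments("agt"))   # [('break(icl>do).@entry.@past', 'John(icl>person)')]
```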
BEN (Beneficiary)
Definition: Ben defines a beneficiary or victim that is not directly related to an event or state.
- Can I do anything for you?
  ben(do.@entry.@interrogation.@politeness, you)
  obj(do.@entry.@interrogation.@politeness, anything)
  agt(do.@entry.@interrogation.@politeness, I)
PUR (Purpose or objective)
Definition: Pur defines the purpose or objective of the agent of an event, or the purpose for which a thing exists.
- This budget is for food.
  pur(food.@entry, budget)
  mod(budget, this)
RSN (Reason)
Definition: Rsn defines a reason why an event or a state happens.
- They selected him for his honesty.
  agt(select(icl>choose).@entry, they)
  obj(select(icl>choose).@entry, he)
  rsn(select(icl>choose).@entry, honesty)
TIM (Time)
Definition: Tim defines the time an event occurs or a state is true.
- I wake up at noon.
  agt(wake up.@entry, I)
  tim(wake up.@entry, noon(icl>time))
TMF (Initial time)
Definition: Tmf defines the time an event starts.
- The meeting started from morning.
  obj(start.@entry.@past, meeting.@def)
  tmf(start.@entry.@past, morning(icl>time))
TMT (Final time)
Definition: Tmt defines the time an event ends.
- The meeting continued till evening.
  obj(continue.@entry.@past, meeting.@def)
  tmt(continue.@entry.@past, evening(icl>time))
PLC (Place)
Definition: Plc defines the place an event occurs, or a state is true, or a thing exists.
- He is very famous in India.
  aoj(famous.@entry, he)
  man(famous.@entry, very)
  plc(famous.@entry, India)
PLF (Initial place)
Definition: Plf defines the place an event begins or a state becomes true.
- Participants come from the whole world.
  agt(come.@entry, participant.@pl)
  plf(come.@entry, world)
  mod(world, whole)
PLT (Final place)
Definition: Plt defines the place an event ends or a state becomes false.
- We will go to Delhi.
  agt(go.@entry.@future, we)
  plt(go.@entry.@future, Delhi)
INS (Instrument)
Definition: Ins defines the instrument used to carry out an event.
- I solved it with a computer.
  agt(solve.@entry.@past, I)
  ins(solve.@entry.@past, computer)
  obj(solve.@entry.@past, it)
Attributes
- Constitute the syntax of UNL.
- Play the role of bridging the conceptual world and the real world in UNL expressions.
- Show how and when the speaker views what is said, and with what intention, feeling, and so on.
- Seven types:
  - Time with respect to the speaker
  - Aspects
  - Speaker's view of reference
  - Speaker's emphasis, focus, topic, etc.
  - Convention
  - Speaker's attitudes
  - Speaker's feelings and viewpoints
Tense: @past
He went there yesterday.
The past tense is normally expressed by @past.
{unl}
agt(go.@entry.@past, he)
…
{/unl}
Aspects: @progress
It's raining hard.
{unl}
man(rain.@entry.@present.@progress, hard)
{/unl}
Speaker's view of reference
- @def (specific concept, already referred): The house on the corner is for sale.
- @indef (non-specific class): There is a book on the desk.
- @not is always attached to the UW which is negated.
  He didn't come.
  agt(come.@entry.@past.@not, he)
Speaker's emphasis
- @emphasis
  John his name is.
  mod(name, he)
  aoj(John.@emphasis.@entry, name)
- @entry denotes the entry point or main UW of a UNL expression.
Subcategorization Frames
- Specify the categorial class of the lexical item and the environment it requires.
- Examples:
  kick:  [V; _ NP]
  cry:   [V; _ ]
  rely:  [V; _ PP]
  put:   [V; _ NP PP]
  think: [V; _ S']
Subcategorization Rules
V → y / [_ NP]
        [_ ]
        [_ PP]
        [_ NP PP]
        [_ S']
Subcategorization Rules
The boy relied on the friend.
1. S → NP VP
2. VP → V (NP) (PP) (S') …
3. NP → Det N
4. V → rely / [_ PP]
5. P → on / [_ NP]
6. Det → the
7. N → boy, friend
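As a rough illustration of how such frames could drive a lexical check, the following sketch stores each verb's subcategorization frames as lists of expected complement categories and tests an observed complement sequence against them. The lexicon contents and the function name are assumptions made for illustration only.

```python
# Sketch: checking an observed complement sequence against a verb's
# subcategorization frames. Lexicon entries mirror the slide's examples.
SUBCAT = {
    "kick":  [["NP"]],
    "cry":   [[]],
    "rely":  [["PP"]],
    "put":   [["NP", "PP"]],
    "think": [["SBAR"]],          # S' complement
}

def licenses(verb, complements):
    """True if the verb subcategorizes for exactly this complement sequence."""
    return list(complements) in SUBCAT.get(verb, [])

print(licenses("rely", ["PP"]))   # True  ("relied on the friend")
print(licenses("rely", ["NP"]))   # False ("*relied the friend")
```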
Semantically Odd Constructions
Can we exclude these two ill-formed structures?
- *The boy frightened sincerity.
- *Sincerity kicked the boy.
→ Selectional Restrictions
Selectional Restrictions
- Inherent properties of nouns: [+/-ABSTRACT], [+/-ANIMATE]
- E.g., sincerity [+ABSTRACT], boy [+ANIMATE]
Selectional Rules
A selectional rule specifies certain selectional restrictions associated with a verb.
V → y        / [+/-ABSTRACT] __ [+/-ANIMATE]
V → frighten / [+/-ABSTRACT] __ [+ANIMATE]
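A small sketch of how selectional restrictions could be enforced on top of such feature annotations follows; the feature dictionary, the rule format and the function names are assumptions, not a standard resource or the authors' implementation.

```python
# Sketch: selectional restrictions as feature constraints on a verb's
# subject and object slots. Feature values follow the slide's examples.
NOUN_FEATURES = {
    "sincerity": {"ABSTRACT": True,  "ANIMATE": False},
    "boy":       {"ABSTRACT": False, "ANIMATE": True},
}

# verb -> (constraint on subject, constraint on object); None = unrestricted
SELECTIONAL = {
    "frighten": (None, {"ANIMATE": True}),   # the object must be animate
    "kick":     ({"ANIMATE": True}, None),   # the subject must be animate
}

def satisfies(noun, constraint):
    if constraint is None:
        return True
    feats = NOUN_FEATURES.get(noun, {})
    return all(feats.get(k) == v for k, v in constraint.items())

def well_formed(subj, verb, obj):
    subj_c, obj_c = SELECTIONAL.get(verb, (None, None))
    return satisfies(subj, subj_c) and satisfies(obj, obj_c)

print(well_formed("boy", "frighten", "sincerity"))  # False (*The boy frightened sincerity)
print(well_formed("sincerity", "kick", "boy"))      # False (*Sincerity kicked the boy)
```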
Subcategorization Frames
- forward:    V __ NP PP   e.g., We will be forwarding our new catalogue to you
- invitation: N __ PP      e.g., An invitation to the party
- accessible: A __ PP      e.g., A program making science more accessible to young people
Thematic Roles
The man forwarded the mail to the minister.
forward: V __ NP PP
[Event FORWARD ([Thing THE MAN], [Thing THE MAIL], [Path TO THE MINISTER])]
How to define the UWs in the UNL Knowledge Base?
- Nominal concept
  - Abstract
  - Concrete
- Verbal concept
  - Do
  - Occur
  - Be
- Adjective concept
- Adverbial concept
Nominal Concept: Abstract thing
abstract thing{(icl>thing)}
  culture(icl>abstract thing)
    civilization(icl>culture{>abstract thing})
  direction(icl>abstract thing)
    east(icl>direction{>abstract thing})
  duty(icl>abstract thing)
    mission(icl>duty{>abstract thing})
    responsibility(icl>duty{>abstract thing})
      accountability{(icl>responsibility>duty)}
  event(icl>abstract thing{, icl>time>abstract thing})
    meeting(icl>event{>abstract thing, icl>group>abstract thing})
      conference(icl>meeting{>event})
        TV conference{(icl>conference>meeting)}
Nominal Concept: Concrete thing
concrete thing{(icl>thing, icl>place>thing)}
  building(icl>concrete thing)
    factory(icl>building{>concrete thing})
    house(icl>building{>concrete thing})
  substance(icl>concrete thing)
    cloth(icl>substance{>concrete thing})
      cotton(icl>cloth{>substance})
    fiber(icl>substance{>concrete thing})
      synthetic fiber{(icl>fiber>substance)}
      textile fiber{(icl>fiber>substance)}
    liquid(icl>substance{>concrete thing})
      beverage(icl>food{, icl>liquid>substance})
        coffee(icl>beverage{>food})
        liquor(icl>beverage{>food})
          beer(icl>liquor{>beverage})
Verbal concept: do
do({icl>do, }agt>thing, gol>thing, obj>thing)
  express({icl>do(}agt>thing, gol>thing, obj>thing{)})
    state(icl>express(agt>thing, gol>thing, obj>thing))
      explain(icl>state(agt>thing, gol>thing, obj>thing))
  add({icl>do(}agt>thing, gol>thing, obj>thing{)})
  change({icl>do(}agt>thing, gol>thing, obj>thing{)})
    convert(icl>change(agt>thing, gol>thing, obj>thing))
  classify({icl>do(}agt>thing, gol>thing, obj>thing{)})
    divide(icl>classify(agt>thing, gol>thing, obj>thing))
Verbal concept: occur and be
occur({icl>occur, }gol>thing, obj>thing)
  melt({icl>occur(}gol>thing, obj>thing{)})
  divide({icl>occur(}gol>thing, obj>thing{)})
  arrive({icl>occur(}obj>thing{)})
be({icl>be, }aoj>thing{, ^obj>thing})
  exist({icl>be(}aoj>thing{)})
  born({icl>be(}aoj>thing{)})
How to define the UWs in the UNL Knowledge Base?
In order to distinguish among the verb classes headed by 'do', 'occur' and 'be', the following features are used:

UW      | [need an agent] | [need an object] | English example
'do'    | +               | +                | "to kill"
'occur' | -               | +                | "to fall"
'be'    | -               | -                | "to know"
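The table can be restated as a simple lookup from the two binary features to the verbal UW superclass, as in the sketch below. The function name is hypothetical, and the combination [+agent, -object] is not covered by the table, so it is left unclassified here.

```python
# Sketch: choosing the verbal UW superclass from the two features in the
# table above ([need an agent], [need an object]).
def verbal_uw_class(needs_agent, needs_object):
    if needs_agent and needs_object:
        return "do"      # e.g., "to kill"
    if not needs_agent and needs_object:
        return "occur"   # e.g., "to fall"
    if not needs_agent and not needs_object:
        return "be"      # e.g., "to know"
    return None          # [+agent, -object] is not covered by the table

print(verbal_uw_class(True, True))     # do
print(verbal_uw_class(False, True))    # occur
print(verbal_uw_class(False, False))   # be
```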
How to define the UWs in the UNL Knowledge Base?
The verbal UWs (do, occur, be) also take some pre-defined semantic cases, as follows:

UW      | pre-defined cases           | English example
'do'    | necessarily takes agt>thing | "to kill"
'occur' | necessarily takes obj>thing | "to fall"
'be'    | necessarily takes aoj>thing | "to know"

Complex sentence
I want to watch this movie.
want(icl>).@entry.@past
  --agt--> I(iof>person)
  --obj--> :01
:01  watch(icl>do).@entry.@inf
  --agt--> I(iof>person)
  --obj--> movie(icl>).@def
Approach to UNL Generation
Problem Definition
- Generate UNL expressions for English sentences
  - in a robust and scalable manner,
  - using syntactic analysis and lexical resources extensively.
- This needs
  - detecting semantically relatable entities, and
  - solving attachment problems.
Semantically Relatable Sequences (SRS)
Definition: A semantically relatable sequence (SRS) of a sentence is a group of words in the sentence (not necessarily consecutive) that appear in the semantic graph of the sentence as linked nodes or as nodes with speech-act labels. (This is motivated by the UNL representation.)
SRS as an intermediary
Source Language Sentence → SRS → UNL → Target Language Sentence
Example to illustrate SRS
"The man bought a new car in June"
bought (past tense)
  agent → man (the: definite)
  object → car (a: indefinite; modifier: new)
  time → June (in: modifier)
Sequences from "the man bought a new car in June"
a. {man, bought}
b. {bought, car}
c. {bought, in, June}
d. {new, car}
e. {the, man}
f. {a, car}
Basic questions
- Which words can form semantic constituents, which we call Semantically Relatable Sequences (SRS)?
- What, after all, are the SRSs of the given sentence?
- What semantic relations can link the words in an SRS, and the SRSs themselves?
Postulate
A sentence needs to be broken into sequences of at most three forms:
- {CW, CW}
- {CW, FW, CW}
- {FW, CW}
where CW refers to a content word or a clause, and FW to a function word.
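As a minimal sketch, the postulate can be checked mechanically once each element of a candidate sequence is labelled CW or FW. The function below is illustrative and assumes the CW/FW labels are supplied; in the system they come from parse tree tags, as described later.

```python
# Sketch: an SRS as a tuple of words/clause markers, constrained to the
# three postulated shapes.
def is_valid_srs(srs, kind_of):
    """kind_of maps each element to 'CW' or 'FW' (a clause counts as CW)."""
    pattern = tuple(kind_of[w] for w in srs)
    return pattern in {("CW", "CW"), ("CW", "FW", "CW"), ("FW", "CW")}

kinds = {"man": "CW", "bought": "CW", "in": "FW", "June": "CW", "the": "FW"}
print(is_valid_srs(("man", "bought"), kinds))         # True  {CW, CW}
print(is_valid_srs(("bought", "in", "June"), kinds))  # True  {CW, FW, CW}
print(is_valid_srs(("the", "man"), kinds))            # True  {FW, CW}
```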
SRS and Language Phenomena
Movement: Preposition Stranding
John, we laughed at.
  (we, laughed.@entry) ----- (CW, CW)
  (laughed.@entry, at, John) ----- (CW, FW, CW)
Movement: Topicalization
The problem, we solved.
  (we, solved.@entry) ----- (CW, CW)
  (solved.@entry, problem) ----- (CW, CW)
  (the, problem) ----- (FW, CW)
Movement: Relative Clauses
John told a joke which we had already heard.
  (John, told.@entry) ----- (CW, CW)
  (told.@entry, :01) ----- (CW, CW)
  SCOPE 01 (we, had, heard.@entry) ----- (CW, FW, CW)
  SCOPE 01 (already, heard.@entry) ----- (CW, CW)
  SCOPE 01 (heard.@entry, which, joke) ----- (CW, FW, CW)
  SCOPE 01 (a, joke) ----- (FW, CW)
Movement: Interrogatives
Who did you refer her to?
  (did, refer.@entry.@interrogative) ----- (FW, CW)
  (you, refer.@entry.@interrogative) ----- (CW, CW)
  (refer.@entry.@interrogative, her) ----- (CW, CW)
  (refer.@entry.@interrogative, to, who) ----- (CW, FW, CW)
Empty Pronominals: to-infinitivals
Bill was wise to sell the piano.
  (wise.@entry, SCOPE 01) ----- (CW, CW)
  SCOPE 01 (sell.@entry, piano) ----- (CW, CW)
  (Bill, was, wise.@entry) ----- (CW, FW, CW)
  SCOPE 01 (Bill, to, sell.@entry) ----- (CW, FW, CW)
  SCOPE 01 (the, piano) ----- (FW, CW)
Empty pronominals: Gerundial
The cat leapt down spotting a thrush on the lawn.
  (The, cat) ----- (FW, CW)
  (cat, leapt.@entry) ----- (CW, CW)
  (leapt.@entry, down) ----- (CW, CW)
  (leapt.@entry, SCOPE 01) ----- (CW, CW)
  SCOPE 01 (spotting.@entry, thrush) ----- (CW, CW)
  SCOPE 01 (spotting.@entry, on, lawn) ----- (CW, FW, CW)
PP Attachment
John cracked the glass with a stone.
  (John, cracked.@entry) ----- (CW, CW)
  (cracked.@entry, glass) ----- (CW, CW)
  (cracked.@entry, with, stone) ----- (CW, FW, CW)
  (a, stone) ----- (FW, CW)
  (the, glass) ----- (FW, CW)
SRS and PP attachment (Mohanty, Almeida, Bhattacharyya, 04)

Condition                                                                | Sub-condition                              | Attachment point
[PP] is subcategorized by the verb [V]                                   | [NP2] is licensed by a preposition [P]     | [NP2] is attached to the verb [V] (e.g., He forwarded the mail to the minister)
[PP] is subcategorized by the noun in [NP1]                              | [NP2] is licensed by a preposition [P]     | [NP2] is attached to the noun in [NP1] (e.g., John published six articles on machine translation)
[PP] is neither subcategorized by the verb [V] nor by the noun in [NP1]  | [NP2] refers to a [PLACE] / [TIME] feature | [NP2] is attached to the verb [V] (e.g., I saw Mary in her office; The girls met the teacher on different days)
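The decision table above, together with the parser fallback mentioned later for the attachment resolver, can be sketched as a small rule cascade. The boolean arguments stand in for the real resources (subcategorization database lookups, WordNet place/time features); this is an illustration of the logic, not the system's code.

```python
# Sketch of the PP attachment decision table.
def pp_attachment(verb_subcats_pp, noun_subcats_pp, np2_is_place_or_time):
    """Return the attachment point of [P NP2]."""
    if verb_subcats_pp:           # [PP] subcategorized by the verb [V]
        return "verb"             # e.g., "He forwarded the mail to the minister"
    if noun_subcats_pp:           # [PP] subcategorized by the noun in [NP1]
        return "noun1"            # e.g., "six articles on machine translation"
    if np2_is_place_or_time:      # neither subcategorizes; NP2 is PLACE/TIME
        return "verb"             # e.g., "I saw Mary in her office"
    return "parser"               # fall back to the parser's attachment

print(pp_attachment(True, False, False))   # verb
print(pp_attachment(False, True, False))   # noun1
print(pp_attachment(False, False, True))   # verb
```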
Linguistic Study to Computation
Syntactic constituents to semantic constituents
- A probabilistic parser (Charniak, 04) is used.
- Other resources: WordNet and the Oxford Advanced Learner's Dictionary.
- In a parse tree, tags give indications of CW and FW:
  - NP, VP, ADJP and ADVP → CW
  - PP (prepositional phrase), IN (preposition) and DT (determiner) → FW
Observation: Headwords of sibling nodes form SRSs
"John has bought a car."
SRS: {has, bought}, {a, car}, {bought, car}
Parse fragment (head words in parentheses, C = content, F = function):
  VP (bought) [C]
    AUX (has) [F]
    VBD (bought) [C]
    NP (car) [C]
      DT (a) [F]
      NN (car) [C]
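A toy sketch of this observation: given a head-annotated tree, pairs of adjacent sibling heads are collected as candidate SRSs. The Node class is a stand-in for real parser output, and only adjacent siblings are paired here for brevity.

```python
# Sketch: collect head-word pairs from adjacent sibling nodes of a
# head-annotated parse tree.
CW_TAGS = {"NP", "VP", "ADJP", "ADVP", "NN", "NNP", "VBD", "JJ", "S"}

class Node:
    def __init__(self, tag, head, children=()):
        self.tag, self.head, self.children = tag, head, list(children)

def kind(node):
    return "CW" if node.tag in CW_TAGS else "FW"

def sibling_srs(node, out=None):
    out = [] if out is None else out
    kids = node.children
    for left, right in zip(kids, kids[1:]):          # adjacent siblings
        out.append(((left.head, right.head), (kind(left), kind(right))))
    for child in kids:
        sibling_srs(child, out)
    return out

# "John has bought a car."  (fragment rooted at the VP)
vp = Node("VP", "bought", [
    Node("AUX", "has"),
    Node("VBD", "bought"),
    Node("NP", "car", [Node("DT", "a"), Node("NN", "car")]),
])
for pair, form in sibling_srs(vp):
    print(pair, form)
# ('has', 'bought') ('FW', 'CW')
# ('bought', 'car') ('CW', 'CW')
# ('a', 'car') ('FW', 'CW')
```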
Need: Resilience to wrong PP attachment
"John has published an article on linguistics"
- Use PP attachment heuristics
- Get {article, on, linguistics}
Parse fragment:
  VP (published) [C]
    VBD (published) [C]
    NP (article) [C]
      DT (an) [F]
      NN (article) [C]
    PP (on) [F]
      IN (on) [F]
      NP (linguistics) [C]
        NNS (linguistics) [C]
to-infinitival: "I forced him to watch this movie"
- The clause boundary is the VP node, which is labelled with SCOPE.
- The tag is modified to TO, a FW tag, indicating that it heads a to-infinitival clause.
- The NP node with head him is duplicated and inserted (depicted by shaded nodes in the slide) as a sibling of the VBD node with head forced, to bring out the existence of a semantic relation between force and him.
Linking of clauses: "John said that he was reading a novel"
- The head of the S node is marked as SCOPE.
- SRS: {said, that, SCOPE}
- Adverbial clauses have similar parse tree structures, except that the subordinating conjunction differs from that.
Parse fragment:
  VP (said) [C]
    VBD (said) [C]
    SBAR (that) [F]
      IN (that) [F]
      S (SCOPE) [C]
Implementation: Block diagram of the system
Input Sentence → Charniak Parser → parse tree modification and augmentation with head and scope information (Scope Handler) → Augmented Parse Tree → Attachment Resolver → Semantically Relatable Sequences Generator → Semantically Relatable Sequences
Resources consulted: WordNet 2.0 (noun classification, time and place features); sub-categorization database (THAT-clause as subcat property, preposition as subcat property)
Head determination
- Uses a bottom-up strategy to determine the headword for every node in the parse tree.
- Crucial in obtaining the SRSs, since wrong head information may end up getting propagated all the way up the tree.
- Processes the children of every node starting from the rightmost child and checks the head information already specified against the node's tag to determine the head of the node.
- Some special cases:
  - SBAR nodes
  - VP nodes with PRO insertion, copula, phrasal verbs, etc.
  - NP nodes with of-PP cases and conjunctions under them, which lead to scope creation.
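A naive sketch of bottom-up head determination: children are scanned from the right, and the first child whose tag is preferred for the parent category supplies the head. The preference table here is a simplification in the spirit of standard head-percolation rules, not the exact rules or special cases used in the system.

```python
# Sketch: naive bottom-up head percolation over (tag, children_or_word) trees.
HEAD_PREFS = {
    "VP": ["VBD", "VB", "VBN", "VBZ", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
    "PP": ["IN"],
    "S":  ["VP"],
}

def assign_heads(node):
    """node is (tag, word) or (tag, [children]); returns (tag, head, children)."""
    tag, body = node
    if isinstance(body, str):                   # leaf: the word is its own head
        return (tag, body, [])
    kids = [assign_heads(c) for c in body]
    head = kids[-1][1]                          # default: rightmost child's head
    for pref in HEAD_PREFS.get(tag, []):
        for ktag, khead, _ in reversed(kids):   # scan children right to left
            if ktag == pref:
                head = khead
                break
        else:
            continue
        break
    return (tag, head, kids)

tree = ("S", [("NP", [("NNP", "John")]),
              ("VP", [("VBD", "bought"), ("NP", [("DT", "a"), ("NN", "car")])])])
print(assign_heads(tree)[1])   # bought
```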
Scope handler
- Performs modification on the parse trees by inserting nodes in to-infinitival cases.
- Adjusts the tag and head information in the case of SBAR nodes.
Attachment resolver
- Takes a (CW1, FW, CW2) triple as input and checks
  - the time and place features of CW2,
  - the noun class of CW1, and
  - the subcategorization information for the (CW1, FW) pair
  to decide the attachment.
- If none of these yields a deterministic result, take the attachment indicated by the parser.
SRS generator
- Performs a breadth-first search on the parse tree and does detailed processing at every node N1 of the tree.
- S nodes which dominate entire clauses (main or embedded) are treated as CWs.
- SBAR and TO nodes are treated as FWs.
Algorithm
If the node N1 is a CW (new/JJ, published/VBD, fact/NN, boy/NN, John/NNP), perform the following checks:
  1. If the sibling N2 of N1 is a CW (car/NN, article/NN, SCOPE/S):
     Then create {CW, CW} ({new, car}, {published, article}, {boy, SCOPE}).
  2. If the sibling N2 is a FW (in/PP, that/SBAR, and/CC):
     Then check if N2 has a child FW, N3 (in/IN, that/IN), and a child CW, N4 (June/NN, SCOPE/S).
     - If yes: use the attachment resolver to decide the CW to which N3 and N4 attach; create {CW, FW, CW} ({published, in, June}, {fact, that, SCOPE}).
     - If no: check if the next sibling N5 of N1 is a CW (Mary/NN); if yes, create {CW, FW, CW} ({John, and, Mary}).
If the node N1 is a FW (the/DT, is/AUX, to/TO), perform the following checks:
  1. If the parent node is a CW (boy/NP, famous/VP), check if the sibling is an adjective.
     i. If yes (famous/JJ): create {CW, FW, CW} ({She, is, famous}).
     ii. If no (boy/NN): create {FW, CW} ({the, boy}, {has, bought}).
  2. If the parent node N6 is a FW (to/TO) and the sibling node N7 is a CW (learn/VB):
     Use the attachment resolver to decide on the preceding CW to which N6 and N7 can attach; create {CW, FW, CW} ({exciting, to, learn}).
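A condensed sketch of the two CW-node branches of this algorithm (a CW sibling giving {CW, CW}; a FW sibling with a child FW and a child CW giving {CW, FW, CW}) is given below. The FW-node branches, the conjunction case and real attachment resolution are omitted; the stub simply attaches to the preceding CW sibling, which happens to match the {article, on, linguistics} example above.

```python
# Condensed sketch of SRS generation over a head-annotated tree.
from collections import deque

CW_TAGS = {"NP", "VP", "NN", "NNP", "NNS", "VBD", "JJ", "S"}
FW_TAGS = {"PP", "IN", "DT", "SBAR", "TO", "AUX", "CC"}

class Node:
    def __init__(self, tag, head, children=()):
        self.tag, self.head, self.children = tag, head, list(children)

def generate_srs(root):
    """Breadth-first pass; emits {CW, CW} and {CW, FW, CW} sequences."""
    srs, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        kids = node.children
        for i, n1 in enumerate(kids):
            queue.append(n1)
            if n1.tag not in CW_TAGS or i + 1 >= len(kids):
                continue
            n2 = kids[i + 1]
            if n2.tag in CW_TAGS:                        # sibling is a CW
                srs.append((n1.head, n2.head))           # {CW, CW}
            elif n2.tag in FW_TAGS and len(n2.children) == 2:
                n3, n4 = n2.children                     # child FW, child CW
                if n3.tag in FW_TAGS and n4.tag in CW_TAGS:
                    # attachment stub: attach to the preceding CW sibling
                    srs.append((n1.head, n3.head, n4.head))   # {CW, FW, CW}
    return srs

# "John published an article on linguistics" (VP fragment, heads pre-assigned)
vp = Node("VP", "published", [
    Node("VBD", "published"),
    Node("NP", "article", [Node("DT", "an"), Node("NN", "article")]),
    Node("PP", "on", [Node("IN", "on"),
                      Node("NP", "linguistics", [Node("NNS", "linguistics")])]),
])
print(generate_srs(vp))
# [('published', 'article'), ('article', 'on', 'linguistics')]
```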
Evaluation
- The FrameNet corpus [Baker et al., 1998], a semantically annotated corpus, is used as the test data.
- 92,310 sentences (call this the gold standard), created automatically from the FrameNet corpus taking verbs, nouns and adjectives as the targets:
  - Verbs as the target: 37,984 (i.e., semantic frames of verbs)
  - Nouns as the target: 37,240
  - Adjectives as the target: 17,086
Score for high-frequency verbs

Verb      | Frequency | Score
Swim      | 280       | 0.695
Depend    | 215       | 0.709
Look      | 187       | 0.804
Roll      | 173       | 0.835
Rush      | 172       | 0.775
Phone     | 162       | 0.797
Reproduce | 159       | 0.795
Step      | 157       | 0.765
Urge      | 152       | 0.789
Avoid     | -         | -
Scores of 10 verb groups of high frequency in the Gold Standard
Scores of 10 noun groups of high frequency in the Gold Standard
An actual sentence
A. Sentence: A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago, researchers reported.
Relative performance on SRS constructs
Results on sentence constructs
Rajat Mohanty, Anupama Dutta and Pushpak Bhattacharyya, Semantically Relatable Sets: Building Blocks for Representing Semantics, 10th Machine Translation Summit (MT Summit 05), Phuket, September 2005.
Statistical Approach
Use SRL-marked corpora
- Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.
- PropBank corpus
  - Role-annotated WSJ part of the Penn Treebank [10]
  - PropBank role set [2, 4]
    - Core roles: ARG0 (Proto-agent), ARG1 (Proto-patient) to ARG5
    - Adjunctive roles: ARGM-LOC (for locatives), ARGM-TMP (for temporals), etc.
SRL-marked corpora contd.
- PropBank roles: an example
  [ARG0 It] operates [ARG1 stores] [ARGM-LOC mostly in Iowa and Nebraska]
  Fig. 4: Parse tree output. Source: [5]
- Preprocessing systems [2]
  - Part-of-speech tagger
  - Base chunker
  - Full syntactic parser
  - Named entities recognizer
Probabilistic estimation [1]
- Empirical probability estimation over candidate roles for each constituent, based upon extracted features; here t is the target word, r is a candidate role, and h, pt, gov, voice are features:
  P(r | h, pt, gov, voice, t) = #(r, h, pt, gov, voice, t) / #(h, pt, gov, voice, t)
- Linear interpolation over distributions conditioned on feature subsets c_i, with the condition sum_i lambda_i = 1:
  P(r | constituent) = sum_i lambda_i * P(r | c_i)
- Geometric mean over the same distributions, with the condition sum_i lambda_i = 1 (Z is a normalizing constant):
  P(r | constituent) = (1/Z) * prod_i P(r | c_i)^lambda_i
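A toy sketch of the counting and back-off idea under the feature set named above, using a reduced feature set (h, pt) and arbitrary lambda weights; it is meant only to illustrate the form of the estimates, not Gildea and Jurafsky's actual back-off lattice.

```python
# Sketch: empirical role probabilities from counts, plus a linear
# interpolation over a more specific and a more general estimate.
from collections import Counter

joint = Counter()      # (role, head, phrase_type) -> count
marginal = Counter()   # (head, phrase_type) -> count

def observe(role, head, pt):
    joint[(role, head, pt)] += 1
    marginal[(head, pt)] += 1

def p_role_given(role, head, pt):
    """Empirical estimate P(r | h, pt) = #(r, h, pt) / #(h, pt)."""
    denom = marginal[(head, pt)]
    return joint[(role, head, pt)] / denom if denom else 0.0

# toy training observations
for r, h, pt in [("agt", "he", "NP"), ("agt", "he", "NP"), ("obj", "mail", "NP")]:
    observe(r, h, pt)

def interpolated(role, head, pt, lam=0.7):
    """Linear interpolation of P(r | h, pt) with a back-off P(r | pt)."""
    p_full = p_role_given(role, head, pt)
    denom = sum(c for (h2, pt2), c in marginal.items() if pt2 == pt)
    num = sum(c for (r2, h2, pt2), c in joint.items() if r2 == role and pt2 == pt)
    p_backoff = num / denom if denom else 0.0
    return lam * p_full + (1 - lam) * p_backoff

print(interpolated("agt", "he", "NP"))   # weighted mix of the two estimates
```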
A state-of-the-art SRL system: ASSERT [4]
- Main points [3, 4]
  - Use of a Support Vector Machine [13] as the classifier
  - Similar to FrameNet "domains", "Predicate Clusters" are introduced
  - Named Entities [14] are used as a new feature
- Experiment I (Parser dependency testing)
  - Use of the PropBank bracketed corpus
  - Use of the Charniak parser trained on the Penn Treebank corpus

Parse    | Task        | Precision (%) | Recall (%) | F-score (%) | Accuracy (%)
Treebank | Id.         | 97.5          | 96.1       | 96.8        | -
Treebank | Class.      | -             | -          | -           | 93.0
Treebank | Id.+Class.  | 91.8          | 90.5       | 91.2        | -
Charniak | Id.         | 87.8          | 84.1       | 85.9        | -
Charniak | Class.      | -             | -          | -           | 92.0
Charniak | Id.+Class.  | 81.7          | 78.4       | 80.0        | -

Table 1: Performance of ASSERT for Treebank and Charniak parser outputs. Id. stands for the identification task and Class. stands for the classification task. Data source: [4]
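ASSERT has its own SVM machinery and feature set; the sketch below only illustrates the general idea of casting role classification as supervised classification over constituent features, using scikit-learn (an assumption, not the toolkit used by ASSERT) and made-up toy data.

```python
# Sketch: constituent role classification as supervised learning, in the
# spirit of SVM-based SRL systems. Feature dictionaries are toy values.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

train_feats = [
    {"pt": "NP", "position": "before", "voice": "active", "head": "he"},
    {"pt": "NP", "position": "after",  "voice": "active", "head": "mail"},
    {"pt": "PP", "position": "after",  "voice": "active", "head": "minister"},
]
train_labels = ["ARG0", "ARG1", "ARG2"]

vec = DictVectorizer()
X = vec.fit_transform(train_feats)
clf = SVC(kernel="linear").fit(X, train_labels)

test = {"pt": "NP", "position": "before", "voice": "active", "head": "she"}
print(clf.predict(vec.transform([test])))   # predicted PropBank-style role
```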
Experiments and Results
- Experiment II (Cross-genre testing)
  1. Training on PropBanked WSJ data and testing on the Brown Corpus
  2. Charniak parser trained first on PropBank, then on Brown
Table 2: Performance of ASSERT for various experimental combinations. Data source: [4]