
Putting Meaning Into Your Trees
Martha Palmer, University of Pennsylvania
Princeton Cognitive Science Laboratory, November 6, 2003
Outline
- Introduction
- Background: WordNet, Levin classes, VerbNet
- Proposition Bank – capturing shallow semantics
- Mapping PropBank to VerbNet
- Mapping PropBank to WordNet
Word Sense in Machine Translation
- Different syntactic frames
  - John left the room. / Juan saiu do quarto. (Portuguese)
  - John left the book on the table. / Juan deixou o livro na mesa.
- Same syntactic frame?
  - John left a fortune. / Juan deixou uma fortuna.
Ask Jeeves – a Q/A, IR example
What do you call a successful movie? Blockbuster
- Tips on Being a Successful Movie Vampire ... I shall call the police.
- Successful Casting Call & Shoot for "Clash of Empires" ... thank everyone for their participation in the making of yesterday's movie.
- Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague ...
- VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Ask Jeeves – filtering with POS tags
What do you call a successful movie?
- Tips on Being a Successful Movie Vampire ... I shall call the police.
- Successful Casting Call & Shoot for "Clash of Empires" ... thank everyone for their participation in the making of yesterday's movie.
- Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague ...
- VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Filtering out "call the police"
  call(you, movie, what)
  call(you, police)
Syntax distinguishes the two predicate-argument structures.
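A minimal sketch of how such proposition-based filtering might work; the proposition tuples and the matches helper are hypothetical, purely illustrative:

```python
# Hypothetical propositions extracted from the question and candidate answers.
question_prop = ("call", ("you", "movie", "what"))   # What do you call a successful movie?
candidate_props = [
    ("call", ("you", "police")),                     # "I shall call the police."
    ("call", ("you", "movie", "blockbuster")),       # "... call the movie a blockbuster."
]

def matches(question, candidate):
    """A candidate matches if the predicate agrees and every non-wh
    argument of the question appears in the same position."""
    q_pred, q_args = question
    c_pred, c_args = candidate
    if q_pred != c_pred or len(q_args) != len(c_args):
        return False
    return all(q == c for q, c in zip(q_args, c_args) if q != "what")

print([c for c in candidate_props if matches(question_prop, c)])
# Only the movie proposition survives; "call the police" is filtered out.
```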
An English lexical resource is required
- that provides sets of possible syntactic frames for verbs,
- and provides clear, replicable sense distinctions.
AskJeeves: Who do you call for a good electronic lexical database for English?
WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; ...) -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; ...) -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; ...) -> LABEL
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
WordNet – Princeton (Miller 1985, Fellbaum 1998)
- On-line lexical reference (dictionary)
  - Nouns, verbs, adjectives, and adverbs grouped into synonym sets
  - Other relations include hypernyms (ISA), antonyms, meronyms
- Limitations as a computational lexicon
  - Contains little syntactic information
  - No explicit predicate argument structures
  - No systematic extension of basic senses
  - Sense distinctions are very fine-grained, ITA 73%
  - No hierarchical entries
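As an aside, this synset structure is easy to explore programmatically. A minimal sketch using NLTK's WordNet interface (assuming nltk and its WordNet data are installed; sense counts vary across WordNet releases):

```python
from nltk.corpus import wordnet as wn

# All verb synsets for "call" (28 in the WordNet version cited here;
# the exact count depends on the release).
senses = wn.synsets("call", pos=wn.VERB)
print(len(senses))

for s in senses[:4]:
    # Synonym set, gloss, and hypernym (ISA) parents for each sense.
    print(s.name(), "|", ", ".join(s.lemma_names()))
    print("   gloss:", s.definition())
    print("   hypernyms:", [h.name() for h in s.hypernyms()])
```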
Levin classes (Levin, 1993)
- 3100 verbs, 47 top-level classes, 193 second- and third-level classes
- Each class has a syntactic signature based on alternations:
  John broke the jar. / The jar broke. / Jars break easily.
  John cut the bread. / *The bread cut. / Bread cuts easily.
  John hit the wall. / *The wall hit. / *Walls hit easily.
Levin classes (Levin, 1993)
- Verb class hierarchy: 3100 verbs, 47 top-level classes, 193 second- and third-level classes
- Each class has a syntactic signature based on alternations:
  John broke the jar. / The jar broke. / Jars break easily.  (change-of-state)
  John cut the bread. / *The bread cut. / Bread cuts easily.  (change-of-state, recognizable action, sharp instrument)
  John hit the wall. / *The wall hit. / *Walls hit easily.  (contact, exertion of force)
Confusions in Levin classes?
- Not semantically homogeneous
  - {braid, clip, file, powder, pluck, etc. ...}
- Multiple class listings
  - homonymy or polysemy?
- Conflicting alternations?
  - Carry verbs disallow the Conative (*she carried at the ball), but include {push, pull, shove, kick, draw, yank, tug}
  - These verbs also appear in the Push/Pull class, which does take the Conative (she kicked at the ball)
Intersective Levin Classes
[Venn diagram: "apart" CH-STATE; "across the room" CH-LOC; "at" ¬CH-LOC]
Dang, Kipper & Palmer, ACL 98
Intersective Levin Classes
- More syntactically and semantically coherent
  - sets of syntactic patterns
  - explicit semantic components
  - relations between senses
VERBNET: www.cis.upenn.edu/verbnet
Dang, Kipper & Palmer, IJCAI 00, Coling 00
VerbNet – Karin Kipper
- Class entries:
  - Capture generalizations about verb behavior
  - Organized hierarchically
  - Members have common semantic elements, semantic roles, and syntactic frames
- Verb entries:
  - Refer to a set of classes (different senses)
  - Each class member is linked to WN synset(s) (not all WN senses are covered)
Semantic role labels
Christiane broke the LCD projector.
  break(agent(Christiane), patient(LCD-projector))
  cause(agent(Christiane), broken(LCD-projector))
  agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
  patient(P) -> affected(P), change(P), ...
Hand-built resources vs. real data
- VerbNet is based on linguistic theory – how useful is it?
- How well does it correspond to syntactic variations found in naturally occurring text? PropBank
Proposition Bank: From Sentences to Propositions
  Powell met Zhu Rongji
  Powell and Zhu Rongji met
  Powell met with Zhu Rongji
  Powell and Zhu Rongji had a meeting
  (related predicates: battle, wrestle, join, debate, consult)
Proposition: meet(Powell, Zhu Rongji), an instance of meet(Somebody1, Somebody2)
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
  meet(Powell, Zhu)
  discuss([Powell, Zhu], return(X, plane))
Capturing semantic roles*
- [SUBJ George] broke [Arg1 the laser pointer].
- [SUBJ, Arg1 The windows] were broken by the hurricane.
- [SUBJ, Arg1 The vase] broke into pieces when it toppled over.
*See also FrameNet: http://www.icsi.berkeley.edu/~framenet/
An English lexical resource is required
- that provides sets of possible syntactic frames for verbs with semantic role labels,
- and provides clear, replicable sense distinctions.
A TreeBanked Sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP a GM-Jaguar pact)
               (SBAR (WHNP-1 that)
                     (S (NP-SBJ *T*-1)
                        (VP would
                            (VP give
                                (NP the U.S. car maker)
                                (NP an eventual (ADJP 30%) stake)
                                (PP-LOC in (NP the British company))))))))))
The same sentence, PropBanked
(S Arg0:(NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1:(NP a GM-Jaguar pact)
               (SBAR (WHNP-1 that)
                     (S Arg0:(NP-SBJ *T*-1)
                        (VP would
                            (VP give
                                Arg2:(NP the U.S. car maker)
                                Arg1:(NP an eventual (ADJP 30%) stake)
                                (PP-LOC in (NP the British company))))))))))

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
Frames File Example: expect
Roles:
  Arg0: expecter
  Arg1: thing expected
Example (transitive, active):
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL: expect
  Arg1: further declines in interest rates
Frames File Example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example (double object):
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
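A sketch of how such frames files might be consumed programmatically; PropBank distributes them as XML, but the same information is shown here as plain Python data (the dict layout is mine, not the official schema):

```python
# Rolesets for "expect" and "give", transcribed from the frames files above.
FRAMES = {
    "expect.01": {
        "roles": {"Arg0": "expecter", "Arg1": "thing expected"},
        "example": {
            "text": "Portfolio managers expect further declines in interest rates.",
            "Arg0": "Portfolio managers",
            "REL": "expect",
            "Arg1": "further declines in interest rates",
        },
    },
    "give.01": {
        "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
        "example": {
            "text": "The executives gave the chefs a standing ovation.",
            "Arg0": "The executives",
            "REL": "gave",
            "Arg2": "the chefs",
            "Arg1": "a standing ovation",
        },
    },
}

def describe(frameset_id):
    """Print the role inventory for one frameset."""
    for arg, gloss in FRAMES[frameset_id]["roles"].items():
        print(f"{frameset_id}  {arg}: {gloss}")

describe("give.01")
```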
Word Senses in PropBank
- Orders to ignore word sense proved not feasible for 700+ verbs:
  - Mary left the room
  - Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left
Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Annotation procedure
- PTB II – extraction of all sentences with a given verb
- Create Frame File for that verb (Paul Kingsbury)
  - 3100+ lemmas, 4400 framesets, 118K predicates
  - Over 300 created automatically via VerbNet
- First pass: automatic tagging (Joseph Rosenzweig)
  - http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
- Second pass: double-blind hand correction (Paul Kingsbury)
  - Tagging tool highlights discrepancies (Scott Cotton)
- Third pass: Solomonization (adjudication)
  - Betsy Klipple, Olga Babko-Malaya
Trends in Argument Numbering
- Arg0 = agent
- Arg1 = direct object / theme / patient
- Arg2 = indirect object / benefactive / instrument / attribute / end state
- Arg3 = start point / benefactive / instrument / attribute
- Arg4 = end point
- Per word vs. frame level – which is more general?
Additional tags (arguments or adjuncts?)
- Variety of ArgMs (Arg# > 4):
  - TMP – when?
  - LOC – where at?
  - DIR – where to?
  - MNR – how?
  - PRP – why?
  - REC – himself, themselves, each other
  - PRD – this argument refers to or modifies another
  - ADV – others
Function tags for Chinese (arguments or adjuncts?)
- Variety of ArgMs (Arg# > 4):
  - TMP – when?
  - LOC – where at?
  - DIR – where to?
  - MNR – how?
  - PRP – why?
  - TPC – topic
  - PRD – this argument refers to or modifies another
  - ADV – others
  - CND – conditional
  - DGR – degree
  - FRQ – frequency
Additional function tags for Chinese phrasal verbs
- Correspond to groups of "prepositions":
  - AS
  - AT
  - INTO
  - ONTO
  - TOWARDS
Inflection
- Verbs are also marked for tense/aspect:
  - Passive/Active
  - Perfect/Progressive
  - Third singular (is, has, does, was)
  - Present/Past/Future
  - Infinitives/Participles/Gerunds/Finites
- Modals and negations are marked as ArgMs
Frames: Multiple Framesets
- Framesets are not necessarily consistent between different senses of the same verb
- Framesets are consistent between different verbs that share similar argument structures (like FrameNet)
- Out of the 787 most frequent verbs:
  - 1 frameset – 521
  - 2 framesets – 169
  - 3+ framesets – 97 (includes light verbs)
Ergative/Unaccusative Verbs
Roles (no Arg0 for unaccusative verbs):
  Arg1 = logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3* = start point
  Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16 billion.
The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
Actual data for leave
http://www.cs.rochester.edu/~gildea/PropBank/Sort/
Leave.01 "move away from": Arg0, rel, Arg1, Arg3
Leave.02 "give": Arg0, rel, Arg1, Arg2
Observed frames, with counts:
  sub-ARG0 obj-ARG1: 44
  sub-ARG0: 20
  sub-ARG0 NP-ARG1-with obj-ARG2: 17
  sub-ARG0 sub-ARG2 ADJP-ARG3-PRD: 10
  sub-ARG1 ADJP-ARG3-PRD: 6
  sub-ARG0 sub-ARG1 VP-ARG3-PRD: 5
  NP-ARG1-with obj-ARG2: 4
  obj-ARG1: 3
  sub-ARG0 sub-ARG2 VP-ARG3-PRD: 3
PropBank/FrameNet
  Buy                Sell
  Arg0: buyer        Arg0: seller
  Arg1: goods        Arg1: goods
  Arg2: seller       Arg2: buyer
  Arg3: rate         Arg3: rate
  Arg4: payment      Arg4: payment
More generic, more neutral – maps readily to VN, TR
Rambow et al., PMLB 03
Annotator accuracy – ITA 84%
An English lexical resource is required
- that provides sets of possible syntactic frames for verbs with semantic role labels?
- and provides clear, replicable sense distinctions.
An English lexical resource is required
- that provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text?
- and provides clear, replicable sense distinctions.
Automatic Labelling of Semantic Relations
- Stochastic model
- Features:
  - Predicate
  - Phrase type
  - Parse tree path
  - Position (before/after predicate)
  - Voice (active/passive)
  - Head word
Gildea & Jurafsky, CL 02; Gildea & Palmer, ACL 02
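The parse tree path feature, in particular, is concrete enough to sketch. A minimal illustration using NLTK's Tree class; the function name and the '^'/'!' path notation are mine (Gildea & Jurafsky draw the path with up/down arrows):

```python
from nltk.tree import Tree

def path_feature(tree, pred_leaf_index, const_pos):
    """Parse-tree path from the predicate's POS node up to the lowest
    common ancestor and down to the constituent, in the spirit of
    Gildea & Jurafsky (2002). '^' = up, '!' = down."""
    pred_pos = tree.leaf_treeposition(pred_leaf_index)[:-1]  # node above the word
    i = 0  # length of the common prefix of the two tree positions
    while i < min(len(pred_pos), len(const_pos)) and pred_pos[i] == const_pos[i]:
        i += 1
    up = [tree[pred_pos[:j]].label() for j in range(len(pred_pos), i - 1, -1)]
    down = [tree[const_pos[:j]].label() for j in range(i + 1, len(const_pos) + 1)]
    return "^".join(up) + ("!" + "!".join(down) if down else "")

t = Tree.fromstring("(S (NP (NNP John)) (VP (VBD broke) (NP (DT the) (NN jar))))")
print(path_feature(t, 1, (0,)))    # subject NP: VBD^VP^S!NP
print(path_feature(t, 1, (1, 1)))  # object NP:  VBD^VP!NP
```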
Semantic Role Labelling Accuracy, Known Boundaries

                     FrameNet (≥10 inst)   PropBank   PropBank (≥10 instances)
  Gold St. parses    –                     77.0       83.1
  Automatic parses   82.0                  73.6       79.6

- Accuracy of semantic role prediction for known boundaries – the system is given the constituents to classify.
- FrameNet examples (training/test) are handpicked to be unambiguous.
- Lower performance with unknown boundaries.
- Higher performance with traces.
- It almost evens out.
Additional Automatic Role Labelers
- Performance improved from 77% to 88% (Colorado)
  - (Gold Standard parses, < 10 instances)
  - Same features plus:
    - Named Entity tags
    - Head word POS
    - For unseen verbs – backoff to automatic verb clusters
  - SVMs:
    - Role or not role
    - For each likely role, for each Arg#, Arg# or not
    - No overlapping role labels allowed
Pradhan et al., ICDM 03; Surdeanu et al., ACL 03; Chen & Rambow, EMNLP 03; Gildea & Hockenmaier, EMNLP 03
Word Senses in PropBank (recap)
- Orders to ignore word sense proved not feasible for 700+ verbs:
  - Mary left the room
  - Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left
Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Mapping from PropBank to VerbNet
Frameset id = leave.02, sense = "give", VerbNet class = future-having 13.3
  Arg0 – Giver – Agent
  Arg1 – Thing given – Theme
  Arg2 – Benefactive – Recipient
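A sketch of how such a mapping record might be used programmatically; the dict layout and helper below are purely illustrative, not the distributed mapping format:

```python
# Illustrative frameset-to-VerbNet mapping record.
FRAMESET_TO_VERBNET = {
    "leave.02": {
        "sense": "give",
        "vn_class": "future_having-13.3",
        "role_map": {"Arg0": "Agent", "Arg1": "Theme", "Arg2": "Recipient"},
    },
}

def vn_roles(frameset_id, pb_args):
    """Translate PropBank argument labels into VerbNet thematic roles."""
    mapping = FRAMESET_TO_VERBNET[frameset_id]["role_map"]
    return {mapping.get(arg, arg): filler for arg, filler in pb_args.items()}

# "Mary left her daughter-in-law her pearls in her will" (leave.02)
print(vn_roles("leave.02",
               {"Arg0": "Mary", "Arg2": "her daughter-in-law", "Arg1": "her pearls"}))
# {'Agent': 'Mary', 'Recipient': 'her daughter-in-law', 'Theme': 'her pearls'}
```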
Mapping from PB to VerbNet
[figure]
Mapping from PropBank to VerbNet
- Overlap with PropBank framesets:
  - 50,000 PropBank instances
  - < 50% of VN entries, > 85% of VN classes
- Results:
  - MATCH – 78.63% (80.90% relaxed)
  - (VerbNet isn't just linguistic theory!)
- Benefits:
  - Thematic role labels and semantic predicates
  - Can extend PropBank coverage with VerbNet classes
  - WordNet sense tags
Kingsbury & Kipper, NAACL 03 Text Meaning Workshop
http://www.cs.rochester.edu/~gildea/VerbNet/
WordNet as a WSD sense inventory
- Senses unnecessarily fine-grained?
- Word Sense Disambiguation bakeoffs:
  - Senseval-1 – Hector, ITA = 95.5%
  - Senseval-2 – WordNet 1.7, ITA for verbs = 71%
  - Groupings of Senseval-2 verbs, ITA = 82%
    - Used syntactic and semantic criteria
Groupings Methodology (w/ Dang and Fellbaum)
- Double-blind groupings, adjudication
- Syntactic criteria (VerbNet was useful):
  - Distinct subcategorization frames:
    - call him a bastard
    - call him a taxi
  - Recognizable alternations – regular sense extensions:
    - play an instrument
    - play a song
    - play a melody on an instrument
SIGLEX 01, SIGLEX 02, JNLE 04
Groupings Methodology (cont.)
- Semantic criteria:
  - Differences in semantic classes of arguments
    - abstract/concrete, human/animal, animate/inanimate, different instrument types, ...
  - Differences in the number and type of arguments
    - often reflected in subcategorization frames:
      John left the room.
      I left my pearls to my daughter-in-law in my will.
  - Differences in entailments
    - change of a prior entity or creation of a new entity?
  - Differences in types of events
    - abstract/concrete/mental/emotional/...
  - Specialized subject domains
Results – averaged over 28 verbs (Dang and Palmer, SIGLEX 02; Dang et al., Coling 02)
  Total WN polysemy: 16.28
  Group polysemy: 8.07
  ITA fine-grained: 71%      MX fine-grained: 60.2%
  ITA grouped: 82%           MX grouped: 69%
MX = Maximum Entropy WSD, p(sense|context)
Features: topic, syntactic constituents, semantic classes (gains of +2.5%, +1.5 to +5%, and +6%)
Grouping improved ITA and MaxEnt WSD
- call: 31% of errors were due to confusion between senses within the same group, group 1:
  - name, call -- (assign a specified, proper name to; "They named their son David")
  - call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard")
  - call -- (consider or regard as being; "I would not call her beautiful")
- 75% accuracy with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
WordNet – call, 28 senses, grouped:
  Loud cry: WN5, WN16, WN12
  Label: WN3, WN19, WN1, WN22
  Challenge: WN18, WN27
  Phone/radio: WN2, WN13, WN28, WN17, WN11
  Bird or animal cry: WN15, WN26
  Request: WN4, WN7, WN8, WN9
  Call a loan/bond: WN20, WN6, WN25
  Visit: WN23
  Bid: WN10, WN14, WN21, WN24
Overlap between Groups and Framesets – 95%
[Diagram for develop: WordNet senses WN2, WN3, WN4, WN5, WN6, WN7, WN8, WN9, WN10, WN11, WN12, WN13, WN14, WN19, WN20 partitioned between Frameset 1 and Frameset 2]
Palmer, Dang & Fellbaum, NLE 2004
Sense Hierarchy
- PropBank Framesets – coarse-grained distinctions
  - Sense Groups (Senseval-2) – intermediate level (includes Levin classes) – 95% overlap
    - WordNet – fine-grained distinctions
An English lexical resource is available
✓ that provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text,
✓ and provides clear, replicable sense distinctions.
Summary of English PropBank (Paul Kingsbury, Olga Babko-Malaya, Scott Cotton)

Genre | Words | Frames Files | Frameset Tags | Released
Wall Street Journal* (financial subcorpus) | 300K | < 2000 | 400 | July 02
Wall Street Journal* (Penn TreeBank II) | 1000K | < 4000 | 700 | Dec 03? (March 03)
English translation of Chinese TreeBank* | 100K | < 1500 | | July 04
Sinorama, English corpus (NSF-ITR funding) | 150K | < 2000 | | July 05
English half of DLI Military Corpus (ARL funding) | 50K | < 1000 | | July 05

* ITIC funding
A Chinese Treebank Sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
"The Congress passed the banking law recently."

(IP (NP-SBJ (NN 国会/Congress))
    (VP (ADVP (ADV 最近/recently))
        (VP (VV 通过/pass)
            (AS 了/ASP)
            (NP-OBJ (NN 银行法/banking law)))))
The Same Sentence, PropBanked
(IP arg0:(NP-SBJ (NN 国会))
    (VP argM:(ADVP (ADV 最近))
        (VP f2:(VV 通过)
            (AS 了)
            arg1:(NP-OBJ (NN 银行法)))))

通过/pass (frameset f2): arg0 国会/Congress, argM 最近/recently, arg1 银行法/banking law
Chinese PropBank Status (w/ Bert Xue and Scott Cotton)
- Create Frame File for each verb
  - Similar alternations – causative/inchoative, unexpressed object
  - 5000 lemmas, 2000 DONE (hired Jiang)
- First pass: automatic tagging – 2000 DONE
  - Subcat frame matcher (Xue & Kulick, MT 03)
- Second pass: double-blind hand correction
  - In progress (includes frameset tagging), 600 DONE
  - Ported RATS to CATS, in use since May
- Third pass: Solomonization (adjudication)
Summary of Chinese PropBank (Nianwen Xue, Meiyu Chang, Zhiyi, Ping)

Genre | Words | Frames Files | Frameset Tags | Released
Xinhua News (DOD funding) | 250K | 4867 | 200 | July 04
Sinorama (NSF-ITR funding) | 150K | < 4000 | | July 05
A Korean Treebank Sentence
그는 르노가 3 월말까지 인수제의 시한을 갖고 있다고 덧붙였다.
"He added that Renault has a deadline until the end of March for a merger proposal."

(S (NP-SBJ 그/NPN+은/PAU)
   (VP (S-COMP (NP-SBJ 르노/NPR+이/PCA)
               (VP (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                   (VP (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                       갖/VV+고/ECS))
               있/VX+다/EFN+고/PAD)
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)
The same sentence, PropBanked
(S Arg0:(NP-SBJ 그/NPN+은/PAU)
   (VP Arg2:(S-COMP Arg0:(NP-SBJ 르노/NPR+이/PCA)
                    (VP ArgM:(NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                        (VP Arg1:(NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                            갖/VV+고/ECS))
                    있/VX+다/EFN+고/PAD)
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

덧붙이다/add: Arg0 그는 (he), Arg2 르노가 3 월말까지 인수제의 시한을 갖고 있다 (that Renault has a deadline until the end of March for a merger proposal)
갖다/have: Arg0 르노가 (Renault), ArgM 3 월말까지 (until the end of March), Arg1 인수제의 시한을 (a deadline for a merger proposal)
Korean PropBank, funded by ARO – CORK
- 50K words, Virginia Corpus of Korean Treebank
- New semantic augmentations:
  - Predicate-argument relations for predicates
  - Label arguments: Arg0, Arg1, Arg2, ...
- Korean lexical resource – frames files:
  - Use "semantic role glosses" unique to each predicate
  - Create manually for 900 predicates
  - Refer to the hand-corrected DSynt files from Virginia
- Extend to newswire domain – 130K words
PropBank II
- Nominalizations (NYU)
- Lexical frames – DONE
- Event variables (including temporals and locatives)
- More fine-grained sense tagging:
  - Tagging nominalizations w/ WordNet sense
  - Selected verbs and nouns
- Nominal coreference
  - not names
- Clausal discourse connectives – selected subset
PropBank II: event variables; sense tags; nominal reference; discourse connectives
{Also} [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].

  h23: REL help (sense help 2,5); Arg0 tax rates (sense tax rate 1); Arg1 the company keep its tax outlay flat relative to earnings growth
  k16: REL keep (sense keep 1); Arg0 the company (sense company 1); Arg1 its tax outlay; Arg3-PRD flat; ArgM-ADV relative to earnings growth
Summary of Multilingual TreeBanks, PropBanks

Parallel text corpora | Treebank | PropBank I | PropBank II
Chinese Treebank | Chinese 500K, English 400K, English 350K | Ch 100K, English 100K | En 100K
Arabic Treebank | Arabic 500K, English 500K | Arabic 500K, English ??? |
Korean Treebank | Korean 180K, English 50K | |
Levin class: escape-51.1-1
- WordNet senses: WN 1, 5, 8
- Thematic roles: Location[+concrete], Theme[+concrete]
- Frames with semantics:
  - Basic intransitive: "The convict escaped"
    motion(during(E), Theme), direction(during(E), Prep, Theme, ~Location)
  - Intransitive (+ path PP): "The convict escaped from the prison"
  - Locative preposition drop: "The convict escaped the prison"
Levin class: future_having-13.3
- WordNet senses: WN 2, 10, 13
- Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
- Frames with semantics:
  - Dative: "I promised somebody my time" – Agent V Recipient Theme
    has_possession(start(E), Agent, Theme), future_possession(end(E), Recipient, Theme), cause(Agent, E)
  - Transitive (+ Recipient PP): "We offered our paycheck to her" – Agent V Theme Prep(to) Recipient
  - Transitive (Theme object): "I promised my house (to somebody)" – Agent V Theme
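For illustration, the same class entry rendered as Python data; the field names are mine, not VerbNet's actual XML schema, and the member list is a subset:

```python
# A VerbNet-style class entry as plain data, mirroring the slide above.
FUTURE_HAVING = {
    "class": "future_having-13.3",
    "members": ["promise", "offer"],  # illustrative subset
    "wn_senses": [2, 10, 13],
    "thematic_roles": {
        "Agent": "+animate OR +organization",
        "Recipient": "+animate OR +organization",
        "Theme": "",
    },
    "frames": [
        {
            "name": "Dative",
            "example": "I promised somebody my time",
            "syntax": ["Agent", "V", "Recipient", "Theme"],
            "semantics": [
                "has_possession(start(E), Agent, Theme)",
                "future_possession(end(E), Recipient, Theme)",
                "cause(Agent, E)",
            ],
        },
        {
            "name": "Transitive (+ Recipient PP)",
            "example": "We offered our paycheck to her",
            "syntax": ["Agent", "V", "Theme", "Prep(to)", "Recipient"],
        },
    ],
}
```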
Actual data for leave (repeated)
http://www.cs.rochester.edu/~gildea/PropBank/Sort/
Leave.01 "move away from": Arg0, rel, Arg1, Arg3
Leave.02 "give": Arg0, rel, Arg1, Arg2
Observed frames, with counts:
  sub-ARG0 obj-ARG1: 44
  sub-ARG0: 20
  sub-ARG0 NP-ARG1-with obj-ARG2: 17
  sub-ARG0 sub-ARG2 ADJP-ARG3-PRD: 10
  sub-ARG1 ADJP-ARG3-PRD: 6
  sub-ARG0 sub-ARG1 VP-ARG3-PRD: 5
  NP-ARG1-with obj-ARG2: 4
  obj-ARG1: 3
  sub-ARG0 sub-ARG2 VP-ARG3-PRD: 3
SENSEVAL – Word Sense Disambiguation Evaluation
DARPA-style bakeoff: training data, testing data, scoring algorithm.

                        SENSEVAL-1 (1998)   SENSEVAL-2 (2001)
  Languages             3                   12
  Systems               24                  90
  Eng. lexical sample   Yes                 Yes
  Verbs/Poly/Instances  13/12/215           29/16/110

NLE 99, CHUM 01, NLE 02, NLE 03
Maximum Entropy WSD (Hoa Dang) – best performer on verbs
- Maximum entropy framework, p(sense|context)
- Contextual linguistic features:
  - Topical feature for W: keywords (determined automatically)
  - Local syntactic features for W: presence of subject, complements, passive?; words in subject and complement positions, particles, preps, etc.
  - Local semantic features for W: semantic class info from WordNet (synsets, etc.); Named Entity tag (PERSON, LOCATION, ...) for proper nouns; words within a +/- 2 word window
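A minimal sketch of a classifier in this spirit, using scikit-learn's LogisticRegression (multinomial logistic regression is a maximum entropy model). The toy training examples, sense labels, and the reduced feature extractor are invented for illustration:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(tokens, target_idx):
    """Very reduced feature set: the +/-2 word window plus one crude
    syntactic cue; the real system also used keywords, subcategorization,
    and WordNet semantic classes."""
    feats = {}
    for offset in (-2, -1, 1, 2):
        i = target_idx + offset
        if 0 <= i < len(tokens):
            feats[f"w{offset}={tokens[i].lower()}"] = 1
    feats["has_following_word"] = int(target_idx + 1 < len(tokens))
    return feats

# Toy supervised data for three senses of "call" (invented examples).
train = [
    ("I will call you tonight on the phone".split(), 2, "call.phone"),
    ("Please call me a taxi right now".split(), 1, "call.request"),
    ("They call him a hero at home".split(), 1, "call.label"),
    ("Call her office line tomorrow morning please".split(), 0, "call.phone"),
]
X = [features(toks, i) for toks, i, _ in train]
y = [sense for _, _, sense in train]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test = "You should call your mother tonight".split()
print(model.predict([features(test, 2)])[0])
```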