e3c3136e0a086ced71b65466341ea4ab.ppt
- Number of slides: 30
Decoding the predicate-argument structure of nominalizations
Whose presentation is this?
SUBJ(present, Violeta Seretan)
OBL(collaborate, Lorenzo Thione); PP-OBJ(with, Lorenzo Thione)
SUBJ(supervise, Martin van den Berg)
10/25/2005
Overview
- the nominalization problem
- the NOMLEX resource
- the Denominalizer service based on NOMLEX
- additional resources (CSLI)
- APIs for NOMLEX and CSLI
- related and future work
- demo
Text normalization for QA
- Mark Twain published Adventures of Huckleberry Finn in 1885 in America.
  – Who published H.F.?
  – Where was H.F. published?
  – When was H.F. published?
- QA/NLU needs to deal with a large spectrum of variation in text:
  1. morphological: published, publishes
  2. syntactic: H.F. was published
  3. lexical: {novel, book, masterpiece, work}, {publish, write, author, appear}
  4. nominalization: the publication
- Normalization (via parsing):
  1. base word form: publishes -> publish; published -> publish
  2. canonical word order: SUBJ(publish, Mark Twain); OBJ(publish, H.F.)
- Lexical semantic resources:
  3. synonyms, hyponyms, hypernyms, …
- What about nominalization?
Nominalization
- Since the publication of Huckleberry Finn in 1885, there have been many reactions to the novel, some of them quite extreme.
  – When was H.F. published?
- "publication of Huckleberry Finn" is a nominalization. Its deverbal noun "publication" corresponds to the matrix verb "publish": OBJ(publish, Huckleberry Finn)
- Nominalization: an NP having “a systematic correspondence with a clause structure” (Quirk et al. 1985)
- Goal: decoding the clause structure
Mapping nominal arguments into verbal roles
- Mark Twain's publication of his book
  nominal arguments: Mark Twain's (possessive determiner), of his book (PP adjunct)
- the book publication by Mark Twain
  nominal arguments: book (noun modifier), by Mark Twain (PP adjunct)
- verbal roles: Mark Twain (SUBJECT) - publish - book (OBJECT)
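The mapping can be sketched in Java as a lookup from grammatical position to verbal role. This is a minimal sketch. The position-to-role table for "publication" is written by hand and is not taken from a NOMLEX entry. The class name, the method name, and position strings such as `PP:of` are hypothetical and are not the FXPAL API. Both phrasings above normalize to the same triples.

```java
import java.util.*;

/** Hypothetical sketch: turn nominal arguments of "publication" into verbal-role triples. */
class NominalMapping {
    // Illustrative position -> role table for "publication" (an assumption, not a NOMLEX entry).
    // Positions: "DET-POSS" (possessive), "N-N-MOD" (noun modifier), "PP:<preposition>".
    static final Map<String, String> PUBLICATION = Map.of(
        "DET-POSS", "SUBJ",
        "PP:by", "SUBJ",
        "N-N-MOD", "OBJ",
        "PP:of", "OBJ");

    /** Builds triples such as SUBJ(publish, Mark Twain); positions not in the table are skipped. */
    static List<String> toTriples(String verb, Map<String, String> argByPosition) {
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, String> e : argByPosition.entrySet()) {
            String role = PUBLICATION.get(e.getKey());
            if (role != null) triples.add(role + "(" + verb + ", " + e.getValue() + ")");
        }
        Collections.sort(triples); // Map.of has no defined iteration order
        return triples;
    }
}
```

In this toy table each position maps to exactly one role. Real NOMLEX entries can list the same position under several roles, and that is where role ambiguity comes from.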
Role ambiguity
- Rome's destruction: SUBJ or OBJ?
  A. OBJ(destroy, Rome): Rome's destruction by barbarians
  B. SUBJ(destroy, Rome): Rome's destruction of Carthage
- Without other arguments, a default role applies:
  Rome's destruction: OBJ (by default)
  John's admiration: SUBJ (by default)
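The following Java sketch hard-codes the reasoning behind these examples. A possessive takes whichever role an accompanying PP does not take; with no PP, it falls back to a per-noun default. The by/of rules hold for nouns like "destruction", whose NOMLEX entry uses by-PP for SUBJECT and of-PP for OBJECT. The defaults come from the two examples above. The SUBJECT fallback for other nouns and all names are assumptions.

```java
import java.util.Map;

/** Hypothetical sketch: choose the verbal role of a possessive argument. */
class PossessiveRole {
    enum Role { SUBJECT, OBJECT }

    // Default roles from the slide's examples; other nouns fall back to SUBJECT (an assumption).
    static final Map<String, Role> DEFAULT_ROLE = Map.of(
        "destruction", Role.OBJECT,
        "admiration", Role.SUBJECT);

    /**
     * @param noun          the deverbal noun
     * @param ppPreposition preposition of an accompanying PP argument, or null if there is none
     */
    static Role possessiveRole(String noun, String ppPreposition) {
        if ("by".equals(ppPreposition)) return Role.OBJECT;  // the by-PP already takes SUBJECT
        if ("of".equals(ppPreposition)) return Role.SUBJECT; // the of-PP already takes OBJECT
        return DEFAULT_ROLE.getOrDefault(noun, Role.SUBJECT);
    }
}
```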
NOMLEX: NOMinalization LEXicon
- Macleod et al., New York University
- 1,025 deverbal nouns
- detailed mapping from nominal arguments to verb roles, e.g.:
  :ORTH "destruction"
  :VERB "destroy"
  :VERB-SUBC ((NOM-NP :SUBJECT ((N-N-MOD) (DET-POSS) (PP :PVAL ("by")))
                      :OBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("of")))
                      :REQUIRED ((OBJECT :DET-POSS-ONLY T :N-N-MOD-ONLY T))))
  (:SUBJECT and :OBJECT name the role to assign; :REQUIRED encodes the default role)
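This sketch shows one possible way to hold such an entry in memory, using Java 16+ records. The names are hypothetical and do not describe the FXPAL `NomLexEntry` or `Subcat` classes. Positions are plain strings such as `PP:by`. `:REQUIRED` is simplified to a set of roles, so the `DET-POSS-ONLY`/`N-N-MOD-ONLY` qualifiers are ignored.

```java
import java.util.*;

/** Hypothetical in-memory form of a NOMLEX entry (not the FXPAL NomLexEntry class). */
class NomlexEntrySketch {
    enum Role { SUBJECT, OBJECT }

    /** One subcat frame (e.g. NOM-NP): the positions that may realize each role, plus required roles. */
    record Subcat(String name, Map<Role, List<String>> positions, Set<Role> required) {}

    record Entry(String orth, String verb, List<Subcat> subcats) {}

    /** The "destruction" entry transcribed by hand; REQUIRED keeps only the role, not its qualifiers. */
    static Entry destruction() {
        Map<Role, List<String>> positions = new EnumMap<>(Role.class);
        positions.put(Role.SUBJECT, List.of("N-N-MOD", "DET-POSS", "PP:by"));
        positions.put(Role.OBJECT, List.of("DET-POSS", "N-N-MOD", "PP:of"));
        Subcat nomNp = new Subcat("NOM-NP", positions, EnumSet.of(Role.OBJECT));
        return new Entry("destruction", "destroy", List.of(nomNp));
    }

    /** Roles that a position may fill under a subcat, in enum order (SUBJECT before OBJECT). */
    static List<Role> rolesFor(Subcat subcat, String position) {
        List<Role> roles = new ArrayList<>();
        for (Map.Entry<Role, List<String>> e : subcat.positions().entrySet())
            if (e.getValue().contains(position)) roles.add(e.getKey());
        return roles;
    }
}
```

`rolesFor(subcat, "DET-POSS")` returns both SUBJECT and OBJECT. This is the ambiguity that the default role encoded by `:REQUIRED` has to resolve.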
NOMLEXML
- NOMLEX entry (converted to XML with Perl):
  (NOM :ORTH "accusation"
       :PLURAL "accusations"
       :PLURAL-FREQ "not rare"
       :VERB "accuse"
       :NOUN-SUBC ((NOUN-PP :PVAL ("about")))
       :NOM-TYPE ((VERB-NOM))
       :VERB-SUBJ ((DET-POSS) (N-N-MOD) (PP :PVAL ("by")))
       :SUBJ-ATTRIBUTE ((COMMUNICATOR))
       :OBJ-ATTRIBUTE ((COMMUNICATOR))
       :VERB-SUBC ((NOM-NP-PP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by")))
                              :OBJECT ((PP :PVAL ("against")))
                              :PVAL ("of"))
                   (NOM-NP :SUBJECT ((DET-POSS) …
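The conversion described on this slide is done in Perl. The sketch below is written in Java instead, to match the NOMLEX API, and is not the authors' converter. It is a minimal reader for the Lisp-style entry syntax, which is the first step any conversion to XML needs. The class and method names are hypothetical. The reader assumes well-formed input with no escaped quotes, and it returns lists as `List<Object>` and atoms, keywords and quoted strings all as `String`.

```java
import java.util.*;

/** Minimal reader for Lisp-style NOMLEX entries (hypothetical; assumes well-formed input). */
class NomlexSexpReader {
    private final String src;
    private int pos = 0;

    NomlexSexpReader(String src) { this.src = src; }

    /** Reads one expression: a List<Object> for "( ... )", otherwise a String. */
    Object read() {
        skipSpace();
        char c = src.charAt(pos);
        if (c == '(') {
            pos++;
            List<Object> items = new ArrayList<>();
            while (true) {
                skipSpace();
                if (src.charAt(pos) == ')') { pos++; return items; }
                items.add(read());
            }
        }
        if (c == '"') { // quoted string: returned without the quotes
            int end = src.indexOf('"', pos + 1);
            String text = src.substring(pos + 1, end);
            pos = end + 1;
            return text;
        }
        int start = pos; // atom or keyword such as NOM or :ORTH
        while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                && src.charAt(pos) != '(' && src.charAt(pos) != ')') pos++;
        return src.substring(start, pos);
    }

    private void skipSpace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    /** Value following a keyword such as ":VERB" in a top-level entry, or null if absent. */
    static Object get(List<?> entry, String keyword) {
        int i = entry.indexOf(keyword);
        return (i >= 0 && i + 1 < entry.size()) ? entry.get(i + 1) : null;
    }
}
```

Here is an example. Reading `(NOM :ORTH "accusation" :VERB "accuse")` gives a list whose `:VERB` value is `"accuse"`. Emitting XML from that nested list is then a straightforward recursive walk.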
NOMLEX API in Java
- com.fxpal.sake.test (NomLexInterface)
- com.fxpal.ltng.services.normalization.noun.nomlex (NomLex, NomLexEntry, NomLexClassConstants, Subcat)
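The FXPAL classes are only named here and are not documented. The sketch below therefore illustrates one plausible concern of such an API using hypothetical names: finding the entries for a noun, whether it appears in singular or plural form. A noun can have more than one NOMLEX entry, so lookup returns a list. Case-insensitive matching is an assumption.

```java
import java.util.*;

/** Hypothetical lookup index over NOMLEX entries (not the FXPAL NomLex class). */
class NomLexIndexSketch {
    record Entry(String orth, String plural, String verb) {}

    private final Map<String, List<Entry>> byForm = new HashMap<>();

    /** Indexes an entry under its :ORTH and, if different, its :PLURAL form (case-insensitive). */
    void add(Entry e) {
        index(e.orth(), e);
        if (e.plural() != null && !e.plural().equalsIgnoreCase(e.orth())) index(e.plural(), e);
    }

    private void index(String form, Entry e) {
        byForm.computeIfAbsent(form.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(e);
    }

    /** All entries for a surface form; empty if the noun is not in the lexicon. */
    List<Entry> lookup(String form) {
        return byForm.getOrDefault(form.toLowerCase(Locale.ROOT), List.of());
    }
}
```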
How useful?
- Oracle acquired PeopleSoft at the end of last year.
- Oracle's acquisition of PeopleSoft at the end of last year…
- Google hits, 10/25/2005:
  "Oracle acquisition of PeopleSoft": ~14,500
  "Oracle acquired PeopleSoft": 587
  "Oracle's PeopleSoft acquisition": 693
- More hits:
  "Oracle acquires PeopleSoft": 1,020
  "Oracle has acquired PeopleSoft": 248
  "Oracle will acquire PeopleSoft": 424
Argument-role mapping
- Oracle's acquisition of PeopleSoft
  Oracle: possessive → SUBJ(acquire, Oracle)
  of PeopleSoft: PP (of) → OBJ(acquire, PeopleSoft)
- NOMLEX entry (excerpt):
  :ORTH "acquisition"
  :VERB "acquire"
  :VERB-SUBC ((NOM-NP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by")))
                      :OBJECT ((N-N-MOD) (PP :PVAL ("of")))))
Denominalizer
- Input: a sentence
- Output: nominal argument - verb role pairs for each nominalization: (noun, (argument - role)*)*
- Examples:
  • Oracle's acquisition of PeopleSoft finally materialized after an 18-month struggle between the two companies.
    (acquisition, (Oracle - SUBJECT) (PeopleSoft - OBJECT))
  • Oracle acquisition finally materialized.
    (acquisition, (Oracle - SUBJECT) (Oracle - OBJECT))
    (the noun modifier N-N-MOD can be SUBJECT or OBJECT of "acquire", so both roles are output)
Algorithm (com.fxpal.ltng.services.normalization.noun.*)
  parse sentence
  for each deverbal noun
    get noun arguments
    for each NOMLEX entry for noun
      for each subcat of the entry
        1. match arguments against subcat
        2. filter assignment results
    select a subcat
    output assignments for selected subcat
- Note: overlapping nominalizations are OK: an increase in product sales
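The loop can be sketched in Java for a single deverbal noun. This sketch makes several assumptions.
- Parsing is assumed to be done already, so arguments arrive as (word, position) pairs.
- Each subcat is reduced to "role → allowed positions".
- The filtering step only enforces the uniqueness constraint (a role is filled at most once).
- The slide does not say how a subcat is selected, so "first subcat that yields any assignment" is a placeholder policy.
- Ambiguous results are printed one alternative per line, not merged into a single line as on the Denominalizer slide.

All names are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of the per-noun Denominalizer loop (not the FXPAL implementation). */
class DenominalizerSketch {
    enum Role { SUBJECT, OBJECT }

    record Arg(String word, String position) {}                      // position: "DET-POSS", "N-N-MOD", "PP:of", ...
    record Subcat(String name, Map<Role, Set<String>> positions) {}

    /** Step 1 (matching): the candidate roles of each argument under one subcat. */
    static List<List<Role>> match(List<Arg> args, Subcat sc) {
        List<List<Role>> candidates = new ArrayList<>();
        for (Arg a : args) {
            List<Role> roles = new ArrayList<>();
            for (Role r : Role.values())
                if (sc.positions().getOrDefault(r, Set.of()).contains(a.position())) roles.add(r);
            candidates.add(roles);
        }
        return candidates;
    }

    /** Step 2 (filtering): every way to give each argument one role, with no role used twice. */
    static List<List<Role>> filter(List<List<Role>> candidates) {
        List<List<Role>> out = new ArrayList<>();
        extend(candidates, new ArrayList<>(), out);
        return out;
    }

    private static void extend(List<List<Role>> cands, List<Role> partial, List<List<Role>> out) {
        if (partial.size() == cands.size()) { out.add(List.copyOf(partial)); return; }
        for (Role r : cands.get(partial.size())) {
            if (partial.contains(r)) continue;   // uniqueness: a verbal role is filled only once
            partial.add(r);
            extend(cands, partial, out);
            partial.remove(partial.size() - 1);  // backtrack (removes by index)
        }
    }

    /** Tries subcats in order, keeps the first with any valid assignment, and formats the result. */
    static List<String> denominalize(String noun, List<Arg> args, List<Subcat> subcats) {
        if (args.isEmpty()) return List.of();
        for (Subcat sc : subcats) {
            List<List<Role>> assignments = filter(match(args, sc));
            if (assignments.isEmpty()) continue;
            List<String> lines = new ArrayList<>();
            for (List<Role> roles : assignments) {
                StringBuilder sb = new StringBuilder("(" + noun + ",");
                for (int i = 0; i < args.size(); i++)
                    sb.append(" (").append(args.get(i).word()).append(" - ").append(roles.get(i)).append(")");
                lines.add(sb.append(")").toString());
            }
            return lines;
        }
        return List.of();
    }
}
```

Here are three examples using the NOM-NP subcat of "acquisition".
- "Oracle's PeopleSoft acquisition": Oracle (DET-POSS) can only be SUBJECT, so uniqueness leaves PeopleSoft as OBJECT.
- "Oracle's acquisition of PeopleSoft": this yields the same single assignment.
- "Oracle acquisition": this yields two alternatives.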
1. Matching
- Oracle's acquisition of PeopleSoft finally materialized.
- Arguments (acquisition): POSS(acquisition, Oracle); ADJUNCT(acquisition, of); PP-OBJ(of, PeopleSoft)
- NOM-NP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by")))
         :OBJECT ((N-N-MOD) (PP :PVAL ("of")))
2. Filtering
- Oracle's PeopleSoft acquisition finally materialized.
- Arguments (acquisition): POSS(acquisition, Oracle); MOD(acquisition, PeopleSoft)
- NOM-NP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by")))
         :OBJECT ((N-N-MOD) (PP :PVAL ("of")))
- Alternatives: Oracle: SUBJECT; PeopleSoft: SUBJECT, OBJECT
NOMLEX constraints (1)
- Uniqueness Constraint: a verbal role may be filled only once.
- Oracle's PeopleSoft acquisition
  Matching alternatives: Oracle: SUBJECT; PeopleSoft: SUBJECT, OBJECT
  Result: PeopleSoft: OBJECT (SUBJECT is already filled by Oracle)
NOMLEX constraints (2)
- Ordering Constraint: if there are multiple pre-nominal arguments, they must appear in the order SUBJECT, INDIRECT OBJECT, OBLIQUE.
- FX's printer sales grew by 50%.
  Matching alternatives: FX: SUBJECT, OBJECT; printer: SUBJECT, OBJECT
  Surface order: FX, printer → verbal roles: SUBJECT, OBJECT
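The ordering check can be sketched in Java as a predicate over the roles of the pre-nominal arguments in surface order. The role precedence is passed in by the caller rather than hard-coded. The usage below assumes only what the "FX's printer sales" example shows, namely that SUBJECT precedes OBJECT. The class name is hypothetical.

```java
import java.util.*;

/** Hypothetical check for the ordering constraint on pre-nominal arguments. */
class OrderingConstraint {
    /**
     * @param prenominalRoles roles assigned to the pre-nominal arguments, in surface order
     * @param precedence      role names in the order they must appear (supplied by the caller)
     * @return true if the roles appear in strictly increasing precedence
     */
    static boolean respectsOrder(List<String> prenominalRoles, List<String> precedence) {
        int last = -1;
        for (String role : prenominalRoles) {
            int rank = precedence.indexOf(role);
            if (rank < 0 || rank <= last) return false; // unknown role, out of order, or repeated
            last = rank;
        }
        return true;
    }
}
```

"FX's printer sales" has four candidate combinations from {SUBJECT, OBJECT} × {SUBJECT, OBJECT}. Only (SUBJECT, OBJECT) passes. Because the order must be strictly increasing, the check also rejects a repeated role, which overlaps with the uniqueness constraint.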
NOMLEX constraints (3)
- Obligatoriness Constraint: by default, the subject and object are optional. A NOMLEX entry can specify obligatory roles that must be filled.
  circulation: REQUIRED (SUBJECT)
    blood circulation → SUBJ(circulate, blood)
  destruction: REQUIRED ((OBJECT :DET-POSS-ONLY T :N-N-MOD-ONLY T))
    Rome's destruction → OBJ(destroy, Rome)
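The obligatoriness constraint can be sketched as a Java filter over alternative assignments. Each alternative maps an argument word to a role. An alternative survives only if every role listed as required is assigned. The sketch reads REQUIRED literally and ignores qualifiers such as `:DET-POSS-ONLY`. The names are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical filter enforcing REQUIRED roles on alternative assignments. */
class ObligatorinessFilter {
    /**
     * @param alternatives each map assigns an argument word to a role name
     * @param required     roles the NOMLEX entry marks as REQUIRED
     * @return the alternatives in which every required role is filled
     */
    static List<Map<String, String>> keepComplete(List<Map<String, String>> alternatives,
                                                  Set<String> required) {
        return alternatives.stream()
            .filter(a -> a.values().containsAll(required))
            .collect(Collectors.toList());
    }
}
```

For "blood circulation", the alternatives {blood=SUBJECT} and {blood=OBJECT} reduce to {blood=SUBJECT}, because "circulation" requires a SUBJECT. For "Rome's destruction", OBJECT is required, so only {Rome=OBJECT} remains.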
Selectional Restrictions
- com.fxpal.ltng.services.normalization.noun.csli (Nouns, Verbs, NounsVerbs)
Applying selectional restrictions
- room reservation
  Alternatives: room: SUBJECT, OBJECT
  reserve: selectional restrictions: SUBJECT: sentient; OBJECT: * (unrestricted)
  room: location, physobj
  → room is not sentient, so only OBJECT remains
- semantic types for about 5,000 nouns
- selectional restrictions for about 5,000 verbs
- 459/941 verbs from NOMLEX covered (48.78%)
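Applying the restrictions can be sketched in Java as a filter over candidate roles. A role is kept if the verb leaves it unrestricted (`*`) or if the noun has the required semantic type. The sketch compares types by exact string match. A real ontology such as the CSLI noun types would also need subsumption between types. It also treats a role with no recorded restriction as unrestricted, which is an assumption, and the names are hypothetical.

```java
import java.util.*;

/** Hypothetical selectional-restriction filter over candidate roles. */
class SelectionalFilter {
    /**
     * @param candidateRoles    roles left after NOMLEX matching, e.g. [SUBJECT, OBJECT]
     * @param restrictionByRole the verb's required semantic type per role; "*" means unrestricted
     * @param nounTypes         semantic types of the argument noun
     */
    static List<String> compatibleRoles(List<String> candidateRoles,
                                        Map<String, String> restrictionByRole,
                                        Set<String> nounTypes) {
        List<String> kept = new ArrayList<>();
        for (String role : candidateRoles) {
            String required = restrictionByRole.getOrDefault(role, "*"); // no entry: treat as unrestricted
            if (required.equals("*") || nounTypes.contains(required)) kept.add(role);
        }
        return kept;
    }
}
```

For "room reservation", the candidates [SUBJECT, OBJECT] with SUBJECT: sentient and OBJECT: * and room types {location, physobj} reduce to [OBJECT].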
Coverage extension
- What if a noun is not in NOMLEX?
  1. additional deverbal nouns in the CSLI data
  2. a NOMLEX template
- 4,087 "event nouns": 3,348 new, 739 already in NOMLEX
- 3,348/1,025: 326% more data
- Template:
  NOM-NP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by")))
         :OBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("of")))
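The fallback can be sketched in Java as follows. A noun's own NOMLEX mapping is used when it exists. Otherwise, if the noun is one of the CSLI event nouns, the generic template above is used. The template is transcribed from the slide. The map-based representation, the `PP:by`/`PP:of` strings, the method name, and the example noun "merger" in the usage are hypothetical.

```java
import java.util.*;

/** Hypothetical fallback from NOMLEX entries to the generic event-noun template. */
class TemplateFallback {
    // The slide's template: both roles accept DET-POSS and N-N-MOD; SUBJECT also a by-PP, OBJECT an of-PP.
    static final Map<String, List<String>> EVENT_NOUN_TEMPLATE = Map.of(
        "SUBJECT", List.of("DET-POSS", "N-N-MOD", "PP:by"),
        "OBJECT", List.of("DET-POSS", "N-N-MOD", "PP:of"));

    /** The noun's NOMLEX mapping if present, else the template for event nouns, else null. */
    static Map<String, List<String>> positionsFor(String noun,
                                                  Map<String, Map<String, List<String>>> nomlex,
                                                  Set<String> eventNouns) {
        Map<String, List<String>> entry = nomlex.get(noun);
        if (entry != null) return entry;
        return eventNouns.contains(noun) ? EVENT_NOUN_TEMPLATE : null;
    }
}
```

Because the template allows DET-POSS and N-N-MOD for both roles, nouns covered only by the template rely heavily on the constraints and on selectional restrictions for disambiguation.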
Future work
- extensive testing and evaluation
- other nominalization data
- other lexical resources
  – deverbal noun recognition
  – mapping information (FrameNet)
  – PropBank: semantic roles
  – VerbLex: selectional restrictions
- role assignment in context
  – word sense disambiguation, anaphora, discourse
  – collocations: the author will make no accusation
    SUBJ(make, author) -> SUBJ(accuse, author)
Related work
- PUNDIT system (Dahl et al., 1987)
- SNOWY QA system (Hull and Gomez 1996)
- NOMLEX for IE (Meyers et al., 1998)
- N-N interpretation (Lapata 2002, Girju et al. 2004)
References
- Dahl, Deborah A., Martha S. Palmer, and Rebecca J. Passonneau. 1987. Nominalizations in PUNDIT. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford, CA.
- Girju, Roxana, Ana-Maria Giuglea, Marian Olteanu, Ovidiu Fortu, Orest Bolohan, and Dan Moldovan. 2004. Support vector machines applied to the classification of semantic relations in nominalized noun phrases. Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics.
- Hull, Richard, and Fernando Gomez. 1996. Semantic interpretation of nominalizations. Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, August 1996, pp. 1062-1068.
- Lapata, Maria. 2002. The disambiguation of nominalisations. Computational Linguistics 28(3): 357-388.
- Macleod, Catherine, Ralph Grishman, Adam Meyers, Leslie Barrett, and Ruth Reeves. 1998. NOMLEX: A lexicon of nominalizations. Proceedings of the 8th International Congress of the European Association for Lexicography, pp. 187-193, Liège, Belgium.
- Meyers, A., et al. 1998. Using NOMLEX to produce nominalization patterns for information extraction. Proceedings of the COLING-ACL Workshop on Computational Treatment of Nominals.
- Quirk, R., S. Greenbaum, G. Leech, and J. Svartvik. 1985. A Comprehensive Grammar of the English Language. Longman, Harlow.
- Terada, Akira, and Takenobu Tokunaga. 2003. Corpus based method of transforming nominalized phrases into clauses for text mining application. IEICE Transactions on Information and Systems, Vol. E86-D, No. 9, pp. 1736-1744.
Thank you!
Selectional restrictions data
- CSLI resource:
  – nouns (4858): semantic types (ontology)
  – verbs (4447): subcategorizations, selectional restrictions
  – noun-verb: 5700 V (9415 N) noun-verb pairs
Grammatical Transfer
NOMLEX | XLE | Example
DET-POSS | | Rome's destruction
PP | ADJUNCT, PP-OBJ (POS=NOUN) | destruction of Carthage
TO-INF | XCOMP | the desire to leave
AS-NPPHRASE | ADJUNCT, PP-OBJ (as, POS=NOUN) | his resignation as chairman
N-N-MOD | | the room reservation
P-ING | ADJUNCT, PP-OBJ (POS=VERB) | the accusation against launching
ING | ADJUNCT, QA_PROG(+) | my appreciation being there
FOR-TO-INF | ADJUNCT, SUBJ | the wish for him to go
ADVP | ADJUNCT (POS=ADV) | his departure abroad
AS-ING | ADJUNCT, PP-OBJ (as, POS=VERB), QA_PROG(+) | characterization as being
AS-ADJP | ADJUNCT, PP-OBJ (as, POS=ADJ) | the characterization as useful
P-POSSING | ADJUNCT, PP-OBJ (POS=VERB), POSS | the acceptance of his talking
FrameNet
- aim: word–semantico-syntactic mapping
- semantic roles: frame elements (frame-specific)
- BNC corpus (100M words); American English: LDC, ANC
- more than 600 frames, about 9,000 words
- Example: accusation, frame Judgment_communication
  Frame elements (for this word) and their realization:
  – communicator: not expressed (27/48), possessive determiner (6/48), PP (from) (2/48), …
  – evaluee: not expressed (40/48), PP (against) (5/48), PP (about) (3/48), …
  – reason: PP (of) (9/48), S (that) (9/48), not expressed (8/48), …, PP (about) (3/48), …
NOMLEX constraints (4)
- restrictions on possible combinations, specified in the NOMLEX entry
  adaptation: :NOT ((AND :SUBJECT ((DET-POSS) (N-N-MOD)) :OBJECT ((N-N-MOD))))
  *plants' weather adaptation
  plants' adaptation to weather
- Note: not implemented (cannot decide which assignment to remove).
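The slide notes that this constraint is not implemented. The Java sketch below covers only the detection half. It reports whether an assignment (role → position used) realizes every conjunct of a `:NOT ((AND …))` clause. It deliberately leaves open what to do with a violating assignment, which is the unresolved question the slide raises. The representation and names are hypothetical.

```java
import java.util.*;

/** Hypothetical detector for a NOMLEX :NOT ((AND ...)) combination restriction. */
class NotConstraint {
    /**
     * @param positionByRole the position each role was realized by, e.g. SUBJECT -> DET-POSS
     * @param forbiddenAnd   for each role, the positions that together form the forbidden combination
     * @return true if every conjunct is realized, i.e. the assignment matches the forbidden combination
     */
    static boolean violates(Map<String, String> positionByRole, Map<String, Set<String>> forbiddenAnd) {
        for (Map.Entry<String, Set<String>> conj : forbiddenAnd.entrySet()) {
            String used = positionByRole.get(conj.getKey());
            if (used == null || !conj.getValue().contains(used)) return false;
        }
        return true;
    }
}
```

For "adaptation", "*plants' weather adaptation" (SUBJECT as DET-POSS, OBJECT as N-N-MOD) matches the forbidden combination. An assignment with only a possessive subject does not.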
Denominalizer UI
- com.fxpal.sake.test.DenominalizerTest
- panels: parse, triples, output