
Ontologies
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

Ontologies: Outline
• WordNet (Miller et al. 90, Fellbaum 98)
• EuroWordNet (Vossen et al. 98)
• Spanish WordNet
• Combining Methods (Atserias et al. 97)
• Mapping hierarchies (Daudé et al. 01)
• Mikrokosmos (Viegas et al. 96)
• Cyc (Mahesh et al. 96)
• WordNet 2 (Harabagiu 98)
• MindNet (Richardson et al. 97)
• ThoughtTreasure (Mueller 00)
• Meaning ...

WordNet & EuroWordNet

WordNet & EuroWordNet
• Princeton University (Miller et al. 1990)
• Lexicalized concepts (words, lexias)
• Linked to one another by semantic relations
  – synonymy
  – antonymy
  – hypernymy-hyponymy
  – meronymy
  – entailment
  – cause
  – ...

WordNet & EuroWordNet: Semantic Relations in WN 1.5
• Synonymy
  – Lexicalized concepts (SYNSETS)
  – Weak notion of synonymy: synonymy in context
  – Synset: set of words or lexias that express one concept in a given context
• Hypernymy / Hyponymy
  – Class-to-subclass relation

WordNet & EuroWordNet: Semantic Relations in WN 1.5
• Meronymy
  – Component part: {hand} {arm}
  – Member of a collection: {person} {people}
  – Substance: {newspaper} {paper}

WordNet & EuroWordNet: Semantic Relations in WN 1.5
• Antonymy: {big} {small}
• Cause: {kill} {die}
• Entailment: {divorce} {marry}
• Derivation: {presidential} {president}
• Similarity: {good} {positive}
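
The relation types listed on these slides can be pictured as typed links between synsets. The following is a minimal sketch, not WordNet's actual storage format: the `Synset` class, the relation names (`hypernym`, `part_of`, `cause`) and the toy entries are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality, so synsets can be stored in sets
class Synset:
    """A lexicalized concept: the words that express it in some context."""
    words: frozenset
    relations: dict = field(default_factory=dict)  # relation name -> list of Synset

    def link(self, relation, target):
        """Add a typed, directed link from this synset to target."""
        self.relations.setdefault(relation, []).append(target)

def hypernym_chain(synset):
    """Follow the first 'hypernym' link upward and return the synsets visited."""
    chain, seen = [], {synset}
    current = synset
    while current.relations.get("hypernym"):
        current = current.relations["hypernym"][0]
        if current in seen:  # guard against accidental cycles
            break
        seen.add(current)
        chain.append(current)
    return chain

# Toy entries (hypothetical, English glosses of the slide examples).
animal = Synset(frozenset({"animal"}))
canine = Synset(frozenset({"canine"}))
dog = Synset(frozenset({"dog"}))
hand = Synset(frozenset({"hand"}))
arm = Synset(frozenset({"arm"}))
kill = Synset(frozenset({"kill"}))
die = Synset(frozenset({"die"}))

dog.link("hypernym", canine)
canine.link("hypernym", animal)
hand.link("part_of", arm)   # meronymy: component part
kill.link("cause", die)     # cause relation

print([sorted(s.words) for s in hypernym_chain(dog)])
```

Storing relations as a dictionary keyed by relation name keeps the sketch open to the other relation types (antonymy, entailment, derivation, similarity) without changing the class.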
WordNet & EuroWordNet: WordNet example

WordNet & EuroWordNet
• Project LE-2 4003
• Telematics Application Programme of the EU
• Semantic networks for several languages
• Integrated and interconnected
  – English: University of Sheffield
  – Dutch: Univ. of Amsterdam
  – Italian: I.L.C., Pisa
  – Spanish: UB, UPC, UNED
• Computers and the Humanities (special issue, 1998)
• http://www.hum.uva.nl/~ewn/

WordNet & EuroWordNet: EuroWordNet Extensions
• EWN 2: German, French, Czech, Swedish, Estonian
• ITEM project: Spanish, Catalan, Basque
• CREL (Centre de Referència d'Enginyeria Lingüística): Catalan (UB, UPC)

WordNet & EuroWordNet: Applications
• Development of basic resources
• Cross-lingual information processing
  – Multilingual information retrieval systems (e.g., Internet)
  – Lexical-semantic module of language engineering systems
    (information extraction, machine translation)

WordNet & EuroWordNet: Design Requirements
• Preserve the semantic relations specific to each language
• Maximum compatibility among the different resources
• Relative independence of the wordnets
  – in the construction process
  – in the final result

WordNet & EuroWordNet: Components of EuroWordNet
• Core
  – The ILI (Inter-Lingual Index)
  – The Top Concept Ontology (TCO)
  – The Domain Ontology (DO)
• Periphery
  – Language-specific wordnets

WordNet & EuroWordNet: Interlingual Index of EuroWordNet
• Unstructured collection of records
• Linked to
  – at least one synset of an EWN wordnet
  – an element of the TCO or DO
• Associated with WN 1.5 synsets

WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet
• Hierarchy of language-independent concepts
  – semantic distinctions: object, place, dynamic, ...
  – abstract (not lexical)
• Superimposed on the ILI
• Three types of entities:
  – First order: concrete entities
  – Second order: static or dynamic situations
  – Third order: abstract propositions

WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet

WordNet & EuroWordNet: Domain Ontology of EuroWordNet
• Hierarchy of domain labels
• Reduces polysemy
• Domains:
  – Road traffic, air traffic
  – International news
  – Mycology
  – Medicine

WordNet & EuroWordNet: Relations in EuroWordNet
• Richer than in WN
• Between:
  – synsets (monolingual modules)
  – ILI records (multilingual): {actuar-1} EQ-SYNONYM {'behave in a certain manner'}
  – ILI records and the TCO or DO
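
Because each language-specific synset points to an ILI record rather than to another language directly, cross-lingual equivalents can be found by meeting at the ILI. The sketch below assumes a dictionary layout, an ILI identifier (`ili-0001`) and an English entry (`act-1`) that are all hypothetical; only `{actuar-1} EQ-SYNONYM {'behave in a certain manner'}` comes from the slide.

```python
# ILI: unstructured collection of records (id -> gloss). The id is made up.
ili = {"ili-0001": "behave in a certain manner"}

# One monolingual wordnet per language: synset id -> variants + equivalence link.
wordnets = {
    "es": {"actuar-1": {"variants": ["actuar"], "eq_synonym": "ili-0001"}},
    "en": {"act-1": {"variants": ["act", "behave"], "eq_synonym": "ili-0001"}},
}

def equivalents(wordnets, lang, synset_id, target_lang):
    """Return the target-language synsets linked to the same ILI record."""
    record = wordnets[lang][synset_id]["eq_synonym"]
    return sorted(
        sid
        for sid, entry in wordnets[target_lang].items()
        if entry["eq_synonym"] == record
    )

print(equivalents(wordnets, "es", "actuar-1", "en"))
print(ili[wordnets["es"]["actuar-1"]["eq_synonym"]])
```

Keeping the ILI flat, with all structure inside the monolingual wordnets, is what lets each wordnet keep its own language-specific relations.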
WordNet & EuroWordNet: Interlingual Relations in EuroWordNet

WordNet & EuroWordNet: Relations in EuroWordNet

Spanish WordNet: Building Process

Spanish WordNet: General Methodology
1) Mapping to WN 1.5
   • manual work
   • automatic derivation of equivalents, using bilingual dictionaries
2) Manual correction
3) Re-structuring

Spanish WordNet: Main Steps: First Core (Manual Translation)
• Nouns:
  – A) WN 1.5's Tops file plus the first level of hyponyms (about 800 synsets).
  – B) The rest of EWN's Common Base Concepts (those not already in our set).
  – C) Manual translation of the synsets intermediate between (A) and (B), following the WN 1.5 hierarchy, thus building a compact taxonomy equivalent to WN 1.5 without gaps.
• Verbs:
  – Manual translation of EWN's Base Concepts (about 150 synsets)

Spanish WordNet: Main Steps: Subset 1 (Semi-automatic)
• Nouns:
  – Applying automatic methods using bilingual dictionaries
  – Manual validation of several subsets to check whether each link is correct
  – Deriving a Confidence Score (CS) for every automatic method (heuristic)
  – Selecting synset-word pairs above 85% CS
  – Some manual correction of this Subset 1 (mainly filling gaps)
• Verbs:
  – 3,600 English verbs connected to WN 1.5 senses and ambiguously translated into Spanish are manually inspected and disambiguated
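
The noun procedure above can be sketched as two steps: estimate a confidence score per heuristic from a manually validated sample, then keep only the pairs proposed by heuristics that reach the threshold. The method names (`mono1`, `poly4`), the sample data and the inclusive reading of "above 85%" are assumptions for illustration.

```python
from collections import defaultdict

def confidence_scores(validated):
    """validated: iterable of (method, is_correct) from the manual check.
    Returns method -> fraction of checked links that were correct."""
    counts = defaultdict(lambda: [0, 0])  # method -> [correct, checked]
    for method, is_correct in validated:
        counts[method][1] += 1
        if is_correct:
            counts[method][0] += 1
    return {m: correct / checked for m, (correct, checked) in counts.items()}

def select_pairs(candidates, scores, threshold=0.85):
    """Keep (synset, word) pairs whose proposing method meets the threshold.
    Treating the threshold as inclusive is an assumption."""
    return [
        (synset, word)
        for synset, word, method in candidates
        if scores.get(method, 0.0) >= threshold
    ]

# Hypothetical validation sample: 9/10 correct for mono1, 6/10 for poly4.
validated = [("mono1", True)] * 9 + [("mono1", False)] \
          + [("poly4", True)] * 6 + [("poly4", False)] * 4
scores = confidence_scores(validated)

# Hypothetical candidate links: (synset id, Spanish word, proposing method).
candidates = [("syn-a", "perro", "mono1"), ("syn-b", "banco", "poly4")]
print(select_pairs(candidates, scores))
```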
Spanish WordNet: Main Steps: Subset 1 (Results 1)

Spanish WordNet: Main Steps: Subset 1 (Results 2)

Spanish WordNet: Main Steps: Subset 2
Main goals
• enhance the quality of Subset 1 by manual revision
• extend it by manually building synsets
• 4 sub-tasks

Spanish WordNet: Main Steps: Subset 2
1) Manually covering those gaps in the hyponymy chains that are covered by other languages
2) Manual cleaning of some automatically generated variants
   – (a) pairs of synsets that are adjacent in the hyponymy chain and share at least one variant
     • deleting redundant variants
     • relocating variants to either pre-existing or newly created synsets
   – (b) multi-word expressions present in synsets
     • deleting non-lexicalized expressions

Spanish WordNet: Main Steps: Subset 2
3) Manual addition of new vocabulary that has been considered relevant
   – It comes mainly from the Catalan WordNet: since we are building both wordnets in parallel, we detected those synsets that had been built for Catalan but not for Spanish
4) Manual addition of cross-part-of-speech relations between nominal and verbal synsets
   – This work has been based mainly on noun-verb pairs obtained by means of morphological criteria (work carried out by UNED, Madrid)

Spanish WordNet: Main Steps: Subset 2 (Results)

Spanish WordNet: Main Steps: Subset 2 (Results)

Spanish WordNet: Main Steps: Beyond Subset 2
• Massive manual checking (from Nov '98)
  – Using WEI
  – Automatically generated variants
  – Filling gaps in the hierarchy
  – New vocabulary
  – New adjectives

Spanish WordNet: Main Steps: Beyond Subset 2

Spanish WordNet: Main Steps: Beyond Subset 2

Spanish WordNet: Main Steps: Parole Coverage

Spanish WordNet: Current Figures
– Spanish, Catalan, Basque, (English)
– http://nipadio.lsi.upc.es/wei2.html
Combining Multiple Methods for the Automatic Construction of Multilingual WordNets

Combining Multiple Methods ...: Outline
• Ten class methods
  – Four monosemic criteria
  – Four polysemic criteria
  – Two hybrid criteria
• Three conceptual distance methods
  – CD 1: using pairwise word co-occurrences
  – CD 2: using headword and genus
  – CD 3: using bilingual Spanish entries with multiple translations

Combining Multiple Methods ...: Ten class methods
– Four classes (diagram: translation links between Spanish words, SW, and English words, EW)

Combining Multiple Methods ...: Ten class methods
– Four monosemic criteria (diagram: SW, EW, Synset)

Combining Multiple Methods ...: Ten class methods
– Four polysemic criteria (diagram: SW, EW, Synset+)

Combining Multiple Methods ...: Ten class methods
– Variant criterion: <..., EW, ...> SW
– Field criterion: <..., headword-EW, ..., Ind-EW, ...> SW

Combining Multiple Methods ...: Ten class methods
• Results

Combining Multiple Methods ...: Conceptual Distance methods
• Conceptual Distance (Agirre et al. 94)
  – length of the shortest path
  – specificity of the concepts
• using WordNet
• Bilingual dictionary
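
The two ingredients named on the slide, path length and concept specificity, can be combined in a single score. The sketch below sums 1/depth(c) over the concepts on the shortest path, so that paths through general (shallow) concepts cost more. This formula is stated as an assumption about one common formulation, not as the exact definition in the cited paper, and the toy taxonomy is hypothetical.

```python
def ancestors(concept, parent):
    """Return [concept, its parent, ..., root] in a tree-shaped taxonomy."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def path_between(c1, c2, parent):
    """Shortest path in a tree: up from c1 to the lowest common ancestor,
    then down to c2. Assumes both concepts share a root."""
    up1, up2 = ancestors(c1, parent), ancestors(c2, parent)
    in_up2 = set(up2)
    common = next(c for c in up1 if c in in_up2)
    down = list(reversed(up2[:up2.index(common)]))
    return up1[:up1.index(common) + 1] + down

def conceptual_distance(c1, c2, parent):
    """Sum of 1/depth over the path (root has depth 1): longer paths and
    paths through less specific concepts both increase the distance."""
    return sum(1.0 / len(ancestors(c, parent)) for c in path_between(c1, c2, parent))

# Hypothetical toy taxonomy: child -> parent.
parent = {"animal": "entity", "food": "entity", "dog": "animal", "cat": "animal"}

print(path_between("dog", "cat", parent))
print(conceptual_distance("dog", "cat", parent) < conceptual_distance("dog", "food", parent))
```

For two words rather than two concepts, the distance would be minimized over all pairs of their senses; that step is left out to keep the sketch short.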
Combining Multiple Methods ...: Conceptual Distance methods
• Three conceptual distance methods
  – CD 1: using pairwise word co-occurrences
  – CD 2: using headword and genus
  – CD 3: using bilingual Spanish entries with multiple translations

Combining Multiple Methods ...: Conceptual Distance methods (Example CD 2)

Combining Multiple Methods ...: Conceptual Distance methods (Example CD 2)

Combining Multiple Methods ...: Three CD methods
• Results

Combining Multiple Methods ...: Combining methods
• Results

Combining Multiple Methods ...: Resulting Spanish WordNets
Mapping Conceptual Hierarchies Using Relaxation Labelling

Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work

Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
(diagram: concepts C1 to C6)

Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
(diagram: concepts C1 to C6)

Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
• Connecting already existing hierarchies
  – Relaxation labelling algorithm
  – Constraints
• Between
  – a Spanish taxonomy automatically derived from an MRD (Rigau et al. 98)
  – WordNet
• using a bilingual MRD

Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
animal (Tops

Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work

Mapping Conceptual Hierarchies using Relaxation Labelling: Algorithm
– Iterative algorithm for function optimization based on local information
– It can deal with any kind of constraints
  • variables (senses of the taxonomy)
  • labels (synsets)
– Finds a weight assignment for each possible label of each variable
  • the weights for the labels of the same variable add up to one
  • the weight assignment satisfies the set of constraints to the maximum possible extent

Mapping Conceptual Hierarchies using Relaxation Labelling: Algorithm
1) Start with a random weight assignment
2) Compute the support value for each label of each variable (according to the constraints)
3) Increase the weights of the labels more compatible with the context and decrease those of the less compatible labels
4) If a stopping/convergence criterion is satisfied, stop; otherwise go to step 2
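
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: constraints are reduced to pairwise compatibility values, support is clamped to [-1, 1], and the update rule w * (1 + support), renormalized per variable, is one standard choice. The variable and label names are hypothetical.

```python
import random

def relaxation_labelling(labels, compat, max_iter=100, eps=1e-4, seed=0):
    """labels: variable -> list of candidate labels.
    compat: (var, label, other_var, other_label) -> compatibility value.
    Returns variable -> {label: weight}, weights per variable summing to one."""
    rng = random.Random(seed)
    # 1) Random start, normalized so each variable's weights add up to one.
    weights = {}
    for var, labs in labels.items():
        raw = [rng.random() + 0.1 for _ in labs]
        total = sum(raw)
        weights[var] = {lab: r / total for lab, r in zip(labs, raw)}

    for _ in range(max_iter):
        new_weights = {}
        for var, labs in labels.items():
            # 2) Support: compatibility weighted by the current context weights.
            support = {}
            for lab in labs:
                s = sum(c * weights[u][m]
                        for (v, l, u, m), c in compat.items()
                        if v == var and l == lab)
                support[lab] = max(-1.0, min(1.0, s))
            # 3) Raise well-supported labels, lower the others, renormalize.
            norm = sum(weights[var][lab] * (1 + support[lab]) for lab in labs)
            if norm == 0:  # every label fully rejected: keep the old weights
                new_weights[var] = dict(weights[var])
            else:
                new_weights[var] = {lab: weights[var][lab] * (1 + support[lab]) / norm
                                    for lab in labs}
        # 4) Stop when no weight changed by more than eps.
        change = max(abs(new_weights[v][l] - weights[v][l])
                     for v in labels for l in labels[v])
        weights = new_weights
        if change < eps:
            break
    return weights

# Hypothetical example: the Spanish sense 'perro' has two candidate synsets;
# its hypernym 'animal_es' has one. One constraint rewards the consistent pair.
labels = {"perro": ["dog", "cad"], "animal_es": ["animal"]}
compat = {("perro", "dog", "animal_es", "animal"): 1.0,
          ("animal_es", "animal", "perro", "dog"): 1.0}
result = relaxation_labelling(labels, compat)
print(max(result["perro"], key=result["perro"].get))
```

In this toy case the odds of `dog` against `cad` double at every iteration, so the weight of `dog` approaches one regardless of the random start.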
Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work

Mapping Conceptual Hierarchies using Relaxation Labelling: Constraints
– Rely on the taxonomy structure
– Coded with three characters, XYZ:
  • X: Spanish taxonomy, I (immediate) or A (ancestor)
  • Y: English taxonomy, I (immediate) or A (ancestor)
  • Z: relation, E (hypernym), O (hyponym), B (both)
– Examples: IIE, AAB

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints (NAACL 2001)
– II constraints: IIE, IIO, IIB

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints
– AI constraints: AIE, AIO, AIB

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints
– IA constraints: IAE, IAO, IAB

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints
– AA constraints: AAE, AAO, AAB

Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work
Combining Multiple Methods ... (RANLP '97): Eight class methods
– Four monosemic criteria (diagrams: SW, EW, Synset), in slide order:

  Criterion   Prec.   Cov.
  1           92%     5%
  2           89%     1%
  3           89%     2%
  4           85%     4%

Combining Multiple Methods ...: Eight class methods
– Four polysemic criteria (diagrams: SW, EW, Synset+), in slide order:

  Criterion   Prec.   Cov.
  1           80%     8%
  2           75%     2%
  3           58%     17%
  4           61%     60%

Combining Multiple Methods ...: Experiments & Results

  Poly            TOK, FOK     TOK, FNOK    total
  animal          279 (90%)    30 (91%)     209 (90%)
  food            166 (94%)    3 (100%)     169 (94%)
  cognition       198 (67%)    27 (90%)     225 (69%)
  communication   533 (77%)    40 (97%)     573 (78%)

  All             TOK, FOK     TOK, FNOK    total
  animal          424 (93%)    62 (95%)     486 (90%)

Combining Multiple Methods ...: Experiments & Results
piel (substance
Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work

A Complete WN 1.5 to WN 1.6 Mapping ... (ACL '00, NAACL '01): Generalized Constraints
• All relationships
  – also-see, similar-to, attribute, antonym, etc.

A Complete WN 1.5 to WN 1.6 Mapping ...: Generalized Constraints
• Non-structural constraints
  – W: number of word coincidences
  – G: word coincidences in glosses
  – F: number of frame coincidences (verbs)

A Complete WN 1.5 to WN 1.6 Mapping ...: POS mapping dependencies
(diagram: Nouns, Adjectives, Verbs, Adverbs)

A Complete WN 1.5 to WN 1.6 Mapping ...: Constraints for Verbs
• Structural constraints
  – hyper/hyponymy
  – antonymy
  – also-see
• Non-structural constraints
  – W, G and F

A Complete WN 1.5 to WN 1.6 Mapping ...: Constraints for Adjectives
• Structural constraints
  – Adj-to-Adj: antonymy, similar-to and also-see
  – Adj-to-Verb: participle-of
  – Adj-to-Noun: pertains and attribute
• Non-structural constraints
  – W and G

A Complete WN 1.5 to WN 1.6 Mapping ...: Constraints for Adverbs
• Structural constraints
  – Adv-to-Adv: antonymy
  – Adv-to-Adj: derived
• Non-structural constraints
  – W and G

A Complete WN 1.5 to WN 1.6 Mapping ...: Example extra-POS
• WN 1.5
  – 02025107 a evangelical evangelistic
    pertainym: 04237485 n Gospels evangel
• WN 1.6
  – 00843344 a evangelical evangelistic
    similar to: 00842521 a enthusiastic
  – 02025107 a evangelical
    pertainym: 04853575 n Gospels evangel

A Complete WN 1.5 to WN 1.6 Mapping ...: Example extra-POS
• WN 1.5
  – 00057615 r impossibly absurdly
• WN 1.6
  – 00294844 r impossibly
    derived from: 01393725 a impossible
    01752468 a impossible
    antonym: 00294658 a possibly

A Complete WN 1.5 to WN 1.6 Mapping ...: Results
• Basic constraint set: structural constraints
  – Nouns: AA hyper/hyponym
  – Verbs: AA hyper/hyponym, II also-see
  – Adjectives: II antonymy, similar-to, also-see
  – Adverbs: II antonymy
A Complete WN 1.5 to WN 1.6 Mapping ...: Results
• Basic constraint set: structural constraints (Ambiguous and Overall given as precision - recall)

  POS   Coverage   Ambiguous        Overall
  N     99.7%      94.9% - 99.6%    97.6% - 99.8%
  V     96.9%      93.5% - 99.2%    94.6% - 99.2%
  A     94.1%      82.8% - 98.9%    89.5% - 99.4%
  R     80.8%      97.5% - 100%     99.0% - 100%

A Complete WN 1.5 to WN 1.6 Mapping ...: Results
• Basic constraint set + W, G and F for verbs (Ambiguous and Overall given as precision - recall)

  POS   Coverage   Ambiguous        Overall
  N     99.9%      97.5% - 97.7%    98.8% - 98.9%
  V     99.8%      99.4% - 99.7%    99.3% - 99.6%
  A     98.9%      96.5% - 98.8%    97.9% - 99.3%
  R     99.5%      97.5% - 100%     99.0% - 100%
A Complete WN 1.5 to WN 1.6 Mapping ...: Results
• Basic + extra-POS relationships (columns: Coverage, Ambiguous, Overall; ranges are precision - recall; rows N, V, A, R; surviving values in slide order)
  – A: 95.8% - 98.9%, 90.9% - 99.4%
  – R: 88.0%, 69.2% - 94.2%, 97.9% - 98.1%
A Complete WN 1.5 to WN 1.6 Mapping ...: Results
• Basic + extra-POS relationships + WGF (Ambiguous and Overall given as precision - recall)

  POS   Coverage   Ambiguous        Overall
  N     99.9%      97.5% - 97.7%    98.8% - 98.9%
  V     99.8%      99.4% - 99.7%    99.3% - 99.6%
  A     99.0%      96.5% - 99.1%    97.9% - 99.5%
  R     99.6%      98.3% - 100%     99.3% - 100%
Mapping Conceptual Hierarchies using Relaxation Labelling: Conclusions
– First complete mapping between WordNet versions
– Combines structural and non-structural information
– Robust approach based on local information, but with global effects
– Incremental POS approach
– http://www.lsi.upc.es/~nlp
– 90 downloads (since November 2000)

Mapping Conceptual Hierarchies using Relaxation Labelling: Further Work
– Mapping other structures
  • WN-EDR, WN-LDOCE, etc.
  • Other language taxonomies to EuroWordNet
– Spanish EWN to WN 1.6
– Symmetrical philosophy rather than source-target
Mikrokosmos

Mikrokosmos: Outline
• Introduction
• Representational Issues
  – The Lexicon
  – The Ontology
• Acquisition Process
  – Lexicon Acquisition
  – Guidelines
  – Ontology/Lexicon Trade-off
• Semantics in Action

Mikrokosmos: Introduction
• Knowledge-Based Machine Translation (KBMT)
• CRL, NMSU
• 5,000 concepts
  – Events
  – Objects
  – Properties
• 7,000 Spanish word senses
• 40,000 word senses after expansion with productive lexical rules
  – comprar -> comprador, comprable, ...
• Text Meaning Representation

Mikrokosmos: Representational Issues: The Lexicon
• Typed Feature Structures (Pollard and Sag 87)
• Language-dependent
• 10 zones
  – phonology
  – orthography
  – morphology
  – syntactic (subcategorization)
  – semantic (Lexical Semantic Representation)
  – syntax-semantic linking
  – stylistics
  – paradigmatic
  – syntagmatic

Mikrokosmos: Representational Issues: The Lexicon
Adquirir-V1
  syn:  subj: cat: NP
        obj:  cat: NP
  sem:  acquire
        agent: HUMAN
        theme: OBJECT
Adquirir-V2
  syn:  subj: cat: NP
        obj:  cat: NP
  sem:  acquire
        agent: HUMAN
        theme: INFORMATION

Mikrokosmos: Representational Issues: The Ontology
• Taxonomic, multi-hierarchical
• 14 local or inherited links on average
• Language-impartial
• EVENTS, OBJECTS, PROPERTIES
• Methodology & Guidelines

Mikrokosmos: Representational Issues: The Ontology
• ACQUIRE
  DEFINITION    "The transfer of possession event where the agent transfers an object to its possession"
  IS-A          TRANSFER-POSSESSION
  SOURCE        HUMAN PLACE
  THEME         OBJECT (NOT HUMAN)
  AGENT         ANIMAL (DEFAULT HUMAN)
  DESTINATION   ANIMAL PLACE (DEFAULT HUMAN)
  INHERITED BENEFICIARY   HUMAN

Mikrokosmos: Acquisition Process: The Lexicon
• Multi-lingual
  – French, English, Japanese, Russian, Spanish, etc.
• Multi-media
• Multi-process
  – Analysis
  – Generation (mono- and multilingual)
  – MT
  – Summarization
  – IE
  – Speech Processing
• Tools
  – corpus search, dictionary lookup, ontology browser

Mikrokosmos: Acquisition Process: The Ontology
• Guidelines
  1) Do not add instances as concepts
     – Instances do not have their own instances
     – Concepts do not have a fixed position in space/time
  2) Do not decompose concepts further
  3) Use close concepts
  4) Do not add EVENTs with particular arguments
  5) Do not add concepts with instance-specific aspects or temporal relations
  6) Do not add language-specific concepts
  7) Do not add ontological concepts for collections

Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
• Daily negotiations
  – lexicon acquirers
  – ontology acquirers
• Possibilities
  – one-to-one mapping
  – lexicon underspecification
  – lexicon-ontology balance
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
• One-to-one mapping
  – PREPARE-FOOD (INST: COOKING-EQUIPMENT)
    • COOK (INST: STOVE), cook: cuire sur le feu
    • BAKE (INST: OVEN), bake: cuire au four
• Problems
  – Lexical: every word in a language is a concept
  – Conceptual: cuire in French is not ambiguous

Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
• Lexicon underspecification
  – PREPARE-FOOD (INST: COOKING-EQUIPMENT)
    • cook: cuire sur le feu
    • bake: cuire au four (INST: OVEN)
• Problems
  – BAKE is not in the ontology

Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
• Lexicon-ontology balance
  – (diagram) concepts: PREPARE-FOOD (INST: COOKING-EQUIPMENT), BAKE, FRY; instruments: INST: STOVE, INST: FRYING-PAN, INST: OVEN; lexical entries: bake, cook: cuire
Mikrokosmos: Semantics in Action
• El grupo Roche, a través de su compañía en España, adquirió Doctor Andreu.
  (The Roche group, through its company in Spain, acquired Doctor Andreu.)
• El grupo Roche adquirió Doctor Andreu a través de su compañía en España.
  (The Roche group acquired Doctor Andreu through its company in Spain.)
• La adquisición de Doctor Andreu por el grupo Roche fue hecha a través de su compañía en España.
  (The acquisition of Doctor Andreu by the Roche group was made through its company in Spain.)

  ACQUIRE-1
    Agent: ORGANIZATION-1
    Theme: ORGANIZATION-2
    Instrument: ORGANIZATION-3
  ORGANIZATION-1
    Object-Name: Grupo Roche
  ORGANIZATION-2
    Object-Name: Doctor Andreu
  ORGANIZATION-3
    Location: España

Mikrokosmos: Semantics in Action
• Onto-Search: ontological search mechanism to check constraints
  – check-onto(ACQUIRE, EVENT) = 1
    since ACQUIRE is a type of EVENT
  – check-onto(ORGANIZATION, HUMAN) = 0.9
    since ORGANIZATION HAS-MEMBER HUMAN
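
One way to read the two `check-onto` examples is as a best-path search in which each link type carries a weight and the score of a path is the product of its link weights. The sketch below follows that reading; it is an assumption, not the documented Onto-Search procedure. The 0.9 weight for HAS-MEMBER is taken from the slide, while the function name `check_onto`, the link table and the toy ontology (including the TRANSFER-POSSESSION IS-A EVENT link) are hypothetical.

```python
import heapq

# Link weights: IS-A preserves the score; HAS-MEMBER scores 0.9 as on the slide.
LINK_WEIGHT = {"IS-A": 1.0, "HAS-MEMBER": 0.9}

def check_onto(concept, constraint, links):
    """Return the best score of any path from concept to constraint, or 0.0.
    links: concept -> list of (relation, target)."""
    best = {concept: 1.0}
    heap = [(-1.0, concept)]  # max-heap on score via negated values
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if node == constraint:
            return score
        if score < best.get(node, 0.0):
            continue  # stale queue entry
        for relation, target in links.get(node, []):
            new_score = score * LINK_WEIGHT[relation]
            if new_score > best.get(target, 0.0):
                best[target] = new_score
                heapq.heappush(heap, (-new_score, target))
    return 0.0

# Hypothetical fragment of the ontology.
links = {
    "ACQUIRE": [("IS-A", "TRANSFER-POSSESSION")],
    "TRANSFER-POSSESSION": [("IS-A", "EVENT")],
    "ORGANIZATION": [("HAS-MEMBER", "HUMAN")],
}

print(check_onto("ACQUIRE", "EVENT", links))
print(check_onto("ORGANIZATION", "HUMAN", links))
```

Because every weight is at most 1, the first time the constraint is popped from the queue its score is already the best one, as in Dijkstra's algorithm.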
Mikrokosmos: Semantics in Action
1) a-través-de: INSTRUMENT, LOCATION
   – adquirir requires a PHYSICAL-OBJECT
2) en: LOCATION, TEMPORAL
   – España is not a TEMPORAL-OBJECT
3) adquirir: ACQUIRE, LEARN
   – Doctor Andreu is not INFORMATION
4) Doctor Andreu: ORGANIZATION, HUMAN
   – the Theme of ACQUIRE is not HUMAN
5) compañía: CORPORATION, SOCIAL-EVENT
   – ORGANIZATIONs typically fill the INSTRUMENT slot of ACQUIRE acts
Mikrokosmos: Experiment: WSD

  Text   Words   Words/sentence   Open-class words   Ambiguous words   Syntax   Correct   %
  1      347     16.5             183                57                21       51        97
  2      385     24.0             167                42                19       41        99
  3      370     26.4             177                57                20       45        93
  4      353     20.8             177                35                12       34        99
  Mean   364     21.4             176                48                18       43        97

Mikrokosmos: Experiment: WSD

  Text          Words   Words/sentence   Open-class words   Ambiguous words   Syntax   Correct   %
  Mean          364     21.4             176                48                18       43        97
  Mean Unseen   390     26               104                26                9        23        97
WordNet 2

WordNet 2: Outline
• Introduction
• Text Inferences
• Defining Features
• Plausible Inferences
• Inference Rules
• Semantic Paths
• What WordNet cannot do

WordNet 2: Introduction
• (Harabagiu 98)
• Commonsense reasoning requires extensive knowledge
  – ~100 million concepts and relations
• WordNet
  – represents almost all English words
  – 100,000 synsets
  – linked by semantic relations
• WordNet 2
  – each synset has a gloss that, when disambiguated, may increase the number of relations
  – WordNet glosses turned into semantic networks
  – new relations

WordNet 2: Text Inferences
German was hungry. He opened the refrigerator.
• hungry (feeling a need or desire to eat)
• eat (take in solid food)
• refrigerator (an appliance in which foods can be stored at low temperature)

WordNet 2: Defining Features
• Transform each concept's gloss into a graph where concepts are nodes and lexical relations are links

WordNet 2: Defining Features
(graph) nodes: pilot, person, qualified, guide, ship, water, difficult; link labels: GLOSS, ATTRIBUTE, PURPOSE, OBJECT, LOCATION
WordNet 2: Inference Rules
• 16 + 1 rules; the four shown on the slide:

  VC1 IS-A VC2 IS-A VC3
  ---------------------
  VC1 IS-A VC3

  VC1 IS-A VC2 ENTAIL VC3
  -----------------------
  VC1 ENTAIL VC3

  VC1 IS-A VC2 R_IS-A VC3
  -----------------------
  VC1 PLAUSIBLE (not VC3)

  VC1 IS-A VC2 R_ENTAIL VC3
  -------------------------
  VC1 EXPLAINS VC3
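
Each of the four rules has the same shape: two chained relations yield one derived relation. That shape can be sketched as a lookup table applied to pairs of facts that share a middle concept. The table encodes only the four rules shown; the `PLAUSIBLE-NOT` label, the reading of R_IS-A as reverse IS-A, and the toy verb facts are assumptions made for illustration.

```python
# (first relation, second relation) -> derived relation, for the four rules shown.
RULES = {
    ("IS-A", "IS-A"): "IS-A",
    ("IS-A", "ENTAIL"): "ENTAIL",
    ("IS-A", "R_IS-A"): "PLAUSIBLE-NOT",   # stands for: VC1 PLAUSIBLE (not VC3)
    ("IS-A", "R_ENTAIL"): "EXPLAINS",
}

def infer(facts):
    """facts: set of (concept1, relation, concept2) triples.
    Apply every rule once to each pair of chained facts; return only new triples."""
    derived = set()
    for vc1, rel1, vc2 in facts:
        for middle, rel2, vc3 in facts:
            if middle == vc2 and vc1 != vc3 and (rel1, rel2) in RULES:
                derived.add((vc1, RULES[(rel1, rel2)], vc3))
    return derived - facts

# Hypothetical verb concepts.
facts = {
    ("stroll", "IS-A", "walk"),
    ("walk", "IS-A", "move"),
    ("walk", "ENTAIL", "step"),
    ("walk", "R_IS-A", "march"),   # read as: march IS-A walk
}

for triple in sorted(infer(facts)):
    print(triple)
```

Calling `infer` repeatedly on the union of old and new facts would give the closure under these rules; a single pass is shown to keep the example small.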
WordNet 2: Semantic Paths
0) Create and load the KB
1) Place markers on KB concepts
2) Propagate markers
   – The algorithm avoids cycles
3) Detect collisions
   – Each marker collision corresponds to a path
4) Extract inferences

WordNet 2: Semantic Paths
Inference sequence
• German was hungry
• German felt a desire to eat
• German felt a desire to take in food
COLLISION: German = he felt a desire to take in food, stored in an appliance, which he opened
• He opened an appliance where food is stored
• He opened the refrigerator
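
Steps 1 to 3 can be sketched as a breadth-first propagation from two starting concepts that stops at the first node reached by both markers. This is a simplified illustration, not the system's algorithm: links are untyped and undirected, and the small graph is a hypothetical reduction of the three glosses quoted earlier (hungry, eat, refrigerator).

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency lists from (a, b) pairs, in insertion order."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph

def trace(parent, node):
    """Walk back from node to the origin of the marker that reached it."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def find_collision_path(graph, start_a, start_b):
    """Propagate one marker from each start; return the path through the first
    node where the two markers collide, or None if they never meet."""
    parent = {start_a: {start_a: None}, start_b: {start_b: None}}
    queue = deque([(start_a, start_a), (start_b, start_b)])
    while queue:
        origin, node = queue.popleft()
        other = start_b if origin == start_a else start_a
        if node in parent[other]:                      # collision detected
            left = trace(parent[start_a], node)[::-1]  # start_a ... node
            right = trace(parent[start_b], node)[1:]   # after node ... start_b
            return left + right
        for neighbour in graph.get(node, ()):
            if neighbour not in parent[origin]:        # visited check avoids cycles
                parent[origin][neighbour] = node
                queue.append((origin, neighbour))
    return None

# Hypothetical links derived from the glosses of hungry, eat and refrigerator.
graph = build_graph([
    ("hungry", "eat"),
    ("eat", "food"),
    ("refrigerator", "appliance"),
    ("refrigerator", "food"),
])

print(find_collision_path(graph, "hungry", "refrigerator"))
```

Here the markers collide at `food`, which mirrors the collision in the inference sequence: the desire to take in food meets the appliance in which food is stored.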
WordNet 2: What WordNet cannot do
Major WordNet limitations:
1) The lack of compound concepts
2) The small number of causation and entailment relations
3) The lack of preconditions for verbs
4) The absence of case relations

ThoughtTreasure

ThoughtTreasure: Overview
• A comprehensive platform for
  – NLP (English, French)
  – commonsense reasoning
• A hotel room has a bed, night table, ...
• People have fingernails
• Soda is a drink
• One hangs up at the end of a phone call
• The sky is blue
• Dogs bark
• Someone who is 16 years old is a teenager

ThoughtTreasure: Overview
• 25,000 concepts organized into a hierarchy
  – EVIAN -> FLAT-WATER -> DRINKING-WATER
• 55,000 words (English, French)
  – food <-> aliment <-> FOOD
• 50,000 assertions about concepts
  – green-pea is green
• 100 scripts

ThoughtTreasure: Overview
• Text agents for recognizing names, phone numbers, etc.
• Mechanisms for learning new words
  – X-phile is someone who likes X
• A syntactic parser
• A NL generator
• A semantic parser
• An anaphoric parser
• Planning agents for achieving goals
• Understanding agents

ThoughtTreasure: Example
• Who created Bugs Bunny?
  – 1.0 (create human-interrogative-pronoun Bugs-Bunny)
  – 0.9 (create rock-group-the-Who Bugs-Bunny)
  – 1.0 (create Tex-Avery Bugs-Bunny)
  – 0.1 (not (create rock-group-the-Who Bugs-Bunny))
Meaning

Meaning: Overview
• Knowledge bases
  – Automatic enrichment of EWN (verbal models, etc.)
  – Mixed approach (KB + ML)
  – Q/A
• Problem
  – structural and lexical ambiguity
• Approach
  – automatically locating examples of senses (Leacock et al. 98, Mihalcea and Moldovan 99)
  – large-scale WSD (Boosting, SVM, transductive methods, ...)
  – knowledge acquisition (Ribas 95, McCarthy 01)

Meaning: Exploiting EWN Semantic Relations

Meaning: Exploiting EWN Semantic Relations
partido 1 (political party)
• Todos los partidos piden reformas legales para TV3.
  (All the parties call for legal reforms for TV3.)
• La derecha planea agruparse en un partido.
  (The right plans to group into a single party.)
• El diputado reiteró que ni él ni UDC, "como partido", han recibido dinero de Pellerols.
  (The deputy reiterated that neither he nor UDC, "as a party", has received money from Pellerols.)
partido 2 (match)
• Pero España puso al partido intensidad, ritmo y coraje.
  (But Spain brought intensity, rhythm and courage to the match.)
• El seleccionador cree que el partido de hoy contra Italia dará la medida de España.
  (The national coach believes that today's match against Italy will show what Spain is worth.)
• El Racing no gana en su campo desde hace seis partidos.
  (Racing has not won at home for six matches.)

Meaning: Exploiting EWN Semantic Relations
partido 1 (political party)
• No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.
  (We will never negotiate with a political party that favours the independence of Taiwan.)
• Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
  (Once again the diversion of funds intended for occupational training towards the financing of a political party is in the news.)
• Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.
  (These laws were passed thanks to a general consensus among the political parties.)
partido 2 (match)
• Rivera pide el soporte de la afición para encarrilar las semifinales.
  (Rivera asks for the fans' support to get the semifinals on track.)
• Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado.
  (Only Valero Ribera's team can settle a semifinal the way it did yesterday in a completely devoted Palau Blaugrana.)
• El Racing ganó los cuartos de final en su campo.
  (Racing won the quarter-finals at home.)

Meaning: Architecture
(diagram)
• Multilingual Central Repository
• Local wordnets: English EWN, Spanish EWN, Catalan EWN, Italian EWN, Basque EWN
• One Web corpus per language: English, Spanish, Catalan, Italian, Basque
• Processes: WSD, ACQ, UPLOAD, PORT