
From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning
Piek Vossen, VU University Amsterdam
Guest lecture, Language Engineering Applications, February 26th, 2009, Leuven

Overview
• Wordnet, EuroWordNet, Global Wordnet Grid
• Stevin project Cornetto
• 7th Framework project KYOTO

WordNet
• http://wordnet.princeton.edu/
• Lexical semantic database for English
• Developed by George Miller and his team at Princeton University, as the implementation of a mental model of the lexicon
• Organized around the notion of a synset: a set of synonyms in a language that represent a single concept
• Semantic relations hold between concepts (synsets), not between words
• Currently covers over 117,000 concepts (synsets) and over 150,000 English words

Relational model of meaning
[Figure: network relating the words animal, man, woman, boy, girl, cat, dog, kitten and puppy]

Wordnet: a network of semantically related words
[Figure: hierarchy fragment around {car; automobile; machine; motorcar}]
• hyper(o)nyms: {motor vehicle; automotive vehicle} -> {vehicle} -> {conveyance; transport}
• hyponyms: {cruiser; squad car; patrol car; police car; prowl car}, {cab; taxi; hack; taxicab}
• meronyms shown in the figure: {car mirror}, {armrest}, {car door}, {doorlock}, {bumper}, {car window}, {hinge; flexible joint}
Hyponymy and meronymy relations are:
• transitive
• directed
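The two properties named on this slide, transitive and directed, can be illustrated with a toy hypernym table. This is only a sketch: the synset names come from the slide, while the dictionary layout and the function names are assumptions made for illustration, not WordNet's actual storage format.

```python
# Toy hypernym links based on the slide; one hypernym per synset for simplicity.
HYPERNYM = {
    "police car": "car",
    "taxi": "car",
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
    "vehicle": "conveyance",
}

def hypernym_chain(synset, hypernym=HYPERNYM):
    """Follow the directed hypernym links upward until no link is left."""
    chain = []
    while synset in hypernym:
        synset = hypernym[synset]
        chain.append(synset)
    return chain

def is_a(specific, general, hypernym=HYPERNYM):
    """Transitivity: a taxi is a car and a car is a vehicle, so a taxi is a vehicle."""
    return general in hypernym_chain(specific, hypernym)

print(hypernym_chain("taxi"))
print(is_a("taxi", "vehicle"), is_a("vehicle", "taxi"))
```

The second call to `is_a` returns False because the relation is directed: a vehicle is not a kind of taxi.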

Wordnet Semantic Relations (WN 1.5 starting point)
The ‘synset’ as a weak notion of synonymy: “two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value.” (Miller et al. 1993)
Relations between synsets:
Relation      Parts of speech           Example
HYPONYMY      noun-to-noun              car / vehicle
              verb-to-verb              walk / move
MERONYMY      noun-to-noun              head / nose
ANTONYMY      adjective-to-adjective    good / bad
              verb-to-verb              open / close
ENTAILMENT    verb-to-verb              buy / pay
CAUSE         verb-to-verb              kill / die

Wordnet Data Model
[Figure: the vocabulary of a language mapped onto concepts, which are linked by relations (type-of, part-of)]
• bank (polysemy):
  – sense 1 -> rec: 12345 - financial institute
  – sense 2 -> rec: 54321 - side of a river
• fiddle, violin (polysemy & synonymy) -> rec: 9876 - small string instrument
• fiddler, violist -> rec: 65438 - musician playing violin
• rec: 42654 - musician
• string (polysemy):
  – sense 1 -> rec: 35576 - string of instrument
  – sense 2 -> rec: 29551 - underwear
• rec: 25876 - string instrument
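The data model on this slide separates word forms from concept records. A minimal sketch of that separation follows; the record numbers and glosses are the ones on the slide, while the two-dictionary layout and the helper names are assumptions for illustration.

```python
# Concept records copied from the slide (record number -> gloss).
CONCEPTS = {
    12345: "financial institute",
    54321: "side of a river",
    9876: "small string instrument",
    65438: "musician playing violin",
}

# Vocabulary: each word form lists the concept records it can express.
VOCABULARY = {
    "bank": [12345, 54321],   # polysemy: one form, two concepts
    "fiddle": [9876],         # synonymy: two forms, one concept
    "violin": [9876],
    "fiddler": [65438],
}

def senses(word):
    """Glosses of all concepts a word form can express."""
    return [CONCEPTS[rec] for rec in VOCABULARY.get(word, [])]

def synonyms(word):
    """Other word forms that share at least one concept record with `word`."""
    recs = set(VOCABULARY.get(word, []))
    return sorted(w for w, rs in VOCABULARY.items() if w != word and recs & set(rs))

def is_polysemous(word):
    return len(VOCABULARY.get(word, [])) > 1

print(senses("bank"), synonyms("fiddle"), is_polysemous("bank"))
```

Relations such as type-of and part-of would then be stored between record numbers, not between word forms.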

Some observations on Wordnet
• synsets are more compact representations for concepts than word meanings in traditional lexicons
• synonyms and hypernyms are substitutional variants:
  – begin – commence
  – I once had a canary. The bird got sick. The poor animal died.
• hyponymy and meronymy chains are important transitive relations for predicting properties and explaining textual properties: object -> artifact -> vehicle -> 4-wheeled vehicle -> car
• strict separation of parts of speech, although concepts are closely related (bed – sleep) and are similar (dead – death)
• lexicalization patterns reveal important mental structures

Lexicalization patterns
[Figure: hierarchy running from the 25 unique beginners (e.g. entity, object) down to basic level concepts and their hyponyms; words shown: garbage, threat, artifact, building, bird, organism, animal, plant, waste, tree, flower, church, canary, dog, crocodile, rose, abbey, common canary]
Basic level concepts:
• balance of two principles:
  – predict most features
  – apply to most subclasses
• where most concepts are created
• amalgamate most parts
• most abstract level to draw a picture

[Figure: Wordnet top level]

Meronymy & pictures
[Figure: picture with the labelled parts beak, tail and leg]

[Figure: Meronymy & pictures]

Wordnet 3.0 statistics
POS          Unique Strings    Synsets    Total Word-Sense Pairs
Noun         117,798            82,115    146,312
Verb          11,529            13,767     25,047
Adjective     21,479            18,156     30,002
Adverb         4,481             3,621      5,580
Totals       155,287           117,659    206,941

Wordnet 3.0 statistics
POS          Monosemous Words and Senses    Polysemous Words    Polysemous Senses
Noun         101,863                        15,935              44,449
Verb           6,277                         5,252              18,770
Adjective     16,503                         4,976              14,399
Adverb         3,748                           733               1,832
Totals       128,391                        26,896              79,450
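The figures in the WordNet 3.0 statistics can be turned into an average number of senses per word. The sketch below uses only noun and verb figures from the slides; the tuple layout and function name are illustrative assumptions.

```python
# Figures copied from the WordNet 3.0 statistics slides.
# Per part of speech: (unique strings, word-sense pairs,
#                      polysemous words, polysemous senses)
STATS = {
    "Noun": (117798, 146312, 15935, 44449),
    "Verb": (11529, 25047, 5252, 18770),
}

def average_polysemy(pos, include_monosemous=True):
    """Average number of senses per word for one part of speech."""
    strings, pairs, poly_words, poly_senses = STATS[pos]
    if include_monosemous:
        return pairs / strings
    return poly_senses / poly_words

for pos in STATS:
    print(pos,
          round(average_polysemy(pos), 2),
          round(average_polysemy(pos, include_monosemous=False), 2))
```

On these numbers verbs come out as more polysemous than nouns under both definitions.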

http://www.visuwords.com

Usage of Wordnet
• Most widely used database in language technology
• Enormous impact on language technology development
• Large
• Free and downloadable
• English

Usage of Wordnet
• Improve recall of text-based analysis:
  – Query -> Index
    • Synonyms: commence – begin
    • Hypernyms: taxi -> car
    • Hyponyms: car -> taxi
    • Meronyms: trunk -> elephant
    • Lexical entailments: gun -> shoot
• Inferencing:
  – what things can burn?
• Expression in language generation and translation:
  – alternative words and paraphrases
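The Query -> Index step can be sketched as query expansion: every query term is supplemented with related words before the index is searched. The relation tables below contain only the example pairs from the slide; the table layout, the function name and the default choice of relations are assumptions for illustration.

```python
# Only the example pairs from the slide; a real system would read these from a wordnet.
RELATIONS = {
    "synonym": {"commence": ["begin"], "begin": ["commence"]},
    "hypernym": {"taxi": ["car"]},
    "hyponym": {"car": ["taxi"]},
}

def expand_query(terms, relations=("synonym", "hyponym")):
    """Return the query terms followed by their related words, without duplicates."""
    expanded = list(terms)
    for term in terms:
        for rel in relations:
            for related in RELATIONS[rel].get(term, []):
                if related not in expanded:
                    expanded.append(related)
    return expanded

print(expand_query(["car"]))
print(expand_query(["taxi"], relations=("hypernym",)))
```

Expanding with hyponyms (car -> taxi) mainly raises recall. Expanding with hypernyms (taxi -> car) also retrieves more general documents, so it is left as an explicit option here.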

Improve recall
• Information retrieval:
  – effective on small databases without redundancy, e.g. image captions, video text
• Text classification:
  – expand small training sets
  – reduce training effort
• Question & Answer systems:
  – question classification: who, where, what, when
  – match answers to question types

Improve recall
• Anaphora resolution:
  – The girl fell off the table. She ...
  – The glass fell off the table. It ...
• Coreference resolution:
  – When he moved the furniture, the antique table got damaged.
• Information extraction (unstructured text to structured databases):
  – generic forms or patterns ("vehicle") -> text with specific cases ("car")

Improve recall
• Summarizers:
  – Sentence selection based on word counts -> concept counts
  – Avoid repetition in summary -> language generation, pick out another synonym or hypernym
• Limited inferencing: detect locations, people, organisations, etc.

Enabling technologies
• Semantic similarity: what sentences or expressions are semantically similar?
• Semantic relatedness and textual entailment: smoke entails fire, fire entails damage
• Word-Sense Disambiguation
• Erwin Marsi, University of Tilburg, http://daeso.uvt.nl/demos/index.html

Recall & Precision
[Figure: Venn diagram for the query “cell”, showing the set of found documents, the set of relevant documents and their intersection; labels shown: “jail”, “nerve cell”, “police cell”, “neuron”, “cell phone”, “mobile phones”]
recall = intersection / relevant
precision = intersection / found
Recall < 20% for basic search engines! (Blair & Maron 1985)
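The two measures can be computed directly from the set of found documents and the set of relevant documents. A minimal sketch; the document identifiers in the example are made up for illustration and do not come from the slide.

```python
def precision_recall(found, relevant):
    """precision = |found ∩ relevant| / |found|, recall = |found ∩ relevant| / |relevant|."""
    found, relevant = set(found), set(relevant)
    hits = len(found & relevant)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 6 documents actually relevant, 2 in common.
found_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d3", "d4", "d5", "d6", "d7", "d8"}
print(precision_recall(found_docs, relevant_docs))
```

Adding synonyms or hyponyms to a query enlarges the found set, which is how a wordnet can raise recall, possibly at some cost in precision.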

Many others
• Data sparseness for machine learning: hapaxes can be replaced by semantic classes that match classes from the training set
• Use redundancy for more robustness: spelling correction and speech recognition can build semantic expectations using Wordnet and make better choices
• Sentiment and opinion mining
• Natural language learning

EuroWordNet
• The development of a multilingual database with wordnets for several European languages
• Funded by the European Commission, DG XIII, Luxembourg as projects LE2-4003 and LE4-8328
• March 1996 - September 1999
• 2.5 million euro
• http://www.hum.uva.nl/~ewn
• http://www.illc.uva.nl/EuroWordNet/finalresultsewn.html

EuroWordNet
• Languages covered:
  – EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
  – EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian
• Size of vocabulary:
  – EuroWordNet-1: 30,000 concepts - 50,000 word meanings
  – EuroWordNet-2: 15,000 concepts - 25,000 word meanings
• Type of vocabulary:
  – the most frequent words of the languages
  – all concepts needed to relate more specific concepts

EuroWordNet Model
[Figure: architecture of the EuroWordNet database]
• Inter-Lingual-Index with ILI-record {drive}
• Modules linked to the index: Domains (Traffic: Air, Road) and Ontology (2ndOrderEntity: Location, Dynamic)
• Lexical Items Tables per language, with the words shown: move, go, ride, drive; bewegen, gaan, rijden, berijden; mover, transitar, cabalgar, jinetear, conducir; andare, muoversi, cavalcare, guidare
• Link types:
  – I = Language Independent link
  – II = Link from Language Specific to Inter-Lingual-Index
  – III = Language Dependent Link

Differences in relations between EuroWordNet and WordNet
• Added features to relations
• Cross-Part-Of-Speech relations
• New relations to differentiate shallow hierarchies
• New interpretations of relations

EWN Relationship Labels
{airplane}
  HAS_MERO_PART: conj1          {door}
  HAS_MERO_PART: conj2 disj1    {jet engine}
  HAS_MERO_PART: conj2 disj2    {propeller}
{door}
  HAS_HOLO_PART: disj1    {car}
  HAS_HOLO_PART: disj2    {room}
  HAS_HOLO_PART: disj3    {entrance}
Default interpretation: non-exclusive disjunction
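One possible reading of the conj/disj labels is that an airplane has a door AND (a jet engine OR a propeller), with the disjunction non-exclusive. The sketch below encodes that reading; the tuple layout and the function name are assumptions for illustration, not part of the EuroWordNet specification.

```python
# (part, conjunction index, disjunction index or None), following the airplane example.
AIRPLANE_PARTS = [
    ("door", 1, None),
    ("jet engine", 2, 1),
    ("propeller", 2, 2),
]

def satisfies(parts_present, labelled_parts=AIRPLANE_PARTS):
    """True if every conjunct has at least one of its alternatives present.

    Alternatives inside one conjunct are non-exclusive: having both is fine.
    """
    groups = {}
    for part, conj, _disj in labelled_parts:
        groups.setdefault(conj, []).append(part)
    return all(any(p in parts_present for p in alternatives)
               for alternatives in groups.values())

print(satisfies({"door", "propeller"}))
print(satisfies({"jet engine"}))
```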

Overview of the Language Internal relations in EuroWordNet
Same Part of Speech relations:
HYPERONYMY/HYPONYMY         car - vehicle
ANTONYMY                    open - close
HOLONYMY/MERONYMY           head - nose
NEAR_SYNONYMY               apparatus - machine
Cross-Part-of-Speech relations:
XPOS_NEAR_SYNONYMY          dead - death; to adorn - adornment
XPOS_HYPERONYMY/HYPONYMY    to love - emotion
XPOS_ANTONYMY               to live - dead
CAUSE                       die - death
SUBEVENT                    buy - pay; sleep - snore
ROLE/INVOLVED               write - pencil; hammer - hammer
STATE                       the poor - poor
MANNER                      to slurp - noisily
BELONG_TO_CLASS             Rome - city

Co_Role relations
criminal               CO_AGENT_PATIENT        victim
novel writer / poet    CO_AGENT_RESULT         novel / poem
dough                  CO_PATIENT_RESULT       pastry / bread
photographic camera    CO_INSTRUMENT_RESULT    photo
[Figure: network linking guitar player, player, person, guitar, musical instrument, to play music, to play musical instrument and to make through the relations HAS_HYPERONYM, ROLE_AGENT, ROLE_INSTRUMENT, CO_AGENT_INSTRUMENT and CO_INSTRUMENT_AGENT]

Horizontal & vertical semantic relations
[Figure: network around the verbs treat and cure]
• vertical (HYPONYM) links: patient -> chronic patient, mental patient; doctor -> child doctor; disease, disorder -> stomach disease, kidney disorder
• horizontal links shown: ρ-AGENT, ρ-PATIENT, ρ-CAUSE, STATE, ρ-PROCEDURE (physiotherapy, medicine, etc.), ρ-LOCATION (hospital, etc.), co-ρ-AGENT-PATIENT (child doctor, child)

The Multilingual Design
• Inter-Lingual-Index: unstructured fund of concepts to provide an efficient mapping across the languages;
• Index-records are mainly based on WordNet synsets and consist of synonyms, glosses and source references;
• Various types of complex equivalence relations are distinguished;
• Equivalence relations from synsets to index records: not on a word-to-word basis;
• Indirect matching of synsets linked to the same index items;
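Indirect matching means that two synsets from different languages count as equivalent when they point to the same index record, without any direct link between them. A minimal sketch follows; the words are taken from the EuroWordNet Model slide, but the index identifiers and the specific word-to-record links are hypothetical and chosen only for illustration.

```python
# language -> {word: ILI record}; record ids and links are illustrative assumptions.
ILI_LINKS = {
    "en": {"drive": "ili-drive", "ride": "ili-ride"},
    "nl": {"rijden": "ili-drive", "berijden": "ili-ride"},
    "es": {"conducir": "ili-drive", "cabalgar": "ili-ride"},
    "it": {"guidare": "ili-drive", "cavalcare": "ili-ride"},
}

def cross_lingual_matches(word, source, target, links=ILI_LINKS):
    """Words in `target` linked to the same index record as `word` in `source`."""
    record = links[source].get(word)
    if record is None:
        return []
    return sorted(w for w, rec in links[target].items() if rec == record)

print(cross_lingual_matches("rijden", "nl", "it"))
print(cross_lingual_matches("cabalgar", "es", "en"))
```

Adding a language only requires linking its synsets to the index; no pairwise tables between languages are needed.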

Equivalent Near Synonym
1. Multiple Targets (1:many)
   Dutch wordnet: schoonmaken (to clean) matches 4 senses of clean in WordNet 1.5:
   • make clean by removing dirt, filth, or unwanted substances from
   • remove unwanted substances from, such as feathers or pits, as of chickens or fruit
   • remove in making clean; "Clean the spots off the rug"
   • remove unwanted substances from (as in chemistry)
2. Multiple Sources (many:1)
   Dutch wordnet: versiersel near_synonym versiering
   ILI-record: decoration
3. Multiple Targets and Sources (many:many)
   Dutch wordnet: toestel near_synonym apparaat
   ILI-records: machine; device; apparatus; tool

Equivalent Hyperonymy
Typically used for gaps in English WordNet:
• genuine, cultural gaps for things not known in English culture:
  – Dutch: klunen, to walk on skates over land from one frozen water to the other
• pragmatic gaps, in the sense that the concept is known but is not expressed by a single lexicalized form in English:
  – Dutch: kunststof = artifact substance <=> artifact object

[Figure: EuroWordNet statistics]

Wordnets as semantic structures
• Wordnets are unique language-specific structures:
  – same organizational principles: synset structure and same set of semantic relations
  – different lexicalizations
  – differences in synonymy and homonymy:
    • "decoration" in English versus "versiersel/versiering" in Dutch
    • "bank" in English (money/river) versus "bank" in Dutch (money/furniture)
• BUT also different relations for similar synsets

Autonomous & Language-Specific
[Figure: the same concepts in two hierarchies]
• Wordnet 1.5: object, with the levels natural object (an object occurring naturally), artifact, artefact (a man-made object), instrumentality, implement, container, instrument and device above body, block, tool, spoon, box and bag
• Dutch Wordnet: voorwerp {object} with blok {block}, lichaam {body}, werktuig {tool}, bak {box}, lepel {spoon} and tas {bag}

Linguistic versus Artificial Ontologies
Artificial ontology:
• better control or performance, or a more compact and coherent structure
• introduces artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool)
• neglects levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise)
What properties can we infer for spoons?
spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking

Linguistic versus Artificial Ontologies
Linguistic ontology:
• Exactly reflects the relations between all the lexicalized words and expressions in a language.
• Captures valuable information about the lexical capacity of languages: what is the available fund of words and expressions in a language.
What words can be used to name spoons?
spoon -> object, tableware, silverware, merchandise, cutlery

Wordnets versus ontologies
• Wordnets:
  – autonomous language-specific lexicalization patterns in a relational network
  – Usage: to predict substitution in text for information retrieval, text generation, machine translation, word-sense disambiguation
• Ontologies:
  – data structure with formally defined concepts
  – Usage: making semantic inferences

From EuroWordNet to Global WordNet
• EuroWordNet ended in 1999
• Global Wordnet Association was founded in 2000 to maintain the framework: http://www.globalwordnet.org
• Currently, wordnets exist for more than 50 languages, including:
  – Arabic, Bantu, Basque, Chinese, Bulgarian, Estonian, Hebrew, Icelandic, Japanese, Kannada, Korean, Latvian, Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, Zulu ...
• Many languages are genetically and typologically unrelated

Some downsides of the EuroWordNet model
• Construction is not done uniformly
• Coverage differs
• Not all wordnets can communicate with one another, i.e. they are linked to different versions of the English wordnet
• Proprietary rights restrict free access and usage
• A lot of semantics is duplicated
• Complex and obscure equivalence relations due to linguistic differences between English and other languages

Next step: Global WordNet Grid
[Figure: the wordnets of individual languages linked to a shared Inter-Lingual Ontology]
• Inter-Lingual Ontology: Object, Device, TransportDevice
• English Words: vehicle, car, train
• Dutch Words: voertuig, auto, trein
• German Words: Fahrzeug, Auto, Zug
• French Words: véhicule, voiture, train
• Italian Words: veicolo, auto, treno
• Spanish Words: vehículo, auto, tren
• Czech Words: dopravní prostředník, auto, vlak
• Estonian Words: liiklusvahend, auto, killavoor

GWNG: Main Features
• Construct separate wordnets for each Grid language
• Contributors from each language encode the same core set of concepts plus culture/language-specific ones
• Synsets (concepts) are mapped cross-linguistically via an ontology instead of just the English Wordnet

The Ontology: Main Features
• List of concepts is not just based on the lexicon of a particular language (unlike in EuroWordNet) but uses ontological observations
• Ontology contains only upper and mid-level concepts
• Concepts are related in a type hierarchy
• Concepts are defined with axioms

The Ontology: Main Features
• Minimal set of concepts (reductionist view):
  – to express equivalence across languages
  – to support inferencing
• Ontology need not and cannot provide a concept for all concepts found in the Grid languages:
  – Lexicalization in a language is not sufficient to warrant inclusion in the ontology
  – Lexicalization in all or many languages may be sufficient
• Ontological observations will be used to define the concepts in the ontology
• Ontological framework still must be powerful enough to encode all concepts that are lexically expressed in any of the Grid languages
• Additional lexicalized concepts are related to the ontology through complex relations

Ontological observations
• Identity criteria as used in OntoClean (Guarino & Welty 2002):
  – rigidity: to what extent are properties true for entities in all worlds? You are always a human, but you can be a student for a short while.
  – essence: what properties are essential for an entity? Shape is essential for a statue but not for the clay it is made of.
  – unicity: what represents a whole and what entities are parts of these wholes? An ocean is a whole but the water it contains is not.

Type-role distinction
Current WordNet treatment, hyponyms of dog:
• lapdog:1 # toy dog:1, toy:4 # hunting dog:1 # working dog:1, etc.
• dalmatian:2, coach dog:1, carriage dog:1 # Leonberg:1 # Newfoundland:1 # poodle:1, poodle dog:1, etc.
(1) a husky is a kind of dog (type)
(2) a husky is a kind of working dog (role)
• What’s wrong? (2) is defeasible, (1) is not:
  – *This husky is not a dog
  – This husky is not a working dog

Ontology and lexicon
• Hierarchy of disjunct types:
  Canine -> PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky
• Lexicon:
  – NAMES for TYPES: {poodle}EN, {poedel}NL, {pudoru}JP
    (instance x Poodle)
  – LABELS for ROLES: {watchdog}EN, {waakhond}NL, {banken}JP
    ((instance x Canine) and (role x GuardingProcess))
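The split between names for types and labels for roles can be sketched by attaching an ontology expression to every lexicon entry. The expressions below follow the slide; storing them as lists of (relation, variable, concept) tuples and the two helper functions are assumptions for illustration.

```python
# (word, language) -> ontology expression as a list of (relation, variable, concept).
LEXICON = {
    ("poodle", "EN"): [("instance", "x", "Poodle")],
    ("poedel", "NL"): [("instance", "x", "Poodle")],
    ("watchdog", "EN"): [("instance", "x", "Canine"), ("role", "x", "GuardingProcess")],
    ("waakhond", "NL"): [("instance", "x", "Canine"), ("role", "x", "GuardingProcess")],
}

def is_role_label(word, lang):
    """A role label has a role constraint in addition to its type."""
    return any(rel == "role" for rel, _var, _concept in LEXICON[(word, lang)])

def same_meaning(word, lang):
    """Entries in other languages whose ontology expression is identical."""
    target = LEXICON[(word, lang)]
    return sorted(key for key, expr in LEXICON.items()
                  if expr == target and key != (word, lang))

print(is_role_label("watchdog", "EN"), is_role_label("poodle", "EN"))
print(same_meaning("poodle", "EN"))
```

Only Poodle and Canine need to exist as types in the ontology; watchdog is expressed through the ontology without being added to the type hierarchy.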

Ontology and lexicon
• Hierarchy of disjunct types: River; Clay; etc.
• Lexicon:
  – NAMES for TYPES: {river}EN, {rivier, stroom}NL
    (instance x River)
  – LABELS for dependent concepts:
    {rivierwater}NL (water from a river => water is not a unit)
    ((instance x Water) and (instance y River) and (portion x y))
    {kleibrok}NL (irregularly shaped piece of clay => non-essential)
    ((instance x Object) and (instance y Clay) and (portion x y) and (shape x Irregular))

KIF expression for gender marking
• {teacher}EN ((instance x Human) and (agent x TeachingProcess))
• {Lehrer}DE ((instance x Man) and (agent x TeachingProcess))
• {Lehrerin}DE ((instance x Woman) and (agent x TeachingProcess))

KIF expression for perspective
sell: subj(x), direct obj(z), indirect obj(y)
versus
buy: subj(y), direct obj(z), indirect obj(x)
(and (instance x Human) (instance y Human) (instance z Entity) (instance e FinancialTransaction) (source x e) (destination y e) (patient z e))
The same process but a different perspective by subject and object realization: marry in Russian has two verbs, apprendre in French can mean teach and learn

Advantages of the Global Wordnet Grid
• Shared and uniform world knowledge:
  – universal inferencing
  – uniform text analysis and interpretation
• More compact and less redundant databases
• Clearer notion of how languages map to the knowledge:
  – better criteria for expressing knowledge
  – better criteria for understanding variation

CORNETTO (STEVIN TENDER)
Combinatorial and Relational Network as Toolkit for Dutch Language Technology
http://www2.let.vu.nl/oz/cornetto

Goals of the Cornetto project
• Goal: to develop a lexical semantic database for Dutch:
  – 40K entries: generic and central part of the language
  – Rich horizontal and vertical semantic relations
  – Combinatoric information
  – Ontological information
• Method: merge data from Dutch Wordnet (DWN) and Referentie Bestand Nederlands (RBN)
• April 2006 - March 2008, extended to July 2008
• The final results of the Cornetto project are available through the TST-centrale of the Nederlandse Taalunie (free for research).

Database
• Collections:
  ▪ Lexical Units (LU): mainly derived from the RBN
  ▪ Synsets (SY): mainly derived from DWN
  ▪ Terms (TE) and axioms: mainly derived from SUMO and MILO
  ▪ Domains (DM): based on Wordnet domains
• Mappings:
  ▪ LU <-> SY
  ▪ SY <-> SY (within Dutch and from Dutch to English)
  ▪ SY <-> TE
  ▪ SY <-> DM

Data Organization
[Figure: Cornetto data organization]
• Lexical Unit (LU): corresponds to a word-meaning pair; form, morphology, syntax, semantics, pragmatics, usage examples
• Synset: models meaning relations; synonyms, internal relations
• Collection of Terms and Axioms: SUMO, MILO
• Also shown: Princeton Wordnet, Domains, and the Czech, Spanish, German, Korean, Arabic and French wordnets

Database
• Implemented in DebVisDic:
  – http://deb.fi.muni.cz/index.php
• Demo version available: http://www2.let.vu.nl/oz/cornetto/demo.html

Overview of results
                       ALL        NOUNS     VERBS     ADJ       ADV      OTHERS
Synsets                 70,371    52,847     9,017     7,689      220    598
Lexical Units          119,108    85,449    17,314    15,712      475    158
Lemmas (form+pos)       92,686    70,315     9,051    12,288    1,032    n.a.
Synonyms in synsets    103,762    75,476    14,138    12,914      408    826
CID records            104,556    76,537    14,214    13,132      483    190
Synonym per synset        1.47      1.43      1.57      1.68     1.85    1.38
Senses per lemma          1.29      1.22      1.91      1.28     0.46    n.a.

DWN and RBN matches
                     Count      Share
Mapping relations     35,289    37.74%
LUs only in DWN       54,983    58.81%
LUs only in RBN        3,223     3.45%
Total                 93,495

                     Count      Share
No status value       55,976    53.54%
Status value          48,580    46.46%
  manual              10,108     9.67%
  B-95                 4,944     4.73%
  BM-90                4,215     4.03%
  D-55 adjectives        171     0.16%
  D-58 verbs             774     0.74%
  D-75 nouns           2,085     1.99%
  M-97                25,236    24.14%
  RESUME-75            1,047     1.00%
TOTAL                104,556

Overview of synset data
Synsets                       70,371
Synonyms                     103,762
Internal Relations           153,370
Equivalence Relations         86,830
Definitions                   35,620
WordNet Domains mappings      93,822
SUMO mappings                 70,654
Base Level Concepts            8,828

English Wordnet to SUMO mapping through two-place relations
• = the synset is equivalent to the SUMO concept: circle (= Circle)
• + the synset is subsumed by the SUMO concept: branch (+ PlantBranch)
• @ the synset is an instance of the SUMO concept: Amsterdam (@ City)

Cornetto SUMO Mappings through triplets
• Equality:
  – cirkel (circle): (=, 0, Circle) or (=, , Circle)
• Subsumption:
  – tak (branch): (+, 0, PlantBranch) or (+, , PlantBranch)
• Related:
  – blad (leaf): (part, 0, PlantBranch) or (part, , PlantBranch)
• Axiomatized:
  – theewater (water for making tea): (instance, 0, Water) (instance, 1, Making) (instance, 2, Tea) (resource, 0, 1) (result, 2, 1)
    or (instance, , Water) (instance, 1, Making) (instance, 2, Tea) (resource, , 1) (result, 2, 1)
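The two notations for theewater differ only in whether the first variable is written as 0 or left empty. A minimal sketch of how such triplets could be held in code, assuming (my reading of the slide, not a documented Cornetto API) that an empty slot stands for variable 0 and that integer targets refer to other variables:

```python
from typing import Optional, Union

# (relation, variable, target); the variable slot may be empty (None) in the short notation.
Triplet = tuple[str, Optional[int], Union[str, int]]

theewater_full: list[Triplet] = [
    ("instance", 0, "Water"),
    ("instance", 1, "Making"),
    ("instance", 2, "Tea"),
    ("resource", 0, 1),
    ("result", 2, 1),
]
theewater_short: list[Triplet] = [
    ("instance", None, "Water"),
    ("instance", 1, "Making"),
    ("instance", 2, "Tea"),
    ("resource", None, 1),
    ("result", 2, 1),
]


def normalize(triplets):
    """Assumption: an empty variable slot means variable 0."""
    return [(rel, 0 if var is None else var, tgt) for rel, var, tgt in triplets]


def describe(triplets):
    """Replace variable numbers by the SUMO type each variable is an instance of."""
    norm = normalize(triplets)
    types = {var: tgt for rel, var, tgt in norm if rel == "instance"}
    return [
        f"{rel}({types[var]}, {types[tgt]})"
        for rel, var, tgt in norm
        if rel != "instance" and isinstance(tgt, int)
    ]


print(describe(theewater_short))
```

Under these assumptions both notations normalize to the same list, and the axiom reads as: the water is the resource of a making process whose result is tea.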

Ontology mapping: female/male variants
• teacher (a person whose occupation is teaching)
  – SUMO: equivalent to Teacher
• In Dutch: no neutral form
  – leraar (male teacher): (+, , Teacher), (+, , Man)
  – lerares (female teacher): (+, , Teacher), (+, , Woman)

KYOTO (ICT-211423)
Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/

KYOTO (ICT-211423) Overview
• Title: Yielding Ontologies for Transition-Based Organization
• Funded:
  – 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics
  – Taiwan and Japan funded by national grants
• Goal:
  – Platform for knowledge sharing across languages and cultures
  – Enables knowledge transition and information search across different target groups, transgressing linguistic, cultural and geographic boundaries
  – Open text mining and deep semantic search
  – Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
• URL: http://www.kyoto-project.eu/
• Duration:
  – March 2008 to March 2011
• Effort:
  – 364 person months of work

KYOTO cycle
• Source text: Garden ponds are havens for wildlife. They provide food and shelter for frogs, newts and aquatic insects, including damselflies and dragonflies.
• Extracted facts:
  – (garden pond, haven, wildlife)
  – (garden pond, has_food, frog)
  – (garden pond, has_food, newt)
  – (garden pond, has_food, aquatic insect)
  – (garden pond, is_shelter, frog)
  – (garden pond, is_shelter, newt)
  – (garden pond, is_shelter, aquatic insect)
• Terms: frog, endemic frogs, common frog, poison frog, Golden poison frog, gopher frog, Dusky gopher frog, forest frog

[Diagram: KYOTO architecture, steps 1 to 6]
• Sources: environmental organizations, citizens, governments and companies, with distributed, diverse & dynamic data
• Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
• Tybot: term yielding robot
• Kybot: knowledge yielding robot
• Wikyoto: maintain terms & concepts
• Wordnets and ontology, with Top, Middle and Domain layers; nodes shown: Abstract, Physical, Process, Substance, H2O, CO2, Greenhouse Gas, Pollution, Emission, CO2 emission
• Index facts: Process: Increase; Involves: CO2 emission; When: 2008; Where: Europe
• Text & Fact Index
• Semantic Search
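The indexed fact in the diagram has four slots: Process, Involves, When and Where. As a rough sketch of how such a fact index could support a structured search, assuming a flat record per fact: the `Fact` class and `search` function are hypothetical and are not KYOTO's actual data model, and the second record is invented purely for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One indexed fact, with the four slots shown in the diagram."""
    process: str
    involves: str
    when: str
    where: str


index = [
    # The fact from the slide's example sentence.
    Fact(process="Increase", involves="CO2 emission", when="2008", where="Europe"),
    # Invented record, not from the source, so that the filter has something to exclude.
    Fact(process="Decrease", involves="CO2 emission", when="2007", where="Asia"),
]


def search(facts, **criteria):
    """Return the facts whose slots equal every given criterion."""
    return [
        fact
        for fact in facts
        if all(getattr(fact, slot) == value for slot, value in criteria.items())
    ]


hits = search(index, involves="CO2 emission", where="Europe")
print(hits)
```

Exact slot matching is the simplest possible choice here; a semantic search as described in the lecture would presumably also match through the wordnets and the ontology (for example, treating CO2 emission as a kind of Emission).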

Kyoto main application
• Wikyoto (Wiki platform)
  – Connects people with shared interest as a community
  – Upload documents and sources
  – View and edit terms and concepts learned from these documents
  – Combines concepts with other taxonomies
  – Discuss and agree with others in the community, different languages, regions and cultures

Kyoto main application
• Tybots
  – Learns terms and concepts from document collection
  – Organizes terms as a hierarchy
  – Connects terms to other hierarchies
  – Defines:
    • definitions
    • relations to other terms
    • properties and criteria for terms

Kyoto main application
• Kybot:
  – Detects facts of interest in text and combines these in a comprehensive overview
  – Uses knowledge represented for terms to detect facts in any document, regardless of language
  – Allows you to specify any collection of types of knowledge of your interest

Kyoto databases
• Database of users that forms the community
• Database of sources and documents provided by the users
• Database of terms, presented as a domain wordnet in each language
• Database of concepts (so-called ontology) that connects the terms of the different languages
• Databases of facts derived from various document and source collections provided by the user

Thank you for your attention