  • Number of slides: 68

The Italian CLIPS Lexicon and its reuse in a bilingual environment
Nilda Ruimy, ILC-CNR, Pisa, September 2004

Outline
• The origin of the CLIPS lexicon
• The PAROLE-SIMPLE model
• General encoding criteria
• Phonological and morphological levels
• Syntactic level: information content
• The semantic lexicon
• Theoretical background: GL theory
• The original Qualia Structure
• The SIMPLE ontology
• The Extended Qualia Structure
• Semantic level: information content
• Predicative structure
• Syntax-semantics mapping
• Encoding methodology
• CLIPS essential features & applications
Part II
• Creating a bilingual resource
• The two scenarios
• Scenario I
  • Drawbacks
• Scenario II
  • The cognate approach
  • The sense indicator approach
• Results
• Concluding remarks

CLIPS: a bit of genealogy
• PAROLE European project: 12 harmonized lexicons (PAROLE lexicons)
  • morphology: 20,000 entries
  • syntax: 20,000 lemmas
• SIMPLE (Semantic Information for Multifunctional Plurilingual Lexica) European project: 12 harmonized lexicons (SIMPLE lexicons)
  • semantics: 10,000 senses
• Italy: enlargement of these core lexicons in a national follow-up project
• CLIPS lexicon (XML format)
  • phonology: 374,000 entries
  • morphology: 49,000 entries
  • syntax: 55,000 lemmas
  • semantics: 55,000 senses
[Diagram inputs: PAROLE corpus lexical units; DMI phonology]

The PAROLE-SIMPLE Model
Theoretical model (PAROLE-SIMPLE):
• EAGLES recommendations
• Extended GENELEX model
• Results from EU projects: EUROWORDNET, ACQUILEX, DELIS
• GENERATIVE LEXICON
Representational model: GENELEX-PAROLE

The Linguistic Model
• Innovative
• Tackles misrepresented areas of knowledge
• Extendible and multifunctional
• Multilingual perspective
PAROLE-SIMPLE lexicons:
• common EAGLES-conformant model
• common representation language
• common building methodology
=> REUSABILITY

Representational Model (1)
Entity/Relationship Model:
• implemented through a DTD that defines:
  • the structure of every descriptive element
  • the relationships holding among the various descriptive elements as well as their co-occurrence restrictions
• non-redundant data representation

Representational Model (2)
• specific representational structures for every level of linguistic description
• links among the different levels, although the information encoded at each level is perfectly autonomous

General encoding criteria
• Reduce the lexicographer's margin of subjectivity by setting precise guidelines for the treatment of particular phenomena
• Base the encoding on corpus data as much as possible
• Find a balance between encoding attested structures / senses only and an exhaustive encoding that also includes rare structures / senses

Splitting entries
• Avoid both redundancy and overly broad groupings
• Use criteria strictly relevant to the description level, e.g. at the syntactic level, syntax-driven criteria:
  • arity
  • syntactic function: disporre i libri negli scaffali / disporre di due auto
  • complement optionality: attraversare (la strada) (lit. sense) / attraversare un momento difficile
  • different (non-alternative) realization of complements: Leo evita Lia / L. ha evitato di guardare L., che L. si ferisse
• Encode, at the semantic level, the most common senses distinguished in average-size dictionaries (ca. 150,000 words)

The four-level architecture
The first three levels:
• Phonological Unit: stress position, vowel openness, cons. pronunciation
• Corresp. PhnU-MrphU
• Morphological Unit: PoS & subcat., inflectional paradigm
• Corresp. MrphU-SynU
• Syntactic Unit: position synt. restr. (a. head properties, b. subcat. frame); syntactic structure 1, Frameset, syntactic structure 2

Syntactic entry information content
Aumentare: Il governo ha aumentato i prezzi del 3%. I prezzi sono aumentati del 3%.
('to increase: The government has increased the prices by 3%. Prices have increased by 3%.')
• Specific properties of the entry in the syntactic context described: main verb, aux.: avere
• Subcategorization frame (complex synt. entry):
  • MAIN syntactic frame: P0 optional subject NP; P1 oblig. object NP; P2 optional adverbial di_PP
  • RELATED syntactic frame: P0 optional subject NP; P1 optional adverbial di_PP
• Link between syntactic structures: FRAMESET relating systematic frame alternations (decausativization, locative alternation, reciprocal altern., symmetrical altern.)
  • relates main syntactic frame to alternating one
  • relates respective frame positions

The semantic lexicon
Theoretical linguistic background: extended version of Pustejovsky's Generative Lexicon (GL) theory

Generative Lexicon theory
• lexical meanings of various levels of complexity:
  • bambino: HUMAN, age (childhood), sex (male)
  • dottore: HUMAN, age (adult), sex (male), function
  • giornale (polysemy): 1. printed paper, 2. location, 3. institution, 4. human group
• simplest ones: definable by a taxonomic relation
• more complex ones: hypernymic relation not sufficient
• Qualia Structure makes it possible:
  • to coherently model the pluridimensionality of meaning
  • to capture the relationships holding btw. semantic units
  • to represent uniformly semantic units of different degrees of complexity

The Original Qualia structure
Consists of four roles:
• formal role: distinguishes the denoted entity from others (what is X?)
• constitutive role: expresses its components (what is X made of?)
• agentive role: expresses its coming about (how does X come about?)
• telic role: specifies its function (what is X's function?)

The SIMPLE ontology (1)
Lexicon structured on the basis of a type ontology:
• Core Ontology:
  • top level, general types
  • large consensus
  • provide essential information
  • mappable on EuroWordNet ontology
• Recommended Ontology:
  • hierarchically lower and more specific types
  • provide finer-grained information
Possible creation of language / application specific types

The SIMPLE ontology (2)
157 language-independent semantic types
• simple types (one-dimensional): can be fully characterized in terms of a hypernymic relation, e.g. Entity > Concrete_entity > Living_entity > Animal > Earth_Animal
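The hypernymic chain given for simple types can be sketched as a parent table. This is only an illustration: `PARENT`, `hypernym_chain` and `subsumes` are hypothetical names, and the table contains nothing beyond the five types listed on the slide.

```python
# Hypothetical parent table holding only the chain shown on the slide.
PARENT = {
    "Concrete_entity": "Entity",
    "Living_entity": "Concrete_entity",
    "Animal": "Living_entity",
    "Earth_Animal": "Animal",
}


def hypernym_chain(sem_type):
    """Return the type followed by all of its ancestors, most specific first."""
    chain = [sem_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its ancestors."""
    return general in hypernym_chain(specific)


print(hypernym_chain("Earth_Animal"))
print(subsumes("Living_entity", "Animal"))
```

A simple type is fully described by such a chain; unified types would need additional, non-hierarchical links that this table does not model.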

The SIMPLE ontology (3)
• unified types (multi-dimensional): can only be defined through the combination of:
  • the relation to their supertype
  • the reference to orthogonal dimensions of meaning
[Diagram: Institution, with supertype chain Entity > Abstract_Entity and the orthogonal dimensions Agentive and Telic]

The SIMPLE ontology (4)
SIMPLE Ontology: multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations

Semantic types
• In the SIMPLE ontology, types are not mere labels but the repository of a specific set of structured semantic information

Some semantic types for abstract & concrete entities
[Diagram: partial type hierarchy]
• TOP: TELIC, AGENTIVE, CONSTITUTIVE, ENTITY
• ENTITY: Concrete_entity, Abstract_entity, Property, Representation, Event, ...
• Concrete_entity: Living_entity, Artifact, Substance, Location, Food, Material, ...
• Living_entity: Human, Animal, Vegetal_entity, ...
• Artifact: Furniture, Instrument, Clothing, Artwork, ...
• Representation: Sign, Language, Information, ...
• Abstract_entity: Convention, Cognitive_fact, ...
• Property: Quality, Quantity, Physical_prop, Psychol_prop, ...
• Artifactual_material

Some semantic types for events
[Diagram: partial type hierarchy under EVENT. Types shown: Phenomenon, Aspectual, State, Act, Psych_event, Change, Cause_change, Relational_state, Non_relational_act, Relational_act, Cause_act, Speech_act, Relational_change, Move, Change_possession, Change_location, Natural_transition, Acquire_knowledge, Creation, ...]

Some semantic types for adjectives
[Diagram: partial type hierarchy under TOP, split into Intensional and Extensional. Types shown: Temporal, Modal, Emotive, Manner, Object_related, Emphasizer, Psychological_prop, Social_prop, Physical_prop, Relational_prop, Intensifying_prop, Temporal_prop]

Descriptive elements
• Features: PlusHuman, PlusCollective, ...
• Relations between semantic units: R( , )

Extended Qualia Structure
• Formal: isa, antonym_comp, antonym_grad, mult_opposition
• Constitutive (subgroups: CONSTITUTIVE, PROPERTY, LOCATION): made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses, causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling, is_in, lives_in, typical_location
• Agentive (subgroups: AGENTIVE, ARTIFACTUAL AGENTIVE): result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source, created_by, derived_from
• Telic (subgroups: INSTRUMENTAL, ACTIVITY, DIRECT TELIC): used_for, used_as, used_by, used_against, indirect_telic, purpose, is_the_activity_of, is_the_ability_of, is_the_habit_of, object_of_activity

Extended Qualia Structure (relation inventory as on the previous slide), with example pairs of semantic units:
• disgusto, provare (disgust, feel)
• casa, costruire (house, build)
• mohair, capra (mohair, goat)
• proiettile, colpire (projectile, hit)
• metano, combustibile (methane, fuel)
• bisturi, chirurgo (lancet, surgeon)
• medico, curare (doctor, cure)
• antitarmico, tarma (moth balls, moth)
• fumatore, fumare (smoker, smoke)
• pane, farina (bread, flour)
• senato, senatore (senate, senator)
• manubrio, bicicletta (handlebar, bicycle)
• arancio, arancia (orange tree, orange)
• abbaiare, cane (bark, dog)

Orthogonal dimensions of meaning
[Diagram: an "instrument" node with four axes. Formal role: is_a; Constitutive role: is_made_of; Telic role: used_for; Agentive role: created_by]

Orthogonal dimensions of meaning
[Diagram: violin. Formal role: is_a musical_instrument; Constitutive role: is_made_of wood, has_as_part strings; Telic role: used_for playing; Agentive role: created_by make]

Meaning dimensions expressed by Qualia relations
botte (barrel), traditional dictionary definition: "recipiente di legno fatto di doghe arcuate tenute unite da cerchi di ferro che serve per la conservazione e il trasporto di liquidi, specialmente vino" ('wooden container made of curved staves held together by iron hoops, used for storing and transporting liquids, especially wine')
Qualia relations marked on the definition:
• Formal: isa
• Agentive: created_by
• Constitutive: made_of
• Constitutive: contains
• Telic: used_for
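The way the botte definition decomposes into Qualia relations can be sketched as a list of (role, relation, target) triples. This is an illustration only: the role and relation names come from the slide, but the targets are an assumed reading of the dictionary definition rather than CLIPS data, and `BOTTE_QUALIA`, `by_role` and `targets` are hypothetical names.

```python
from collections import defaultdict

# (role, relation, target) triples for "botte". The targets are an assumed
# reading of the dictionary definition, not values taken from CLIPS.
BOTTE_QUALIA = [
    ("formal", "isa", "recipiente"),
    ("constitutive", "made_of", "legno"),
    ("constitutive", "contains", "vino"),
    ("agentive", "created_by", "fare"),
    ("telic", "used_for", "conservazione"),
    ("telic", "used_for", "trasporto"),
]


def by_role(triples):
    """Group (relation, target) pairs under their Qualia role."""
    grouped = defaultdict(list)
    for role, relation, target in triples:
        grouped[role].append((relation, target))
    return dict(grouped)


def targets(triples, relation):
    """List the targets of one relation, in encoding order."""
    return [target for _, rel, target in triples if rel == relation]


print(by_role(BOTTE_QUALIA)["constitutive"])
print(targets(BOTTE_QUALIA, "used_for"))
```

Keeping the role next to each relation mirrors the slide, where the same role (Constitutive) hosts two different relations.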

Qualia informative power (1)
Within a semantic type population, further clusterings can be made through the is-a relation:

Qualia informative power (2)
[Diagram: clusterings within two semantic types]
• INSTRUMENT: utensile; is-a utensile: graticolabrodo, frusta; posata, with is-a posata: forchetta, coltello (used for mangiare)
• CONTAINER: contenitore; is-a: pentola, tegame, padella (used for cucinare)

Semantic level: information content
• Phonological Unit: stress position, vowel openness, cons. pronunciation
• Corresp. PhnU-MrphU
• Morphological Unit: PoS & subcat., inflectional paradigm
• Corresp. MrphU-SynU
• Syntactic Unit: position synt. restr. (a. head properties, b. subcat. frame); syntactic structure 1, Frameset, syntactic structure 2
• Corresp. SynU-SemU
• Semantic Unit: ontological type, event type, semant. class, domain, semant. features, semant. relations, derivation, synonymy, regular polysemy
  • Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
  • predicative represent.: predicate, type of link, arguments, sem. restr.

Predicative Representation
• Describes the semantic scenario a word sense is involved in
• Assigned to predicative semantic units:
  • assignment of a lexical predicate
  • type of link holding btw. entry and predicate
  • predicate argument structure
    • semantic role of arguments
    • selection restrictions of arguments
  • link between semantic arguments and syntactic complements

Assignment of a lexical predicate
• verbs
• predicative nouns: deverbals (costruzione) and collective simple nouns (gruppo), nouns denoting a relation (madre), quantity (bottiglia), part (fetta), unit of measurement (metro), property (bellezza)
• adjectives
• some adverbs (indipendentemente da)

Predicate-semantic unit link
PRED_ACCUSARE is linked to:
• accusare 'to accuse': master
• 'accusation': process nominalisation
• accusatore 'accuser': agent nominalisation
• 'accused': patient nominalisation

Semantic arguments: thematic roles
• ProtoAgent: volitional subject of verb: ARG0 of kill
• ProtoPatient: object undergoing an action: ARG1 of kill
• 2ndParticipant: indirect object: ARG2 of give
• SoA (State of Affair): sentential complement: ARG2 of ask
• Location: ARG2 of put
• Direction: ARG2 of move
• Origin: ARG1 of move
• Kinship: ARG0 of father
• HeadQuantified: ARG0 of metre, bottle

Semantic arguments: selectional restrictions
• Not proper restrictions, but rather preferences of combinations in prototypical situations
• Expressible through:
  • semantic types
  • notions (combination of types or type + feature…)
  • features
  • semantic units
• Features, used transversely across semantic types (e.g. PlusEdible), make it possible to capture wider preferences than single semantic types do: ARG1 eat: [PlusEdible] / ARG1 eat: [FOOD]
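The contrast between a type-based and a feature-based preference for ARG1 of eat can be sketched as follows. The mini-lexicon is entirely hypothetical (the type and feature assignments are assumptions made for illustration, not CLIPS entries), as are the names `LEXICON`, `satisfies` and `accepted`.

```python
# Hypothetical mini-lexicon: word -> (semantic type, set of features).
# The assignments are illustrative assumptions, not CLIPS data.
LEXICON = {
    "pane": ("Food", {"PlusEdible"}),
    "insalata": ("Vegetal_entity", {"PlusEdible"}),
    "sasso": ("Concrete_entity", set()),
}


def satisfies(word, preference):
    """Check one preference, given as ('type', name) or ('feature', name)."""
    sem_type, features = LEXICON[word]
    kind, name = preference
    if kind == "type":
        return sem_type == name
    if kind == "feature":
        return name in features
    raise ValueError(f"unknown preference kind: {kind}")


def accepted(preference):
    """Words of the mini-lexicon that satisfy the preference, sorted."""
    return sorted(word for word in LEXICON if satisfies(word, preference))


print(accepted(("type", "Food")))
print(accepted(("feature", "PlusEdible")))
```

With these toy entries the feature-based preference accepts a word outside the Food type, which is the wider coverage the slide attributes to features.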

Semantic entry information content (1)
Aumento: L'aumento dei prezzi da parte del governo ('increase: the increase of prices by the government')
ONTOLOGICAL INFO.
• Semantic type: Cause_change_of_value
• Supertype: Cause_relational_change
• Eventype: transition
• Domain: general, economics
• Gloss: accrescimento in dimensione o quantità
EXTENDED QUALIA INFO.
• aumento isa cambiamento
• aumento resulting_state maggiore
• Agentivecause: yes
• Direction: up
• Morphological derivation: Eventverb aumentare
PREDICATIVE REPRESENTATION
• Lexical semantic predicate: PRED_aumentare
• Type of link: event nominalization
• Predicate arg. struct.: range, semantic role & selectional restrictions of args.:
  • Arg0: Protoagent; Human / Institution
  • Arg1: ProtoPatient; Entity
  • Arg2: Quantifier; Amount

Semantic entry information content (2)
vaporizzatore: spruzzare acqua con un vaporizzatore ('spray: to spray water with a spray')
ONTOLOGICAL INFO.
• Semantic type: Instrument
• Supertype: Artifact
• Eventype: ===
• Domain: general, cleaning, gardening, cosmetics
• Gloss: apparecchio usato per ridurre in minuscole particelle un liquido
EXTENDED QUALIA INFO.
• vaporizzatore isa apparecchio
• vaporizzatore has_as_part pulsante
• vaporizzatore created_by fabbricare
• vaporizzatore used_for atomizzare
• Synonymy: nebulizzatore
• Morphological derivation: Eventverb vaporizzare
PREDICATIVE REPRESENTATION
• Lexical semantic predicate: PRED_vaporizzare
• Type of link: instrument nominalization
• Predicate arg. struct.: range, semantic role & selectional restrictions of args.:
  • Arg0: Protoagent; Human / Instrument
  • Arg1: ProtoPatient; +liquid
  • Arg2: Location; Concrete_entity

Syntax-semantics mapping (1)
• Syntactic Unit: position synt. restr. (a. head properties, b. subcat. frame); syntactic structure 1, Frameset, syntactic structure 2
• Corresp. SynU-SemU (correspondence syntax-semantics)
• Semantic Unit: ontological type, event type, semant. class, domain, semant. features, semant. relations, derivation, synonymy, regular polysemy
  • Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
  • predicative represent.: predicate, type of link, arguments, sem. restr.

Syntax-semantics mapping (2)
SYNTACTIC LEVEL
• SynU_migliorare 'to improve'
  • Transitive structure: P0, P1
  • Intransitive structure: P0
  • Frameset linking the two structures
SEMANTIC LEVEL
• SemU1_migliorare, SemU2_migliorare
• CAUSE_CHANGE_OF_STATE
• LINK PREDICATE-SEMANTIC UNIT
• SEMANTIC PREDICATE PRED_migliorare: ARG0: Agent; ARG1: Patient

Syntax-semantics mapping (2)
• SynU_migliorare 'to improve'
  • Transitive structure: P0, P1
  • Intransitive structure: P0
  • Frameset linking the two structures
• CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME: isomorphic / non-isomorphic
• SemU1_migliorare, SemU2_migliorare
• CAUSE_CHANGE_OF_STATE
• PRED_migliorare: ARG0: Agent; ARG1: Patient

Template-driven encoding methodology
• a template is a schema providing, for each semantic type, a set of structured information that is deemed crucial to its definition
• twofold function:
  • interface between ontology and lexicon
  • guide for the lexicographer
• ensures systematicity, consistency and uniformity of representation of the lexical meaning

A template

CLIPS' key features
• The largest electronic, multilevel lexical resource of the Italian language
  • 55,000 words encoded
  • 4 description levels: phonology, morphology, syntax, semantics
• Based on a rich and multifunctional linguistic and representational model shared by 11 other European lexica
• Lexical description conformant to international standards
• Respect of the principles of uniformity, consistency and exhaustivity
• Generic lexicon with large coverage (vocabulary and synt. structures)
• Fine-grained information, highly structured, innovative, most useful for HLT applications
• High level of reusability

Application fields
• surface and deep analysis of texts
• information retrieval
• machine translation
• natural language understanding, etc.
The wealth of information the lexicon contains allows:
• building semantic networks
• extracting the vocabulary of a specific domain
• NP recognition: disambiguating the semantic contribution of some PPs in complex nominals

To lend itself to further uses, a lexicon must have:
• a flexible model
• a generic database
• uniformly structured data
• a precise and explicit linguistic description
Like the PAROLE and SIMPLE lexicons, CLIPS meets these requirements.

Creating a bilingual electronic lexical resource
Strategy I:
1) Use CLIPS and the PAROLE-SIMPLE French lexicon
2) Perform a semi-automatic linking of their respective entries

Creating a bilingual electronic lexical resource
Strategy II:
1) Derive, in a semi-automatic way, a semantically annotated French lexicon from CLIPS
2) Use source and derived lexicons as a basis for building a bilingual resource

Strategy I
[Diagram: an ALGORITHM takes the IT-FR & FR-IT bilingual dictionary, CLIPS and the PAR-SIMPLE French lex. and links their entries]
• Italian word list: capo, ufficio, gentile, residenza, tessere, pompa, scrivere, tessuto, vestibolo, testo, amministratore, vincere
• Bilingual dictionary IT-FR: capo: tête, chef, bout; ufficio: bureau, charge, ...
• Bilingual dictionary FR-IT: tête: testa, capo, faccia, cima; bureau: ufficio, scrivania, ...
• CLIPS entries: capo_1 (phon, morph, syn, sem), capo_2, ..., ufficio_1, ...
• PAR-SIMPLE French entries: tête_1 (morph, syn, sem), tête_2, tête_3, ..., bureau_1, ...

• Analysis of the inherent properties of the SL & TL senses:
  • identity of ontological classification or subsumption relation btw. the semantic type of the SL & TL senses
  • identity of semantic class or subsumption relation btw. their semantic class
  • identity of domain or subsumption relation btw. their domain info.
  • identity / correspondence of semantic features
  • identity / correspondence of semantic relations
• Analysis of their contextual properties:
  • compatibility of syntactic valency
    • function and grammatical instantiation of complements
  • compatibility of semantic valency
    • semantic role and semantic restrictions of arguments
cf. Villegas et al. LREC 2000, Athens
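The first three inherent-property checks can be sketched as a comparison of two sense records. This is a minimal illustration under stated assumptions: the `Sense` class, the function names and the four outcome labels are hypothetical, subsumption is tested only against the direct supertype, and a missing value ("----" on the slides) is treated as unknown rather than as a mismatch. The field values are those shown for scrivere / écrire and testo_1 / texte.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sense:
    lemma: str
    sem_type: Optional[str] = None
    supertype: Optional[str] = None
    sem_class: Optional[str] = None
    domain: Optional[str] = None


def same(a, b):
    """Compare two optional values; a missing value gives 'unknown'."""
    if a is None or b is None:
        return "unknown"
    return "identical" if a == b else "mismatch"


def type_relation(sl, tl):
    """Identity, or subsumption through the direct supertype only."""
    verdict = same(sl.sem_type, tl.sem_type)
    if verdict == "mismatch" and (
        sl.supertype == tl.sem_type or tl.supertype == sl.sem_type
    ):
        return "subsumption"
    return verdict


def compare(sl, tl):
    """Report the outcome of each check for a source/target sense pair."""
    return {
        "type": type_relation(sl, tl),
        "class": same(sl.sem_class, tl.sem_class),
        "domain": same(sl.domain, tl.domain),
    }


scrivere = Sense("scrivere", "SYMBOLIC_CREATION", "CREATION", "CREATION", "CREATIVE_WRITING")
ecrire = Sense("écrire", "CREATION", None, "CREATION", None)
testo_1 = Sense("testo_1", "INFORMATION", "REPRESENTATION", "ABSTRACT", "MEDIA")
texte = Sense("texte", "RELATIONAL_ACT", None, "OBJECT", None)

print(compare(scrivere, ecrire))
print(compare(testo_1, texte))
```

How the individual outcomes would be weighted into a linking decision is not stated on the slide, so the sketch stops at reporting them.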

evento / évènement
• evento: freedefinition="cio' che e' accaduto o potra' accadere, avvenimento"; Semantic type: EVENT; Supertype: ENTITY; Semantic class: EVENT
• évènement: freedefinition="something that happens at a given place and time"; Semantic type: EVENT; Supertype: ----; Semantic class: EVENT
scrivere / écrire
• scrivere: freedefinition="creare qualcosa di scritto"; Semantic type: SYMBOLIC_CREATION; Supertype: CREATION; Semantic class: CREATION; Domain: CREATIVE_WRITING
• écrire: freedefinition="create written works & semi"; Semantic type: CREATION; Supertype: ----; Semantic class: CREATION; Domain: ----
pompa / pompe
• pompa: freedefinition="macchina o apparecchio usato per sollevare liquidi o comprimere gas"; Semantic type: INSTRUMENT; UnificationPath: ConcreteEntityArtifactagentive-Materialtelic; Semantic class: APPARATUS
• pompe: freedefinition="a device that moves fluid or gas by pressure or suction"; Semantic type: ----; UnificationPath: ----; Semantic class: APPARATUS

testo / texte
• testo_1: Semantic type: INFORMATION; Supertype: REPRESENTATION; Semantic class: ABSTRACT; Domain: MEDIA; Distinctive feature: PLUS_SEMIOTIC
• testo_2: Semantic type: SEMIOTIC_ARTIFACT; UnificationPath: ConcreteEntityArtifactagentive-Telic; Semantic class: ARTIFACT; Domain: MEDIA; Distinctive feature: PLUS_SEMIOTIC
• texte: Semantic type: RELATIONAL_ACT; Supertype: ----; Semantic class: OBJECT; Domain: ----; Distinctive feature: PLUS_SEMIOTIC
vincere / vaincre
• vincere: freedefinition="portare a termine con successo"; Semantic type: RELATIONAL_ACT; Semantic class: ACTIVITY; Rel. Sem: ----; PREDICATE_vincere_1
• vaincre: freedef.="be the winner in contest/competition"; Semantic type: CAUSE_RELAT.-CHANGE; Semantic class: CHANGE; Rel. Sem: Resulting_action/state: victoire, Agentive_cause: cause; PREDICATE_vaincre_2

Drawbacks of this strategy
• Discrepancy of lexical coverage between the lexicons => method applicable to 10,000 senses only
• SIMPLE-FR does not always encode all information => necessity of manual intervention wherever SL and TL entries have NO corresponding element due to:
  • lack of information
  • encoding error
  • having privileged different although complementary aspects of meaning, e.g. imprigionare: PURPOSE_ACT vs. emprisonner: CAUSE_RELATIONAL_CHANGE

Strategy II, Phase 1: Deriving a FR lexicon from CLIPS
• Feasibility study for deriving a semantically annotated French lexicon using CLIPS lexical knowledge
• Crucial step for deriving the French entries: correctly pair off each French word sense with the relevant CLIPS semantic unit whose information we ultimately want to assign to the French entry

Two approaches lead from CLIPS to a semantically annotated French lexicon:
• cognate approach: exploits the cognateness of Italian and French endings to relate the FR word to the IT CLIPS entry and infer the FR entry
• sense indicator approach: matches onto the CLIPS data the information provided in bilingual dictionaries by sense indicators, in order to identify the relevant CLIPS entry
Example dictionary entries:
• villaggio: 1. (piccolo centro abitato) village; 2. (complesso urbanistico) village
• capo: 1. (testa) tête; 2. (persona che...) chef; ...

The cognate approach
P. Bouillon, B. Cartoni, TIM/ISSCO, ETI, Geneva
• Lookup in the IT-FR bilingual dict.: villaggio: 1. (piccolo centro abitato) village; 2. (complesso urbanistico) village
• Condition: a unique French constructed word translates all IT senses
• Derivation from IT-CLIPS to FR-LEX:
  • IT-CLIPS: naming="villaggio" weightvalsemfeaturel=«Geopolitical_Location» […]
  • FR-LEX: naming="village" weightvalsemfeaturel=«Geopolitical_Location» […]
  • FR-LEX: naming="village" weightvalsemfeaturel=«Human_group» […]
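The condition stated for the cognate approach (one French word translating every Italian sense) can be sketched as a filter over bilingual entries. This is a simplified illustration: it checks only that uniqueness condition, not the cognateness of endings, and the dictionaries and the name `derive_french` are hypothetical. The semantic types listed are the ones the slides show for village and for the two capo units.

```python
# Toy bilingual entries: Italian lemma -> list of (sense indicator, French word).
BILINGUAL = {
    "villaggio": [
        ("piccolo centro abitato", "village"),
        ("complesso urbanistico", "village"),
    ],
    "capo": [("testa", "tête"), ("persona che...", "chef")],
}

# Toy CLIPS excerpt: Italian lemma -> semantic types of its semantic units.
CLIPS = {
    "villaggio": ["Geopolitical_Location", "Human_group"],
    "capo": ["Body_part", "Role"],
}


def derive_french(bilingual, clips):
    """Copy CLIPS types to the French word when it translates all IT senses."""
    derived = {}
    for it_lemma, senses in bilingual.items():
        fr_words = {fr for _, fr in senses}
        if len(fr_words) != 1:
            continue  # several French translations: condition not met
        (fr_lemma,) = fr_words
        derived[fr_lemma] = list(clips.get(it_lemma, []))
    return derived


print(derive_french(BILINGUAL, CLIPS))
```

Entries such as capo, which fail the condition, are the ones left to the sense indicator approach.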

The sense indicator approach
N. Ruimy, ILC-CNR, Pisa
Triples extracted from the bilingual dictionary, followed by analysis & classification of the sense indicators:
IT word      SENSE INDICATOR           FR word
capo         (persona che…)            chef
capo         (testa)                   tête
aspirare     tr. (con un tubo)         aspirer
aspirare     LING.                     aspirer
aspirare     tr. (inalare)             aspirer
aspirare     intr. (avere) prep. a     aspirer à
avvertire    (avvisare)                prévenir
avvertire    (percepire)               sentir
asfalto      (per rivestire)           asphalte
compagnia    (gruppo)                  compagnie
compagnia    (presenza)                compagnie
…

Types of sense indicators (1)
Atkins, Bouillon, 2003
• indicators conveying morphosyntactic information:
  • verb subclass, auxiliary selection, plural form of nouns, typical subject / object, PP type, etc.
Italian-French example: COVARE
A. v. tr.
  1 (di uccelli) [dar calore col proprio corpo alle uova per sviluppare l'embrione] couver
  2 (fig.) [custodire con gelosia] couver
  3 (fig.) [nutrire, alimentare in segreto dentro di sé] nourrir, mijoter; [tramare, macchinare in segreto] couver; [incubare] couver: covare un malanno
B. v. intr. (aus. avere) (fig.) [stare chiuso, nascosto] couver: il fuoco cova sotto la cenere
Callouts on the entry: verbal class (v. tr., v. intr.), typical subj. (di uccelli), auxiliary (aus. avere)

Types of sense indicators (2)
• indicators conveying inferential information:
  • synonyms, hypernyms, meronyms
  • domain of use
Italian-French example: CAPO
I (persone)
  1 [testa] tête
  2 (fig.) [mente, intelligenza] tête
  3 [persona investita di comando, di potere] chef
II (animali)
  1 (raro) -> testa
  2 spec. al plur [ciascun individuo di una specie determinata] têtes, pièces
III (cose)
  1 [la parte più grossa e più sporgente di un oggetto] tête
  2 [la parte più alta] haut
  3 [ciascuna delle due estremità di qlco.] bout, tête
  4 [inizio, principio] début
  5 [fine, conclusione; sbocco] bout
  6 loc. …
  7 (nei filati) fil
  8 [singolo oggetto appartenente ad una serie] pièce
  9 (geog.) cap
Callouts on the entry: synonym, hypernym, domain of use

Sense indicators are used as search keys for identifying, in CLIPS, the semantic entry relevant to the IT sense of the bilingual pair.

IT word | SENSE INDICATOR | FR word
capo | (persona che…) | chef
capo | (testa) | tête
aspirare | tr. (con un tubo) | aspirer
aspirare | LING. | aspirer
aspirare | tr. (inalare) | aspirer
aspirare | intr. (avere) prep. a | aspirer à
avvertire | (avvisare) | prévenir
avvertire | (percepire) | sentir
asfalto | (per rivestire) | asphalte
gioielleria | (negozio) | bijouterie
gioielleria | (arte) | bijouterie
… | … | …

Using sense indicators

- indicators usable straightforwardly
- indicators to be converted into the descriptive language of CLIPS:
  - analizzatore (chi effettua analisi) = analyseur ("who performs analyses"): the sem. type of analizzatore belongs to the HUMAN hierarchy
  - illuminare (rendere luminoso) = illuminer ("to make luminous"): the sem. type of illuminare belongs to the causative types hierarchy
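The conversion step can be pictured as a small table of gloss patterns, each mapped to a constraint on the CLIPS entry. This is a sketch under stated assumptions: `CONVERSION_RULES`, `convert_gloss` and the constraint labels are hypothetical, and only the two patterns illustrated on the slide ("chi …" and "rendere …") are covered.

```python
import re

# Hypothetical conversion table: gloss pattern -> (constraint kind, value).
# "HUMAN" comes from the slide; "CAUSATIVE" is a label assumed here for the
# "causative types hierarchy".
CONVERSION_RULES = [
    (re.compile(r"^chi\s"), ("semtype_hierarchy", "HUMAN")),
    (re.compile(r"^rendere\s"), ("semtype_hierarchy", "CAUSATIVE")),
]


def convert_gloss(gloss):
    """Return the constraint implied by a definitional gloss, or None."""
    text = gloss.strip().lower()
    for pattern, constraint in CONVERSION_RULES:
        if pattern.search(text):
            return constraint
    # No pattern applies: the indicator may be usable as it stands
    # (for instance a plain synonym such as "testa").
    return None


if __name__ == "__main__":
    for g in ("chi effettua analisi", "rendere luminoso", "testa"):
        print(g, "->", convert_gloss(g))
```

A real rule set would need many more patterns and some way of ranking competing matches; the point of the sketch is only that the gloss itself is not looked up in the lexicon but translated into a constraint first.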

Rule types (s.i. = sense indicator)

- search for a CLIPS entry containing the s.i. as target
  - of the synonymic relation, e.g. capo (testa): capo synonym_rel testa
  - of the hypernymic relation, e.g. gioielleria (negozio): gioielleria isa_rel negozio
  - of any qualia relation
- search for a CLIPS entry sharing properties with the entry of the s.i.
  - shared hypernym, e.g. comunicare (notificare): isa_rel dire
  - shared semantic type, e.g. avvertire (percepire): semtype EXP._EVENT
- search for a CLIPS entry containing information inferred from the s.i.
  - specific type
  - specific relation or feature (esp. domain info.)
  - specific syntactic structure, e.g. conoscere (pron. (reciprocamente)): reciprocal syn. struct.
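The first two rule families above can be sketched as a cascade over a toy lexicon. Everything in the block is an assumption made for illustration: the `SemU` class, the entry identifiers, the semantic types given to gioielleria and the second sense of comunicare, and the order in which the checks are tried. The third family (information inferred from the s.i.) is left out because it depends on a mapping from indicators to CLIPS features that the slides do not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class SemU:
    uid: str        # hypothetical identifier, not a real CLIPS id
    lemma: str
    semtype: str
    relations: dict = field(default_factory=dict)  # relation name -> set of target lemmas


# Toy data loosely modelled on the slide examples.
LEXICON = [
    SemU("capo#1", "capo", "Body_part", {"synonym": {"testa"}}),
    SemU("capo#2", "capo", "Role", {"isa": {"persona"}}),
    SemU("gioielleria#1", "gioielleria", "Building", {"isa": {"negozio"}}),
    SemU("gioielleria#2", "gioielleria", "Activity", {"isa": {"arte"}}),
    SemU("comunicare#1", "comunicare", "Speech_act", {"isa": {"dire"}}),
    SemU("comunicare#2", "comunicare", "Change", {"isa": {"trasmettere"}}),
    SemU("notificare#1", "notificare", "Speech_act", {"isa": {"dire"}}),
]


def entries(lemma):
    return [u for u in LEXICON if u.lemma == lemma]


def rule_target(word, indicator):
    """Family 1: the indicator is the target of a relation of the entry."""
    candidates = entries(word)
    for rel in ("synonym", "isa"):          # most specific relations first
        for u in candidates:
            if indicator in u.relations.get(rel, set()):
                return u, "target of " + rel
    for u in candidates:                    # then any other relation
        if any(indicator in targets for targets in u.relations.values()):
            return u, "target of any relation"
    return None


def rule_shared(word, indicator):
    """Family 2: the entry shares a hypernym or a semantic type with an entry of the indicator."""
    for u in entries(word):
        for v in entries(indicator):
            if u.relations.get("isa", set()) & v.relations.get("isa", set()):
                return u, "shared hypernym"
    for u in entries(word):
        for v in entries(indicator):
            if u.semtype == v.semtype:
                return u, "shared semtype"
    return None


def find_semu(word, indicator):
    """Apply the rule families in order and return (uid, rule label), or None."""
    for rule in (rule_target, rule_shared):
        hit = rule(word, indicator)
        if hit is not None:
            return hit[0].uid, hit[1]
    return None
```

For example, `find_semu("gioielleria", "negozio")` selects the shop sense through the hypernymic relation, whereas `find_semu("comunicare", "notificare")` only succeeds in the second family, because notificare is not itself a relation target of comunicare.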

IT word | SENSE INDICATOR | FR word
capo | (persona che…) | chef
capo | (testa) | tête
aspirare | tr. (con un tubo) | aspirer
aspirare | LING. | aspirer
aspirare | tr. (inalare) | aspirer
aspirare | intr. (avere) prep. a | aspirer à
avvertire | (avvisare) | prévenir
avvertire | (percepire) | sentir
asfalto | (per rivestire) | asphalte
compagnia | (gruppo) | compagnie
compagnia | (presenza) | compagnie
… | … | …

CLIPS entries identified:
- SemU 3615 capo, sem. type=Role, where isa
- SemU 61397 capo, sem. type=Body_part, where synonym
- SemU 79372 aspirare, sem. type=Speech_act, where domain: phonetics
- SemU 7040 aspirare, sem. type=Modal_event, linked to SynU aspirare, intr. pp_a
- SemU 68603 asfalto, sem. type=Artifact_Material, where used_for

Cognate approach: results

Suffix | IT constructed words whose different senses are translated by a unique FR constructed word | IT constructed words having more than one translation
–aggio | 89.9% | 10.1%
–tà | 77.4% | 22.6%
–zione | 80.4% | 19.6%

Recall ratio (FR constructed words sharing the IT CLIPS entries):
–aggio | 99.97%
–tà | 99.98%
–zione | 99.98%

The small percentage of errors is due to a different granularity of sense distinctions in CLIPS and in the bilingual dictionary.

Sense indicator approach: results

ITword – sense indicator – FRword (X – A – Y)

Rule types, in application order, with the lexical data investigated:
1. search for an entry of X containing string A: target of syn. rel., target of hyper. rel., any qualia
2. search for an entry of X sharing properties with an entry of A: shared hypernym, shared semtype
3. search for an entry of X containing information inferred from A: specific domain, specific feat/rel, specific syn. struct.

The higher the rule rank, the more reliable the result.

[Diagram: rules ranked 1 to 9, each with its success rate. Rates as extracted, in slide order: 16.6%, 26.8%, 0.92%, 8.9%, 5.8%, 3.9%, 12.3%, 9.2%, 15.4%]

[Chart: distribution of success rate over the algorithm rules]

Recall ratio: 69%

Combining the two methods

Successful handling of:
- 95% of constructed words
- 69% of non-constructed words
Together these represent 68.2% of the vocabulary.

Results may be enhanced by gleaning the most informative sense indicators from different sources.

Concluding remarks

- Deriving new lexical resources from existing ones is a worthwhile venture in terms of time and effort
- The process of building the derived lexicon is simplified and shortened
- Such practice entails assessing the coverage and consistency of the source lexical resource
- Source and derived lexicons constitute a most reliable basis for developing a bilingual resource
- The approaches taken are applicable to other language pairs that share similarities in morphological structure