
Methods for ontology management and construction
Syllabus • How to build an ontology • Which tools are available • How to learn ontologies automatically from resources
Building an ontology • From scratch • By re-engineering existing ontologies • By integrating existing ontologies
Building an ontology • Identify the goals • Identify the relevant "terms" (hotel, booking) • Among the terms used, distinguish concepts from relations, since the same words can denote both – e.g.: books(person, hotel), where PERSON and HOTEL are concepts and books is a relation • Encode the ontology
Building an ontology • Use available semi-formal resources (glossaries, thesauri, data and document warehouses) • Use consensus-building and collaborative-working tools • Integrate different skills (lexicographers, knowledge engineers, domain experts, application users)
Ontology Editing and Management Systems
Protégé • http://protege.stanford.edu/
Creating an OWL "project" in Protégé
OWL in Protégé • Individuals (e.g., "FourSeasons") • Properties – ObjectProperties (references) – DatatypeProperties (simple values) • Classes (e.g., "Hotel")
Object Properties • Link two instances (individuals) • Types of relations (0..n, n..m) (Figure: Sydney hasPart BondiBeach; Sydney hasAccomodation FourSeasons)
Inverse Properties • Represent bidirectional relations • Adding a value to a property implies adding it to its inverse (Figure: Sydney hasPart BondiBeach; BondiBeach isPartOf Sydney)
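Object properties and their inverses can be sketched with a tiny in-memory triple store. This is a minimal illustration in plain Python. It is not the API of Protégé or of any OWL library. The `TripleStore` class and the `INVERSES` table are hypothetical, and the property names come from the slide figures.

```python
# Hypothetical inverse declarations, taken from the slide's hasPart/isPartOf example.
INVERSES = {"hasPart": "isPartOf", "isPartOf": "hasPart"}


class TripleStore:
    """Minimal in-memory store of (subject, property, object) triples."""

    def __init__(self, inverses):
        self.inverses = inverses
        self.triples = set()

    def add(self, subj, prop, obj):
        # An object property links two individuals.
        self.triples.add((subj, prop, obj))
        # Adding a value to a property also adds it to the inverse property.
        inverse = self.inverses.get(prop)
        if inverse is not None:
            self.triples.add((obj, inverse, subj))

    def values(self, subj, prop):
        return {o for s, p, o in self.triples if s == subj and p == prop}


store = TripleStore(INVERSES)
store.add("Sydney", "hasPart", "BondiBeach")
store.add("Sydney", "hasAccomodation", "FourSeasons")
print(store.values("BondiBeach", "isPartOf"))  # {'Sydney'}
```

In OWL the inverse is declared once with owl:inverseOf, and a reasoner derives the reverse triple. Storing the reverse triple at insertion time, as done here, is only one possible design choice.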
Transitive properties (Figure: NewSouthWales hasPart Sydney; Sydney hasPart BondiBeach; therefore NewSouthWales hasPart BondiBeach, derived)
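A transitive property such as hasPart lets new links be derived by chaining existing ones. The naive fixed-point computation below is a sketch for illustration only. Real reasoners use much more efficient algorithms.

```python
def transitive_closure(pairs):
    """Return every (a, c) derivable by chaining (a, b) and (b, c) pairs."""
    closure = set(pairs)
    while True:
        # Join the relation with itself to find chains one step longer.
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new = derived - closure
        if not new:  # fixed point reached: nothing else can be derived
            return closure
        closure |= new


has_part = {("NewSouthWales", "Sydney"), ("Sydney", "BondiBeach")}
closure = transitive_closure(has_part)
print(sorted(closure - has_part))  # [('NewSouthWales', 'BondiBeach')]
```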
DatatypeProperties • Link individuals to primitive values (integers, floats, strings, booleans, etc.) (Figure: Sydney hasSize = 4,500,000; isCapital = true; rdfs:comment = "Don't miss the opera house")
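Datatype properties point to literal values rather than to other individuals. In the sketch below, Python types stand in for the XML Schema datatypes (such as xsd:int, xsd:boolean, xsd:string) that OWL actually uses. The `DATATYPE_RANGES` table and the `valid_value` helper are hypothetical.

```python
# Hypothetical datatype ranges: property name -> expected Python type.
DATATYPE_RANGES = {"hasSize": int, "isCapital": bool, "rdfs:comment": str}


def valid_value(prop, value):
    """Check that a literal value matches the primitive type declared for prop."""
    expected = DATATYPE_RANGES[prop]
    # In Python bool is a subclass of int, so reject booleans for integer ranges.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)


sydney = {
    "hasSize": 4_500_000,
    "isCapital": True,
    "rdfs:comment": "Don't miss the opera house",
}
print(all(valid_value(p, v) for p, v in sydney.items()))  # True
print(valid_value("hasSize", "4,500,000"))  # False: a string, not an integer
```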
Classes • Groups of individuals with common characteristics • Every individual is an instance of at least one class (Figure: class Beach with BondiBeach and CurrawongBeach; class City with Sydney and Cairns)
Range and Domain • Part of a property's specification – Domain: "the left-hand side of a relation" (Destination) – Range: "the right-hand side" (Accomodation) (Figure: Sydney, a Destination, hasAccomodation BestWestern and FourSeasons, both Accomodations)
Domain • Individuals can only take values for properties whose domain matches them, e.g. "Only Destinations can have Accomodations" • A domain can contain multiple classes – Objects and Animates have Parts • A domain can be left undefined: the property can then be used anywhere
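The slide's reading of domain and range can be sketched as a closed-world check. It flags any triple whose subject or object falls outside the declared classes. All declarations below are hypothetical.

This follows the constraint-style reading presented on the slide. OWL itself works differently in two ways:
- A reasoner uses rdfs:domain and rdfs:range to infer class membership rather than to reject data.
- Several domain axioms on one property are combined as an intersection. This sketch accepts membership in any one of them.

```python
# Hypothetical property signatures: property -> (domain classes, range classes).
# An empty set means "undefined": the property may be used anywhere.
SIGNATURES = {
    "hasAccomodation": ({"Destination"}, {"Accomodation"}),
    "hasPart": ({"Object", "Animate"}, set()),
}

# Hypothetical class membership of each individual.
TYPES = {
    "Sydney": {"Destination"},
    "FourSeasons": {"Accomodation"},
    "BestWestern": {"Accomodation"},
}


def violations(subj, prop, obj):
    """List the domain/range problems of the triple (subj, prop, obj)."""
    domain, range_ = SIGNATURES.get(prop, (set(), set()))
    problems = []
    # With several domain classes, membership in any one of them is accepted here.
    if domain and not (TYPES.get(subj, set()) & domain):
        problems.append(f"{subj} is not in the domain of {prop}")
    if range_ and not (TYPES.get(obj, set()) & range_):
        problems.append(f"{obj} is not in the range of {prop}")
    return problems


print(violations("Sydney", "hasAccomodation", "BestWestern"))  # []
print(violations("FourSeasons", "hasAccomodation", "Sydney"))
```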
Classes and superclasses • Classes are organized into hierarchies • Direct instances of a class are also indirect instances of its superclasses (Figure: a class hierarchy with the individuals Cairns, Sydney, Canberra and Coonabarabran)
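Indirect membership in superclasses follows directly from the hierarchy. The slide's figure is not recoverable, so this minimal sketch assumes a hypothetical hierarchy and hypothetical direct types. Each individual's direct types are expanded with all their ancestors.

```python
# Hypothetical hierarchy: class -> direct superclasses.
SUPERCLASSES = {
    "Capital": {"City"},
    "City": {"UrbanArea"},
    "UrbanArea": {"Destination"},
    "Destination": set(),
}

# Hypothetical direct types of two individuals from the slide.
DIRECT_TYPES = {"Canberra": {"Capital"}, "Sydney": {"City"}}


def ancestors(cls):
    """All (transitive) superclasses of cls."""
    seen, stack = set(), [cls]
    while stack:
        for sup in SUPERCLASSES.get(stack.pop(), set()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def all_types(individual):
    """Direct types plus the indirect types inherited from superclasses."""
    direct = DIRECT_TYPES[individual]
    return direct.union(*(ancestors(c) for c in direct))


print(sorted(all_types("Sydney")))  # ['City', 'Destination', 'UrbanArea']
```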
Relations between classes • Classes can overlap arbitrarily (Figure: overlapping classes RetireeDestination and City, with the individuals Cairns, Sydney and BondiBeach)
Disjoint classes • Classes may overlap • But in some cases we want them to share no instances (Figure: subclasses of Destination, with UrbanArea disjointWith RuralArea; class City; individuals Sydney, Woomera, CapeYork)
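A disjointness axiom is violated when a single individual belongs to both classes. In this sketch the axiom list is hypothetical, modelled on the slide's UrbanArea/RuralArea example. The input should be an individual's full set of types, including inferred superclasses.

```python
# Hypothetical disjointness axiom from the slide: UrbanArea disjointWith RuralArea.
DISJOINT = [frozenset({"UrbanArea", "RuralArea"})]


def disjointness_conflicts(types):
    """Return the disjoint class pairs that all appear among one individual's types.

    `types` should include inferred superclasses, not just direct types.
    """
    return [tuple(sorted(pair)) for pair in DISJOINT if pair <= types]


print(disjointness_conflicts({"City", "UrbanArea", "Destination"}))  # []
print(disjointness_conflicts({"UrbanArea", "RuralArea"}))  # [('RuralArea', 'UrbanArea')]
```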
OntoEdit • http://www.ontoprise.de/products/ontoedit – Karlsruhe University – Graphical environment for ontology editing – Extensible architecture for adding plug-ins – Ontologies are stored in a relational database – Supports XML, F-Logic, RDF(S), DAML+OIL
Axioms and inferences in OntoEdit
Other ontology management tools • Chimaera • OilEd • Apollo • MOMIS • SymOntoX
Ontology learning and population
Ontology building • Building an ontology entirely by hand is laborious and time-consuming • Few ontologies contain more than a few hundred concepts • Efforts have gone mostly into defining core ontologies • Machine learning and NLP tools can populate core ontologies automatically
Automatic methods for ontology building • Usually based on NLP and machine learning • Some methods: – Search texts for syntactic patterns that imply relations, e.g. apposition ("Shakespeare, the poet": hypernym(N2, N1) :- appositive(N2, N1)) – Use statistical methods to extract terms, and "string inclusion" to derive hypernymy relations (e.g. "laser printer" is a hypernym of "color laser printer") – Use machine learning methods that "learn" rules assigning semantic relations between terms from training sets of manually labeled texts
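The first two methods can be sketched with simple string processing. Both rules below are rough approximations and assumptions:
- The appositive regex only handles "<Capitalised Name>, the|a|an <noun>".
- String inclusion is read as head-suffix inclusion. In English compounds the head comes last, so a term that ends with another term is taken as its hyponym. Italian terms would need a different rule.

```python
import re

# Rough appositive pattern (an assumption): "<Capitalised Name>, the|a|an <noun>".
APPOSITIVE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*), (?:the|a|an) ([a-z]+)\b")


def appositive_hypernyms(text):
    """hypernym(N2, N1) :- appositive(N2, N1); returns (N2, N1) pairs."""
    return [(m.group(2), m.group(1)) for m in APPOSITIVE.finditer(text)]


def string_inclusion_hypernyms(terms):
    """Return (hypernym, term) pairs where the hypernym is a proper head suffix of term."""
    pairs = []
    for term in terms:
        words = term.split()
        for other in terms:
            other_words = other.split()
            if 0 < len(other_words) < len(words) and words[-len(other_words):] == other_words:
                pairs.append((other, term))
    return pairs


print(appositive_hypernyms("Shakespeare, the poet, was born in Stratford."))  # [('poet', 'Shakespeare')]
print(string_inclusion_hypernyms(["color laser printer", "laser printer", "printer"]))
```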
The OntoLearn system • Integrates machine learning, natural language processing and statistical analysis techniques • Uses several kinds of resources (semantic lexicons, text corpora, glossaries) • Tested in several national and international projects across various domains (e-learning, interoperability, tourism, economy, computer networks, art)
Architecture of the OntoLearn system
1. Terminology Extraction • Extract terminological candidates – Use a natural language processor (English or Italian) – Extract multiword strings conforming to syntactic structures for terminology (compounds, adj_noun, PP) • Filter candidates using two statistical measures and "contrastive" domains – Domain Relevance (DR), computed across the domains D1 … Di … Dn – Domain Consensus (DC), an entropy-based measure computed across the documents d1 … di … dn of a domain (slide examples: "next week", "project partner") • Obtain a terminology T (a list of domain-relevant single and multiword expressions)
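The slide only names the two filtering measures. This sketch assumes the formulations usually given for OntoLearn:
- DR(t, Dk) = P(t|Dk) / max_j P(t|Dj), taken over the contrastive domains D1 … Dn.
- DC(t, Dk) = −Σ_{d∈Dk} P_t(d) log P_t(d), where P_t(d) is the share of t's occurrences that fall in document d.

Probabilities are estimated from raw frequencies. The thresholds and weights used to combine the two scores are not given on the slide, so they are omitted. The toy counts are invented.

```python
import math
from collections import Counter


def p_term(term, docs):
    """Maximum-likelihood estimate of P(term | domain) from a list of Counters."""
    total = sum(sum(doc.values()) for doc in docs)
    return sum(doc[term] for doc in docs) / total if total else 0.0


def domain_relevance(term, domain, corpora):
    """DR: P(term | domain) divided by its maximum over all (contrastive) domains."""
    probs = {name: p_term(term, docs) for name, docs in corpora.items()}
    best = max(probs.values())
    return probs[domain] / best if best else 0.0


def domain_consensus(term, docs):
    """DC: entropy of the term's frequency distribution over the domain's documents."""
    freqs = [doc[term] for doc in docs if doc[term] > 0]
    total = sum(freqs)
    return -sum(f / total * math.log(f / total) for f in freqs)


corpora = {  # toy term counts per document, grouped by domain
    "networks": [Counter({"router": 3, "project partner": 2, "next week": 1}),
                 Counter({"router": 1, "project partner": 2, "next week": 1})],
    "tourism": [Counter({"hotel": 4, "next week": 2}),
                Counter({"beach": 3, "next week": 1})],
}
print(domain_relevance("router", "networks", corpora))     # 1.0
print(domain_relevance("next week", "networks", corpora))  # ~0.667: also common in tourism
print(domain_consensus("project partner", corpora["networks"]))  # ~0.693 (ln 2): evenly spread
```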
2. Search Definitions • Use existing glossaries and Google's define feature to search for term definitions • Extract term definitions from documents (tutorials, seminal papers) • Parse the glossary definitions • Use a grammar-based approach to detect hyperonymy relations in the glossary definitions • Use a WSD algorithm to attach sub-tree roots to the concepts of the core ontology
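The slide does not describe the grammar-based parser in detail. As a rough stand-in, the genus of a dictionary-style definition can often be read off as the noun phrase it opens with. This is a heuristic, not OntoLearn's grammar, and the boundary-word list is an assumption. Definitions phrased as "X is usually a …" would need extra rules.

```python
import re

# Hypothetical boundary words that typically close the genus phrase of a definition.
BOUNDARY_WORDS = {"for", "of", "that", "which", "who", "whose", "in", "to", "with", "by", "used"}
ARTICLES = {"a", "an", "the"}


def genus_phrase(definition):
    """Return the leading noun phrase of a definition, taken as the candidate hypernym."""
    tokens = re.findall(r"[A-Za-z_-]+|[^\sA-Za-z_-]", definition)
    if tokens and tokens[0].lower() in ARTICLES:
        tokens = tokens[1:]
    phrase = []
    for token in tokens:
        # Stop at a preposition/relative pronoun or at any punctuation mark.
        if token.lower() in BOUNDARY_WORDS or not token[0].isalpha():
            break
        phrase.append(token)
    return " ".join(phrase)


print(genus_phrase("A cryptographic algorithm for the protection of unclassified computer data"))
# cryptographic algorithm
```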
An example of root attachment (Figure: Cryptographic_algorithm, which has no definition, is attached to the core concept algorithm#1; Data Encryption Standard, "A cryptographic algorithm for the protection of unclassified computer data and …", and Type_3_Algorithm, "A cryptographic algorithm that has been registered by the National Institute of Standards and Technology and …", extend the tree under Cryptographic_algorithm)
Computer Networks domain
Interoperability domain
Glossary Parsing Experiments • 6,800 terms from the computer networks application domain • Smaller experiments on Economy, Art techniques, Tourism and Interoperability (200-1000 terms) • Computer networks: about 550 sub-trees, 98% precision in detecting hypernyms from definitions • About 82% precision in attaching sub-tree roots to WordNet (ongoing experiments aim to reinforce the concept choice with "classical" similarity measures) • Similar performance in the other domains
Ongoing experiment within INTEROP • An interoperability glossary of 377 terms built from online glossaries (to be extended further) • Ongoing evaluation by 7 domain experts from different areas of interoperability (enterprise modeling, architectures & platforms, ontologies)

Term | Definition | Hypernym | Source
Knowledge engineer | A person who implements an expert system. | Person | www.pera.net/Tools/Glossary/Enterprise_Integration/Glossary.html
Ontology construction | Ontology construction is usually a manual, iterative process consisting […] | Iterative process | mia.ece.uic.edu/~papers/MediaBot/pdf00002.pdf
Process Analysis | Analytical examination of a process for the purpose of […] | Analytical examination | www.pera.net/Tools/Glossary/Enterprise_Integration/Glossary.html
What if no definitions are found? 3. Compositional interpretation • Objective: attempt a compositional interpretation, i.e. obtain the meaning of a complex term by composing the meanings of its parts • Example: "computer terminal" is not in the core ontology, but its components are (e.g. in WordNet), with 2 senses for computer and 3 for terminal • Method: use a WSD algorithm to find the appropriate core ontology concept for each term component
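The search space of a compositional interpretation is the product of the components' senses: 2 × 3 = 6 readings for "computer terminal". This sketch enumerates those readings and picks one with a placeholder score. In OntoLearn the choice is made by SSI, which proceeds incrementally rather than by exhaustive enumeration. The `RELATEDNESS` values are hypothetical and chosen to reproduce the slide's computer#1 terminal#3.

```python
from itertools import product

# Toy sense inventory mirroring the slide: 2 senses of "computer", 3 of "terminal".
SENSES = {
    "computer": ["computer#1", "computer#2"],
    "terminal": ["terminal#1", "terminal#2", "terminal#3"],
}

# Hypothetical relatedness scores standing in for the WSD algorithm's judgement.
RELATEDNESS = {("computer#1", "terminal#3"): 3, ("computer#1", "terminal#1"): 1}


def candidate_readings(components):
    """Every combination of one sense per component of a multiword term."""
    return list(product(*(SENSES[w] for w in components)))


def best_reading(components):
    """Pick the highest-scoring combination (unknown pairs score 0)."""
    return max(candidate_readings(components), key=lambda combo: RELATEDNESS.get(combo, 0))


print(len(candidate_readings(["computer", "terminal"])))  # 6
print(best_reading(["computer", "terminal"]))  # ('computer#1', 'terminal#3')
```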
SSI: an algorithm for WSD • In OntoLearn, used for: – Compositional interpretation of multiword expressions (e.g. "computer terminal" = computer#1 terminal#3) – Attaching a sub-tree under the appropriate node of a general-purpose ontology (e.g. for the tree rooted in artificial_language#1)
Structural Semantic Interconnection (SSI) • A WSD algorithm based on Structural Pattern Recognition • Starts from terms (singleton and multiword expressions) in a context T and produces an inventory I of concept labels • Relevant applications: – Ontology Learning – WSD tasks – Query Expansion – Semantic Annotation
SSI is a knowledge-based algorithm • Step 1: A lexical knowledge base (LKB) was built by integrating several available online resources: – WordNet – Oxford Collegiate Dictionary of collocations – SemCor and LDC (semantically annotated corpora) – … – The integration was partly manual, partly automatic
Step 2 • A structural representation of a concept c is a cut over the LKB centered on c, including all LKB nodes at a maximum distance of 3 from c (Figure: structural representations of two senses of bus: transport and connector)
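The cut can be computed with a breadth-first search that stops expanding nodes at distance 3 from c. As an assumption, the LKB is treated here as an undirected graph with unit-length edges. A real LKB has typed relations. The node labels are hypothetical.

```python
from collections import deque


def undirected(edges):
    """Build a symmetric adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def structural_representation(lkb, center, max_distance=3):
    """Nodes and edges of the LKB within max_distance hops of center (breadth-first)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == max_distance:
            continue  # do not expand beyond the cut
        for neighbour in lkb.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    nodes = set(dist)
    edges = {(a, b) for a in nodes for b in lkb.get(a, ()) if b in nodes}
    return nodes, edges


# Toy LKB fragment around the "vehicle" sense of bus (labels are hypothetical).
LKB = undirected([
    ("bus_vehicle", "public_transport"),
    ("bus_vehicle", "passenger"),
    ("public_transport", "transport"),
    ("transport", "service"),
    ("service", "work"),  # 4 hops from bus_vehicle: outside the cut
])
nodes, edges = structural_representation(LKB, "bus_vehicle")
print(sorted(nodes))  # ['bus_vehicle', 'passenger', 'public_transport', 'service', 'transport']
```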
The task: given a context T = [t1, t2, …, tn], SSI builds a semantic interpretation I = [S_t1, S_t2, …, S_tn], where S_ti is the concept label selected for ti
(a) Build semantic networks • Context: T = [bus, network, redundancy, connection] • For each alternative sense of a word in T, find the best-matching graph with respect to the senses already disambiguated in I • I is initialised with the senses of the unambiguous words in T; if there are no monosemous words, with the first sense of the least ambiguous word w in T (the algorithm is then forked into as many executions as w has senses) (Figure: semantic networks for two senses of bus: "A vehicle carrying many passengers; used for transport" and "The topology of a network whose components are connected by a busbar")
Semantic Interpretation of Terms: (b) Intersect semantic networks • T = [bus, network, redundancy, connection] • Intersect all alternative semantic networks and choose the networks with the highest number of relevant intersections, giving I = [bus#2, network#5, redundancy#3, connection#3] • A context-free grammar is used to detect relevant intersection patterns, e.g. the gloss rule ("network#5" appears in the definition, or gloss, of "bus#2") or a hyperonymy/hyponymy rule
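The steps above can be illustrated with a greatly simplified, SSI-style loop. It is not the published algorithm: there is no context-free grammar over LKB paths and no path weighting. A hypothetical set of "relevant" sense pairs stands in for the intersection patterns. Also as assumptions:
- redundancy is made monosemous so that I has a seed.
- The fork over the senses of the least ambiguous word is reduced to its first sense.

```python
# Hypothetical toy data for T = [bus, network, redundancy, connection].
SENSES = {
    "bus": ["bus#1", "bus#2"],
    "network": ["network#1", "network#5"],
    "redundancy": ["redundancy#3"],  # assumed monosemous for this toy example
    "connection": ["connection#1", "connection#3"],
}
# Sense pairs assumed to match a relevant intersection pattern (e.g. the gloss rule).
RELEVANT = {frozenset(p) for p in [("bus#2", "network#5"),
                                   ("network#5", "redundancy#3"),
                                   ("network#5", "connection#3")]}


def interconnections(sense, other):
    return 1 if frozenset((sense, other)) in RELEVANT else 0


def ssi_sketch(context):
    """Greatly simplified SSI-style loop: grow I outward from unambiguous words."""
    interpretation = {w: SENSES[w][0] for w in context if len(SENSES[w]) == 1}
    if not interpretation:
        # SSI forks one run per sense of the least ambiguous word; this keeps only the first.
        seed = min(context, key=lambda w: len(SENSES[w]))
        interpretation[seed] = SENSES[seed][0]
    pending = [w for w in context if w not in interpretation]
    progress = True
    while pending and progress:
        progress = False
        for word in list(pending):
            chosen = list(interpretation.values())
            scores = {s: sum(interconnections(s, c) for c in chosen) for s in SENSES[word]}
            best = max(scores, key=scores.get)
            if scores[best] > 0:  # only commit senses supported by I
                interpretation[word] = best
                pending.remove(word)
                progress = True
    return interpretation


print(ssi_sketch(["bus", "network", "redundancy", "connection"]))
```

On this toy data, network#5 and connection#3 are chosen in the first pass. bus#2 is chosen only once network#5 is in I, which yields the same senses as the slide.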
Example of an intersection pattern for taxi#1, with T = [taxi, license, traveller, driver]
The algorithm is used in two ways • Compositional interpretation: – given the components of multiword expressions (e.g. component, interacting, computer, …), choose a sense for each, e.g. interacting component = interact#1 component#3 • Root attachment: – given the elements of a sub-tree, find the appropriate attachment between its root node and a node in WordNet (e.g. artificial_language#1)
4. Build a Semantic Tree • A complex term now corresponds to a complex concept • Arrange concepts according to the detected hyperonymy relations • Hyperonymy relations have been detected either by parsing natural-language definitions or by disambiguating the components of a complex term
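Arranging concepts by their hyperonymy relations amounts to grouping (hypernym, hyponym) pairs into a tree. This minimal sketch assumes each concept has a single hypernym and does not check for cycles. Real output may form a graph rather than a tree. The pairs reuse the cryptography example: the first comes from root attachment, the others from parsed definitions.

```python
from collections import defaultdict


def build_tree(hypernym_pairs):
    """Group (hypernym, hyponym) pairs into a children map and find the roots."""
    children = defaultdict(list)
    hyponyms = set()
    for parent, child in hypernym_pairs:
        children[parent].append(child)
        hyponyms.add(child)
    roots = sorted(set(children) - hyponyms)
    return children, roots


def render(children, node, depth=0):
    """Indented text view of the tree below node."""
    lines = ["  " * depth + node]
    for child in sorted(children.get(node, [])):
        lines.extend(render(children, child, depth + 1))
    return lines


pairs = [
    ("algorithm#1", "Cryptographic_algorithm"),  # from root attachment
    ("Cryptographic_algorithm", "Data_Encryption_Standard"),  # from a parsed definition
    ("Cryptographic_algorithm", "Type_3_Algorithm"),  # from a parsed definition
]
children, roots = build_tree(pairs)
print("\n".join(render(children, roots[0])))
```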