- Number of slides: 48
Ontology mapping: a way out of the medical tower of Babel?
  Frank van Harmelen
  Vrije Universiteit Amsterdam
  The Netherlands Antilles
Before we start…
  - a talk on ontology mappings is a difficult talk to give:
  - no consensus in the field
    • on merits of the different approaches
    • on classifying the different approaches
  - no one can speak with authority on the solution
  - this is a personal view, with a sell-by date
  - other speakers will entirely disagree (or disapprove)
Good overviews of the topic
  - Knowledge Web D2.2.3: “State of the art on ontology alignment”
  - Ontology Mapping Survey talk by Siyamed Seyhmus SINIR
  - ESWC'05 Tutorial on Schema and Ontology Matching by Pavel Shvaiko and Jerome Euzenat
  - KER 2003 paper by Kalfoglou & Schorlemmer
  - These are all different & incompatible…
The Medical tower of Babel
  - MeSH
    • Medical Subject Headings, National Library of Medicine
    • 22,000 descriptions
  - EMTREE
    • Commercial Elsevier, Drugs and diseases
    • 45,000 terms, 190,000 synonyms
  - UMLS
    • Integrates 100 different vocabularies
  - SNOMED
    • 200,000 concepts, College of American Pathologists
  - Gene Ontology
    • 15,000 terms in molecular biology
  - NCI Cancer Ontology:
    • 17,000 classes (about 1M definitions)
What are ontologies & what are they used for
  [Diagram labels: world, concept, language]
  - no shared understanding
  - Conceptual and terminological confusion
  - Agree on a conceptualization
  - Make it explicit in some language.
  - Actors: both humans and machines
Ontologies come in very different kinds
  - From lightweight to heavyweight:
    • Yahoo topic hierarchy
    • Open directory (400,000 general categories)
    • Cyc, 300,000 axioms
  - From very specific to very general
    • METAR code (weather conditions at air terminals)
    • SNOMED (medical concepts)
    • Cyc (common sense knowledge)
What’s inside an ontology?
  - terms + specialisation hierarchy
  - classes + class-hierarchy
  - instances
  - slots/values
  - inheritance (multiple? defaults?)
  - restrictions on slots (type, cardinality)
  - properties of slots (symm., trans., …)
  - relations between classes (disjoint, covers)
  - reasoning tasks: classification, subsumption
  Increasing semantic “weight”
In short (for the duration of this talk)
  - Ontologies are not definitive descriptions of what exists in the world (= philosophy)
  - Ontologies are models of the world constructed to facilitate communication
  - Yes, ontologies exist (because we build them)
Ontology mapping is old & inevitable
  - Ontology mapping is old
    • db schema integration
    • federated databases
  - Ontology mapping is inevitable
    • ontology language is standardised,
    • don't even try to standardise contents
Ontology mapping is important
  - database integration, heterogeneous database retrieval (traditional)
  - catalog matching (e-commerce)
  - agent communication (theory only)
  - web service integration (urgent)
  - P2P information sharing (emerging)
  - personalisation (emerging)
Ontology mapping is now urgent
  - Ontology mapping has acquired new urgency
    • physical and syntactic integration is ± solved (open world, web)
    • automated mappings are now required (P2P)
    • shift from off-line to run-time matching
  - Ontology mapping has new opportunities
    • larger volumes of data
    • richer schemas (relational vs. ontology)
    • applications where partial mappings work
Different aspects of ontology mapping
  - how to discover a mapping
  - how to represent a mapping
    • subset/equal/disjoint/overlap/is-somehow-related-to
    • logical/equational/category-theoretical
    • atomic/complex arguments
    • confidence measure
  - how to use it
  We only talk about “how to discover”
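The representation question on this slide (a relation between two elements plus a confidence measure) can be pictured as a small record type. This is only a sketch: the class and field names, and the choice of a confidence value in [0, 1], are assumptions made for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The relation kinds named on the slide.
    SUBSET = "subset"
    EQUAL = "equal"
    DISJOINT = "disjoint"
    OVERLAP = "overlap"
    RELATED = "is-somehow-related-to"


@dataclass(frozen=True)
class Mapping:
    source: str               # element of the first ontology (atomic case only)
    target: str               # element of the second ontology
    relation: Relation
    confidence: float = 1.0   # assumed scale: 0.0 (no trust) to 1.0 (certain)

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# Hypothetical example mapping between two made-up ontologies A and B.
example = Mapping("A:Car", "B:Automobile", Relation.EQUAL, 0.9)
```

Complex (non-atomic) arguments would need expressions rather than plain strings on either side; that case is left out here.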
Many experimental systems: (non-exhaustive!)
  - Prompt (Stanford SMI)
  - Anchor-Prompt (Stanford SMI)
  - Chimaera (Stanford KSL)
  - Rondo (Stanford U./ULeipzig)
  - MoA (ETRI)
  - Cupid (Microsoft research)
  - Glue (U of Washington)
  - FCA-merge (UKarlsruhe)
  - IF-Map
  - Artemis (UMilano)
  - T-tree (INRIA Rhone-Alpes)
  - S-MATCH (UTrento)
  - Coma (ULeipzig)
  - Buster (UBremen)
  - MULTIKAT (INRIA S.A.)
  - ASCO (INRIA S.A.)
  - OLA (INRIA R.A.)
  - Dogma's Methodology
  - ArtGen (Stanford U.)
  - Alimo (ITI-CERTH)
  - Bibster (UKarlsruhe)
  - QOM (UKarlsruhe)
  - KILT (INRIA LORRAINE)
Different approaches to ontology matching
  - Linguistics & structure
  - Shared vocabulary
  - Instance-based matching
  - Shared background knowledge
Linguistic & structural mappings
  - normalisation (case, blanks, digits, diacritics)
  - lemmatization, N-grams, edit-distance, Hamming distance
  - distance = fraction of common parents
  - elements are similar if their parents/children/siblings are similar
  decreasing order of boredom
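The string-level techniques listed on this slide can be sketched with the standard library alone. The function names are invented for this example, and treating digits as noise to be dropped during normalisation is an assumption; the slide only names the categories.

```python
import re
import unicodedata


def normalise(label):
    """Lower-case, strip diacritics, drop digits and collapse blanks."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_digits = re.sub(r"\d+", " ", no_marks.lower())
    return " ".join(no_digits.split())


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def ngrams(s, n=3):
    """Set of character n-grams of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of the n-gram sets of two strings."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        # Both strings are shorter than n: fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(ga | gb)
```

A matcher would typically run `normalise` on both labels first and then threshold one of the two similarity measures; which threshold works is an empirical question the slide does not address.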
Matching through shared vocabulary
  [Diagram: Q with Low(Q) and Up(Q)]
  $\bigcup \mathrm{Low}(Q) \subseteq Q \subseteq \bigcap \mathrm{Up}(Q)$
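One way to read the bounds Low(Q) and Up(Q) is extensionally: treat every concept as the set of its instances, and collect the shared-vocabulary terms that fit inside Q and those that contain Q. The sketch below makes that assumption explicit. The term names and instance identifiers are invented, and the approach presented in the talk may define the bounds through subsumption reasoning rather than instance sets.

```python
def bounds(q, shared):
    """Return the names of shared terms below Q and above Q."""
    low = {name for name, ext in shared.items() if ext <= q}
    up = {name for name, ext in shared.items() if ext >= q}
    return low, up


def union_of(names, shared):
    """Union of the extensions of the named shared terms."""
    result = set()
    for name in names:
        result |= shared[name]
    return result


def intersection_of(names, shared, universe):
    """Intersection of the named extensions (the universe if none is named)."""
    result = set(universe)
    for name in names:
        result &= shared[name]
    return result


# Hypothetical shared vocabulary over instances 1..6.
universe = {1, 2, 3, 4, 5, 6}
shared = {
    "T1": {1, 2},
    "T2": {4, 5},
    "T3": {1, 2, 3, 4, 5},
    "T4": {1, 2, 3, 4, 5, 6},
}
q = {1, 2, 3}  # a concept that is not itself in the shared vocabulary

low, up = bounds(q, shared)
lower_approx = union_of(low, shared)
upper_approx = intersection_of(up, shared, universe)
```

Under this reading the union of the lower terms is always contained in Q and Q is always contained in the intersection of the upper terms, so Q is bracketed by two expressions written purely in shared terms.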
Matching through shared vocabulary
  - Used in mapping geospatial databases from German land-registration authorities (small)
  - Used in mapping bio-medical and genetic thesauri (large)
Matching through shared instances
Matching through shared instances
  - Used by Ichise et al. (IJCAI’03) to successfully map parts of Yahoo to parts of Google
  - Yahoo = 8402 classes, 45,000 instances
  - Google = 8343 classes, 82,000 instances
  - Only 6000 shared instances
  - 70% to 80% accuracy obtained (!)
  - Conclusions from authors:
    • semantics is needed to improve on this ceiling
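The idea behind instance-based matching can be sketched as follows: two classes from different hierarchies are candidates for a match when many of the same instances are filed under both. The example uses Jaccard overlap as a simple stand-in measure; it is not claimed to be the statistic used by Ichise et al., and the class names, instance identifiers and threshold are all hypothetical.

```python
def jaccard(a, b):
    """Share of instances common to both classes among all their instances."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_by_instances(source, target, threshold=0.5):
    """For each source class, pick the target class with the largest overlap.

    source, target: dict mapping class name -> set of instance identifiers.
    Returns a dict: source class -> (best target class, score).
    """
    matches = {}
    for s_name, s_instances in source.items():
        best_name, best_score = None, 0.0
        for t_name, t_instances in target.items():
            score = jaccard(s_instances, t_instances)
            if score > best_score:
                best_name, best_score = t_name, score
        if best_name is not None and best_score >= threshold:
            matches[s_name] = (best_name, best_score)
    return matches


# Hypothetical directories that share a few URLs.
dir_a = {"Autos": {"u1", "u2", "u3"}, "Films": {"u7"}}
dir_b = {"Cars": {"u1", "u2", "u4"}, "Movies": {"u8"}}
found = match_by_instances(dir_a, dir_b)
```

Classes with no shared instances can never be matched this way, which is one reason a small shared-instance set limits what the approach can achieve.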
Matching using shared background knowledge
  [Diagram labels: ontology 1, ontology 2]
Ontology mapping using background knowledge
  Case study 1
  Work with Zharko Aleksovski @ Philips, Michel Klein @ VU, KIK @ AMC
Overview of test data
  Two terminologies from intensive care domain
  - OLVG list
    • List of reasons for ICU admission
  - AMC list
    • List of reasons for ICU admission
  - DICE hierarchy
    • Additional hierarchical knowledge describing the reasons for ICU admission
OLVG list
  - developed by clinician
  - 3000 reasons for ICU admission
  - 1390 used in first 24 hours of stay
    • 3600 patients since 2000
  - based on ICD-9 + additional material
  - List of problems for patient admission
  - Each reason for admission is described with one label
    • Labels consist of 1.8 words on average
    • redundancy because of spelling mistakes
    • implicit hierarchy (e.g. many fractures)
AMC list
  - List of 1460 problems for ICU admission
  - Each problem is described using 5 aspects from the DICE terminology:
    • Abnormality (size: 85)
    • Action taken (size: 55)
    • Body system (size: 13)
    • Location (size: 1512)
    • Cause (size: 255)
  - 2500 concepts (5000 terms), 4500 links
  - expressed in OWL
  - allows for subsumption & part-of reasoning
Why map AMC list ↔ OLVG list?
  - allow easy entering of OLVG data
  - re-use of data in
    • epidemiology
    • quality of care assessment
    • data-mining (patient prognosis)
Linguistic mapping:
  - Compare each pair of concepts
  - Use labels and synonyms of concepts
  - Heuristic method to discover equivalence and subclass relations
    Example: “Long brain tumor” is more specific than “Long tumor”
  - First round
    • compare with complete DICE
    • 313 suggested matches, around 70% correct
  - Second round:
    • only compare with “reasons for admission” subtree
    • 209 suggested matches, around 90% correct
  ⇒ High precision, low recall (“the easy cases”)
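The slide's example (“Long brain tumor” is more specific than “Long tumor”) suggests a heuristic based on the words in each label. The sketch below is one guess at such a rule, not the method actually used in the case study: labels with the same words are taken as equivalent, and a label whose words strictly include the other's is taken as more specific. Synonym handling, which the slide mentions, is left out.

```python
def words(label):
    """Case-insensitive set of the words in a label."""
    return frozenset(label.lower().split())


def lexical_relation(a, b):
    """Relation of label a to label b, or None if the heuristic is silent."""
    wa, wb = words(a), words(b)
    if wa == wb:
        return "equivalent"
    if wa > wb:
        # a has every word of b plus extra modifiers.
        return "more specific"
    if wa < wb:
        return "more general"
    return None


relation = lexical_relation("Long brain tumor", "Long tumor")
```

A rule this strict only fires on near-identical labels, which fits the slide's summary of high precision and low recall.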
Using background knowledge
  - Use properties of concepts
  - Use other ontologies to discover relation between properties
  [Diagram: elided entries on either side of a “?”]
Semantic match
  [Diagram labels: Lexical match; OLVG problem list; DICE aspect taxonomies (Abnormality taxonomy, Action taxonomy, Body system taxonomy, Location taxonomy, Cause taxonomy); Given; DICE problem list; Implicit matching: property match; ? ? ?]
Semantic match
  Taxonomy of body parts: Blood vessel is more general than Vein and Artery; Artery is more general than Aorta
  - “Aorta thoracalis dissection”: lexical match, has location Aorta
  - “Dissection of artery”: lexical match, has location Artery
  - Reasoning: implies
  - Location match: has more general location
Example: “Heroin intoxication” – “drugs overdose”
  Cause taxonomy: Drugs is more general than Heroine
  Abnormality taxonomy: Intoxicatie is more general than Overdosis
  - “Heroin intoxication”: lexical match on cause, lexical match on abnormality
  - “Drugs overdosis”: lexical match on cause, lexical match on abnormality
  - Cause match: has more specific cause
  - Abnormality match: has more general abnormality
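The semantic-match examples compare two problems aspect by aspect, using a background taxonomy to decide whether one aspect value is more general than the other. A minimal sketch of that comparison follows. The function names and the dictionary encoding are assumptions, the taxonomy holds only the body-part terms shown on the slide, and the "abnormality" value "Dissection" is filled in for illustration.

```python
# child -> parent, from the body-part taxonomy shown on the slide
LOCATION_PARENT = {
    "Vein": "Blood vessel",
    "Artery": "Blood vessel",
    "Aorta": "Artery",
}


def ancestors(term, parent):
    """All terms above `term`, nearest first (assumes a cycle-free taxonomy)."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found


def compare(a, b, parent):
    """Relation of term a to term b within one taxonomy, or None."""
    if a == b:
        return "equal"
    if b in ancestors(a, parent):
        return "more specific"
    if a in ancestors(b, parent):
        return "more general"
    return None


def semantic_match(p1, p2, taxonomies):
    """Compare two problems on every aspect they both describe.

    p1, p2: dict aspect -> term; taxonomies: dict aspect -> child->parent map.
    Returns dict aspect -> relation of p1's term to p2's term.
    """
    return {
        aspect: compare(p1[aspect], p2[aspect], taxonomies.get(aspect, {}))
        for aspect in p1.keys() & p2.keys()
    }


# "Aorta thoracalis dissection" versus "Dissection of artery"
olvg_problem = {"location": "Aorta", "abnormality": "Dissection"}
dice_problem = {"location": "Artery", "abnormality": "Dissection"}
result = semantic_match(olvg_problem, dice_problem,
                        {"location": LOCATION_PARENT})
```

How the per-aspect relations combine into one relation between the two problems (for instance, when one aspect is more specific and another more general, as in the heroin example) is a design decision this sketch leaves open.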
Example results
  • OLVG: Acute respiratory failure / DICE: Asthma cardiale
  • OLVG: Aspergillus fumigatus / DICE: Aspergilloom
  • OLVG: duodenum perforation / DICE: Gut perforation
  • OLVG: HIV / DICE: AIDS
  • OLVG: Aorta thoracalis dissectie type B / DICE: Dissection of artery
  Matched-aspect labels on the slide: abnormality cause abnormality, cause location, abnormality
Ontology mapping using background knowledge
  Case study 2
  Work with Heiner Stuckenschmidt @ VU
Case Study:
  1. Map GALEN & Tambis, using UMLS as background knowledge
  2. Select three topics with sufficient overlap
    • Substances
    • Structures
    • Processes
  3. Define some partial & ad-hoc manual mappings between individual concepts
  4. Represent mappings in C-OWL
  5. Use semantics of C-OWL to verify and complete mappings
Case Study: verification & derivation
  [Diagram labels: UMLS (medical terminology); GALEN (medical ontology); Tambis (genetic ontology); lexical mapping; derived mapping; verification & derivation]
Ad hoc mappings: Substances
  [Diagram labels: UMLS, GALEN]
  Notice: mappings high and low in the hierarchy, few in the middle
Ad hoc mappings: Substances
  [Diagram labels: UMLS, Tambis]
  Notice different grain size: UMLS coarse, Tambis fine
Verification of mappings
  [Diagram labels: UMLS: Chemicals; Tambis: Chemical; UMLS: Chemicals_viewed_structurally; UMLS: Chemicals_viewed_functionally; UMLS: enzyme; Tambis: enzyme; relations marked “=”, “=” and “?”]
Deriving new mappings
  [Diagram labels: UMLS: substance; UMLS: Chemicals; UMLS: Phenomenon_or_process; UMLS: OrganicChemical; Galen: ChemicalSubstance; relation marked “=”]
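Deriving a mapping can be pictured as combining a known cross-ontology equivalence with the subclass hierarchy inside one ontology. The sketch below does only that, with plain set operations; it does not implement C-OWL semantics. The concept names are taken from the slide, but which pairs are linked by subclass or equivalence is an assumption made for illustration, since the diagram's edges did not survive extraction.

```python
def transitive_closure(pairs):
    """All (descendant, ancestor) pairs implied by the given (child, parent) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def derive_subsumptions(subclass_of, equivalences):
    """Derive cross-ontology 'x is subsumed by y' pairs.

    subclass_of: (child, parent) pairs within one ontology.
    equivalences: (a, b) pairs stating that a and b are equivalent.
    """
    derived = set()
    for child, parent in transitive_closure(subclass_of):
        for a, b in equivalences:
            if parent == a:
                derived.add((child, b))
            if parent == b:
                derived.add((child, a))
    return derived


# Assumed edges, for illustration only.
umls_subclass_of = {
    ("UMLS:OrganicChemical", "UMLS:Chemicals"),
    ("UMLS:Chemicals", "UMLS:substance"),
}
given_equivalences = {("UMLS:Chemicals", "Galen:ChemicalSubstance")}

new_mappings = derive_subsumptions(umls_subclass_of, given_equivalences)
```

Verification would run the other way: a proposed mapping that contradicts something derivable from the existing ones is flagged for review.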
“Conclusions”
  - Ontology mapping is (still) hard & open
  - Many different approaches will be required:
    • linguistic, structural
    • statistical
    • semantic
    • …
  - Currently no roadmap theory on what's good for which problems
Challenges
  - roadmap theory
  - run-time matching
  - “good-enough” matches
  - large scale evaluation methodology
  - hybrid matchers (needs roadmap theory)
Ontology mapping: a way out of the medical tower of Babel??


