  • Number of slides: 48

Ontology mapping: a way out of the medical tower of Babel?
Frank van Harmelen
Vrije Universiteit Amsterdam
The Netherlands Antilles

Before we start…
• A talk on ontology mappings is a difficult talk to give:
• No consensus in the field
    • on the merits of the different approaches
    • on classifying the different approaches
• No one can speak with authority on the solution
• This is a personal view, with a sell-by date
• Other speakers will entirely disagree (or disapprove)

Good overviews of the topic
• Knowledge Web D2.2.3: “State of the art on ontology alignment”
• Ontology Mapping Survey, talk by Siyamed Seyhmus SINIR
• ESWC'05 Tutorial on Schema and Ontology Matching by Pavel Shvaiko and Jerome Euzenat
• KER 2003 paper by Kalfoglou & Schorlemmer
• These are all different & incompatible…

The medical tower of Babel
• MeSH
    • Medical Subject Headings, National Library of Medicine
    • 22,000 descriptions
• EMTREE
    • Commercial (Elsevier), drugs and diseases
    • 45,000 terms, 190,000 synonyms
• UMLS
    • Integrates 100 different vocabularies
• SNOMED
    • 200,000 concepts, College of American Pathologists
• Gene Ontology
    • 15,000 terms in molecular biology
• NCI Cancer Ontology
    • 17,000 classes (about 1M definitions)

What are ontologies & what are they used for?
[Diagram labels: world, concept, language]
• No shared understanding: conceptual and terminological confusion
• Agree on a conceptualization
• Make it explicit in some language
• Actors: both humans and machines

Ontologies come in very different kinds
• From lightweight to heavyweight:
    • Yahoo topic hierarchy
    • Open Directory (400,000 general categories)
    • Cyc, 300,000 axioms
• From very specific to very general:
    • METAR code (weather conditions at air terminals)
    • SNOMED (medical concepts)
    • Cyc (common sense knowledge)

What’s inside an ontology?
• terms + specialisation hierarchy
• classes + class hierarchy
• instances
• slots/values
• inheritance (multiple? defaults?)
• restrictions on slots (type, cardinality)
• properties of slots (symmetric, transitive, …)
• relations between classes (disjoint, covers)
• reasoning tasks: classification, subsumption
(The list runs in order of increasing semantic “weight”.)

In short (for the duration of this talk)
• Ontologies are not definitive descriptions of what exists in the world (= philosophy)
• Ontologies are models of the world, constructed to facilitate communication
• Yes, ontologies exist (because we build them)

Ontology mapping is old & inevitable
• Ontology mapping is old
    • db schema integration
    • federated databases
• Ontology mapping is inevitable
    • the ontology language is standardised
    • don't even try to standardise the contents

Ontology mapping is important
• database integration, heterogeneous database retrieval (traditional)
• catalog matching (e-commerce)
• agent communication (theory only)
• web service integration (urgent)
• P2P information sharing (emerging)
• personalisation (emerging)

Ontology mapping is now urgent
• Ontology mapping has acquired new urgency
    • physical and syntactic integration is ± solved (open world, web)
    • automated mappings are now required (P2P)
    • shift from off-line to run-time matching
• Ontology mapping has new opportunities
    • larger volumes of data
    • richer schemas (relational vs. ontology)
    • applications where partial mappings work

Different aspects of ontology mapping
• how to discover a mapping
• how to represent a mapping
    • subset/equal/disjoint/overlap/is-somehow-related-to
    • logical/equational/category-theoretical
• atomic/complex arguments
• confidence measure
• how to use it
We only talk about “how to discover”

Many experimental systems (non-exhaustive!)
• Prompt (Stanford SMI)
• Anchor-Prompt (Stanford SMI)
• Chimaera (Stanford KSL)
• Rondo (Stanford U./ULeipzig)
• MoA (ETRI)
• Cupid (Microsoft Research)
• Glue (U of Washington)
• FCA-merge (UKarlsruhe)
• IF-Map
• Artemis (UMilano)
• T-tree (INRIA Rhone-Alpes)
• S-MATCH (UTrento)
• Coma (ULeipzig)
• Buster (UBremen)
• MULTIKAT (INRIA S.A.)
• ASCO (INRIA S.A.)
• OLA (INRIA R.A.)
• Dogma's Methodology
• ArtGen (Stanford U.)
• Alimo (ITI-CERTH)
• Bibster (UKarlsruhe)
• QOM (UKarlsruhe)
• KILT (INRIA Lorraine)

Different approaches to ontology matching
• Linguistics & structure
• Shared vocabulary
• Instance-based matching
• Shared background knowledge

Linguistic & structural mappings
• normalisation (case, blanks, digits, diacritics)
• lemmatization, N-grams, edit distance, Hamming distance
• distance = fraction of common parents
• elements are similar if their parents/children/siblings are similar
(Listed in decreasing order of boredom.)
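The first two bullets can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular system listed earlier: the function names are made up here, treating "digits" as "remove digits" is an assumption, and the similarity score (one minus edit distance over the longer label length) is just one common choice.

```python
import unicodedata


def normalise(label: str) -> str:
    """Lower-case, strip diacritics, drop digits and collapse blanks."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_marks.lower() if not c.isdigit())
    return " ".join(kept.split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two labels after normalisation."""
    na, nb = normalise(a), normalise(b)
    longest = max(len(na), len(nb))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(na, nb) / longest
```

A matcher built on this would compare every label pair and keep those above some threshold; where to put the threshold is an empirical question the slides do not address.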

Matching through shared vocabulary
[Diagram labels: Q, Low(Q), Up(Q)]
⋃ Low(Q) ⊆ Q ⊆ ⋂ Up(Q)

Matching through shared vocabulary
• Used in mapping geospatial databases from German land-registration authorities (small)
• Used in mapping bio-medical and genetic thesauri (large)
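One plausible reading of Low(Q) and Up(Q) is that a concept Q, which the other party does not know, is approximated by shared-vocabulary classes: Low(Q) collects the shared classes below Q, and Up(Q) the shared classes above it. The sketch below follows that reading. The function names, the toy data, and the assumption that the subclass relation is already transitively closed and that instance memberships are complete are all illustrative, not taken from the slides.

```python
def approximate(query, subclass_of, shared):
    """Return (Low(Q), Up(Q)): shared classes below and above `query`.

    `subclass_of` is a set of (sub, super) pairs, assumed transitively closed.
    """
    lower = {c for c in shared if (c, query) in subclass_of}
    upper = {c for c in shared if (query, c) in subclass_of}
    return lower, upper


def classify_instance(memberships, lower, upper):
    """Decide membership of Q from an instance's shared-class memberships."""
    if memberships & lower:
        return "in"        # member of some class below Q
    if not upper <= memberships:
        return "out"       # missing from some class above Q
    return "unknown"       # between the two bounds


# Hypothetical example: "Artery" is local; the other three classes are shared.
shared = {"Blood vessel", "Aorta", "Vein"}
subclass_of = {("Artery", "Blood vessel"), ("Aorta", "Artery"),
               ("Aorta", "Blood vessel"), ("Vein", "Blood vessel")}
lower, upper = approximate("Artery", subclass_of, shared)
```

The "unknown" band is where the approximation loses information: the tighter the shared vocabulary brackets Q, the smaller that band becomes.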

Matching through shared instances

Matching through shared instances
• Used by Ichise et al. (IJCAI'03) to successfully map parts of Yahoo to parts of Google
• Yahoo = 8402 classes, 45,000 instances
• Google = 8343 classes, 82,000 instances
• Only 6000 shared instances
• 70% - 80% accuracy obtained (!)
• Conclusions from the authors:
    • semantics is needed to improve on this ceiling
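The basic idea of instance-based matching can be sketched as follows: two classes are likely to correspond if largely the same instances are filed under both. The sketch uses Jaccard overlap, which is a stand-in chosen for simplicity and not necessarily the statistic used by Ichise et al.; the category names, URLs and threshold are invented for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two instance sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_by_instances(left, right, threshold=0.5):
    """For each left class, propose the right class with the highest overlap."""
    matches = []
    for lname, linst in left.items():
        best_name, best_score = None, 0.0
        for rname, rinst in right.items():
            score = jaccard(linst, rinst)
            if score > best_score:
                best_name, best_score = rname, score
        if best_name is not None and best_score >= threshold:
            matches.append((lname, best_name, best_score))
    return matches


# Hypothetical directories: class name -> set of URLs filed under it.
yahoo = {"Recreation/Autos": {"u1", "u2", "u3"},
         "Science/Biology": {"u4", "u5"}}
google = {"Autos": {"u1", "u2", "u6"},
          "Biology": {"u4", "u5"},
          "Arts": {"u7"}}
matches = match_by_instances(yahoo, google)
```

With only a small fraction of instances shared between the two directories, as in the figures above, many classes will have too little overlap to be matched this way at all.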

Matching using shared background knowledge
[Diagram labels: ontology 1, ontology 2]

Ontology mapping using background knowledge
Case study 1
Work with Zharko Aleksovski @ Philips • Michel Klein @ VU • KIK @ AMC

Overview of test data
Two terminologies from the intensive care domain
• OLVG list
    • List of reasons for ICU admission
• AMC list
    • List of reasons for ICU admission
• DICE hierarchy
    • Additional hierarchical knowledge describing the reasons for ICU admission

OLVG list
• developed by clinician
• 3000 reasons for ICU admission
• 1390 used in first 24 hours of stay
    • 3600 patients since 2000
• based on ICD-9 + additional material
• List of problems for patient admission
• Each reason for admission is described with one label
    • Labels consist of 1.8 words on average
    • redundancy because of spelling mistakes
    • implicit hierarchy (e.g. many fractures)

AMC list
• List of 1460 problems for ICU admission
• Each problem is described using 5 aspects from the DICE terminology:
• 2500 concepts (5000 terms), 4500 links
    • Abnormality (size: 85)
    • Action taken (size: 55)
    • Body system (size: 13)
    • Location (size: 1512)
    • Cause (size: 255)
• expressed in OWL
• allows for subsumption & part-of reasoning

Why map AMC list ↔ OLVG list?
• allow easy entering of OLVG data
• re-use of data in
    • epidemiology
    • quality of care assessment
    • data-mining (patient prognosis)

Linguistic mapping:
• Compare each pair of concepts
• Use labels and synonyms of concepts
• Heuristic method to discover equivalence and subclass relations
    • Example: “Long brain tumor” is more specific than “Long tumor”
• First round
    • compare with complete DICE
    • 313 suggested matches, around 70% correct
• Second round
    • only compare with the “reasons for admission” subtree
    • 209 suggested matches, around 90% correct
⇒ High precision, low recall (“the easy cases”)
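The slide does not spell out the heuristic, but the “Long brain tumor” example is consistent with a simple word-set rule: a label that contains all the words of another label plus extra qualifiers names a more specific concept. The sketch below implements that rule only; the function names are hypothetical, and a real matcher would also consult synonyms, as the second bullet says.

```python
def words(label: str) -> set:
    """Split a label into a set of lower-cased words."""
    return set(label.lower().split())


def label_relation(a: str, b: str):
    """Relation of label `a` to label `b`, or None if no rule applies."""
    wa, wb = words(a), words(b)
    if wa == wb:
        return "equivalent"
    if wb < wa:
        return "more specific"  # a has every word of b plus qualifiers
    if wa < wb:
        return "more general"
    return None
```

For example, `label_relation("Long brain tumor", "Long tumor")` yields `"more specific"`. A rule this strict only fires on near-identical labels, which fits the high-precision, low-recall outcome reported above.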

Using background knowledge
• Use properties of concepts
• Use other ontologies to discover relations between properties

Semantic match
[Diagram labels: OLVG problem list; DICE aspect taxonomies (Abnormality, Action, Body system, Location, Cause); DICE problem list. Edge labels: Lexical match; Given; Implicit matching: property match]

Semantic match
Taxonomy of body parts:
• Blood vessel is more general than Vein and Artery
• Artery is more general than Aorta
Lexical matches:
• “Aorta thoracalis dissection” has location Aorta
• “Dissection of artery” has location Artery
Reasoning implies a location match: “Dissection of artery” has the more general location

Example: “Heroin intoxication” – “drugs overdose”
Cause taxonomy:
• Drugs is more general than Heroine
Abnormality taxonomy:
• Intoxicatie is more general than Overdosis
Lexical matches:
• “Heroin intoxication”: cause Heroine, abnormality Intoxicatie
• “Drugs overdosis”: cause Drugs, abnormality Overdosis
Cause match: “Heroin intoxication” has the more specific cause
Abnormality match: “Heroin intoxication” has the more general abnormality
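The two examples can be replayed with a small sketch that compares problems aspect by aspect against the background taxonomies. The data structures and function names are assumptions made for illustration: each taxonomy is a child-to-parent dictionary (single inheritance only), and each problem is a dictionary from aspect to the concept found by lexical matching.

```python
def ancestors(concept, parent):
    """All concepts above `concept` in a child -> parent taxonomy."""
    seen = set()
    while concept in parent:
        concept = parent[concept]
        if concept in seen:  # guard against accidental cycles
            break
        seen.add(concept)
    return seen


def compare(a, b, parent):
    """Relation of concept `a` to concept `b` within one taxonomy."""
    if a == b:
        return "equal"
    if b in ancestors(a, parent):
        return "more specific"
    if a in ancestors(b, parent):
        return "more general"
    return "unrelated"


def compare_problems(p, q, taxonomies):
    """Per-aspect relation of problem `p` to problem `q`."""
    return {aspect: compare(p[aspect], q[aspect], taxonomies[aspect])
            for aspect in p.keys() & q.keys()}


taxonomies = {
    "location": {"Vein": "Blood vessel", "Artery": "Blood vessel",
                 "Aorta": "Artery"},
    "cause": {"Heroine": "Drugs"},
    "abnormality": {"Overdosis": "Intoxicatie"},
}
aorta_dissection = {"location": "Aorta"}
artery_dissection = {"location": "Artery"}
heroin_intoxication = {"cause": "Heroine", "abnormality": "Intoxicatie"}
drugs_overdosis = {"cause": "Drugs", "abnormality": "Overdosis"}
```

Comparing the heroin pair gives a more specific cause but a more general abnormality, so the two problems are related without either subsuming the other. How such mixed results should be turned into a single mapping is a design choice this sketch leaves open.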

Example results
• OLVG: Acute respiratory failure / DICE: Asthma cardiale
• OLVG: Aspergillus fumigatus / DICE: Aspergilloom
• OLVG: duodenum perforation / DICE: Gut perforation
• OLVG: HIV / DICE: AIDS
• OLVG: Aorta thoracalis dissectie type B / DICE: Dissection of artery
[Matched-aspect labels from the slide: abnormality cause abnormality, cause location, abnormality]

Ontology mapping using background knowledge
Case study 2
Work with Heiner Stuckenschmidt @ VU

Case Study:
1. Map GALEN & Tambis, using UMLS as background knowledge
2. Select three topics with sufficient overlap
    • Substances
    • Structures
    • Processes
3. Define some partial & ad-hoc manual mappings between individual concepts
4. Represent mappings in C-OWL
5. Use semantics of C-OWL to verify and complete mappings

Case Study: verification & derivation
[Diagram labels: UMLS (medical terminology); GALEN (medical ontology); Tambis (genetic ontology). Edge labels: lexical mapping; derived mapping; verification & derivation]
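Deriving a GALEN-Tambis mapping from two mappings into UMLS can be illustrated with a much-simplified composition rule. This is not C-OWL: it only handles equivalence ("=") and subsumption ("<" for "is more specific than", ">" for "is more general than"), whereas C-OWL bridge rules have a richer semantics. The concept names are borrowed from the slides, but the relations attached to them here are assumptions for the sake of the example.

```python
INVERSE = {"=": "=", "<": ">", ">": "<"}

# Composition of (a r1 b) and (b r2 c) into (a r c); pairs not listed are
# inconclusive, e.g. two concepts that are both below the same third one.
COMPOSE = {("=", "="): "=", ("=", "<"): "<", ("=", ">"): ">",
           ("<", "="): "<", (">", "="): ">",
           ("<", "<"): "<", (">", ">"): ">"}


def derive(galen_to_umls, tambis_to_umls):
    """Derive GALEN-Tambis relations from two mappings into UMLS.

    Each mapping is a list of (local concept, UMLS concept, relation).
    """
    derived = []
    for g, u1, r1 in galen_to_umls:
        for t, u2, r2 in tambis_to_umls:
            if u1 != u2:
                continue
            # (g r1 u) composed with (u inverse(r2) t)
            rel = COMPOSE.get((r1, INVERSE[r2]))
            if rel is not None:
                derived.append((g, rel, t))
    return derived


galen_to_umls = [("Galen:ChemicalSubstance", "UMLS:Chemicals", "=")]
tambis_to_umls = [("Tambis:Chemical", "UMLS:Chemicals", "="),
                  ("Tambis:enzyme", "UMLS:Chemicals", "<")]
derived = derive(galen_to_umls, tambis_to_umls)
```

Verification would work in the same spirit: a manually asserted mapping that contradicts a derived one points to an error in one of the inputs.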

Ad hoc mappings: Substances
[Diagram: UMLS and GALEN]
Notice: mappings high and low in the hierarchy, few in the middle

Ad hoc mappings: Substances
[Diagram: UMLS and Tambis]
Notice the different grain size: UMLS coarse, Tambis fine

Verification of mappings
[Diagram nodes: UMLS: Chemicals; Tambis: Chemical; UMLS: Chemicals_viewed_structurally; Tambis: enzyme; UMLS: Chemicals_viewed_functionally; UMLS: enzyme. Edge labels: =, ?, =]

Deriving new mappings
[Diagram nodes: UMLS: substance; UMLS: Chemicals; UMLS: Phenomenon_or_process; Galen: ChemicalSubstance; UMLS: OrganicChemical. Edge label: =]

“Conclusions”
• Ontology mapping is (still) hard & open
• Many different approaches will be required:
    • linguistic, structural
    • statistical
    • semantic
    • …
• Currently no roadmap theory on what's good for which problems

Challenges
• roadmap theory
• run-time matching
• “good-enough” matches
• large-scale evaluation methodology
• hybrid matchers (needs roadmap theory)

Ontology mapping: a way out of the medical tower of Babel??