Discovering Mappings between Schemas: The Different Approaches


Schema Matching: Different Approaches
(Découverte de mappings entre schémas : les différentes approches)
Khalid Saleem, LIRMM

Schema and Ontology
- Schemas come from the database community.
  - Schemas often do not provide explicit semantics for their data (ER, XML document schema).
- Ontologies come from the AI community.
  - Ontologies are logical systems that themselves obey some formal semantics.
  - They are designed to be interpreted by computers for reasoning (OWL).
- Schemas and ontologies are similar in the sense that:
  - Both provide a vocabulary of terms that describes a domain.
  - Both constrain the meaning of the terms used in the vocabulary (hierarchy, relations).
(Figure labels: XML Schema, RDF Schema, OWL)

Schema vs Ontology: examples
(Panels on the slide: XML and DAML+OIL. Surviving figure labels: branch is-part-of tree.)

    class-def animal
    % plants are a class that is disjoint from animals
    class-def plant
      subclass-of NOT animal
    % it is necessary but not sufficient for a tree to be a plant:
    class-def tree
      subclass-of plant
    % branches are PART OF trees
    class-def branch
      slot-constraint is-part-of
        has-value tree
    % it is necessary and sufficient for a carnivore to be an animal that eats only animals:
    class-def defined carnivore
      subclass-of animal
      slot-constraint eats
        value-type animal
    % herbivores eat only plants OR part of plants
    class-def defined herbivore
      subclass-of animal
      slot-constraint eats
        value-type plant OR (slot-constraint is-part-of has-value plant)

Match
- Takes two schemas/ontologies as input and produces a mapping between the elements of the two schemas that correspond semantically to each other.

Books Source A
  price    book-title            author-name
  26,60    Harry Potter          J. K. Rowling
  11,50    Marie Des Intrigues   Juliette Benzoni

Books Source B
  listed-price   title            a-fname   a-lname
  16,50          Nous Les Dieux   Bernard   Werber
  24             Pompei           Robert    Harris

1-1 match: price : listed-price, book-title : title
Complex match: author-name : (a-fname, a-lname)
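
One way to picture the output of Match is as a list of correspondences between element names. The sketch below is only an illustration under stated assumptions: the `Correspondence` class and `to_source_a` function are hypothetical names, the pairs are read off the book example on this slide, and treating the complex match as "first name, space, last name" is an assumption, since the slide does not define the combining expression.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Correspondence:
    """One mapping element: source element(s) matched to target element(s)."""
    source: Tuple[str, ...]  # element names in Books Source A
    target: Tuple[str, ...]  # element names in Books Source B

    @property
    def cardinality(self) -> str:
        left = "1" if len(self.source) == 1 else "n"
        right = "1" if len(self.target) == 1 else ("m" if left == "n" else "n")
        return f"{left}:{right}"


# Correspondences taken from the slide's book example.
mapping: List[Correspondence] = [
    Correspondence(("price",), ("listed-price",)),           # 1-1 match
    Correspondence(("book-title",), ("title",)),              # 1-1 match
    Correspondence(("author-name",), ("a-fname", "a-lname")),  # complex match
]


def to_source_a(record_b: Dict[str, str], mapping: List[Correspondence]) -> Dict[str, str]:
    """Rewrite a Source B record in Source A's vocabulary.

    Assumption: a complex match is resolved by joining the target values
    with a space. This sketch handles single-element sources only.
    """
    out = {}
    for c in mapping:
        (name,) = c.source
        out[name] = " ".join(record_b[t] for t in c.target)
    return out


record_b = {"listed-price": "16,50", "title": "Nous Les Dieux",
            "a-fname": "Bernard", "a-lname": "Werber"}
print(to_source_a(record_b, mapping))
```

The `cardinality` property makes the difference between the two kinds of match explicit: the first two correspondences are 1:1, the third is 1:n.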

Schema Matching vs Ontology Matching
- Schema matching is usually performed with techniques that try to guess the meaning encoded in the schemas.
- Ontology matching tries to exploit the knowledge explicitly encoded in the ontologies.
In real-world applications, solutions from both domains are mutually beneficial.

Application Domains
- Traditional (static)
  - Schema integration
  - Data warehousing
  - E-commerce, catalogue integration
- New frontiers (dynamic)
  - Semantic query processing
  - Agent communication
  - Web services integration
  - P2P databases

Basic Classification of Matchers [RB 01]
- Schema vs data instance
- Element vs structure
- Language vs constraint
  - String based: prefix, suffix, e.g. auth : author
  - Tokenization, lemmatization, elimination [GSY 04], e.g. Tool_Kit : (Tool, Kit); Kits : Kit; IsRelatedTo : Related
  - Data types, value domain, e.g. 1..12 : month
- Match cardinalities: 1:1, 1:n, n:m, e.g. (Tel Res, Other) : (Tel Day, Evening, Night)
- Auxiliary information
  - Global schema, dictionaries, thesauri, previous match decisions, user input
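
The string-based techniques listed on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any cited tool: the function names, the stop-word list used for elimination, and the naive plural stripping that stands in for real lemmatization are all assumptions made for the example.

```python
import re
from typing import List

# Hypothetical elimination list: tokens dropped as carrying no meaning.
STOP_WORDS = {"is", "to", "of", "the", "a"}


def affix_match(a: str, b: str) -> bool:
    """Prefix/suffix test: is the shorter label a prefix or suffix of the longer?"""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    return bool(short) and (long_.startswith(short) or long_.endswith(short))


def tokenize(label: str) -> List[str]:
    """Split a label on separators and camel-case boundaries."""
    tokens = []
    for part in re.split(r"[_\-.\s]+", label):
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [t.lower() for t in tokens]


def lemmatize(token: str) -> str:
    """Naive stand-in for lemmatization: strip a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(label: str) -> List[str]:
    """Tokenization, then lemmatization, then elimination of stop words."""
    return [t for t in map(lemmatize, tokenize(label)) if t not in STOP_WORDS]


print(affix_match("auth", "author"))
print(normalize("Tool_Kit"), normalize("Kits"), normalize("IsRelatedTo"))
```

Run on the slide's own examples, `normalize` reduces `Tool_Kit` to the tokens tool and kit, `Kits` to kit, and `IsRelatedTo` to related. A production matcher would replace the plural rule with a proper lemmatizer and a thesaurus lookup.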

Basic Classification of Matchers
- Structure-level techniques [SE 05]
  - Graph matching: children, leaves, relations
  - Taxonomy-based techniques, e.g. if the super-concepts are the same then the sub-concepts are the same, or vice versa
  - Model based: ER, XML or XML Schema, OWL, OO, etc.
- Combinational matchers [RB 01]
  - Hybrid matcher
  - Multiple/composite matcher
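
The "children" and "leaves" variants of graph matching can be illustrated with a set-overlap score. The sketch below is an assumption-laden example rather than the technique of any tool named in these slides: it uses Jaccard similarity, represents a tree as a dictionary from a node label to its child labels, assumes labels are unique within a tree, and compares labels by plain equality. The two small trees are invented for the example.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]  # node label -> child labels (labels unique per tree)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leaves(tree: Tree, node: str) -> Set[str]:
    """All leaf labels below (or equal to) the given node."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= leaves(tree, child)
    return found


def children_similarity(ta: Tree, a: str, tb: Tree, b: str) -> float:
    return jaccard(set(ta.get(a, [])), set(tb.get(b, [])))


def leaves_similarity(ta: Tree, a: str, tb: Tree, b: str) -> float:
    return jaccard(leaves(ta, a), leaves(tb, b))


# Two hypothetical schema trees whose inner labels differ.
tree_a: Tree = {"book": ["title", "author"], "author": ["name"]}
tree_b: Tree = {"own-books": ["title", "writer"], "writer": ["name"]}

print(children_similarity(tree_a, "book", tree_b, "own-books"))
print(leaves_similarity(tree_a, "book", tree_b, "own-books"))
```

Here the children of the two roots overlap only partly, because author and writer are different strings, while the leaf sets coincide. That contrast is why leaf-based scores are often combined with an element-level name matcher.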

Match Dimensions [SE 05]
To design a match algorithm we need to know how it will be used, i.e. its dimensions:
- Input of the algorithm
  - Data or schema; element level or structure level
- Characteristics of the matching process
  - Whether exact or approximate matching is required
  - Performance versus quality
- Output of the algorithm
  - The output is either a graded result in its own right, or one of a set of match algorithms whose results are combined into the final mapping

Existing Matching Tools
- Cupid [MBR 01]
- COMA (COMA++) [ADMR 05]
- Similarity Flooding
- SemInt
- Artemis
- DIKE
- TranScm
- AutoMed
- Charlie [TBBT 04]
Ontology specific
- NOM/QOM
- OLA
- Anchor-PROMPT
- S-Match [GSY 04]
- HICAL
- SKAT

Matching Tools (continued)
Machine learning
- GLUE (LSD, CGLUE) [DMDH 02]
- Automatch
These tools do not completely fulfil the requirements of large-scale schema matching because they:
- are not fully automated
- place little emphasis on search-space optimisation

Our Approach
Motivation:
- Large-scale scenario
- Peer-to-peer information systems over the XML Web
Our schema matching and integration approach:
- Tree mining techniques
  - Search sub-trees
- Name matcher
- Element-level matching
- Structure-level matching
(Figure: example schema trees drawn with one-letter node labels. Legend: a: author, b: book, d: detail, f: information, g: general, h: birth, i: isbn, n: name, o: own-books, p: publisher, r: price, t: title, w: writer. Matches shown: a=w, b=o, f=d.)

Tree Mining Approach
- Inspired by tree mining algorithms and data structures based on node scope values, calculated by a top-down, depth-first pre-order traversal [Z 02].
- Our work extends these data structures to the schema matching and integration process, in order to handle large sets of XML schema trees.
- Employs:
  a) An element-level name matcher (same node label or synonym), which clusters similar/synonym labels.
  b) The properties of node scope values, used to extract semantics from the structure. E.g. the node labelled name, n2 [2,2], is a descendant of the node labelled author, n1 [1,2], and not of the node labelled publisher, n3 [3,4]; this is verified with the descendant test.
- Descendant node check: if the scope of node x is [X, Y] and the scope of a descendant node xd is [Xd, Yd], then Xd > X and Yd <= Y.
(Figure: example schema tree with nodes n0 [0,5] book, n1 [1,2] author, n2 [2,2] name, n3 [3,4] publisher, n4 [4,4], n5 [5,5]; the other labels in the figure are name and title.)
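
The scope values and the descendant test can be reproduced with a short traversal. The following is a minimal sketch, assuming that a node's scope is [its own pre-order number, the pre-order number of its last descendant], which is consistent with the values on the slide. The `Node` class and function names are hypothetical, and labelling n4 as name and n5 as title is an assumption based on the figure residue.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    number: int = -1  # X: position in depth-first pre-order
    last: int = -1    # Y: number of the right-most descendant


def assign_scopes(root: Node) -> List[Node]:
    """Number the nodes in depth-first pre-order and set each scope [X, Y]."""
    order: List[Node] = []

    def visit(node: Node) -> None:
        node.number = len(order)
        order.append(node)
        for child in node.children:
            visit(child)
        node.last = len(order) - 1  # last number handed out in this subtree

    visit(root)
    return order


def is_descendant(scope_d: Tuple[int, int], scope_x: Tuple[int, int]) -> bool:
    """Descendant test from the slide: Xd > X and Yd <= Y."""
    (xd, yd), (x, y) = scope_d, scope_x
    return xd > x and yd <= y


# Example tree from the slide (labels of n4 and n5 are assumed).
book = Node("book", [
    Node("author", [Node("name")]),
    Node("publisher", [Node("name")]),
    Node("title"),
])
for n in assign_scopes(book):
    print(f"n{n.number} [{n.number},{n.last}] {n.label}")

print(is_descendant((2, 2), (1, 2)))  # name n2 under author n1
print(is_descendant((2, 2), (3, 4)))  # name n2 under publisher n3
```

The point of the encoding is that ancestry becomes two integer comparisons, so no tree walk is needed once the scopes are stored.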

Tree Mining Approach (continued)
- Data structures used
  - Label list: sorted list of all node labels in the forest of XML schema trees.
  - xGrid: matrix in which each row represents one participating XML tree and each column represents the corresponding node label. Each cell contains the scope values, the parent node number and the mapping information.
- Output
  - Creation of a mediated schema tree from the given forest of participating XML schema trees.
  - Generation of mapping information between the participating schema trees and the mediated schema tree.
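
The two data structures can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the nested-tuple input format and the function names are hypothetical, the mapping field is left empty because the slides do not say how it is filled at this stage, and each cell holds a list so that a label occurring more than once in a tree is not lost, a case the slide does not address.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: a tree is (label, [child trees]).
Tree = Tuple[str, list]
Entry = Tuple[str, int, int, Optional[int]]  # label, number, last, parent number


def scoped_nodes(tree: Tree) -> List[Entry]:
    """Pre-order list of (label, X, Y, parent), with scope [X, Y] per node."""
    rows: List[Optional[Entry]] = []

    def visit(node: Tree, parent: Optional[int]) -> None:
        label, children = node
        number = len(rows)
        rows.append(None)  # reserve the slot; Y is known only after the subtree
        for child in children:
            visit(child, number)
        rows[number] = (label, number, len(rows) - 1, parent)

    visit(tree, None)
    return rows  # every slot is filled once the traversal ends


def build_grid(forest: List[Tree]):
    """Return the sorted label list and a rows-by-labels grid of cells."""
    label_list = sorted({e[0] for tree in forest for e in scoped_nodes(tree)})
    column = {label: i for i, label in enumerate(label_list)}
    grid = [[[] for _ in label_list] for _ in forest]
    for row, tree in enumerate(forest):
        for label, number, last, parent in scoped_nodes(tree):
            grid[row][column[label]].append(
                {"scope": (number, last), "parent": parent, "mapping": None}
            )
    return label_list, grid


s1: Tree = ("book", [("author", [("name", [])]), ("title", [])])
s2: Tree = ("book", [("title", []), ("publisher", [("name", [])])])
label_list, grid = build_grid([s1, s2])
print(label_list)
print(grid[0][label_list.index("name")])
```

Because the label list is sorted and shared by all rows, the same column index identifies the same label in every tree, which is what lets a column number serve as a compact reference.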

Tree Mining Approach (continued)
(Figure: the mediated schema Sm and the participating schemas S1, S2, S3 and S4.)
The mapping information is the column number of the node.

Conclusion
- Element-level name and linguistic matching, supported by a thesaurus, is an integral part of every match system.
- As systems move towards schema/ontology-based manipulation, and global schemas or previous matching results are lacking, structure-level matching is equally important for making out the semantics.
- The peer-to-peer environment requires new methods that deliver both performance and mapping quality, i.e. the integration of tree mining techniques for matching and search-space optimisation.
- Machine learning algorithms can be beneficial in the P2P environment at later stages, once training examples have been created from instance data, provided the target domain remains the same.

References
[AH 04] Antoniou G. and Harmelen F. (2004) A Semantic Web Primer. The MIT Press, 2004.
[ADMR 05] Aumuller D., Do H. H., Massmann S. and Rahm E. (2005) Schema and ontology matching with COMA++. In Proceedings of the International Conference on Management of Data (SIGMOD), 2005.
[BR 04] Bellahsène Z. and Roantree M. (2004) Querying Distributed Data in a Superpeer based Architecture. DEXA 2004.
[BMP 04] Bernstein PA., Melnik S., Petropoulos M. and Quix C. (2004) Industrial-Strength Schema Matching. SIGMOD Record, Vol. 33, No. 4, December 2004.
[DMDH 02] Doan AH., Madhavan J., Domingos P. and Halevy A. (2002) Learning to Map Ontologies on the Semantic Web. WWW 2002.
[MBR 01] Madhavan J., Bernstein PA. and Rahm E. (2001) Generic Schema Matching with Cupid. VLDB 2001.
[RB 01] Rahm E. and Bernstein PA. (2001) A Survey of Approaches to Automatic Schema Matching. VLDB Journal 10(4): 334-350, 2001.
[SE 05] Shvaiko P. and Euzenat J. (2005) A Survey of Schema-based Matching Approaches. Journal on Data Semantics, 2005.
[TBBT 04] Tranier J., Baraer R., Bellahsène Z. and Teisseire M. (2004) Where's Charlie: Family Based Heuristics for Peer-to-Peer Schema Integration. IDEAS 2004, 227-235.
[Z 02] Zaki MJ. (2002) Efficiently Mining Frequent Trees in a Forest. 8th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, July 2002.
http://www.w3.org/TR/daml+oil-reference
http://www.doc.ic.ac.uk/automed/

Thank you