

  • Number of slides: 27

Integrating Multiple Data Sources using a Standardized XML Dictionary
Ramon Lawrence
University of Manitoba
umlawren@cs.umanitoba.ca
Supervisor: Dr. Ken Barker
TRLabs - Winnipeg

Outline
• Introduction, Motivation, and Background
• Integration architecture components
• Integration architecture
• Example integration
• Applications to the WWW
• Future work and conclusions
• Demonstration of Unity

Introduction
• Integration of data is required when accessing multiple databases within an organization or on the WWW.
• Our focus is automatically combining database schemas using schema integration.
• Schema integration requires knowledge of data semantics and use of metadata.

Motivation
• Organizations have several database systems which must interoperate.
• Users often access multiple Web databases whose knowledge must be integrated and presented in a useful form.
• Data warehouses and OLAP systems require data semantics to be understood and data to be cleansed and summarized.

Background
• Schema integration involves combining diverse database schemas into an integrated view by resolving conflicts.
• Schema conflicts include naming, structural, and semantic conflicts.
• Schema integration is required for database interoperability, but it is currently a manual process.

MDBS Architecture
[Diagram: global transactions enter the GTM, which sends subtransactions to one GTS per LDBS; each LDBS also receives local transactions.]
Global Transaction Manager (GTM)
  • processes global transactions
  • ensures information in all LDBSs is consistent
  • submits subtransactions to the GTSs for each LDBS
Global Transaction Servers (GTSs)
  • one for each LDBS
  • converts subtransactions from the GTM into a form usable by the LDBS and vice versa
Local Database Systems (LDBSs)
  • databases combined into the MDBS
  • remain unchanged and still process local transactions

Previous Work
• Research systems:
  - integrating systems by logical rules (Sheth)
  - defining global dictionaries (Castano)
  - Carnot Project using the Cyc knowledge base
• Industrial systems and standards:
  - Metadata Interchange Specification (MDIS)
  - XML, BizTalk, E-commerce portals

Architecture Objective
• The objective of our architecture is to provide a system for automatically integrating diverse relational schemas into a multidatabase.
• Desirable properties:
  - individual mappings - information sources integrated one-at-a-time and independently
  - global view constructed for query transparency
  - handles schema conflicts - including semantic, structural, and naming conflicts
  - automated global integration - global view constructed efficiently and automatically

The Idea
• The major idea is that schema conflicts can be resolved if we:
  - eliminate all naming conflicts
  - define a language capable of determining schema equivalence and performing transformations
• With these two properties, schema conflicts can be resolved automatically at the global level.

Architecture Components: The Global Dictionary
• A global dictionary (GD) provides standardized terms to capture data semantics.
  - Hierarchy of terms related by IS-A or Has-A links
  - Contains base set of common database concepts, but new concepts can be added
• A GD term is a single, unambiguous semantic definition.
  - Several GD entries for a single English word are required if the word has multiple definitions.
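The dictionary structure described on this slide can be sketched as a small data structure. This is a minimal illustration only: the class name `GlobalDictionary`, its methods, the sample terms and definitions, and the direction of the Claimant/Customer IS-A link are all assumptions made for the example, not the actual Unity dictionary.

```python
class GlobalDictionary:
    """Toy global dictionary: unambiguous terms joined by IS-A and Has-A links."""

    def __init__(self):
        self.definitions = {}  # term -> its single semantic definition
        self.is_a = {}         # term -> more general term (IS-A link)
        self.has_a = {}        # term -> list of component terms (Has-A links)

    def add_term(self, term, definition, is_a=None):
        # Each entry carries exactly one meaning, so duplicates are rejected.
        if term in self.definitions:
            raise ValueError(f"term already defined: {term}")
        if is_a is not None and is_a not in self.definitions:
            raise KeyError(f"unknown parent term: {is_a}")
        self.definitions[term] = definition
        if is_a is not None:
            self.is_a[term] = is_a

    def add_has_a(self, whole, part):
        for term in (whole, part):
            if term not in self.definitions:
                raise KeyError(f"unknown term: {term}")
        self.has_a.setdefault(whole, []).append(part)

    def ancestors(self, term):
        """Follow IS-A links upward, nearest ancestor first."""
        chain = []
        while term in self.is_a:
            term = self.is_a[term]
            chain.append(term)
        return chain

    def hierarchically_related(self, a, b):
        """True if the terms are equal or one is an IS-A ancestor of the other."""
        return a == b or a in self.ancestors(b) or b in self.ancestors(a)


gd = GlobalDictionary()
# Hypothetical entries; definitions are invented for illustration.
gd.add_term("Customer", "a party that does business with the organization")
gd.add_term("Claimant", "a customer who has filed a claim", is_a="Customer")
# One English word with two meanings needs two separate entries.
gd.add_term("Claim (insurance)", "a request for payment under a policy")
gd.add_term("Claim (assertion)", "a statement asserted to be true")
gd.add_has_a("Claim (insurance)", "Claimant")
```

Keeping each sense of a word as its own key is one simple way to honor the "single, unambiguous definition" rule; a real dictionary would likely use stable identifiers rather than display strings.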

Architecture Components: Using the Global Dictionary
• GD terms are used to build semantic names to describe the semantics of schema elements.
• Semantic names have the form:
  - semantic name = "[" CT [[; CT] | [, CT]] "]" CN
  - CT = context term, CN = concept name
  - each CT and CN is a single term from the GD
• Semantic names are included in specifications describing a data source.
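To make the semantic-name form concrete, here is a hedged sketch of a parser for it. The function name `parse_semantic_name` and the sample names are hypothetical. Two assumptions go beyond the grammar as written: any number of `;` or `,` separated context terms is accepted, and the concept name is treated as optional because the integrated view later in the deck shows bare names such as `[Claim]`. The slides do not say what distinguishes `;` from `,`, so the separators are returned uninterpreted.

```python
import re

# "[" context terms "]" followed by an optional concept name.
_SEMANTIC_NAME = re.compile(r"^\[(?P<context>[^\[\]]+)\]\s*(?P<concept>.*)$")


def parse_semantic_name(text):
    """Split a semantic name into (context_terms, separators, concept_name)."""
    match = _SEMANTIC_NAME.match(text.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    # The capturing group makes re.split keep each ';' or ',' in the result,
    # so terms land on even indexes and separators on odd indexes.
    parts = re.split(r"\s*([;,])\s*", match.group("context").strip())
    context_terms = parts[0::2]
    separators = parts[1::2]
    if any(not term for term in context_terms):
        raise ValueError(f"empty context term in: {text!r}")
    concept_name = match.group("concept").strip() or None
    return context_terms, separators, concept_name


# Hypothetical semantic names, for illustration only.
field_name = parse_semantic_name("[Claim;Payment] Amount")
table_name = parse_semantic_name("[Claim]")
```

Here `field_name` holds two context terms, one `;` separator, and the concept name `Amount`, while `table_name` has a single context term and no concept name.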

Architecture Components: X-Specs
• Database metadata and semantic names are combined into specifications called X-Specs, which:
  - are stored and transmitted using XML
  - contain information on a relational schema
  - are organized into database, table, and field levels
  - store semantic names to describe and integrate schema elements
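The slides do not show the X-Spec document format, so the following is only a sketch of what a three-level (database, table, field) XML specification might look like. The element and attribute names (`XSpec`, `Table`, `Field`, `semanticName`) and the semantic names assigned to each column are assumptions for illustration; the table and column names come from the ABC claims schema in the integration example.

```python
import xml.etree.ElementTree as ET


def build_xspec(database, tables):
    """Build a hypothetical X-Spec tree.

    tables maps table name -> (table semantic name, {field name: semantic name}).
    """
    root = ET.Element("XSpec", {"database": database})
    for table_name, (table_semantic, fields) in tables.items():
        table_el = ET.SubElement(
            root, "Table", {"name": table_name, "semanticName": table_semantic}
        )
        for field_name, field_semantic in fields.items():
            ET.SubElement(
                table_el, "Field", {"name": field_name, "semanticName": field_semantic}
            )
    return root


# Assumed semantic names for the ABC Company claims table.
abc_xspec = build_xspec(
    "ABC",
    {
        "Claims_tb": (
            "[Claim]",
            {
                "claim_id": "[Claim] Id",
                "claimant": "[Claim;Claimant] Name",
                "net_amount": "[Claim] Net amount",
                "paid_amount": "[Claim;Payment] Amount",
            },
        )
    },
)

# Serialize for storage or transmission, then read it back.
xml_text = ET.tostring(abc_xspec, encoding="unicode")
reloaded = ET.fromstring(xml_text)
```

Because the specification is plain XML, any site holding it can rebuild the same tree with a standard parser, which is the property the capture and integration phases rely on.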

Architecture Components: Integrating X-Specs
• Each database to be integrated is described using an X-Spec.
• Identical concepts in different databases are identified by similar semantic names.
• Concepts with identical (or hierarchically related) semantic names are combined regardless of their physical representation in the individual databases.

Integration Architecture
• Our integration architecture consists of two separate phases:
  - capture process: X-Specs are constructed for each data source independently
  - integration process: X-Specs are combined using the integration algorithm, which matches semantic names using the global dictionary

Integration Architecture: The Capture Process
• Capture process involves:
  - automatically extracting the schema information and metadata using a specification editor
  - assigning semantic names to each schema element (tables and fields) to capture their semantics

Integration Architecture: The Capture Process
[Diagram: a relational schema is automatically extracted into the specification editor; the DBA looks up terms in the global dictionary; the editor outputs an X-Spec.]

Integration Architecture: The Integration Process
• Integration process involves:
  - automatically identifying identical concepts by matching semantic names
  - constructing a global view of database concepts consisting of a hierarchy of concept terms
  - resolving structural differences during query generation and submission (e.g., a concept may be represented as a table in one database and a field (attribute) in another)
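The matching step can be sketched as grouping schema elements by semantic name. This is a deliberately simplified stand-in for the integration algorithm, which the slides do not spell out: it matches only identical names (no IS-A reasoning), and the function names, the input layout, and the semantic names assigned to each element are assumptions. Table and column names are borrowed from the claims example in this deck.

```python
import re
from collections import defaultdict


def normalize(semantic_name):
    """Case-fold and strip optional whitespace so equal names compare equal."""
    key = " ".join(semantic_name.split()).casefold()
    return re.sub(r"\s*([;,])\s*", r"\1", key)


def integrate(xspecs):
    """Group schema elements from every database under their semantic name.

    xspecs maps database -> list of (semantic name, table, field or None);
    field is None when the element is a whole table.
    """
    global_view = defaultdict(list)
    for database, elements in xspecs.items():
        for semantic_name, table, field in elements:
            global_view[normalize(semantic_name)].append((database, table, field))
    return dict(global_view)


# Assumed semantic names for a subset of the two claims schemas.
XSPECS = {
    "ABC": [
        ("[Claim]", "Claims_tb", None),
        ("[Claim] Id", "Claims_tb", "claim_id"),
        ("[Claim;Payment] Amount", "Claims_tb", "paid_amount"),
    ],
    "XYZ": [
        ("[Claim]", "T_claims", None),
        ("[Claim] Id", "T_claims", "id"),
        ("[Claim; Payment]", "T_payments", None),
        ("[Claim; Payment] Amount", "T_payments", "amount"),
    ],
}

view = integrate(XSPECS)
```

In `view`, the payment amount appears once even though ABC stores it as a column of the claims table and XYZ stores it in a separate payments table, since grouping is by semantic name rather than by physical structure.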

Integration Architecture: The Integration Process
[Diagram: multiple clients connect to the integration site, which uses the X-Specs and sends subtransactions to multiple RDBMSs.]

Integration Architecture Benefits
• The benefits of the two-phase architecture are:
  - Dynamic integration: schemas integrated as needed
  - X-Specs are constructed only once and independently of each other
  - Automatic conflict resolution by integrating based on semantic name rather than physical structure
  - Users are isolated from system names and organization by querying through a global view using semantic names for concepts

Integration Example
• Two claims databases to be integrated:
  - ABC Company: Claims_tb(claim_id, claimant, net_amount, paid_amount)
  - XYZ Company: T_claims(id, customer, claim_amt), T_payments(cid, pid, amount)
• First step is to construct X-Specs for each database.

Integration Example: ABC Database X-Spec
[Figure: X-Spec for the ABC database]

Integration Example: XYZ Database X-Spec
[Figure: X-Spec for the XYZ database]

Integration Example: Integrated View
• Global view after integration:
  - [Claim]
      - Id
      - Net amount
      - [Customer]
          - name
      - [Payment]
          - id
          - amount

Integration Example: Discussion
• Important points:
  - system and field names are not presented to the user, who queries based on semantic names
  - database structure is not shown to the user
  - different physical representations for the same concept are combined (e.g., the payment attribute in the ABC database with the payment table in the XYZ database)
  - hierarchically related concepts (customer vs. claimant) are combined based on their IS-A relationship in the global dictionary
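The first point, querying by semantic name while system names stay hidden, can be illustrated with a small query-generation sketch. It is not the Unity query mechanism: the `GLOBAL_VIEW` mapping, the semantic names, the pairing of `claim_amt` with net amount, and the function `generate_queries` are assumptions for the example. The sketch emits one `SELECT` per physical table and omits the join that XYZ would need between T_claims and T_payments.

```python
# Semantic name -> physical locations as (database, table, column).
# The mapping itself is assumed; table and column names come from the example.
GLOBAL_VIEW = {
    "[Claim] Id": [
        ("ABC", "Claims_tb", "claim_id"),
        ("XYZ", "T_claims", "id"),
    ],
    "[Claim] Net amount": [
        ("ABC", "Claims_tb", "net_amount"),
        ("XYZ", "T_claims", "claim_amt"),
    ],
    "[Claim;Payment] Amount": [
        ("ABC", "Claims_tb", "paid_amount"),
        ("XYZ", "T_payments", "amount"),
    ],
}


def generate_queries(requested, global_view):
    """Translate requested semantic names into one SELECT per physical table."""
    columns = {}  # (database, table) -> list of aliased column expressions
    for semantic_name in requested:
        for database, table, column in global_view[semantic_name]:
            # Alias each column with its semantic name so results from
            # different databases line up under the same heading.
            columns.setdefault((database, table), []).append(
                f'{column} AS "{semantic_name}"'
            )
    return {
        key: f"SELECT {', '.join(cols)} FROM {key[1]}"
        for key, cols in columns.items()
    }


queries = generate_queries(["[Claim] Id", "[Claim;Payment] Amount"], GLOBAL_VIEW)
```

The user supplies only the two semantic names; `queries` then contains a single statement against Claims_tb for ABC and separate statements against T_claims and T_payments for XYZ, reflecting the different physical layouts.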

Applications to the WWW
• Integrating diverse data sources is involved in constructing a data warehouse and other operational systems.
• The WWW is a diverse organization of databases which users access.
• Automatically integrating web data sources in a browser or portal reduces query complexity and the effort of integrating results for the user.

Conclusions
• Automatic integration of database schemas is possible by using a global dictionary of terms and constructing semantic names for schema elements.
• Integration of data sources has applications to the WWW and construction of data warehouses.

Future Work
• The integration architecture is evolving with XML standards and captures metadata information in XML documents.
• The system is being tested on sample problems, and a query mechanism is a work in progress.
• We are refining a prototype of the system called Unity.