Datalog and Data Integration
Zachary G. Ives, University of Pennsylvania
CIS 550 – Database & Information Systems
November 10, 2005
An Important Set of Questions
Reasoning about Queries and Views
Let’s Go Back a Few Weeks… Domain Relational Calculus
A Similar Logic-Based Language: Datalog
Datalog Terminology
Datalog in Action
Datalog is Relationally Complete
A Query We Can’t Answer in RA/TRC/DRC…
Recall our example of a binary relation for graphs or trees (similar to an XML Edge relation): edge(from, to)
If we want to know what nodes are reachable:
  reachable(F, T, 1) :- edge(F, T)                        (distance 1)
  reachable(F, T, 2) :- edge(F, X), edge(X, T)            (distance 2)
  reachable(F, T, 3) :- reachable(F, X, 2), edge(X, T)    (distance 3)
But how about all reachable paths? (Note this was easy in XPath over an XML representation -- //edge)
Recursive Datalog Queries
Our Query in RA + while (inflationary semantics, no negation)
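The "RA + while" idea can be sketched in Python as a naive bottom-up fixpoint. This is only an illustration: the two recursive rules in the docstring are the standard reachability rules and are assumed here, and the function name and sample edges are made up for the example.

```python
def reachable(edges):
    """Naive bottom-up evaluation of the (assumed) recursive rules
         reachable(F, T) :- edge(F, T)
         reachable(F, T) :- reachable(F, X), edge(X, T)
    """
    edges = set(edges)
    result = set(edges)  # base rule: every edge is a reachable pair
    while True:
        # Join reachable(F, X) with edge(X, T) and project onto (F, T).
        derived = {(f, t) for (f, x) in result for (x2, t) in edges if x == x2}
        new = derived - result
        if not new:
            return result  # fixpoint: nothing new was derived
        result |= new  # inflationary: tuples are only ever added


# Hypothetical sample graph: a -> b -> c -> d
sample_edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(reachable(sample_edges)))
```

The loop stops because each pass either adds at least one new pair or ends the loop, and the number of possible pairs is finite. Cycles in the graph are therefore harmless.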
A Special Type of Query: Conjunctive Queries
Example of Containment
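A minimal sketch of one standard way to test containment of conjunctive queries: search for a containment mapping from Q2 to Q1. The query encoding (head argument tuple plus a list of body atoms), the convention that capitalized strings are variables, and the two sample queries are all assumptions made for this illustration. They are not the example from the slide.

```python
def is_var(term):
    # Assumed convention: capitalized strings are variables, all else constants.
    return isinstance(term, str) and term[:1].isupper()


def extend(mapping, src_args, dst_args):
    """Extend mapping so src_args maps onto dst_args; return None on conflict."""
    if len(src_args) != len(dst_args):
        return None
    m = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if m.setdefault(s, d) != d:
                return None  # variable already mapped to a different term
        elif s != d:
            return None  # constants must map to themselves
    return m


def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping q2 -> q1 exists."""
    (head1, body1), (head2, body2) = q1, q2
    start = extend({}, head2, head1)  # the heads must match under the mapping
    if start is None:
        return False

    def search(i, mapping):
        if i == len(body2):
            return True  # every atom of q2 has been mapped into q1's body
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred:
                m = extend(mapping, args, args1)
                if m is not None and search(i + 1, m):
                    return True
        return False

    return search(0, start)


# Hypothetical queries over edge(from, to):
#   q1(F, T) :- edge(F, X), edge(X, T), edge(T, Y)
#   q2(F, T) :- edge(F, X), edge(X, T)
q1 = (("F", "T"), [("edge", ("F", "X")), ("edge", ("X", "T")), ("edge", ("T", "Y"))])
q2 = (("F", "T"), [("edge", ("F", "X")), ("edge", ("X", "T"))])
print(contained_in(q1, q2), contained_in(q2, q1))
```

Here q1 adds a condition to q2, so every answer to q1 is also an answer to q2, but the reverse does not hold.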
Wrapping Up Datalog…
A Problem
Building a Data Integration System
Typical Data Integration Components
Typical Data Integration Architecture
Challenges of Mapping Schemas
Different Aspects to Mapping
Let’s see one influential approach to schema matching…
The LSD (Learning Source Descriptions) System
Example
LSD’s Multi-Strategy Learning
Training the Learners
(Figure: Name Learner)
Applying the Learners
Putting It All Together: LSD System
(Figure: LSD system diagram. Training Phase: mediated schema, source schemas, data listings, training data for base learners L1, L2, ..., Lk. Matching Phase: mapping combination, constraint handler, domain constraints, user feedback.)
Mappings between Schemas
A Few Mapping Examples
  CustID   CustName
  1234     Smith, J.

  PennID   EmpName
  46732    John Smith
Two Important Approaches
TSIMMIS
XML vs. Object Exchange Model
Queries in TSIMMIS
Query Answering in TSIMMIS
A Wrapper Definition in MSL
GetBook’s results are unioned with others to form the view Mediator()
(Figure: tree pattern for book, with children title and author)
How to Answer the Query
Query Composition with Views
We find all views that define book with author and title, and we compose the query with each:

View:
  define function GetBook($x AS xsd:string) as book {
    for $b in sql("Amazon.DB", "select * from book where author='" + $x + "'")
    return {$b/title} {$x}
  }

Query:
  for $b in Mediator()/book
  where $b/title/text() = "DB2 UDB" and $b/author/text() = "Chamberlin"
  return $b

(Figure: tree pattern for book, with children author and title)
Matching View Output to Our Query’s Conditions
(Figure: tree pattern for book, with children title and author)
The Final Step: Unfolding
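As a rough illustration of what unfolding produces for the book query: the query's author condition supplies the input the GetBook view requires, and the title condition is then applied to the view's output. The Python below is a sketch only. `BOOKS`, `get_book`, and `answer_query` are hypothetical stand-ins, and the sample rows are invented for the example.

```python
# Hypothetical stand-in for the source's book table (made-up rows).
BOOKS = [
    {"title": "DB2 UDB", "author": "Chamberlin"},
    {"title": "Another Title", "author": "Chamberlin"},
    {"title": "DB2 UDB", "author": "Someone Else"},
]


def get_book(author):
    """Stand-in for the GetBook view: it needs the author as an input."""
    return [
        {"title": b["title"], "author": author}
        for b in BOOKS
        if b["author"] == author
    ]


def answer_query(title, author):
    """Unfolded query: the author condition binds the view's input,
    and the remaining title condition filters the view's output."""
    return [b for b in get_book(author) if b["title"] == title]


print(answer_query("DB2 UDB", "Chamberlin"))
```

Only the author predicate can be pushed into the source call in this sketch, since the author is the only parameter the view accepts. The title predicate has to be evaluated on the results.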
Virtues of TSIMMIS
Limitations of TSIMMIS’ Approach