DB Lunch @ Berkeley, 10.28.05
Semantic Interoperability in Large Scale Heterogeneous Networks
Philippe Cudré-Mauroux, EPFL
Joint work with: Karl Aberer (advisor @ EPFL), Manfred Hauswirth (Semantic Gossiping), T. van Pelt, L. Zhou & A. Feher (Implementation)
The National Centres of Competence in Research are managed by the Swiss National Science Foundation on behalf of the Federal Authorities

Overview
1. Motivation
  • Picture Sharing in Decentralized Settings
2. Decentralized Data Integration
  1. Peer Data Management Systems
  2. Probabilistic Message-passing
  3. Aspects of self-organization
  4. Studying semantic interoperability in the large
3. Applications
  1. GridVine
  2. PicShark
4. Conclusions

1. Motivation: Picture Sharing
• Profusion of Digital Images
  – Variety of powerful devices
  – Gigabytes of pictures is the new norm
• Most of the images are kept local
• Some are shared
  – Mostly point-to-point
  – Primitive search capabilities
(Figure labels: MMS, SMTP, HTTP)

Opportunity
• More and more software uses metadata to organize images locally
  (Sample XMP packet: <?xpacket … 2001-12-19T18:49:03Z … 2001-12-19T20:09:28Z … John Doe …)
  – (Semi) Structured metadata (e.g., XML, PSA)
  – Ontological metadata (e.g., RDF, XMP)
  – Type-based metadata (e.g., WinFS)

Hurdle: Metadata Heterogeneity
• Why not take advantage of those metadata in a distributed setting?
X Syntactic discrepancies
  ImageGUID   cDate
  A0657B25    05.08.04
  109E7A25    05.08.04
  VS 05/08/2004
X Semantic heterogeneity
  • All the aforementioned standards are extensible
  • Shared representation is not enough
  Width VS Length-Y
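
The syntactic discrepancy on this slide (05.08.04 versus 05/08/2004) can be illustrated with a minimal Python sketch. The function name `normalize_date`, the list of candidate formats, and the day-first reading of the dates are assumptions made for the example; the slides do not say which field is the day.

```python
from datetime import date, datetime

# Candidate formats are an assumption for illustration; day-first order is
# assumed because the slide does not say which field is the day.
CANDIDATE_FORMATS = ("%d.%m.%y", "%d/%m/%Y", "%Y-%m-%dT%H:%M:%SZ")


def normalize_date(raw: str) -> date:
    """Try each known syntax in turn and return a calendar date."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date syntax: {raw!r}")
```

A purely syntactic fix of this kind does nothing for the second hurdle: no format string can tell whether Width and Length-Y denote the same concept.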

Beyond Keyword Search
⇒ searching semantically richer objects in large scale heterogeneous networks
(Figure: a query "date?" meets heterogeneous values: 2001-12-19T18:49:03Z, 2001-12-19T20:09:28Z, 05/08/2004, Jan 1, 2005, ???)

2. Decentralized Semantics
• Traditional database techniques (e.g., LAV/GAV) rely on centralized schemas to integrate data sources
  (Figure: global attribute Date with mappings m(Date) = myDate and m(Date) = yourDate)
• Not applicable to our context
  – Scale (upper ontologies?)
  – Churn
  – Autonomy
• How can we foster semantic interoperability in decentralized settings?

Semantic Interoperability
Photoshop (own schema): 178A8CD8865, Robinson, Tunbridge Wells Royal Council
Q1 = <GUID>$p/GUID</GUID> FOR $p IN /Photoshop_Image WHERE $p/Creator LIKE "%Robi%"
WinFS (known schema): 178A8CD8866, Henry Peach Robinson, Photographer, Tunbridge Council
T12 = $fs/GUID $fs/Author/DisplayName … FOR $fs IN /WinFSImage
Q2 = <GUID>$p/GUID</GUID> FOR $p IN T12 WHERE $p/Creator LIKE "%Robi%"
Extending semantic interoperability techniques to decentralized settings

2.1 Peer Data Management Systems
(Figure: a query "date?" travels between peers through pairwise mappings over attributes such as es:cDate and xap:CreateDate; sample values: <es:cDate>05/08/2004</es:cDate>, 2001-12-19T18:49:03Z, 2001-12-19T20:09:28Z, Jan 1, 2005)
• Local pairwise mappings
  – Peer Data Management Systems (PDMS)
• Pairwise mappings overcome global schema heterogeneity
  – Transitive closures on mapping operations
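
The idea of taking transitive closures of pairwise mappings can be sketched as follows. This is a simplified illustration, assuming mappings are plain attribute-to-attribute dictionaries; `compose`, `m_ab`, `m_bc`, and the attribute names `es:author`, `xap:Creator` and `my:Date` are hypothetical and not taken from the slides.

```python
from typing import Dict, List, Optional

Mapping = Dict[str, str]  # source attribute -> target attribute


def compose(path: List[Mapping], attribute: str) -> Optional[str]:
    """Translate one attribute through a chain of pairwise mappings.

    Returns None as soon as a mapping on the path has no image for the
    attribute, i.e. the attribute is lost along that path.
    """
    current = attribute
    for mapping in path:
        if current not in mapping:
            return None
        current = mapping[current]
    return current


# Illustrative mappings between three peers (attribute names partly hypothetical).
m_ab: Mapping = {"es:cDate": "xap:CreateDate", "es:author": "xap:Creator"}
m_bc: Mapping = {"xap:CreateDate": "my:Date"}
```

With these tables, a date attribute survives the two-hop path while the author attribute is dropped at the second hop, which is the kind of loss the later slides on Semantic Gossiping measure.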

Problem: Precision/Recall Tradeoff
• Semantic query routing
  – To whom shall I forward a query posed against my local schema?
• Some (most) mappings will be (partially) faulty
  – Low expressive power of mappings
  – Automatic schema alignment techniques
  – Granularity of conceptualizations…
• Local query resolution
  – Low recall
• Flooding (PDMS)
  – Low precision
• Standard deductive integration is not sufficient
  – Uncertainty on mappings and conceptualizations
⇒ abductive reasoning (on transitive closures of mappings)

2.2 Probabilistic Message Passing
Link-based analysis of the PDMS:
  – Mapping Cycles
  – Parallel Paths
Semantics as global agreement: q VS m3(m4(m0(q)))
(Figure: mapping graph with mappings m0, m1, m2, m3, m4, m5)

Computing a Marginal for one cycle
(Figure legend: unknown variables, observed variables)
• $P(m_0, m_1, m_2, m_3, f_0) = P(m_0)\,P(m_1)\,P(m_2)\,P(m_3)\,P(f_0 \mid m_0, m_1, m_2, m_3)$
• $P(m_0 \mid f_0) = \sum_{m_1, m_2, m_3} P(m_0, m_1, m_2, m_3, f_0)\,P(f_0)^{-1}$
• But: feedbacks on different cycles are correlated
  – Need to express a global probabilistic model for the mapping graph
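
The marginal for a single cycle can be computed by brute-force enumeration, as in the sketch below. The slide does not spell out $P(f_0 \mid m_0, \dots)$, so the feedback model used here is an assumption for illustration: positive feedback is certain when all mappings are correct, impossible when exactly one is wrong, and occurs with probability `delta` when two or more errors compensate each other. The function names are hypothetical.

```python
from itertools import product


def feedback_likelihood(states, delta):
    """P(f0 = positive | mapping states), under the assumed feedback model."""
    wrong = sum(1 for ok in states if not ok)
    if wrong == 0:
        return 1.0   # all mappings correct: the cycle must agree
    if wrong == 1:
        return 0.0   # a single error cannot be compensated
    return delta     # several errors cancel out with probability delta


def posterior_m0_correct(priors, delta):
    """P(m0 = correct | f0 = positive), summing out the other mappings."""
    joint_m0_correct = 0.0
    evidence = 0.0
    for states in product((True, False), repeat=len(priors)):
        p = feedback_likelihood(states, delta)
        for ok, prior in zip(states, priors):
            p *= prior if ok else 1.0 - prior
        evidence += p                 # contributes to P(f0)
        if states[0]:
            joint_m0_correct += p     # contributes to P(m0 = correct, f0)
    return joint_m0_correct / evidence
```

For a cycle of four mappings with prior 0.7 each and delta 0.1, `posterior_m0_correct([0.7] * 4, 0.1)` evaluates to roughly 0.93: one positive feedback raises the belief in m0 well above its prior. The enumeration is exponential in the cycle length, which is acceptable only for a single short cycle.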

A Brief Intro to Factor-Graphs
• $g(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)$
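
A small numeric sketch may help show why this factorization matters: the (unnormalized) marginal of x2 can be obtained either by summing g over all other variables, or by multiplying one message per factor. The binary domain and the factor values below are arbitrary assumptions chosen for the example.

```python
from itertools import product

VALUES = (0, 1)  # binary variables, an assumption for the example


def f_a(x1, x2):
    # Arbitrary illustrative factor values.
    return 0.9 if x1 == x2 else 0.1


def f_b(x2, x3, x4):
    # Arbitrary illustrative factor values.
    return 1.0 + x2 + 2 * x3 * x4


def marginal_x2_brute_force():
    """Sum g = f_a * f_b over x1, x3, x4 for each value of x2."""
    totals = {v: 0.0 for v in VALUES}
    for x1, x2, x3, x4 in product(VALUES, repeat=4):
        totals[x2] += f_a(x1, x2) * f_b(x2, x3, x4)
    return totals


def marginal_x2_sum_product():
    """Multiply the two messages arriving at x2, one from each factor."""
    totals = {}
    for x2 in VALUES:
        msg_from_a = sum(f_a(x1, x2) for x1 in VALUES)
        msg_from_b = sum(f_b(x2, x3, x4) for x3, x4 in product(VALUES, repeat=2))
        totals[x2] = msg_from_a * msg_from_b
    return totals
```

Both functions return the same values up to floating-point rounding; the second one never enumerates the full joint, which is what makes message passing attractive on larger graphs.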

Deriving PDMS Factor-Graphs

PDMS Factor-Graphs
• Cyclic graph
  – Junction Tree? Clustering / Stretching of variables?
    • Not applicable (decentralization)
  – Iterative Sum-Product
    • Approximate results
• How to perform iterative sum-product by message passing on the mapping graph?
  – Message passing in the factor graph does not correspond to the connectivity of the mapping graph
  – We want to rely on decentralized computations only
• Locality VS Globality of nodes in the factor graph
  – Mappings: local
  – Feedback factor: common, global knowledge
  – Observed feedback variables: neighborhood

Embedded Message-Passing (1)

Embedded Message-Passing (2)

Sending Messages in the Mapping Graph
• Message-Passing Schedules
  – Periodic
  – Lazy (piggybacking on query forwarding)
    • No message overhead

Implemented System
• Schemas
  – Import from OWL (Web Ontology Language)
• Mappings
  – KnowledgeWeb Ontology Alignment API
  – Import from RDF/XML
  – Automated on-the-fly creation
  – Comparison to standard alignments
⇒ Automatic derivation of quality measures P(m=correct | {F}) for the mappings using iterative message-passing
⇒ Per-Hop Forwarding Behaviors (Semantic Gossiping)

Some (Preliminary) Results: Convergence (undirected example graph, prior 0.7, delta 0.1)

Impact Of Cycle Length (simple cycle, prior 0.5)

Fault-tolerance (faulty links) (undirected example graph, prior 0.8, delta 0.1)

Preliminary Results: EON (Alignment contest)
• Worst-case scenario: no prior knowledge
• Set of 6 schemas on bibliographic data (approx. 30-40 attributes)
• 396 generated attribute mappings (84 incorrect)

2.3 Semantic Gossiping
• Selectively reformulate queries through mapping links
  – Semantic distances
    • Cycles analysis
    • Results analysis
  – Syntactic distance
    • Lost predicates
(Figure: reformulations of one query across schemas: πTitre Auteur=Joe (R1), πTitle Author=Joe (R2), πTitle Creator=Joe (R3), πTitle Creator=Joe (R4) / Author=Joe (R4), πTitle Creature=Joe (R5); two links are marked X)

Self-Organization
• Two types of self-organization
  – Static network
    • Self-organizing dissemination of queries
  – Dynamic network
    • Self-organizing network of mappings
• Idea:
  – Quality evaluation of mappings through Semantic Gossiping
  – Drop low quality links
  – Reorganized network leads to different quality evaluation
  – Dynamic network changes: self-organizing, self-referential semantic network

Some Results (1): Sensitivity to TTL (cycle analysis only, 25 schemas, 4 concepts)

Some Results (2): Scalability (results analysis only, 4 concepts, TTL=3, misclassification rate=0.1, 2 documents/peer on avg.)

2.4 Semantic Interoperability in the Large
• Do we have enough (good) mappings?
• Modeling semantic interoperability: Schema-to-Schema Graph
  – Logical model
  – Directed
  – Weighted
  – Redundant
• The semantic connectivity graph
  – Idea: as for physical network analyses, define a connectivity layer
  – Unweighted, non-redundant version of the Schema-to-Schema graph
  – Observation:
    • Peers in a set $P_s$ are semantically interoperable iff $S_s$ is strongly connected, with $S_s \equiv \{ s \mid \exists p \in P_s,\ p \leftrightarrow s \}$
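
The observation on this slide reduces to a strong-connectivity test on a directed graph of schemas. The sketch below is one simple way to run that test, assuming the graph is given as a dictionary from each schema to the set of schemas it has a mapping to; the representation and all function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]  # schema -> schemas reachable through one mapping


def _reachable(graph: Graph, start: str) -> Set[str]:
    """Depth-first search returning every node reachable from start."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def _reverse(graph: Graph) -> Graph:
    """Return the same graph with every edge direction flipped."""
    rev: Graph = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev


def is_strongly_connected(graph: Graph, schemas: Iterable[str]) -> bool:
    """True if the subgraph induced by `schemas` is strongly connected."""
    wanted = set(schemas)
    if not wanted:
        return True
    # Keep only the edges whose two endpoints belong to the schema set.
    induced = {s: graph.get(s, set()) & wanted for s in wanted}
    root = next(iter(wanted))
    forward = _reachable(induced, root)
    backward = _reachable(_reverse(induced), root)
    return forward == wanted and backward == wanted
```

A set is strongly connected exactly when one root reaches every schema and is reached by every schema, so two traversals suffice. The test needs a global view of the graph, which is why the next slide turns to statistical conditions instead.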

Analyzing Semantic Interoperability in the Large
• Analyzing semantic interoperability in large-scale, decentralized networks
  – Percolation theory for directed graphs
  – Based on recent graph-theoretic frameworks
  – Random graphs with specific degree distributions $p_{jk}$, clustering coefficients cc and bidirectionality coefficient bc
• Necessary condition for semantic interoperability in the large: $\sum_{j,k} (jk - j(bc + cc) - k)\, p_{jk} \ge 0$
• Excellent approximations of the size of semantically interoperable clusters in the graph
• Analysis: Sequence Retrieval System
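
The necessary condition above is a single weighted sum, so it is easy to evaluate once the statistics are known. The sketch below assumes the degree distribution is given as a dictionary keyed by (j, k) pairs, read here as in-degree and out-degree (an assumption, since the slide does not define them); the function name is hypothetical.

```python
from typing import Dict, Tuple


def connectivity_indicator(
    p_jk: Dict[Tuple[int, int], float], bc: float, cc: float
) -> float:
    """Evaluate the sum over (j, k) of (j*k - j*(bc + cc) - k) * p_jk."""
    return sum((j * k - j * (bc + cc) - k) * p for (j, k), p in p_jk.items())


def may_be_interoperable(
    p_jk: Dict[Tuple[int, int], float], bc: float, cc: float
) -> bool:
    """Necessary (not sufficient) condition: the indicator is non-negative."""
    return connectivity_indicator(p_jk, bc, cc) >= 0
```

For instance, a network in which every schema has degree pair (2, 2) with bc = cc = 0.1 gives an indicator of about 1.6 and passes the test, whereas a network split evenly between (0, 1) and (1, 0) schemas gives a negative value and fails it.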

3. Applications
1. GridVine
  • Self-organizing semantic overlay network
2. PicShark
  • Self-organizing middleware to export pictures and create mappings

3.1 GridVine
• Building large-scale semantic systems
  – Self-organizing semantic overlay network

(Figure: layered architecture, top to bottom: Semantic Mediation Layer, Overlay Layer, “Physical” layer; label: Correlated / Uncorrelated)

Features
• Based on the P-Grid P2P structure
  – Distributed Hash Table developed at EPFL
  – Self-organized, scalable, decentralized
  – Resolves key-based searches in O(log(n)) even for unbalanced trees
• Semantic Web compliant
  – RDF triples, RDFS schemas, OWL mappings
• Structured searches
  – RDQL queries
• Semantic Gossiping
  – Fosters semantic interoperability

GridVine: Annotating Content

Decentralized Query Resolution: Overview

3.2 PicShark
• Where do the translation links come from?
• Middleware for sharing semi-structured metadata attached to pictures and creating translation links
(Figure: PicShark components: Features Extractor (60 moments), Metadata Extractor (PSP, XMP, WinFS), Information Tracker; Insert and Retrieve operations against a (Distributed) Hashtable, e.g., GridVine)

Features
• Self-Organization of mappings
  – Based on low-level features extracted from
    • Picture (color moment, textures)
    • Structured Metadata (lexicographical analysis)
• Self-Organization of annotations
  – Probabilistic propagation of annotations between similar individuals
• Self-Organization of query propagation
  – Schema distance based on probabilistic subsumption
  – Propagation within a certain diameter
⇒ Driven by user interaction
⇒ Scalable
  • Computationally expensive operations are local at the peers
  • Only simple in-network operations (look-ups)
• (On-going) collaborative effort with Microsoft Research Asia

PicShark Prototype

4. Conclusions
• Fundamental issue: Interoperability in large scale (semi) structured environments
  – Content Sharing
  – Information search
  – Semantic Web?
• Traditional techniques are not sufficient
  – Scale
  – Autonomy
  – Uncertainty
⇒ Self-organizing, decentralized stochastic processes
  ⇒ Data Indexation
  ⇒ Data Integration
  ⇒ Query dissemination

Some References (1)
Semantic Gossiping
• A Framework for Semantic Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. SIGMOD Record, 31(4), December 2002.
• The Chatty Web: Emergent Semantics through Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. International World Wide Web Conference (WWW 03).
• Probabilistic Message-Passing in Peer-Data Management Systems. Philippe Cudré-Mauroux, Karl Aberer and Andras Feher. International Conference on Data Engineering (ICDE 06).
Self-Organizing Semantics
• Start making sense: The Chatty Web approach for global semantic agreements. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. Journal of Web Semantics, 1(1), December 2003.
• Emergent Semantics Principles and Issues. Karl Aberer, Philippe Cudré-Mauroux and Aris M. Ouksel (editors), Tiziana Catarci, Mohand-Said Hacid, Arantza Illarramendi, Vipul Kashyap, Massimo Mecella, Eduardo Mena, Erich J. Neuhold, Olga De Troyer, Thomas Risse, Monica Scannapieco, Fèlix Saltor, Luca de Santis, Stefano Spaccapietra, Steffen Staab and Rudi Studer. International Conference on Database Systems for Advanced Applications (DASFAA 04).

Some References (2)
Semantic Interoperability In the Large
• A Necessary Condition For Semantic Interoperability In The Large. Philippe Cudré-Mauroux and Karl Aberer. International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 04).
• Analyzing Semantic Interoperability in Bioinformatic Database Networks. Philippe Cudré-Mauroux, Julien Gaugaz, Adriana Budura and Karl Aberer. Semantic Network Analysis (SNA 05).
• GridVine: Building Internet-Scale Semantic Overlay Networks. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth and Tim van Pelt. International Semantic Web Conference (ISWC 04).
• Semantic Overlay Networks (tutorial). Karl Aberer and Philippe Cudré-Mauroux. International Conference on Very Large Data Bases (VLDB 05).
… more references at http://lsirpeople.epfl.ch/pcudre/

Questions?