- Number of slides: 42
DB Lunch @ Berkeley, 10.28.05
Semantic Interoperability in Large Scale Heterogeneous Networks
Philippe Cudré-Mauroux, EPFL
Joint work with: Karl Aberer (advisor @ EPFL), Manfred Hauswirth (Semantic Gossiping), T. van Pelt, L. Zhou & A. Feher (Implementation)
The National Centres of Competence in Research are managed by the Swiss National Science Foundation on behalf of the Federal Authorities
Overview
1. Motivation
  • Picture Sharing in Decentralized Settings
2. Decentralized Data Integration
  1. Peer Data Management Systems
  2. Probabilistic Message-passing
  3. Aspects of self-organization
  4. Studying semantic interoperability in the large
3. Applications
  1. GridVine
  2. PicShark
4. Conclusions
1. Motivation: Picture Sharing
• Profusion of Digital Images
  – Variety of powerful devices
  – Gigabytes of pictures is the new norm
• Most of the images are kept local
• Some are shared
  – Mostly point-to-point
  – Primitive search capabilities
(Figure labels: MMS, SMTP, HTTP)
Opportunity
• More and more software uses metadata to organize images locally
  <?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
  <x:xapmeta xmlns:x='adobe:ns:meta/'>
  <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <rdf:Description about='' xmlns:xap='http://ns.adobe.com/xap/1.0/'>
  <xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
  <xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
  <xap:Creator>John Doe</xap:Creator>
  </rdf:Description>
  …
  – (Semi) Structured metadata (e.g., XML, PSA)
  – Ontological metadata (e.g., RDF, XMP)
  – Type-based metadata (e.g., WinFS)
Hurdle: Metadata Heterogeneity
• Why not take advantage of those metadata in a distributed setting?
X Syntactic discrepancies
  ImageGUID | cDate
  A0657B25  | 05.08.04
  109E7A25  | 05.08.04
  VS
  <es:cDate>05/08/2004</es:cDate>
X Semantic heterogeneity
  • All the aforementioned standards are extensible
  • Shared representation is not enough
  <rdf:Property rdf:ID="width">
    <rdfs:label>Width</rdfs:label>
    <rdfs:subPropertyOf rdf:resource="#length"/>
  </rdf:Property>
  VS
  <rdf:Property rdf:ID="Length-Y">
    <rdfs:label>Length-Y</rdfs:label>
    <rdfs:subPropertyOf rdf:resource="#length"/>
  </rdf:Property>
Beyond Keyword Search
⇒ Searching semantically richer objects in large scale heterogeneous networks
(Figure: a query for "date?" facing three representations)
  <xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
  <xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
  <es:DofCreation>05/08/2004</es:DofCreation>
  <myRDF:Date>Jan 1, 2005</myRDF:Date>
2. Decentralized Semantics
• Traditional database techniques (e.g., LAV/GAV) rely on centralized schemas to integrate data sources
  (Figure: a central attribute Date with mappings m(Date) = myDate and m(Date) = yourDate)
• Not applicable to our context
  – Scale (upper ontologies?)
  – Churn
  – Autonomy
• How can we foster semantic interoperability in decentralized settings?
Semantic Interoperability
Photoshop (own schema):
  <Photoshop_Image>
    <GUID>178A8CD8865</GUID>
    <Creator>Robinson</Creator>
    <Subject>
      <Bag>
        <Item>Tunbridge Wells</Item>
        <Item>Royal Council</Item>
      </Bag>
    </Subject>
    …
  </Photoshop_Image>
WinFS (known schema):
  <WinFSImage>
    <GUID>178A8CD8866</GUID>
    <Author>
      <DisplayName>Henry Peach Robinson</DisplayName>
      <Role>Photographer</Role>
    </Author>
    <Keyword>Tunbridge</Keyword>
    <Keyword>Council</Keyword>
    …
  </WinFSImage>
Q1 = <GUID>$p/GUID</GUID> FOR $p IN /Photoshop_Image WHERE $p/Creator LIKE "%Robi%"
T12 = <Photoshop_Image> <GUID>$fs/GUID</GUID> <Creator>$fs/Author/DisplayName</Creator> </Photoshop_Image> FOR $fs IN /WinFSImage
Q2 = <GUID>$p/GUID</GUID> FOR $p IN T12 WHERE $p/Creator LIKE "%Robi%"
Extending semantic interoperability techniques to decentralized settings
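The reformulation of Q1 into Q2 through the mapping T12 can be illustrated with a minimal sketch. It assumes the two sources are plain in-memory dictionaries; the function names `t12` and `q` and the list variables are hypothetical stand-ins for the slide's view and queries, not part of any real system.

```python
# Hypothetical in-memory stand-ins for the two sources shown on the slide.
photoshop_images = [{"GUID": "178A8CD8865", "Creator": "Robinson"}]
winfs_images = [
    {"GUID": "178A8CD8866",
     "Author": {"DisplayName": "Henry Peach Robinson", "Role": "Photographer"}},
]

def t12(winfs_image):
    """Mapping T12: present a WinFS image under the Photoshop schema."""
    return {"GUID": winfs_image["GUID"],
            "Creator": winfs_image["Author"]["DisplayName"]}

def q(images, pattern):
    """Q1/Q2: GUIDs of the images whose Creator contains `pattern`."""
    return [img["GUID"] for img in images if pattern in img["Creator"]]

# Q1: the query posed against the local (Photoshop) schema.
local_hits = q(photoshop_images, "Robi")
# Q2: the same query posed against the view T12 over the WinFS source.
remote_hits = q([t12(img) for img in winfs_images], "Robi")
print(local_hits, remote_hits)
```

The query text never changes; only the data it ranges over is replaced by the mapped view, which is the property the decentralized techniques build on.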
2.1 Peer Data Management Systems
(Figure: a query for "date?" propagated over a graph of schemas connected by pairwise mappings between es:cDate, xap:CreateDate, xap:ModifyDate and myRDF:Date)
  <es:cDate>05/08/2004</es:cDate>
  <xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
  <xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
  <myRDF:Date>Jan 1, 2005</myRDF:Date>
• Local pairwise mappings
  – Peer Data Management Systems (PDMS)
• Pairwise mappings overcome global schema heterogeneity
  – Transitive closures on mapping operations
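The transitive-closure idea can be sketched as follows. This is a minimal illustration, assuming each pairwise mapping is a plain attribute-renaming dictionary; the `MAPPINGS` table and the `reachable_translations` helper are hypothetical and not taken from any PDMS implementation.

```python
from collections import deque

# Hypothetical pairwise mappings: (source schema, target schema) -> renaming.
MAPPINGS = {
    ("xap", "es"): {"CreateDate": "cDate"},
    ("es", "myRDF"): {"cDate": "Date"},
}

def reachable_translations(schema, attribute, mappings):
    """Follow pairwise mappings transitively (breadth-first) and return the
    name of `attribute` in every schema reachable from `schema`."""
    found = {schema: attribute}
    queue = deque([schema])
    while queue:
        current = queue.popleft()
        for (src, dst), renaming in mappings.items():
            if src != current or dst in found:
                continue
            translated = renaming.get(found[current])
            if translated is not None:  # the attribute survives this mapping
                found[dst] = translated
                queue.append(dst)
    return found

translations = reachable_translations("xap", "CreateDate", MAPPINGS)
print(translations)
```

No peer needs a global schema here: xap reaches myRDF only by composing the two local mappings.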
Problem: Precision/Recall Tradeoff
• Semantic Query routing
  – To whom shall I forward a query posed against my local schema?
• Some (most) mappings will be (partially) faulty
  – Low expressive power of mappings
  – Automatic schema alignment techniques
  – Granularity of conceptualizations…
• Local query resolution
  – Low recall
• Flooding (PDMS)
  – Low precision
• Standard deductive integration is not sufficient
  – Uncertainty on mappings and conceptualizations
  ⇒ Abductive reasoning (on transitive closures of mappings)
2.2. Probabilistic Message Passing
Link-based analysis of the PDMS:
– Mapping Cycles
– Parallel Paths
Semantics as global agreement: q VS m3(m4(m0(q)))
(Figure: mapping graph with mappings m0, m1, m2, m3, m4, m5)
Computing a Marginal for one cycle
(Figure labels: unknown, observed)
• $P(m_0, m_1, m_2, m_3, f_0) = P(m_0)\,P(m_1)\,P(m_2)\,P(m_3)\,P(f_0 \mid m_0, m_1, m_2, m_3)$
• $P(m_0 \mid f_0) = \sum_{m_1, m_2, m_3} P(m_0, m_1, m_2, m_3, f_0)\; P(f_0)^{-1}$
• But: feedbacks on different cycles are correlated
  – Need to express a global probabilistic model for the mapping graph
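The marginal for a single cycle can be computed by brute-force enumeration, as in the sketch below. The slide does not spell out P(f0 | m0, …, m3); the model used here (positive feedback with probability 1 if all mappings are correct, 0 if exactly one is incorrect, and a small constant delta if several errors may compensate each other) is an assumption, as are the function names.

```python
from itertools import product

def feedback_likelihood(states, delta):
    """Assumed model for P(f0 = positive | mapping states)."""
    errors = states.count(False)
    if errors == 0:
        return 1.0   # all mappings correct: the cycle is consistent
    if errors == 1:
        return 0.0   # a single error cannot be compensated
    return delta     # several errors may cancel out by chance

def posterior_m0_correct(priors, delta):
    """P(m0 = correct | f0 = positive), summing the joint over m1..mn."""
    joint_m0_correct = 0.0
    evidence = 0.0   # P(f0 = positive)
    for states in product([True, False], repeat=len(priors)):
        p = feedback_likelihood(states, delta)
        for prior, correct in zip(priors, states):
            p *= prior if correct else 1.0 - prior
        evidence += p
        if states[0]:
            joint_m0_correct += p
    return joint_m0_correct / evidence

posterior = posterior_m0_correct([0.7, 0.7, 0.7, 0.7], delta=0.1)
print(posterior)
```

With these assumed numbers, a positive feedback on the cycle raises the belief in m0 above its prior of 0.7. The enumeration is exponential in the cycle length, which is acceptable for one short cycle but not for a whole mapping graph.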
A Brief Intro to Factor-Graphs
• $g(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)$
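The benefit of such a factorization shows when computing a marginal: sums can be pushed inside the product, so each factor is summed over its own variables only. The worked step below is a sketch for the marginal of x1; the symbols g_1 and μ are notation introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
g_1(x_1) &= \sum_{x_2, x_3, x_4} g(x_1, x_2, x_3, x_4) \\
         &= \sum_{x_2} f_A(x_1, x_2)
            \underbrace{\sum_{x_3, x_4} f_B(x_2, x_3, x_4)}_{\mu_{f_B \to x_2}(x_2)}
\end{align*}
```

The inner sum is the message that factor f_B sends to variable x_2 in the sum-product algorithm; it is computed once and reused for every value of x_1.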
Deriving PDMS Factor-Graphs
PDMS Factor-Graphs
• Cyclic graph
  – Junction Tree? Clustering / Stretching of variables?
    • Not applicable (decentralization)
  – Iterative Sum-Product
    • Approximate results
• How to perform iterative sum-product by message passing on the mapping graph?
  – Message passing in factor graph does not correspond to connectivity of mapping graph
  – We want to rely on decentralized computations only
• Locality VS Globality of nodes in the factor graph
  – Mappings: local
  – Feedback factor: common, global knowledge
  – Observed feedback variables: neighborhood
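Iterative sum-product on a graph of mapping variables and cycle feedback factors can be sketched as below. This is a centralized illustration only; it does not show how the computation is embedded in the peers. The feedback factor (1 for no error, 0 for one error, delta for several) is an assumed model, and all names are hypothetical.

```python
from itertools import product

def factor_value(states, positive, delta):
    """Assumed feedback factor: P(observed feedback | mapping states)."""
    errors = states.count(False)
    p_pos = 1.0 if errors == 0 else (0.0 if errors == 1 else delta)
    return p_pos if positive else 1.0 - p_pos

def normalize(pair):
    total = pair[0] + pair[1]
    return (pair[0] / total, pair[1] / total)

def sum_product(priors, cycles, delta, rounds=10):
    """priors: {mapping: P(correct)}; cycles: list of (mappings, positive)."""
    # Messages are (weight for "correct", weight for "incorrect") pairs.
    to_var = {(c, m): (1.0, 1.0) for c, (ms, _) in enumerate(cycles) for m in ms}
    for _ in range(rounds):
        # Variable -> factor: prior times the messages from the other cycles.
        to_fac = {}
        for (c, m) in to_var:
            good, bad = priors[m], 1.0 - priors[m]
            for (c2, m2), msg in to_var.items():
                if m2 == m and c2 != c:
                    good, bad = good * msg[0], bad * msg[1]
            to_fac[(c, m)] = normalize((good, bad))
        # Factor -> variable: sum the factor over the other mappings' states.
        new_to_var = {}
        for c, (ms, positive) in enumerate(cycles):
            for i, m in enumerate(ms):
                out = [0.0, 0.0]
                for states in product([True, False], repeat=len(ms)):
                    w = factor_value(states, positive, delta)
                    for j, other in enumerate(ms):
                        if j != i:
                            w *= to_fac[(c, other)][0 if states[j] else 1]
                    out[0 if states[i] else 1] += w
                new_to_var[(c, m)] = normalize(out)
        to_var = new_to_var
    # Belief: prior times all incoming factor messages, normalized.
    beliefs = {}
    for m, prior in priors.items():
        good, bad = prior, 1.0 - prior
        for (c, m2), msg in to_var.items():
            if m2 == m:
                good, bad = good * msg[0], bad * msg[1]
        beliefs[m] = normalize((good, bad))[0]
    return beliefs

# Two cycles sharing mapping m2: positive feedback on one, negative on the other.
beliefs = sum_product({m: 0.7 for m in ["m0", "m1", "m2", "m3", "m4"]},
                      [(["m0", "m1", "m2"], True), (["m2", "m3", "m4"], False)],
                      delta=0.1)
print(beliefs)
```

In this toy example the mappings seen only in the positive cycle end up above their prior and those seen only in the negative cycle end up below it. On factor graphs that contain loops the same iteration yields approximate beliefs rather than exact marginals.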
Embedded Message-Passing (1)
Embedded Message-Passing (2)
Sending Messages in the Mapping Graph
• Message-Passing Schedules
  – Periodic
  – Lazy (piggybacking on query forwarding)
    • No message overhead
Implemented System
• Schemas
  – Import from OWL (Web Ontology Language)
• Mappings
  – KnowledgeWeb Ontology Alignment API
  – Import from RDF/XML
  – Automated on-the-fly creation
  – Comparison to standard alignments
⇒ Automatic derivation of quality measures P(m=correct | {F}) for the mappings using iterative message-passing
⇒ Per-Hop Forwarding Behaviors (Semantic Gossiping)
Some (Preliminary) Results: Convergence
(undirected example graph, prior 0.7, delta 0.1)
Impact Of Cycle Length
(simple cycle, prior 0.5)
Fault-tolerance (faulty links)
(undirected example graph, prior 0.8, delta 0.1)
Preliminary Results: EON (Alignment contest)
• Worst-case scenario: no prior knowledge
• Set of 6 schemas on bibliographic data (approx. 30-40 attributes)
• 396 generated attribute mappings (84 incorrect)
2.3. Semantic Gossiping
• Selectively reformulate queries through mapping links
  – Semantic distances
    • Cycles analysis
    • Results analysis
  – Syntactic distance
    • Lost predicates
(Figure: the query πTitre σAuteur=Joe (R1) is reformulated along mapping links as πTitle σAuthor=Joe (R2), πTitle σCreator=Joe (R3), πTitle σCreator=Joe (R4) and πTitle σCreature=Joe (R5); reformulations marked X are rejected)
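The "lost predicates" criterion can be sketched as below: a query is forwarded hop by hop and dropped once too few of its original attributes survive the reformulation. The mappings, the `min_preserved` threshold and the function names are hypothetical; the actual distance measures used by Semantic Gossiping are defined in the cited papers and are richer than this ratio.

```python
def reformulate(attributes, mapping):
    """Rename query attributes through one mapping; unmapped ones are lost."""
    return [mapping[a] for a in attributes if a in mapping]

def gossip(attributes, hops, min_preserved):
    """Forward a query along one path and stop once too many predicates are lost.

    hops: list of (target schema name, mapping) pairs along the path.
    Returns the (schema, reformulated attributes) pairs that received the query.
    """
    original_size = len(attributes)
    reached = []
    for schema, mapping in hops:
        attributes = reformulate(attributes, mapping)
        preserved = len(attributes) / original_size
        if preserved < min_preserved:
            break  # the query drifted too far syntactically: do not forward
        reached.append((schema, attributes))
    return reached

# Hypothetical path R1 -> R2 -> R3 -> R4; the last mapping has no
# counterpart for Creator, so that predicate is lost.
HOPS = [
    ("R2", {"Titre": "Title", "Auteur": "Author"}),
    ("R3", {"Title": "Title", "Author": "Creator"}),
    ("R4", {"Title": "Title"}),
]
strict = gossip(["Titre", "Auteur"], HOPS, min_preserved=1.0)
lenient = gossip(["Titre", "Auteur"], HOPS, min_preserved=0.5)
print(strict)
print(lenient)
```

The threshold is the knob behind the precision/recall tradeoff: a strict value stops at R3, a lenient one also reaches R4 with a weaker query.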
Self-Organization
• Two types of self-organization
  – Static network
    • Self-organizing dissemination of queries
  – Dynamic network
    • Self-organizing network of mappings
• Idea:
  – Quality evaluation of mappings through Semantic Gossiping
  – Drop low quality links
  – Reorganized network leads to different quality evaluation
  – Dynamic network changes ⇒ self-organizing, self-referential semantic network
Some Results (1): Sensitivity to TTL
(cycle analysis only, 25 schemas, 4 concepts)
Some Results (2): Scalability
(results analysis only, 4 concepts, TTL=3, misclassification rate=0.1, 2 documents/peer on avg.)
2.4. Semantic Interoperability in the Large
• Do we have enough (good) mappings?
• Modeling semantic interoperability: Schema-to-Schema Graph
  – Logical model
  – Directed
  – Weighted
  – Redundant
• The semantic connectivity graph
  – Idea: as for physical network analyses, define a connectivity layer
  – Unweighted, non-redundant version of the Schema-to-Schema graph
  – Observation:
    • Peers in a set $P_s$ are semantically interoperable iff $S_s$ is strongly connected, with $S_s = \{ s \mid \exists p \in P_s,\ p \leftrightarrow s \}$
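The strong-connectivity observation can be checked with two graph traversals, as in this minimal sketch. It assumes the connectivity graph is given as a list of directed (source schema, target schema) pairs and that paths must stay inside the set of schemas being tested; both the representation and the function names are hypothetical.

```python
def reachable(start, edges):
    """Schemas reachable from `start` by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def semantically_interoperable(schemas, mappings):
    """True iff the subgraph induced by `schemas` is strongly connected."""
    schemas = set(schemas)
    if not schemas:
        return True
    edges = [(a, b) for a, b in mappings if a in schemas and b in schemas]
    start = next(iter(schemas))
    forward = reachable(start, edges)                        # start reaches all
    backward = reachable(start, [(b, a) for a, b in edges])  # all reach start
    return forward == schemas and backward == schemas

ring = [("A", "B"), ("B", "C"), ("C", "A")]
chain = [("A", "B"), ("B", "C")]
print(semantically_interoperable("ABC", ring))
print(semantically_interoperable("ABC", chain))
```

A directed ring qualifies because every schema can translate queries to every other one; a chain does not, since nothing can be reformulated back towards A.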
Analyzing Semantic Interoperability in the Large
• Analyzing semantic interoperability in large-scale, decentralized networks
  – Percolation theory for directed graphs
  – Based on recent graph-theoretic frameworks
  – Random graphs with specific degree distributions $p_{jk}$, clustering coefficient $cc$ and bidirectionality coefficient $bc$
• Necessary condition for semantic interoperability in the large: $\sum_{j,k} \left( jk - j(bc + cc) - k \right) p_{jk} \geq 0$
• Excellent approximations of the size of semantically interoperable clusters in the graph
• Analysis: Sequence Retrieval System
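The necessary condition can be evaluated directly from observed degrees, as in the sketch below. It assumes p_jk is estimated as the empirical fraction of schemas with degree pair (j, k), so the sum reduces to an average over schemas; treating j as the in-degree and k as the out-degree is an assumption, and the function name and sample numbers are made up for illustration.

```python
def connectivity_indicator(degree_pairs, bc, cc):
    """Evaluate sum_{j,k} (jk - j(bc + cc) - k) p_jk.

    degree_pairs: one (j, k) pair per schema. With p_jk taken as the
    empirical frequency of each pair, the sum is a plain average.
    bc, cc: bidirectionality and clustering coefficients of the graph.
    """
    total = sum(j * k - j * (bc + cc) - k for j, k in degree_pairs)
    return total / len(degree_pairs)

# Made-up degree pairs for two small mapping graphs.
sparse = connectivity_indicator([(2, 2), (1, 1), (0, 1)], bc=0.5, cc=0.0)
dense = connectivity_indicator([(2, 2), (2, 2), (3, 2)], bc=0.5, cc=0.0)
print(sparse, dense)
```

A negative value means the necessary condition fails, so more mappings are needed before large interoperable clusters can be expected; a non-negative value does not by itself guarantee interoperability.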
3. Applications
1. GridVine
  • Self-organizing semantic overlay network
2. PicShark
  • Self-organizing middleware to export pictures and create mappings
3.1 GridVine
• Building large-scale semantic systems
  – Self-organizing semantic overlay network
(Figure: layered architecture. Labels: Semantic Mediation Layer; Overlay Layer; "Physical" layer; Correlated / Uncorrelated)
Features
• Based on the P-Grid P2P structure
  – Distributed Hash Table developed at EPFL
  – Self-organized, scalable, decentralized
  – Resolves key-based searches in O(log(n)) even for unbalanced trees
• Semantic Web compliant
  – RDF triples, RDFS schemas, OWL mappings
• Structured searches
  – RDQL queries
• Semantic Gossiping
  – Fosters semantic interoperability
GridVine: Annotating Content
Decentralized Query Resolution: Overview
3.2 PicShark
• Where do the translation links come from?
• Middleware for sharing semi-structured metadata attached to pictures and creating translation links
(Figure: PicShark architecture. Labels: 60 moments, Features Extractor, PicShark, PSP, XMP, WinFS, Insert, Metadata Extractor, Information Tracker, Retrieve, (Distributed) Hashtable (e.g., GridVine))
Features
• Self-Organization of mappings
  – Based on low-level features extracted from
    • Picture (color moment, textures)
    • Structured Metadata (lexicographical analysis)
• Self-Organization of annotations
  – Probabilistic propagation of annotations between similar individuals
• Self-Organization of query propagation
  – Schema distance based on probabilistic subsumption
  – Propagation within a certain diameter
⇒ Driven by user interaction
⇒ Scalable
  • Computationally expensive operations are local at the peers
  • Only simple in-network operations (look-ups)
  • (On-going) collaborative effort with Microsoft Research Asia
PicShark Prototype
4. Conclusions
• Fundamental issue: Interoperability in large scale (semi) structured environments
  – Content Sharing
  – Information search
  – Semantic Web?
• Traditional techniques are not sufficient
  – Scale
  – Autonomy
  – Uncertainty
⇒ Self-organizing, decentralized stochastic processes
  ⇒ Data Indexation
  ⇒ Data Integration
  ⇒ Query dissemination
Some References (1)
Semantic Gossiping
• A Framework for Semantic Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. SIGMOD Record, 31(4), December 2002.
• The Chatty Web: Emergent Semantics through Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. International World Wide Web Conference (WWW 03).
• Probabilistic Message-Passing in Peer-Data Management Systems. Philippe Cudré-Mauroux, Karl Aberer and Andras Feher. International Conference on Data Engineering (ICDE 06).
Self-Organizing Semantics
• Start making sense: The Chatty Web approach for global semantic agreements. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. Journal of Web Semantics, 1(1), December 2003.
• Emergent Semantics Principles and Issues. Karl Aberer, Philippe Cudré-Mauroux and Aris M. Ouksel (editors), Tiziana Catarci, Mohand-Said Hacid, Arantza Illarramendi, Vipul Kashyap, Massimo Mecella, Eduardo Mena, Erich J. Neuhold, Olga De Troyer, Thomas Risse, Monica Scannapieco, Fèlix Saltor, Luca de Santis, Stefano Spaccapietra, Steffen Staab and Rudi Studer. International Conference on Database Systems for Advanced Applications (DASFAA 04).
Some References (2)
Semantic Interoperability In the Large
• A Necessary Condition For Semantic Interoperability In The Large. Philippe Cudré-Mauroux and Karl Aberer. International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 04).
• Analyzing Semantic Interoperability in Bioinformatic Database Networks. Philippe Cudré-Mauroux, Julien Gaugaz, Adriana Budura and Karl Aberer. Semantic Network Analysis (SNA 05).
• GridVine: Building Internet-Scale Semantic Overlay Networks. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth and Tim van Pelt. International Semantic Web Conference (ISWC 04).
• Semantic Overlay Networks (tutorial). Karl Aberer and Philippe Cudré-Mauroux. International Conference on Very Large Data Bases (VLDB 05).
… more references at http://lsirpeople.epfl.ch/pcudre/
Questions?