c3db9deb81a9899df01083aac3d011fe.ppt
- Number of slides: 25
UoI Presentation. DBGlobe IST-2001-32645, 3rd Meeting, Athens, November 29, 2002. Future and Emerging Technologies (FET) Proactive initiative on: Global Computing (GC). The roots of innovation
Outline: Directories :: Resource Location; Data Delivery
Resource Discovery: Summaries for Resource Discovery. Maintain summaries (e.g., Bloom filters) to assist the search for a service (resource). Directories for XML metadata and appropriate summaries.
Resource Discovery: Motivation (DBGlobe)
• Large-scale and dynamic environment: how to locate a resource?
• System model: sites that store hierarchical descriptions of services (in XML) or XML documents; path queries
• Limitations (so far): we consider only XML trees (no cycles); no value queries
Joint work with Georgia Koloniari
Resource Discovery: An example XML description and the corresponding XML tree
<xml> <device> <printer> <color></color> <postscript></postscript> </printer> <camera> <digital></digital> </camera> </device>
[Figure: XML tree with root device; children printer (with children color and postscript) and camera (with child digital)]
• Path queries: from the root, //device/printer; partial, camera/digital
• Overall approach: maintain Bloom-based indexes to check whether a document (item) exists at a site (peer)
Resource Discovery: Bloom Filters
• Test whether an element b exists in a set A = {a1, a2, …, an} of n elements (keys)
• Allocate a vector v of m bits, initially all set to 0
• Choose k independent hash functions h1, h2, …, hk, each with range {1, …, m}
• For each element a ∈ A, set the bits at positions h1(a), h2(a), …, hk(a) to 1 (a particular bit might be set to 1 multiple times)
• Given a query for b, check the bits at positions h1(b), h2(b), …, hk(b). If any is 0, then b is certainly not in the set A. Otherwise we assume that b is in the set, although this may be a "false positive".
[Figure: bit vector v of m bits; element a sets the bits at positions h1(a) = P1, h2(a) = P2, h3(a) = P3, h4(a) = P4 to 1]
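The insert and lookup steps above can be sketched in a few lines of Python. This is a minimal illustration, not the DBGlobe implementation: the class name `BloomFilter`, the defaults m = 64 and k = 4, and the use of salted SHA-256 to simulate k independent hash functions are all assumptions. Positions run from 0 to m-1 here rather than 1 to m.

```python
import hashlib


class BloomFilter:
    """Bit vector of m bits with k hash functions (illustrative sketch)."""

    def __init__(self, m: int = 64, k: int = 4):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as an m-bit vector, initially all 0

    def _positions(self, key: str):
        # Assumption: the k hash functions are simulated by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        # Set the bits at h1(key), ..., hk(key); a bit may be set more than once.
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        # False means "certainly absent"; True means "present or false positive".
        return all((self.bits >> p) & 1 for p in self._positions(key))


bf = BloomFilter()
for label in ["device", "printer", "camera"]:
    bf.add(label)
print(bf.might_contain("printer"))  # True: inserted keys are never missed
```

A lookup that returns False is always correct, which is why such a filter can safely prune sites during a search; a True answer still has to be confirmed at the site itself.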
Resource Discovery: Breadth (or Level) Blooms
• The Breadth Bloom Filter (BBF) for an XML tree T with j levels: a set of Bloom filters {BBF0, BBF1, BBF2, …, BBFi}, i ≤ j
• One Bloom filter, denoted BBFi, for each level i of the tree
• BBFi: the labels (attributes) of all nodes at level i
• BBF0: all attributes that appear in any node of the XML tree T
• The BBFi are not all of the same size
• We may skip levels
Example (device tree): BBF0 = {device, printer, camera, color, postscript, digital}; BBF1 = {device}; BBF2 = {printer, camera}; BBF3 = {color, postscript, digital}
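A rough Python sketch of building the per-level filters for the device tree follows. The tree encoding as `(label, children)` tuples, the helper names, and the parameters `M` and `K` are assumptions for illustration. For brevity all filters share one size, whereas the slide notes that the BBFi need not be equally sized. The level-by-level lookup in `match_root_path` is one plausible reading of how a root path query could be checked, not a procedure taken from the slides.

```python
import hashlib

M, K = 64, 3  # assumed filter size (bits) and number of hash functions


def positions(key):
    # Assumption: K hash functions simulated by salting SHA-256.
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % M
            for i in range(K)]


def add(bits, key):
    for p in positions(key):
        bits |= 1 << p
    return bits


def might_contain(bits, key):
    return all((bits >> p) & 1 for p in positions(key))


def build_bbf(tree):
    """Return [BBF0, BBF1, ...]: BBF0 holds every label, BBFi the labels at level i."""
    filters = [0]
    level = [tree]
    while level:
        bits = 0
        for label, _children in level:
            bits = add(bits, label)
            filters[0] = add(filters[0], label)
        filters.append(bits)
        level = [child for _label, children in level for child in children]
    return filters


def match_root_path(filters, labels):
    """Check a root path such as //device/printer one level at a time."""
    if len(labels) > len(filters) - 1:
        return False  # the query is deeper than the tree
    return all(might_contain(filters[i + 1], lab) for i, lab in enumerate(labels))


TREE = ("device", [("printer", [("color", []), ("postscript", [])]),
                   ("camera", [("digital", [])])])
bbf = build_bbf(TREE)
print(match_root_path(bbf, ["device", "printer"]))  # True
```

Note that a level filter records which labels occur at a level but not which parent they hang under, so a query like //device/camera/color can pass the check even though no such path exists.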
Resource Discovery: Depth (or Path) Blooms
• The Depth Bloom Filter (DBF) for an XML tree T with j levels: a set of Bloom filters {DBF0, DBF1, DBF2, …, DBFi-1}, i ≤ j
• One Bloom filter, denoted DBFi, for each path length i (paths with i+1 nodes) of the tree
• DBFi: the labels (attributes) of all paths of length i
• DBF0: all attributes that appear in any node of the XML tree T
• Special symbol for "root" paths
Example (device tree): DBF0 = {device, printer, camera, color, postscript, digital}; DBF1 = {device/printer, device/camera, printer/color, printer/postscript, camera/digital}; DBF2 = {device/printer/color, device/printer/postscript, device/camera/digital}
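The path sets in the example can be generated mechanically. The Python sketch below enumerates all downward paths with i edges and inserts each one, joined with "/", into the filter for length i. The tuple-based tree encoding, the helper names, and the parameters `M` and `K` are assumptions. The special symbol for "root" paths mentioned on the slide is left out, since its encoding is not given.

```python
import hashlib

M, K = 128, 3  # assumed filter size (bits) and number of hash functions


def positions(key):
    # Assumption: K hash functions simulated by salting SHA-256.
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % M
            for i in range(K)]


def might_contain(bits, key):
    return all((bits >> p) & 1 for p in positions(key))


def paths_of_length(tree, i):
    """All downward paths with i edges (i+1 nodes), written as 'a/b/c'."""
    out = []

    def walk(node, prefix):
        label, children = node
        prefix = (prefix + [label])[-(i + 1):]  # keep only the last i+1 labels
        if len(prefix) == i + 1:
            out.append("/".join(prefix))
        for child in children:
            walk(child, prefix)

    walk(tree, [])
    return out


def build_dbf(tree, max_len):
    """Return [DBF0, ..., DBF(max_len)], one bit vector per path length."""
    filters = []
    for i in range(max_len + 1):
        bits = 0
        for path in paths_of_length(tree, i):
            for p in positions(path):
                bits |= 1 << p
        filters.append(bits)
    return filters


TREE = ("device", [("printer", [("color", []), ("postscript", [])]),
                   ("camera", [("digital", [])])])
dbf = build_dbf(TREE, 2)
print(sorted(paths_of_length(TREE, 2)))
# ['device/camera/digital', 'device/printer/color', 'device/printer/postscript']
print(might_contain(dbf[1], "camera/digital"))  # True
```

A partial query such as camera/digital is answered by a single probe of DBF1, which is where the extra space spent on path filters pays off.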
Resource Discovery: Preliminary performance results • Both outperform a simple Bloom filter of the same size in terms of false positives • Depth (path) Blooms are very sensitive to the number of levels • Depth (path) Blooms need more space • Updates are handled efficiently (only the corresponding vectors change)
Resource Discovery: Distribution. Each site keeps: § a local-filter: a Bloom filter for local resources § one or more summary-filters, where a summary-filter is the merge of the Bloom filters of a set X of other sites
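One common way to merge Bloom filters is a bitwise OR, which works when all sites use the same vector size and hash functions. The sketch below assumes that this is what "merge" means here; the site names, helper names, and parameters are hypothetical.

```python
import hashlib

M, K = 128, 3  # assumed: every site uses the same size and hash functions


def positions(key):
    # Assumption: K hash functions simulated by salting SHA-256.
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % M
            for i in range(K)]


def make_filter(keys):
    bits = 0
    for key in keys:
        for p in positions(key):
            bits |= 1 << p
    return bits


def might_contain(bits, key):
    return all((bits >> p) & 1 for p in positions(key))


def merge(filters):
    """Summary filter of a set of sites: bitwise OR of their local filters."""
    out = 0
    for f in filters:
        out |= f
    return out


# Hypothetical sites and their local resources.
local = {
    "siteA": make_filter(["printer", "color"]),
    "siteB": make_filter(["camera", "digital"]),
}
summary = merge(local.values())
print(might_contain(summary, "digital"))  # True: siteB holds it
```

The merged vector equals the filter that would result from inserting every key of every site into one filter, so a miss on the summary rules out all sites in X at once.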
Resource Discovery: Horizons. Keep information for up to horizon = d neighbors (as in routing indexes). A merged-filter for each path: the merge of the Blooms of all sites on the path, up to length equal to the horizon. [Figure: network of sites 0 to 9 with merged filters labeled "Merged of nodes 6, 7, 8", "Merged of nodes 1, 2", and "Merged of nodes 3, 4"]
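A small Python sketch of the horizon idea: for each outgoing link of a site, OR together the local filters of every site reachable through that link within d hops. The topology below is hypothetical. The figure's exact layout is not recoverable from the text, so the edges were chosen only so that the horizon-2 merges seen from site 0 cover {1, 2}, {3, 4}, and {6, 7, 8}, as in the figure labels. Local filters are stand-in bit patterns (bit i for site i) rather than real Bloom filters, and merging by bitwise OR is an assumption.

```python
def merged_filters(graph, local, site, horizon):
    """For each neighbor of `site`, OR the local filters of all sites
    reachable through that neighbor within `horizon` hops."""
    result = {}
    for first in graph[site]:
        seen = {site, first}
        frontier = [first]
        bits = local[first]
        for _ in range(horizon - 1):
            next_frontier = []
            for u in frontier:
                for v in graph[u]:
                    if v not in seen:
                        seen.add(v)
                        next_frontier.append(v)
                        bits |= local[v]
            frontier = next_frontier
        result[first] = bits
    return result


# Hypothetical undirected topology over sites 0..9.
edges = [(0, 1), (1, 2), (0, 3), (3, 4), (4, 5), (0, 6), (6, 7), (6, 8), (8, 9)]
graph = {i: [] for i in range(10)}
for a, b in edges:
    graph[a].append(b)
    graph[b].append(a)

local = {i: 1 << i for i in range(10)}  # stand-in filters: bit i marks site i
merged = merged_filters(graph, local, site=0, horizon=2)
print({link: bin(bits) for link, bits in merged.items()})
# {1: '0b110', 3: '0b11000', 6: '0b111000000'}
```

Sites 5 and 9 lie three hops from site 0, so with horizon 2 they contribute to none of its merged filters.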
Resource Discovery: Hierarchical. Leaf sites: local filter. Internal sites: summaries for all nodes in their subtree. Root sites: summaries for other root sites. [Figure: hierarchy with root peers 1, 2, 3]
Resource Discovery: Future work • Evaluate distribution strategies • Other ways of summarizing data (related work on selectivity estimation) • See § how this can be related to ontologies (meaningful path queries) § whether/how it can be integrated with querying
Outline: Directories :: Resource Location; Data Delivery
Data Delivery: For the 1st deliverable on the topic • A survey of different modes of transmitting data: § Push/pull § Continuous (periodic)/aperiodic § Multicast/unicast § Directed diffusion (communication only with neighbor nodes)
Data Delivery: For the 1st deliverable on the topic • The different data delivery modes in DBGlobe • Tradeoffs of using one over the other (e.g., in registering services, directory (location) updates) • To be extended for D10 (Data Delivery and Querying)
Data Delivery: Modes and Coherence. Focus: how to achieve temporal (currency) and semantic (transaction-based) coherency of data under different modes of data delivery
Data Delivery: The Data Broadcast Model
• The server broadcasts data from a database to a large number of clients: push mode + no direct communication with the server
• Data updates at the server
• Periodic updates for values on the channel
§ Efficient way to disseminate information to large client populations with similar interests
§ Physical support in wireless networks (satellite, cellular)
§ Alternative way of transmitting information for data-intensive applications (e.g., web)
[Figure: Server, Broadcast Channel, Client]
Data Delivery Clients must read consistent and current data without contacting the server directly § Multiple Versions: Not just one value per item, but k such values [Pitoura&Chrysanthis, IEEE TC 2003] § Temporal and Semantic Coherency (Theory and Protocols) [Pitoura, Chrysanthis&Ramamritham, ICDT 03]
Data Delivery: Currency
• Currency interval of an item x in RS(R), denoted CI(x, R): the interval [cb, ce), where cb is the time instance when the value was stored in the database and ce is the time instance of the next change of this value in the database
• Currency for a set (readset), two definitions:
§ Overlapping: if the intervals CI(x, R) of all (x, u) ∈ RS(R) overlap, say in [cb, ce), the currency is equal to ce; RS(R) is then a subset of an actual database state at the server
§ Older value: OV_Currency(R) = ce-, where ce is the smallest among the right limits of the CI(x, R)
• Two properties: temporal spread (discrepancies among database states) and temporal lag (how old with regard to some point in time, e.g., T_commit)
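The interval arithmetic behind these definitions is small enough to sketch in Python. The function names and the sample intervals are hypothetical. The sketch only computes the common overlap of the half-open intervals [cb, ce) and the smallest right limit, and does not try to settle how the final currency value is derived from them.

```python
def overlap(intervals):
    """Common part of half-open intervals [cb, ce), or None if they do not all overlap."""
    cb = max(begin for begin, _end in intervals)
    ce = min(end for _begin, end in intervals)
    return (cb, ce) if cb < ce else None


def smallest_right_limit(intervals):
    """The smallest ce among the currency intervals of the readset."""
    return min(end for _begin, end in intervals)


# Hypothetical currency intervals CI(x, R) for a readset of two items.
consistent = [(1, 5), (3, 8)]    # both values were current during [3, 5)
inconsistent = [(1, 3), (4, 6)]  # no instant at which both values were current

print(overlap(consistent))                 # (3, 5)
print(overlap(inconsistent))               # None
print(smallest_right_limit(inconsistent))  # 3
```

When the intervals do overlap, the right end of the overlap is exactly the smallest right limit, so the two definitions use the same ce in that case; the second one remains defined when no common database state exists.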
Data Delivery Protocols and their properties § Timestamps (versioning) § Invalidation Reports § Propagation
Data Delivery: Consistency. Degrees of consistency:
§ C0
§ C1: RS(R) ⊆ DS
§ C2: R serializable with the set of server transactions that read values read (directly or indirectly) by R
§ C3: R serializable with all server transactions
§ C4: R serializable with all server transactions, and the serializability order of the server transactions that R observes is consistent with the commit order of transactions at the server
Data Delivery Protocols and their properties Based on broadcasting the serialization graph of the server (or parts of it) Relation to temporal coherency
Data Delivery Future Work Multiple servers model Applications in sensor networks
DBGlobe IST-2001-32645