
Proxy-Server Architectures for OLAP
Panos Kalnis, Dimitris Papadias
THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY

The Problem
• Data warehouses: large repositories of historical, summarized information
  - Distributed: centralized or decentralized. Static structure!
• WWW: new opportunities to access warehouses. Example: stock market data
  - Professional brokers: access the warehouse directly with special-purpose OLAP software
  - Individual investors around the world: use web browsers. Slow network? Server overloading? Caching?
[Figure labels: London Stock Market Warehouse; Internet; OLAP clients in Tokyo, Singapore, Hong Kong]

OLAP Cache Servers (OCS)
• Similar to WWW proxy servers
• Geographically spanned and connected through an arbitrary network
• They cache results from OLAP queries
• They can derive new results from the cached data
• Clients connect to an OCS. If the OCS cannot answer, the query is redirected to a neighbor OCS or to the warehouse
• Result: lower network cost, better scalability, lower response time
[Figure labels: London Stock Market Warehouse; Internet; OCS; OLAP clients in Tokyo, Singapore, Hong Kong]

OCS vs. WWW Proxy-Servers
• An OCS has computational capabilities
• The cache admission and replacement policies are optimized for OLAP operations
• An OCS can update its contents incrementally, instead of invalidating the cached data

Background
• Data Cube Lattice: interdependencies among views
    SELECT P_id, T_id, SUM(Sales)
    FROM data
    GROUP BY P_id, T_id
• Client-Server OLAP Caching
  - Watchman: semantic caching
  - Dynamat: stores fragments
  - Caching chunks
• OCSs may use any of these methods
  - The prototype caches entire views
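
The lattice dependency can be illustrated with a minimal Python sketch. It assumes a view is identified by its list of group-by dimensions and that the aggregate is SUM(Sales), as in the query above. The function names and sample rows are hypothetical and not taken from the prototype.

```python
from collections import defaultdict

def can_derive(target_dims, source_dims):
    """A view is derivable from another if its group-by set is a subset."""
    return set(target_dims) <= set(source_dims)

def roll_up(rows, source_dims, target_dims):
    """Aggregate SUM(Sales) rows of a finer view down to target_dims.

    rows: tuples of (dim values in source_dims order..., sales).
    """
    if not can_derive(target_dims, source_dims):
        raise ValueError("target view is not derivable from source view")
    positions = [source_dims.index(d) for d in target_dims]
    totals = defaultdict(int)
    for *key, sales in rows:
        totals[tuple(key[i] for i in positions)] += sales
    return dict(totals)

# Hypothetical cached result of: GROUP BY P_id, T_id
cached = [("p1", "t1", 10), ("p1", "t2", 5), ("p2", "t1", 7)]
by_product = roll_up(cached, ["P_id", "T_id"], ["P_id"])
```

Here the view grouped by P_id alone is computed from the cached (P_id, T_id) view without contacting the warehouse, which is the kind of derivation the lattice describes.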

System Architecture
• Multiple levels of caching
• Cooperation among OCSs
• Physical organization and fragmentation may differ in each OCS
• Centralized: query optimization and cache control at a central site (intranet)
• Semi-centralized: only query optimization at the central site. Each OCS controls its local cache
• Autonomous: all decisions are taken locally (internet)

Query Optimizer
Cost = Read + Transfer
• A client sends a query q
Autonomous policy:
  i. The OCS has the exact answer
  ii. The OCS cannot answer q
  iii. The OCS can derive q
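
The cost model above can be sketched in Python as follows. The slide does not say how the three cases are resolved, so this sketch simply assumes the OCS compares candidate plans (answer or derive locally, ask a neighbor, ask the warehouse) and takes the cheapest. The `Plan` class, `choose_plan`, and all cost figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    site: str             # where q is answered (hypothetical label)
    read_cost: float      # cost of reading the view used to answer q
    transfer_cost: float  # cost of shipping the result to the client's OCS

    @property
    def cost(self) -> float:
        # Cost = Read + Transfer, as stated on the slide
        return self.read_cost + self.transfer_cost

def choose_plan(plans):
    """Pick the candidate plan with the lowest total cost (an assumption)."""
    return min(plans, key=lambda p: p.cost)

# Made-up numbers: deriving q locally reads a large view, a neighbor
# holds the exact answer, and the warehouse sits behind a slow link.
candidates = [
    Plan("local-derive", read_cost=500, transfer_cost=0),
    Plan("neighbor", read_cost=50, transfer_cost=100),
    Plan("warehouse", read_cost=20, transfer_cost=800),
]
best = choose_plan(candidates)
```

With these numbers the neighbor wins (cost 150), even though the local OCS could derive q itself.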

Query Optimizer (cont.)
• Autonomous: scalable, easy to implement, high availability
  - Suits large, unstructured, dynamic environments
  - BUT may produce inefficient plans
• Centralized (and semi-centralized):
  - A central site has global information for all OCSs
  - It creates the execution and routing plan for all queries
  - Low availability, low scalability
  - Suitable for intranets

Caching Policy: Autonomous
• Lower Benefit First (LBF): considers interdependencies, but:
  - Cost() is difficult to calculate. If v cannot be answered locally, we assume that it is answered by the warehouse
  - The complexity of LBF grows quadratically with the number of materialized views
• We evict a set from the cache if its combined benefit < benefit(u). Selecting the victim set: similar idea to [HRU 96]
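
The admission test can be sketched as below. The slide gives only the condition (evict a set if its combined benefit is below benefit(u)), so the greedy, lowest-benefit-first choice of victims is an assumption here, and benefit values are taken as given rather than computed from the lattice. `select_victims` and the sample cache are hypothetical.

```python
def select_victims(cache, free_space, new_size, new_benefit):
    """Decide whether to admit a new view u and which views to evict.

    cache: dict mapping view name -> (size, benefit).
    Returns the list of victims (possibly empty), or None if u is rejected.
    """
    victims, available, combined = [], free_space, 0.0
    # Assumption: consider victims in ascending order of benefit.
    for name, (size, benefit) in sorted(cache.items(), key=lambda kv: kv[1][1]):
        if available >= new_size:
            break
        victims.append(name)
        available += size
        combined += benefit
    # Reject u if it still does not fit, or if the evicted set is worth
    # at least as much as u itself.
    if available < new_size or combined >= new_benefit:
        return None
    return victims

cache = {"a": (10, 5.0), "b": (10, 2.0), "c": (20, 9.0)}
admitted = select_victims(cache, free_space=0, new_size=15, new_benefit=8.0)
rejected = select_victims(cache, free_space=0, new_size=15, new_benefit=6.0)
```

In the first call the views b and a (combined benefit 7.0) are evicted to make room for u. In the second call u is rejected because the same victim set is worth more than u.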

Caching Policy: Centralized
• All decisions are taken at the central site
• The centralized policy uses Smaller Penalty First (SPF)
  - Experiments show that the difference between SPF and LBF is not significant
• In general: a bad decision of the caching algorithm does not affect performance significantly, BUT a bad decision of the optimizer has a significant impact

Updates
• Changes are propagated periodically to the warehouse. It computes deltas for its materialized views
• No down time for the OCSs
• An OCS updates its cache on demand: invalidate vs. incrementally update
• Deltas are treated as normal data
• Deltas are evicted at the end of the update period
• Non-updated results are also evicted
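
A minimal sketch of the incremental option, assuming the cached view holds SUM aggregates keyed by group-by values and that a delta carries additive changes for the same grouping. The function name and data are hypothetical. Other aggregates or deletions would need more care than shown here.

```python
def apply_delta(cached_view, delta):
    """Return an updated copy of a cached SUM view.

    Both arguments map a group-by key to a summed value; the delta holds
    the change accumulated during the last update period.
    """
    updated = dict(cached_view)
    for key, change in delta.items():
        updated[key] = updated.get(key, 0) + change
    return updated

cached_view = {("p1",): 15, ("p2",): 7}
delta = {("p1",): 3, ("p3",): 4}   # new sales for p1, first sales for p3
fresh = apply_delta(cached_view, delta)
```

The alternative, invalidation, would simply drop `cached_view` and recompute it on the next request.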

Experimental Setup
• APB and TPC-H
• Cmax = maximum cache size as a percentage of the entire cube
• 1500 queries at each OCS
[Figure: DCSR vs. Cmax, worst case. Labels: OCS configuration, Client-Side-Cache]

Effect of Network Cost
[Figure panels: DCSR vs. Cmax; Warehouse Hit Ratio vs. Cmax]
• 3 OCSs: we vary the speed of the links to the DW
• In slow networks, OCSs utilize the contents of their neighbors
• In fast networks, many queries reach the warehouse, because the computation cost is lower

Autonomous vs. Semi-centralized
[Figure panels: DCSR vs. tightness (100 OCSs); DCSR vs. # of OCSs. Legend labels: Centralized, Semi-Centralized, Autonomous]
• High tightness or many OCSs

Conclusions
• OCS: architecture for caching OLAP results
• Beneficial for ad-hoc, geographically spanned and possibly mobile users, who sporadically need to access a warehouse
• Complementary to both client-side-cache systems and distributed OLAP approaches
• Future work: prototype on top of a DBMS, support for multiple DWs, finer granularity of cached data, special queries