  • Number of slides: 40

Caching with “Good Enough” Currency, Consistency, and Completeness
Hongfei Guo, University of Wisconsin
Per-Åke Larson, Microsoft Research
Raghu Ramakrishnan, University of Wisconsin

Motivation: Scaling Google …

Motivation: Scaling a DBMS by Caching
How can we tell whether the cached data is “good enough” for an application?
  • No data quality requirements from the applications!
  • No data quality guarantees from the caching DBMS!
[Diagram: Application Server (app-specific code) … Caching DBMS, fed by asynchronous updates from the Backend DBMS]

The Big Picture
  • Apps: specify data quality requirements in queries
  • Cache: enforces data quality constraints
      View-level granularity [SIGMOD 2004] [SIGMOD 2004 Demo]
  • Cache admin: specifies local data quality to be maintained by the cache (data quality-aware database caching model)
      Finer granularity (partitions of a view) [This presentation]
  • System performance evaluation [dissertation]
[Diagram: Application Server, Caching DBMS, Backend DBMS]

Data Quality Metrics (informal)
  • Currency: the time elapsed since this copy became stale
  • Consistency: a query result is (snapshot) consistent iff it is as if it had been evaluated from a snapshot of the master database
  • C&C: Currency & Consistency
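As a rough illustration of the two metrics above, the following Python sketch computes currency as the time elapsed since a copy's stale point and approximates snapshot consistency by checking that all rows carry the same snapshot label. The function names and the idea of tagging each row with a snapshot identifier are assumptions made for this example, not part of the presentation.

```python
from datetime import datetime, timedelta

def currency(now, stale_point):
    """Time elapsed since the cached copy became stale (zero if still fresh)."""
    if stale_point is None or stale_point > now:
        return timedelta(0)
    return now - stale_point

def is_snapshot_consistent(row_snapshot_ids):
    """Simplified check: every row was read from the same master snapshot."""
    return len(set(row_snapshot_ids)) <= 1

now = datetime(2024, 1, 1, 12, 0, 0)          # hypothetical timestamps
stale_point = datetime(2024, 1, 1, 11, 50, 0)
print(currency(now, stale_point))                    # 0:10:00
print(is_snapshot_consistent(["H1", "H1", "H2"]))    # False
```

Tagging rows with snapshot labels is only one way to track consistency; it is used here because it keeps the check to a single set comparison.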

Roadmap
  • Background
  • Cache data quality properties
  • Cache property specification
  • Enforcing data quality constraints
  • Experiments
  • Future directions and conclusions

Why Define Cache Properties?
[Diagram: cache properties (= contract) sit between query processing and cache maintenance]

Cache Properties (P+3C)
  • Presence: per object
  • Consistency: a set of objects
  • Completeness: per predicate
  • Currency: object staleness

Basic Concepts
[Diagram: master database with tables, objects, and snapshots H1 and H2; cache with View 1, View 2, and View 3]

Cache Property Examples
Currency = now – stale point
[Diagram: master database with snapshots H1 and H2 and a marked stale point; cache with View 1, View 2, and View 3; regions labeled Present, Consistent, and Complete]

Roadmap
  • Background
  • Cache data quality properties
  • Cache property specification
  • Enforcing data quality constraints
  • Experiments
  • Future directions and conclusions

Specifying Cache Properties
Specified as integrity constraints:
  • Presence constraint
  • Consistency constraint
  • Completeness constraint
  • Presence correlation constraint
  • Consistency correlation constraint

Presence Constraint
[Diagram: Backend DBMS and Caching DBMS]

AuthorCopy:
    authorId  name    city
    1         Alice   Madison
    2         Bob     Madison
    3         Cedric  Seattle

AuthorList_PCT:
    authorId
    1
    2
    3

Presence Constraint
Partially materialized view [Zhou et al 2005]:
    CREATE VIEW AuthorCopy AS
    SELECT * FROM Authors

Control-table:
    CREATE TABLE AuthorList_PCT (authorId int)

Presence constraint on the control-key authorId:
    ALTER VIEW AuthorCopy ADD PRESENCE
    ON authorId IN (SELECT authorId FROM AuthorList_PCT)

AuthorCopy:
    authorId  name    city
    1         Alice   Madison
    2         Bob     Madison
    3         Cedric  Seattle

AuthorList_PCT:
    authorId
    1
    2
    3
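To make the effect of a presence constraint concrete, here is a minimal Python sketch that treats the control-table as a set of control-key values and keeps only the matching master rows in the view. The `materialize` helper and the fourth author row are hypothetical and exist only for this illustration; a real caching DBMS would not be implemented this way.

```python
# Master table on the backend; the last row is a made-up extra author.
authors = [
    {"authorId": 1, "name": "Alice", "city": "Madison"},
    {"authorId": 2, "name": "Bob", "city": "Madison"},
    {"authorId": 3, "name": "Cedric", "city": "Seattle"},
    {"authorId": 4, "name": "Dana", "city": "Boston"},  # hypothetical row
]

def materialize(master_rows, control_table, control_key):
    """Keep only rows whose control-key value appears in the control-table."""
    keys = set(control_table)
    return [row for row in master_rows if row[control_key] in keys]

author_list_pct = [1, 2, 3]
author_copy = materialize(authors, author_list_pct, "authorId")
print([row["name"] for row in author_copy])  # ['Alice', 'Bob', 'Cedric']
```

Changing the contents of `author_list_pct` changes which rows are cached without redefining the view, which is the point of a control-table.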

Consistency Constraint
    CREATE TABLE CityList_CsCT (city string)

    ALTER VIEW AuthorCopy ADD CONSISTENCY
    ON city IN (SELECT city FROM CityList_CsCT)

[Diagram: Backend DBMS; the rows with city = Madison form a cache region]

AuthorCopy:
    authorId  name    city
    1         Alice   Madison
    2         Bob     Madison
    3         Cedric  Seattle

CityList_CsCT:
    city
    Madison

AuthorList_PCT:
    authorId
    1
    2
    3

Completeness Constraint
    CREATE TABLE CityList_CpCT (city string)

    ALTER VIEW AuthorCopy ADD COMPLETENESS
    ON city IN (SELECT city FROM CityList_CpCT)

[Diagram: Backend DBMS]

AuthorCopy:
    authorId  name    city
    1         Alice   Madison
    2         Bob     Madison
    3         Cedric  Seattle

CityList_CpCT:
    city
    Madison

AuthorList_PCT:
    authorId
    1
    3
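The completeness property can be read as a per-predicate check: for a listed city, every master row with that city must be in the cache. The Python sketch below illustrates that reading under the assumption that rows are plain dictionaries; the `is_complete` helper is hypothetical.

```python
master = [
    {"authorId": 1, "name": "Alice", "city": "Madison"},
    {"authorId": 2, "name": "Bob", "city": "Madison"},
    {"authorId": 3, "name": "Cedric", "city": "Seattle"},
]

def is_complete(cache_rows, master_rows, column, value):
    """True if the cache holds every master row satisfying column = value."""
    cached = {r["authorId"] for r in cache_rows if r[column] == value}
    wanted = {r["authorId"] for r in master_rows if r[column] == value}
    return wanted <= cached

# Presence alone (authorId 1 and 3) leaves Madison incomplete: Bob is missing.
presence_only = [r for r in master if r["authorId"] in {1, 3}]
print(is_complete(presence_only, master, "city", "Madison"))  # False

# Adding Bob, as completeness on Madison requires, makes the predicate complete.
with_bob = presence_only + [master[1]]
print(is_complete(with_bob, master, "city", "Madison"))       # True
```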

Presence Correlation Constraint
    ALTER VIEW BookCopy ADD PRESENCE
    ON authorId IN (SELECT authorId FROM AuthorCopy)

[Diagram: Backend DBMS; BookCopy is linked to AuthorCopy on authorId]

AuthorList_PCT:
    authorId
    1
    2
    3

AuthorCopy:
    authorId  name    city
    1         Alice   Madison
    2         Bob     Madison
    3         Cedric  Seattle

BookCopy:
    isbn  authorId  title
    111   1         aaa
    222   1         bbb
    333   2         ccc
    444   3         ddd
    555   3         eee
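A presence correlation means that the rows cached in one view decide which rows are cached in another. The sketch below, written under the assumption that views are lists of dictionaries, shows how dropping an author from the control-table would cascade to BookCopy; `present_rows` and `refresh` are hypothetical helper names.

```python
authors = [
    {"authorId": 1, "name": "Alice"},
    {"authorId": 2, "name": "Bob"},
    {"authorId": 3, "name": "Cedric"},
]
books = [
    {"isbn": 111, "authorId": 1, "title": "aaa"},
    {"isbn": 222, "authorId": 1, "title": "bbb"},
    {"isbn": 333, "authorId": 2, "title": "ccc"},
    {"isbn": 444, "authorId": 3, "title": "ddd"},
    {"isbn": 555, "authorId": 3, "title": "eee"},
]

def present_rows(master_rows, key, allowed_keys):
    """Rows whose key value is among the allowed control-key values."""
    allowed = set(allowed_keys)
    return [row for row in master_rows if row[key] in allowed]

def refresh(author_list_pct):
    """AuthorCopy follows the control-table; BookCopy follows AuthorCopy."""
    author_copy = present_rows(authors, "authorId", author_list_pct)
    cached_ids = [a["authorId"] for a in author_copy]
    book_copy = present_rows(books, "authorId", cached_ids)
    return author_copy, book_copy

_, book_copy = refresh([1, 3])
print([b["isbn"] for b in book_copy])  # [111, 222, 444, 555]
```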

Presence Correlation Constraint
[Diagram: cache graph AuthorList_PCT → AuthorCopy → BookCopy, with both edges labeled authorId, shown next to the AuthorList_PCT, AuthorCopy, and BookCopy sample tables]

Consistency Correlation Constraint
    ALTER VIEW BookCopy ADD CONSISTENCY ROOT

[Diagram: Backend DBMS; BookCopy is linked to AuthorCopy on authorId]

AuthorList_PCT:
    authorId
    1
    2
    3

AuthorCopy:
    authorId  name    city
    1         Alice   Madison
    2         Bob     Madison
    3         Cedric  Seattle

BookCopy:
    isbn  authorId  title
    111   1         aaa
    222   1         bbb
    333   2         ccc
    444   3         ddd
    555   3         eee

Consistency Correlation Constraint
[Diagram: cache graph AuthorList_PCT → AuthorCopy → BookCopy, with both edges labeled authorId, shown next to the AuthorList_PCT, AuthorCopy, and BookCopy sample tables]

Cache Schema Example
[Diagram: cache graph with nodes AuthorList_PCT, ReviewerList_PCT, AuthorCopy, ReviewerCopy, BookCopy, and ReviewCopy; edges labeled authorId, reviewerId, authorId, isbn, and reviewId]

Roadmap
  • Background
  • Cache data quality properties
  • Cache property specification
  • Enforcing data quality constraints
  • Experiments
  • Future directions and conclusions

Changing The Assumptions
  • Fully materialized views → partially materialized views
      More general algorithms
  • Consistent views → row-level consistency
      Run-time check for consistency constraints that cannot be validated at compile time
  • Push-based maintenance → pull-based maintenance

Run-time C&C Checking
When view V matches expression E:
[Diagram: a ChoosePlan operator with a C&C guard selects between a local plan using V and a remote plan requesting E]
  • Currency guard: check if local view V satisfies the currency requirement
  • Consistency guard: check if local view V satisfies the consistency requirement
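The guard logic described above can be sketched as a small decision function: use the local plan only when both guards pass, otherwise fall back to the remote plan. The `ViewState` fields and the way staleness and snapshots are represented are assumptions made for this Python illustration, not the prototype's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class ViewState:
    staleness_s: float        # seconds since the view's stale point (assumed)
    snapshot_ids: frozenset   # snapshots the needed rows came from (assumed)

def currency_guard(view, bound_s):
    """Local view is current enough if its staleness is within the bound."""
    return view.staleness_s <= bound_s

def consistency_guard(view):
    """Local view is consistent if all needed rows share one snapshot."""
    return len(view.snapshot_ids) == 1

def choose_plan(view, bound_s, need_consistency=True):
    """Return 'local' if the guards pass, otherwise 'remote'."""
    if not currency_guard(view, bound_s):
        return "remote"
    if need_consistency and not consistency_guard(view):
        return "remote"
    return "local"

fresh = ViewState(staleness_s=30.0, snapshot_ids=frozenset({"H1"}))
mixed = ViewState(staleness_s=30.0, snapshot_ids=frozenset({"H1", "H2"}))
print(choose_plan(fresh, bound_s=60.0))  # local
print(choose_plan(mixed, bound_s=60.0))  # remote
```

Evaluating the guards at run time lets one compiled plan serve both cases, since the choice depends on the cache's state when the query executes.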

Performance Evaluation Goals
  • Consistency guard overhead
      Simple checks
      A spectrum of checks ranging from simple to complicated

Experimental Setting
  • Back-end hosts a TPCD database, tpcd1gh, with scale factor 1.0 (~1 GB)
  • Cache server has a shadow of tpcd1gh
  • Two local views: custCopy, orderCopy
  • LAN connection between cache and backend server

Queries Used

Simple Consistency Guards Overhead
[Bar chart: execution time (ms) for local and remote plans; overhead labels 1.6%, 1.72%, 1.66%, 1.59%, 16.56%, 14.00%]

Single Table Consistency Guard Overhead
[Bar chart: execution time (ms) for local and remote plans (Qa is used); overhead labels 6.06%, 4.95%, 2.33%, 7.48%, 8.79%, 62.85%, 71.41%, 16.98%, 58.32%, 23.77%]

Future Directions
  • Adaptive data quality-aware caching policies
      Control-table content?
      Refresh intervals?
  • Improve current prototype
      Read-write transactions?
      Time-line constraints?
  • Apply “good enough” to other forms of replication
      Indexing data?
  • Automate cache design/tuning
      How to get a good cache schema? (i.e., cache region granularity, assignment)

Summary
  • Goal: fine-grained data quality-aware cache management
  • A comprehensive solution
      How does the cache track data quality? Four cache properties
      How does the admin specify cache properties? Dynamic cache model
      How to maintain the cache efficiently? Efficient cache maintenance and “safety”
      How to enforce C&C checking for queries? Efficiently enforce C&C constraints
Questions?

Proposed SQL Syntax
    SELECT *
    FROM Books B, Reviews R
    WHERE B.bid = R.bid AND B.title = "Databases"
    CURRENCY BOUND 10 min ON (B, R) BY B.bid

Clause parts: currency bound (10 min), consistency class (B, R), group by (BY B.bid).
Other forms shown:
    CURRENCY BOUND 10 min ON (B, R)
    CURRENCY BOUND 10 min ON (B), 30 min ON (R)

BookCopy:
    bid  title      author
    1    databases  Raghu
    2    databases  Ullman

ReviewCopy:
    rid  bid  text
    1    1    …
    2    1    …
    3    2    …

Query result:
    bid  title      author  bid  rid  text
    1    databases  Raghu   1    1    …
    1    databases  Raghu   1    2    …
    2    databases  Ullman  2    3    …
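One way to read the clause `CURRENCY BOUND 10 min ON (B, R) BY B.bid` is that, within each group of result rows sharing a bid, the B and R inputs must come from one snapshot that is at most 10 minutes stale. The Python sketch below checks that reading against result rows annotated with snapshot labels and ages. The field names (`b_snap`, `r_snap`, `b_age`, `r_age`) and the function are hypothetical and serve only to illustrate the grouping semantics.

```python
from collections import defaultdict

def satisfies_currency_clause(rows, bound_min, by=None):
    """Check one consistency class (B, R), optionally grouped by a column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by] if by else None].append(row)
    for members in groups.values():
        # All B and R inputs in a group must share a single snapshot.
        snapshots = {s for r in members for s in (r["b_snap"], r["r_snap"])}
        # No input in the group may be older than the currency bound.
        too_old = any(max(r["b_age"], r["r_age"]) > bound_min for r in members)
        if len(snapshots) != 1 or too_old:
            return False
    return True

# Ages are in minutes; bid 1 rows come from snapshot H1, bid 2 from H2.
result = [
    {"bid": 1, "b_snap": "H1", "r_snap": "H1", "b_age": 4, "r_age": 4},
    {"bid": 1, "b_snap": "H1", "r_snap": "H1", "b_age": 4, "r_age": 4},
    {"bid": 2, "b_snap": "H2", "r_snap": "H2", "b_age": 9, "r_age": 9},
]
print(satisfies_currency_clause(result, 10, by="bid"))  # True
print(satisfies_currency_clause(result, 10))            # False
```

Without the BY clause the whole result forms one group, so mixing snapshots H1 and H2 violates the constraint; grouping by bid relaxes it to per-book consistency.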

Pull-Maintenance
  • Refresh a region by pulling query results
  • When refreshing a region, also refresh the affected closure
      All overlapping regions
      All correlated regions
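Computing the affected closure can be pictured as a graph traversal over regions. The Python sketch below assumes the overlap and correlation relations are given as adjacency maps and that the closure is transitive; both the representation and the region names are assumptions made for illustration.

```python
from collections import deque

def affected_closure(start, overlaps, correlated):
    """Regions reachable from start via overlap or correlation edges."""
    closure = {start}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        neighbors = overlaps.get(region, set()) | correlated.get(region, set())
        for nxt in neighbors:
            if nxt not in closure:
                closure.add(nxt)
                queue.append(nxt)
    return closure

# Hypothetical regions: R1 overlaps R2, R2 is correlated with R3, R4 is separate.
overlaps = {"R1": {"R2"}, "R2": {"R1"}}
correlated = {"R2": {"R3"}, "R4": {"R5"}}
print(sorted(affected_closure("R1", overlaps, correlated)))  # ['R1', 'R2', 'R3']
```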

Theoretical Results
Definition (safe partially materialized views): A partially materialized view V is safe if the following two conditions hold for every instance of the cache that satisfies all integrity constraints:
  • For any pair of regions in V, either they don't overlap or one is contained in the other.
  • If V is gray, let X denote the set of regions in V defined by presence control-key values. X is a partitioning of V and no pair of regions in X is contained in any one region defined on V.
(Property held for every instance)

Cache schema design rules (syntactically checkable conditions, polynomial):
  • Rule 1: A cache graph is a DAG.
  • Rule 2: Only red nodes can have independent completeness or consistency control-tables.
  • Rule 3: Every PMV with more than one parent must be a red circle.
  • Rule 4: If a PMV has the shared-row problem according to Lemma 5.2, then it cannot be gray.
  • Rule 5: A PMV cannot have noncompatible control-tables.

Theorem: Given a cache schema, if it satisfies the design rules, then every PMV in W is safe. Conversely, if the schema violates one of these rules, there is an instance of the cache satisfying all specified integrity constraints in which some PMV is unsafe.
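Two of the conditions above lend themselves to short checks: the first safety condition (regions are pairwise disjoint or nested) and Rule 1 (the cache graph is a DAG). The Python sketch below models regions as sets of row identifiers and the cache graph as a map from each node to its parents. Both representations are assumptions for this example, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations

def regions_nested_or_disjoint(regions):
    """First safety condition: any two regions are disjoint or nested."""
    return all(
        a.isdisjoint(b) or a <= b or b <= a
        for a, b in combinations(regions, 2)
    )

def is_dag(parents):
    """Rule 1: the cache graph, given as node -> set of parents, has no cycle."""
    try:
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

print(regions_nested_or_disjoint([{1, 2, 3}, {1, 2}, {4}]))  # True
print(regions_nested_or_disjoint([{1, 2}, {2, 3}]))          # False

cache_graph = {"AuthorCopy": {"AuthorList_PCT"}, "BookCopy": {"AuthorCopy"}}
print(is_dag(cache_graph))                                   # True
```

These checks cover only one instance-level condition and one schema rule; the remaining rules depend on node colors and control-table compatibility, which the sketch does not model.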

Pull-Maintenance
AuthorList_PCT:
    authorId
    1
    3
    4

TitleList_CsCT:
    title
    aaa

BookCopy:
    isbn  authorId  title
    111   1         aaa
    222   1         bbb
    333   1         ccc
    444   3         aaa
    555   4         eee

Pull-Maintenance
[Diagram: cache graph AuthorList_PCT → AuthorCopy → BookCopy, with both edges labeled authorId]

AuthorCopy:
    authorId  name    city
    1         Alice   Madison
    3         Cedric  Seattle

BookCopy:
    isbn  authorId  title
    111   1         aaa
    222   1         bbb
    333   1         ccc
    444   3         aaa
    555   3         eee

Inefficient Pulling
[Diagram: AuthorCopy → AuthorBookCopy on authorId, AuthorBookCopy → BookCopy on isbn; labeled "Shared-row problem"]

AuthorCopy:
    authorId  name    city
    1         Alice   Madison
    3         Cedric  Seattle

AuthorBookCopy:
    authorId  isbn
    1         111
    1         222
    1         333
    3         111
    3         555

BookCopy:
    isbn  price  title
    111   10     aaa
    222   20     bbb
    333   30     ccc
    555   50     eee

Issues
  • Inefficient pulling:
      Calculation of the affected closure requires checking the rows
  • Efficient pulling:
      The affected closure does NOT depend on the instance of a view
      Only requires forward pull among correlated views

Related Work
Relaxing data quality
  • Distributed databases
      Read-only transactions [Garcia-Molina et al. 1982]
      Demarcation protocol [Barbará et al. 1992]
      TACC [Yu et al. 2000]
      Epsilon-serializability [Pu et al. 1992]
  • Warehousing and web views
      WebViews [Labrinidis et al. 2003]
      FAS [Röhm et al. 2002]
      Obsolescent views [Gal 1999]
      Distributed views [Segev et al. 1990]
  • Replica management
      Quasi-copies [Alonso et al. 1998], [Gallersdörfer et al. 1995]
      Good-enough views [Seligman et al. 1997]
      TRAPP [Olston et al. 2000]

Caching
  • Database caching
      DBCache [Altinel et al. 2003]
      Constraint-based database caching [Härder et al. 2004]
      Mid-tier caching [TimesTen 2002]
      Shared-storage caching [Khalil et al. 2002]
  • Others
      Semantic caching [Dar et al. 1996]
      Cache in Postgres [Stonebraker et al. 1990]
      Freshness-driven web caching [Li et al. 2003]
      Predicate-based caching [Keller et al. 1996]
      WATCHMAN [Scheuermann et al. 1996]
      Cache investment [Kossmann et al. 2000]
      DECAF [Kiernan et al. 2000]
      Proxy caching [Luo et al. 2001]

Uniqueness of our approach (query-centric):
  • Query: specifies fine-grained C&C constraints
  • Admin: flexible local data quality control in terms of granularity and properties
  • Caching DBMS: provides C&C guarantees for individual queries