 Скачать презентацию Column-Stores vs Row-stores Harikrishna Bikmal 05 d 05019
	Скачать презентацию Column-Stores vs Row-stores Harikrishna Bikmal 05 d 05019
	5e82229e9c7c4ca7975f6726ccce6a37.ppt
- Количество слайдов: 28
 
								Column-Stores vs. Row-stores Harikrishna Bikmal 05 d 05019
 
								Row vs. Column Stores
 
								Column Stores Really good for read-mostly datawarehouses ◦ Lots of column scans and aggregations ◦ Writes tend to be in batch ◦ Top 3 in TPC-H rankings are column stores(Exasol, Par. Accel and Kickfire) ◦ Yahoo’s world largest data warehouse is a column store
 
								Paper Looks At Key Question How much of the buzz around column-stores just marketing hype? ◦ Do you really need to buy Sybase IQ or Vertica? How far will your current row-store take you? Can you get column-store performance from a row-store? Can you simulate a column-store in a rowstore?
 
								Benchmark Star Schema Benchmark (SSMB) ◦ Fact table: 17 columns, 60, 000 rows Table: Line. Order ◦ 4 dimension tables: largest one 80, 000 rows Tables: Customer, Supplier, Part, Date Contains 13 queries divided into four categories or flights Used System-x row-store database and for column store they used C-store
 
								Simulate Column-Store Inside Row-Store
 
								Average Query Time
 
								Vertical Partitioning Tuple overheads Traditional row store takes 4 GB (compressed) Vertical partitioning takes 0. 7 to 1. 1 GB per column(compressed) ** Column joins Expensive hash joins Were unable to use merge join
 
								Vertical Partitioning Horizontal Partitioning Fact table is vertically partitioned using year attribute of date table (dimension table) Most queries in SSMB have predicate on year attribute Vertically partitioned tables do not contain date foreign key (except one table)
 
								Index only plans Common type of query: ◦ SELECT store_name, SUM(revenue) FROM Facts, Stores WHERE fact. store_id = atores. store_id AND stores. country = “Canada” GROUP BY store_name ◦ Contains predicates on fact table using dimension tables and need to extract SELECT attributes for these tuple ids
 
								Index only plans ◦ Available : Index maps values to tuple ids ◦ Required : Tuple ids to attribute values Tuple construction is slow ! ◦ ** Expensive hash joins of the columns of the fact table before filtering using dimension tables
 
								So. . All indexes approach is a poor way to simulate a column-store Problems with vertical partitioning are NOT fundamental ◦ Store tuple header in a separate partition ◦ Allow virtual TIDs ◦ Allow HP using a foreign key on a different VP So can row-stores simulate columnstores?
 
								Optimal set of Materialized Views
 
								Column-Store Experiments Start with column-store (C-Store) Remove column-store-specific performance optimizations End with column-store with a roworiented query executer
 
								Column Store specific optimizations ◦ Compression ◦ Late Materialization ◦ Block Processing ◦ Invisible Join
 
								Column Store Compression ◦ Higher data value locality in column-stores Better ratio - reduced I/O ◦ Can use schemes like run-length encoding ◦ Easy to operate on directly for improved performance
 
								Column Store Late Materialization ◦ Select and Aggregate operations render the construction of some tuples unnecessary ◦ Delays decompression ◦ Better cache performance Block Iteration ◦ Attributes extraction can be done in blocks instead of iterating on rows
 
								Column Store Invisible Join ◦ Typical query
 
								Invisible Join Typical query Contains selection predicates on fact table using one or more dimension tables, need to extract SELECT attributes for these tuple ids, aggregation using other dimension tables Joins between fact table and dimension tables for each selection predicate, aggregate grouping
 
								Invisible Join Previous approaches either used early materialization or the ones with late materialization had many out of order attribute extraction from the dimension tables
 
								Invisible Join Step 1
 
								Invisible Join Step 2
 
								Invisible Join Step 3
 
								Invisible Join Between-Predicate Rewriting Use of range predicates instead of hash lookup in step 1 of invisible join If the set of keys obtained for join in step 1 results in a complete range of keys Use of dictionary table in case of predicates on sorted fields Invisible join gave up to 75% improvement in some queries
 
								Break down of column store advantages t = block processing (T) C = compression (c) I = Invisible join (i) L = late Materialization(l)
 
								Conclusions Attempt to emulate the physical layout of a column-store in a row-store Broke down the reasons why a columnstore is able to process column-oriented data so efficiently Introduced Invisible join
 
								Questions
 
								Thank You
 Скачать презентацию Column-Stores vs Row-stores Harikrishna Bikmal 05 d 05019
	Скачать презентацию Column-Stores vs Row-stores Harikrishna Bikmal 05 d 05019
	5e82229e9c7c4ca7975f6726ccce6a37.ppt