
c53ed6a6b68af38b492b7d3f4054a1c0.ppt
- Количество слайдов: 39
How to Select an Analytic DBMS Overview, checklists, and tips by Curt A. Monash, Ph. D. President, Monash Research Editor, DBMS 2 contact @monash. com http: //www. DBMS 2. com
Curt Monash Analyst since 1981, own firm since 1987 Publicly available research Covered DBMS since the pre-relational days Also analytics, search, etc. Blogs, including DBMS 2 (www. DBMS 2. com -- the source for most of this talk) Feed at www. monash. com/blogs. html White papers and more at www. monash. com User and vendor consulting
Our agenda Why are there such things as specialized analytic DBMS ? What are the major analytic DBMS product alternatives? What are the most relevant differentiations among analytic DBMS users? What’s the best process for selecting an analytic DBMS?
Why are there specialized analytic DBMS? General-purpose database managers are optimized for updating short rows … … not for analytic query performance 10 -100 X price/performance differences are not uncommon At issue is the interplay between storage, processors, and RAM
Moore’s Law, Kryder’s Law, and a huge exception Growth factors: Transistors/chip : >100, 000 since 1971 Disk density: >100, 000 since 1956 Disk speed: 12. 5 since 1956 The disk speed barrier dominates everything! 3/16/2018 DRAFT!! THIRD TEST!!
Software strategies to optimize analytic I/O Minimize data returned Minimize index accesses Page size Precalculate results Classic query optimization Materialized views OLAP cubes Return data sequentially Store data in columns Stash data in RAM
Hardware strategies to optimize analytic I/O Lots of RAM Parallel disk access!!! Lots of networking Tuned MPP (Massively Parallel Processing) is ideal. “Recommended configurations” are a mixed bag.
Specialty hardware strategies Custom or unusual chips (rare) Custom or unusual interconnects Fixed configurations of common parts Appliances or recommended configurations And there’s also Saa. S.
18 contenders (and there are more) Aster Dataupia Exasol Greenplum HP Neoview IBM DB 2 BCUs Infobright/My. SQL Kickfire/My. SQL Kognitio Microsoft Madison Netezza Oracle Exadata Oracle w/o Exadata Par. Accel SQL Server w/o Madison Sybase IQ Teradata Vertica
General areas of feature differentiation Most influenced by architecture Query performance Update/load performance Alternate datatypes Most influenced by product maturity Compatibilities Advanced analytics Manageability and availability Encryption and security
Major analytic DBMS product groupings Architecture is a good first categorization Traditional OLTP Row-based MPP Columnar (Not covered tonight) MOLAP/array-based
Traditional OLTP examples Oracle (especially pre-Exadata) IBM DB 2 (especially mainframe) Microsoft SQL Server (pre-Madison)
Analytic optimizations for OLTP DBMS Performance Two major kinds of precalculation Star indexes Materialized views Other specialized indexes Query optimization tools Other OLAP extensions SQL 2003 Other embedded analytics
Drawbacks Complexity and people cost Hardware cost Software cost Absolute performance
Legitimate use scenarios When TCO isn’t an issue When specialized features matter Undemanding performance (and therefore administration too) OLTP-like Integrated MOLAP Edge-case analytics Rigid enterprise standards Small enterprise/true single-instance
Row-based MPP examples Teradata DB 2 (open systems version) Netezza Oracle Exadata (sort of) DATAllegro/Microsoft Madison Greenplum Aster Data Kognitio HP Neoview
Typical design choices in row-based MPP “Random” (hashed or round-robin) data distribution among nodes Large block sizes Limited indexing alternatives Suitable for scans rather than random accesses Or little optimization for using the full boat Carefully balanced hardware High-end networking
Tradeoffs among row MPP alternatives Enterprise standards Vendor size Hardware lock-in Total system price Features
Columnar DBMS examples Sybase IQ Vertica Info. Bright SAND Par. Accel Kickfire Exasol Monet. DB SAP BI Accelerator (sort of)
Columnar pros and cons Bulk retrieval is faster Pinpoint I/O is slower Compression is easier Memory-centric processing is easier MPP is not as crucial Being columnar reduces I/O So does (better) compression
Segmentation made (too) simple One database to rule them all One analyticdatabase to rule them all Frontlineanalytic database Very, very big analytic database Big analytic database handled very costeffectively
Basics of systematic segmentation Use cases Metrics Platform preferences There isn’t just one checklist.
Use cases – a first cut Light reporting Diverse EDW Big Data Operational analytics
Metrics – a first cut Total raw/user data Total concurrent users Below 1 -2 TB, references abound 10 TB is another major breakpoint 5, 15, 50, or 500? Data freshness Hours Minutes Seconds
Basic platform issues Enterprise standards Appliance-friendliness Need for MPP? Cloud/Saa. S
The selection process in a nutshell Figure out what you’re trying to buy Make a shortlist Do free POCs* Evaluate and decide *The only part that’s even slightly specific to the analytic DBMS category
Figure out what you’re trying to buy Inventory your use cases Set constraints Current Known future Wish-list/dream-list future People and platforms Money Establish target SLAs Must-haves Nice-to-haves
Use-case checklist -- generalities Database growth As time goes by … More detail New data sources Users (human) Users/usage (automated) Freshness (data and query results)
Use-case checklist – traditional BI Reports Dashboards and alerts Today Future Latency Ad-hoc Users Now that we have great response time …
Use-case checklist – predictive analytics How much do you think it would improve results to Run more models? Model on more data? Add more variables? Increase model complexity? Which of those can the DBMS help with anyway? What about scoring? Real-time Other latency issues
SLA realism What kind of turnaround truly matters? Customer or customer-facing users Executive users Analyst users How bad is downtime? Customer or customer-facing users Executive users Analyst users
Short list constraints Cash cost Deployment effort But purchases are heavily negotiated Appliances can be good Platform politics You might as well consider incumbent(s) Appliances can be frowned on
Filling out the shortlist Who matches your requirements in theory? What kinds of evidence do you require? References? How many? How relevant? A careful POC? Analyst recommendations? General “buzz”?
A checklist for shortlists What’s your tolerance for specialized hardware? What’s your tolerance for set-up effort? What’s your tolerance for ongoing administration? What are your insert and update requirements? At what volumes will you run fairly simple queries? What are your complex queries like? For which third-party tools do you need support? and, most important, Are you madly in love with your current DBMS?
Proof-of-Concept basics The better you match your use cases, the more reliable the POC is Most of the effort is in the set-up You might as well do POCs for several vendors – at (almost) the same time! Where is the POC being held?
The three big POC challenges Getting data Real? Synthetic? Hybrid? Picking queries Politics Privacy And more? Realistic simulation(s) Workload Platform Talent
POC tips Don’t underestimate requirements Don’t overestimate requirements Get SOME data ASAP Don’t leave the vendor in control Test what you’ll actually be buying Use the baseball bat
Evaluate and decide It all comes down to Cost Speed Risk and in some cases Time to value Upside
Further information Curt A. Monash, Ph. D. President, Monash Research Editor, DBMS 2 contact @monash. com http: //www. DBMS 2. com
c53ed6a6b68af38b492b7d3f4054a1c0.ppt