data:image/s3,"s3://crabby-images/68abb/68abb56c7f787cd2955a41f2e3ff1d7b7c5854f2" alt="Скачать презентацию The State of the Art in Distributed Query Скачать презентацию The State of the Art in Distributed Query"
Number of slides: 23
The State of the Art in Distributed Query Processing
by Donald Kossmann
Presented by Chris Gianfrancesco
Introduction
- Distributed database technology is becoming an increasingly attractive enhancement to many database systems, because of:
  - Cost and scalability
  - Software integration
    - Legacy systems
  - New applications
  - Market forces
Introduction
- Topics covered in this paper:
  - Basics of distributed query processing
  - Client-server distributed DB models
  - Heterogeneous distributed DB models
  - Data placement techniques
  - Other distributed architectures
Client-Server Database Systems
- Relationships between distributed nodes take a client-server form
- Client: makes requests of the servers; usually the source of queries
- Server: responds to client requests; usually the source of data
- System architectures: peer-to-peer, strict client-server, middleware/multitier
Architectures: Peer-to-Peer
- All nodes are equivalent
- Each node can act as either a client or a server on demand (it can store data and/or make requests)
- Example: the SHORE system
[Diagram: each peer node acts as a server or a client]
Architectures: Strict Client-Server
- Client or server status is predefined and never changes
- Clients supply queries; servers supply data
- Most common architecture in commercial DBMSs
[Diagram: client = query source; server = data source]
Architectures: Middleware/Multitier
- Multiple levels of client-server interaction
- Each node acts as a client to the nodes below it and as a server to the nodes above it
- Examples: SAP R/3, web servers with DB backends
[Diagram: Node 1 is a client to Node 2; Node 2 is a server to Node 1 and a client to Node 3; Node 3 is a server to Node 2]
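The chain of roles on this slide can be sketched as a toy Python model. The `Node` class and the tier names `web`, `app`, and `db` are hypothetical illustrations, not part of SAP R/3 or any real middleware; each node serves the caller above it and is a client of the node below it.

```python
from typing import Callable, Optional


class Node:
    """One tier in a hypothetical multitier chain (illustration only)."""

    def __init__(self, name: str, below: Optional["Node"] = None,
                 local: Optional[Callable[[str], str]] = None) -> None:
        self.name = name
        self.below = below  # the node this one is a client of
        self.local = local  # bottom tier only: answers requests from its own data

    def handle(self, request: str) -> str:
        # Acting as a server to whoever called us...
        if self.below is None:
            if self.local is None:
                raise ValueError(f"{self.name} has neither a lower tier nor local data")
            return self.local(request)
        # ...and as a client to the node below.
        reply = self.below.handle(request)
        return f"{self.name}({reply})"


db = Node("db", local=lambda q: f"rows for {q!r}")  # Node 3
app = Node("app", below=db)                          # Node 2
web = Node("web", below=app)                         # Node 1
print(web.handle("orders"))  # web(app(rows for 'orders'))
```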
Architectures: Evaluation
- Peer-to-peer
  - Simplest setup
  - Equal load sharing
- Strict client-server
  - Specialization
  - Administration needed for servers only
- Middleware/multitier
  - Functionality integration
  - Scalability
Client-Server Query Processing
- Queries are initiated at clients; data is stored at servers
- Where do we execute the query?
- Query shipping: move the query down to the data
- Data shipping: move the data up to the query
- Hybrid shipping: a combination of both
Query Shipping
- SQL query code is sent down to the server
- The server parses and evaluates the query, then returns the result
- Used in DB2, Oracle, and MS SQL Server
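A minimal sketch of query shipping, assuming an in-memory SQLite database stands in for the server (the `Server` class, the `emp` table, and its rows are hypothetical). Only the SQL text travels down, and only the result rows travel back up.

```python
import sqlite3


class Server:
    """Toy server that owns the data and evaluates shipped SQL (hypothetical)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
        self.conn.executemany(
            "INSERT INTO emp VALUES (?, ?, ?)",
            [("ann", "db", 120), ("bob", "os", 95), ("cy", "db", 101)],
        )

    def execute(self, sql: str) -> list:
        # Parsing, optimization, and evaluation all happen here, next to the data.
        return self.conn.execute(sql).fetchall()


server = Server()
# Query shipping: the client sends only the query text down to the server.
result = server.execute("SELECT name FROM emp WHERE dept = 'db' ORDER BY name")
print(result)  # [('ann',), ('cy',)]
```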
Data Shipping
- The client parses the query and requests data from the server
- The server provides the data; the client then executes the query
- Data can be cached at the client (in main memory or on disk)
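A minimal sketch of data shipping with a client cache, assuming a toy in-process `DataServer` that returns whole tables (all names are hypothetical). The client evaluates the predicate itself, and a second query over the same table is answered from the cache without contacting the server.

```python
from typing import Callable


class DataServer:
    """Toy server that only hands out stored rows (hypothetical)."""

    def __init__(self, tables: dict) -> None:
        self.tables = tables
        self.requests = 0  # round trips made to this server

    def fetch(self, table: str) -> list:
        self.requests += 1
        return [dict(row) for row in self.tables[table]]  # copies "sent over the wire"


class Client:
    def __init__(self, server: DataServer) -> None:
        self.server = server
        self.cache: dict = {}  # client-side cache, kept in main memory here

    def rows(self, table: str) -> list:
        if table not in self.cache:  # cache miss: ship the data up to the client
            self.cache[table] = self.server.fetch(table)
        return self.cache[table]

    def select(self, table: str, pred: Callable[[dict], bool]) -> list:
        # The client executes the query itself over the shipped rows.
        return [row for row in self.rows(table) if pred(row)]


server = DataServer({"emp": [{"name": "ann", "dept": "db"},
                             {"name": "bob", "dept": "os"}]})
client = Client(server)
print(client.select("emp", lambda r: r["dept"] == "db"))  # [{'name': 'ann', 'dept': 'db'}]
client.select("emp", lambda r: r["dept"] == "os")         # served from the cache
print(server.requests)  # 1
```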
Hybrid Shipping
- Mix and match data shipping and query shipping
- Query parts can be executed at any level, as specified by the query plan
- Data is cached when beneficial
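The effect of per-operator site choices can be sketched with a linear pipeline in which every operator is tagged "server" or "client" (a simplification; real plans are trees). The `run` function and its cost measure, rows crossing the network, are assumptions for illustration. They show how the same operators yield query shipping, data shipping, or a hybrid depending only on the tags.

```python
from typing import Callable, List, Tuple

Row = dict
Step = Tuple[str, str, Callable[[List[Row]], List[Row]]]  # (name, site, function)


def run(plan: List[Step], base: List[Row]) -> Tuple[List[Row], int]:
    """Run a linear pipeline whose base data lives at the server.

    Returns the result and the number of rows that crossed the network.
    """
    site, rows, shipped = "server", base, 0
    for _name, op_site, fn in plan:
        if op_site != site:  # the operator runs elsewhere: ship its input there
            shipped += len(rows)
            site = op_site
        rows = fn(rows)
    if site != "client":  # results are always displayed at the client
        shipped += len(rows)
    return rows, shipped


def keep_threes(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["v"] == 3]  # selective filter (10% of rows)


def project_id(rows: List[Row]) -> List[Row]:
    return [{"id": r["id"]} for r in rows]


base = [{"id": i, "v": i % 10} for i in range(1000)]
query_ship = [("filter", "server", keep_threes), ("project", "server", project_id)]
data_ship = [("filter", "client", keep_threes), ("project", "client", project_id)]
hybrid = [("filter", "server", keep_threes), ("project", "client", project_id)]
for name, plan in [("query", query_ship), ("data", data_ship), ("hybrid", hybrid)]:
    print(name, run(plan, base)[1])  # query 100, data 1000, hybrid 100
```

All three plans compute the same answer; only the amount of data moved differs.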
Evaluation
- Query shipping
  - Reliant on server performance
  - Scales poorly with increasing client load
- Data shipping
  - Good scalability
  - High communication costs
- Hybrid shipping
  - Potential to outperform the other options
  - More complex optimization
Hybrid Shipping Observations
- Some observations about what performs best under hybrid shipping:
- Not using the client cache can be preferable
  - If the network transfer cost is lower than the cost of reading the client's cache
- Shipping cached data down to the server
  - If the data is cached in client main memory and the operator executes at the server
- Multiple small updates
  - Keep them at the client and post them to the server only when necessary
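The first two rules of thumb above can be written as a small decision function. The function name, parameter names, and single-number costs are hypothetical simplifications; a real optimizer would fold these choices into its cost model.

```python
from typing import Optional


def read_source(op_site: str, cached: Optional[str],
                net_cost: float, client_read_cost: float) -> str:
    """Decide where an operator reads its input (hypothetical rule-of-thumb sketch).

    op_site: "client" or "server", where the operator executes.
    cached:  None, "memory", or "disk", where the client holds a cached copy.
    net_cost / client_read_cost: estimated cost of fetching the data from the
    server versus reading it from the client's cache.
    """
    if op_site == "server":
        # Ship cached data down if it sits in client main memory.
        return "client cache, shipped down" if cached == "memory" else "server"
    if cached is None:
        return "server"
    # Skip the client cache when network transfer is cheaper than reading it.
    return "server" if net_cost < client_read_cost else "client cache"


print(read_source("client", "disk", net_cost=1.0, client_read_cost=5.0))    # server
print(read_source("client", "memory", net_cost=1.0, client_read_cost=0.1))  # client cache
print(read_source("server", "memory", net_cost=1.0, client_read_cost=0.1))  # client cache, shipped down
```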
Query Optimization
- Query plans must also specify where each piece of the query is executed
- Data shipping: all execution is done at the client
- Query shipping: all execution is done at the server
- Hybrid shipping: the site can be chosen for each operator
- Results are always displayed to the user at the client
Distributed Query Plans
- Each operator is annotated with a logical site of execution, so plans are shareable
- client means the operator is executed at the client where the query was issued
- server means:
  - for scan operators, execute at a location that has the necessary data
  - for updates, execute at all locations that hold the relevant data
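A sketch of how logical annotations might be bound to physical sites at execution time, which is what lets one plan be shared by many clients. The `Op` dataclass, the `placement` map, and the choice to raise an error for `server` on any other operator kind are assumptions; the slide only defines `server` for scans and updates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Op:
    kind: str                    # "scan", "update", "join", "display", ...
    site: str                    # logical annotation: "client" or "server"
    table: Optional[str] = None  # only scans and updates name a table
    children: List["Op"] = field(default_factory=list)


def bind(op: Op, issuing_client: str,
         placement: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Resolve logical sites to physical ones, listing operators in pre-order."""
    if op.site == "client":
        sites = [issuing_client]              # wherever this query was issued
    elif op.kind == "scan" and op.table is not None:
        sites = placement[op.table][:1]       # any one location holding the data
    elif op.kind == "update" and op.table is not None:
        sites = list(placement[op.table])     # every location holding the data
    else:
        raise ValueError(f"no 'server' rule for operator {op.kind!r} in this sketch")
    bound = [(op.kind, sites)]
    for child in op.children:
        bound += bind(child, issuing_client, placement)
    return bound


plan = Op("display", "client", children=[
    Op("join", "client", children=[Op("scan", "server", "emp"),
                                   Op("scan", "server", "dept")]),
])
placement = {"emp": ["s1", "s2"], "dept": ["s3"]}
print(bind(plan, "alice", placement))
# [('display', ['alice']), ('join', ['alice']), ('scan', ['s1']), ('scan', ['s3'])]
print(bind(plan, "bob", placement)[0])  # ('display', ['bob']): same plan, another client
```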
Query Optimization: Where?
- Should optimization occur at the client or at the server?
- At the client: less load on the servers, better scalability
- At the server: more information about system statistics, especially server loads
- Potential solution: initial parsing and query rewriting at the client, further optimization at the server
Query Optimization: Statistics
- Even when optimization is done at a server, that server usually does not have full knowledge of the system
- The system can either:
  - Guess the status of other servers: less accurate, but cheaper
  - Ask other servers for their status: fully accurate, but adds communication cost
Query Optimization: When?
- Trade-off between accuracy and cost
- Traditional style: optimize once, store the plan
  - No support for changing DB conditions
  - No optimization cost incurred at query execution time
- Plan sets: optimize for several possible scenarios
  - Generate a few query plans for different conditions
  - Choose among them at runtime based on current statistics
- On-the-fly: observe intermediate results
  - Re-optimize the query if they differ from expectations
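The plan-set and on-the-fly strategies can be sketched as two small helpers. The cost formulas, statistic names, and the factor-of-two threshold are hypothetical placeholders, not values from the paper.

```python
from typing import Callable, Dict

Stats = Dict[str, float]


def choose_plan(plans: Dict[str, Callable[[Stats], float]], stats: Stats) -> str:
    """Plan sets: pick the precompiled plan with the lowest cost under current stats."""
    return min(plans, key=lambda name: plans[name](stats))


def should_reoptimize(expected_rows: float, observed_rows: float,
                      factor: float = 2.0) -> bool:
    """On-the-fly: re-plan if an intermediate result is off by more than `factor`."""
    return not (expected_rows / factor <= observed_rows <= expected_rows * factor)


# Hypothetical cost formulas produced at compile time, one per plan.
plans = {
    "query_ship": lambda s: 10 * s["server_load"],
    "data_ship": lambda s: 8 * s["net_latency"],
}
print(choose_plan(plans, {"server_load": 0.2, "net_latency": 1.0}))  # query_ship (2 < 8)
print(choose_plan(plans, {"server_load": 0.9, "net_latency": 0.5}))  # data_ship (4 < 9)
print(should_reoptimize(expected_rows=100, observed_rows=1000))       # True
```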
Query Optimization: Two-Step
- Compile time: determine the join order, etc.
- Runtime: perform site selection
- Reasonable cost at each step
- Responds well to changing server loads
- Fully utilizes client data caching
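A toy version of the runtime step, assuming the operator order is already fixed and each operator has a list of candidate sites (including the client when the needed data is cached there). Choosing the least-loaded candidate is a stand-in heuristic, not the paper's algorithm, and the operator and site names are hypothetical.

```python
from typing import Dict, List


def select_sites(join_order: List[str], candidates: Dict[str, List[str]],
                 load: Dict[str, float]) -> Dict[str, str]:
    """Runtime step: keep the compiled operator order, pick a site for each operator.

    Least-loaded-candidate is a stand-in heuristic for a real cost model.
    """
    return {op: min(candidates[op], key=lambda site: load[site]) for op in join_order}


order = ["scan_emp", "scan_dept", "join"]  # fixed at compile time
candidates = {
    "scan_emp": ["s1", "client"],           # emp is assumed cached at the client
    "scan_dept": ["s2"],
    "join": ["s1", "s2", "client"],
}
print(select_sites(order, candidates, {"s1": 0.9, "s2": 0.3, "client": 0.5}))
# {'scan_emp': 'client', 'scan_dept': 's2', 'join': 's2'}
print(select_sites(order, candidates, {"s1": 0.1, "s2": 0.7, "client": 0.5}))
# {'scan_emp': 's1', 'scan_dept': 's2', 'join': 's1'}
```

The join order never changes between the two calls; only the site assignment reacts to the new loads.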
Two-Step Optimization: Downside
1. An "optimal" plan is generated traditional-style
2. Site selection is performed
3. The true optimal plan was missed
- The optimum was missed because the first optimization step was done with no knowledge of the system
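The downside can be made concrete with hypothetical cost numbers: step 1 ranks join orders by a site-independent estimate, step 2 picks the best site for the chosen order, and the combination misses the cheapest order/site pair.

```python
# Hypothetical costs. Step 1 sees only a site-independent estimate per join order.
compile_cost = {"(R join S) join T": 10, "(R join T) join S": 12}
# Actual cost once a site is chosen, communication included.
actual_cost = {
    ("(R join S) join T", "client"): 30, ("(R join S) join T", "server"): 25,
    ("(R join T) join S", "client"): 14, ("(R join T) join S", "server"): 20,
}

order = min(compile_cost, key=lambda o: compile_cost[o])                 # step 1
site = min(("client", "server"), key=lambda s: actual_cost[(order, s)])  # step 2
two_step = actual_cost[(order, site)]
best_plan = min(actual_cost, key=lambda k: actual_cost[k])
print(order, site, two_step)              # (R join S) join T server 25
print(best_plan, actual_cost[best_plan])  # ('(R join T) join S', 'client') 14
```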
Query Execution Techniques
- Standard fare: row blocking, multithreading when possible
- Issue: transactions that mix updates and retrieval queries under hybrid shipping
  - We want to delay propagating updates to the server for efficiency's sake
  - Other option: execute the query before the update and temporarily pad the results
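Row blocking combined with a producer thread can be sketched as follows. `row_blocks` groups a row stream into fixed-size blocks so that one message carries many rows, and `ship` lets a hypothetical sender thread fill blocks while the consumer processes earlier ones; both names and the bounded queue size are assumptions for illustration.

```python
import queue
import threading
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def row_blocks(rows: Iterable[T], block_size: int) -> Iterator[List[T]]:
    """Row blocking: group a row stream so each message carries many rows."""
    it = iter(rows)
    while block := list(islice(it, block_size)):
        yield block


def ship(rows: Iterable[T], block_size: int, max_in_flight: int = 2) -> Iterator[List[T]]:
    """Hypothetical sender thread that fills blocks while the consumer works."""
    q: queue.Queue = queue.Queue(maxsize=max_in_flight)  # bounds buffered blocks

    def producer() -> None:
        for block in row_blocks(rows, block_size):
            q.put(block)  # waits when the consumer falls behind
        q.put(None)       # end-of-stream marker (real blocks are never empty)

    threading.Thread(target=producer, daemon=True).start()
    while (block := q.get()) is not None:
        yield block


print(list(row_blocks(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(sum(len(b) for b in ship(range(10_000), 500)))  # 10000 rows in 20 messages
```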
Questions?
Comments?