Скачать презентацию Self-managing Grid Services for Efficient Data Management Anastasios Скачать презентацию Self-managing Grid Services for Efficient Data Management Anastasios

333b80d3fbed60ce0b1a7a29dc0b19a4.ppt

  • Количество слайдов: 38

Self-managing Grid Services for Efficient Data Management Anastasios Gounaris (University of Cyprus, University of Self-managing Grid Services for Efficient Data Management Anastasios Gounaris (University of Cyprus, University of Manchester) Core. GRID Summer School Budapest, September 2007

Web/Grid Services play an increasingly important role in large-scale data management applications E. g. Web/Grid Services play an increasingly important role in large-scale data management applications E. g. , Sky. Query (figure taken from [2]) Or more generic WSMSs that compile, optimize and evaluate workflows composed of multiple calls to remote WSs in order to analyze data [1] U. Srivastava, K. Munagala, J. Widom, and R. Motwani. Query optimization over web services. In VLDB, pages 355– 366, 2006. [2] Tanu Malik et al. Sky. Query: A Web Service Approach to Federate Databases. In CIDR 2003 2

At the middleware level: OGSA-DAI in brief • Different types of data resources - At the middleware level: OGSA-DAI in brief • Different types of data resources - including relational, XML and files - can be exposed via web services onto Grids. • A number of popular data resource products are supported. • Through OGSA-DAI, data resources can become gridenabled in a clean, non-intrusive, straightforward way. • “Perform” documents can describe/include data creation, query, modification and delivery tasks For more information, visit the OGSA-DAI website: http: //www. ogsadai. org. uk 3

Services For Accessing Multiple Grid Data Sources and Computational Resources Protein. ID Sequence …. Services For Accessing Multiple Grid Data Sources and Computational Resources Protein. ID Sequence …. …I’d like to find similar proteins to those proteins that have interacted with any other protein in more than X cases. . . …. Subquery 2 BLAST Protein. ID 1 Protein. ID 2 …. Subquery 1 …. Protein. ID Sequence …. 4

OGSA-DQP in brief • Provides services that enable nodes to participate in the evaluation OGSA-DQP in brief • Provides services that enable nodes to participate in the evaluation of a query • Additionally, it provides a coordinator service. • Presents to the user a unified view of the available (computational, analysis and OGSA-DAI) resources in a location-transparent way. OGSA-DQP ethos: • Light-weight, loosely-coupled integration of data and computational resources. • Wrapper-mediator plus: – Allocation of query sub-plans to arbitrary grid nodes. – Partitioned and pipelined parallelism. • Service-based in that: – Queries run over services. – Query processor is itself implemented as a collection of services. For more information, visit the OGSA-DQP website: http: //www. ogsadai. org. uk/dqp 5

Services For Semantic Integration - Extend OGSA-DQP with semantic integration capabilities Topic↔Field ↔School Author Services For Semantic Integration - Extend OGSA-DQP with semantic integration capabilities Topic↔Field ↔School Author ↔ Researcher↔Professor Semantic Map Researcher Topic …. Database …I’d like to find authors of papers from the database community… Subquery 3 Professor Author …. Database Subquery 1 Field School …. Database Subquery 2 [3] Gounaris, Comito, Sakellariou and Talia. A Service-Oriented System to Support Data Integration on Data Grids. CCGrid 2007 6

BUT, there are some problems - continuing the example of slide 4 0. 5 BUT, there are some problems - continuing the example of slide 4 0. 5 BLAST 1 BLAST 2 - IF, machines have the same capacity, connection speed and load, execution time will be t - BUT, IF the load of one machine doubles, then the execution time will be 2* t, whereas the optimum is 4/3 * t - ALSO, IF a third machine becomes available, we must be able to employ it - THUS, we need ADAPTIVE, SELF-MANAGING SOLUTIONS 7

Service tuning is non-trivial - Concave-like performance graphs are very common F(x) x - Service tuning is non-trivial - Concave-like performance graphs are very common F(x) x - BUT, finding the optimal point is challenging, since the volatility of the environment and the noise insert local optima and non-linearities - In addition, we may have functions of the form F(x, y, z, . . . ) instead of F(x) 8

Outline of (the rest of) this talk - Component-based adaptivity solutions - Monitoring - Outline of (the rest of) this talk - Component-based adaptivity solutions - Monitoring - Architecture - An example from OGSA-DQP - Control theoretical approaches 9

Monitoring with external component… … such as R-GMA: + RDBMS technology allows for expressive Monitoring with external component… … such as R-GMA: + RDBMS technology allows for expressive queries - Centralized - Cannot cope well with rapidly evolving properties - Cannot cover application specific metric (e. g. , per tuple join cost, selectivity of an operator) R-GMA 10

Self-monitoring - Can monitor both at machine level (e. g. , current load, memory Self-monitoring - Can monitor both at machine level (e. g. , current load, memory available, etc. ) and at application level (e. g. , per tuple cost of subquery, selectivities, etc. ) - No unnecessary information dissemination - … which results in significantly low overheads [4] Anastasios Gounaris, Norman W. Paton, Alvaro A. A. Fernandes, Rizos Sakellariou: "Self-monitoring Query Execution for Adaptive Query Processing". Data Knowl. Eng. 51(3): 325 -348 (2004) 11

Going one step further: Self-managing According to [5], an adaptive, self-managing element is as Going one step further: Self-managing According to [5], an adaptive, self-managing element is as follows: Main advantages: • components can be re-used • easier combinations of different adaptivity flavors • clean engineering • easier configuration [5] Jeffrey O. Kephart, David M. Chess: The Vision of Autonomic Computing. IEEE Computer 36(1): 41 -50 (2003). For more information visit: http: //www-306. ibm. com/autonomic 12

A Corresponding Adaptivity Framework for Grid Query Services (extension to OGSA-DQP) Resource pool Query A Corresponding Adaptivity Framework for Grid Query Services (extension to OGSA-DQP) Resource pool Query Evaluator query Execution info WS boundaries Resource info Monitoring Component Monitoring events Assessment Prompt for adaptation Assessment Component Issues with current execution Response Adaptivity phases Response Component Adaptivity components [6] Gounaris, Paton, Sakellariou, Fernandes, Smith, Watson: "Modular Adaptive Query Processing for Service -Based Grids ". Proc. of the 3 rd IEEE International Conference on Autonomic Computing, 2006 13 [7] A Gounaris Resource Aware Query Processing on the Grid, Ph. D thesis, 2005

Additional info • How can these inter-operate? – Through publish subscribe • What is Additional info • How can these inter-operate? – Through publish subscribe • What is the exact role of the adaptivity components – Depends on the adaptivity scenario that is to be supported 14

A dynamic balancing example Site A, B Evaluator Detector project op_call (Blast) exchange Site A dynamic balancing example Site A, B Evaluator Detector project op_call (Blast) exchange Site A Site B The Detector, Diagnoser and 1 Responder are specific Diagnoser Instantiations of the Monitoring, Assessment and subscribe table_scan (protein) Response Components, respectively Responder Site C Coordinator 15

A dynamic balancing example Evaluator Detector Site A 2 Evaluator Site B Detector Evaluators A dynamic balancing example Evaluator Detector Site A 2 Evaluator Site B Detector Evaluators pass on monitoring info to detectors Diagnoser Responder Coordinator Site A, B project op_call (Blast) exchange table_scan (protein) Site C 16

A dynamic balancing example Site A, B Evaluator Detector Site A Site B Diagnoser A dynamic balancing example Site A, B Evaluator Detector Site A Site B Diagnoser Responder Coordinator 3 Detectors notify the diagnoser of the actual behaviour project op_call (Blast) exchange table_scan (protein) Site C 17

A dynamic balancing example Site A, B Evaluator Detector Site A Site B Diagnoser A dynamic balancing example Site A, B Evaluator Detector Site A Site B Diagnoser Responder Coordinator 4 Diagnoser notifies responder project op_call (Blast) exchange table_scan (protein) Site C 18

A dynamic balancing example Site A, B Evaluator Detector Site A Site B Diagnoser A dynamic balancing example Site A, B Evaluator Detector Site A Site B Diagnoser Responder Coordinator 5 Responder checks progress project op_call (Blast) exchange table_scan (protein) Site C 19

A dynamic balancing example Site A, B Evaluator Detector project Site C op_call (Blast) A dynamic balancing example Site A, B Evaluator Detector project Site C op_call (Blast) exchange 6 Responder enforces redistribution Diagnoser Responder Coordinator table_scan (protein) Site C 20

Categories of Monitoring M 1: Notifications on the cost of processing a tuple in Categories of Monitoring M 1: Notifications on the cost of processing a tuple in a subquery (tuple processing cost, selectivity, time waiting for input). M 2: Notifications about communication costs (cost of sending a buffer, recipient of buffer, size of buffer). 21

Monitoring: Parameters M 1 notifications: – Events produced by evaluator every 10 tuples; – Monitoring: Parameters M 1 notifications: – Events produced by evaluator every 10 tuples; – Averages are calculated for last 25 events; – Threshold for sending notification is a 20% change in cost per tuple. M 2 notifications are for every buffer. 22

Assessment • Based o the monitoring info, calculate throughput for each subquery. • The Assessment • Based o the monitoring info, calculate throughput for each subquery. • The assessment component seeks to identify where there is a workload imbalance. – Existing Balance Vector W = (w 1, …, wn) – Proposed Balance Vector W’ = (w’ 1, …, w’n) • Notify response component if: – (| wi- w’i|) / wi) > threshold (20%) for some i. 23

Response • If evaluation not close to completion and If progress since last adaptation Response • If evaluation not close to completion and If progress since last adaptation > x% then apply new distribution vector W’ • response can be retrospective: resends tuples to consumer based on W’. • or prospective: no tuples are resent, but future tuples are sent to consumers based on W’. 24

Such a framework is general E. g. , - by extending the RC, it Such a framework is general E. g. , - by extending the RC, it can adapt in a different way (e. g. , by replacing a slow machine), while reusing the same MC and AC. - By extending the AC, it can solve a different problem (e. g. , checking also for vertical imbalances) - By deploying a hierarchy of components, problems can be solved locally and at different levels in order to attain scalability [8] Anastasios Gounaris, Jim Smith, Norman W. Paton, Rizos Sakellariou, Alvaro A. A. Fernandes, Paul Watson: "Adapting to Changing Resource Performance in Grid Query Processing". Proc. of the 1 st International Workshop on Data Management in Grids, DMG'05 (held in conjunction with VLDB 2005). 25

Outline of (the rest of) this talk - Component-based adaptivity solutions - Monitoring - Outline of (the rest of) this talk - Component-based adaptivity solutions - Monitoring - Architecture - An example from OGSA-DQP - Control theoretical approaches - Runtime optimization (hill climbing) - Extremum control - System identification 26

A real problem • We want to retrieve a large dataset from a WS A real problem • We want to retrieve a large dataset from a WS • It is widely accepted that due to performance limitations of XML parsers, SOAP, etc. , we must split the data into several chunks • The performance is shown in the figure 27

Main observations The optimal block size depends on – The actual connection – The Main observations The optimal block size depends on – The actual connection – The data itself (avg. tuple size) Main challenges: – No model – Significant noise • Local minima – Requirement for fast convergence 28

Runtime optimization Newton-based approach. • Let x be the block size and y=f(x) be Runtime optimization Newton-based approach. • Let x be the block size and y=f(x) be the transfer time. • One chunk transfer corresponds to one adaptivity step • Performance: LOW! (Such a technique is known to be sensitive to noise, and works better for quadratic models) 29

On extremum control Objective: find the (optimal) value of the control variable(s) that yields On extremum control Objective: find the (optimal) value of the control variable(s) that yields the maximum (minimum) value of a process output (or performance measure) in the presence of noise. xk y = f(x) yk Extremum controller 30

Switching extremum control Let y be the performance metric and xk the block size Switching extremum control Let y be the performance metric and xk the block size at the kth step. Then xk= xk-1 – gain* sign(Δx*Δy) Rationale: – detect the side of the optimum point where the current block size resides on. Intuition behind: – the next block size must be greater than the previous one, if, in the last step, an increase has lead to performance improvement, or a decrease has lead to performance degradation. – Otherwise, the block size must become smaller. 31

Extensions • Linear models are not suitable, thus it is better the gain to Extensions • Linear models are not suitable, thus it is better the gain to be adaptive gain = b* Δx*Δy/y • The impact of noise must be mitigated. To this end, the values of x, y are the average over a window • To enable continuous search of the block size space, add a dither signal 32

Evaluation • We used 5 queries over data from the TPC-H benchmark 33 Evaluation • We used 5 queries over data from the TPC-H benchmark 33

Results Starting point: 1000 tuples, b = 15 The performance degradation drops by an Results Starting point: 1000 tuples, b = 15 The performance degradation drops by an order of magnitude or less 34

Intra-query behavior • The technique is characterized by stability and quick convergence • However, Intra-query behavior • The technique is characterized by stability and quick convergence • However, in some cases, overshooting is avoided only by imposing hard upper and lower limits. • Also, the oscillation in individual runs are rather significant 35

System identification Choose a model, e. g. , Or, … and apply (Recursive) Least System identification Choose a model, e. g. , Or, … and apply (Recursive) Least Squares. Then, optimal point can be found analytically Which model(s) to choose in each case, remains an open issue! 36

Summary Remarks - Component-based adaptivity solutions - Enable a wide range of adaptive solutions Summary Remarks - Component-based adaptivity solutions - Enable a wide range of adaptive solutions - Decentralized adaptations - Services are self-monitoring - In line with the IBM autonomic toolbox - Control theoretical approaches - seem very promising for self-optimization when a performance metric must be optimized on the fly - In the presence of noise … - … and absence of a model 37

Thank you! Questions? 38 Thank you! Questions? 38