- Number of slides: 21
The State-Space Approach to Self-Management of Enterprise Systems
Vibhore Kumar, Karsten Schwan, Subu Iyer*, Yuan Chen*, Akhil Sahai*
Georgia Institute of Technology
*Hewlett-Packard Labs
Outline
- Motivation: Enterprise Complexity Issues
- Solution Overview
- Policy-Driven Self-Management
- Dynamic SLA Decomposition
- Results
- Future Work
Enterprise Complexity: Some Facts
- From a survey conducted by Forrester Research:
  - Enterprises now devote 80% of their overall IT budget to maintenance and ongoing operations
  - More than half of the 347 participating companies used at least 3 database vendors
  - A major banking-industry client had 18 different travel and expense systems in the organization
- “VP of IT Governance”: the title alone says a lot about the state of enterprise IT infrastructure
The Complexity Wall
“If we don’t get a handle on complexity, it will stop the expansion”
  - Paul Horn, Senior Vice President, IBM Research
“Our enterprise customers are working with enormous complexity”
  - Dick Lampman, Former Director, HP Labs
The Complexity Wall @ Worldspan
- Worldspan, one of our industry collaborators, provides services to the travel industry
- One of their airline ticket pricing/availability services is hosted on a farm of 1400 servers
- In 2006 alone, they processed around 9.6 billion messages
- Highly varying request rates and request type mix
- Several behaviors of their system are not well understood
  - Effects of Ticket Geography
  - Effects of Cache Refresh Time
  - Effects of Time of Day
  - …
To Handle The Complexity…
- One must enable self-management of complex enterprise infrastructures driven by high-level goals
Enterprise Self-Management: The Hurdles
- Enterprise systems are too big
  - The problem of Scale
- It is tough to relate high-level goals to low-level actions
  - The problem of Complex System Modeling
- The operating environment is very dynamic
  - The problem of Dynamism
- Administrators find it hard to trust black-box solutions
  - The problem of Trust & Tractability
Solution Overview: System State-Space
[Figure: Enterprise System; Monitored System Variables and Monitored Component Variables form the System State Space]
- System State Space: V = (v1, v2, v3, ..., vn)
- Variables of Interest Vø ⊆ V, e.g. Response-Time, QoI
- Controllable Variables Vα ⊆ V, e.g. Allocated-Servers, Memory
- The aim is to establish a relation between Vø and Vα under current operating conditions
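One way to picture the split of a state V into Vø, Vα, and the remaining variables is the minimal sketch below. The variable names and the choice of which variables are "of interest" or "controllable" are assumptions made for illustration; the slide gives them only as examples.

```python
from typing import Dict, Tuple

State = Dict[str, float]

# Hypothetical variable names; the slides name these only as examples.
INTEREST = {"response_time"}          # Vø: variables of interest
CONTROLLABLE = {"allocated_servers"}  # Vα: controllable variables


def split_state(state: State) -> Tuple[State, State, State]:
    """Split a state V into (Vø, Vα, V − (Vα ∪ Vø))."""
    interest = {k: v for k, v in state.items() if k in INTEREST}
    control = {k: v for k, v in state.items() if k in CONTROLLABLE}
    context = {
        k: v
        for k, v in state.items()
        if k not in INTEREST and k not in CONTROLLABLE
    }
    return interest, control, context


# An illustrative observed state (values are assumptions).
state = {
    "allocated_servers": 1,
    "bandwidth": 90,
    "request_rate": 30,
    "response_time": 12,
}
interest, control, context = split_state(state)
```

Here `context` plays the role of the "current operating conditions" under which the relation between Vø and Vα is sought.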
Simple Automated Operation
- SLO: “Response Time < 10 msec”
  - Event: SLO Violation
  - Condition: Bandwidth = 90 Mbps, Request Rate = 30
  - Action: set Allocated Servers to 3
- The action relates Vα to Vø, given the remaining variables V − (Vα ∪ Vø)
[Figure: state table with columns Allocated Servers, Bandwidth, Request Rate, Response Time; extracted cell values: 3, 1, 90, 30, 12, 12, 8, 9]
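The event-condition-action rule on this slide can be sketched as a small data structure plus an evaluation function. This is an illustration only: the class, the function names, exact-match condition checking, and the starting value of 1 allocated server are all assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

State = Dict[str, float]


@dataclass
class Policy:
    event: str                   # triggering event, e.g. an SLO violation
    condition: Dict[str, float]  # operating conditions that must match exactly
    action: Dict[str, float]     # values to assign to controllable variables


def slo_violated(state: State, limit_msec: float = 10.0) -> bool:
    """SLO from the slide: Response Time < 10 msec."""
    return state["response_time"] >= limit_msec


def evaluate(policy: Policy, event: str, state: State) -> Optional[Dict[str, float]]:
    """Return the policy's action if event and condition match, else None."""
    if event != policy.event:
        return None
    if any(state.get(k) != v for k, v in policy.condition.items()):
        return None
    return dict(policy.action)


policy = Policy(
    event="slo_violation",
    condition={"bandwidth": 90, "request_rate": 30},
    action={"allocated_servers": 3},
)
# Assumed current state: 1 server, 90 Mbps, request rate 30, 12 msec response time.
state = {"allocated_servers": 1, "bandwidth": 90, "request_rate": 30, "response_time": 12}
event = "slo_violation" if slo_violated(state) else "none"
action = evaluate(policy, event, state)
```

A real system would likely match conditions over ranges rather than exact values; exact matching keeps the sketch short.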
Solution Overview: The Function
- Learn from observed system states
- But there are problems
  - Different behavior in different sub-spaces
  - Large state space, |V| ≈ 10^2 to 10^3
[Figure: observed system states (v1, v2, ..., vn) fed into Machine Learning; sub-spaces labeled CPU Bottleneck and Network Bottleneck]
Solution Overview: The Function
- We decided to model the system using multiple µ-models
- We intelligently partition the set of observed system states (v1, v2, ..., vn)
  - partitions exhibit homogeneous behavior
  - partitions have a reduced number of relevant variables
- Partitioning & µ-Modeling solve two problems!
  - The problem of Scale
  - The problem of Complex System Modeling
[Figure: Reduced Number of Relevant Variables in a µ-model]
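The slides do not say how the partitioning is done, so the sketch below substitutes a deliberately simple stand-in: states are grouped by a hand-written bottleneck rule, and a variable counts as "relevant" to a partition if it takes more than one value there. The rule, the variable names, and the data are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

State = Dict[str, float]


def partition_states(states: List[State], key: Callable[[State], str]) -> Dict[str, List[State]]:
    """Group observed states by a partition label computed from each state."""
    parts: Dict[str, List[State]] = defaultdict(list)
    for s in states:
        parts[key(s)].append(s)
    return dict(parts)


def relevant_variables(states: List[State]) -> Set[str]:
    """Variables that take more than one value within the partition."""
    names: Set[str] = set().union(*(s.keys() for s in states))
    return {n for n in names if len({s.get(n) for s in states}) > 1}


def bottleneck(s: State) -> str:
    # Hypothetical rule standing in for the authors' partitioning method.
    return "cpu" if s["cpu_util"] > s["net_util"] else "network"


observed = [
    {"cpu_util": 0.90, "net_util": 0.2, "response_time": 12},
    {"cpu_util": 0.95, "net_util": 0.2, "response_time": 14},
    {"cpu_util": 0.30, "net_util": 0.8, "response_time": 11},
    {"cpu_util": 0.30, "net_util": 0.9, "response_time": 13},
]
parts = partition_states(observed, bottleneck)
relevant = {name: relevant_variables(group) for name, group in parts.items()}
```

In this toy data each partition ends up with two relevant variables instead of three, which is the effect the slide attributes to partitioning.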
Solution Overview: µ-Models
- We use Tree Augmented Naïve Bayes (TAN) Classifier to build µ-models
- The model returns the following probability: γ = Pr(Vα | Vdesired)
- Find the assignment of values to variables in Vα that maximizes the probability of moving the system to the desired state
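The search for the best assignment to Vα can be sketched without a TAN classifier by estimating γ = Pr(Vα | Vdesired) as an empirical frequency over observed states. This frequency count is a stand-in chosen for brevity, not the TAN model the slides describe; the function name, variable names, and data are assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]


def best_assignment(
    observed: List[State],
    control_vars: List[str],
    is_desired: Callable[[State], bool],
) -> Tuple[Optional[Dict[str, float]], float]:
    """Return (assignment, gamma) for the most frequent Vα values among desired states.

    gamma is the empirical estimate of Pr(Vα = assignment | desired state).
    """
    desired = [s for s in observed if is_desired(s)]
    if not desired:
        return None, 0.0
    counts = Counter(tuple(s[v] for v in control_vars) for s in desired)
    values, n = counts.most_common(1)[0]
    return dict(zip(control_vars, values)), n / len(desired)


observed = [
    {"allocated_servers": 1, "response_time": 12.0},
    {"allocated_servers": 3, "response_time": 9.0},
    {"allocated_servers": 3, "response_time": 8.0},
    {"allocated_servers": 2, "response_time": 9.5},
]
assignment, gamma = best_assignment(
    observed,
    control_vars=["allocated_servers"],
    is_desired=lambda s: s["response_time"] < 10,
)
```

With this data, three observed states meet the goal and two of them use 3 servers, so the sketch picks that assignment with γ = 2/3.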
Solution Approach: Dynamism
- As the system keeps running, more system states are generated, which could be incorporated into the µ-models
- µ-models are easier to update than monolithic system models
- As a result of a µ-model update:
  - Policy Invalidation
  - Policy Adaptation
  - New Policies can Result
- This addresses the problem of Dynamism
Solution Approach: Tractability & Trust
- Each self-management action that assigns values to variables in Vα is associated with a probability γ = Pr(Vα | V − Vø)
- An action is taken only when γ > γthreshold
- This can be used to fine-tune self-management
- TANs can be easily understood by administrators
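The threshold rule can be sketched as a gate over candidate actions, each paired with its probability γ. The function name, the candidate list, and the γ values are hypothetical; in the approach described here γ would come from the µ-model.

```python
from typing import Dict, List, Optional, Tuple

Action = Dict[str, float]


def select_action(
    candidates: List[Tuple[Action, float]],
    gamma_threshold: float,
) -> Optional[Action]:
    """Return the highest-gamma action, but only if gamma exceeds the threshold."""
    if not candidates:
        return None
    action, gamma = max(candidates, key=lambda c: c[1])
    return action if gamma > gamma_threshold else None


# Illustrative candidates with made-up probabilities.
candidates = [
    ({"allocated_servers": 2}, 0.55),
    ({"allocated_servers": 3}, 0.82),
]
cautious = select_action(candidates, gamma_threshold=0.9)    # no action taken
permissive = select_action(candidates, gamma_threshold=0.7)  # picks 3 servers
```

Raising the threshold makes the system act less often but with higher confidence, which is the tuning knob the slide refers to.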
Policy-Driven Self-Management
- SLO: “Response Time < 10 msec”
  - Event: SLO Violation
  - Condition: Bandwidth = 90 Mbps, Request Rate = 30
- Given the goal state (90, 30, 9), find the µ-model to use
- Current State: (90, 30, 12)
- Goal State: (90, 30, 9)
- Action: set Allocated Servers to 3
[Figure: state table with columns Allocated Servers, Bandwidth, Request Rate, Response Time; extracted cell values: 3, 1, 90, 30, 12, 12, 8, 9]
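Finding the µ-model for a goal state can be sketched as a lookup over partitions. The sketch assumes each µ-model covers a region described by per-variable value ranges; that representation, the model names, and the range boundaries are all hypothetical, since the slides do not describe how partitions are stored.

```python
from typing import Dict, Optional, Tuple

Region = Dict[str, Tuple[float, float]]  # variable -> (low, high), inclusive


def find_model(models: Dict[str, Region], goal: Dict[str, float]) -> Optional[str]:
    """Return the name of the first µ-model whose region contains the goal state."""
    for name, region in models.items():
        if all(lo <= goal[var] <= hi for var, (lo, hi) in region.items()):
            return name
    return None


# Hypothetical partitions of the state space.
models = {
    "low_bandwidth": {"bandwidth": (0, 49), "request_rate": (0, 100)},
    "high_bandwidth": {"bandwidth": (50, 100), "request_rate": (0, 100)},
}
# Goal state from the slide: (Bandwidth, Request Rate, Response Time) = (90, 30, 9).
goal = {"bandwidth": 90, "request_rate": 30, "response_time": 9}
chosen = find_model(models, goal)
```

The chosen µ-model would then be queried for the Vα assignment most likely to reach the goal state.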
Dynamic SLA Decomposition
- Problem: To determine sub-SLAs for components that lead to SLA conformance
- Sub-SLAs can be thought of as a per-component range of values for controllable variables
- If each component adheres to the sub-SLAs then the SLA is not violated
- Our techniques can handle SLA decomposition
  - conformance(SLA 1, SLA 2, …, SLA n) ⇒ conformance(System SLA)
[Figure: System-Level SLA decomposed into SLA 1, SLA 2, SLA 3, SLA 4, SLA 5]
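Treating a sub-SLA as a per-component range of values, the conformance check can be sketched as follows. The component names, variables, and ranges are invented for illustration, and the sketch only checks given sub-SLAs; it does not derive them, which is the part the authors' technique addresses.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (low, high), inclusive
SubSLA = Dict[str, Range]    # controllable variable -> allowed range


def conforms(values: Dict[str, float], sub_sla: SubSLA) -> bool:
    """True if every variable named in the sub-SLA lies within its range."""
    return all(lo <= values[var] <= hi for var, (lo, hi) in sub_sla.items())


def violating_components(
    observed: Dict[str, Dict[str, float]],
    sub_slas: Dict[str, SubSLA],
) -> List[str]:
    """Names of components whose observed values break their sub-SLA."""
    return [name for name, sla in sub_slas.items() if not conforms(observed[name], sla)]


# Hypothetical components and ranges.
sub_slas = {
    "web": {"allocated_servers": (2, 4)},
    "db": {"memory_gb": (8, 16)},
}
observed = {
    "web": {"allocated_servers": 3},
    "db": {"memory_gb": 4},
}
violations = violating_components(observed, sub_slas)
# Per the slide, system-level conformance follows when no component violates.
system_conformant = not violations
```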
Experimental Results: SOA Simulator
[Figure: two panels, "Without Self-Management" and "With Self-Management"]
Experimental Results: RUBiS over VMs
[Figure: two panels, "Without Self-Management" and "With Self-Management"; annotations: "Database Perturbation", "Partition Change"]
Conclusions & Future Work
- Our techniques are applicable for a variety of enterprise systems
- In our experiments the techniques have proven to be very scalable and accurate
- Monitoring overheads can be reduced by taking inputs about relevant variables from the state-space partitions
- Design & Implement techniques that can proactively avoid SLA violations
Thank You!
References
[1] V. Kumar, K. Schwan, S. Iyer, Y. Chen, A. Sahai. The state-space approach to SLA-based management. In submission to NOMS 2008.
[2] V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan. iManage: Policy-Driven Self-Management for Enterprise-Scale Systems. Middleware 2007.
[3] V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan. Enabling Policy-Driven Self-Management for Enterprise Systems. PBAC 2007, in conjunction with ICAC 2007.
[4] V. Kumar, et al. Implementing Diverse Messaging Models with Self-Managing Properties using IFLOW. ICAC 2006.


