
  • Number of slides: 25

Systems Support for End-to-End Performance Management
Sandip Agarwala
Ph.D. Advisor: Karsten Schwan
College of Computing, Georgia Tech

Complexity, complexity…
Source: Gartner (December 2005)

Reasons for Complexity
  • Application diversity
  • Interdependencies
  • Heterogeneous components
    – Too many different technologies and platforms
  • Too few "hints" from the system to the administrators
    – Legacy issues; application-specific solutions
  • Insufficient information about the system to drive self-management
Lack of Automation

Online System Management
Diagram labels: Workload, Monitor, Analyze, Control, Execute
  • Scheduling
  • Capacity and SLA management
  • Design evaluation and tuning
  • Bottleneck detection
  • Resource provisioning, accounting, etc.
Proposed Approach: Service Path

Service Path
  • System abstractions that describe the dynamic dependencies between the different distributed application components
  • Service Class: application-level request class, e.g. SLA class
Diagram labels: Internet, Proxy Server, Front-end Web Servers, Middle-tier Servlet Server, Application Logic (EJBs, etc.), Back-end Database

Service Path Characteristics
  • End-to-End analysis
  • Online
  • Non-intrusive
  • Application-generic

Outline
  • Background
  • Motivation
  • Service path
    – Discovery with E2EProf
    – Refinement with SysProf
    – Automated SLA Enforcement
  • Related Work
  • Future Plans

E2EProf
  • Black-box approach
  • Correlate per-edge time series signals
  • Monitor network packet traces (source, destination, timestamps)
  • Model traces as per-edge time series signals or density functions
Diagram labels: nodes A, B, C, D, X; D1 is the signal on edge (A→B) over time; D2 is the signal on edge (B→C) over time
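
The step of modeling packet traces as per-edge time series can be sketched as follows. This is a minimal illustration, not the E2EProf implementation: the function name `edge_time_series`, the fixed-width binning, and the sample trace are all assumptions introduced here.

```python
from collections import defaultdict

def edge_time_series(packets, bin_width, num_bins):
    """Bin (source, destination, timestamp) records into one count series per edge.

    Assumption: each bin counts the packets seen on that edge during
    [i * bin_width, (i + 1) * bin_width); timestamps outside the range are ignored.
    """
    series = defaultdict(lambda: [0] * num_bins)
    for src, dst, ts in packets:
        idx = int(ts // bin_width)
        if 0 <= idx < num_bins:
            series[(src, dst)][idx] += 1
    return dict(series)

# Hypothetical trace: (source, destination, timestamp in seconds).
trace = [
    ("A", "B", 0.01), ("A", "B", 0.02), ("B", "C", 0.05),
    ("A", "B", 0.21), ("B", "C", 0.26),
]
signals = edge_time_series(trace, bin_width=0.1, num_bins=3)
```

Only packet headers and timestamps are needed, which is what keeps the approach black-box: no application payload or instrumentation is consulted.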

Basic Approach
  • Compute the cross-correlation of the per-edge signals (D1, D2)
  • Spike ⇒ causality; spike's position ⇒ delay
  • (A→B), (B→C): spike, and its position gives the delay at B
  • (A→B), (B→D): no spike
Diagram labels: nodes A, B, C, D, X
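
The idea that a spike in the cross-correlation indicates causality, and that its position indicates delay, can be sketched as follows. This is a simplified illustration under stated assumptions: `cross_correlation` and `find_spike` are hypothetical helpers, the spike test (peak above mean plus a number of standard deviations) is a stand-in chosen here, and the signals are made up.

```python
def cross_correlation(d1, d2, max_lag):
    """Return c[lag] = sum over t of d1[t] * d2[t + lag], for lag = 0..max_lag."""
    n = min(len(d1), len(d2))
    return [
        sum(d1[t] * d2[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]

def find_spike(corr, threshold):
    """Return the lag of the peak if it exceeds mean + threshold * std, else None."""
    mean = sum(corr) / len(corr)
    std = (sum((c - mean) ** 2 for c in corr) / len(corr)) ** 0.5
    peak = max(range(len(corr)), key=lambda k: corr[k])
    if std > 0 and corr[peak] > mean + threshold * std:
        return peak
    return None

# Hypothetical signals: B forwards each request from A to C two bins later,
# while traffic from B to D is steady and unrelated to A.
a_to_b = [3, 0, 0, 0, 2, 0, 0, 0, 4, 0, 0, 0]
b_to_c = [0, 0, 3, 0, 0, 0, 2, 0, 0, 0, 4, 0]
b_to_d = [1] * 12

delay_bc = find_spike(cross_correlation(a_to_b, b_to_c, max_lag=5), threshold=2.0)
delay_bd = find_spike(cross_correlation(a_to_b, b_to_d, max_lag=5), threshold=2.0)
```

Here `delay_bc` is a lag in bins (multiply by the bin width to get time), and `delay_bd` is `None` because no lag stands out.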

Evaluation with 4-tier RUBiS
Diagram labels: Clients, Apache Web Server, Tomcat Server 1, Tomcat Server 2, EJB Server 1, EJB Server 2, MySQL Server; request types: comment (I/O bound), bidding (CPU bound)
RUBiS: http://rubis.objectweb.org/

Service Path Detection in RUBiS
Figure panels: static server assignment (highest-delay node marked); round-robin load balancer (highest-delay nodes marked)

Change detection in RUBiS
Figure label: injected delay

Delta Air Lines' Application
Revenue Pipeline
Total traffic: 1.34 million / day (56k / hour)
Diagram labels: TACSIN & TACSOUT, APEXIN & APEXOUT, XIN & XOUT, Error/Warning (Tivoli) Logs

Delta Air Lines' Application
Figure labels: huge request burst; S1, S2, S3, S7, S8, TACS; client requests and latency (sec) plotted against time of the day

Outline
  • Background
  • Motivation
  • Service path
    – Discovery with E2EProf
    – Refinement with SysProf
    – Automated SLA Enforcement
  • Related Work
  • Future Plans

Beyond dependency and latency…
Diagram labels: C1, C2, S1, S2, S3, S4, S5, S6
Solution: Zoom into the service path with SysProf
  • No application hints or instrumentation
  • Monitor resource usage on a per-class basis

SysProf Methodology
  • Track request context
    – Work done for processing a request class
    – May span user level and kernel level
    – Executes in more than one context (e.g. processes, threads, softirqs)
    – Shows up as system-visible events (e.g. system calls)
Diagram labels: system call parameters, PID, App functions A1, A2, …, AN; Init CID; From client; To client; User / Kernel boundary; System Call; Scheduler; Context Switches; Network Stack; Net softirq; eth driver; FS/VM/etc.; BDD; Disk I/O; Instrumentation points

Class ID Propagation
Diagram labels: Init CID, Process CID, Msg CID, Packet CID, Inherits CID; Front-Tier, Middle-Tier, End-Tier; User / Kernel boundary; From client; To client
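
Class ID propagation combined with per-class accounting can be pictured with the toy model below. It is a user-space simulation under loud assumptions, not SysProf's kernel mechanism: the `Context` and `Message` classes, the `receive`, `send`, and `charge` helpers, the class name "gold", and the CPU figures are all hypothetical.

```python
from collections import defaultdict

class Message:
    """Hypothetical message or packet tagged with a class ID (CID)."""
    def __init__(self, cid):
        self.cid = cid

class Context:
    """Hypothetical execution context (process, thread, or softirq)."""
    def __init__(self, name):
        self.name = name
        self.cid = None  # CID of the request this context is working on

usage_us = defaultdict(int)  # CPU microseconds charged to each CID

def receive(ctx, msg):
    """The receiving context inherits the CID carried by the message."""
    ctx.cid = msg.cid

def send(ctx):
    """An outgoing message carries the CID of the sending context."""
    return Message(ctx.cid)

def charge(ctx, cpu_us):
    """Account the work done by a context to its current CID."""
    usage_us[ctx.cid] += cpu_us

# A request enters at the front tier, where its CID is initialized,
# and is then forwarded through the middle and end tiers.
front, middle, end = Context("front"), Context("middle"), Context("end")
receive(front, Message(cid="gold"))
charge(front, 2000)
receive(middle, send(front))
charge(middle, 5000)
receive(end, send(middle))
charge(end, 3000)
```

After the run, `usage_us["gold"]` holds the work of all three tiers, which is the kind of per-class total that resource accounting and billing would build on.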

Applications of SysProf
  • Resource Accounting
  • Utility Billing
  • Bottleneck Detection
  • Capacity Estimation
  • Root-Cause Analysis
  • Black-Box SLA Management

Resource-Aware Adaptive Control
  • Cluster workloads contending for the same resources
  • Separate queue/controller for each cluster
Diagram labels: Class 1, Class 2, Class 3; Front-end Controller + Scheduler; Tomcat Server 1, Tomcat Server 2; EJB Server 1, EJB Server 2; MySQL Server

Resource-Aware Adaptive Control
Capacity = 80 req/s per server
Figure panels: No SysProf; With SysProf

Summary
  • Service Path
    – System abstractions to represent dependencies and request paths
  • E2EProf and Pathmap
    – Dependency and latency analysis
  • SysProf
    – Service-based resource analysis
  • Aid the human operator and automate end-to-end performance management

Thank You! Questions?
Email: sandip@cc.gatech.edu

Extra Slides

Pathmap Optimizations
Diagram labels: packet timestamp trace; time-series signal or density function; bursty traffic; run-length compression; sliding window (W); upper bound on latency; cross-correlation series
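
Run-length compression pays off on bursty traffic because the binned signal contains long runs of identical values, mostly zeros. The sketch below shows plain run-length encoding of such a signal; the helper names and sample data are assumptions, and the slides do not say how Pathmap stores or correlates its compressed series.

```python
def rle_encode(signal):
    """Compress a sequence into (value, run_length) pairs."""
    runs = []
    for value in signal:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    signal = []
    for value, length in runs:
        signal.extend([value] * length)
    return signal

# Hypothetical bursty signal: short bursts separated by idle bins.
bursty = [0, 0, 0, 5, 5, 0, 0, 0, 0, 2]
compressed = rle_encode(bursty)
```

The encoding is lossless, so `rle_decode(compressed)` reproduces `bursty` exactly.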