b9842835b09e7646783232dcb30f0474.ppt
- Number of slides: 22
Resource Overbooking and Application Profiling in Shared Hosting Platforms
Bhuvan Urgaonkar, Prashant Shenoy, Timothy Roscoe†
UMASS Amherst and †Intel Research
Computer Science
Motivation
[Figure labels: clients, Internet, cluster; e-commerce, streaming, games]
- Proliferation of Internet applications
  - Electronic commerce, streaming media, online games, online trading, …
- Commonly hosted on clusters of servers
  - Cheaper alternative to large multiprocessors
Hosting Platforms
- Hosting platform: a server cluster that runs third-party applications
- Application providers pay for server resources
  - CPU, disk, network bandwidth, memory
- Platform provider guarantees resource availability
  - Performance guarantees provided to applications
- Central challenge: maximize revenue while providing resource guarantees
Design Challenges
- How to determine an application's resource needs?
- How to provision resources to meet these needs?
- How to map applications to nodes in the platform?
- How to handle dynamic variations in the load?
Talk Outline
- ✓ Introduction
- Inferring Resource Requirements
- Provisioning Resources
- Handling Dynamic Load Variations
- Experimental Evaluation
- Related Work
Hosting Platform Model
- Hosting platforms: dedicated vs. shared
  - Dedicated: applications get an integral number of nodes
  - Shared: applications may get a fractional number of nodes
- Our focus: shared hosting platforms
  - Nodes may have competing applications
- Capsule: component of an application running on a node
  - Example: an e-commerce application has an HTTP server, an app server, and a database server
Provisioning By Overbooking
- How should the platform allocate resources?
  - Provision resources based on worst-case needs
- Worst-case provisioning is wasteful
  - Low platform utilization
- Applications may be tolerant to occasional violations
  - E.g., CPU guarantees should be met 99% of the time
- Possible to provide useful guarantees even after provisioning less than worst-case needs
⇒ Idea: improve utilization by overbooking resources
Application Profiling
- Profiling: process of determining resource usage
  - Run the application on an isolated set of nodes
  - Subject the application to a real workload
  - Model CPU and network usage as ON-OFF processes
- Use the Linux trace toolkit
[Figure: ON-OFF timeline; an ON period begins at the start of a CPU quantum and an OFF period begins at its end]
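The ON-OFF model above can be turned into per-interval usage numbers. The following is a minimal sketch, not the profiling code used in the talk: the trace format (a list of `(start, end)` ON periods) and the function name `fractional_usage` are assumptions made for illustration.

```python
def fractional_usage(on_periods, interval, total_time):
    """Fraction of each measurement interval spent in the ON state.

    on_periods: non-overlapping (start, end) times of ON periods
                (hypothetical trace format, not the Linux trace toolkit's).
    interval:   length of one measurement interval.
    total_time: length of the whole trace.
    """
    n = int(total_time // interval)
    busy = [0.0] * n
    for start, end in on_periods:
        for i in range(n):
            lo = i * interval
            hi = lo + interval
            # Length of the part of this ON period that falls in interval i.
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                busy[i] += overlap
    return [b / interval for b in busy]


# Three ON periods observed over two one-second measurement intervals.
trace = [(0.0, 0.2), (0.5, 1.1), (1.6, 2.0)]
usage = fractional_usage(trace, interval=1.0, total_time=2.0)
```

Each entry of `usage` is a fractional usage between 0 and 1 for one measurement interval; a list of such values is the raw material for a usage distribution.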
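Resource Usage Distribution
[Figure: usage is measured over successive measurement intervals along the time axis; a plot of cumulative probability versus fractional usage (0 to 1) marks r(99) at cumulative probability 0.99 and r(100) at cumulative probability 1]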
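As a rough illustration of how r(99) and r(100) could be read off a set of measurements, here is a small sketch. The sample data and the helper name `usage_percentile` are made up for the example, and the nearest-rank percentile definition is an assumption; the slides do not say which definition was used.

```python
import math


def usage_percentile(samples, p):
    """Smallest observed usage r such that at least p percent of samples are <= r.

    Uses the nearest-rank definition (an assumption, see lead-in).
    """
    ordered = sorted(samples)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]


# Hypothetical fractional CPU usage over 100 measurement intervals:
# mostly 0.2, a few at 0.3, and one rare burst to 0.9.
samples = [0.2] * 95 + [0.3] * 4 + [0.9]

r100 = usage_percentile(samples, 100)  # worst-case usage
r99 = usage_percentile(samples, 99)    # usage not exceeded 99% of the time
```

With this made-up data the worst case is three times the 99th percentile, which is the kind of gap that makes provisioning below the worst case attractive.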
Capturing Burstiness: Token Bucket
- Token bucket (σ, ρ)
  - Resource usage over an interval t ≤ σ·t + ρ
  - Algorithm by Tang et al.
- Additional parameter T
  - Satisfy token bucket guarantees only for t ≥ T
[Figure: usage versus time with two token-bucket envelopes, σ1·t + ρ1 and σ2·t + ρ2, and their intercepts ρ1 and ρ2]
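To make the bound concrete, the sketch below finds, for a chosen rate σ, the smallest ρ for which a usage trace satisfies usage ≤ σ·t + ρ over every window of at least T slots. This is a brute-force illustration and not the algorithm by Tang et al.; the slotted trace format and the name `min_burst` are assumptions.

```python
def min_burst(usage, sigma, t_min=1):
    """Smallest rho >= 0 with usage(window) <= sigma * t + rho for all windows
    of t >= t_min consecutive slots. Brute force over all windows, O(n^2).

    usage: resource consumed in each unit-length time slot (hypothetical format).
    """
    n = len(usage)
    # prefix[i] holds the total usage of the first i slots.
    prefix = [0.0]
    for u in usage:
        prefix.append(prefix[-1] + u)

    rho = 0.0
    for start in range(n):
        for end in range(start + t_min, n + 1):
            t = end - start
            used = prefix[end] - prefix[start]
            rho = max(rho, used - sigma * t)
    return rho


usage = [0.2, 0.9, 0.8, 0.1, 0.1, 0.2]
rho_all = min_burst(usage, sigma=0.5)             # bound must hold for every t >= 1
rho_long = min_burst(usage, sigma=0.5, t_min=3)   # bound only required for t >= 3
```

Relaxing the guarantee to windows of at least T slots lowers the required ρ here, because the short two-slot burst no longer has to fit under the envelope.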
Profiles of Server Applications
[Figure: three probability distributions. Apache Web Server, 50% cgi-bin: probability vs. fraction of CPU. Streaming Media Server, 20 clients: probability vs. fraction of NW bandwidth. Postgres Server, 10 clients: probability vs. fraction of CPU.]
- Applications exhibit different degrees of burstiness
  - May have a long tail
- Insight: choose (σ, ρ) based on a high percentile
Resource Overbooking
- Applications specify overbooking tolerance O
  - Probability with which capsule needs may be violated
- Controlled overbooking via admission control:
  - Σ_{k=1..K} (σ_k·T_min + ρ_k)·(1 - O_k) ≤ C·T_min
  - Pr(Σ_{k=1..K} U_k > C) ≤ min(O_1, …, O_K)
- A node that has sufficient resources for a capsule is feasible for it
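The first admission-control condition can be checked with a few lines of code. The sketch below covers only that deterministic inequality, not the probabilistic one. It assumes that C is the capacity of the resource on the node and that T_min is a common time period for all capsules; neither is spelled out on the slide, and the function name `is_feasible` and the capsule tuples are hypothetical.

```python
def is_feasible(capsules, capacity, t_min):
    """Check sum_k (sigma_k * t_min + rho_k) * (1 - O_k) <= capacity * t_min.

    capsules: list of (sigma, rho, overbooking_tolerance) tuples, one per
              capsule that would run on the node (hypothetical format).
    capacity: assumed to be the node's capacity C for this resource.
    """
    demand = sum((sigma * t_min + rho) * (1 - o) for sigma, rho, o in capsules)
    return demand <= capacity * t_min


# Two capsules already on a node with one full CPU (capacity 1.0).
resident = [(0.4, 0.1, 0.0), (0.3, 0.1, 0.05)]
fits_now = is_feasible(resident, capacity=1.0, t_min=1.0)

# Would a third capsule still fit on the same node?
candidate = (0.2, 0.05, 0.01)
fits_with_candidate = is_feasible(resident + [candidate], capacity=1.0, t_min=1.0)
```

In this made-up example the resident capsules need 0.88 of the CPU after discounting by their tolerances, so the node is feasible for them but not for the additional candidate.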
Mapping Capsules to Nodes
[Figure: bipartite graph of capsules 1 to 3 and nodes 1 to 4, and the final mapping of the capsules to nodes]
- A bipartite graph of capsules and feasible nodes
  - Greedy mapping: consider capsules in non-decreasing order of degree: O(c log c)
  - Guaranteed to find a placement if one exists!
  - Multiple feasible nodes => best fit, worst fit, random, …
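A simplified sketch of the greedy idea follows. It assumes that each node hosts at most one of the capsules being placed and that the feasible sets are fixed in advance; both are simplifications for illustration, and the sketch makes no attempt to reproduce the guarantee stated above. When several feasible nodes are free it simply takes the first one by name rather than best fit or worst fit. The name `greedy_placement` is hypothetical.

```python
def greedy_placement(feasible):
    """Place capsules on distinct nodes, least-flexible capsule first.

    feasible: dict mapping capsule -> set of feasible nodes (the edges of the
              bipartite graph). Returns dict capsule -> node, or None if some
              capsule is left without a free feasible node.
    """
    placement = {}
    used = set()
    # Non-decreasing order of degree: capsules with the fewest options go first.
    for capsule in sorted(feasible, key=lambda c: len(feasible[c])):
        free = sorted(feasible[capsule] - used)
        if not free:
            return None
        placement[capsule] = free[0]  # arbitrary tie-break: first node by name
        used.add(free[0])
    return placement


feasible = {
    "c1": {"n1", "n2", "n3"},
    "c2": {"n1"},
    "c3": {"n1", "n3"},
}
placement = greedy_placement(feasible)
```

Here `c2` is handled first because it has a single feasible node; placing `c1` first on `n1` would have left `c2` with nowhere to go.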
Handling Flash Crowds
- Detect overloads by online profiling
[Figure: probability vs. fraction of CPU for the Apache Web Server: offline profile, expected workload, and overload]
- Reacting to overloads (ongoing work)
  - Compute new allocations
  - Change allocations, move capsules, add servers
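One way to picture overload detection is to compare a high percentile of the online profile against the same percentile of the offline profile. The slides do not describe the actual detection rule, so everything in this sketch is an assumption: the percentile comparison itself, the `slack` factor, the sample data, and the function names.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a list of usage samples."""
    ordered = sorted(samples)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]


def is_overloaded(offline_samples, online_samples, p=99, slack=1.1):
    """Flag an overload when the online p-th percentile exceeds the offline
    one by more than the slack factor (hypothetical rule, see lead-in)."""
    return percentile(online_samples, p) > slack * percentile(offline_samples, p)


# Hypothetical fractional CPU usage samples.
offline = [0.2] * 95 + [0.3] * 5    # profile gathered offline
expected = [0.2] * 90 + [0.3] * 10  # online profile under the expected workload
flash = [0.4] * 50 + [0.6] * 50     # online profile during a flash crowd

normal_flag = is_overloaded(offline, expected)
flash_flag = is_overloaded(offline, flash)
```

With this data the expected workload stays within the offline profile's 99th percentile, while the flash crowd clearly exceeds it.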
Talk Outline
- ✓ Introduction
- ✓ Inferring Resource Requirements
- ✓ Provisioning Resources
- ✓ Handling Dynamic Load Variations
- Experimental Evaluation
- Related Work
The SHARC Prototype
- A Linux-based shared hosting platform
  - 6 Dell PowerEdge 1550 servers
  - Gigabit Ethernet link
- Software components
  - Profiling
    - Vanilla Linux + Linux trace toolkit
  - Control plane
    - Overbooking, placement
  - QoS-enhanced Linux kernel
    - HSFQ schedulers
Experimental Setup
- Prototype running on a 5-node cluster
  - Each server: 1 GHz PIII with 512 MB RAM and Gigabit Ethernet
  - Control plane runs on a dedicated node
  - Applications run on the other four nodes
- Workload: mix of server applications
  - PostgreSQL database server with pgbench (TPC-B) benchmark
  - Apache web server with SPECweb99 (static & dynamic HTTP)
  - MPEG streaming server with 1.5 Mb/s VBR MPEG-1 clients
  - Quake I game server with "terminator" bots
Resource Overbooking Benefits
[Figure: Placement of Apache Web Servers]
- Small amounts of overbooking can yield large gains
  - Bursty applications yield larger benefits
Capsule Placement Algorithms
- Diverse requirements: worst-fit outperforms others
- Similar requirements: all perform similarly
Performance with Overbooking
Application   Metric           Isolated   100th   99th    95th    Avg
Apache        Tput (req/s)     67.9       67.51   66.91   64.81   39.8
PostgreSQL    Tput (trans/s)   22.8       22.46   22.21   21.78   9.04
Streaming     Viol (sec)       0          0       0.31    0.59    5.23
- Performance degradation is within the specified overbooking tolerance
Related Work
- Single-node resource management
  - Proportional-share schedulers: WFQ, SFQ, BVT, …
  - Reservation-based schedulers: Nemesis, Rialto, …
- Cluster-based resource management
  - Cluster Reserves [Aron 00], Aron thesis [Aron 00]
  - MUSE [Chase 01]: economic approach
  - Oceano [IBM], Planetary computing [HP]
  - Clusters for high availability: Porcupine [Saito 99]
  - Grid computing
Concluding Remarks
- Resource management in shared hosting platforms
  - Application profiling to determine resource usage
  - Revenue maximization using controlled overbooking
  - Ability to handle dynamic workloads (ongoing work)
- URL: http://lass.cs.umass.edu