Mor Harchol-Balter
Carnegie Mellon University, School of Computer Science

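Q: Which minimizes mean response time?
Setting: a single queue of jobs with load ρ < 1, where “size” = service requirement.
• FCFS (first-come-first-served)
• PS (processor sharing)
• SRPT (shortest remaining processing time)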

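Q: Which best represents scheduling in web servers?
Setting: a single queue of jobs with load ρ < 1, where “size” = service requirement.
• FCFS (first-come-first-served)
• PS (processor sharing)
• SRPT (shortest remaining processing time)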
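The first question can be explored with a small single-server simulation. The sketch below is illustrative only and is not taken from the talk: the function names, the Poisson/exponential workload, and the seed are assumptions. It replays one arrival sequence under FCFS and under preemptive SRPT and reports the mean response time of each.

```python
import heapq
import random


def poisson_workload(n, load, seed=0):
    """Hypothetical workload: Poisson arrivals, exponential sizes with mean 1."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(n):
        t += rng.expovariate(load)  # arrival rate equals load because mean size is 1
        arrivals.append(t)
    sizes = [rng.expovariate(1.0) for _ in range(n)]
    return arrivals, sizes


def simulate_fcfs(arrivals, sizes):
    """Mean response time when jobs run to completion in arrival order."""
    clock, total = 0.0, 0.0
    for arrival, size in zip(arrivals, sizes):
        clock = max(clock, arrival) + size  # wait for the server, then run
        total += clock - arrival
    return total / len(sizes)


def simulate_srpt(arrivals, sizes):
    """Mean response time when the job with least remaining work always runs."""
    heap = []  # entries are (remaining work, arrival time)
    clock, total, i, n = 0.0, 0.0, 0, len(sizes)
    while i < n or heap:
        if not heap:  # server idle: jump to the next arrival
            clock = max(clock, arrivals[i])
        while i < n and arrivals[i] <= clock:  # admit everything that has arrived
            heapq.heappush(heap, (sizes[i], arrivals[i]))
            i += 1
        remaining, arrived = heapq.heappop(heap)
        next_arrival = arrivals[i] if i < n else float("inf")
        if next_arrival - clock < remaining:
            # A new job arrives first: pause here and re-evaluate priorities.
            heapq.heappush(heap, (remaining - (next_arrival - clock), arrived))
            clock = next_arrival
        else:
            clock += remaining
            total += clock - arrived
    return total / n


if __name__ == "__main__":
    arrivals, sizes = poisson_workload(20000, load=0.8, seed=1)
    print("FCFS mean response time:", simulate_fcfs(arrivals, sizes))
    print("SRPT mean response time:", simulate_srpt(arrivals, sizes))
```

PS is left out to keep the sketch short; it could be added with the same event loop by sharing the server equally among all jobs present.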

IDEA: How about using SRPT instead of PS in web servers?
[Diagram: clients 1, 2, and 3 send “Get File 1”, “Get File 2”, and “Get File 3” over the Internet to a web server (Apache) running on Linux O.S.]

Immediate Objections
1) Can’t assume known job size.
   Many servers receive mostly static web requests (“GET FILE”). For static web requests, the file size is known, so the service requirement of the request is approximately known.
2) But the big jobs will starve...

Outline of Talk
THEORY (M/G/1), with Wierman:
[BH – Sigmetrics 01] “Analysis of SRPT: Investigating Unfairness”
[HSW – Performance 02] “Asymptotic Convergence of Scheduling Policies…”
[WH – Sigmetrics 03*] “Classifying Scheduling Policies wrt Unfairness …”
IMPLEMENT, with Schroeder:
[HSBA – TOCS 03] “Size-based Scheduling to Improve Web Performance”
[SH – ITC 03*] “Web servers under overload: How scheduling can help”
[MSAH – ICDE 03] “Priority Mechanisms for OLTP and Web Applications”
www.cs.cmu.edu/~harchol/

THEORY: SRPT has a long history...
1966: Schrage & Miller derive M/G/1/SRPT response time
1968: Schrage proves optimality
1979: Pechinkin, Solovyev & Yashkov generalize
1990: Schassberger derives distribution on queue length
BUT WHAT DOES IT ALL MEAN?

THEORY: SRPT has a long history (cont.)
1990-97: 7-year-long study at Univ. of Aachen under Schreiber. SRPT WINS BIG ON MEAN!
1998, 1999: Slowdown for SRPT under adversary (Rajmohan, Gehrke, Muthukrishnan, Rajaraman, Shaheen, Bender, Chakrabarti, etc.). SRPT STARVES BIG JOBS!
Various O.S. books (Silberschatz, Stallings, Tanenbaum) warn about starvation of big jobs...
Kleinrock’s Conservation Law: “Preferential treatment given to one class of customers is afforded at the expense of other customers.”

Unfairness Question
Let ρ = 0.9. Let G: Bounded Pareto(α = 1.1, max = 10^10).
Question: Which queue does the biggest job prefer, M/G/1/PS or M/G/1/SRPT?

Results on Unfairness
Let ρ = 0.9. Let G: Bounded Pareto(α = 1.1, max = 10^10).
[Figure: comparison of PS and SRPT; the plot itself is not recoverable.]
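For readers who want to generate job sizes like those in this example, a Bounded Pareto variate can be drawn by inverse-transform sampling. This is a minimal sketch, not the authors' code: the slide gives only α and the maximum, so the minimum size `k = 1.0` used below is an assumption, as are the function and parameter names.

```python
import random


def bounded_pareto(rng, alpha, k, p):
    """Draw one sample from Bounded Pareto(alpha) on [k, p] by inverting the CDF.

    CDF: F(x) = (1 - (k/x)**alpha) / (1 - (k/p)**alpha) for k <= x <= p.
    """
    u = rng.random()  # uniform on [0, 1)
    x = k / (1.0 - u * (1.0 - (k / p) ** alpha)) ** (1.0 / alpha)
    return min(x, p)  # guard against floating-point overshoot near u = 1


if __name__ == "__main__":
    rng = random.Random(0)
    # k = 1.0 is a hypothetical minimum job size; the slide does not state one.
    sizes = [bounded_pareto(rng, alpha=1.1, k=1.0, p=1e10) for _ in range(100000)]
    print("smallest:", min(sizes), "largest:", max(sizes))
```

With α this close to 1, most samples are small and a few are enormous, which is why the fate of the biggest job is the natural thing to ask about.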

Unfairness – General Distribution
All-can-win theorem: For all distributions, if $\rho < \tfrac{1}{2}$, then $E[T(x)]^{SRPT} < E[T(x)]^{PS}$ for all $x$.

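All-can-win theorem: For all distributions, if $\rho < \tfrac{1}{2}$, then $E[T(x)]^{SRPT} \le E[T(x)]^{PS}$ for all $x$.
Proof idea: compare the two SRPT terms with the PS total,

$$
\underbrace{\frac{\lambda \int_0^x t^2 f(t)\,dt + \lambda x^2 \bar{F}(x)}{2\,(1-\rho(x))^2}}_{\text{Waiting time (SRPT)}}
\;+\;
\underbrace{\int_0^x \frac{dt}{1-\rho(t)}}_{\text{Residence (SRPT)}}
\;\le\;
\underbrace{\frac{x}{1-\rho}}_{\text{Total (PS)}}
$$

where $\rho(x) = \lambda \int_0^x t f(t)\,dt$ is the load made up of jobs of size at most $x$, and $\bar{F}(x) = 1 - F(x)$.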
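The comparison in the proof idea can be checked numerically for one concrete case. The sketch below assumes exponential job sizes with mean 1 (so ρ = λ), a choice made here only because the partial moments have closed forms; the function names and the trapezoid step count are likewise assumptions.

```python
import math


def rho_upto(x, lam):
    """rho(x) = lam * integral_0^x t f(t) dt for f(t) = exp(-t)."""
    return lam * (1.0 - math.exp(-x) * (1.0 + x))


def srpt_response(x, lam, steps=2000):
    """E[T(x)] under M/G/1/SRPT: waiting time plus residence time."""
    # integral_0^x t^2 f(t) dt for the exponential density
    second_moment = 2.0 - math.exp(-x) * (x * x + 2.0 * x + 2.0)
    tail = math.exp(-x)  # 1 - F(x)
    waiting = lam * (second_moment + x * x * tail) / (2.0 * (1.0 - rho_upto(x, lam)) ** 2)
    # Residence time: trapezoid rule on integral_0^x dt / (1 - rho(t)).
    h = x / steps
    ys = [1.0 / (1.0 - rho_upto(i * h, lam)) for i in range(steps + 1)]
    residence = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return waiting + residence


def ps_response(x, lam):
    """E[T(x)] under M/G/1/PS; mean job size is 1, so rho = lam."""
    return x / (1.0 - lam)


if __name__ == "__main__":
    lam = 0.4  # load below 1/2, the regime covered by the theorem
    for x in (0.1, 1.0, 5.0, 20.0, 50.0):
        print(x, srpt_response(x, lam), ps_response(x, lam))
```

Raising `lam` above 0.5 and scanning large `x` is a quick way to see where the guarantee no longer applies for a given distribution.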

Classification of Scheduling Policies
ALWAYS FAIR: For all loads and all service distributions, $E[T(x)]^{P} \le E[T(x)]^{PS}$ for all $x$.
ALWAYS UNFAIR: For all loads and all service distributions, there exists $x$ such that $E[T(x)]^{P} > E[T(x)]^{PS}$.
SOMETIMES UNFAIR: For some loads, $E[T(x)]^{P} \le E[T(x)]^{PS}$ for all $x$; for other loads, there exists $x$ such that $E[T(x)]^{P} > E[T(x)]^{PS}$.

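Classification of Scheduling Policies
[Figure: policies placed in three regions: Always Fair, Sometimes Unfair, and Always Unfair. PS, FSP, and PLCFS appear under Always Fair. The other policy groups shown are preemptive size-based policies (PSJF), age-based policies (FB), remaining-size-based policies (SRPT, LRPT), and nonpreemptive policies (FCFS, LJF, SJF).]
Lots of open problems…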

IMPLEMENT: From theory to practice
What does SRPT mean within a Web server?
• Many devices: Where to do the scheduling?
• No longer one job at a time.

IMPLEMENT: Server’s Performance Bottleneck
Site buys limited fraction of ISP’s bandwidth.
[Diagram: clients 1, 2, and 3 send “Get File 1”, “Get File 2”, and “Get File 3” through the rest of the Internet and the ISP to a web server (Apache) running on Linux O.S.]
We model the bottleneck by limiting bandwidth on the server’s uplink.

IMPLEMENT: Network/O.S. insides of traditional Web server
[Diagram: the Web server feeds one socket per client (clients 1, 2, and 3); the sockets drain into the network card, which is the BOTTLENECK.]
Sockets take turns draining: FAIR = PS.

IMPLEMENT: Network/O.S. insides of our improved Web server
[Diagram: the Web server feeds sockets 1, 2, and 3, which feed priority queues labeled S, M, and L; the queues drain 1st, 2nd, and 3rd into the network card, which is the BOTTLENECK.]
Socket corresponding to the file with the smallest remaining data gets to feed first.

Experimental Setup
[Diagram: Linux client machines (labeled 200) with WAN emulators (WAN EMU) connect through a switch to the Apache web server on Linux O.S.]
Implementation of SRPT-based scheduling:
1) Modifications to Linux O.S.: 6 priority levels
2) Modifications to Apache Web server
3) Priority algorithm design
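One way to picture the priority algorithm is as a mapping from a response's remaining bytes to one of the 6 priority levels, with the lowest-numbered non-empty level served first. The sketch below is an illustration under stated assumptions: the byte cutoffs, the dictionary layout for a socket, and the function names are all hypothetical, and the real work happens inside the Linux kernel and Apache rather than in Python.

```python
import bisect

# Hypothetical byte cutoffs separating the 6 priority levels.
# The slides do not give the actual values.
CUTOFFS = [1_000, 5_000, 20_000, 100_000, 500_000]


def priority_level(remaining_bytes):
    """Return 0 (served first) through 5 (served last) for a response."""
    return bisect.bisect_right(CUTOFFS, remaining_bytes)


def next_socket(sockets):
    """Pick the socket to drain next: best level first, arrival order within a level."""
    return min(sockets, key=lambda s: (priority_level(s["remaining"]), s["order"]))


if __name__ == "__main__":
    sockets = [
        {"name": "socket 1", "remaining": 1_500_000, "order": 0},
        {"name": "socket 2", "remaining": 800, "order": 1},
        {"name": "socket 3", "remaining": 30_000, "order": 2},
    ]
    print(next_socket(sockets)["name"])
```

Because the level depends on remaining rather than original size, a large response moves up as it drains, which approximates SRPT with a small fixed number of queues.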

Experimental Setup
[Diagram: Linux client machines (labeled 200) with WAN emulators (WAN EMU) connect through a switch to the Apache web server on Linux O.S.]
Options shown:
• Server: Flash, Apache
• Uplink: 10 Mbps, 100 Mbps
• Workload: Surge, trace-based
• System model: open, partly-open
• Clients: WAN EMU, geographically dispersed clients
• Load: load < 1, transient overload
• Other effects: initial RTO; user abort/reload; persistent connections, etc.
Trace-based workload:
• Number of requests made: 1,000
• Size of file requested: 41 B to 2 MB
• Distribution of file sizes requested has the heavy-tailed (HT) property.

Preliminary Comments
• Job throughput, byte throughput, and bandwidth utilization were the same under SRPT and FAIR scheduling.
• Same set of requests complete.
• No additional CPU overhead under SRPT scheduling. Network was the bottleneck in all experiments.

Results: Mean Response Time (LAN)
[Figure: mean response time (sec) versus load, FAIR versus SRPT.]

Mean Response Time vs. Size Percentile (LAN)
[Figure: mean response time (ms) versus percentile of request size at load = 0.8, FAIR versus SRPT.]

Transient Overload
[Figure: load alternating between periods with ρ > 1 and periods with ρ < 1.]

Transient Overload: Baseline
[Figure: mean response time, FAIR versus SRPT.]

Transient Overload: Response time as function of job size
[Figure: response time versus job size, FAIR versus SRPT.]
Small jobs win big! Big jobs aren’t hurt! WHY?

FACTORS
• Baseline Case
• WAN propagation delays: RTT 0 – 150 ms
• WAN loss: Loss 0 – 15%
• WAN loss + delay: RTT 0 – 150 ms, Loss 0 – 15%
• Persistent Connections: 0 – 10 requests/conn.
• Initial RTO value: RTO = 0.5 sec – 3 sec
• SYN Cookies: ON/OFF
• User Abort/Reload: abort after 3 – 15 sec, with 2, 4, 6, 8 retries
• Packet Length: 536 – 1500 Bytes
• Realistic Scenario: RTT = 100 ms; Loss = 5%; 5 requests/conn.; RTO = 3 sec; pkt len = 1500 B; user aborts after 7 sec and retries up to 3 times

Transient Overload: Realistic
[Figure: mean response time, FAIR versus SRPT.]

Conclusion so far …
• SRPT scheduling is a promising solution for reducing mean response time seen by clients, particularly when the load at the server bottleneck is high, or under transient overload conditions.
• SRPT results in negligible or zero unfairness to large requests.
• SRPT is easy to implement and efficient. No CPU overhead. No drop in throughput.
• Results corroborated via implementation and analysis.

More questions …
STATIC web requests: everything so far in talk (Wierman, Schroeder)
DYNAMIC web requests: current work (Schroeder, McWherter)

Online Shopping
[Diagram: clients 1, 2, and 3 each send “buy” over the Internet to a Web server (e.g., Apache/Linux) backed by a database (e.g., DB2, Oracle, PostgreSQL).]
• Dynamic responses take much longer (10 sec).
• Database is the bottleneck.

Online Shopping
[Diagram: client 1 sends “$$$buy$$$” while clients 2 and 3 send “buy” over the Internet to a Web server (e.g., Apache/Linux) backed by a database (e.g., DB2, Oracle, PostgreSQL).]
Goal: Prioritize requests

Isn’t the “prioritizing requests” problem already solved?
[Diagram: “$$$buy$$$” and “buy” requests travel over the Internet to a Web server (e.g., Apache/Linux) backed by a database (e.g., DB2, Oracle, PostgreSQL).]
No. Prior work is mostly simulation or RTDBMS.

Which resource to prioritize?
[Diagram: a high-priority client (“$$$buy$$$”) and a low-priority client (“buy”) reach a Web server (e.g., Apache/Linux) over the Internet; inside the database the candidate resources are disks, CPU(s), and locks.]

Q: Which resource to prioritize?
[Diagram: a high-priority client (“$$$buy$$$”) and a low-priority client (“buy”) reach a Web server (e.g., Apache/Linux) over the Internet; inside the database the candidate resources are disks, CPU(s), and locks.]
A: 2PL lock queues

What is the bottleneck resource?
Fix at 10 warehouses; #clients = 10 x #warehouses.
• IBM DB2: lock waiting time (yellow) is the bottleneck.
• Therefore, need to schedule lock queues to have impact.

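Why lock scheduling is hard
[Diagram: lock resource 1 and lock resource 2, each with a queue holding a mix of low-priority (L) and high-priority (H) requests: L L H H L.]
• NP: H may wait a long time.
• NPinherit: speeding up L may hurt H in the long run.
• Pabort: rollback cost + wasted work + really hurts L’s.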
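To make the NP case concrete, here is a toy lock queue in which waiting high-priority transactions are granted the lock before waiting low-priority ones, but the current holder is never preempted. It reads NP as non-preemptive priority, which is an assumption about the slide's abbreviation; the class, its methods, and the single exclusive lock mode are hypothetical simplifications and do not reflect the DBMS implementations studied in the talk.

```python
import heapq
import itertools

HIGH, LOW = 0, 1  # smaller value is granted first


class PriorityLockQueue:
    """Toy exclusive lock with a non-preemptive priority wait queue."""

    def __init__(self):
        self.holder = None
        self._waiters = []  # heap of (priority, ticket, transaction)
        self._ticket = itertools.count()  # preserves FIFO order within a priority

    def acquire(self, txn, priority):
        """Grant the lock if free; otherwise enqueue. Returns True if granted now."""
        if self.holder is None:
            self.holder = txn
            return True
        heapq.heappush(self._waiters, (priority, next(self._ticket), txn))
        return False

    def release(self):
        """Release the lock and hand it to the best waiter, if any."""
        self.holder = heapq.heappop(self._waiters)[2] if self._waiters else None
        return self.holder


if __name__ == "__main__":
    q = PriorityLockQueue()
    q.acquire("L1", LOW)   # L1 holds the lock
    q.acquire("L2", LOW)   # waits
    q.acquire("H1", HIGH)  # waits, but jumps ahead of L2
    print(q.release())     # H1 gets the lock only after L1 finishes
```

The example shows the weakness named on the slide: H1 jumps the queue but still has to wait for L1 to finish, however long that takes.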

Results:
• Implementation study of NP, NPinherit, and Pabort under TPC-C workload, Shore DBMS
• Develop new policy: POW (Preempt on Wait)

Results: Response time
[Figure: response time versus think time for NPinherit and Pabort.]

Results: Response time
[Figure: response time versus think time for NPinherit, Pabort, and POW.]
POW: Best of both

More work in SYNC project…
• QoS from outside the box
  [Diagram: “$$$buy$$$” and “buy” requests travel over the Internet to a Web server, then through a QoS layer in front of the DBMS (e.g., DB2, Oracle).]
• Scheduling the TeraGrid (PSC, SDSC, NCSA)
• Time-varying load in systems
• Impact of closed versus open system models

Conclusion
Scheduling is a very cheap solution…
• No need to buy new hardware
• No need to buy more memory
• Small software modifications
…with a potentially very big win in some situations.
Thank you!