
  • Number of slides: 50
  • Slides © 1998, Geoff Kuenning; © 2003, Carla Ellis

Experimental Lifecycle (diagram)
Stages: vague idea → “groping around” experiences → initial observations → hypothesis → model → experiment → data, analysis, interpretation → results & final presentation.
Annotations: evidence of real problem, justification, opportunity, feasibility, understanding; boundary of system under test, workload & system parameters that affect behavior; questions that test the model; metrics to answer questions, factors to vary, levels of factors.

Workloads and Experimental Environments (diagram)
Workload types: made-up, live workload, data sets, microbenchmark programs, synthetic benchmark programs, benchmark applications, “real” workloads, traces, synthetic traces, distributions & other statistics.
Experimental environments: prototype, real system, exec-driven simulation, trace-driven simulation, stochastic simulation (connected by monitor, generator, and analysis steps).

What is a Workload?
• A workload is anything a computer is asked to do
• Test workload: any workload used to analyze performance
• Real workload: any workload observed during normal operations
• Synthetic workload: created for controlled testing

Workload Issues
• Selection of benchmarks
  – Criteria:
    • Repeatability
    • Availability and community acceptance
    • Representative of typical usage (e.g., timeliness)
    • Predictive of real performance – realistic (e.g., scaling issues)
  – Types
• Tracing workloads & using traces
  – Monitor design
  – Compression, anonymizing, realism (no feedback)
• Workload characterization
• Workload generators
Choosing an unbiased workload is key to designing an experiment that can disprove a hypothesis.

Types: Real (“Live”) Workloads
• Advantage: they represent reality
• Disadvantage: they’re uncontrolled
  – Can’t be repeated
  – Can’t be described simply
  – Difficult to analyze
• Nevertheless, often useful for “final analysis” papers
• “Deployment experience”

Types: Synthetic Workloads
• Advantages:
  – Controllable
  – Repeatable
  – Portable to other systems
  – Easily modified
• Disadvantage: can never be sure the real world will be the same (i.e., are they representative?)

Instruction Workloads
• Useful only for CPU performance
  – But teach useful lessons for other situations
• Development over decades
  – “Typical” instruction (ADD)
  – Instruction mix (by frequency of use)
• Sensitive to compiler, application, architecture
• Still used today (MFLOPS)

Instruction Workloads (cont’d)
• Modern complexity makes mixes invalid
  – Pipelining
  – Data/instruction caching
  – Prefetching
• Kernel is an inner loop that does useful work (see the timing sketch below):
  – Sieve, matrix inversion, sort, etc.
  – Ignores setup and I/O, so it can be timed by analysis if desired (at least in theory)
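To make the kernel idea concrete, here is a minimal timed-kernel sketch in C. It is not any standard benchmark: a naive matrix multiply stands in for the kernel, the size N = 256 is an arbitrary choice, and setup and output sit outside the timed region, as the slide describes.

```c
/* Minimal kernel-benchmark sketch (illustrative, not a standard benchmark). */
#include <stdio.h>
#include <time.h>

#define N 256                              /* arbitrary problem size */

static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    /* Setup (not timed): fill the input matrices. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(i - j);
        }

    clock_t start = clock();               /* timing starts: kernel only */
    for (int i = 0; i < N; i++)            /* the kernel: naive matrix multiply */
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    clock_t end = clock();                 /* timing ends before any I/O */

    printf("kernel time: %.3f s (c[0][0] = %g)\n",
           (double)(end - start) / CLOCKS_PER_SEC, c[0][0]);
    return 0;
}
```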

Synthetic Programs
• Complete programs
  – Designed specifically for measurement
  – May do real or “fake” work
  – May be adjustable (parameterized)
• Two major classes:
  – Benchmarks
  – Exercisers

Real-World Benchmarks
• Pick a representative application
• Pick sample data
• Run it on the system to be tested
• Modified Andrew Benchmark (MAB) is a real-world benchmark
• Easy to do, accurate for that sample data
• Fails to consider other applications, data

Application Benchmarks
• Variation on real-world benchmarks
• Choose most important subset of functions
• Write benchmark to test those functions
• Tests what computer will be used for
• Need to be sure important characteristics aren’t missed

“Standard” Benchmarks
• Often need to compare general-purpose computer systems for general-purpose use
  – E.g., should I buy a Compaq or a Dell PC?
  – Tougher: Mac or PC?
• Desire for an easy, comprehensive answer
• People writing articles often need to compare tens of machines

“Standard” Benchmarks (cont’d)
• Often need to make comparisons over time
  – Is this year’s PowerPC faster than last year’s Pentium?
    • Obviously yes, but by how much?
• Don’t want to spend time writing own code
  – Could be buggy or not representative
  – Need to compare against other people’s results
• “Standard” benchmarks offer a solution

Popular “Standard” Benchmarks
• Sieve
• Ackermann’s function
• Whetstone
• Linpack
• Dhrystone
• Livermore loops
• Debit/credit
• SPEC
• MAB

Sieve and Ackermann’s Function
• Prime number sieve (Eratosthenes) – see the sketch below
  – Nested for loops
  – Usually such a small array that it’s silly
• Ackermann’s function
  – Tests procedure calling, recursion
  – Not very popular in Unix/PC community
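For concreteness, a minimal C sketch of both programs follows. The array limit and the Ackermann arguments are illustrative choices, not the parameters of any particular published version.

```c
/* Illustrative sketches of the two classic benchmarks named above. */
#include <stdio.h>
#include <string.h>

#define LIMIT 8192   /* deliberately small, as the slide notes */

/* Sieve of Eratosthenes: count primes below LIMIT with nested loops. */
static int sieve(void)
{
    char composite[LIMIT];
    int count = 0;

    memset(composite, 0, sizeof composite);
    for (int i = 2; i < LIMIT; i++) {
        if (!composite[i]) {
            count++;
            for (int j = 2 * i; j < LIMIT; j += i)
                composite[j] = 1;
        }
    }
    return count;
}

/* Ackermann's function: exercises procedure calls and deep recursion. */
static unsigned long ack(unsigned long m, unsigned long n)
{
    if (m == 0) return n + 1;
    if (n == 0) return ack(m - 1, 1);
    return ack(m - 1, ack(m, n - 1));
}

int main(void)
{
    printf("primes below %d: %d\n", LIMIT, sieve());
    printf("ack(2, 3) = %lu\n", ack(2, 3));   /* keep m small: growth is explosive */
    return 0;
}
```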

Whetstone
• Dates way back (can compare against the 1970s)
• Based on real observed frequencies
• Entirely synthetic (no useful result)
• Mixed data types, but best for floating point
• Be careful of incomparable variants!

LINPACK
• Based on real programs and data
• Developed by supercomputer users
• Great if you’re doing serious numerical computation

Dhrystone
• Bad pun on “Whetstone”
• Motivated by Whetstone’s perceived excessive emphasis on floating point
• Dates back to when microprocessors were integer-only
• Very popular in the PC world
• Again, watch out for version mismatches

Livermore Loops
• Outgrowth of vector-computer development
• Vectorizable loops (see the example below)
• Based on real programs
• Good for supercomputers
• Difficult to characterize results simply
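As a rough illustration of what “vectorizable” means here, the loop below is a DAXPY-style update with no loop-carried dependence (it is not one of the actual Livermore kernels), so a vectorizing compiler can map it onto SIMD hardware.

```c
/* Illustrative vectorizable loop (not an actual Livermore kernel). */
#include <stdio.h>

#define N 1000

int main(void)
{
    static double x[N], y[N];
    double a = 3.0;

    for (int i = 0; i < N; i++) {          /* setup */
        x[i] = (double)i;
        y[i] = 2.0 * i;
    }

    /* No iteration depends on another, so the compiler can vectorize it. */
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[N-1] = %g\n", y[N - 1]);
    return 0;
}
```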

Debit/Credit Benchmark
• Developed for transaction processing environments
  – CPU processing is usually trivial
  – Remarkably demanding I/O, scheduling requirements
• Models real TPS workloads synthetically
• Modern version is the TPC benchmark

Modified Andrew Benchmark
• Used in research to compare file system, operating system designs
• Based on a software engineering workload
• Exercises copying, compiling, linking
• Probably ill-designed, but common use makes it important

TPC Benchmarks (Transaction Processing Performance Council)
• TPC-App – application server and web services, B2B transaction server
• TPC-C – on-line transaction processing
• TPC-H – ad hoc decision support
• Considered obsolete: TPC-A, B, D, R, W
• www.tpc.org

SPEC Benchmarks (Standard Performance Evaluation Corp.)
• Result of multi-manufacturer consortium
• Addresses flaws in existing benchmarks
• Uses real applications, trying to characterize specific real environments
• Becoming standard comparison method
• www.spec.org

SPEC CPU 2000
• Considers multiple CPUs
• Integer
• Floating point
• Geometric mean gives SPECmark for system (see the sketch below)
• Working on CPU 2006
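A sketch of how such a composite is formed, assuming the usual scheme of per-benchmark runtime ratios (reference time divided by measured time) combined by geometric mean; the ratio values below are made up for illustration.

```c
/* Geometric-mean composite of per-benchmark ratios (illustrative values). */
#include <stdio.h>
#include <math.h>

static double geometric_mean(const double *ratios, int n)
{
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(ratios[i]);     /* summing logs avoids overflow */
    return exp(log_sum / n);
}

int main(void)
{
    double ratios[] = { 12.4, 9.8, 15.1, 11.0 };   /* hypothetical ratios */
    int n = (int)(sizeof ratios / sizeof ratios[0]);

    printf("composite (geometric mean) = %.2f\n", geometric_mean(ratios, n));
    return 0;
}
```

The geometric mean is used because a composite built this way does not depend on which machine supplies the reference times.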

SFS (SPEC System File Server)
• Measures NFS servers
• Operation mix that matches real workloads
SPEC Mail
• Mail server performance based on SMTP and POP3
• Characterizes throughput and response time

SPECweb2005
• Evaluating web server performance
• Includes dynamic content, caching effects, banking and e-commerce sites
• Simultaneous user sessions
SPECjAppServer, SPECjbb, SPECjvm
• Java servers, business apps, JVM client

MediaBench
• JPEG image encoding and decoding
• MPEG video encoding and decoding
• GSM speech transcoding
• G.721 voice compression
• PGP digital signatures
• PEGWIT public key encryption
• Ghostscript for PostScript
• RASTA speech recognition
• MESA 3-D graphics
• EPIC image compression

Others
• BioBench – DNA and protein sequencing applications
• TinyBench – for TinyOS sensor networks

Exercisers and Drivers (Microbenchmarks)
• For I/O, network, non-CPU measurements
• Generate a workload, feed it to the internal or external measured system (a minimal sketch follows this slide)
  – I/O on local OS
  – Network
• Sometimes uses dedicated system, interface hardware
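A minimal I/O exerciser sketch, assuming a local file system, a hypothetical scratch file named exerciser.dat, and POSIX clock_gettime for wall-clock timing; the block size and block count are arbitrary choices. The program generates its own sequential-read workload and incorporates its own measurement, as described above.

```c
/* Minimal I/O exerciser: create a file, then time sequential reads of it. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BLOCK_SIZE 4096
#define NUM_BLOCKS 1024          /* 4 MiB total */
#define PATH "exerciser.dat"     /* hypothetical scratch file */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* wall-clock time matters for I/O */
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    char buf[BLOCK_SIZE];
    memset(buf, 'x', sizeof buf);

    /* Setup (not measured): create the test file. */
    FILE *f = fopen(PATH, "wb");
    if (!f) { perror("fopen"); return 1; }
    for (int i = 0; i < NUM_BLOCKS; i++)
        fwrite(buf, 1, BLOCK_SIZE, f);
    fclose(f);

    /* Measured phase: sequential reads, timed by the exerciser itself. */
    f = fopen(PATH, "rb");
    if (!f) { perror("fopen"); return 1; }
    double start = now_sec();
    while (fread(buf, 1, BLOCK_SIZE, f) == BLOCK_SIZE)
        ;
    double elapsed = now_sec() - start;
    fclose(f);
    remove(PATH);

    printf("read %d MiB in %.4f s\n",
           NUM_BLOCKS * BLOCK_SIZE / (1024 * 1024), elapsed);
    return 0;
}
```

Note that the data just written will likely be served from the OS buffer cache rather than the disk – exactly the “may use caches ‘incorrectly’” pitfall listed on the next slide.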

Advantages/Disadvantages
+ Easy to develop, port
+ Incorporate measurement
+ Easy to parameterize, adjust
+ Good for “diagnosis” of problem, isolating bottleneck
- Often too small compared to real workloads
  • Thus not representative
  • May use caches “incorrectly”
- Often don’t have real CPU activity
  • Affects overlap of CPU and I/O
- Synchronization effects caused by loops

Workload Selection
• Services Exercised
• Level of Detail
• Representativeness
• Timeliness
• Other Considerations

Services Exercised
• What services does the system actually use?
  – Faster CPU won’t speed “cp”
  – Network performance useless for matrix work
• What metrics measure these services?
  – MIPS for CPU speed
  – Bandwidth for network, I/O
  – TPS for transaction processing

Completeness
• Computer systems are complex
  – Effect of interactions hard to predict
  – So must be sure to test entire system
• Important to understand balance between components
  – I.e., don’t use a 90% CPU mix to evaluate an I/O-bound application

Component Testing
• Sometimes only individual components are compared
  – Would a new CPU speed up our system?
  – How does IPv6 affect Web server performance?
• But component may not be directly related to performance

Service Testing
• May be possible to isolate interfaces to just one component
  – E.g., instruction mix for CPU
• Consider services provided and used by that component
• System often has layers of services
  – Can cut at any point and insert a workload

Characterizing a Service
• Identify service provided by major subsystem
• List factors affecting performance
• List metrics that quantify demands and performance
• Identify workload provided to that service

Example: Web Server (layered diagram)
Web Page Visits → Web Client → TCP/IP Connections → Network → HTTP Requests → Web Server → Web Page Accesses → File System → Disk Transfers → Disk Drive

Web Client Analysis
• Services: visit page, follow hyperlink, display information
• Factors: page size, number of links, fonts required, embedded graphics, sound
• Metrics: response time
• Workload: a list of pages to be visited and links to be followed

Network Analysis
• Services: connect to server, transmit request, transfer data
• Factors: bandwidth, latency, protocol used
• Metrics: connection setup time, response latency, achieved bandwidth
• Workload: a series of connections to one or more servers, with data transfer

Web Server Analysis
• Services: accept and validate connection, fetch HTTP data
• Factors: network performance, CPU speed, system load, disk subsystem performance
• Metrics: response time, connections served
• Workload: a stream of incoming HTTP connections and requests

File System Analysis
• Services: open file, read file (writing doesn’t matter for a Web server)
• Factors: disk drive characteristics, file system software, cache size, partition size
• Metrics: response time, transfer rate
• Workload: a series of file-transfer requests

Disk Drive Analysis
• Services: read sector, write sector
• Factors: seek time, transfer rate
• Metrics: response time
• Workload: a statistically generated stream of read/write requests (see the sketch below)
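One way such a statistically generated request stream could be produced, as a sketch: the request type and sector number are drawn from simple distributions. The 2:1 read/write mix, the disk size, and the uniform sector distribution are assumptions (real disk workloads usually show locality); the fixed seed makes the workload repeatable.

```c
/* Sketch of a statistically generated read/write request stream. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_SECTORS   2000000L   /* assumed disk size in sectors */
#define NUM_REQUESTS  10         /* keep the demo output short */
#define READ_FRACTION 0.67       /* assumed 2:1 read-to-write mix */

int main(void)
{
    srand(42);                   /* fixed seed => repeatable workload */
    for (int i = 0; i < NUM_REQUESTS; i++) {
        double r = (double)rand() / ((double)RAND_MAX + 1.0);
        long sector = (long)((double)rand() / ((double)RAND_MAX + 1.0) * NUM_SECTORS);
        printf("%s sector %ld\n", r < READ_FRACTION ? "READ " : "WRITE", sector);
    }
    return 0;
}
```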

Level of Detail
• Detail trades off accuracy vs. cost
• Highest detail is a complete trace
• Lowest is one request, usually the most common
• Intermediate approach: weight by frequency
• We will return to this when we discuss workload characterization

Representativeness
• Obviously, workload should represent desired application
  – Arrival rate of requests
  – Resource demands of each request
  – Resource usage profile of workload over time
• Again, accuracy and cost trade off
• Need to understand whether detail matters

Timeliness
• Usage patterns change over time
  – File size grows to match disk size
  – Web pages grow to match network bandwidth
• If using “old” workloads, must be sure user behavior hasn’t changed
• Even worse, behavior may change after the test, as a result of installing the new system

Other Considerations
• Loading levels
  – Full capacity
  – Beyond capacity
  – Actual usage
• External components not considered as parameters
• Repeatability of workload

For Discussion Next Tuesday
• Survey the types of workloads – especially the standard benchmarks – used in your proceedings (10 papers).

Metrics Discussion (MobiSys submissions)
• Ad hoc routing (composite computed in the sketch below):
  – Reliability – packet delivery ratio = #packets delivered to sink / #packets generated by source
  – Energy consumption – total over all nodes
  – Average end-to-end packet delay (only over delivered packets)
  – Energy * delay / reliability (not clear what this delay was – the average above?)
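To make the composite concrete, here is a small sketch computing these figures from made-up numbers (the packet counts, total energy, and average delay below are hypothetical, not results from any submission):

```c
/* Composite routing metrics from made-up example values. */
#include <stdio.h>

int main(void)
{
    long generated = 10000, delivered = 9400;   /* hypothetical packet counts */
    double energy_joules = 152.0;               /* total over all nodes */
    double avg_delay_s   = 0.045;               /* over delivered packets only */

    double reliability = (double)delivered / (double)generated;
    double composite   = energy_joules * avg_delay_s / reliability;

    printf("reliability              = %.3f\n", reliability);
    printf("energy*delay/reliability = %.3f\n", composite);
    return 0;
}
```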

Metrics Discussion (cont.)
• Route cache TTL:
  – End-to-end routing delay (an “ends” metric)
    • When the cache yields a path successfully, it counts as zero.
  – Control overhead (an “ends” metric)
  – Path residual time (a “means” metric), estimated as expected path duration, E(min(individual link durations))
• Shared sensor queries:
  – Average relative error in sampling interval (elapsed time between two consecutively numbered returned tuples, given a fixed desired sampling period)

Metrics Discussion (cont.)
• 2 hoarding (caching/prefetching) papers:
  – Miss rate (during disconnected access)
  – Miss-free hoard size (how large the cache must be for 0 misses)
  – Content completeness – probability of a content page requested by a user being located in the cache (hit rate on the subset of web pages that are content, not just surfed through)