Prophesy: Analysis and Modeling of Parallel and Distributed Applications
Valerie Taylor, Texas A&M University
Seung-Hye Jang, Mieke Prajugo, Xingfu Wu – TAMU
Ewa Deelman – ISI
Juan Gilbert – Auburn University
Rick Stevens – Argonne National Laboratory
Sponsors: NSF, NASA
http://prophesy.cs.tamu.edu
Performance Modeling
• Necessary for good performance
• Requires significant time and effort
Outline
• Prophesy Infrastructure
• Modeling Techniques
• Case Studies
• Summary
Problem Statement
• Given:
  – Performance models and analyses are critical, but require significant development time
  – Parallel and distributed systems are complex
• Goal: efficient execution of parallel and distributed applications
• Proposed solution:
  – Automate as much as possible
  – Community involvement
Prophesy System
• PROPHESY GUI
• Data collection: Profiling & Instrumentation, Actual Execution
• Databases: Template Database, Performance Database, Systems Database
• Data analysis: Model Builder, Performance Predictor
Automated Instrumentation (Profiling & Instrumentation, Actual Execution)
• In-line data collection
• Instrument at one of several predefined levels
• Allow for user-specified instrumentation
Before:
  T = E * f;
  for (I = 1; I < N; I++) {
    V(I) = A(I) * C(I);
    B(I) = A(2*I + 4);
  }
After:
  T = E * f;
  INSTRUMENTATION CODE
  for (I = 1; I < N; I++) {
    V(I) = A(I) * C(I);
    B(I) = A(2*I + 4);
  }
  INSTRUMENTATION CODE
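Conceptually, the inserted instrumentation code is a pair of timers around each selected region. The Python sketch below illustrates that idea only; the region mechanism and names are assumptions for this sketch, not Prophesy's actual instrumentation API (Prophesy rewrites the source itself at predefined levels).

```python
import time
from collections import defaultdict

# Accumulated [call_count, total_seconds] per named code region.
_timings = defaultdict(lambda: [0, 0.0])

class region:
    """Context manager that times a code region, mimicking the
    instrumentation code wrapped around a loop or statement block."""
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *exc):
        rec = _timings[self.name]
        rec[0] += 1
        rec[1] += time.perf_counter() - self.start
        return False

# The slide's loop, recast in Python, with the timer standing in
# for the INSTRUMENTATION CODE markers.
def kernel(A, C, N):
    V = [0.0] * N
    B = [0.0] * N
    with region("vector_loop"):
        for i in range(1, N):
            V[i] = A[i] * C[i]
            B[i] = A[(2 * i + 4) % N]   # index wrapped to stay in range
    return V, B

kernel([1.0] * 16, [2.0] * 16, 16)
print(dict(_timings))   # e.g. {'vector_loop': [1, 1.2e-05]}
```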
Databases
• Template Database: hierarchical organization
• Performance Database: organized into 4 areas: Application, Executable, Run, Performance Statistics
• Systems Database
Prophesy Database (schema entities)
Application, Executable, Modules, Module_Info, Compilers, Model Template, Functions, Function_Info, Application Performance, Run, Inputs, Systems, Resource, Connection, Function Performance, Basic Unit Performance, Control Flow, Model_Info, Library, Data Structure Performance
Data Analysis
• Model Builder: develop performance models
• Performance Predictor: make predictions, performance-tune codes, identify the best implementation, identify trends
Automated Modeling Techniques
• Utilize information in the template and system databases
• Currently include three techniques:
  – Curve fitting
  – Parameterization
  – Composition using coupling values
Curve Fitting: Usage
• Performance data (drawn from the Application, Function, Basic Unit, and Data Structure Performance tables) and the model template feed Octave's least-squares fit (LSF) to produce an analytical equation
• Example template entry: matrix-matrix multiply, LSF: 3
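As a sketch of what this step produces, the NumPy fragment below fits a third-order least-squares model, matching the "LSF: 3" template entry; Prophesy drives the equivalent fit through Octave and the performance database, and the timing data here are invented placeholders.

```python
import numpy as np

# Placeholder runtimes (seconds) for matrix-matrix multiply at
# several problem sizes N; not the slide's measurements.
N = np.array([256, 512, 768, 1024, 1280])
T = np.array([0.9, 6.8, 22.5, 53.1, 103.9])

# "LSF: 3" read as a third-order least-squares fit:
# T(N) ~ a3*N^3 + a2*N^2 + a1*N + a0
coeffs = np.polyfit(N, T, deg=3)
model = np.poly1d(coeffs)

# The fitted polynomial is the analytical performance model and can
# predict runtimes at unmeasured problem sizes.
print(model)
print(model(2048))
```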
Matrix-matrix multiplication, 16 processors, IBM SP
Parameterization: Usage
• The model template and system data feed Octave to produce a parameterized analytical equation (Octave: Parameterization)
• Example template entry: matrix-matrix multiply, Parameterization: Parameter(P, SGI Origin 2000, N, ADDM, MPISR, MPIBC)
• System data (MPISR, MPIBC, ADDM) comes from the Systems, Resource, and Connection tables
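To illustrate what a parameterized model exposes, here is a sketch whose functional form and constants are assumptions for illustration, not Prophesy's actual matrix-multiply model; based on the names, ADDM, MPISR, and MPIBC are read here as the per-operation add/multiply cost, MPI send/receive cost, and MPI broadcast cost.

```python
def mm_time(N, P, t_addm, t_mpisr, t_mpibc):
    """Illustrative parameterized model for a parallel matrix-matrix
    multiply: a compute term plus simple communication terms. The
    functional form is an assumption made for this sketch."""
    compute = 2 * N**3 / P * t_addm              # ~2N^3 flops over P procs
    exchange = (P - 1) * (N * N / P) * t_mpisr   # pair-wise block exchanges
    broadcast = P * t_mpibc                      # per-phase broadcasts
    return compute + exchange + broadcast

# Exposed parameters let one swap in another machine's measured costs
# and explore "what if" system scenarios without rerunning the code.
print(mm_time(N=1024, P=16, t_addm=2e-9, t_mpisr=5e-6, t_mpibc=2e-5))
```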
Modeling Techniques
• Curve fitting: easy to generate the model; very few exposed parameters
• Parameterization: requires one-time manual analysis; exposes many parameters; can explore different system scenarios
• Coupling: builds upon the previous techniques; identifies how to combine kernel models
Kernel Coupling
• Two kernels (i and j)
• Three measurements:
  – P_i: performance of kernel i in isolation
  – P_j: performance of kernel j in isolation
  – P_ij: performance of kernels i and j coupled
• Compute C_ij = P_ij / (P_i + P_j)
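In code, the coupling value is a direct ratio of the three measurements; a minimal sketch with placeholder numbers:

```python
def coupling(p_i, p_j, p_ij):
    """C_ij = P_ij / (P_i + P_j): 1 means no coupling,
    >1 destructive, <1 constructive."""
    return p_ij / (p_i + p_j)

# Placeholder times in seconds (not measurements from the slides).
print(coupling(p_i=10.0, p_j=12.0, p_ij=18.5))   # ~0.84, constructive
```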
Coupling Categories
• C_ij = 1: no coupling
• C_ij > 1: destructive coupling
• C_ij < 1: constructive coupling
Coupling Categories (diagram)
• C_ij = 1, no coupling: kernels A and B run with no shared resource between them
• C_ij > 1, destructive coupling: kernels A and B interfere through a shared resource
• C_ij < 1, constructive coupling: kernels A and B benefit from a shared resource
Using Coupling Parameters
• Use weighted averages to determine how to combine coupling values
• Example: given the pair-wise coupling values for kernels A, B, and C, the composed time is
  T = E_A * P_A + E_B * P_B + E_C * P_C
  where each kernel's weight is a weighted average of its pair-wise coupling values:
  E_A = (C_AB * P_AB + C_AC * P_AC) / (P_AB + P_AC)
  E_B = (C_AB * P_AB + C_BC * P_BC) / (P_AB + P_BC)
  E_C = (C_BC * P_BC + C_AC * P_AC) / (P_BC + P_AC)
Composition Method
• Synthetic kernels (array updates): Kernel A (196.44 s), Kernel B (207.16 s), Kernel C (574.19 s)
• Pair-wise coupling values: A-B: 0.97, B-C: 0.75, C-A: 0.76
• Per-kernel weights: E_A = 0.8472, E_B = 0.8407, E_C = 0.7591
• Actual total time: 799.63 s
• Coupling-based time: 776.52 s (error: 2.89%)
• Adding individual times: 971.81 s (error: 23%)
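The composition arithmetic above can be replayed in a few lines; this sketch takes the slide's isolated kernel times and per-kernel weights as given (the small difference from the slide's 776.52 s comes from the rounded weights).

```python
# Isolated kernel times (s) and per-kernel coupling weights, both
# from the slide; the weight-to-kernel mapping follows the slide text.
times = {"A": 196.44, "B": 207.16, "C": 574.19}
weights = {"A": 0.8472, "B": 0.8407, "C": 0.7591}
actual = 799.63   # measured time of all three kernels together (s)

# Coupling-based composition: weighted sum of the isolated times.
coupled = sum(weights[k] * times[k] for k in times)

print(f"coupling estimate: {coupled:.2f} s, "
      f"error {abs(actual - coupled) / actual:.2%}")   # ~776.5 s, ~2.9%
```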
Coupling Method: Usage
• Data and system info (from the Run, Functions, Inputs, Systems, Control Flow, Function Performance, and Coupling tables) identify adjacent kernels and supply coupling values and performance data
• These are combined into an analytical equation (Octave: Coupling)
Case Studies
• Prediction for resource allocation on the Grid
  – Grid Physics Network (GriPhyN)
  – Utilizes the Grid2003 infrastructure
  – GEO LIGO application
• Prediction for resource allocation across servers
  – AADMLSS: educational application
  – Utilizes multiple servers
Case 1: GEO LIGO (GriPhyN)
• The pulsar search is the process of finding celestial objects that may emit gravitational waves
• The GEO (German-English Observatory) LIGO (Laser Interferometer Gravitational-wave Observatory) pulsar search is the most frequently used coherent search method; it generates the F-statistic for known pulsars
GriPhyN
• Workflow: Transform (using VDL) → Chimera Virtual Data System → Grid middleware submission → Grid2003
• Resource selection driven by Prophesy; monitoring by Ganglia
Resource Selector
• Inputs to the Prophesy interface: application name, input parameters, list of available sites
• Outputs from the predictor: ranking of sites and a weight for each site
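The selector's role can be sketched as ranking candidate sites by predicted execution time. In this minimal illustration the predictor body, the site data, and the weighting rule are hypothetical stand-ins for Prophesy's interface.

```python
def predict_time(app, params, site):
    """Hypothetical stand-in for the Prophesy predictor: in the real
    system this evaluates the application's performance model against
    the site's system data."""
    return params["work"] / site["speed"]   # toy placeholder model

def rank_sites(app, params, sites):
    """Order sites best-first by predicted time; weight each site
    inversely to its prediction (an assumed weighting rule)."""
    preds = {s["name"]: predict_time(app, params, s) for s in sites}
    total = sum(1.0 / t for t in preds.values())
    weights = {name: (1.0 / t) / total for name, t in preds.items()}
    return sorted(preds, key=preds.get), weights

# Hypothetical site list; names echo the testbed, numbers are made up.
sites = [{"name": "IU", "speed": 2.4}, {"name": "PDSF", "speed": 1.8},
         {"name": "KNU", "speed": 1.7}]
print(rank_sites("geo_ligo", {"work": 1000.0}, sites))
```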
Grid2003 Testbed
Execution Environment

| Site Name | CPUs | Batch | Compute Node Processors | Cache Size | Memory |
| alliance.unm.edu (UNM) | 436 | PBS | 1 x PIII 731 MHz | 256 KB | 1 GB |
| atlas.iu.edu (IU) | 400 | PBS | 2 x Intel Xeon 2.4 GHz | 512 KB | 2.5 GB |
| pdsfgrid3.nersc.gov (PDSF) | 349 | LSF | 2 x PIII 650 MHz-1.8 GHz / 2 x AMD 2100+-2600+ | 256 KB | 2 GB |
| atlas.dpcc.uta.edu (UTA) | 158 | PBS | 2 x Intel Xeon 2.4-2.6 GHz | 512 KB | 2 GB |
| nest.phys.uwm.edu (UWM) | 296 | Condor | 1 x PIII 1 GHz | 256 KB | 0.5 GB |
| boomer1.oscer.ou.edu (OU) | 286 | PBS | 3 x Intel Xeon 2 GHz | 512 KB | 2 GB |
| cmsgrid.hep.wisc.edu (UWMadison) | 64 | Condor | 1 x Intel Xeon 2.8 GHz | 512 KB | 2 GB |
| cluster28.knu.ac.kr (KNU) | 104 | Condor | 1 x AMD Athlon XP 1700+ | 256 KB | 0.8 GB |
| acdc.ccr.buffalo.edu (UBuffalo) | 74 | PBS | 1 x Intel Xeon 1.6 GHz | 256 KB | 3.7 GB |
Experimental Results
[Table] GEO LIGO runs on Grid2003: for each (Alpha, Freq) parameter setting, the selected site and execution time (sec) under prediction-based, load-based, and random selection, with the error of each alternative relative to the prediction-based choice. The prediction-based selector chose PDSF or IU in the runs shown; average errors: 33.68% and 58.62%.
Case Study 2: AADMLSS
African American Distributed Multiple Learning Styles System (AADMLSS), developed by Dr. Juan E. Gilbert
Site Selection Process
1. User logs into AADMLSS; an invalid username/password returns to the login step.
2. On first-time access, get the default concept; otherwise get the last concept.
3. Measure network performance and server performance; select the server with the best overall site performance.
4. Display the concept. If the user passes the quiz, advance to the next concept (same instructor); otherwise repeat the current concept (different instructor).
5. Continue until the user exits and logs out.
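The selection step in this flow, measure then pick the best server, might look like the sketch below; the probe and the selection rule are assumptions rather than the AADMLSS implementation, and the server URLs are hypothetical.

```python
import time
import urllib.request

def response_time(url):
    """Probe combining network and server performance: time a small
    HTTP GET against the server (a stand-in for AADMLSS's separate
    network and server measurements)."""
    start = time.perf_counter()
    urllib.request.urlopen(url, timeout=5).read(1024)
    return time.perf_counter() - start

def best_server(urls):
    """Select the server with the best (lowest) measured response."""
    return min(urls, key=response_time)

# Hypothetical URLs standing in for the Loner/Prophesy/Tina/Interact
# testbed servers.
servers = ["http://loner.example.edu", "http://prophesy.example.edu",
           "http://tina.example.edu", "http://interact.example.edu"]
# print(best_server(servers))   # needs live servers to run
```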
Testbed Overview

| Category | Loner (TX) | Prophesy (TX) | Tina (MA) | Interact (AL) |
| Hardware: CPU Speed (MHz) | 997.62 | 3056.85 | 1993.56 | 697.87 |
| Hardware: Bus Speed (MB/s) | 205 | 856 | 638 | 214 |
| Hardware: Memory (MB) | 256 | 2048 | 256 | — |
| Hardware: Hard Disk (GB) | 30 | 146 | 40 | 10 |
| Software: O/S | Redhat Linux 9.0 | Redhat Linux Enterprise 3.0 | Redhat Linux 9.0 | — |
| Software: Web Server | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 |
| Software: Web Application | PHP 4.2 | PHP 4.3 | PHP 4.2 | PHP 4.1 |
Results – 4 Servers

| Course/Module/Concept | Day SRT-LOAD (%) | Day SRT-RANDOM (%) | Night SRT-LOAD (%) | Night SRT-RANDOM (%) |
| 3/0/0 | 9.75 | 16.97 | 8.76 | 13.54 |
| 3/0/1 | 12.58 | 24.76 | 12.30 | 22.54 |
| 3/0/2 | 16.75 | 29.70 | 15.75 | 28.95 |
| 3/0/3 | 20.54 | 27.10 | 18.75 | 25.54 |
| 3/1/0 | 9.14 | 16.92 | 8.76 | 13.96 |
| 3/1/1 | 8.67 | 15.76 | 8.01 | 14.15 |
| 3/1/2 | 13.38 | 23.57 | 11.94 | 20.67 |
| 3/1/3 | 12.16 | 19.76 | 11.87 | 19.11 |
| 3/2/0 | 8.95 | 15.15 | 8.64 | 15.09 |
| 3/2/1 | 11.57 | 17.40 | 9.95 | 15.54 |
| 3/2/2 | 10.95 | 19.75 | 9.60 | 15.27 |
| 3/2/3 | 11.04 | 23.08 | 12.54 | 22.84 |
| 3/3/0 | 8.91 | 15.94 | 7.69 | 15.91 |
| 3/3/1 | 9.07 | 17.90 | 8.47 | 16.95 |
| 3/3/2 | 9.46 | 16.77 | 9.31 | 15.76 |
| 3/3/3 | 10.55 | 19.57 | 9.87 | 17.95 |
| AVERAGE | 11.47 | 20.01 | 10.76 | 18.36 |
Results – 4 Servers
Results – 3 Servers

| Concept | Day/Night | SRT-LOAD (%) | SRT-RANDOM (%) |
| 3/0/0 | D | 6.21 | 14.05 |
| 3/0/1 | D | 12.13 | 21.94 |
| 3/0/2 | N | 14.02 | 25.83 |
| 3/0/3 | N | 18.12 | 23.52 |
| 3/1/0 | N | 8.05 | 12.04 |
| 3/1/1 | N | 7.31 | 12.25 |
| 3/1/2 | N | 12.60 | 18.74 |
| 3/1/3 | N | 10.96 | 19.11 |
| 3/2/0 | N | 7.93 | 12.58 |
| 3/2/1 | N | 8.05 | 14.25 |
| 3/2/2 | N | 9.14 | 15.97 |
| 3/2/3 | D | 9.79 | 20.58 |
| 3/3/0 | D | 8.94 | 13.64 |
| 3/3/1 | D | 8.26 | 16.74 |
| 3/3/2 | D | 9.21 | 15.21 |
| 3/3/3 | D | 9.97 | 19.36 |
| AVERAGE | | 10.04 | 17.24 |
Results – 3 Servers
Results – 2 Servers

| Concept | Day/Night | SRT-LOAD (%) | SRT-RANDOM (%) |
| 3/0/0 | D | 3.13 | 4.03 |
| 3/0/1 | D | 4.26 | 5.97 |
| 3/0/2 | D | 7.02 | 8.28 |
| 3/0/3 | D | 8.64 | 9.02 |
| 3/1/0 | D | 3.25 | 4.94 |
| 3/1/1 | D | 3.27 | 4.10 |
| 3/1/2 | D | 3.93 | 5.97 |
| 3/1/3 | D | 3.64 | 4.08 |
| 3/2/0 | D | 3.15 | 3.32 |
| 3/2/1 | D | 4.39 | 5.20 |
| 3/2/2 | D | 5.80 | 5.97 |
| 3/2/3 | D | 6.52 | 6.95 |
| 3/3/0 | D | 4.39 | 5.64 |
| 3/3/1 | D | 4.16 | 5.20 |
| 3/3/2 | D | 4.81 | 5.73 |
| 3/3/3 | D | 5.02 | 5.58 |
| AVERAGE | | 4.71 | 5.62 |
Summary
• Prophesy
• Two case studies with resource allocation:
  – GEO LIGO: on average 33% better than load-based selection
  – AADMLSS: on average 4-11% better than load-based selection
• Future work:
  – Continue extending the application base
  – Work on queue wait-time predictions
Performance Analysis Projects
• Prophesy: http://prophesy.cs.tamu.edu (over 20 published conference and journal papers)
• PAPI: http://icl.cs.utk.edu/papi/
• SCALEA-G: http://www.dps.uibk.ac.at/projects/scaleag/
• PerfTrack: http://web.cecs.pdx.edu/~karavan/perftrack
• Paradyn: http://www.cs.wisc.edu/~paradyn/
• Network Weather Service: http://nws.cs.ucsb.edu