Software Complexity Study
EE 3970 Final Report, April 27, 2005
Team Members: Craig Pankoff, Lee Kamin, Jason Shim, Thomas Stout

Overview
• Introduction
• Background
• Test Approach
• Test Results
• Conclusion

Introduction
1. Objective:
• Test, analyze, and verify assertions about software complexity
2. Assertions:
• Computational complexity vs. order of the input list
  o Strongly related
• Run time of recursion vs. a simple loop, across vector lengths
  o No difference on a modern processor architecture

Introduction
3. Implementation:
• "Insertion Sort" algorithm
  o Recursive
  o Iterative
4. Analysis:
• Regression analysis
• Hypothesis test
Program flow: Start → Get input list → Insertion sort into a linked list → Output → End
• Input list: non-negative integers separated by spaces and/or carriage returns; a negative integer terminates the input
• Output: the sorted list, one value per line
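The input format above could be parsed with a helper along these lines. This is a minimal sketch assuming the values arrive on any std::istream; the function name readInputList is hypothetical, and the slides do not show how the original program opened or read its input file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper (not from the original program): reads non-negative
// integers separated by spaces and/or newlines, stopping at the first
// negative integer (the terminator) or at the end of the stream.
std::vector<int> readInputList(std::istream& in) {
    std::vector<int> values;
    int v;
    // operator>> skips any whitespace, so spaces, newlines, and carriage
    // returns are all treated as separators.
    while (in >> v && v >= 0) {
        values.push_back(v);
    }
    return values;
}
```

Anything after the negative terminator is left unread in the stream.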

Background
Motivation:
1. Customer dissatisfaction:
• The product is slow
• Its speed varies dramatically with the input
2. Macro.Hard's claims:
• The product is of the highest performance
• The variation in speed is unavoidable

Approach
• Program
• Regression Analysis
• Hypothesis Test

Program Approach
• Written in MS Visual C++
• Contained both an iterative and a recursive sorting algorithm
• A separate class was written to record statistics

Program Approach
• A linked list was used to store the sorted values
• Every node stores 2 links (a doubly linked list)
• Every node is pointed to by 2 links
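One node layout consistent with these bullets is a circular doubly linked list with a header (sentinel) node. The sketch below is an assumption, not the team's code; Node, initHeader, and linkBefore are hypothetical names. Splicing in a node writes exactly four links, which would be consistent with the 4 links created per inserted value in the regression tables.

```cpp
#include <cassert>

// Circular doubly linked list node: two outgoing links per node. In a
// circular list with a header, every node (header included) is also
// the target of exactly two links.
struct Node {
    int value;
    Node* next;
    Node* prev;
};

// An empty list is a header that points to itself in both directions.
void initHeader(Node* header) {
    header->value = 0;
    header->next = header;
    header->prev = header;
}

// Splices newNode in immediately before pos, writing four links.
void linkBefore(Node* pos, Node* newNode) {
    newNode->next = pos;        // link 1
    newNode->prev = pos->prev;  // link 2
    pos->prev->next = newNode;  // link 3
    pos->prev = newNode;        // link 4
}
```

Because the header closes the circle, inserting at the front, middle, or end uses the same four assignments with no special cases.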

Program Approach
First assertion: the complexity of sorting depends strongly on the initial ordering of the input list
• Computational complexity: the number of links that must be created or followed
• Stats.cpp was used to record these numbers

Program Approach
• Stats.cpp and Stats.h were used to record statistics in order to measure computational complexity
• The class stored two integer values and had four functions
  o The integers stored the number of links created and the number of links followed
  o The functions incremented or retrieved these values
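A class matching this description might look like the sketch below. Only the shape (two integer counters, four functions) comes from the slide; the class and member names are hypothetical, and the original Stats.h/Stats.cpp may differ.

```cpp
#include <cassert>

// Hypothetical reconstruction of the statistics class: two counters,
// each with one increment function and one accessor.
class Stats {
public:
    void incrementLinksCreated() { ++linksCreated_; }
    void incrementLinksFollowed() { ++linksFollowed_; }
    int getLinksCreated() const { return linksCreated_; }
    int getLinksFollowed() const { return linksFollowed_; }

private:
    int linksCreated_ = 0;   // links written when a node is spliced in
    int linksFollowed_ = 0;  // links traversed while searching for a position
};
```

The complexity figure for a run is then getLinksCreated() + getLinksFollowed(); an int is wide enough for the largest total reported here (50,085,001).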

Program Approach
Second assertion: recursion is no more expensive than a simple loop structure on modern processors
• The program was instrumented to time how long it took to sort a given input
• The program also reported whether the iterative or recursive algorithm was used
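The timing instrumentation could be sketched as a wrapper around one sort call. This assumes a modern C++ compiler: std::chrono::steady_clock stands in for whatever timer the original MS Visual C++ program used, and timeSort is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical instrumentation wrapper: times one call to sortFn, logs
// which algorithm ran, and returns the elapsed time in microseconds.
long long timeSort(const std::string& algorithmName,
                   const std::function<void()>& sortFn,
                   std::ostream& log) {
    const auto start = std::chrono::steady_clock::now();
    sortFn();  // the recursive or iterative sort being measured
    const auto stop = std::chrono::steady_clock::now();
    const long long elapsedUs =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    log << "algorithm: " << algorithmName << ", time: " << elapsedUs << " us\n";
    return elapsedUs;
}
```

A monotonic clock such as steady_clock is a reasonable choice here because it is not affected by adjustments to the system clock during a run.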

Program Approach
• Iterative steps:
1. Read in the next value from the input file
2. Get a pointer to the first node past the header
3. Compare the new value to the current node's value, moving to the next node until the insertion position is found
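The iterative steps above can be sketched as follows, assuming a circular doubly linked list with a header node. The names and the link-counting convention (one count for each next pointer followed) are assumptions, not the team's code, and an input vector stands in for the input file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type for a circular doubly linked list with a header.
struct Node {
    int value;
    Node* next;
    Node* prev;
};

// Iterative insertion; linksFollowed counts each next-pointer traversal.
void insertIterative(Node* header, Node* newNode, long long& linksFollowed) {
    Node* cur = header->next;  // step 2: first node past the header
    ++linksFollowed;
    // Step 3: advance until a node with value >= the new value,
    // or until the scan wraps back around to the header.
    while (cur != header && cur->value < newNode->value) {
        cur = cur->next;
        ++linksFollowed;
    }
    // Splice newNode in before cur (four links written).
    newNode->next = cur;
    newNode->prev = cur->prev;
    cur->prev->next = newNode;
    cur->prev = newNode;
}

// Step 1 loop: inserts each input value in turn, then returns the list contents.
std::vector<int> insertionSortIterative(const std::vector<int>& input,
                                        long long& linksFollowed) {
    Node header{0, nullptr, nullptr};
    header.next = &header;  // empty list: header points to itself
    header.prev = &header;
    std::vector<Node> pool(input.size());  // sized once, so node addresses stay stable
    for (std::size_t i = 0; i < input.size(); ++i) {
        pool[i].value = input[i];
        insertIterative(&header, &pool[i], linksFollowed);
    }
    std::vector<int> sorted;
    for (Node* n = header.next; n != &header; n = n->next) {
        sorted.push_back(n->value);
    }
    return sorted;
}
```

With this convention, reverse-ordered input follows n links and already-sorted input follows n + n(n - 1)/2. The gap of n(n - 1)/2 matches the gap between the worst-case and best-case Links Followed columns reported later, although the report's absolute counts are 4n + 1 higher, so its program evidently counted some additional traversals.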

Program Approach
• Recursive steps:
1. Get the next value from the input file
2. Get a pointer to the first node past the header
3. Call the recursive function with the first node and the new value
4. The recursive function calls itself with the new value and the next node in the list until the correct position is found
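The recursive steps can be sketched in the same hypothetical setting (circular doubly linked list with a header, one count per next pointer followed). The names are illustrative, not taken from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int value;
    Node* next;
    Node* prev;
};

// Step 4: the function calls itself on the next node until the position
// is found, then returns the node before which newValue belongs.
Node* findPosition(Node* header, Node* cur, int newValue, long long& linksFollowed) {
    if (cur == header || cur->value >= newValue) {
        return cur;  // base case: end of list reached or position found
    }
    ++linksFollowed;
    return findPosition(header, cur->next, newValue, linksFollowed);
}

void insertRecursive(Node* header, Node* newNode, long long& linksFollowed) {
    ++linksFollowed;  // step 2: follow header->next to the first node
    // Step 3: start the recursion at the first node with the new value.
    Node* pos = findPosition(header, header->next, newNode->value, linksFollowed);
    newNode->next = pos;  // splice in before pos (four links written)
    newNode->prev = pos->prev;
    pos->prev->next = newNode;
    pos->prev = newNode;
}

// Step 1 loop: inserts each input value in turn, then returns the list contents.
std::vector<int> insertionSortRecursive(const std::vector<int>& input,
                                        long long& linksFollowed) {
    Node header{0, nullptr, nullptr};
    header.next = &header;
    header.prev = &header;
    std::vector<Node> pool(input.size());  // stable node addresses
    for (std::size_t i = 0; i < input.size(); ++i) {
        pool[i].value = input[i];
        insertRecursive(&header, &pool[i], linksFollowed);
    }
    std::vector<int> sorted;
    for (Node* n = header.next; n != &header; n = n->next) {
        sorted.push_back(n->value);
    }
    return sorted;
}
```

Each level of recursion pushes a stack frame, which is one plausible source of extra cost. Because the recursive call is in tail position, an optimizing compiler may turn it into a loop, which would make the two versions nearly identical in run time.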

Regression Analysis
• Assertion
• Two experiments:
  o Best case
  o Worst case
• Not time dependent
• Measurement:
  o Complexity = links created + links followed

Assertion
• The complexity of sorting depends strongly on the initial ordering of the input list

Different Experiments
For this implementation, which searches from the first node past the header:
• Best case: reverse order
  o 9, 8, …, 1, 0
• Worst case: already sorted
  o 0, 1, …, 8, 9

Different Experiments
• This is not true of every insertion sort
• An implementation can be written so that the best and worst cases are flipped, for example by searching from the end of the list instead of the front
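To illustrate, the sketch below counts the steps an insertion sort spends locating each insert position when it scans from the front versus from the back of the sorted sequence. A std::vector stands in for the linked list, and scanSteps is a hypothetical name; a step is counted for reaching the first candidate plus one for each element passed over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts search steps for an insertion sort that scans either from the
// head (smallest element first) or from the tail (largest element first).
long long scanSteps(const std::vector<int>& input, bool fromTail) {
    std::vector<int> sorted;
    long long steps = 0;
    for (int v : input) {
        std::size_t pos;
        ++steps;  // reach the first candidate position
        if (!fromTail) {
            pos = 0;
            while (pos < sorted.size() && sorted[pos] < v) { ++pos; ++steps; }
        } else {
            pos = sorted.size();
            while (pos > 0 && sorted[pos - 1] > v) { --pos; ++steps; }
        }
        sorted.insert(sorted.begin() + static_cast<std::ptrdiff_t>(pos), v);
    }
    return steps;
}
```

For the inputs 0, 1, …, 9 and 9, 8, …, 0, a head scan takes 55 steps on the sorted input and 10 on the reversed one, while a tail scan takes 10 and 55 respectively: the best and worst cases swap.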

Running Experiments
• The program was run for a variety of sizes
  o Ranging from 10 to 10,000 elements
• Not time dependent:
  o It is not necessary to run this experiment more than once and average the results, because the same input always produces the same link counts

Determining Results
• Results were compiled into a spreadsheet
• Excel was used to find the best-fit line and R² value for each scenario
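The best-fit line and R² value that Excel reports for a linear trendline can be reproduced with ordinary least squares. This is a sketch; fitLine and LinearFit are hypothetical names, and the function assumes at least two distinct x values and non-constant y.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct LinearFit {
    double slope;
    double intercept;
    double rSquared;
};

// Ordinary least-squares fit y = slope * x + intercept, with the
// coefficient of determination R^2 = 1 - SS_res / SS_tot.
LinearFit fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanX += x[i]; meanY += y[i]; }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - meanX) * (y[i] - meanY);
        sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    LinearFit f;
    f.slope = sxy / sxx;
    f.intercept = meanY - f.slope * meanX;

    double ssRes = 0.0, ssTot = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double predicted = f.slope * x[i] + f.intercept;
        ssRes += (y[i] - predicted) * (y[i] - predicted);
        ssTot += (y[i] - meanY) * (y[i] - meanY);
    }
    f.rSquared = 1.0 - ssRes / ssTot;
    return f;
}
```

Fitting the best-case Total column against array size this way gives slope 9, intercept 1, and R² = 1, since those totals equal 9n + 1 exactly. Passing n * n as x for the worst-case data is one way to check the quadratic relationship.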

Hypothesis Testing
• Assertion
• Hypothesis test setup
• Four experiments (best and worst case, each run with the recursive and the iterative algorithm)
• This IS time dependent

Assertion
• The assertion being tested was that recursive algorithms are no more expensive than simple loop structures on a modern processor
• Tested using a 996 MHz Pentium III processor

Hypothesis Test
• The assertion must be the null hypothesis
• The recursive average is µx, while the iterative average is µy
• Null hypothesis: $H_0: \mu_x \le \mu_y$ (recursion is no more costly)
• Alternative hypothesis: $H_1: \mu_x > \mu_y$ (recursion is more costly)

Running Experiment
• Again run for a variety of vector lengths, ranging from 10 to 10,000
• Large time requirement for each run
• Each scenario was run five times
• All runs were performed on the same machine at the same time of day

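Determining Results
• Run times were compiled into a spreadsheet
• The average and variance were found for each scenario
• The population variance is unknown, so a t-statistic is used
• Pooled variance, with $n_x = n_y = 5$ runs and sample variances $s_x^2$ (recursive) and $s_y^2$ (iterative):
$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$$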

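Determining Results
• t-statistic calculation:
$$T_0 = \frac{E(x) - E(y)}{s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}$$
• E(x) is the recursive average, E(y) is the iterative average, $s_p$ is the pooled standard deviation, and $n_x = n_y = 5$ is the number of runs per scenario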

Determining Results
• A significance level α needs to be chosen
  o α = 0.001
• This corresponds to a 99.9% confidence level
  o If the null hypothesis is true, there is only a 0.1% chance of wrongly rejecting it (a Type I error)

Determining Results
Final step:
• Find the critical value Tα from a t-table for the 99.9% confidence level (one-tailed, 5 + 5 - 2 = 8 degrees of freedom)
• Compare it to the calculated value T0
• Reject the null hypothesis if T0 > Tα; otherwise, fail to reject it
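The pooled t-statistic and the one-tailed decision rule on these slides can be sketched as follows. The function names are hypothetical, the critical value is passed in rather than computed (for example, the table value Tα = 4.5008 used in this report), and the sample run times in the test are made up for illustration, not the team's measurements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample mean E(x).
double mean(const std::vector<double>& v) {
    double sum = 0.0;
    for (double d : v) sum += d;
    return sum / static_cast<double>(v.size());
}

// Sample variance (divides by n - 1), used because the population
// variance is unknown.
double sampleVariance(const std::vector<double>& v) {
    const double m = mean(v);
    double ss = 0.0;
    for (double d : v) ss += (d - m) * (d - m);
    return ss / static_cast<double>(v.size() - 1);
}

// Pooled two-sample t-statistic T0; x holds recursive run times and y
// holds iterative run times. Assumes at least two runs per scenario and
// a non-zero pooled variance.
double pooledT(const std::vector<double>& x, const std::vector<double>& y) {
    const double nx = static_cast<double>(x.size());
    const double ny = static_cast<double>(y.size());
    const double sp2 = ((nx - 1.0) * sampleVariance(x) +
                        (ny - 1.0) * sampleVariance(y)) / (nx + ny - 2.0);
    return (mean(x) - mean(y)) / std::sqrt(sp2 * (1.0 / nx + 1.0 / ny));
}

// One-tailed decision: reject H0 (recursion no more costly) only if T0 > Tα.
bool rejectNull(double t0, double tAlpha) {
    return t0 > tAlpha;
}
```

With hypothetical recursive times {10, 12, 11, 13, 14} and iterative times {10, 11, 9, 10, 10}, T0 = 2 / √0.6 ≈ 2.58, which is below 4.5008, so the null hypothesis would not be rejected.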

Results
• Regression Analysis
  o Best case
  o Worst case
• Hypothesis Testing
  o Best case
  o Worst case

Regression Analysis
• Best-case regression analysis showed that the relationship between complexity and input length is linear
• This is exactly what one would expect:
1. For every node that is inserted, the first place checked is correct
2. There is a constant number of operations needed to insert the new node, so n insertions cost O(n) in total

Regression Analysis
Complexity vs. Vector Size for Best Case
Array Size   Links Created   Links Followed   Total
10           40              51               91
100          400             501              901
500          2000            2501             4501
1000         4000            5001             9001
2500         10000           12501            22501
5000         20000           25001            45001
7500         30000           37501            67501
10000        40000           50001            90001


Regression Analysis
• Worst-case regression analysis showed that the relationship between complexity and input length is O(n²)
• This is exactly what one would expect:
1. For every node that is inserted, the entire existing list must be traversed to the end to find the new position
2. The k-th inserted node must follow links past the k - 1 nodes already in the list, so the n insertions pass over 0 + 1 + … + (n - 1) = n(n - 1)/2 nodes in total

Regression Analysis
Complexity vs. Vector Size for Worst Case
Array Size   Links Created   Links Followed   Total
10           40              96               136
100          400             5451             5851
500          2000            127251           129251
1000         4000            504501           508501
2500         10000           3136251          3146251
5000         20000           12522501         12542501
7500         30000           28158751         28188751
10000        40000           50045001         50085001
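The counts in the best-case and worst-case tables fit simple closed forms exactly at every listed size (for example, n = 10 gives 45 + 51 = 96 links followed in the worst case). These formulas are inferred from the tabulated numbers rather than from the program source; the constant terms reflect the program's own counting convention, which the slides do not spell out.

```latex
\begin{aligned}
\text{Links created}(n) &= 4n \\
\text{Links followed}_{\text{best}}(n) &= 5n + 1 \\
\text{Links followed}_{\text{worst}}(n) &= \frac{n(n-1)}{2} + 5n + 1 \\
\text{Total}_{\text{best}}(n) &= 9n + 1 \\
\text{Total}_{\text{worst}}(n) &= \frac{n(n-1)}{2} + 9n + 1 = \frac{n^2}{2} + \frac{17n}{2} + 1
\end{aligned}
```

The n²/2 term dominates for large n, which is the O(n²) behavior found by the worst-case regression, while the best-case total stays linear.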


Hypothesis Testing
• Null hypothesis: the recursive program is no more costly than an iterative program that completes the same task

Hypothesis Testing
Tα = 4.5008
Calculated T0 Values for Best and Worst Case
Array Size   Best Case T0    Worst Case T0
10           0.632455532     -2.449489743
100          -1.264911064    -1
500          4.714045208     -1.264911064
1000         42.35301899     -1.264911064
2500         426.8822709     1
5000         535.5889833     0.411865886
7500         323.7464778     -0.577136554
10000        328.4259356     0.207320686

Hypothesis Testing
Results for Best Case
Array Size   Result
10           FAIL TO REJECT
100          FAIL TO REJECT
500          …

Hypothesis Testing
Results for Worst Case
Array Size   Result
10           FAIL TO REJECT
100          FAIL TO REJECT
500          REJECT

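Conclusion
• The first assertion made by Macro.Hard (complexity depends strongly on the order of the input list) was correct: complexity grew linearly in the best case and as O(n²) in the worst case
• The second assertion made by Macro.Hard (recursion is no more expensive than a simple loop) was incorrect: the null hypothesis was rejected at α = 0.001 for some array sizes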

Questions