


Comparison Methodology
✓ Meaning of a sample
✓ Confidence intervals
• Making decisions and comparing alternatives
• Special considerations in confidence intervals
• Sample sizes
© 1998, Geoff Kuenning

Estimating Confidence Intervals
• Two formulas for confidence intervals
  – Over 30 samples from any distribution: z-distribution
  – Small sample from a normally distributed population: t-distribution
• Common error: using the t-distribution for a non-normal population
  – The Central Limit Theorem often saves us

The z Distribution
• Interval on either side of the mean: $\bar{x} \mp z_{1-\alpha/2}\, s/\sqrt{n}$
• Significance level $\alpha$ is small for large confidence levels
• Tables of z are tricky: be careful!

The t Distribution
• Formula is almost the same: $\bar{x} \mp t_{1-\alpha/2;\,n-1}\, s/\sqrt{n}$
• Usable only for normally distributed populations!
• But works with small samples
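A minimal sketch of both interval formulas in Python (assuming numpy and scipy are available; the helper name is illustrative only):

```python
import numpy as np
from scipy import stats

def confidence_interval(samples, confidence=0.90):
    """Two-sided confidence interval for the population mean."""
    x = np.asarray(samples, dtype=float)
    n = len(x)
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(n)               # standard error of the mean
    alpha = 1.0 - confidence
    if n > 30:
        q = stats.norm.ppf(1.0 - alpha / 2)        # z-distribution quantile
    else:
        q = stats.t.ppf(1.0 - alpha / 2, df=n - 1) # t-distribution quantile
    return mean - q * sem, mean + q * sem
```

To test a hypothesized mean, as on the following slides, simply check whether that value falls inside the returned interval.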

Making Decisions
• Why do we use confidence intervals?
  – Summarizes the error in the sample mean
  – Gives a way to decide whether a measurement is meaningful
  – Allows comparisons in the face of error
• But remember: at 90% confidence, 10% of the intervals do not include the population mean

Testing for Zero Mean
• Is the population mean significantly nonzero?
• If the confidence interval includes 0, the answer is no
• Can test for any value (mean of sums is sum of means)
• Example: our height samples are consistent with an average height of 170 cm
  – Also consistent with 160 and 180!

Comparing Alternatives
• Often need to find the better system
  – Choose the fastest computer to buy
  – Prove our algorithm runs faster
• Different methods for paired and unpaired observations
  – Paired if the ith test on each system was the same
  – Unpaired otherwise

Comparing Paired Observations
• Treat the problem as one sample of n pairs
• For each test, calculate the performance difference
• Calculate the confidence interval for the differences
• If the interval includes zero, the systems aren't different
  – If not, the sign indicates which is better

Example: Comparing Paired Observations
• Do home baseball teams outscore visitors?
• Sample from 9-4-96:

Example: Comparing Paired Observations
• H−V differences: 2, −2, −7, 5, 6, −1, −7, 6, 7, 3, 2, 1, −1, 6
• Mean 1.4, 90% interval (−0.75, 3.6)
  – Can't reject the hypothesis that the difference is 0
  – 70% interval is (0.10, 2.76)
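The intervals above can be reproduced with a short Python sketch (assuming scipy; variable names are illustrative):

```python
import numpy as np
from scipy import stats

# Paired comparison: home-minus-visitor score differences from the slide.
diffs = np.array([2, -2, -7, 5, 6, -1, -7, 6, 7, 3, 2, 1, -1, 6], dtype=float)

n = len(diffs)
mean = diffs.mean()                        # ~1.4
sem = diffs.std(ddof=1) / np.sqrt(n)       # standard error of the mean difference

for conf in (0.90, 0.70):
    q = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    print(f"{conf:.0%} interval: ({mean - q * sem:.2f}, {mean + q * sem:.2f})")
# The 90% interval straddles zero, so the teams are not significantly
# different at that level; the 70% interval does not.
```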

Comparing Unpaired Observations
• A sample of size na and nb for each alternative A and B
• Start with confidence intervals
  – If the CIs do not overlap: systems are different, and the higher mean is better (for HB metrics)
  – If the CIs overlap and each contains the other's mean: systems are not different at this level
    • If it's a close call, could lower the confidence level
  – If the CIs overlap and one mean isn't in the other CI: must do a t-test

The t-test (1)
1. Compute the sample means $\bar{x}_a$ and $\bar{x}_b$
2. Compute the sample standard deviations $s_a$ and $s_b$
3. Compute the mean difference $\bar{x}_a - \bar{x}_b$
4. Compute the standard deviation of the difference: $s = \sqrt{s_a^2/n_a + s_b^2/n_b}$

The t-test (2)
5. Compute the effective degrees of freedom:
   $\nu = \dfrac{(s_a^2/n_a + s_b^2/n_b)^2}{\frac{(s_a^2/n_a)^2}{n_a-1} + \frac{(s_b^2/n_b)^2}{n_b-1}}$
6. Compute the confidence interval: $(\bar{x}_a - \bar{x}_b) \mp t_{1-\alpha/2;\,\nu}\, s$
7. If the interval includes zero, there is no difference
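A rough Python sketch of steps 1–7 (assuming scipy; the function name is illustrative only):

```python
import numpy as np
from scipy import stats

def unpaired_ci(a, b, confidence=0.90):
    """Unpaired (Welch-style) t-test interval for mean(a) - mean(b)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    va, vb = a.var(ddof=1) / na, b.var(ddof=1) / nb      # s_a^2/n_a, s_b^2/n_b
    diff = a.mean() - b.mean()                           # mean difference
    s = np.sqrt(va + vb)                                 # std. dev. of the difference
    dof = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    q = stats.t.ppf(1 - (1 - confidence) / 2, df=dof)
    return diff - q * s, diff + q * s                    # includes 0 => no difference
```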

Comparing Proportions
• If k of n trials give a certain result, then the proportion is p = k/n and the confidence interval is
  $p \mp z_{1-\alpha/2}\sqrt{p(1-p)/n}$
• If the interval includes 0.5, can't say which outcome is statistically meaningful
• Must have k > 10 to get valid results
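A small sketch in Python with hypothetical counts (assuming scipy):

```python
import numpy as np
from scipy import stats

def proportion_ci(k, n, confidence=0.90):
    """Confidence interval for the proportion p = k/n."""
    p = k / n
    half = stats.norm.ppf(1 - (1 - confidence) / 2) * np.sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = proportion_ci(k=16, n=25)   # hypothetical trial counts
print(lo <= 0.5 <= hi)               # True => can't call either outcome more likely
```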

Special Considerations
• Selecting a confidence level
• Hypothesis testing
• One-sided confidence intervals

Selecting a Confidence Level
• Depends on the cost of being wrong
• 90% and 95% are common values for scientific papers
• Generally, use the highest value that lets you make a firm statement
  – But it's better to be consistent throughout a given paper

Hypothesis Testing
• The null hypothesis (H0) is common in statistics
  – Confusing due to the double negative
  – Gives less information than a confidence interval
  – Often harder to compute
• Should understand that rejecting the null hypothesis implies the result is meaningful

One-Sided Confidence Intervals
• Two-sided intervals test for the mean being outside a certain range (see "error bands" in previous graphs)
• One-sided tests are useful if only interested in one limit
• Use $z_{1-\alpha}$ or $t_{1-\alpha;\,n}$ instead of $z_{1-\alpha/2}$ or $t_{1-\alpha/2;\,n}$ in the formulas

Sample Sizes
• Bigger sample sizes give narrower intervals
  – Smaller values of $t_{1-\alpha/2;\,\nu}$ as n increases
  – Larger $\sqrt{n}$ in the formulas
• But sample collection is often expensive
  – What is the minimum we can get away with?
• Start with a small number of preliminary measurements to estimate the variance

Choosing a Sample Size
• To get a given percentage error ±r%: $n = \left(\dfrac{100\, z\, s}{r\, \bar{x}}\right)^2$
• Here, z represents either z or t as appropriate
• For a proportion p = k/n: $n = z^2\, p(1-p) \left(\dfrac{100}{r\, p}\right)^2$

Example of Choosing Sample Size
• Five runs of a compilation took 22.5, 19.8, 21.1, 26.7, 20.2 seconds
• How many runs to get a ±5% confidence interval at a 90% confidence level?
• $\bar{x}$ = 22.1, s = 2.8, $t_{0.95;\,4}$ = 2.132
• $n = \left(\dfrac{100 \times 2.132 \times 2.8}{5 \times 22.1}\right)^2 \approx 29$, so about 30 runs are needed
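The same computation as a Python sketch (assuming scipy):

```python
import numpy as np
from scipy import stats

runs = np.array([22.5, 19.8, 21.1, 26.7, 20.2])   # preliminary measurements
r = 5.0                                            # desired half-width, percent of mean
confidence = 0.90

mean = runs.mean()                                 # ~22.1
s = runs.std(ddof=1)                               # ~2.8
t = stats.t.ppf(1 - (1 - confidence) / 2, df=len(runs) - 1)   # 2.132

n = (100 * t * s / (r * mean)) ** 2
print(int(np.ceil(n)))                             # ~30 runs needed
```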

Linear Regression Models
✓ What is a (good) model?
✓ Estimating model parameters
• Allocating variation
• Confidence intervals for regressions
• Verifying assumptions visually

What Is a (Good) Model?
• For correlated data, a model predicts the response given an input
• The model should be an equation that fits the data
• The standard definition of "fits" is least-squares
  – Minimize the squared error
  – While keeping the mean error zero
  – Minimizes the variance of the errors

Least-Squared Error
• If $\hat{y} = b_0 + b_1 x$, then the error in the estimate for $x_i$ is $e_i = y_i - \hat{y}_i$
• Minimize the Sum of Squared Errors: $SSE = \sum_{i=1}^{n} e_i^2$
• Subject to the constraint $\sum_{i=1}^{n} e_i = 0$

Estimating Model Parameters
• Best regression parameters are
  $b_1 = \dfrac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$,  $b_0 = \bar{y} - b_1\bar{x}$
• Note error in book!

Parameter Estimation Example
• Execution time of a script for various loop counts (n = 5):
• $\bar{x}$ = 6.8, $\bar{y}$ = 2.32, $\sum xy$ = 88.54, $\sum x^2$ = 264
• $b_1 = \dfrac{88.54 - 5(6.8)(2.32)}{264 - 5(6.8)^2} \approx 0.29$
• $b_0$ = 2.32 − (0.29)(6.8) = 0.35
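The parameter estimates can be reproduced from these sums alone; a minimal Python sketch (the raw loop-count/time table is not reproduced here):

```python
# Regression parameters from the summary statistics on the slide above.
n = 5
x_bar, y_bar = 6.8, 2.32
sum_xy, sum_x2 = 88.54, 264.0

b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # slope, ~0.29
b0 = y_bar - b1 * x_bar        # intercept; ~0.35 when b1 is rounded to 0.29 as on the slide
print(b1, b0)
```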

Graph of Parameter Estimation Example

Variants of Linear Regression
• Some non-linear relationships can be handled by transformations
  – For $y = a e^{bx}$: take the logarithm of y, do regression on $\log y = b_0 + b_1 x$, let $b = b_1$, $a = e^{b_0}$
  – For $y = a + b \log x$: take the log of x before fitting the parameters, let $b = b_1$, $a = b_0$
  – For $y = a x^b$: take the log of both x and y, let $b = b_1$, $a = e^{b_0}$
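A sketch of the first transformation in Python, using synthetic data generated only for illustration (assuming numpy):

```python
import numpy as np

# Synthetic data roughly following y = 2 * exp(0.8 * x) with small noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * np.exp(0.8 * x) * np.random.default_rng(0).normal(1.0, 0.02, size=5)

# Ordinary least squares on (x, log y):
b1, b0 = np.polyfit(x, np.log(y), deg=1)   # slope, intercept
a, b = np.exp(b0), b1                      # back-transform the intercept
print(a, b)                                # close to 2.0 and 0.8
```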

Allocating Variation
• If there were no regression, the best guess of y would be $\bar{y}$
• Observed values of y differ from $\bar{y}$, giving rise to errors (variance)
• Regression gives a better guess, but there are still errors
• We can evaluate the quality of the regression by allocating sources of errors

The Total Sum of Squares
• Without regression, the squared error is
  $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum y^2 - n\bar{y}^2$

The Sum of Squares from Regression
• Recall that the regression error is $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
• The error without regression is SST
• So the regression explains SSR = SST − SSE
• Regression quality is measured by the coefficient of determination

Evaluating Coefficient of Determination
• Compute
  $SSE = \sum y^2 - b_0 \sum y - b_1 \sum xy$
  $SST = \sum y^2 - n\bar{y}^2$
  $R^2 = \dfrac{SSR}{SST} = \dfrac{SST - SSE}{SST}$

Example of Coefficient of Determination
• For the previous regression example
  – $\sum y$ = 11.60, $\sum y^2$ = 29.79, $\sum xy$ = 88.54, $n\bar{y}^2$ = 26.9
  – SSE = 29.79 − (0.35)(11.60) − (0.29)(88.54) = 0.05
  – SST = 29.79 − 26.9 = 2.89
  – SSR = 2.89 − 0.05 = 2.84
  – $R^2$ = (2.89 − 0.05)/2.89 = 0.98
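A short Python sketch reproducing these numbers from the slide's sums (it also computes the error variance used on the next two slides):

```python
import numpy as np

n = 5
b0, b1 = 0.35, 0.29
sum_y, sum_y2, sum_xy = 11.60, 29.79, 88.54
y_bar = sum_y / n

sse = sum_y2 - b0 * sum_y - b1 * sum_xy    # ~0.05
sst = sum_y2 - n * y_bar ** 2              # ~2.89
ssr = sst - sse                            # ~2.84
r2 = ssr / sst                             # ~0.98

mse = sse / (n - 2)                        # variance of errors (n - 2 degrees of freedom)
se = np.sqrt(mse)                          # standard deviation of errors, ~0.13
print(r2, se)
```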

Standard Deviation of Errors
• The variance of the errors is SSE divided by its degrees of freedom
  – DOF is n − 2 because we've calculated 2 regression parameters from the data
  – So the variance (mean squared error, MSE) is SSE/(n − 2)
• The standard deviation of errors is the square root: $s_e = \sqrt{\dfrac{SSE}{n-2}}$

Checking Degrees of Freedom
• Degrees of freedom always equate:
  – SS0 has 1 (computed from $\bar{y}$)
  – SST has n − 1 (computed from the data and $\bar{y}$, which uses up 1)
  – SSE has n − 2 (needs 2 regression parameters)
  – So SST = SSR + SSE in degrees of freedom too: n − 1 = 1 + (n − 2)

Example of Standard Deviation of Errors
• For our regression example, SSE was 0.05, so MSE is 0.05/3 = 0.017 and $s_e$ = 0.13
• Note the high quality of our regression:
  – $R^2$ = 0.98
  – $s_e$ = 0.13
  – Why such a nice straight-line fit?

Confidence Intervals for Regressions
• Regression is done from a single population sample (size n)
  – A different sample might give different results
  – The true model is $y = \beta_0 + \beta_1 x$
  – The parameters $b_0$ and $b_1$ are really means taken from a population sample

Calculating Intervals for Regression Parameters
• Standard deviations of the parameters:
  $s_{b_0} = s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum x^2 - n\bar{x}^2}}$,  $s_{b_1} = \dfrac{s_e}{\sqrt{\sum x^2 - n\bar{x}^2}}$
• Confidence intervals are $b_i \mp t\, s_{b_i}$
• where t has n − 2 degrees of freedom
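A rough Python sketch of these formulas, applied to hypothetical x/y data rather than the slide's loop-count example (assuming numpy and scipy):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical predictor values
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])   # hypothetical responses

n = len(x)
b1, b0 = np.polyfit(x, y, deg=1)               # least-squares slope and intercept
sse = np.sum((y - (b0 + b1 * x)) ** 2)
se = np.sqrt(sse / (n - 2))                    # standard deviation of errors
sxx = np.sum(x ** 2) - n * x.mean() ** 2

s_b0 = se * np.sqrt(1 / n + x.mean() ** 2 / sxx)
s_b1 = se / np.sqrt(sxx)

t = stats.t.ppf(0.95, df=n - 2)                # 90% two-sided interval
print("b0:", (b0 - t * s_b0, b0 + t * s_b0))
print("b1:", (b1 - t * s_b1, b1 + t * s_b1))
```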

Example of Regression Confidence Intervals
• Recall $s_e$ = 0.13, n = 5, $\sum x^2$ = 264, $\bar{x}$ = 6.8
• So $s_{b_0}$ ≈ 0.16 and $s_{b_1}$ ≈ 0.004
• Using a 90% confidence level, $t_{0.95;\,3}$ = 2.353

Regression Confidence Example, cont'd
• Thus, the $b_0$ interval is 0.35 ∓ 2.353(0.16) = (−0.03, 0.73)
  – Not significant at 90%
• And $b_1$ is 0.29 ∓ 2.353(0.004) = (0.28, 0.30)
  – Significant at 90% (and would survive even a 99.9% test)

Confidence Intervals for Nonlinear Regressions
• For nonlinear fits using exponential transformations:
  – Confidence intervals apply to the transformed parameters
  – It is not valid to perform the inverse transformation on the intervals

Confidence Intervals for Predictions
• The previous confidence intervals are for the parameters
  – How certain can we be that the parameters are correct?
• The purpose of regression is prediction
  – How accurate are the predictions?
  – Regression gives the mean of the predicted response, based on the sample we took

Predicting m Samples
• The standard deviation for the mean of a future sample of m observations at $x_p$ is
  $s_{\hat{y}_{mp}} = s_e \sqrt{\dfrac{1}{m} + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum x^2 - n\bar{x}^2}}$
• Note the deviation drops as m grows
• Variance is minimal at $x_p = \bar{x}$
• Use t-quantiles with n − 2 DOF for the interval

Example of Confidence of Predictions
• Using the previous equation, what is the predicted time for a single run of 8 loops?
• Time = 0.35 + 0.29(8) = 2.67
• Standard deviation of errors $s_e$ = 0.13, so $s_{\hat{y}_p} = 0.13\sqrt{1 + \tfrac{1}{5} + \tfrac{(8-6.8)^2}{32.8}} \approx 0.14$
• 90% interval is then 2.67 ∓ 2.353(0.14) ≈ (2.33, 3.01)
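A minimal Python sketch of this prediction interval (assuming scipy):

```python
import numpy as np
from scipy import stats

# Prediction interval for a single future run at x_p = 8 loops,
# using the regression and error statistics from the example.
n, m = 5, 1
b0, b1, se = 0.35, 0.29, 0.13
x_bar, sum_x2 = 6.8, 264.0
xp = 8.0

y_hat = b0 + b1 * xp                                      # ~2.67
sxx = sum_x2 - n * x_bar ** 2
s_pred = se * np.sqrt(1 / m + 1 / n + (xp - x_bar) ** 2 / sxx)

t = stats.t.ppf(0.95, df=n - 2)                           # 90% two-sided
print(y_hat - t * s_pred, y_hat + t * s_pred)             # roughly (2.33, 3.01)
```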

Verifying Assumptions Visually
• Regressions are based on assumptions:
  – Linear relationship between the response y and the predictor x
    • Or the nonlinear relationship used in fitting
  – Predictor x is nonstochastic and error-free
  – Model errors are statistically independent
    • With distribution N(0, c) for a constant c
• If the assumptions are violated, the model is misleading or invalid

Testing Linearity
• Scatter plot x vs. y to see the basic curve type
  (example panels: Linear, Outlier, Piecewise Linear, Nonlinear (Power))

Testing Independence of Errors
• Scatter-plot the residuals $e_i$ versus the predicted values $\hat{y}_i$
• There should be no visible trend
• Example from our curve fit:

More on Testing Independence
• It may be useful to plot the error residuals versus the experiment number
  – In the previous example, this gives the same plot except for the x scaling
• There are no foolproof tests

Testing for Normal Errors
• Prepare a quantile-quantile plot of the residuals
• Example for our regression:
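A sketch of such a plot in Python with hypothetical data (assumes matplotlib and scipy are installed):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # hypothetical data
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])

b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# Quantile-quantile plot against the normal distribution:
stats.probplot(residuals, dist="norm", plot=plt)   # points near the line => normal errors
plt.show()
```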

Testing for Constant Standard Deviation
• Tongue-twister: homoscedasticity
• Return to the independence plot
• Look for a trend in the spread
• Example:

Linear Regression Can Be Misleading
• Regression throws away some information about the data
  – To allow more compact summarization
• Sometimes vital characteristics are thrown away
  – Often, looking at data plots can tell you whether you will have a problem

Example of Misleading Regression
• Four data sets; sets I–III share the same x values:

         I             II            III            IV
    x      y      x      y      x      y      x      y
   10    8.04    10    9.14    10    7.46     8    6.58
    8    6.95     8    8.14     8    6.77     8    5.76
   13    7.58    13    8.74    13   12.74     8    7.71
    9    8.81     9    8.77     9    7.11     8    8.84
   11    8.33    11    9.26    11    7.81     8    8.47
   14    9.96    14    8.10    14    8.84     8    7.04
    6    7.24     6    6.13     6    6.08     8    5.25
    4    4.26     4    3.10     4    5.39    19   12.50
   12   10.84    12    9.13    12    8.15     8    5.56
    7    4.82     7    7.26     7    6.42     8    7.91
    5    5.68     5    4.74     5    5.73     8    6.89

What Does Regression Tell Us About These Data Sets?
• Exactly the same thing for each!
• N = 11
• Mean of y = 7.5
• Y = 3 + 0.5X
• Standard error of the regression is 0.118
• All the sums of squares are the same
• Correlation coefficient = 0.82
• $R^2$ = 0.67

Now Look at the Data Plots
(scatter plots of data sets I–IV)

For Discussion Today: Project Proposal
1. Statement of hypothesis
2. Workload decisions
3. Metrics to be used
4. Method