
- Number of slides: 84

FINAL EXAMINATION STUDY MATERIAL II 1

FINAL EXAMINATION STUDY MATERIAL II • Recommended Textbook Reading From Intro Stats 3rd Edition • CHAPTER 19 • PARTS OF CHAPTER 22 • PARTS OF CHAPTER 23 2

Final Examination Study Material II • CONFIDENCE INTERVALS FOR ONE SAMPLE PROPORTION • CONFIDENCE INTERVALS FOR THE DIFFERENCE BETWEEN TWO SAMPLE PROPORTIONS • CONFIDENCE INTERVALS FOR ONE SAMPLE MEAN • CONFIDENCE INTERVALS FOR THE DIFFERENCE BETWEEN TWO SAMPLE MEANS 3

Point Estimate and Interval Estimate A point estimate is a single number that is our “best guess” for the parameter. Point estimation produces a number (an estimate) which is believed to be close to the value of the unknown parameter. An interval estimate is an interval of numbers within which the parameter value is believed to fall. Interval estimation produces an interval that contains the estimated parameter with a prescribed confidence. 4

Point Estimate and Interval Estimate (Figure 1) 5

Point Estimate and Interval Estimate Question: Why is a point estimate alone not sufficiently informative? • A point estimate doesn’t tell us how close the estimate is likely to be to the parameter. • An interval estimate is more useful: it incorporates a margin of error, which helps us gauge the accuracy of the point estimate. 6

The Logic Behind Constructing A Confidence Interval To construct a confidence interval for a population proportion, start with the sampling distribution of a sample proportion, $\hat{p}$. The sampling distribution: (1) Is approximately a normal distribution for large random samples, by the Central Limit Theorem. (2) Has mean equal to the population proportion, p. (3) Has a standard deviation called the standard error. 7

Confidence Interval or Interval Estimate • Sample estimate ± Multiplier × Standard Error • Sample estimate ± Margin of error • The multiplier is a number based on the desired confidence level and determined from the standard normal distribution (for proportions) or Student’s t-distribution (for means). 8

The Multiplier, denoted as z*, is the standardized score such that the area between -z* and z* under the standard normal curve corresponds to the desired confidence level. Note: A higher confidence level implies a larger multiplier. 9

Example: For A 90% Confidence Level, What Is The Multiplier? 10

SOME MULTIPLIERS OR CRITICAL VALUES FOR STANDARD NORMAL DISTRIBUTION
C% CONFIDENCE LEVEL    MULTIPLIER z*
80%                    1.282
90%                    1.645
95%                    1.960
98%                    2.326
99%                    2.576
11
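
The multipliers in this table can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `z_multiplier` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

def z_multiplier(confidence_level):
    """Return z* such that the area between -z* and z* equals confidence_level."""
    tail = (1 - confidence_level) / 2        # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)    # standard normal quantile

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z* = {z_multiplier(level):.3f}")
```

Because the area between -z* and z* is the confidence level, each tail holds half of what remains, which is why the quantile is taken at 1 minus the tail area.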

WHAT DOES C% CONFIDENCE REALLY MEAN? • FORMALLY, WHAT WE MEAN IS THAT C% OF SAMPLES OF THIS SIZE WILL PRODUCE CONFIDENCE INTERVALS THAT CAPTURE THE TRUE PROPORTION. • C% CONFIDENCE MEANS THAT ON AVERAGE, IN C OUT OF 100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE TRUE VALUE OF THE ESTIMATED PARAMETER. • E.G., 95% CONFIDENCE MEANS THAT ON AVERAGE, IN 95 OUT OF 100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE TRUE VALUE OF THE ESTIMATED PARAMETER. 12
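
One way to see this interpretation is to simulate many samples from a population whose proportion is known and count how often the interval captures it. The sketch below is only an illustration: the true proportion 0.3, the sample size 500, the number of trials and the seed are all assumed values, not taken from the slides.

```python
import random
from statistics import NormalDist

def coverage_rate(p_true=0.3, n=500, confidence_level=0.95, trials=2000, seed=1):
    """Fraction of simulated one-proportion z-intervals that contain p_true."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z_star * (p_hat * (1 - p_hat) / n) ** 0.5
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / trials

print(coverage_rate())  # should land close to 0.95
```

Any single interval either contains the true proportion or it does not; the 95% describes the long-run success rate of the method.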

CONFIDENCE INTERVAL FOR ONE SAMPLE PROPORTION P [ONE-PROPORTION Z-INTERVAL] • ASSUMPTIONS AND CONDITIONS • RANDOMIZATION CONDITION • 10% CONDITION • SAMPLE SIZE ASSUMPTION OR SUCCESS/FAILURE CONDITION • INDEPENDENCE ASSUMPTION • NOTE: PROPER RANDOMIZATION CAN HELP ENSURE INDEPENDENCE. 13

Compact Formula For A Confidence Interval For A Population Proportion p: $\hat{p} \pm z^* \times SE(\hat{p})$ • $\hat{p}$ is the sample proportion; • z* denotes the multiplier; and • $SE(\hat{p})$ is the standard error of $\hat{p}$. 14

Constructing A Confidence Interval To Estimate A Population Proportion, p • The exact standard deviation of a sample proportion equals: $SD(\hat{p}) = \sqrt{p(1 - p)/n}$ • This formula depends on the unknown population proportion, p. • In practice, we don’t know p, and we need to estimate the standard error as $SE(\hat{p}) = \sqrt{\hat{p}(1 - \hat{p})/n}$ 15

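SAMPLE SIZE NEEDED TO PRODUCE A CONFIDENCE INTERVAL WITH A GIVEN MARGIN OF ERROR, ME: $ME = z^* \sqrt{\hat{p}\hat{q}/n}$. SOLVING FOR n GIVES $n = (z^*)^2 \hat{p}\hat{q} / ME^2$, WHERE $\hat{p}$ IS A REASONABLE GUESS AND $\hat{q} = 1 - \hat{p}$. IF WE CANNOT MAKE A GUESS, WE TAKE $\hat{p} = \hat{q} = 0.5$. 16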
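
The sample-size calculation can be sketched in a few lines, assuming the usual margin-of-error formula for a proportion. The function name and the example inputs (a 3% margin of error at 95% confidence) are illustrative assumptions, not values from the slides.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence_level, p_guess=0.5):
    """Smallest n whose margin of error is at most margin_of_error.

    p_guess = 0.5 is the worst case, used when no reasonable guess exists.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = z_star ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n)  # always round up to a whole observation

print(sample_size_for_proportion(0.03, 0.95))               # worst-case guess
print(sample_size_for_proportion(0.03, 0.95, p_guess=0.1))  # with a prior guess
```

Rounding up rather than to the nearest integer guarantees the margin of error does not exceed the target.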

EXAMPLE 1 DIRECT MAIL ADVERTISERS SEND SOLICITATIONS (a.k.a. “junk mail”) TO THOUSANDS OF POTENTIAL CUSTOMERS IN THE HOPE THAT SOME WILL BUY THE COMPANY’S PRODUCT. THE RESPONSE RATE IS USUALLY QUITE LOW. SUPPOSE A COMPANY WANTS TO TEST THE RESPONSE TO A NEW FLYER, AND SENDS IT TO 1000 PEOPLE RANDOMLY SELECTED FROM THEIR MAILING LIST OF OVER 200,000 PEOPLE. THEY GET ORDERS FROM 123 OF THE RECIPIENTS. (A) CREATE A 90% CONFIDENCE INTERVAL FOR THE PERCENTAGE OF PEOPLE THE COMPANY CONTACTS WHO MAY BUY SOMETHING. (B) EXPLAIN WHAT THIS INTERVAL MEANS. (C) EXPLAIN WHAT “90% CONFIDENCE” MEANS. (D) THE COMPANY MUST DECIDE WHETHER TO NOW DO A MASS MAILING. THE MAILING WON’T BE COST-EFFECTIVE UNLESS IT PRODUCES AT LEAST A 5% RETURN. WHAT DOES YOUR CONFIDENCE INTERVAL SUGGEST? EXPLAIN. 17

SOLUTION 18

C.I. With TI 83/84 Plus • Press [STAT]. • Select [TESTS]. • Choose A: 1-PropZInt…. • Input the following: x: 123; n: 1000; C-Level: 0.90 • Choose Calculate and press [ENTER]. 19
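
For readers without a calculator, the same one-proportion z-interval can be sketched in Python. The inputs (123 orders out of 1000 mailings, 90% confidence) come from Example 1; the function name is an illustrative choice.

```python
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence_level):
    """Confidence interval p_hat ± z* · sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(123, 1000, 0.90)
print(f"90% CI: ({low:.3f}, {high:.3f})")
```

The interval runs from roughly 10.6% to 14.0%, so the whole interval lies above the 5% return needed for the mass mailing in part (D).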

EXAMPLE 2 IN 1998 A SAN DIEGO REPRODUCTIVE CLINIC REPORTED 49 BIRTHS TO 207 WOMEN UNDER THE AGE OF 40 WHO HAD PREVIOUSLY BEEN UNABLE TO CONCEIVE. (A) FIND A 90% CONFIDENCE INTERVAL FOR THE SUCCESS RATE AT THIS CLINIC. (B) INTERPRET YOUR INTERVAL IN THIS CONTEXT. (C) EXPLAIN WHAT “90% CONFIDENCE” MEANS. (D) WOULD IT BE MISLEADING FOR THE CLINIC TO ADVERTISE A 25% SUCCESS RATE? EXPLAIN. (E) THE CLINIC WANTS TO CUT THE STATED MARGIN OF ERROR IN HALF. HOW MANY PATIENTS’ RESULTS MUST BE USED? (F) DO YOU HAVE ANY CONCERNS ABOUT THIS SAMPLE? EXPLAIN. 20

SOLUTION 21

EXAMPLE 3 A MAY 2002 GALLUP POLL FOUND THAT ONLY 8% OF A RANDOM SAMPLE OF 1012 ADULTS APPROVED OF ATTEMPTS TO CLONE A HUMAN. (A) FIND THE MARGIN OF ERROR FOR THIS POLL IF WE WANT 95% CONFIDENCE IN OUR ESTIMATE OF THE PERCENT OF AMERICAN ADULTS WHO APPROVE OF CLONING HUMANS. (B) EXPLAIN WHAT THAT MARGIN OF ERROR MEANS. (C) IF WE ONLY NEED TO BE 90% CONFIDENT, WILL THE MARGIN OF ERROR BE LARGER OR SMALLER? EXPLAIN. (D) FIND THAT MARGIN OF ERROR. (E) IN GENERAL, IF ALL OTHER ASPECTS OF THE SITUATION REMAIN THE SAME, WOULD SMALLER SAMPLES PRODUCE SMALLER OR LARGER MARGINS OF ERROR? 22

SOLUTION 23

Effects of Confidence Level and Sample Size on Margin of Error The margin of error for a confidence interval: (i) Increases as the confidence level increases; (ii) Decreases as the sample size increases. For instance, a 99% confidence interval is wider than a 95% confidence interval, and a confidence interval with 200 observations is narrower than one with 100 observations at the same confidence level. These properties apply to all confidence intervals, not just the one for the population proportion. 24

What is the Error Probability for the Confidence Interval Method? 25

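Confidence Intervals for the Difference Between Two Proportions: $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$, where z* is the value of the standard normal variable with area between -z* and z* equal to the desired confidence level. 26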

Necessary Conditions Condition 1: Sample proportions are available based on independent, randomly selected samples from the two populations. Condition 2: All of the quantities $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$ and $n_2(1 - \hat{p}_2)$ are at least 10. 27

Example There has been debate among doctors over whether surgery can prolong life among men suffering from prostate cancer, a type of cancer that typically develops and spreads very slowly. In the summer of 2003, The New England Journal of Medicine published results of some Scandinavian research. Men diagnosed with prostate cancer were randomly assigned to either undergo surgery or not. Among the 347 men who had surgery, 16 eventually died of prostate cancer, compared with 31 of the 348 men who did not have surgery. Create a 95% confidence interval for the difference in rates of death for the two groups of men. ANS = (0.00579, 0.08015) 28

Solution 29

C.I. With TI 83/84 Plus • Press [STAT]. • Select [TESTS]. • Choose B: 2-PropZInt…. • Input the following: x1: 31; n1: 348; x2: 16; n2: 347; C-Level: 0.95 • Choose Calculate and press [ENTER]. 30
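
The two-proportion z-interval can also be sketched in Python. The counts below are those of the prostate cancer example (no-surgery group first, as in the calculator inputs); the function name is an illustrative choice.

```python
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence_level):
    """CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    margin = z_star * standard_error
    diff = p1 - p2
    return diff - margin, diff + margin

# Group 1: no surgery (31 deaths of 348); group 2: surgery (16 deaths of 347).
low, high = two_prop_z_interval(31, 348, 16, 347, 0.95)
print(f"95% CI: ({low:.5f}, {high:.5f})")
```

The result agrees with the stated answer of about (0.00579, 0.08015) up to rounding, and the interval does not include 0.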

Example: Age and Using the Internet • Young: 92 of 262 use the Internet as main news source, $\hat{p}_1$ = .351 • Old: 59 of 632 use the Internet as main news source, $\hat{p}_2$ = .093 • Approximate 95% Confidence Interval: .258 ± 1.96(.0317), or .196 to .320 • We are 95% confident that the proportion of young adults who use the Internet as their main news source is between 19.6 and 32.0 percentage points higher than the proportion of older adults who do. 31

Using Confidence Intervals to Guide Decisions Principle 1. A value not in a confidence interval can be rejected as a possible value of the population proportion. A value in a confidence interval is an “acceptable” possibility for the value of a population proportion. Principle 2. When a confidence interval for the difference in two population proportions does not cover 0, it is reasonable to conclude the two population proportions are different. Principle 3. When the confidence intervals for proportions in two different populations do not overlap, it is reasonable to conclude the two population proportions are different. 32

Example: Which Drink Tastes Better? • Taste Test: A sample of 60 people taste both drinks, and 55% like the taste of Drink A better than Drink B. Makers of Drink A want to advertise these results. Makers of Drink B make a 95% confidence interval for the population proportion who prefer Drink A. 95% Confidence Interval: $\hat{p} \pm z^* \times SE(\hat{p})$ with $\hat{p}$ = .55 and n = 60, which gives roughly .42 to .68. • Note: Since .50 is in the interval, there is not enough evidence to claim that Drink A is preferred by a majority of the population represented by the sample. 33

PARTS OF CHAPTER 23 Intro Stats 3rd Edition ESTIMATING MEANS WITH CONFIDENCE 34

Confidence Intervals For One Population Mean • For large n, from any population, and also, • For small n, from an underlying population that is normal; and the population standard deviation, $\sigma$, is known; • The confidence interval for the population mean is: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ 35

Example • Assume that the helium porosity (in percentage) of coal samples taken from any particular seam is normally distributed with true standard deviation 0.75. • Compute a 95% CI for the true average porosity of a certain seam if the average porosity for 20 specimens from the seam was 4.85. 36

Solution 37

TI – 83 And TI – 84 PLUS COMMANDS • Press STAT • Scroll to TESTS • Scroll to ZInterval • Press ENTER • Scroll to Stats • Press ENTER • Enter values for (i) the sample mean; (ii) the sample size; (iii) the standard deviation; (iv) the confidence level. • Scroll to CALCULATE and press ENTER 38
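
As a cross-check on the calculator steps, the z-interval for a mean with known population standard deviation can be sketched in Python. The inputs are those of the helium porosity example (sample mean 4.85, standard deviation 0.75, n = 20, 95% confidence); the function name is an illustrative choice.

```python
from statistics import NormalDist

def z_interval_for_mean(sample_mean, sigma, n, confidence_level):
    """CI x_bar ± z* · sigma/sqrt(n), valid when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * sigma / n ** 0.5
    return sample_mean - margin, sample_mean + margin

low, high = z_interval_for_mean(4.85, 0.75, 20, 0.95)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

This gives an interval of roughly 4.52 to 5.18 for the true average porosity.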

Case II • The sample size n is small; • The population standard deviation is unknown; 39

Confidence Intervals For One Population Mean • In practice, we don’t know the population standard deviation $\sigma$. • Substituting the sample standard deviation s for $\sigma$ to get the standard error $SE(\bar{x}) = s/\sqrt{n}$ introduces extra error. • To account for this increased error, we must replace the z-score by a slightly larger score, called a t-score. The confidence interval is then a bit wider. • The distribution of this score is called the t distribution. 40

Properties Of The t-Distribution § The t-distribution is bell shaped and symmetric about 0. § The probabilities depend on the degrees of freedom, df = n – 1. § The t-distribution has thicker tails than the standard normal distribution, i.e., it is more spread out. § A t-score multiplied by the standard error gives the margin of error for a confidence interval for the mean. 41

More Thoughts about z and t – The Student’s t distribution: • Is unimodal. • Is symmetric about its mean. • Has heavier tails than the Normal. • Is very close to Normal for large df. • Is needed because we are using s as an estimate for $\sigma$. – If you happen to know $\sigma$, which almost never happens, use the Normal model and not Student’s t. 42

t - Distribution 43

Finding t* From Table • For a confidence level = 95%, n = 7, what is the t-multiplier? 44

Finding The t – multiplier From Table 45

Finding The t – multiplier With TI – 83 or TI – 84 PLUS • TI – 84 PLUS COMMAND: invT(Left area, df) • Press 2ND and VARS • Scroll to invT( • Press ENTER • Example: invT(0.975, 6) = 2.447 46
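
The idea behind invT can be sketched without a calculator: integrate the t density numerically to get the left-tail area, then search for the t value with the requested area. This is a standard-library illustration under stated assumptions (Simpson's rule for the integral, bisection on the range -100 to 100); it is not how the TI computes the value, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Left-tail area P(T <= x), using symmetry about 0 and Simpson's rule."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area

def inv_t(left_area, df):
    """t value with the given left-tail area (assumed to lie in [-100, 100])."""
    lo, hi = -100.0, 100.0
    for _ in range(60):                # bisection: halve the bracket each pass
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < left_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(inv_t(0.975, 6), 3))  # 95% confidence, n = 7, so df = 6
```

For 95% confidence the left area is 0.975 because 2.5% is left in each tail.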

Revisiting Degrees Of Freedom – For every sample size n there is a different Student’s t distribution. – Degrees of freedom: df = n – 1. – It is the number of independent quantities left after we’ve estimated the parameters. 47

One Sample t-Interval For One Sample Mean – When the assumptions are met (seen later), the confidence interval for the mean is $\bar{x} \pm t^*_{n-1} \times SE(\bar{x})$, where $SE(\bar{x}) = s/\sqrt{n}$. – The critical value $t^*_{n-1}$ depends on the confidence level, C, and the degrees of freedom n – 1. 48

ASSUMPTIONS AND CONDITIONS • INDEPENDENCE ASSUMPTION: THE DATA VALUES SHOULD BE INDEPENDENT. THERE’S REALLY NO WAY TO CHECK INDEPENDENCE OF THE DATA BY LOOKING AT THE SAMPLE, BUT WE SHOULD THINK ABOUT WHETHER THE ASSUMPTION IS REASONABLE. • RANDOMIZATION CONDITION: THE DATA SHOULD ARISE FROM A RANDOM SAMPLE OR A SUITABLY RANDOMIZED EXPERIMENT. 49

ASSUMPTIONS AND CONDITIONS • 10% CONDITION: THE SAMPLE IS NO MORE THAN 10% OF THE POPULATION. • NORMAL POPULATION ASSUMPTION OR NEARLY NORMAL CONDITION: THE DATA COME FROM A DISTRIBUTION THAT IS UNIMODAL AND SYMMETRIC. REMARK: CHECK THIS CONDITION BY MAKING A HISTOGRAM OR NORMAL PROBABILITY PLOT. 50

More On Assumptions And Conditions • Independence Condition – Randomization Condition: The data should arise from a suitably randomized experiment. – Sample size < 10% of the population size. • Nearly Normal – For large sample sizes (n > 40), not severely skewed. – (15 ≤ n ≤ 40): Need unimodal and symmetric. – (n < 15): Need almost perfectly normal. – Check with a histogram. 51

Example – Mirex In Salmon • A study of mirex concentrations in salmon found • n = 150, $\bar{x}$ = 0.0913 ppm, s = 0.0495 ppm • Find a 95% confidence interval for mirex concentrations in salmon. • df = 150 – 1 = 149 52

Example – Mirex In Salmon • Confidence Interval for $\mu$ = 0.0913 ± 0.0079 = (0.0834, 0.0992) • Interpreting The Interval – I’m 95% confident that the mean level of mirex concentration in farm-raised salmon is between 0.0834 and 0.0992 parts per million. 53

TI – 83 And TI – 84 PLUS COMMANDS • Press STAT • Scroll to TESTS • Scroll to TInterval • Press ENTER • Scroll to Stats • Press ENTER • Enter values for (i) the sample mean; (ii) the sample size; (iii) the standard deviation; (iv) the confidence level. • Scroll to CALCULATE and press ENTER 54

Example • A nutrition laboratory tests 40 “reduced sodium” hot dogs, finding that the mean sodium content is 310 mg, with a standard deviation of 36 mg. Find a 95% confidence interval for the mean sodium content of this brand of hot dog. Explain clearly what your interval means. • Solution 55
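
A possible way to work this example is sketched below. The summary statistics (n = 40, mean 310 mg, s = 36 mg) are from the slide; the multiplier t* = 2.023 for df = 39 is a value looked up from a t table and supplied by hand, which is an assumption of this sketch rather than something the slide states.

```python
import math

def t_interval_for_mean(sample_mean, s, n, t_star):
    """CI x_bar ± t* · s/sqrt(n); t_star must match df = n - 1."""
    standard_error = s / math.sqrt(n)
    margin = t_star * standard_error
    return sample_mean - margin, sample_mean + margin

T_STAR_DF39_95 = 2.023  # assumed table value for 95% confidence, df = 39

low, high = t_interval_for_mean(310, 36, 40, T_STAR_DF39_95)
print(f"95% CI: ({low:.1f}, {high:.1f}) mg")
```

Read in context: we can be 95% confident that the mean sodium content of this brand of hot dog is between about 298.5 and 321.5 mg.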

Right Tail Areas Under The t – distribution With TI – 83 or TI – 84 PLUS • If t has a Student's t-distribution with df degrees of freedom, then the TI – 83 or TI – 84 PLUS function tcdf(a, b, df) computes the area under the t-curve between a and b. • Example: Find the one-tail probability if t = 2.447 and the number of degrees of freedom is 6. • tcdf(2.447, 10^10, 6) = 0.024997007 56

The Challenge of Finding the Sample Size To find the necessary sample size in order to have a small enough margin of error, $ME = t^*_{n-1} \times s/\sqrt{n}$: • Decide on acceptable ME. • Determine s: Use a pilot to estimate s. • Determine $t^*_{n-1}$: Use z* as an estimate. By the 68-95-99.7 Rule, use 2 for 95% confidence. 57

Other Factors That Affect the Choice of the Sample Size § The first is the desired precision, as measured by the margin of error, ME. § The second is the confidence level. § The third factor is the variability in the data. § The fourth factor is cost. 58

What if You Have to Use a Small n? The t methods for a mean are valid for any n. However, you need to be extra cautious to look for extreme outliers or great departures from the normal population assumption. – In the case of the confidence interval for a population proportion, the method works poorly for small samples because the CLT no longer holds. 59

Confidence Intervals for Difference in Two Population Means (Independent Samples) 60

Confidence Intervals for the Difference Between Two Population Means • Approximate CI for $\mu_1 - \mu_2$: $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where t* is the value in a t-distribution with area between -t* and t* equal to the desired confidence level. • The approximate df is difficult to specify. Use computer software or, conservatively, use the smaller of the two sample sizes minus 1. 61

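Degrees of Freedom The t-distribution is only approximately correct and the df formula is complicated (Welch’s approximation): $df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$ Statistical software can use the above approximation, but if the calculation is done by hand, use a conservative df = smaller of $n_1 - 1$ and $n_2 - 1$. 62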

Necessary Conditions Two samples must be independent and either: Situation 1: Populations of measurements are both bell-shaped, and random samples of any size are measured. Situation 2: Large (n ≥ 30) random samples are measured. But if there are extreme outliers, or extreme skewness, it is better to have an even larger sample than n = 30. 63

Example • A random sample of 40 men drank an average of 20 cups of coffee per week during finals, while a sample of 30 women drank an average of 15 cups of coffee per week. The sample standard deviations were 6 cups for the men and 3 cups for the women. The standard error for the difference between the two sample means is 1.095. Calculate an approximate 95% confidence interval for the difference in average cups of coffee drunk (men – women). • ANS = (2.81, 7.19) 64
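
The numbers in this example can be checked with a short sketch. The summary statistics are from the slide. The stated answer appears to correspond to a multiplier of about 2 (the rule-of-thumb value for 95% confidence), so the sketch uses 2; that choice is an assumption, and a t* based on the conservative df = 29 would give a slightly wider interval.

```python
import math

def unpooled_standard_error(s1, n1, s2, n2):
    """Standard error of x1_bar - x2_bar without assuming equal variances."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

se = unpooled_standard_error(6, 40, 3, 30)   # men, then women
diff = 20 - 15                               # men minus women
multiplier = 2                               # assumed approximate 95% multiplier
low, high = diff - multiplier * se, diff + multiplier * se

print(f"SE = {se:.3f}")
print(f"Approximate 95% CI: ({low:.2f}, {high:.2f})")
```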

TI – 83 And TI – 84 PLUS COMMANDS • Press STAT • Scroll to TESTS • Scroll to 2-SampTInt • Press ENTER • Scroll to Stats • Press ENTER • Enter values, for population 1 and population 2, of (i) the sample mean; (ii) the sample size; (iii) the standard deviation; then enter (iv) the confidence level. • Scroll to Pooled and Select No • Scroll to CALCULATE and press ENTER 65

Self-Read Example: Effect of a Stare on Driving • Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, and timed how long (sec) it took each driver to proceed from the sign to a mark on the other side of the intersection. • No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7 • Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8 • Task: Make a 95% CI for the difference between the mean crossing times for the two populations represented by these two independent samples. 66

Example: Effect of a Stare on Driving 67

Example: Effect of a Stare on Driving Checking Conditions Boxplots show … • No outliers and no strong skewness. • Crossing times in stare group generally faster and less variable. 68

Example: Effect of a Stare on Driving Note: The df = 21 was reported by the computer package based on Welch’s approximation formula. 69
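
The reported df can be reproduced from the raw crossing times with Welch's approximation. The sketch below uses the data listed in the self-read example; the multiplier t* = 2.080 for df = 21 is an assumed table value supplied by hand, and the function name is an illustrative choice.

```python
import math
from statistics import mean, variance

no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

def welch_df(var1, n1, var2, n2):
    """Welch's approximate degrees of freedom for the unpooled procedure."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

n1, n2 = len(no_stare), len(stare)
var1, var2 = variance(no_stare), variance(stare)   # sample variances (n - 1)
df = welch_df(var1, n1, var2, n2)
se = math.sqrt(var1 / n1 + var2 / n2)              # unpooled standard error
diff = mean(no_stare) - mean(stare)

T_STAR_DF21_95 = 2.080  # assumed table value for 95% confidence, df = 21
low, high = diff - T_STAR_DF21_95 * se, diff + T_STAR_DF21_95 * se

print(f"Welch df = {df:.2f} (software typically reports {math.floor(df)})")
print(f"difference = {diff:.3f}, SE = {se:.3f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Since the whole interval lies above 0, the data suggest that drivers who were stared at crossed faster on average.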

Equal Variance Assumption and the Pooled Standard Error • May be reasonable to assume the two populations have equal population standard deviations, or equivalently, equal population variances: $\sigma_1^2 = \sigma_2^2 = \sigma^2$ • Estimate of this variance based on the combined or “pooled” data is called the pooled variance. The square root of the pooled variance is called the pooled standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ 70

Pooled Standard Error • $\text{Pooled s.e.}(\bar{x}_1 - \bar{x}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ 71

Pooled Degrees of Freedom (df) • Note: Pooled df = $(n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$. 72

Example • The head circumference is measured for a sample of 15 girls and a separate sample of 15 boys. What is the correct combination of degrees of freedom and the value of the multiplier t* for a pooled 90% confidence interval for the difference in mean head circumference between girls and boys? A. df = 14, t* = 1.76 B. df = 28, t* = 1.70 C. df = 28, t* = 2.05 D. df = 30, t* = 1.70 KEY: B 73

Pooled Confidence Interval Pooled CI for the Difference Between Two Means (Independent Samples): $(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where t* is found using a t-distribution with df = $n_1 + n_2 - 2$ and $s_p$ is the pooled standard deviation. 74

Example A random sample of 60 mathematics majors spent an average of $200.00 for textbooks for a term, with a standard deviation of $22.50. A random sample of 40 English majors spent an average of $180.00 for textbooks that term, with a standard deviation of $18.30. Calculate a 90% pooled confidence interval for the difference in average amounts spent on textbooks (math majors – English majors). ANS = (12.91, 27.09) 75
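
The pooled calculation for this example can be sketched as follows. The summary statistics are from the slide; the multiplier t* = 1.6606 for df = 98 at 90% confidence is an assumed table or software value supplied by hand, and the function names are illustrative.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation under the equal-variance assumption."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)

def pooled_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """CI (x1_bar - x2_bar) ± t* · sp · sqrt(1/n1 + 1/n2)."""
    sp = pooled_sd(s1, n1, s2, n2)
    margin = t_star * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

T_STAR_DF98_90 = 1.6606  # assumed value for 90% confidence, df = 60 + 40 - 2

sp = pooled_sd(22.50, 60, 18.30, 40)
low, high = pooled_interval(200.00, 22.50, 60, 180.00, 18.30, 40, T_STAR_DF98_90)
print(f"sp = {sp:.2f}, 90% pooled CI: ({low:.2f}, {high:.2f})")
```

The result agrees with the stated answer of (12.91, 27.09) up to rounding.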

TI – 83 And TI – 84 COMMANDS • Press STAT • Scroll to TESTS • Scroll to 2-SampTInt • Press ENTER • Scroll to Stats • Press ENTER • Enter values, for population 1 and population 2, of (i) the sample mean; (ii) the sample size; (iii) the standard deviation; then enter (iv) the confidence level. • Scroll to Pooled and Select Yes • Scroll to CALCULATE and press ENTER 76

Self-Read Example: Male and Female Sleep Times • Q: How much difference is there between how long female and male students slept the previous night? • Data: The 83 female and 65 male responses from students in an intro stat class. • Task: Make a 95% CI for the difference between the two population mean sleep hours for females versus males. • Note: We will assume equal population variances. 77

Example: Male and Female Sleep Times
Two-sample T for sleep [with “Assume Equal Variance” option]

Sex       N   Mean  StDev  SE Mean
Female   83   7.02   1.75     0.19
Male     65   6.55   1.68     0.21

Difference = mu (Female) – mu (Male)
Estimate for difference: 0.461
95% CI for difference: (-0.103, 1.025)
T-Test of difference = 0 (vs not =): T-Value = 1.62  P = 0.108  DF = 146
Both use Pooled StDev = 1.72
78

Example: Male and Female Sleep Times Notes: • Two sample standard deviations are very similar. • Sample mean for females higher than for males. • 95% confidence interval contains 0 so cannot rule out that the population means may be equal. 79

Example: Male and Female Sleep Times • Pooled Standard Deviation and Pooled Standard Error “by – hand”: 80

Example: Male and Female Sleep Times 81

Pooled or Unpooled? • If the larger sample size produced the larger standard deviation, the pooled procedure is acceptable because it will be conservative. • If the smaller standard deviation accompanies the larger sample size, the pooled procedure can be quite misleading and is not recommended. • If sample sizes are equal, the pooled and unpooled standard errors are equal. • Unless the sample standard deviations are quite similar, it is best to use the unpooled procedure. 82

Confidence Interval for the Difference in Two Population Means 1. Make sure appropriate conditions apply by checking the sample sizes and/or a picture of the shape of the data. 2. Choose a confidence level. 3. Compute the mean and std dev for each sample. 4. Determine whether the std devs are similar enough that the pooled procedure can be used. 5. Calculate the appropriate standard error (pooled or unpooled). 6. Calculate the appropriate df. 7. Use Table A.2 (or software) to find the multiplier t*. 83

More Examples From Practice Sheet 84