14_2_Basic_statistics.ppt

- Number of slides: 75

BASIC STATISTICS
Poznan 2006
Prof. Saverio Mannino, University of Milan


In the last decade, almost all official methods for food analysis have reported, under the heading “Precision”, a statement such as:
“Repeatability (r): The difference between two single results found on identical test material by one analyst using the same apparatus within a short time interval should not exceed 2.0% of the arithmetic mean of the results.”

REMEMBER: BEFORE YOU CAN CONTROL, YOU MUST BE ABLE TO MEASURE, AND THE RESULTS OF YOUR MEASUREMENTS SHOULD BE RELIABLE.

or: “Repeatability (r): The difference between two determinations carried out simultaneously or in rapid succession by the same analyst should not exceed 0.6 g of oil per 100 g of sample.”
or: “Repeatability (r): The difference between the results of two determinations, carried out simultaneously or in rapid succession by the same analyst using the same apparatus, shall not exceed 6% relative of the arithmetic mean of the results for both sodium and potassium.”

ACCURACY
The accuracy of an analytical method describes the closeness of mean test results obtained by the method to the true value (concentration) of the analyte.
• Accuracy is determined by replicate analysis of samples containing known amounts of the analyte.
• Accuracy should be measured using a minimum of five determinations per concentration. A minimum of three concentrations in the range of expected concentrations is recommended.
• The mean value should be within 15% of the actual value, except at the LLOQ (lower limit of quantification), where it should not deviate by more than 20%. The deviation of the mean from the true value serves as the measure of accuracy.

PRECISION
The precision of an analytical method describes the closeness of individual measures of an analyte when the procedure is applied repeatedly to multiple aliquots of a single homogeneous volume of biological matrix. Precision should be measured using a minimum of five determinations per concentration. A minimum of three concentrations in the range of expected concentrations is recommended. The coefficient of variation (CV = 100·s/M, the standard deviation as a percentage of the mean) determined at each concentration level should not exceed 15%, except at the LLOQ, where it should not exceed 20%.
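The accuracy and precision acceptance criteria can be checked together in a few lines. The following is a minimal Python sketch, not a validated implementation: `accuracy_and_precision` is a hypothetical helper, the replicate values are illustrative, and the CV is assumed to use the sample standard deviation (n − 1 in the denominator).

```python
from statistics import mean, stdev


def accuracy_and_precision(replicates, nominal, is_lloq=False):
    """Return (bias %, CV %, acceptable) for replicates at one concentration.

    Limits follow the slides: 15% for the deviation of the mean and for
    the CV, relaxed to 20% at the LLOQ.
    """
    if len(replicates) < 5:
        raise ValueError("use a minimum of five determinations per concentration")
    m = mean(replicates)
    bias_pct = 100 * (m - nominal) / nominal  # accuracy: deviation of the mean
    cv_pct = 100 * stdev(replicates) / m      # precision: coefficient of variation
    limit = 20.0 if is_lloq else 15.0
    acceptable = abs(bias_pct) <= limit and cv_pct <= limit
    return bias_pct, cv_pct, acceptable


# Hypothetical replicates for a QC sample with a nominal value of 10.0
bias, cv, ok = accuracy_and_precision([9.6, 10.3, 9.9, 10.4, 9.8], nominal=10.0)
print(f"bias = {bias:+.1f}%, CV = {cv:.1f}%, acceptable: {ok}")
```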

PRECISION
Precision is further subdivided into:
• repeatability (within-run or intra-batch precision), which assesses precision during a single analytical run, or between runs performed by the same analyst in the same laboratory;
• reproducibility (inter-batch precision), which measures precision over time and may involve different analysts, equipment, reagents, and laboratories.

Normal distribution: Gaussian curve
σ (sigma) = standard deviation (square root of the variance)
[Figure: Gaussian curve; horizontal axis graduated in σ units, from −7σ to +7σ]

Interval    Inside          Outside (deviations)
±1σ         68.27%          317300 ppm
±2σ         95.45%          45500 ppm
±3σ         99.73%          2700 ppm
±4σ         99.9937%        63 ppm
±5σ         99.999943%      0.57 ppm
±6σ         99.9999998%     0.002 ppm
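The coverage figures in the table can be reproduced from the normal distribution, since the fraction within ±kσ equals erf(k/√2). A minimal Python sketch using only the standard `math` module; the helper names are hypothetical.

```python
import math


def fraction_inside(k):
    """Fraction of a normal population within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


def ppm_outside(k):
    """Parts per million outside +/- k sigma (erfc avoids round-off near 1)."""
    return math.erfc(k / math.sqrt(2)) * 1e6


for k in range(1, 7):
    print(f"+/-{k} sigma: {100 * fraction_inside(k):.7f}% inside, "
          f"{ppm_outside(k):.3f} ppm outside")
```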

Random and Systematic Errors
A     B     C     D
8     13    9     11
9     9     9     8
11    11    11    12
12    8     7     7
10    14    9     12
M = 10   M = 11   M = 9   M = 10

Sets A and D have the same mean (M = 10):
A: 8, 9, 11, 12, 10
D: 11, 8, 12, 7, 12

Random and Systematic Errors
Worked example for set D (M = 10):
X     X − M    (X − M)²
11    +1       1
8     −2       4
12    +2       4
7     −3       9
12    +2       4
Σ(X − M)² = 22
Dividing the sum of squares by n − 1 = 4 gives 5.5 (the variance); its square root, 2.3, is called the standard deviation (s).

Random and Systematic Errors
A: 8, 9, 11, 12, 10    M = 10    s = 1.6
B: 13, 9, 11, 8, 14    M = 11    s = 2.5
C: 9, 9, 11, 7, 9      M = 9     s = 1.4
D: 11, 8, 12, 7, 12    M = 10    s = 2.3
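As a check on these values, the following Python sketch recomputes M and s for sets A to D, dividing the sum of squared deviations by n − 1. The helper `mean_and_sd` is hypothetical; `statistics.stdev` from the standard library (which also uses n − 1) serves only as a cross-check.

```python
import math
import statistics


def mean_and_sd(values):
    """Mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    m = sum(values) / n
    sum_sq = sum((x - m) ** 2 for x in values)  # sum of squared deviations
    variance = sum_sq / (n - 1)
    return m, math.sqrt(variance)


data = {
    "A": [8, 9, 11, 12, 10],
    "B": [13, 9, 11, 8, 14],
    "C": [9, 9, 11, 7, 9],
    "D": [11, 8, 12, 7, 12],
}
for name, values in data.items():
    m, s = mean_and_sd(values)
    assert math.isclose(s, statistics.stdev(values))  # library cross-check
    print(f"{name}: M = {m:g}, s = {s:.1f}")
```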

Average and Standard Deviation
Average (M in the examples): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard deviation: $s = \sqrt{s^2}$

Example of data precision calculation
A sample has been analysed five times and we want to calculate the SD:

Accuracy and Precision
The relationship between accuracy and precision can be illustrated by the familiar example of firing a rifle at a target, where the black dots in the figure represent hits on the target. As the figure shows, good precision does not necessarily imply good accuracy. However, if an instrument is well calibrated, the precision or reproducibility of the result is a good measure of its accuracy.

Confidence limits
Given a set of n measurements whose mean and standard deviation are known, it is possible to calculate an interval, called the confidence interval, within which the true value is expected to lie at a predetermined probability level. The extreme values of the interval are called confidence limits and are calculated from:
$\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}}$
where t is Student's t, found from tables. It depends on the number of degrees of freedom (n − 1) and on the chosen probability level.

Standard error of the mean
While the standard deviation is a measure of the variability of the data, the standard error of the mean measures how precisely the population mean (μ) is known:
$s_{\bar{x}} = \frac{s}{\sqrt{n}}$
As the formula shows, it depends on the number of measurements n: quadrupling n halves the standard error.

Confidence interval
Example: calculate the confidence interval (limits) for the mean at the 80, 95 and 99% probability levels for the following data set: 47.64; 47.69; 47.52; 47.55
M = 47.60   s = 0.08   ν = n − 1 = 4 − 1 = 3
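The confidence limits for this example can be computed as M ± t·s/√n. In this minimal Python sketch the Student's t values are typed in from standard two-sided t tables (only the entries needed here), an assumption made in place of a statistics library; `confidence_interval` is a hypothetical helper.

```python
import math
import statistics

# Two-sided Student's t critical values from standard tables:
# {confidence level: {degrees of freedom: t}}
T_TABLE = {
    0.80: {3: 1.638, 4: 1.533},
    0.95: {3: 3.182, 4: 2.776},
    0.99: {3: 5.841, 4: 4.604},
}


def confidence_interval(values, level):
    """Return (lower, upper) confidence limits M -/+ t * s / sqrt(n)."""
    n = len(values)
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    t = T_TABLE[level][n - 1]                      # n - 1 degrees of freedom
    return m - t * sem, m + t * sem


data = [47.64, 47.69, 47.52, 47.55]
for level in (0.80, 0.95, 0.99):
    low, high = confidence_interval(data, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f}")
```

The half-width grows with the probability level, which is the point made for the next example.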

Example: calculate the confidence interval (limits) for the mean at the 95 and 99% probability levels for the following data set:
Average = 3.203 and SD = 0.077
As you can see, the higher the probability level, the wider the confidence limits.

CALCULATION OF STANDARD DEVIATION
$s = \sqrt{\frac{\sum d^2}{2k}}$
where d = difference between the two results of each duplicate and k = number of duplicate pairs.
1st CASE: SD calculated from a series of duplicate analyses made on the same sample.
2nd CASE: SD calculated from a series of duplicate analyses made on different but similar samples.
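A minimal Python sketch of the standard deviation from duplicates, assuming the formula s = √(Σd²/2k) with d the difference within each pair. The slide data for Examples 1 and 2 are not reproduced here, so the pairs below are hypothetical illustrative values, and `sd_from_duplicates` is a hypothetical helper.

```python
import math


def sd_from_duplicates(pairs):
    """s = sqrt(sum(d^2) / (2k)) for k duplicate pairs.

    Only within-pair differences are used, so the pairs may come from the
    same sample (case 1) or from different but similar samples (case 2).
    """
    k = len(pairs)
    sum_d2 = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(sum_d2 / (2 * k))


# Hypothetical duplicate results (illustrative only, not the slide data)
pairs = [(5.2, 5.4), (6.1, 5.9), (4.8, 5.0), (5.5, 5.3)]
print(f"s = {sd_from_duplicates(pairs):.3f}")
```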

Example 1
Seven duplicate analyses made on the same sample

Example 2
Duplicate analyses of six similar but different cheese samples

Factors for SD from range
n     d₂
2     1.128
3     1.693
4     2.059
5     2.326
$s = \bar{R} / d_2$; for n = 2 (duplicates), $s = \bar{R} / 1.128$
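The d₂ table converts a mean range into a standard deviation estimate, s = R̄/d₂. A minimal Python sketch, assuming the ranges come from subgroups of equal size n; the range values and the helper name `sd_from_range` are hypothetical.

```python
# d2 factors from the table: expected range of n normal values, in sigma units
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sd_from_range(mean_range, n):
    """Estimate s from the mean range of subgroups of size n: s = R / d2."""
    return mean_range / D2[n]


# For duplicates (n = 2) each range is simply |d|, so s = mean |d| / 1.128
ranges = [0.3, 0.1, 0.2, 0.4]  # hypothetical ranges of duplicate pairs
r_mean = sum(ranges) / len(ranges)
print(f"R = {r_mean:.3f}, s = {sd_from_range(r_mean, 2):.3f}")
```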


REPEATABILITY LIMIT
Repeatability limit: $r = t\sqrt{2}\,\sigma_r$
t is Student's t (two-tailed) for ν = ∞; at the 95% probability level t = 1.96, so $r = 1.96\sqrt{2}\,\sigma_r \approx 2.77\,\sigma_r$.
σ_r is the repeatability standard deviation.
r is the maximum difference that can be expected (at 95% probability) between two results obtained under repeatability conditions.

REPEATABILITY LIMIT
It follows from the formula for the standard deviation from duplicate measurements, applied to a single duplicate pair (n = 1): $s = |d|/\sqrt{2}$, so $|d| = \sqrt{2}\,s$, and the 95% limit for |d| is $t\sqrt{2}\,s$.

REPRODUCIBILITY LIMIT
Reproducibility limit: $R = t\sqrt{2}\,\sigma_R \approx 2.77\,\sigma_R$
Same as before, but under reproducibility conditions; σ_R is the reproducibility standard deviation.
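The repeatability and reproducibility limits share the form t·√2·σ, so one helper covers both. This Python sketch assumes t = 1.96 (95% probability, infinite degrees of freedom); `precision_limit`, `within_limit` and the value of s_r are hypothetical.

```python
import math


def precision_limit(sd, t=1.96):
    """Limit t * sqrt(2) * sd: r when sd = s_r, R when sd = s_R."""
    return t * math.sqrt(2) * sd


def within_limit(x1, x2, limit):
    """True if two results differ by no more than the limit."""
    return abs(x1 - x2) <= limit


s_r = 0.2  # hypothetical repeatability standard deviation, g/100 g
r = precision_limit(s_r)
print(f"r = {r:.2f} g/100 g")
print("duplicate 10.3 / 10.8 acceptable:", within_limit(10.3, 10.8, r))
```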

STATISTICAL HYPOTHESES
Frequently we must make decisions about a population based on data obtained from samples. Examples: we may want to know whether one lot is better than another, or which supplier is the best. To reach such decisions it is useful to start with guesses or assumptions about the two populations. Such an assumption is called a statistical hypothesis.
A procedure that leads to accepting or rejecting a hypothesis is called a statistical test, a test of significance, or a test of hypotheses.

STATISTICAL HYPOTHESES
The hypothesis to be tested, denoted by H0, is called the null hypothesis, because it states that there is no real difference between the true value of the population parameter and its hypothesized value; any difference observed in the sample is attributed to chance.
Example: to determine whether one process is better than another, we would formulate the hypothesis that there is no difference between the two processes.
Any hypothesis that differs from the null hypothesis is denoted by H1 and called the alternative hypothesis.

STATISTICAL HYPOTHESES
In testing statistical hypotheses there is no absolute certainty that the conclusion reached will be correct. At the 5% level we accept being wrong (rejecting a true null hypothesis) once in 20 times (5 in 100), and at the 1% level once in 100 times. This level is the α value.

STATISTICAL HYPOTHESES
Two types of incorrect conclusion are possible. If the hypothesis is true but the sample leads us to conclude that it is false, we have committed a type I error (α error). If the hypothesis being tested is actually false but the sample leads us to conclude that it is true, we have committed a type II error (β error).

STATISTICAL HYPOTHESES
There are four possible outcomes of a test:

                         Real situation
Test conclusion          H0 true                        H0 false
H0 true (accepted)       Correct decision, P = 1 − α    Type II error, P = β
H0 false (rejected)      Type I error, P = α            Correct decision, P = 1 − β

α is called the producer's risk and β the consumer's risk.

F-test
The F-test is used to determine whether two variances are significantly different. The F value is defined by:
$F = s_A^2 / s_B^2$
where
s_A² is the variance of population A, estimated with n_A − 1 degrees of freedom;
s_B² is the variance of population B, estimated with n_B − 1 degrees of freedom;
s_A² is the larger of the two variances (so F ≥ 1).

F-test
The calculated value of F is compared with the value reported in the F table. In performing the F-test we assume that the variances of the two populations A and B are the same; the null hypothesis is therefore
$H_0: \sigma_A^2 = \sigma_B^2$
and the alternative hypothesis is that the variances are different:
$H_1: \sigma_A^2 \neq \sigma_B^2$
If the calculated F is less than the tabulated value, we conclude that there is no significant difference between the two variances; otherwise the alternative hypothesis is accepted.
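The F-test procedure can be sketched in Python as follows, using sets A and D from the random and systematic error example. The critical value is passed in by hand from an F table; using the one-tailed 5% value F(4, 4) = 6.39 is an assumption made here for illustration (a two-sided test at the 5% level would use 9.60). The helper `f_test` is hypothetical.

```python
import statistics


def f_test(a, b, f_critical):
    """Return (F, df numerator, df denominator, significant?).

    F is the larger sample variance divided by the smaller one.
    """
    va, vb = statistics.variance(a), statistics.variance(b)
    if va < vb:  # put the greater variance in the numerator
        a, b, va, vb = b, a, vb, va
    f = va / vb
    return f, len(a) - 1, len(b) - 1, f > f_critical


A = [8, 9, 11, 12, 10]
D = [11, 8, 12, 7, 12]
f, df1, df2, significant = f_test(A, D, f_critical=6.39)
print(f"F = {f:.2f} with ({df1}, {df2}) df; significant: {significant}")
```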

T-test
In analytical practice it is often necessary to compare two means to see whether they are different. In these cases the t-test can be used.
Procedure
• Formulate the null hypothesis
• Choose the significance level (α = 0.05, i.e. 95% confidence)
• Calculate t from the data
• Compare t with the tabulated value
If the t value from the data is lower than the tabulated value, we can state that there is no significant difference between the two means.

Examples
• A company produces frozen shrimp in packages labeled “Content 12 ounces”. A sample of four packages selected at random yields the following weights: 12.2; 11.6; 11.8; 11.6
At the 5% level, is the mean of the sample significantly different from the label claim of 12 ounces?

Examples
• Mean M = 47.2/4 = 11.8, s = 0.282
$t_{calculated} = \frac{|M - X|\sqrt{n}}{s} = 1.41$, where X = 12 ounces is the label claim
t_tabulated = 3.18 for 3 degrees of freedom
Since t_tabulated is greater than t_calculated, there is no significant difference between M and X.
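A minimal Python sketch of this one-sample t-test, assuming t = |M − X|·√n / s with n − 1 degrees of freedom and the tabulated value t = 3.182 (two-sided, 5% level, 3 df) entered by hand. The helper `one_sample_t` is hypothetical.

```python
import math
import statistics


def one_sample_t(values, claimed):
    """Return (M, s, t, df) with t = |M - claimed| / (s / sqrt(n))."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)
    t = abs(m - claimed) / (s / math.sqrt(n))
    return m, s, t, n - 1


weights = [12.2, 11.6, 11.8, 11.6]  # ounces
m, s, t_calc, df = one_sample_t(weights, claimed=12.0)
t_tab = 3.182  # two-sided, 5% level, 3 df (from tables)
print(f"M = {m:.1f}, s = {s:.3f}, t = {t_calc:.2f}, df = {df}")
print("significant difference" if t_calc > t_tab else "no significant difference")
```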

Examples
• A new micropipette is supposed to dispense 20.0 µl. To check its performance, five measurements were made using an analytical balance. The results were: 19.6 - 19.4 - 20.1 - 19.9 - 19.5
At the 5% level, is the mean of the sample significantly different from the claimed 20 µl? No. Why?

Examples
• M = 19.7, s = 0.291, X = 20
t_calculated = 2.3, t_tabulated = 2.776 for 4 degrees of freedom (n − 1 = 5 − 1)
Since t_tabulated is greater than t_calculated, there is no significant difference between M and X.
The denominator of t, s/√n = 0.291/√5 = 0.130, is the standard error of the mean. This means that our result can be expressed as:
µ = 19.7 ± 2.776 (0.13), i.e. from 19.34 to 20.06
and 20 lies within this interval.
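For the micropipette data, the test and the confidence interval can be computed side by side. This Python sketch assumes the tabulated two-sided t = 2.776 for 4 degrees of freedom, entered by hand; it shows that t_calculated < t_tabulated exactly when the claimed 20 µl lies inside M ± t·s/√n.

```python
import math
import statistics

volumes = [19.6, 19.4, 20.1, 19.9, 19.5]  # microlitres
claimed = 20.0
n = len(volumes)
m = statistics.mean(volumes)
sem = statistics.stdev(volumes) / math.sqrt(n)  # standard error of the mean
t_tab = 2.776  # two-sided, 5% level, n - 1 = 4 df (from tables)

t_calc = abs(m - claimed) / sem
low, high = m - t_tab * sem, m + t_tab * sem
print(f"t = {t_calc:.2f}, 95% interval = {low:.2f} to {high:.2f} µl")
# Both criteria agree: the claim is inside the interval and t_calc < t_tab
print(low <= claimed <= high, t_calc < t_tab)
```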


Paired t-test
We analyzed samples with two methods (one pair of results per sample) and obtained the results reported below. We want to know whether the two methods give similar results.
Method A    Method B
6.1         5.9
5.8         5.7
7.0         6.1
6.1         5.8
5.8         5.9
6.4         5.6
6.1         5.6
6.0         5.9
5.9         5.7
5.8         5.6

Paired t-test
We first calculate the differences d = A − B:
0.2  0.1  0.9  0.3  −0.1  0.8  0.5  0.1  0.2  0.2
Σd = 3.2 and d_mean = 0.32
and then we square the differences:
0.04  0.01  0.81  0.09  0.01  0.64  0.25  0.01  0.04  0.04
Σd² = 1.94

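Paired t-test
Σd = 3.2, d_mean = 0.32, Σd² = 1.94, df = n − 1 = 9
$s_d = \sqrt{\frac{\sum d^2 - (\sum d)^2/n}{n - 1}} = \sqrt{\frac{1.94 - 3.2^2/10}{9}} = 0.319$
$t = \frac{\bar{d}\sqrt{n}}{s_d} = \frac{0.32\sqrt{10}}{0.319} = 3.172$
Hypotheses: $H_0: \mu_A = \mu_B$ (mean difference zero); $H_1: \mu_A \neq \mu_B$; α = 0.05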

Paired t-test
From the table, t for 9 degrees of freedom (two-sided, α = 0.05) is t = 2.26.
Since 3.172 > 2.26, we conclude that, at the 95% level, the two methods give significantly different results.
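A minimal Python sketch of the paired t-test. The ten pairs below are those of the data slide and reproduce Σd = 3.2 and Σd² = 1.94; the tabulated t = 2.262 (two-sided, 5% level, 9 df) is entered by hand rather than computed.

```python
import math
import statistics

method_a = [6.1, 5.8, 7.0, 6.1, 5.8, 6.4, 6.1, 6.0, 5.9, 5.8]
method_b = [5.9, 5.7, 6.1, 5.8, 5.9, 5.6, 5.6, 5.9, 5.7, 5.6]

d = [a - b for a, b in zip(method_a, method_b)]  # paired differences
n = len(d)
d_mean = statistics.mean(d)
s_d = statistics.stdev(d)  # SD of the differences, n - 1 df
t_calc = abs(d_mean) / (s_d / math.sqrt(n))
t_tab = 2.262  # two-sided, 5% level, 9 df (from tables)

print(f"d_mean = {d_mean:.2f}, s_d = {s_d:.3f}, t = {t_calc:.3f}")
print("methods differ" if t_calc > t_tab else "no significant difference")
```

Working on the differences removes the sample-to-sample variation, which is why the pairing matters.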


RECOVERY
The recovery of an analyte in an assay is the detector response obtained from an amount of the analyte added to and extracted from the biological matrix, compared to the detector response obtained for the true concentration of the pure authentic standard. Recovery pertains to the extraction efficiency of an analytical method within the limits of variability. Recovery of the analyte need not be 100%, but the extent of recovery of an analyte and of the internal standard should be consistent, precise, and reproducible. Recovery experiments should be performed by comparing the analytical results for extracted samples at three concentrations (low, medium, and high) with unextracted standards that represent 100% recovery.

SELECTIVITY
Selectivity is the ability of an analytical method to differentiate and quantify the analyte in the presence of other components in the sample. For selectivity, analyses of blank samples of the appropriate biological matrix (plasma, urine, or other matrix) should be obtained from at least six sources. Each blank sample should be tested for interference, and selectivity should be ensured at the lower limit of quantification (LLOQ). Potential interfering substances in a biological matrix include endogenous matrix components, metabolites, decomposition products, and in the actual study, concomitant medication and other exogenous xenobiotics. If the method is intended to quantify more than one analyte, each analyte should be tested to ensure that there is no interference.

Calibration/Standard Curve
A calibration (standard) curve is the relationship between instrument response and known concentrations of the analyte. A calibration curve should be generated for each analyte in the sample. A sufficient number of standards should be used to adequately define the relationship between concentration and response. A calibration curve should be prepared in the same biological matrix as the samples in the intended study by spiking the matrix with known concentrations of the analyte. The number of standards used in constructing a calibration curve will be a function of the anticipated range of analytical values and the nature of the analyte/response relationship.

CALIBRATION CURVE
Concentrations of standards should be chosen on the basis of the concentration range expected in a particular study. A calibration curve should consist of a blank sample (matrix sample processed without internal standard), a zero sample (matrix sample processed with internal standard), and six to eight non-zero samples covering the expected range, including LLOQ.

1. Lower Limit of Quantification (LLOQ)
The lowest standard on the calibration curve should be accepted as the limit of quantification if the following conditions are met:
• The analyte response at the LLOQ should be at least 5 times the blank response.
• The analyte peak (response) should be identifiable, discrete, and reproducible, with a precision of 20% and an accuracy of 80-120%.

2. Calibration Curve/Standard Curve/Concentration-Response
The simplest model that adequately describes the concentration-response relationship should be used. Selection of weighting and use of a complex regression equation should be justified. The following conditions should be met in developing a calibration curve:
• ≤20% deviation of the LLOQ from the nominal concentration
• ≤15% deviation of standards other than the LLOQ from the nominal concentration
At least four out of six non-zero standards should meet the above criteria, including the LLOQ and the calibration standard at the highest concentration. Excluding the standards should not change the model used.
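The calibration criteria can be sketched in Python as an unweighted least-squares line followed by the acceptance rule on the back-calculated standards. This is a minimal illustration, assuming a linear model and six non-zero standards sorted by concentration; `fit_line`, `curve_acceptable` and the data are hypothetical.

```python
def fit_line(x, y):
    """Unweighted least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def curve_acceptable(nominal, back_calculated):
    """LLOQ (first standard) within 20%, others within 15% of nominal.

    Assumes six non-zero standards: at least four must pass, including
    the LLOQ and the highest standard (last entry).
    """
    passed = []
    for i, (nom, found) in enumerate(zip(nominal, back_calculated)):
        limit = 20.0 if i == 0 else 15.0
        passed.append(abs(found - nom) / nom * 100 <= limit)
    return (passed[0] and passed[-1] and sum(passed) >= 4), passed


conc = [1, 2, 5, 10, 20, 50]                # hypothetical nominal values
resp = [2.6, 4.4, 10.7, 20.2, 40.9, 99.8]   # hypothetical detector responses
slope, intercept = fit_line(conc, resp)
found = [(r - intercept) / slope for r in resp]  # back-calculated values
ok, flags = curve_acceptable(conc, found)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, acceptable: {ok}")
```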

GLOSSARY
• Accuracy: The degree of closeness of the determined value to the nominal or known true value under prescribed conditions. This is sometimes termed trueness.
• Analyte: A specific chemical moiety being measured, which can be intact drug, biomolecule or its derivative, metabolite, and/or degradation product in a biologic matrix.
• Analytical run (or batch): A complete set of analytical and study samples with an appropriate number of standards and QCs for their validation. Several runs (or batches) may be completed in one day, or one run (or batch) may take several days to complete.

• Calibration standard: A biological matrix to which a known amount of analyte has been added or spiked. Calibration standards are used to construct calibration curves from which the concentrations of analytes in QCs and in unknown study samples are determined.
• Internal standard: Test compound(s) (e.g., structurally similar analog, stable labeled compound) added to both calibration standards and samples at known and constant concentration to facilitate quantification of the target analyte(s).
• Limit of detection (LOD): The lowest concentration of an analyte that the bioanalytical procedure can reliably differentiate from background noise.
• Lower limit of quantification (LLOQ): The lowest amount of an analyte in a sample that can be quantitatively determined with suitable precision and accuracy.

• Matrix effect: The direct or indirect alteration or interference in response due to the presence of unintended analytes (for analysis) or other interfering substances in the sample.
• Method: A comprehensive description of all procedures used in sample analysis.
• Precision: The closeness of agreement (degree of scatter) between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions.
• Quantification range: The range of concentration, including ULOQ and LLOQ, that can be reliably and reproducibly quantified with accuracy and precision through the use of a concentration-response relationship.

• Recovery: The extraction efficiency of an analytical process, reported as a percentage of the known amount of an analyte carried through the sample extraction and processing steps of the method.
• Reproducibility: The precision between two laboratories. It also represents precision of the method under the same operating conditions over a short period of time.
• Blank: A sample of a biological matrix to which no analytes have been added that is used to assess the specificity of the bioanalytical method.
• Quality control sample (QC): A spiked sample used to monitor the performance of a bioanalytical method and to assess the integrity and validity of the results of the unknown samples analyzed in an individual batch.

• Selectivity: The ability of the bioanalytical method to measure and differentiate the analytes in the presence of components that may be expected to be present.
• Stability: The chemical stability of an analyte in a given matrix under specific conditions for given time intervals.
• Standard curve: The relationship between the experimental response value and the analytical concentration (also called a calibration curve).
• System suitability: Determination of instrument performance (e.g., sensitivity and chromatographic retention) by analysis of a reference standard prior to running the analytical batch.
• Upper limit of quantification (ULOQ): The highest amount of an analyte in a sample that can be quantitatively determined with precision and accuracy.

Validation
• Full validation: Establishment of all validation parameters to apply to sample analysis for the bioanalytical method for each analyte.
• Partial validation: Modification of validated bioanalytical methods that do not necessarily call for full revalidation.
• Cross-validation: Comparison of validation parameters of two bioanalytical methods.


Confidence interval
The purpose of taking a random sample from a lot or population and computing a statistic, such as the mean from the data, is to approximate the mean of the population. How well the sample statistic estimates the underlying population value is always an issue. A confidence interval addresses this issue because it provides a range of values which is likely to contain the population parameter of interest.

Confidence intervals are constructed at a confidence level, such as 95%, selected by the user. What does this mean? It means that if the same population is sampled on numerous occasions and interval estimates are made on each occasion, the resulting intervals would bracket the true population parameter in approximately 95% of the cases. A confidence level of 1 − α can be thought of as the complement of a significance level α.
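The long-run meaning of a 95% confidence level can be illustrated by simulation. This Python sketch assumes a normal population with known σ and uses the interval x̄ ± z·σ/√N; the values of μ, σ and N are hypothetical, and a fixed random seed makes the run repeatable.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(mu=500.0, sigma=2.0, n=10, level=0.95, trials=2000, seed=1):
    """Fraction of intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # upper critical value
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(x_bar - mu) <= half_width:
            hits += 1
    return hits / trials


print(f"95% intervals containing mu: {ci_coverage():.1%}")
```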

Confidence interval
In the same way that statistical tests can be one- or two-sided, confidence intervals can be one- or two-sided. A two-sided confidence interval brackets the population parameter from above and below. A one-sided confidence interval brackets the population parameter either from above or from below and furnishes an upper or lower bound to its magnitude.
[Figure: example of a two-sided confidence interval]

Confidence interval
For example, a 100(1 − α)% confidence interval for the mean of a normal population is
$\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{N}}$
where $\bar{x}$ is the sample mean, $z_{1-\alpha/2}$ is the upper critical value of the standard normal distribution, found in the table of the standard normal distribution, σ is the known population standard deviation, and N is the sample size.

What is the relationship between a test and a confidence interval?
In general, for every test of hypothesis there is an equivalent statement about whether the hypothesized parameter value is included in a confidence interval. For example, consider photomasks that are tested to ensure that their linewidths have a mean of 500 micrometers. The null and alternative hypotheses are:
H0: mean linewidth = 500 micrometers
Ha: mean linewidth ≠ 500 micrometers

What is the relationship between a test and a confidence interval?
For the test, the sample mean $\bar{x}$ is calculated from N linewidths chosen at random positions on each photomask. For the purpose of the test, it is assumed that the standard deviation σ is known from a long history of this process. A test statistic is calculated from these sample statistics, and the null hypothesis is rejected if:
$\frac{|\bar{x} - 500|}{\sigma/\sqrt{N}} > z_{1-\alpha/2}$
where $z_{1-\alpha/2}$ is a tabled value from the normal distribution.

What is the relationship between a test and a confidence interval?
With some algebra, it can be seen that the null hypothesis is rejected if and only if the value 500 micrometers is not in the equivalent confidence interval
$\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{N}}$
In fact, all values bracketed by this interval would be accepted as null values for a given set of test data.
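The equivalence between the test and the interval can be checked numerically. In this Python sketch σ, N and the sample means are hypothetical; for each mean it asserts that H0: μ = 500 is rejected exactly when 500 lies outside x̄ ± z·σ/√N. The helper `z_test_and_interval` is hypothetical.

```python
from statistics import NormalDist


def z_test_and_interval(sample_mean, sigma, n, mu0=500.0, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 (sigma known) and matching interval."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    se = sigma / n ** 0.5                         # standard error of the mean
    reject = abs(sample_mean - mu0) / se > z_crit
    interval = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    return reject, interval


# Hypothetical sample means (micrometers), sigma = 2.0, N = 10
for x_bar in (499.2, 500.6, 501.8):
    reject, (low, high) = z_test_and_interval(x_bar, sigma=2.0, n=10)
    inside = low <= 500.0 <= high
    assert reject == (not inside)  # reject H0 <=> 500 outside the interval
    print(f"mean {x_bar}: reject H0 = {reject}, 500 in interval = {inside}")
```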

STATISTICAL HYPOTHESES
A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intent is to determine whether there is enough evidence to "reject" a conjecture or hypothesis about the process. The conjecture is called the null hypothesis. Not rejecting may be a good result if we want to continue to act as if we "believe" the null hypothesis is true. Or it may be a disappointing result, possibly indicating we may not yet have enough data to "prove" something by rejecting the null hypothesis.

STATISTICAL HYPOTHESES
A classic use of a statistical test occurs in process control studies. For example, suppose that we are interested in ensuring that photomasks in a production process have mean linewidths of 500 micrometers. The null hypothesis, in this case, is that the mean linewidth is 500 micrometers. Implicit in this statement is the need to flag photomasks which have mean linewidths that are either much greater or much less than 500 micrometers. This translates into the alternative hypothesis that the mean linewidths are not equal to 500 micrometers. This is a two-sided alternative because it guards against alternatives in opposite directions; namely, that the linewidths are too small or too large.

STATISTICAL HYPOTHESES
The testing procedure works this way. Line widths at random positions on the photomask are measured using a scanning electron microscope. A test statistic is computed from the data and tested against pre-determined upper and lower critical values. If the test statistic is greater than the upper critical value or less than the lower critical value, the null hypothesis is rejected because there is evidence that the mean linewidth is not 500 micrometers.

STATISTICAL HYPOTHESES
Null and alternative hypotheses can also be one-sided. For example, to ensure that a lot of light bulbs has a mean lifetime of at least 500 hours, a testing program is implemented. The null hypothesis, in this case, is that the mean lifetime is greater than or equal to 500 hours. The complement or alternative hypothesis that is being guarded against is that the mean lifetime is less than 500 hours. The test statistic is compared with a lower critical value, and if it is less than this limit, the null hypothesis is rejected. Thus, a statistical test requires a pair of hypotheses, namely:
H0: a null hypothesis
Ha: an alternative hypothesis.

STATISTICAL HYPOTHESES
The null hypothesis is a statement about a belief. We may doubt that the null hypothesis is true, which might be why we are "testing" it. The alternative hypothesis might, in fact, be what we believe to be true. The test procedure is constructed so that the risk of rejecting the null hypothesis, when it is in fact true, is small. This risk, α, is often referred to as the significance level of the test. By having a test with a small value of α, we feel that we have actually "proved" something when we reject the null hypothesis.

STATISTICAL HYPOTHESES
The risk of failing to reject the null hypothesis when it is in fact false is not chosen by the user but is determined, as one might expect, by the magnitude of the real discrepancy. This risk, β, is usually referred to as the error of the second kind. Large discrepancies between reality and the null hypothesis are easier to detect and lead to small errors of the second kind, while small discrepancies are more difficult to detect and lead to large errors of the second kind. Also, the β risk increases as the α risk decreases.
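The behaviour of β described above can be made concrete for the two-sided z-test with known σ, where β = Φ(z − δ√N/σ) − Φ(−z − δ√N/σ) with z = z₁₋α/₂ and δ the true discrepancy. This Python sketch assumes that setting; δ, σ, N and the helper `beta_risk` are hypothetical. It also shows that β depends on N and σ, not only on the discrepancy.

```python
from statistics import NormalDist


def beta_risk(delta, sigma, n, alpha=0.05):
    """Type II error of the two-sided z-test of H0: mu = mu0 (sigma known)
    when the true mean is mu0 + delta."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma  # discrepancy in standard-error units
    return std.cdf(z - shift) - std.cdf(-z - shift)


# Hypothetical discrepancies with sigma = 2.0 and N = 10
for delta in (0.5, 1.0, 2.0):
    b05 = beta_risk(delta, sigma=2.0, n=10, alpha=0.05)
    b01 = beta_risk(delta, sigma=2.0, n=10, alpha=0.01)
    print(f"delta = {delta}: beta = {b05:.3f} (alpha 0.05), {b01:.3f} (alpha 0.01)")
```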