  • Number of slides: 50

2DS00 Statistics 1 for Chemical Engineering

Lecturers
• Dr. A. Di Bucchianico
  – Department of Mathematics, Statistics group
  – HG 9.24
  – phone (040) 247 2902
  – a. d. [email protected] nl
• Ir. G. D. Mooiweer
  – ICTOO
  – HG 9.12
  – phone 040 247 4277 (Thursdays)
  – g. d. [email protected] nl
• Dr. R. W. van der Hofstad
  – Department of Mathematics, Statistics group
  – HG 9.04
  – phone (040) 247 2910
  – [email protected] tue. nl

Goals of this course
• to prepare students for (first-year) laboratory assignments
• to teach students how to perform basic statistical analyses of experiments
• to teach students how to use software for data analysis
• to teach students how to avoid pitfalls in analysing measurements

Important to remember
• Web site for this course: www.win.tue.nl/~sandro/2DS00/
• No textbook, but handouts (Word) + PowerPoint sheets through web site
• Bring notebook to both lectures and self-study
• (Optional) buy lecture notes 2256 “Statgraphics voor regulier onderwijs”
• (Optional) buy lecture notes 2218 “Statistisch Compendium”

How to study
• read lecture notes briefly before lecture
• ask questions during lecture
• study lecture notes carefully after lecture
• do exercises during guided self-study
• reread lecture notes after guided self-study
• try out previous examinations shortly before the examination
N.B. Lecture notes (pdf documents) PowerPoint files

Week schedule
Week 1: Measurement and statistics
Week 2: Error propagation
Week 3: Simple linear regression analysis
Week 4: Multiple linear regression analysis
Week 5: Nonlinear regression analysis

Detailed contents of week 1
• measurement errors
• graphical displays of data
• summary statistics
• normal distribution
• confidence intervals
• hypothesis testing

Measurements and statistics
• perfect measurements do not exist
• possible sources of measurement errors:
  – reading
  – environment
    • temperature
    • humidity
    • ...
  – impurities
  – ...

Necessity of good measurement system

Three experiments

Types of measurement errors
• Random errors
  – always present
  – reduce influence by averaging repeated measurements
• Systematic errors
  – require adjustment/repair of measuring devices
• Outliers
  – recording errors
  – mistakes in applying procedures

Illustration of measurement concepts

Accuracy
difference between average of measured values and true value

Accuracy
• relates to systematic errors
• absolute error
• relative error
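
The two error measures above can be sketched in a few lines of Python. This is only an illustration: the readings and the true value are hypothetical, and the sign of the error is kept to show the direction of a possible systematic error (take the absolute value if only the size matters).

```python
from statistics import mean


def absolute_error(measurements, true_value):
    """Average of the measured values minus the true value (signed)."""
    return mean(measurements) - true_value


def relative_error(measurements, true_value):
    """Absolute error expressed as a fraction of the true value."""
    return absolute_error(measurements, true_value) / abs(true_value)


# Hypothetical repeated readings of a quantity whose true value is 5.0
readings = [5.02, 4.98, 5.01, 4.97, 5.00]
true_value = 5.0

abs_err = absolute_error(readings, true_value)
rel_err = relative_error(readings, true_value)
print(f"absolute error: {abs_err:+.4f}")
print(f"relative error: {rel_err:+.2%}")
```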

Location statistics
• mean
• median
• trimmed means
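
A small sketch of the three location statistics, assuming a trimmed mean that drops the same proportion of observations from each end of the sorted data (other conventions exist). The data are hypothetical and contain one deliberate outlier to show how differently the three statistics react.

```python
from statistics import mean, median


def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # number dropped at each end
    return mean(ordered[k:len(ordered) - k])


# Hypothetical measurements; 9.7 plays the role of a recording error
data = [4.9, 5.0, 5.1, 5.0, 4.8, 5.2, 9.7]

print("mean        :", round(mean(data), 3))
print("median      :", median(data))
print("trimmed mean:", round(trimmed_mean(data, 0.15), 3))
```

The mean is pulled towards the outlier, while the median and the trimmed mean stay close to the bulk of the data.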

Precision
the degree to which consistent results are obtained

Accurate and precise

Statistics for precision: standard deviation & co
• standard deviation
• standard error
• variation coefficient
• variance
• range
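
The statistics listed above can be computed with the Python standard library. The sketch below uses the usual sample definitions (divisor n − 1 for the variance, standard error s/√n, variation coefficient s divided by the mean); the data are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def precision_summary(data):
    """Return the precision statistics of a sample as a dictionary."""
    n = len(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    return {
        "variance": variance(data),
        "standard deviation": s,
        "standard error": s / math.sqrt(n),   # precision of the mean
        "variation coefficient": s / mean(data),
        "range": max(data) - min(data),
    }


# Hypothetical repeated measurements
data = [5.02, 4.98, 5.01, 4.97, 5.00, 5.03, 4.99, 5.00]

summary = precision_summary(data)
for name, value in summary.items():
    print(f"{name:22s} {value:.5f}")
```

The standard error shrinks with √n, which is why averaging repeated measurements reduces the influence of random errors.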

Robust statistics for precision
• robust statistics
  – less sensitive to outliers
  – difficult mathematical theory
  – require use of statistical software
• interquartile range
  – IQR = 75% quantile − 25% quantile = 3rd quartile − 1st quartile
• mean absolute deviation
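
A minimal sketch of the two robust statistics, compared with the standard deviation on hypothetical data with and without an outlier. Quartiles can be defined in several ways; the code uses the default method of Python's `statistics.quantiles`, which need not match the convention used by Statgraphics.

```python
from statistics import mean, quantiles, stdev


def interquartile_range(data):
    """3rd quartile minus 1st quartile (default 'exclusive' method)."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


def mean_absolute_deviation(data):
    """Average absolute distance of the observations to their mean."""
    centre = mean(data)
    return mean([abs(x - centre) for x in data])


# Hypothetical data; the second set adds one outlier (9.7)
clean = [4.8, 4.9, 5.0, 5.0, 5.1, 5.2]
with_outlier = clean + [9.7]

for label, sample in (("clean", clean), ("with outlier", with_outlier)):
    print(label)
    print("  standard deviation     :", round(stdev(sample), 3))
    print("  interquartile range    :", round(interquartile_range(sample), 3))
    print("  mean absolute deviation:", round(mean_absolute_deviation(sample), 3))
```

The interquartile range barely changes when the outlier is added, whereas the standard deviation increases by more than a factor of ten.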

Graphical displays
• always make graphical displays for first impression
• “one picture says more than 1000 words”
[Example data shown on slide: 2, 3.1, 4, 1.9, 2.8]

Basic graphical displays
• scatter plot
  – watch out for scale (automatic resizing)
• time sequence plot
  – for detecting time effects like warming up
• Box-and-Whisker plot
  – outliers
  – quartiles
  – skewness

Time sequence plot

Box-and-Whisker plot

Probability theory
(cumulative) distribution function
density to distribution function

The concept of probability density function
[Figure: density curve with points a and b marked on the horizontal axis]
The area under the density between a and b is the probability that an observation falls between a and b.
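
The link between density and distribution function can be checked numerically. The sketch below takes the standard normal distribution as an example, with hypothetical end points a = −1 and b = 2, and compares a trapezoidal approximation of the area under the density with the difference F(b) − F(a) of the cumulative distribution function.

```python
from statistics import NormalDist


def area_under_density(pdf, a, b, steps=10_000):
    """Trapezoidal approximation of the integral of pdf from a to b."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h


z = NormalDist()       # standard normal: mean 0, standard deviation 1
a, b = -1.0, 2.0       # hypothetical interval

area = area_under_density(z.pdf, a, b)
via_cdf = z.cdf(b) - z.cdf(a)

print(f"area under density between a and b: {area:.6f}")
print(f"F(b) - F(a)                       : {via_cdf:.6f}")
```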

Normal distribution

Normal distribution
bell shaped curve
Important because of Central Limit Theorem

Normal distribution
• symmetric around µ (location of centre)
• spread parametrised by σ²
  – http://www.win.tue.nl/~marko/statApplets/functionPlots.html
  – http://www-stat.stanford.edu/~naras/jsm/NormalDensity.html
• µ = 0 and σ² = 1: standard normal distribution Z

More on normal distribution
Area under the standard normal density between
±0.67 is 0.500
±1.00 is 0.683
±1.645 is 0.900
±1.96 is 0.950
±2.00 is 0.954
±2.33 is 0.980
±2.58 is 0.990
±3.00 is 0.997
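
Areas of this kind can be recomputed with the standard normal distribution function Φ, since the area between −z and +z equals Φ(z) − Φ(−z). A short sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist


def central_area(z):
    """Area under the standard normal density between -z and +z."""
    std = NormalDist()
    return std.cdf(z) - std.cdf(-z)


for z in (0.67, 1.00, 1.645, 1.96, 2.00, 2.33, 2.58, 3.00):
    print(f"area between -{z:<5} and +{z:<5}: {central_area(z):.3f}")
```

Tabulated z values are rounded, so the computed areas can differ slightly in the last digit from round table entries.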

Standardisation
If X is normally distributed with parameters µ and σ², then (X − µ)/σ is standard normal.
Example: suppose µ = 3 and σ² = 4
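
A sketch of how standardisation is used, with µ = 3 and σ² = 4 as on the slide. The question asked here, the probability that X is at most 5, is a hypothetical choice made only for illustration.

```python
import math
from statistics import NormalDist

mu = 3.0
sigma = math.sqrt(4.0)   # sigma^2 = 4, so sigma = 2


def standardise(x, mu, sigma):
    """Transform a value of X into the corresponding value of Z."""
    return (x - mu) / sigma


x = 5.0                                  # hypothetical value of interest
z = standardise(x, mu, sigma)            # (5 - 3) / 2

p_direct = NormalDist(mu, sigma).cdf(x)  # P(X <= 5) computed directly
p_standard = NormalDist().cdf(z)         # P(Z <= z) from the standard normal

print(f"z = {z}")
print(f"P(X <= {x}) = {p_direct:.4f}")
print(f"P(Z <= {z}) = {p_standard:.4f}")
```

Both probabilities agree, which is why a single table of the standard normal distribution suffices for every normal distribution.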

Testing normality
• many statistical procedures implicitly assume normality
• if data are not normally distributed, then outcome of procedure may be completely wrong
• user is always responsible for checking assumptions of statistical procedures
• Graphical checks:
  – normal probability plot
  – density trace
• Formal check
  – Shapiro-Wilks test

Estimation of density function: histogram
[Figure: histogram with fitted curve]
curve: normal distribution with sample mean and variance as parameters

Drawbacks of the histogram
• misused for investigating normality
• time ordering of data is lost
• shape depends heavily on bin width + bin location:
  [Figure: two histograms for strength of the same data set; horizontal axis: strength, vertical axis: frequency]
• shape is stable for data sets of size 75 or larger
• optimal number of bins n

Alternative to histogram: Density Trace (also called naive density estimator):
• use moving bins instead of fixed bins
• choose bin width (automatically in Statgraphics)
• count number of observations in bin at each point
• divide by length of bin

Density Trace
Example dataset: 3.45 1.98 2.92 4.67 1.07 5.34 3.24 2.41 3.93
[Figure: density trace of the example dataset; horizontal axis from 1 to 6, vertical axis with ticks 2/9, 3/9, 4/9]
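
The density trace procedure can be sketched as follows for the example dataset. Two assumptions are made that the slides do not state explicitly: the bin width is taken to be 1, and the count is also divided by the number of observations so that the result is a density (total area 1). Statgraphics chooses the width automatically and may use a different value.

```python
def density_trace(data, width, grid):
    """Naive density estimate: moving bin of given width centred at each grid point."""
    n = len(data)
    half = width / 2
    trace = []
    for x in grid:
        # count the observations falling in the bin [x - half, x + half)
        count = sum(1 for obs in data if x - half <= obs < x + half)
        # divide by bin length (and by n, so that the area under the trace is 1)
        trace.append(count / (n * width))
    return trace


data = [3.45, 1.98, 2.92, 4.67, 1.07, 5.34, 3.24, 2.41, 3.93]
grid = [i / 10 for i in range(0, 71)]             # points 0.0, 0.1, ..., 7.0
trace = density_trace(data, width=1.0, grid=grid)  # width 1.0 is an assumption

for x, value in zip(grid[::5], trace[::5]):
    print(f"{x:4.1f}  {value:.3f}")
```

Rerunning with a smaller or larger `width` shows the effect of the bin width on the smoothness of the curve.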

Choice of bin widths in density trace
• a bin width that is too small yields a curve that fluctuates too much
• a bin width that is too large yields a curve that is too smooth

Patterns in distribution – normal curve
• Depicted by a bell-shaped curve
• Indicates that measurement process is running normally

Patterns in distribution – bi-modal curve
• Distribution appears to have two peaks
• May indicate that data from more than one process are mixed together

Patterns in distribution – saw-toothed
Also commonly referred to as a comb distribution, appears as an alternating jagged pattern
Often indicates a measuring problem
  – improper gauge readings
  – gauge not sensitive enough for readings

Testing normality

Normal Probability Plot

Normally distributed?

Normal Probability Plot of not normally distributed data

Test for Normality: Shapiro-Wilks
• statistical test for normality
• idea: sophisticated regression analysis in the spirit of normal probability plot
• makes Normal Probability Plot objective
• check outliers (measurement error?; normality is sometimes disturbed by a single observation)
• if data are not normally distributed, analyse why
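
The idea behind the normal probability plot can be sketched in code. Note that this is not the Shapiro-Wilks test itself: it only computes the points of a normal probability plot and the correlation between ordered data and theoretical normal quantiles, which is a simpler statistic in the same spirit. The plotting positions (i − 0.5)/n are one common convention and an assumption here; both data sets are hypothetical.

```python
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Theoretical standard normal quantiles paired with the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    std = NormalDist()
    theoretical = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: one set built from normal quantiles, one strongly skewed
quantile_points, _ = normal_plot_points(range(12))
normal_like = [10 + 2 * q for q in quantile_points]
skewed = [2 ** k for k in range(12)]

r_normal = correlation(*normal_plot_points(normal_like))
r_skewed = correlation(*normal_plot_points(skewed))
print(f"normal-like data: r = {r_normal:.3f}")
print(f"skewed data     : r = {r_skewed:.3f}")
```

A correlation close to 1 corresponds to points lying on a straight line in the normal probability plot; a clearly lower value signals deviation from normality.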

Statgraphics: Shapiro-Wilks
Tests for Normality for width
Computed Chi-Square goodness-of-fit statistic = 254.667
P-Value = 0.0
Shapiro-Wilks W statistic = 0.921395
P-Value = 0.000722338
Interpretation:
• the value of the statistic itself need not be interpreted
• the P-value indicates how compatible the data are with a normal distribution (a small P-value is evidence against normality)
• use α = 0.01 as critical value in order to avoid too strict rejections of normality

Dixon’s test
• Box-and-Whisker plot: graphical test for outliers
• if data are normally distributed, then a formal test (Dixon’s test) may be used
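
The formula on the slide is not available in this transcript. As an illustration only, the sketch below computes the simplest variant of Dixon's statistic (often called Q or r10): the gap between the suspect value and its nearest neighbour, divided by the range. Whether the course uses this variant is an assumption, and the data are hypothetical. The statistic must be compared with a tabulated critical value for the sample size, which is not included here.

```python
def dixon_q(data):
    """Dixon's Q (r10 variant) for the smallest and the largest observation."""
    ordered = sorted(data)
    if len(ordered) < 3:
        raise ValueError("need at least three observations")
    spread = ordered[-1] - ordered[0]
    if spread == 0:
        raise ValueError("all observations are equal")
    q_low = (ordered[1] - ordered[0]) / spread     # smallest value suspect
    q_high = (ordered[-1] - ordered[-2]) / spread  # largest value suspect
    return q_low, q_high


# Hypothetical measurements with one suspiciously large value
data = [4.8, 4.9, 5.0, 5.0, 5.1, 5.2, 9.7]

q_low, q_high = dixon_q(data)
print(f"Q for smallest value: {q_low:.3f}")
print(f"Q for largest value : {q_high:.3f}")
```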

Disadvantages of point estimators

Confidence intervals
• 95% confidence interval for µ: probability 0.95 that interval contains true value µ
• more observations: narrower interval (effect in particular for n < 20)
• higher confidence: wider interval
• example: α = 0.05
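
A sketch of a confidence interval for µ, assuming the usual interval based on the t distribution: sample mean ± t · s/√n, with n − 1 degrees of freedom. The Python standard library has no t distribution, so the critical values are passed in from a t table (2.365 and 3.499 for 7 degrees of freedom at 95% and 99% confidence). The data are hypothetical.

```python
import math
from statistics import mean, stdev


def confidence_interval(data, t_critical):
    """Two-sided confidence interval for the mean, given a t table value."""
    n = len(data)
    centre = mean(data)
    half_width = t_critical * stdev(data) / math.sqrt(n)
    return centre - half_width, centre + half_width


# Hypothetical measurements, n = 8, so 7 degrees of freedom
data = [5.02, 4.98, 5.01, 4.97, 5.00, 5.03, 4.99, 5.00]

T_95_DF7 = 2.365   # t table value for alpha = 0.05, 7 degrees of freedom
T_99_DF7 = 3.499   # t table value for alpha = 0.01, 7 degrees of freedom

low95, high95 = confidence_interval(data, T_95_DF7)
low99, high99 = confidence_interval(data, T_99_DF7)
print(f"95% interval: ({low95:.4f}, {high95:.4f})")
print(f"99% interval: ({low99:.4f}, {high99:.4f})")
```

The 99% interval is wider than the 95% interval, in line with the rule that higher confidence gives a wider interval.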

Confidence intervals: example

Hypothesis testing
• example: test whether there is a systematic error
Hypothesis Tests for meting
Sample mean = 4.994
Sample median = 5.01
t-test
------
Null hypothesis: mean = 5.0
Alternative: not equal
Computed t statistic = -0.155011
P-Value = 0.880233
Do not reject the null hypothesis for alpha = 0.05.
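
The t statistic in output of this kind is the distance between sample mean and hypothesised mean, measured in standard errors. The sketch below reproduces the calculation for hypothetical data (the measurements behind the Statgraphics output are not given) and decides by comparing |t| with a t table value instead of computing a P-value.

```python
import math
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """t statistic for the null hypothesis that the true mean equals mu0."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(n))


# Hypothetical measurements of a reference value of 5.0 (n = 8)
data = [5.02, 4.98, 5.01, 4.97, 5.00, 5.03, 4.99, 4.96]
mu0 = 5.0
T_95_DF7 = 2.365   # t table value for alpha = 0.05, 7 degrees of freedom

t = one_sample_t(data, mu0)
reject = abs(t) > T_95_DF7

print(f"t statistic = {t:.4f}")
if reject:
    print("Reject the null hypothesis for alpha = 0.05.")
else:
    print("Do not reject the null hypothesis for alpha = 0.05.")
```

Not rejecting the null hypothesis means the data give no evidence of a systematic error; it does not prove that there is none.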