Topic 4.ppt
- Number of slides: 52
Introduction to Business Management: Statistics, Class 3
Topic 4. Descriptive statistics: measures of central tendency, variation and shape
In this topic you learn:
- To describe the properties of central tendency, variation, and shape in numerical data
- To calculate descriptive summary measures for a population
- To construct and interpret a boxplot
Plan
4.1 Measures of central tendency and characteristics of the distribution center
4.2 Measures of variation and the shape of the distribution. Z-scores
4.3 Characteristics of the population: parameters
4.4 The empirical rule and the Chebyshev rule
4.5 Characteristics of samples: parameter estimation
Live exercise
Ethan and Cindy decide to buy a car. They visited 7 markets and tested 7 Toyota and 7 KIA cars (the same model of each make). Each car was filled with 10 liters of gasoline and driven on a closed track until it ran out of gas. The results are:
Toyota (T): 228, 223, 178, 220, 233, 271
KIA (K): 277, 164, 326, 215, 259, 217, 321
Which model is better? Please take your calculators and help!
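To check the calculator work, here is a minimal sketch that assumes Python's standard-library `statistics` module (the slides do not prescribe any software). It computes the measures introduced in this topic for both makes. The Toyota row contains only six values in the source, so the sketch uses six; `summarize` is a hypothetical helper name.

```python
from statistics import mean, median, stdev

# Distances recorded in the live exercise (the slide does not state units).
# The Toyota row has only six values in the source, so six are used here.
toyota = [228, 223, 178, 220, 233, 271]
kia = [277, 164, 326, 215, 259, 217, 321]


def summarize(values):
    """Return mean, median, sample standard deviation and CV (in %)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation: divides by n - 1
    return {
        "mean": m,
        "median": median(values),
        "stdev": s,
        "cv_percent": s / m * 100,
    }


for name, values in [("Toyota", toyota), ("KIA", kia)]:
    stats = summarize(values)
    print(name, {key: round(value, 1) for key, value in stats.items()})
```

With these numbers KIA has the higher mean (about 254.1 versus 225.5), but its standard deviation (about 59.5) is roughly twice Toyota's (about 29.8), so the answer depends on whether the average or the consistency matters more.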
4.1 Measures of central tendency and characteristics of the distribution center
Central tendency is the extent to which the data values group around a typical or central value.
Central tendency measures: mean, median, mode.
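The mean
The arithmetic mean is the most common measure of central tendency. For any sample of size n:
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
where $X_i$ are the individual values and $\bar{X}$ is the mean value.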
The mean: example
Shift   Number of defects
1       2
2       5
3       4
4       7
5       2
6       4
Mean value: $\bar{X} = \dfrac{2 + 5 + 4 + 7 + 2 + 4}{6} = \dfrac{24}{6} = 4$ defects per shift
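The same arithmetic in plain Python, as a minimal sketch with no libraries; the list simply restates the defect counts in shift order.

```python
# Number of defects per shift (shifts 1 to 6) from the example table.
defects = [2, 5, 4, 7, 2, 4]

# Arithmetic mean: add the individual values X_i and divide by n.
n = len(defects)
mean_defects = sum(defects) / n
print(f"n = {n}, sum = {sum(defects)}, mean = {mean_defects}")
```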
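The median
In an ordered array, the median is the "middle" number (50% of the values above, 50% below).
Number of defects, ordered: 2, 2, 4, 4, 5, 7
The median position (not the value!) is found as (n + 1)/2 = (6 + 1)/2 = 3.5, so the median lies halfway between the 3rd and 4th values:
Median = (4 + 4)/2 = 4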
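The position rule can be sketched in Python as follows; `median_by_position` is a hypothetical helper written only to mirror the (n + 1)/2 rule (the standard library's `statistics.median` gives the same result).

```python
import math


def median_by_position(values):
    """Median found with the (n + 1) / 2 position rule on an ordered array."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2  # 1-based; ends in .5 when n is even
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    # For odd n, lower and upper are the same value; for even n,
    # the median is the average of the two middle values.
    return (lower + upper) / 2


print(median_by_position([2, 5, 4, 7, 2, 4]))  # 4.0
```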
The mode
The mode is the value that occurs most often.
- It can be used for either numerical or categorical (nominal) data.
- There may be no mode.
- There may be several modes.
Number of defects, ordered: 2, 2, 4, 4, 5, 7
Both 2 and 4 occur twice, so there are two modes: mode = 2 and mode = 4.
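A minimal sketch for finding modes. Python's `statistics.multimode` returns every value when all values occur once, so this sketch instead uses `collections.Counter` with an explicit convention (an assumption): if every value occurs exactly once, it reports no mode. `modes` is a hypothetical helper name and expects a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return every most frequent value; an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumption: when every value occurs only once, we report no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 5, 4, 7, 2, 4]))      # [2, 4]
print(modes(["red", "blue", "red"]))  # ['red']
print(modes([0, 1, 2, 3, 4, 5, 6]))   # []
```

The second call shows that the same function works for categorical data.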
Which measure to choose?
- The mean is generally used, unless extreme values (outliers) exist.
- The median is often used when outliers are present, since it is not sensitive to extreme values. For example, median home prices may be reported for a region because the median is less sensitive to outliers.
- In some situations it makes sense to report both the mean and the median.
Mean, …
Shape of a distribution and central tendency measures
The shape is the pattern of the distribution of values from the lowest value to the highest value.
Shape of a distribution and central tendency measures
- Left-skewed: Mean < Median; skewness statistic < 0
- Symmetric: Mean = Median; skewness statistic = 0
- Right-skewed: Median < Mean; skewness statistic > 0
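The mean/median comparison can be sketched as a rough shape check in Python (standard-library `statistics` module assumed; `shape_from_center` is a hypothetical helper and the data sets are made up for illustration).

```python
from statistics import mean, median


def shape_from_center(values):
    """Rough shape check: compare the mean with the median."""
    m, md = mean(values), median(values)
    if m < md:
        return "left-skewed"   # mean pulled toward a long left tail
    if m > md:
        return "right-skewed"  # mean pulled toward a long right tail
    return "symmetric"


# Hypothetical data sets, for illustration only.
print(shape_from_center([1, 20, 21, 22]))   # left-skewed
print(shape_from_center([1, 2, 3]))         # symmetric
print(shape_from_center([1, 2, 3, 4, 30]))  # right-skewed
```

Equal mean and median do not guarantee a symmetric distribution, so this is only a quick indicator; the skewness statistic is a finer measure.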
Variation
The term variability refers to the [varying] characteristic of the entity that is observable, and the term variation refers to describing or measuring that characteristic.
Example: suppose we take 50 bottles of water (50 ml each) and measure the content. Each bottle contains between 48 and 51 ml of water, and the amount differs from bottle to bottle. This is variability of the characteristic (content). We can describe it using the mean value and the individual differences (variation).
4.2 Measures of variation
Measures of variation: range, variance, standard deviation, coefficient of variation.
Measures of variation give information on the spread, variability, or dispersion of the data values.
[Figure: same center, different variation]
The range
The difference between the largest and the smallest values:
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
- Ignores the way in which the data are distributed
- Sensitive to outliers
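The variance
Average of the squared deviations of values from the mean (for a sample, the sum is divided by n - 1):
Sample variance: $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
where n is the sample size, N is the population size, $\bar{X}$ is the sample mean and $\mu$ is the population mean.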
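The standard deviation
- Shows variation about the mean
- Has the same units as the original data
Sample standard deviation: $S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$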
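Python's standard-library `statistics` module provides both versions: `variance`/`stdev` use the sample formulas (divide by n - 1) and `pvariance`/`pstdev` use the population formulas (divide by N). The defect counts from the mean example are reused; treating the six shifts as a population is an assumption made only to contrast the two formulas. Their sum of squared deviations is 18, so S² = 18/5 = 3.6 and σ² = 18/6 = 3.

```python
from statistics import pstdev, pvariance, stdev, variance

# Defects per shift from the mean example, used here as illustration data.
values = [2, 5, 4, 7, 2, 4]

s2 = variance(values)       # sample variance S^2: divides by n - 1
sigma2 = pvariance(values)  # population variance sigma^2: divides by N
s = stdev(values)           # S = sqrt(S^2)
sigma = pstdev(values)      # sigma = sqrt(sigma^2)
print(s2, sigma2, s, sigma)
```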
The standard deviation
Steps for computing the sample standard deviation:
1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n - 1 to get the sample variance.
5. Take the square root of the sample variance to get the sample standard deviation.
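The five steps translate directly into a short Python function. This is a minimal sketch (the name `sample_std_dev` is hypothetical) that also prints the range for comparison, using the defect counts from the mean example.

```python
import math


def sample_std_dev(values):
    """Sample standard deviation, following the five steps listed above."""
    n = len(values)
    mean = sum(values) / n
    differences = [x - mean for x in values]  # step 1
    squared = [d ** 2 for d in differences]   # step 2
    total = sum(squared)                      # step 3
    sample_variance = total / (n - 1)         # step 4
    return math.sqrt(sample_variance)         # step 5


defects = [2, 5, 4, 7, 2, 4]
print("range =", max(defects) - min(defects))    # range = 5
print("S =", round(sample_std_dev(defects), 3))  # S = 1.897
```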
Measures of variation:
- The more the data are spread out, the greater the range, variance, and standard deviation.
- The more the data are concentrated, the smaller the range, variance, and standard deviation.
- If the values are all the same (no variation), all these measures are zero.
- None of these measures is ever negative.
The coefficient of variation
- Measures relative variation
- Is always expressed as a percentage (%)
- Shows variation relative to the mean
- Can be used to compare the variability of two or more sets of data measured in different units
$CV = \left( \dfrac{S}{\bar{X}} \right) \cdot 100\%$
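A minimal sketch of the coefficient of variation, assuming Python's `statistics` module and the sample standard deviation S; `coefficient_of_variation` is a hypothetical helper. It is applied to the live-exercise distances (six Toyota values, as listed in the source).

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = (S / mean) * 100%, using the sample standard deviation S."""
    return stdev(values) / mean(values) * 100


# Live-exercise distances (six Toyota values, as listed in the source).
toyota = [228, 223, 178, 220, 233, 271]
kia = [277, 164, 326, 215, 259, 217, 321]
print(f"Toyota CV = {coefficient_of_variation(toyota):.1f}%")
print(f"KIA CV = {coefficient_of_variation(kia):.1f}%")
```

Because the CV has no units, the same function could compare data sets recorded in different units, such as prices in dollars and weights in kilograms.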
4.2 Measures of variation and the shape of the distribution
[Figure: distributions with a smaller and a larger standard deviation]
Z-scores
The Z-score is the number of standard deviations a data value is from the mean:
$Z = \dfrac{X - \bar{X}}{S}$
- To compute the Z-score of a data value, subtract the mean and divide by the standard deviation.
- A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0.
- The larger the absolute value of the Z-score, the farther the data value is from the mean.
Z-scores: example
Suppose the mean math SAT score is 490, with a standard deviation of 100. Compute the Z-score for a test score of 620:
$Z = \dfrac{620 - 490}{100} = \dfrac{130}{100} = 1.3$
A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
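The Z-score and the |Z| > 3.0 outlier rule can be sketched in Python as follows; both function names are hypothetical, and the cutoff parameter defaults to the slide's value of 3.0.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev


def is_extreme_outlier(z, cutoff=3.0):
    """Slide rule: Z below -3.0 or above +3.0 marks an extreme outlier."""
    return abs(z) > cutoff


z = z_score(620, 490, 100)
print(z, is_extreme_outlier(z))  # 1.3 False
```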
Shape of a distribution
Describes how the data are distributed. Two useful shape-related statistics are:
- Skewness: measures the amount of asymmetry in a distribution
- Kurtosis: measures the relative concentration of values in the center of a distribution as compared with the tails
Shape of a distribution: skewness
- Left-skewed: Mean < Median; skewness statistic < 0
- Symmetric: Mean = Median; skewness statistic = 0
- Right-skewed: Median < Mean; skewness statistic > 0
Shape of a distribution: kurtosis
Kurtosis describes the relative concentration of values in the center as compared to the tails.
- Flatter than bell-shaped: kurtosis statistic < 0
- Bell-shaped: kurtosis statistic = 0
- Sharper peak than bell-shaped: kurtosis statistic > 0
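The slides do not give formulas for the two shape statistics. This sketch assumes the adjusted sample formulas used by Excel's SKEW and KURT functions; other software may use slightly different conventions. KURT reports excess kurtosis, so a bell shape gives about 0, matching the sign convention above. Skewness needs n ≥ 3 and kurtosis needs n ≥ 4; the function names are hypothetical.

```python
from statistics import mean, stdev


def skewness(values):
    """Adjusted sample skewness (the formula used by Excel's SKEW)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def kurtosis(values):
    """Adjusted sample excess kurtosis (the formula used by Excel's KURT)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    factor = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return factor * sum(((x - m) / s) ** 4 for x in values) - correction


print(skewness([0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27]))  # positive: right-skewed
print(kurtosis([1, 2, 3, 4, 5]))  # about -1.2: flatter than bell-shaped
```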
The five-number summary
The five numbers that help describe the center, spread, and shape of data are:
X_smallest, first quartile (Q1), median (Q2), third quartile (Q3), X_largest
Quartile measures
Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each, separated by Q1, Q2 and Q3).
- The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
- Q2 is the same as the median (50% of the observations are smaller and 50% are larger).
- Only 25% of the observations are greater than the third quartile, Q3.
The interquartile range
[Figure: ordered data split into four 25% segments: X_minimum = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, X_maximum = 70]
Interquartile range = Q3 - Q1 = 57 - 30 = 27
Five-number summary and the boxplot
The boxplot is a graphical display of the data based on the five-number summary:
X_smallest -- Q1 -- Median -- Q3 -- X_largest
[Figure: boxplot in which each of the four parts (X_smallest to Q1, Q1 to the median, the median to Q3, Q3 to X_largest) contains 25% of the data]
Five-number summary: shape of boxplots
If the data are symmetric around the median, the box and the central line are centered between the endpoints.
[Figure: symmetric boxplot labeled X_smallest, Q1, Median, Q3, X_largest]
A boxplot can be shown in either a vertical or a horizontal orientation.
Distribution shape and the boxplot
[Figure: boxplots of left-skewed, symmetric and right-skewed distributions, each marked with Q1, Q2 and Q3]
Boxplot example
Below is a boxplot for the following data:
0 2 2 2 3 3 4 5 5 9 27
Five-number summary: X_smallest = 0, Q1 = 2, Q2 = 3, Q3 = 5, X_largest = 27
[Figure: boxplot of the data on an axis marked 0, 2, 3, 5, 27]
The data are right-skewed, as the plot shows.
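A minimal sketch of the five-number summary and interquartile range for the boxplot data, assuming Python's `statistics.quantiles` with `method="exclusive"` (positions p(n + 1) with linear interpolation). Textbooks use several quartile conventions that can give slightly different Q1 and Q3 for other data sets; here n = 11, so the positions 3, 6 and 9 are whole numbers and the conventions agree. `five_number_summary` is a hypothetical helper.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (X_smallest, Q1, median, Q3, X_largest)."""
    ordered = sorted(values)
    # Assumption: quartiles use the 'exclusive' method, i.e. positions
    # p * (n + 1) with linear interpolation between neighboring values.
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27]
smallest, q1, q2, q3, largest = five_number_summary(data)
print("five-number summary:", smallest, q1, q2, q3, largest)
print("IQR = Q3 - Q1 =", q3 - q1)
```

The long whisker toward 27 (versus 0 on the left) is what makes the plot right-skewed.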
4.3 Characteristics of the population: parameters
Summary measures describing a population, called parameters, are denoted with Greek letters. Important population parameters are the population mean, variance, and standard deviation.
Sample statistics versus population parameters
Measure              Population parameter   Sample statistic
Mean                 μ                      X̄
Variance             σ²                     S²
Standard deviation   σ                      S
4.4 The empirical rule and the Chebyshev rule
The empirical rule
The empirical rule approximates the variation of data in a bell-shaped distribution.
Approximately 68% of the data in a bell-shaped distribution lies within 1 standard deviation of the mean, or μ ± 1σ.
The empirical rule
- Approximately 95% of the data in a bell-shaped distribution lies within two standard deviations of the mean, or μ ± 2σ.
- Approximately 99.7% of the data in a bell-shaped distribution lies within three standard deviations of the mean, or μ ± 3σ.
Using the empirical rule
Suppose that the variable math SAT score is bell-shaped with a mean of 500 and a standard deviation of 90. Then approximately:
- 68% of all test takers scored between 410 and 590 (500 ± 90);
- 95% of all test takers scored between 320 and 680 (500 ± 180);
- 99.7% of all test takers scored between 230 and 770 (500 ± 270).
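The three intervals follow mechanically from μ ± kσ. A minimal Python sketch (the function name is hypothetical) reproduces them for the SAT example; the percentages are only approximations for bell-shaped data.

```python
def empirical_rule_intervals(mu, sigma):
    """Return {k: (low, high, approx_percent)} for mu ± k*sigma, k = 1, 2, 3."""
    approx_percent = {1: 68, 2: 95, 3: 99.7}  # valid for bell-shaped data only
    return {
        k: (mu - k * sigma, mu + k * sigma, pct)
        for k, pct in approx_percent.items()
    }


for k, (low, high, pct) in empirical_rule_intervals(500, 90).items():
    print(f"about {pct}% of scores between {low} and {high} (mu ± {k} sigma)")
```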
Chebyshev rule
Regardless of how the data are distributed, at least $(1 - 1/k^2) \times 100\%$ of the values fall within k standard deviations of the mean (for k > 1).
Examples:
- k = 2 (μ ± 2σ): at least $(1 - 1/2^2) \times 100\% = 75\%$
- k = 3 (μ ± 3σ): at least $(1 - 1/3^2) \times 100\% \approx 89\%$
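The Chebyshev bound is a one-line formula; this Python sketch (hypothetical function name) evaluates it and rejects k ≤ 1, for which the rule is not stated.

```python
def chebyshev_min_percent(k):
    """Minimum % of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is stated for k > 1")
    return (1 - 1 / k ** 2) * 100


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_min_percent(k):.1f}%")
```

For the SAT example (mean 500, standard deviation 90), Chebyshev guarantees that at least 75% of scores lie between 320 and 680 whatever the shape, while the empirical rule gives about 95% only if the distribution is bell-shaped.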
4.5 Characteristics of samples: parameter estimation
The goal in sampling is to select individuals for a study in such a way that accurate information about the population can be obtained. Statistical inference serves this purpose: it uses information from a sample to draw conclusions about a population.
Statistics in business
[Diagram: descriptive statistics and inferential statistics linking samples, populations, and probability]
- Point estimation: statistics are used as estimates of parameters
- Interval estimation: parameters are estimated using statistics
Assess your understanding
- Describe the relationship between variance and standard deviation.
- True or false? Chebyshev's inequality applies to all distributions, but the empirical rule holds only for distributions that are bell-shaped.
- Describe how the mean and median can be used to determine the shape of the distribution.
In Topic 4 we have:
- Described measures of central tendency: mean, median, mode
- Described measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation, Z-scores
- Illustrated the shape of a distribution: skewness and kurtosis
- Described data using the five-number summary: boxplots
Course content
1. Data collection
2. Sampling methods
3. Data visualization
4. Variation
5. Sampling distributions
6. Hypothesis testing basics
7. Inferential statistics-1
Final test
Home reading
LSKB, pages 112-159
Ex. 3.50-3.61 (page 151): to check your understanding
Home assignment #2
Home assignment #2
1. Choose your favorite book (for example, Statistics for Managers by Levine et al., …).
2. Record the book's price in 30 online shops, provide an ordered array of the prices, and construct a stem-and-leaf plot.
3. Compute the five-number summary.
4. Draw a boxplot of the data.
5. Check the data set for outliers.
6. Write a short summary of your investigation (200-300 words, not more).