Lecture 4_ENG_2014.pptx

- Количество слайдов: 44

Lecture 4 Introductory Econometrics INTRODUCTION TO LINEAR REGRESSION MODEL II September 27, 2014

On previous lecture • We studied PRM vs SRM • We listed the classical assumptions of regression models: • • • model linear in parameters, explanatory variables linearly independent (normally distributed) error term with zero mean and constant variance, no serial autocorrelation no correlation between error term and explanatory variables • We saw that if the assumptions hold, OLS estimate is • • consistent unbiased efficient normally distributed

On today’s lecture • We will show that under these assumption, OLS is the best estimator available for regression models • See the distribution, mean and variance of linear regression model parameters • We are going to discuss how hypothesis about coefficients can be tested in regression models • We will explain what significance of coefficients means • We will learn how to read regression output

Distribution of parameters .

Probability density function for linear regression model parameters with Normal distribution PDF for b 1 b 1 2

1. Mean of (1) 0 .

1. Mean of (2) BLUE -linear estimation random variable . =0 =1 Demonstrate

1. Mean of Random variable . (3) Random variable =0 BLUE

2. Variance of Random variable (1) Random Variable Square RHS &LHS and take the expected value .

2. Variance (2) =0.

2. Variance - minimum VARIANCE. BLUE

Now we will study 1. How to check coefficients for significance 2. Construct confidence intervals for parameters

We demonstrated that the distribution of .

If are normally distributed, then i=0, 1. k- number of estimated model parameters k=2

PDF for Student’s t-distribution f(t) Region of acceptance Critical or rejection region 0 t Probability of getting into the critical region 2

Significance of the coefficients i=0, 1 Rule: reject null hypothesis H 0 if i=0, 1 . at significance level

PDF for Student’s t-distribution f(t) Critical or rejection region 0 t Probability of getting into the critical region 2

Example from E-Views Real Consumption (CONS 1); Real GDP (GDP 1) Dependent Variable: CONS 1 Method: Least Squares Sample: 2003: 1 2007: 4 Included observations: 16 Variable Coefficient Std. Error t-Statistic Prob. GDP 1 0. 106038 0. 027689 3. 829529 0. 0018 C 4. 695067 0. 814695 5. 762976 0. 0000 R-squared 0. 511604 Mean dependent var 7. 7125 Adjusted R-squared 0. 476719 S. D. dependent var 1. 1451 S. E. of regression 0. 828308 Akaike info criterion 2. 5777 Sum squared resid 9. 605324 Schwarz criterion 2. 6742 F-statistic 14. 666 Prob(F-statistic) 0. 0018 Log likelihood Durbin-Watson stat -18. 62085 1. 061248

Приклад CONS 1 = 4. 7 + 0. 11 GDP 1 (0. 81) s. e. (0. 028) RULE: Reject H 0, if i=0, 1 . with significance level Reject H 0 Conclusion: Intercept is statistically significant

Check for the slope CONS 1 = 4. 7 + 0. 11 GDP 1 (0. 81) . (0. 028) s. e.

Type I and Type II Errors • It would be unrealistic to think that conclusions drawn from regression analysis will always be right • There are two type of errors we can make: – Type I: We reject a true null hypothesis – Type II: We do not reject a false null hypothesis Example: H 0: HA : Type I error: it holds that Type II error: it holds that we conclude that

Type I and Type II Errors Correct Accept True decision (1 -a)- significance level Type I error Reject. We reject a true null hypothesis Probability of Error Type 1 error = α – significance level Incorrect Type 2 error We do not reject a false null hypothesis Probability of accepting incorrect H 0 = β True decision Probability = 1 -β = power of the test

Type I and Type II Errors Example: • H 0 : The defendant is innocent • HA : The defendant is guilty • Type I error = Sending an innocent person to jail • Type II error = Freeing a guilty person • Obviously, lowering the probability of Type I error means increasing the probability of Type II error • In hypothesis testing, we focus on Type I error and we ensure that its probability is not unreasonably large

Decision Rule • A sample statistic must be calculated that allows the null hypothesis to be rejected or not depending on the magnitude of that sample statistic compared with a preselected critical value found in tables • The critical value divides the range of possible values of the statistic into two regions: acceptance region and rejection region • The idea is that if the value of the coefﬁcient is not such as stated under H 0, the value of the sample statistic should not fall into the rejection region • If the value of the sample statistic falls into the rejection region, we reject H 0

Two-sided rejection region • H 0: vs HA : • Distribution of :

P-value Concept (probability value) p-value . -the lowest level of significance at which we reject H 0

Two-sided t-test We reject H 0 if

Example from Stata

Example: P-value for intercept = 0. 000 It means that at any significance level we reject H 0 for intercept, including 10% , 5 % and 1%. . . Conclusion Intercept is statistically significant

Example: i=0, 1 RULE: Reject H 0 , if i=0, 1 At. significance level -significance level is usually given - critical value

PDF for Student’s t-distribution f(t) Critical Region 0 t Critical Value 2

+ . =

Example i=0, 1 Rule: Reject H 0, if i=0, 1 s. e. (0. 1397). (0. 01958) - critical value

Example s. e. (0. 1397) . (0. 01958)

PDF of Student’s t-distribution f(t) Critical region 0 t Critical value=3. 18 2

Simplified test of statistical significance of parameters 1. For (n – k) > 8 critical value RULE: Reject If . k- number of estimated parameters of the model n - number of observations i=0, 1

Test for parameters’ statistical significance 2. When n > 30 RULE: We reject : at α = 10% , if at α = 5% , if. at α = 1% , if i=0, 1

Confidence intervals construction for parameters

A 95% confidence interval of such that is an interval centered around with probability 95% PDF of Student’s t-distribution f(t) (1 -α) 0 Critical region t 2

.

.

Exercise • Construct confidence intervals for parameter

The first practical example how to built the simple regression model and choose the correct functional form. bananas income (lbs) ($10, 000) household Y X 1 2 3 4 5 6 7 8 9 10 1. 71 6. 88 8. 25 9. 52 9. 81 11. 43 11. 09 10. 87 12. 15 10. 94 1 2 3 4 5 6 7 8 9 10 Suppose that you have data on annual consumption of bananas and annual income for a sample of 10 households. You need specify the relationships between consumption of bananas and annual income. 8

Here is the output from a linear regression. . reg Y X Source | SS df MS -----+---------------Model | 58. 8774834 1 58. 8774834 Residual | 27. 003764 8 3. 3754705 -----+---------------Total | 85. 8812475 9 9. 54236083 Number of obs F( 1, 8) Prob > F R-squared Adj R-squared Root MSE = = = 10 17. 44 0. 0031 0. 6856 0. 6463 1. 8372 ---------------------------------------Y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -----+----------------------------------X |. 8447878. 2022741 4. 176 0. 003. 378343 1. 311233 _cons | 4. 618667 1. 255078 3. 680 0. 006 1. 724453 7. 512881 --------------------------------------- As you would expect from seeing the scatter diagram, the coefficient of X is highly significant, and the fit, as measured by R 2, is quite good. 10