Economics 173 Business Statistics Lecture 18 Fall 2001

Economics 173 Business Statistics Lecture 18 Fall, 2001
Professor J. Petry
http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/

17.9 Regression Diagnostics - I
• The three conditions required for the validity of the regression analysis are:
– The error variable is normally distributed.
– The error variance is constant for all values of x.
– The errors are independent of each other.
• How can we diagnose violations of these conditions?
• For now we will use visual inspection; soon we will conduct formal tests to analyze these conditions.

• One other issue before using the equation: outliers
– An outlier is an observation that is unusually small or large.
– Several possibilities need to be investigated when an outlier is observed:
  • There was an error in recording the value.
  • The point does not belong in the sample.
  • The observation is valid.
– Identify outliers from the scatter diagram.
– It is customary to suspect that an observation is an outlier if |standard residual| > 2.
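The |standard residual| > 2 rule can be sketched in a few lines of Python. This is a minimal illustration, not the course's procedure: the data are hypothetical, and the residuals are standardized by dividing by the standard error of estimate, which is an assumption (Excel's "Standard Residuals" column may use a slightly different divisor).

```python
import numpy as np

# Hypothetical data: y is roughly linear in x, except for the last point.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 30.0])

# Fit the simple linear regression y = b0 + b1*x by least squares.
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# Standard error of estimate for one independent variable: sqrt(SSE / (n - 2)).
n = len(x)
s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Assumption: standardize each residual by dividing by s_e.
standardized = residuals / s_e

# Indices of observations suspected to be outliers.
outliers = np.flatnonzero(np.abs(standardized) > 2)
print(outliers)
```

A flagged point is only a suspect. Whether it is a recording error, a point that does not belong in the sample, or a valid observation still has to be investigated.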

[Figure: two scatter diagrams, one showing an outlier and one showing an influential observation. Some outliers may be very influential: the outlier causes a shift in the regression line.]

• Procedure for regression diagnostics
– Develop a model that has a theoretical basis.
– Gather data for the two variables in the model.
– Draw the scatter diagram to determine whether a linear model appears to be appropriate.
– Check the required conditions for the errors.
– Assess the model fit.
– If the model fits the data, use the regression equation.

Chapter 18 Multiple Regression
18.1 Introduction
• In this chapter we extend the simple linear regression model and allow for any number of independent variables.
• We expect to build a model that fits the data better than the simple linear regression model.

• We will use computer printout to
– Assess the model
  • How well does it fit the data?
  • Is it useful?
  • Are any required conditions violated?
– Employ the model
  • Interpreting the coefficients
  • Predictions using the prediction equation
  • Estimating the expected value of the dependent variable

18.2 Model and Required Conditions
• We allow for k independent variables to potentially be related to the dependent variable:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
– $y$: dependent variable
– $x_1, \dots, x_k$: independent variables
– $\beta_0, \beta_1, \dots, \beta_k$: coefficients
– $\varepsilon$: random error variable

• The simple linear regression model allows for one independent variable, x:
$y = \beta_0 + \beta_1 x + \varepsilon$
• The multiple linear regression model allows for more than one independent variable:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
• Note how the straight line $y = \beta_0 + \beta_1 x$ becomes a plane, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, and...
[Figure: a line in the (x, y) plane next to a plane over the (X1, X2) axes]

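• ... a parabola, $y = \beta_0 + \beta_1 x^2$, becomes a parabolic surface, $y = \beta_0 + \beta_1 x_1^2 + \beta_2 x_2$.
[Figure: a parabola in the (x, y) plane next to a parabolic surface over the (X1, X2) axes]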

• Required conditions for the error variable $\varepsilon$
– The error $\varepsilon$ is normally distributed with mean equal to zero.
– The error term has a constant standard deviation $\sigma_\varepsilon$ (independent of the value of y).
– The errors are independent.
• These conditions are required in order to
– estimate the model coefficients,
– assess the resulting model.

18.3 Estimating the Coefficients and Assessing the Model
• The procedure
– Obtain the model coefficients and statistics using statistical computer software.
– Diagnose violations of required conditions. Try to remedy problems when identified.
– Assess the model fit and usefulness using the model statistics.
– If the model passes the assessment tests, use it to interpret the coefficients and generate predictions.
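The first and third steps of the procedure can be sketched with NumPy in place of the Excel printout used in the lecture. Everything in this example is an assumption made for illustration: the data are simulated, the "true" coefficients are invented, and `np.linalg.lstsq` stands in for whatever statistical software is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations on k = 2 independent variables.
n, k = 50, 2
X = rng.uniform(0, 10, size=(n, k))
true_beta = np.array([5.0, 2.0, -1.0])  # invented beta_0, beta_1, beta_2
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 1.0, size=n)

# Add a column of ones so the first estimated coefficient is the intercept.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Model statistics used to assess the fit.
fitted = design @ coef
sse = np.sum((y - fitted) ** 2)      # sum of squares for error
sst = np.sum((y - y.mean()) ** 2)    # total variation in y
s_e = np.sqrt(sse / (n - k - 1))     # standard error of estimate
r2 = 1 - sse / sst                   # coefficient of determination

print("coefficients:", coef)
print("standard error of estimate:", s_e)
print("R^2:", r2)
```

The residuals `y - fitted` are also what one would plot to diagnose violations of the required conditions.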

Example 18.1 Where to locate a new motor inn?
– La Quinta Motor Inns is planning an expansion.
– Management wishes to predict which sites are likely to be profitable.
– Several areas where predictors of profitability can be identified are:
  • Competition
  • Market awareness
  • Demand generators
  • Demographics
  • Physical quality

Profitability is measured by Margin. The predictors, by area:
• Competition
– Rooms: number of hotel/motel rooms within 3 miles of the site.
• Market awareness
– Nearest: distance to the nearest La Quinta inn.
• Customers
– Office space
– College enrollment
• Community
– Income: median household income.
• Physical
– Disttwn: distance to downtown.

– Data were collected from 100 randomly selected inns that belong to La Quinta, and the following suggested model was run:
$\text{Margin} = \beta_0 + \beta_1\,\text{Rooms} + \beta_2\,\text{Nearest} + \beta_3\,\text{Office} + \beta_4\,\text{College} + \beta_5\,\text{Income} + \beta_6\,\text{Disttwn} + \varepsilon$

• Excel output.
This is the sample regression equation (sometimes called the prediction equation):
MARGIN = 72.455 - 0.008 ROOMS - 1.646 NEAREST + 0.02 OFFICE + 0.212 COLLEGE - 0.413 INCOME + 0.225 DISTTWN
Assessing this equation
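To show how the prediction equation is used, the sketch below evaluates it for one site. The coefficients are the ones printed on the slide. The function name `predict_margin` and the site values are hypothetical, and since the slide does not state the units of each variable, the inputs are arbitrary numbers chosen only to demonstrate the arithmetic.

```python
# Coefficients of the sample regression equation, as printed on the slide.
COEFFICIENTS = {
    "intercept": 72.455,
    "rooms": -0.008,
    "nearest": -1.646,
    "office": 0.02,
    "college": 0.212,
    "income": -0.413,
    "disttwn": 0.225,
}


def predict_margin(rooms, nearest, office, college, income, disttwn):
    """Evaluate the prediction equation for one site (hypothetical helper)."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["rooms"] * rooms
        + c["nearest"] * nearest
        + c["office"] * office
        + c["college"] * college
        + c["income"] * income
        + c["disttwn"] * disttwn
    )


# Arbitrary illustrative inputs; units are not given on the slide.
site_margin = predict_margin(
    rooms=3000, nearest=1, office=500, college=10, income=35, disttwn=10
)
print(round(site_margin, 3))
```

Each coefficient is the change in predicted MARGIN for a one-unit increase in that variable with the others held fixed, so the prediction is a sum of one term per variable plus the intercept.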

• Standard error of estimate
– We need to estimate the standard error of estimate: $s_\varepsilon = \sqrt{SSE/(n-k-1)}$
– Compare $s_\varepsilon$ to the mean value of y.
• From the printout, Standard Error = 5.5121
• Calculate the mean value of y, $\bar{y}$, and compare it with $s_\varepsilon$.
– It seems $s_\varepsilon$ is not particularly small.
– Can we conclude the model does not fit the data well?

• Coefficient of determination
– The definition is $R^2 = 1 - \dfrac{SSE}{SST}$
– From the printout, $R^2 = 0.5251$
– 52.51% of the variation in the measure of profitability is explained by the linear regression model formulated above.
– When adjusted for degrees of freedom,
$\text{Adjusted } R^2 = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)} = 49.44\%$
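The adjustment can be checked from the printed R² alone, because SSE/SST = 1 - R². The sketch below uses only values stated in the example (R² = 0.5251, n = 100 inns, k = 6 independent variables). The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - [SSE/(n-k-1)] / [SST/(n-1)], rewritten using SSE/SST = 1 - R^2."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values from Example 18.1: R^2 from the printout, 100 inns, 6 independent variables.
adj = adjusted_r2(0.5251, n=100, k=6)
print(f"Adjusted R^2 = {adj:.2%}")
```

The result is about 49.4%, which agrees with the 49.44% on the printout up to the rounding of R² to four decimals.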

• Testing the validity of the model
– We pose the question: Is there at least one independent variable linearly related to the dependent variable?
– To answer the question we test the hypotheses
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1$: At least one $\beta_i$ is not equal to zero.
– If at least one $\beta_i$ is not equal to zero, the model is valid.

• To test these hypotheses we perform an analysis of variance procedure.
• The F test
– Construct the F statistic: $F = \dfrac{MSR}{MSE}$, where $MSR = SSR/k$ and $MSE = SSE/(n-k-1)$.
– SST = SSR + SSE. A large F results from a large SSR. Then much of the variation in y is explained by the regression model, and the null hypothesis should be rejected; thus, the model is valid.
– Rejection region: $F > F_{\alpha,\,k,\,n-k-1}$
– Required conditions must be satisfied.
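As a small worked sketch of how the F statistic is assembled, the function below computes SST, SSE, SSR, MSR, MSE and F from observed and fitted values. The function name `anova_f` and the five data points are hypothetical; the fitted values are the least squares fit of y on x = 1, ..., 5 (ŷ = 2.2 + 0.6x), which matters because SST = SSR + SSE holds only for a least squares fit with an intercept.

```python
def anova_f(y, fitted, k):
    """Return (SSR, SSE, MSR, MSE, F) for a least squares fit with k independent variables."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    ssr = sst - sse                                          # explained variation (SST = SSR + SSE)
    msr = ssr / k
    mse = sse / (n - k - 1)
    return ssr, sse, msr, mse, msr / mse


# Hypothetical data with k = 1: fitted values come from y-hat = 2.2 + 0.6x, x = 1..5.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

ssr, sse, msr, mse, f_stat = anova_f(y, fitted, k=1)
print(ssr, sse, msr, mse, f_stat)
```

The computed F is then compared with the critical value for k and n-k-1 degrees of freedom, taken from an F table or from software.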

Example 18.1 - continued
• Excel provides the following ANOVA results.
[ANOVA table from the Excel printout, annotated to mark SSR, SSE, MSR, MSE, and the ratio MSR/MSE]

Example 18.1 - continued
• Excel provides the following ANOVA results.
– $F_{\alpha,\,k,\,n-k-1} = F_{0.05,\,6,\,100-6-1} = 2.17$
– F = 17.14 > 2.17
– Also, the p-value (Significance F) = $3.03382 \times 10^{-13}$. Clearly, $\alpha = 0.05 > 3.03382 \times 10^{-13}$, and the null hypothesis is rejected.
• Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the $\beta_i$ is not equal to zero. Thus, at least one independent variable is linearly related to y.
• This linear regression model is valid.
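The reported F statistic can be cross-checked against the reported R². Dividing the numerator and denominator of F = (SSR/k) / (SSE/(n-k-1)) by SST gives F = (R²/k) / ((1-R²)/(n-k-1)). The sketch below uses only numbers quoted in the example. The critical value 2.17 is copied from the slide rather than computed.

```python
# Values quoted in Example 18.1.
r2, n, k = 0.5251, 100, 6

# F expressed through R^2: divide SSR/k and SSE/(n-k-1) by SST.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

f_critical = 2.17  # critical value as stated on the slide, not computed here
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

The value obtained this way agrees with the F = 17.14 shown in the ANOVA results, so the two printout figures are consistent with each other.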