- Number of slides: 17
Learning Objectives
By the end of this lecture, you should be able to:
– Describe causation and the ways in which it differs from correlation.
– Describe what is far and away the best method of establishing causation.
– Explain what a confounding variable is.
*** Correlation does not necessarily imply causation ***
• A correlation does not mean that there is causation.
• As you know, correlation means that there is a relationship between two variables. Causation means that if you see a change in your explanatory variable, it should cause a change in the response variable.
– Example: If you give someone extra beers, it should cause a change in BAC.
– Example: If you allow more powerboat licenses, it should cause a change in the number of manatee deaths.
• Even if a correlation is very strong, this is not by itself good evidence that a change in x will cause a change in y.
Causation vs. Correlation
• Causation means that whenever there is a change in an explanatory variable, it should cause a change in the response variable.
• Correlation: Correlations between two variables are extremely common and easy to find. However, saying that two variables are correlated in NO way guarantees that there is causation.
• Put another way: Having correlation without causation means that changing the explanatory variable will NOT guarantee a change in the response variable.
In the real world…
Often (very, very often!) people report “associations” (i.e. correlations) between two variables. Yet upon further examination, it turns out that there is not ANY causation whatsoever! As humans, though, upon hearing about “associations” we often jump to an assumption of causation. Most of the time, the causation simply is NOT there.
Example
• One study in Victorian England showed a strong correlation between people wearing top hats and their life expectancy. This relationship was shown to be very strong (high ‘r’).
• Does this mean that had Queen Victoria provided free top hats for all, the life expectancy in England would have shot up?
– There is a confirmed correlation. However, there is NO causation. That is, wearing top hats does not cause people to live longer. So, what’s going on here?
– Answer: There is a lurking variable! In this case, the lurking variable is income. People with higher incomes could afford doctors and medicines. These were in no way a given in Victorian England!
– So in this case, while there is correlation between top hats and life expectancy, there is no causation.
– However, there would be a causal relationship between income and life expectancy.
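The top-hat story can be sketched as a small simulation. This is only an illustration under made-up assumptions: the numbers, the variable names, and the idea of handing out hats at random (standing in for "free top hats for all") are hypothetical, not data from the study.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: income (the lurking variable) drives BOTH outcomes.
income = [rng.gauss(0, 1) for _ in range(5000)]
top_hats = [z + rng.gauss(0, 0.5) for z in income]         # richer -> more top hats
life_expectancy = [z + rng.gauss(0, 0.5) for z in income]  # richer -> longer life

r_observed = pearson_r(top_hats, life_expectancy)

# Hats are now handed out at random, independent of income, while life
# expectancy is still generated from income alone.
random_hats = [rng.gauss(0, 1) for _ in income]
r_random_hats = pearson_r(random_hats, life_expectancy)

print(f"r with income-driven hats: {r_observed:.2f}")
print(f"r with randomly assigned hats: {r_random_hats:.2f}")
```

In this toy model the first correlation is strong even though hats never appear in the formula for life expectancy. Once hats no longer track income, the correlation falls to roughly zero.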
Reminder: Before embarking on a regression analysis…
Pop Quiz: After today, you should be able to answer the following question without looking at your notes.
• What are three key requirements that should be met before embarking on a regression analysis?
1. If you are doing a linear* regression analysis, the relationship must be, well, linear!
2. The correlation (‘r’) should not be very weak.
3. There must be causation.
Which of these would cause us to reject a regression analysis of the relationship between top hats and life expectancy?
• Answer: #3.
*There are versions of regression that can be done on non-linear relationships. However, we will not cover them in this course.
Example: Correlation vs. Causation
One study during the polio epidemic in the 1920s showed a strong correlation between ice cream consumption and cases of polio. As a result, the public was warned to avoid eating ice cream as it increased the risk of contracting the disease. Thoughts?
– Again, there was a strongly confirmed correlation. However, it turned out that there was NO causation. With a properly controlled experiment, it could have been easily shown that increased ice cream consumption did NOT increase the risk of polio.
– Again, there was a lurking variable hiding in the background. It turns out that the virus that causes polio (a virus of the Picornaviridae family, for anyone who cares) thrives in warmer weather, which is also when people eat more ice cream. So the lurking variable here was temperature!
An example using R²
• Even when causation is present, does it give the whole picture?
• A mother’s weight and her daughter’s weight are clearly correlated. In addition, experimentation has shown that there is also causation. One study came up with r = 0.50, R² = 0.25. That is, only 25% of the variation in daughters’ weights is explained by mothers’ weights.
• Why is R² so small?
• Answer: What’s missing is that weight gain is multifactorial. That is, it is caused by many things. While genetics clearly does play a significant role (i.e. people with a natural tendency to be overweight are more likely to have overweight kids), many other factors also contribute.
• These include: TV, dietary habits in the house, attitude towards exercise, etc. In other words, ‘mom’s weight’ is not a very useful explanatory variable in this case. It would be more helpful to try to analyze the genetic relationship separately from TV habits in the house, separately from exercise habits, etc.
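To see where R² = 0.25 comes from when r = 0.50, here is a minimal sketch on simulated data. The weights are hypothetical standardized values built so that r comes out near 0.5; nothing here is taken from the study mentioned on the slide.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical standardized weights: the daughter's weight depends partly on
# the mother's weight and partly on everything else (diet, exercise, TV, ...).
# The noise SD of 0.866 (about sqrt(0.75)) makes the true r equal to 0.5.
mother = [rng.gauss(0, 1) for _ in range(2000)]
daughter = [0.5 * m + rng.gauss(0, 0.866) for m in mother]

n = len(mother)
mean_m = sum(mother) / n
mean_d = sum(daughter) / n
sxy = sum((m - mean_m) * (d - mean_d) for m, d in zip(mother, daughter))
sxx = sum((m - mean_m) ** 2 for m in mother)
syy = sum((d - mean_d) ** 2 for d in daughter)

r = sxy / (sxx * syy) ** 0.5

# Least-squares line: daughter = intercept + slope * mother
slope = sxy / sxx
intercept = mean_d - slope * mean_m
sse = sum((d - (intercept + slope * m)) ** 2 for m, d in zip(mother, daughter))

r_squared = 1 - sse / syy  # share of variation explained by the line
unexplained = sse / syy    # share left to all the other factors

print(f"r = {r:.2f}, R^2 = {r_squared:.2f}, unexplained = {unexplained:.2f}")
```

For a straight-line fit with one explanatory variable, R² is exactly r squared. A correlation of about 0.5 therefore leaves roughly three quarters of the variation to other factors.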
How can we establish causation?
• So how CAN we establish if causation is present?
• Answer: Only a well-designed experiment with proper control groups can prove causation.
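Why the experiment matters can be sketched with a toy simulation. Everything below is an assumption made for illustration: a hidden "health" trait, a treatment that by construction does nothing, and two ways of deciding who gets treated.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

def mean(values):
    return sum(values) / len(values)

# Hypothetical population: 'health' is a hidden trait that drives the outcome.
# The treatment has NO effect on the outcome in this model.
health = [rng.gauss(0, 1) for _ in range(10000)]
outcome = [h + rng.gauss(0, 1) for h in health]

def group_difference(took_treatment):
    """Mean outcome of the treated group minus mean outcome of the controls."""
    treated = [y for y, t in zip(outcome, took_treatment) if t]
    control = [y for y, t in zip(outcome, took_treatment) if not t]
    return mean(treated) - mean(control)

# Observational study: healthier people choose the treatment themselves.
self_selected = [h > 0 for h in health]
# Experiment: a coin flip decides who is treated.
randomized = [rng.random() < 0.5 for _ in health]

observational_diff = group_difference(self_selected)
experimental_diff = group_difference(randomized)

print(f"observational difference: {observational_diff:.2f}")
print(f"randomized difference: {experimental_diff:.2f}")
```

The self-selected comparison shows a large gap that is produced entirely by the hidden trait. Random assignment balances that trait across the two groups, so the useless treatment correctly shows a difference close to zero.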
Let’s Play: Correlation, Causation or Both?
• Can we simply sit back and allow time to magically improve child mortality rates? No. Strong correlation, but no causation. In this case, the state of medical research is the lurking variable.
• Are kids with small feet doomed to be bad readers? No. Strong correlation, but no causation. In this case, the age of the child is the lurking variable.
Did you know??? Rooster crowing is perfectly correlated (r = 1.0) with the sun rising.
Correlation is EXTREMELY common! Causation… not so much!
We are constantly bombarded with relationships between variables. However, even when you DO find correlations (and they love to talk about these on the 6:00 news), there is very often no good evidence of causation.
Is there causation?
1. Student’s SAT score with subsequent college GPA
– There is certainly a correlation, since students who are good students will probably do well on the SAT and then again in college. However, if you sent everyone to a 4-week intensive SAT prep course, you would probably see improvement in scores on that exam, but the better SAT score would not cause an improvement later in college.
2. Being married with being happy
– People who are happier are statistically more likely to get married than people who are not.
3. Being deeply religious with life expectancy
– People who are religious are less likely to be the kind of people who smoke, use drugs, etc.
Confounding variables
• Two variables are confounded when their effects on a response variable cannot be distinguished from each other.
• Example: Heavy drinking is strongly correlated with, and a cause of, decreased lifespan.
• Heavy drinkers are also statistically more likely to be smokers, less likely to adhere to a good diet, and more likely to have some form of depressive disorder. If you were trying to determine to what degree alcohol use decreased lifespan, it would be hard to do without “controlling” for these confounding variables.
– ‘Controlling’ is an important term in study design.
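One simple way to "control" for a confounder is to compare like with like, for example drinkers and non-drinkers among smokers only, and again among non-smokers only. The sketch below uses invented effect sizes and probabilities purely for illustration; they are not real epidemiological figures.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

def mean(values):
    return sum(values) / len(values)

# Hypothetical effect sizes, chosen only for illustration (not real data):
DRINK_EFFECT = -2.0  # years of lifespan lost to heavy drinking
SMOKE_EFFECT = -8.0  # years of lifespan lost to smoking

people = []
for _ in range(20000):
    drinks = rng.random() < 0.3
    # Heavy drinkers are more likely to smoke: this is the confounding.
    smokes = rng.random() < (0.6 if drinks else 0.2)
    lifespan = 80 + DRINK_EFFECT * drinks + SMOKE_EFFECT * smokes + rng.gauss(0, 5)
    people.append((drinks, smokes, lifespan))

def drinking_gap(rows):
    """Mean lifespan of heavy drinkers minus mean lifespan of everyone else."""
    drinkers = [life for d, _, life in rows if d]
    others = [life for d, _, life in rows if not d]
    return mean(drinkers) - mean(others)

# Naive comparison: mixes the effect of drinking with the effect of smoking.
naive_gap = drinking_gap(people)

# Controlling for smoking: compare within each smoking group separately.
gap_among_smokers = drinking_gap([p for p in people if p[1]])
gap_among_nonsmokers = drinking_gap([p for p in people if not p[1]])

print(f"naive gap: {naive_gap:.1f} years")
print(f"gap among smokers: {gap_among_smokers:.1f} years")
print(f"gap among non-smokers: {gap_among_nonsmokers:.1f} years")
```

In this toy model the naive gap is much larger than the 2 years built into the simulation, because drinkers also smoke more. Within each smoking group the gap comes back to roughly 2 years.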
Some possible explanations for an observed association. The dashed lines show an association. The solid arrows show a cause-and-effect link. x is explanatory, y is response, and z is a lurking variable.
Figure 2.28, Introduction to the Practice of Statistics, Sixth Edition © 2009 W. H. Freeman and Company
I will not ask you to distinguish between common-response / lurking / confounding variables.
Establishing causation
It appears that lung cancer is associated with smoking. How do we know that both of these variables are not being affected by an unobserved third (lurking) variable? For instance, what if there is a genetic predisposition that causes people to both get lung cancer and become addicted to smoking, but the smoking itself doesn’t CAUSE lung cancer?
We can evaluate the association using the following criteria:
1) The association is strong.
2) The association is consistent.
3) Higher doses are associated with stronger responses.
4) The alleged cause precedes the effect.
5) The alleged cause is plausible.
Ultimately, however, THERE IS NO SUBSTITUTE FOR AN EXPERIMENT!!!