• Number of slides: 40

Which statistical test is best for my data? or: What do you graph, dear? What do you test, dear?
• Refresher on tests that we know
• Several examples
• Fatal errors
• Data for you to analyze

Which tests and graphs fit which situations?
Independent variable (X) | Dependent variable (Y) | Graph | Test
Continuous | Continuous | Scatterplot | Linear regression
Class | Continuous | Bar graph | t-test, ANOVA
Class | Class | Bar graph, contingency table | Chi-square
Continuous | Class | Scatterplot, bar chart | (Logistic regression)
All of these tests assume that the data are independent; linear regression, the t-test, and ANOVA also assume normally distributed errors (more on this later!)
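
The table can be read as a simple lookup from variable types to a graph and a test. The sketch below is only an illustration of that reading; the names `SUGGESTIONS` and `suggest` are hypothetical and not part of the original slides.

```python
# Hypothetical helper: maps (X type, Y type) to the (graph, test) pair in the table.
SUGGESTIONS = {
    ("continuous", "continuous"): ("scatterplot", "linear regression"),
    ("class", "continuous"): ("bar graph", "t-test or ANOVA"),
    ("class", "class"): ("bar graph or contingency table", "chi-square"),
    ("continuous", "class"): ("scatterplot or bar chart", "logistic regression"),
}


def suggest(x_type: str, y_type: str) -> tuple[str, str]:
    """Return (graph, test) for the given independent and dependent variable types."""
    key = (x_type.lower(), y_type.lower())
    if key not in SUGGESTIONS:
        raise ValueError(f"variable types must be 'continuous' or 'class', got {key}")
    return SUGGESTIONS[key]


# Example: a class predictor (eels present or absent) and a continuous response.
graph, test = suggest("class", "continuous")
print(graph, "/", test)
```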

Claim 1: Money can’t buy you love, but it can buy you a good ball team
• Specifically, the claim is that baseball teams with bigger salaries win more games than those with smaller salaries
• Data are average (mean) salaries and winning percentages for the 2012 baseball season

The data
TEAM | AVG SALARY
Arizona Diamondbacks | $2,653,029
Atlanta Braves | $2,776,998
Baltimore Orioles | $2,807,896
Boston Red Sox | $5,093,724
Chicago Cubs | $3,392,193
Chicago White Sox | $3,876,780
Cincinnati Reds | $2,935,843
Cleveland Indians | $2,704,493
Colorado Rockies | $2,692,054
Detroit Tigers | $4,562,068
Houston Astros | $2,332,730
Kansas City Royals | $2,030,540
Los Angeles Angels | $5,327,074
Los Angeles Dodgers | $3,171,452
Miami Marlins | $4,373,259
Milwaukee Brewers | $3,755,920
Minnesota Twins | $3,484,629
New York Mets | $3,457,554
New York Yankees | $6,186,321
Oakland Athletics | $1,845,750
Philadelphia Phillies | $5,817,964
Pittsburgh Pirates | $2,187,310
San Diego Padres | $1,973,025
San Francisco Giants | $3,920,689
Seattle Mariners | $2,927,789
St. Louis Cardinals | $3,939,316
Tampa Bay Rays | $2,291,910
Texas Rangers | $4,635,037
Toronto Blue Jays | $2,696,042
Washington Nationals | $2,623,746
winning percentage (only 29 values survive for the 30 teams, so they are listed in extracted order rather than paired with teams): 0.58, 0.574, 0.426, 0.377, 0.525, 0.599, 0.42, 0.395, 0.543, 0.34, 0.444, 0.549, 0.531, 0.426, 0.512, 0.407, 0.457, 0.586, 0.58, 0.5, 0.488, 0.469, 0.58, 0.463, 0.543, 0.556, 0.574, 0.451, 0.605

How is this claim best evaluated?
• Graph: scatter plot
• Statistical analysis: linear regression
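
A minimal sketch of the linear regression step, using only the Python standard library. The function name `linear_regression` and the five data points are made up for illustration (they are NOT the 2012 salaries and winning percentages); the point is only to show what a slope and an r² near zero look like.

```python
def linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up illustration only: salary in $ millions vs. fraction of games won.
salary = [2.0, 3.0, 4.0, 5.0, 6.0]
win_fraction = [0.52, 0.45, 0.55, 0.48, 0.50]
slope, intercept, r2 = linear_regression(salary, win_fraction)
print(f"slope = {slope:.4f}, intercept = {intercept:.3f}, r^2 = {r2:.3f}")
```

A slope close to zero with a small r² is the pattern that would support a "money does not buy wins" conclusion; a p-value for the slope needs a t distribution, which a statistics package supplies.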

Conclusion
• Money can’t buy you a winning ball team, either

Claim 2: Eels control crayfish populations
• Specifically, the claim is that crayfish population densities are lower in streams where eels are present
• Background: dietary studies show that eels eat a lot of crayfish, and old Swedish stories suggest that eels eliminate crayfish
• Data are crayfish densities (counted along transects while snorkelling) in local streams with and without eels

The data
River | Site | Crayfish (no./m^2) | Eels (1 = present, 0 = absent)
Croton | Green Chimneys | 3.225 | 0
Croton | PEP | 0.119 | 0
Delaware | Buckingham | 0.25 | 1
Delaware | Callicoon | 0 | 1
Delaware | Hankins | 0.109 | 1
Delaware | Mongaup | 0 | 1
Delaware | Pond Eddy | 0.067 | 1
Neversink | Bridgeville | 0.233 | 0
Neversink | TNC | 0 | 1
Shawangunk | Mount Hope | 4.53 | 0
Shawangunk | Ulsterville | 1.1 | 0
Webatuck | Levin | 0.812 | 0
Webatuck | Shope | 1.719 | 0
Webatuck | Still Point | 1.4 | 0

How is this claim best evaluated?
• Graph: bar graph
• Statistical analysis: t-test, p = 0.02
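
A minimal sketch of the t-test on the crayfish densities, assuming that a 1 in the eels column means eels are present. The slides do not say which variant of the t-test was used; this sketch assumes Welch's unequal-variance version, and the function name `welch_t` is hypothetical. The standard library has no t distribution, so the code stops at the t statistic and degrees of freedom; a package call such as `scipy.stats.ttest_ind(a, b, equal_var=False)` would also return the p-value.

```python
from math import sqrt
from statistics import mean, variance

# Crayfish densities (no./m^2) from the data slide, split by eel presence.
no_eels = [3.225, 0.119, 0.233, 4.53, 1.1, 0.812, 1.719, 1.4]
eels = [0.25, 0, 0.109, 0, 0.067, 0]


def welch_t(a, b):
    """Welch's t statistic and approximate (Welch-Satterthwaite) degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t(no_eels, eels)
print(f"mean without eels = {mean(no_eels):.3f}, with eels = {mean(eels):.3f}")
print(f"t = {t:.2f}, df = {df:.1f}")
```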

Conclusion
• Looks like streams containing eels have fewer crayfish

Claim 3: Human life expectancy varies among continents
• Data are mean life expectancy for women in different countries

The data
Africa | Asia | Americas | Europe
Algeria 75 | Bangladesh 70.2 | Argentina 79.9 | Austria 83.6
Cameroon 53.6 | China 75.6 | Brazil 77.4 | Belgium 82.8
Cote d'Ivoire 57.7 | India 67.6 | Canada 85.3 | Bulgaria 77.1
Egypt 75.5 | Indonesia 71.8 | Chile 82.4 | Czech Rep. 81
Kenya 59.2 | Iran 75.3 | Colombia 77.7 | Denmark 87.4
Morocco 74.9 | Japan 87.1 | Mexico 79.6 | Estonia 80
Nigeria 53.4 | Malaysia 76.9 | Peru 76.9 | Finland 83.3
South Africa 54.1 | Pakistan 66.9 | USA 81.3 | France 84.9
Zimbabwe 52.7 | Philippines 72.6 | Venezuela 77.7 | Germany 83
 | Singapore 83.7 | | Greece 82.6

How is this claim best evaluated?
• Graph: bar graph (note that the y-axis doesn’t start at 0)
• Statistical analysis: 1-way ANOVA, p = 0.0000001

Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
Africa | 9 | 556.1 | 61.78889 | 104.6261
Asia | 10 | 747.7 | 74.77 | 42.78233
Americas | 9 | 718.2 | 79.8 | 7.7875
Europe | 10 | 825.7 | 82.57 | 7.731222
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 2351.6 | 3 | 783.8666 | 19.68451 | 1.42E-07 | 2.882604
Within Groups | 1353.931 | 34 | 39.8215 | | |
Total | 3705.531 | 37 | | | |
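
A minimal sketch of how the F statistic in this table comes out of the raw data, using only the Python standard library. The function name `one_way_anova` is hypothetical; the values are the life expectancies from the data slide. The p-value itself needs an F distribution, which a statistics package supplies (for example `scipy.stats.f_oneway`).

```python
from statistics import mean

# Female life expectancy (years) by continent, from the data slide.
groups = {
    "Africa": [75, 53.6, 57.7, 75.5, 59.2, 74.9, 53.4, 54.1, 52.7],
    "Asia": [70.2, 75.6, 67.6, 71.8, 75.3, 87.1, 76.9, 66.9, 72.6, 83.7],
    "Americas": [79.9, 77.4, 85.3, 82.4, 77.7, 79.6, 76.9, 81.3, 77.7],
    "Europe": [83.6, 82.8, 77.1, 81, 87.4, 80, 83.3, 84.9, 83, 82.6],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = 0.0
    for s in samples:
        group_mean = mean(s)
        ss_within += sum((v - group_mean) ** 2 for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(list(groups.values()))
print(f"F = {f_stat:.2f} on {df_between} and {df_within} df")
```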

Conclusion
• Life expectancy of women appears to differ among continents
• (The ANOVA doesn’t tell us which continents are different; further tests would be necessary to test claims about specific continents)

Claim 4: Predators with experience eat more invasive prey
• Specific claim is that sunfish from bodies of water that were invaded a long time ago will eat more zebra mussels than sunfish from recently invaded waters or waters without zebra mussels
• Data are from an aquarium experiment using sunfishes from rivers invaded 20 years ago, a lake that was invaded 9 years ago, and streams without zebra mussels
• Each aquarium contained 15 zebra mussels; the number of mussels eaten in 3 days was recorded

The data
Columns: old invasions | recent invasions | uninvaded
Values as extracted (row alignment lost): 15 5 2 15 14 1 15 15 1 12 8 3 15 11 0 15 0 2 0 15 6 15 15 11

How is this claim best evaluated?
• Graph: bar graph
• Statistical analysis: 1-way ANOVA, p = 0.00000009

Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
old invasions | 12 | 160 | 13.33333 | 14.60606
recent invasions | 6 | 64 | 10.66667 | 13.86667
uninvaded | 9 | 13 | 1.444444 | 4.027778
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 754.4444 | 2 | 377.2222 | 34.52542 | 8.67E-08 | 3.402826
Within Groups | 262.2222 | 24 | 10.92593 | | |
Total | 1016.667 | 26 | | | |
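
The ANOVA block can be rebuilt from the SUMMARY block alone, which is a handy check when only summary statistics are available. A minimal sketch, assuming the counts, sums, and variances reported above; the function name `anova_from_summary` is hypothetical.

```python
# (count, sum, variance) per group, copied from the SUMMARY table.
summary = {
    "old invasions": (12, 160, 14.60606),
    "recent invasions": (6, 64, 13.86667),
    "uninvaded": (9, 13, 4.027778),
}


def anova_from_summary(rows):
    """Return (F, ss_between, ss_within) from (count, sum, variance) rows."""
    n_total = sum(n for n, _, _ in rows)
    grand_mean = sum(total for _, total, _ in rows) / n_total
    # Each group mean is sum / count; weight its squared deviation by the count.
    ss_between = sum(n * (total / n - grand_mean) ** 2 for n, total, _ in rows)
    # A sample variance times (n - 1) gives that group's sum of squares.
    ss_within = sum((n - 1) * var for n, _, var in rows)
    df_between = len(rows) - 1
    df_within = n_total - len(rows)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, ss_between, ss_within


f_stat, ss_between, ss_within = anova_from_summary(list(summary.values()))
print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}, F = {f_stat:.2f}")
```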

Conclusion
• Fish living in places that have had zebra mussels for a long time eat more zebra mussels

Claim 5: Zebra mussels reduce phytoplankton biomass in the Hudson
• Data are growing-season (May-Sept) means for zebra mussel population filtration rate and phytoplankton biomass in the freshwater tidal Hudson River

The data
year: 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
ZMFR and chl a (values as extracted; column alignment lost): 0.00 0.03 0.36 7.10 3.96 4.44 2.60 6.50 5.06 2.79 3.59 3.02 5.70 2.34 3.16 1.51 0.06 4.09 0.26 4.08 17.45 28.95 17.25 17.52 25.48 12.18 5.04 4.91 5.34 3.74 6.89 7.50 6.56 4.40 11.47 5.44 4.81 4.64 8.84 5.90 4.10 4.96 6.71

How is this claim best evaluated?
• Graph: scatterplot
• Statistical analysis: linear regression, … but the relationship is clearly not linear
• Non-linear regression (available in many statistical packages)
• Not really fair to choose a non-linear model after looking at the data, so think about whether your claim suggests a linear model or a nonlinear one before analyzing the data
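
One simple way to fit a curved relationship is to pick the model in advance and linearize it. The sketch below assumes an exponential model, y = a * exp(b * x), which is an assumption made here for illustration (the slides do not say which nonlinear model was used), and fits it by least squares on log(y). The function name `fit_exponential` and the data are made up; they are NOT the Hudson River values. Fitting on the log scale weights the points differently from true nonlinear least squares, so results from a statistical package may differ.

```python
from math import exp, log


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on log(y). Requires all y > 0."""
    if any(v <= 0 for v in y):
        raise ValueError("log transform needs strictly positive y values")
    log_y = [log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_log_y = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_log_y) for xi, li in zip(x, log_y))
    b = sxy / sxx                       # slope on the log scale
    a = exp(mean_log_y - b * mean_x)    # back-transformed intercept
    return a, b


# Made-up illustration: an exact exponential decline of biomass with filtration rate.
filtration = [0.0, 1.0, 2.0, 4.0, 6.0]
biomass = [20.0 * exp(-0.3 * f) for f in filtration]
a, b = fit_exponential(filtration, biomass)
print(f"a = {a:.2f}, b = {b:.2f}")
```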

Conclusion
• Yes, it looks like zebra mussel feeding reduces phytoplankton population in the Hudson
• The relationship is nonlinear

What to do if both variables are class variables?
Status | birds + mammals | FW fish | FW shellfish | FW insects
Extinct (GX, GH) | 1.65 | 2.13 | 6.56 | 1.34
Critically imperiled (G1) | 3.71 | 11.39 | 20.85 | 2.2
Imperiled (G2) | 4.89 | 10.89 | 15.6 | 9.09
Vulnerable (G3) | 7.95 | 13.14 | 17.24 | 19.98
Secure (G4, G5) | 81.57 | 62.33 | 39.73 | 67.4
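
With two class variables the usual test is chi-square on a contingency table. A minimal sketch of the Pearson chi-square statistic follows; the function name `chi_square` and the 2 x 2 counts are made up for illustration. The values on the slide look like column percentages (each column sums to about 100), and a chi-square test needs the underlying counts, which the slide does not give.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Made-up counts: rows are two status classes, columns are two groups of species.
counts = [[10, 20], [20, 10]]
stat, df = chi_square(counts)
print(f"chi-square = {stat:.3f} on {df} df")
```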

What to do if the predictor variable is continuous but the response variable is a class variable?
[Figure labels: Baby mussels present; Baby mussels absent]
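
The test listed earlier for a continuous predictor and a class response is logistic regression. A minimal sketch, assuming a single predictor and a 0/1 response, fitted by plain gradient ascent; the function names, the learning rate, the step count, and the data are all made up for illustration. A statistical package would normally be used instead, and would also report standard errors and p-values.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(x, y, rate=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        residuals = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += rate * sum(residuals) / n
        b1 += rate * sum(r * xi for r, xi in zip(residuals, x)) / n
    return b0, b1


# Made-up illustration: a continuous predictor and presence (1) or absence (0).
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
present = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x_values, present)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"P(present | x = 2) = {sigmoid(b0 + b1 * 2):.2f}")
print(f"P(present | x = 7) = {sigmoid(b0 + b1 * 7):.2f}")
```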

Common fatal errors: non-independence

Common fatal errors: undue influence of a single point
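
One quick check for undue influence is to recompute the statistic with each point left out in turn. The slides do not prescribe a method; leave-one-out is simply one common option, and the function names and data below are made up for illustration. Here four points with no relationship plus one extreme point give a strong correlation that vanishes when the extreme point is dropped.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


def leave_one_out_r(x, y):
    """Correlation recomputed with each point dropped in turn."""
    return [pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


# Made-up illustration: the last point sits far from the other four.
x = [1, 1, 2, 2, 10]
y = [1, 2, 1, 2, 10]
r_all = pearson_r(x, y)
r_dropped = leave_one_out_r(x, y)
print(f"r with all points = {r_all:.3f}")
print("r with each point dropped:", [round(r, 3) for r in r_dropped])
```

A large change in r when one particular point is dropped is the warning sign; the conclusion should not rest on that point alone.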

Claims for you to test
• Large, mobile predators (i.e., crabs) reduce zebra mussel populations in the Hudson
• Cell phone ownership increases with income among countries
• Levels of dissolved oxygen affect behavior of baby mussels