7b16ed4bd1e9e7b74901b5b5cf83ea74.ppt
- Number of slides: 29
Beware, Statistics! Brani Vidakovic, ISyE & BME, Georgia Tech
They said…
- "There are lies, damned lies, and statistics." (attributed by Mark Twain to Benjamin Disraeli)
- "In earlier times, they had no statistics, and so they had to fall back on lies." (Stephen Leacock)
- "Numbers are like people; torture them enough and they'll tell you anything."
Intentional Statistical Inaccuracies
- Level of sophistication: from very low to very high
- Often hard to distinguish incompetence from intention
- Donoho D: Reproducible Research
- Baggerly K: Forensic Statistics (given the data and the results, recover the methods used)
- Gelman A, Fienberg S
ASA Guidelines
To help statistical practitioners make and communicate ethical decisions. Committee on Professional Ethics.
- A. Professionalism
- B. Responsibilities to Funders, Clients, and Employers
- C. Responsibilities in Publications and Testimony
- D. Responsibilities to Research Subjects
- F. Responsibilities to Other Statistical Practitioners
- G. Responsibilities Regarding Allegations of Misconduct
Location Measures: Perils of "On average, …"
- The average Australian has fewer than two legs. True!
- Small company salaries: 4 employees at 20K, 3 employees at 30K, vice-president at 200K, president at 400K. Average salary?
- Mean = 85.6K, geometric mean = 41.2K, median = 30K, harmonic mean = 29.3K, mode = 20K.
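
The five "averages" on the salary slide can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names `salaries` and `summary` are illustrative and not part of the slides.

```python
from statistics import geometric_mean, harmonic_mean, mean, median, mode

# Salaries in thousands: 4 employees at 20K, 3 at 30K, VP at 200K, president at 400K.
salaries = [20] * 4 + [30] * 3 + [200, 400]

# Five different answers to "what is the average salary?"
summary = {
    "mean": mean(salaries),
    "geometric mean": geometric_mean(salaries),
    "median": median(salaries),
    "harmonic mean": harmonic_mean(salaries),
    "mode": mode(salaries),
}

for name, value in summary.items():
    print(f"{name:>15}: {value:6.1f}K")
```

The two executive salaries pull the arithmetic mean far above what seven of the nine people earn, which is why "on average" needs a stated definition.
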
More violations
- Cherry picking of data/studies (Fallacy of Incomplete Evidence)
- Discarding influential data and outliers
- Confirmation bias ("myside" bias)
- Anecdotal evidence
- Hyperbolic discounting: 1000 now or 3000 next year?
- Bandwagon fallacy
- False dichotomy: "Will that be cash or charge?"
- "Golden sample"
- Attrition bias
- Publication bias (file drawer problem); funnel plots
Even More…
- Loaded questions: "Have you stopped smoking?"
  - a. Should people have the right to smoke?
  - b. Since cigarettes are dangerous and have deadly side effects such as cancer, don't you agree that smoking should be controlled?
- Anchoring phenomenon: think about the last 4 digits of your SS#, then estimate the number of physicians in Atlanta.
Kahneman & Tversky
- 1 x 2 x 3 x … x 7 x 8
- 8 x 7 x 6 x … x 2 x 1
- The anchor was the number shown first in the sequence, either 1 or 8.
- When 1 was the anchor, the median estimate was 512; when 8 was the anchor, the median estimate was 2,250.
- The correct answer is 40,320.
Geometric misdeeds
From one dollar to 44 cents
Truncated Graphs
Correlations Galore…
- A is correlated with B (but because of C!). The number of people who buy ice cream at the beach is correlated with the number of people who drown at the beach (but only because both grow with the number of people at the beach).
- Lack of correlation is not the same as independence! E.g., points (x_i, y_i), i = 1, …, n, on a circle.
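
The circle example can be made concrete with a short sketch. It assumes n = 360 points spaced evenly on the unit circle (the slide does not specify the spacing), and the helper `pearson` is written out by hand for illustration.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Assumption: evenly spaced points on the unit circle.
n = 360
angles = [2 * math.pi * k / n for k in range(n)]
xs = [math.cos(a) for a in angles]
ys = [math.sin(a) for a in angles]

r = pearson(xs, ys)
# Every point satisfies x^2 + y^2 = 1, so y is determined by x up to sign.
radius_error = max(abs(x * x + y * y - 1.0) for x, y in zip(xs, ys))

print(f"correlation = {r:.2e}, max |x^2 + y^2 - 1| = {radius_error:.2e}")
```

The correlation is zero up to rounding error even though the two coordinates are tied together by an exact equation.
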
Perils of Aggregation
Voodoo Correlations
Data Dredging
- Data dredging is an abuse of data mining.
- In data dredging, large compilations of data are examined in order to find a relationship, without any predefined choice of a hypothesis to be tested (e.g., endpoints in clinical trials).
- A clear distinction is needed between data analyses that are confirmatory and analyses that are exploratory. Statistical inference is appropriate for confirmatory analyses.
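
A small simulation shows why dredging "finds" relationships. This is a sketch under stated assumptions, not a recipe from the slides: the response and all 200 candidate predictors are pure Gaussian noise, n = 50 observations, and 0.279 is the approximate two-sided 5% critical value of the sample correlation for 48 degrees of freedom. The names `count_spurious` and `r_crit` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_obs=50, n_candidates=200, r_crit=0.279, seed=1):
    """Count noise predictors that look 'significantly' correlated with a noise response."""
    rng = random.Random(seed)
    response = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        predictor = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # No hypothesis was fixed in advance: every candidate gets tested.
        if abs(pearson(predictor, response)) > r_crit:
            hits += 1
    return hits


hits = count_spurious()
print(f"{hits} of 200 pure-noise predictors pass the 5% test")
```

With a 5% test and 200 unrelated candidates, roughly 10 false "discoveries" are expected, so reporting only the hits is an exploratory finding at best.
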
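Perils of Aggregation: Simpson's Paradox
Hospitals A and B. Measure of quality: proportion of satisfactory (SAT) outcomes; UNS = unsatisfactory.

Hospital A    Fair      Bad       Total
SAT           41        39        80
UNS           5         10        15
TOT           46        49        95
% SAT         89.13%    79.59%    84.21%

Hospital B    Fair      Bad       Total
SAT           32        11        43
UNS           4         3         7
TOT           36        14        50
% SAT         88.89%    78.57%    86.00%

Hospital A has the higher SAT proportion in both the Fair and the Bad column, yet Hospital B has the higher proportion overall.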
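
The reversal in the hospital counts can be verified directly. The sketch below only re-does the slide's arithmetic; the dictionary layout and the function name `overall_rate` are illustrative choices.

```python
# (satisfactory, total) counts for each hospital and column, taken from the slide.
hospitals = {
    "A": {"Fair": (41, 46), "Bad": (39, 49)},
    "B": {"Fair": (32, 36), "Bad": (11, 14)},
}


def overall_rate(strata):
    """Pooled proportion of satisfactory outcomes across all strata."""
    sat = sum(s for s, _ in strata.values())
    total = sum(t for _, t in strata.values())
    return sat / total


# Stratum-level proportions: A wins in each column.
rates = {
    name: {stratum: s / t for stratum, (s, t) in strata.items()}
    for name, strata in hospitals.items()
}
# Pooled proportions: B wins overall.
overall = {name: overall_rate(strata) for name, strata in hospitals.items()}

for name in hospitals:
    print(name, {k: round(v, 4) for k, v in rates[name].items()}, round(overall[name], 4))
```

Hospital A treats a much larger share of "Bad" cases (49 of 95 versus 14 of 50), and that difference in case mix is what drags its pooled proportion below B's.
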
% Death rates in Sweden and Panama
% population by age group: 0-29, 30-59, 60+
population.S = [3145000 3057000 1294000]';
population.P = [ 714000  275000   59000]';
% deaths per year, 1962
deaths.S = [3523 10928 57104]';
deaths.P = [3904  1421  2756]';
mortality.S = deaths.S./population.S
mortality.P = deaths.P./population.P
% mortality.S = 0.0011 0.0036 0.0441
% mortality.P = 0.0055 0.0052 0.0467
totmortality.S = sum(deaths.S)/sum(population.S)
totmortality.P = sum(deaths.P)/sum(population.P)
% totmortality.S = 0.0095
% totmortality.P = 0.0077
- Cohen and Nagel (1934)
- Simpson (1951)
- A, B, C events
- It is possible that $P(A \mid B \cap C) > P(A \mid B^c \cap C)$ and $P(A \mid B \cap C^c) > P(A \mid B^c \cap C^c)$, and yet $P(A \mid B) < P(A \mid B^c)$.
- Kotz S and Stroup D (1998). Educated Guessing, Marcel Dekker
Testing
- Any fixed nonzero correlation coefficient is significant if the sample size is large enough: t ~ C*sqrt(n).
- In classical hypothesis testing, ANY precise H0 that does not hold exactly will be rejected if the sample size is large enough.
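
The sqrt(n) growth is easy to see numerically. The sketch below uses the standard t statistic for testing zero correlation, t = r*sqrt(n-2)/sqrt(1-r^2); the slide gives only the proportionality t ~ C*sqrt(n), so the exact formula, the value r = 0.05, and the 1.96 cutoff are illustrative assumptions.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: zero correlation, given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


r = 0.05  # a practically negligible correlation (illustrative value)
for n in (100, 1_000, 10_000, 100_000):
    t = t_statistic(r, n)
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"n = {n:>7}: t = {t:6.2f}  ({verdict} at roughly the 5% level)")
```

The same tiny correlation moves from "not significant" to "highly significant" purely because n grows, which is why significance alone says nothing about practical importance.
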
Need for Equivalence Tests
- Testing can be compared to the judicial process, where the accused is considered innocent (H0) until proven guilty (H1) beyond a reasonable doubt (alpha). Key word: CONSIDERED! A suspect found not guilty is not thereby found innocent.
- If H0 is not rejected, it is not proven!
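Biased Sampling Dependent on the Observation Size (Inspection Paradox)
- Example: tourists in Morocco, a study in 1966. Mean sojourn times reported by tourists: surveyed in hotels, 17.8 days; surveyed at frontier stations, 9.0 days.
- A tourist with a long stay is more likely to be found in a hotel on the day of the survey, so the hotel sample over-represents long stays.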
Biased Sampling
- Waiting times at a bus stop.
- Example: times between two successive buses are Exponential(lambda), so the expected time between buses is 1/lambda.
- A passenger arrives at the stop at a random moment; his expected waiting time is 1/lambda, not 1/(2 lambda)! A random arrival is more likely to fall in a long gap between buses.
- Source of many wrong models.
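
The bus-stop claim can be checked by simulation. This is a sketch under stated assumptions: rate lambda = 1, passengers arriving uniformly over the simulated horizon, and time 0 treated as a bus departure. The function name `simulate_bus_stop` and all parameter values are illustrative.

```python
import bisect
import random
from itertools import accumulate


def simulate_bus_stop(rate=1.0, n_buses=200_000, n_passengers=100_000, seed=0):
    """Return (mean gap between buses, mean passenger wait, mean gap a passenger lands in)."""
    rng = random.Random(seed)
    # Bus arrival times: running sums of Exponential(rate) gaps, starting from time 0.
    arrivals = list(accumulate(rng.expovariate(rate) for _ in range(n_buses)))
    horizon = arrivals[-1]

    total_wait = 0.0
    total_gap = 0.0
    for _ in range(n_passengers):
        t = rng.uniform(0.0, horizon)        # passenger shows up at a random moment
        i = bisect.bisect_left(arrivals, t)  # index of the next bus at or after t
        previous = arrivals[i - 1] if i > 0 else 0.0
        total_wait += arrivals[i] - t
        total_gap += arrivals[i] - previous

    return horizon / n_buses, total_wait / n_passengers, total_gap / n_passengers


mean_gap, mean_wait, mean_landed_gap = simulate_bus_stop()
print(f"mean gap between buses        : {mean_gap:.3f}")
print(f"mean wait of a random arrival : {mean_wait:.3f}")
print(f"mean gap the arrival falls in : {mean_landed_gap:.3f}")
```

The gap that contains a random arrival is about twice as long as a typical gap, so waiting "half a gap" still takes about 1/lambda.
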
Prosecutor's Fallacy
- Replacing P(A|B) with P(B|A).
- P(match | innocent) = 0.000001, thus P(innocent | match) = 0.000001! Wrong!
- In a community of 5 million people, the expected number of matches is 5.
- P(innocent | match) = 4/5 (given no other evidence, with the guilty person counted among the 5 matches).
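
A Bayes' rule version of this calculation is sketched below. It assumes, beyond what the slide states, that exactly one of the 5 million people is guilty, that each person is equally likely a priori to be that person, and that the guilty person matches with certainty. The function name is hypothetical.

```python
def prob_innocent_given_match(population, p_match_given_innocent):
    """P(innocent | match) under a uniform prior with exactly one guilty person."""
    prior_guilty = 1.0 / population
    prior_innocent = 1.0 - prior_guilty
    # Assumption: the guilty person matches with probability 1.
    p_match = prior_guilty * 1.0 + prior_innocent * p_match_given_innocent
    return prior_innocent * p_match_given_innocent / p_match


p = prob_innocent_given_match(5_000_000, 0.000001)
print(f"P(innocent | match) = {p:.4f}")
```

Under these assumptions the result is about 5/6, because the guilty person is counted in addition to the 5 expected innocent matches; the slide's 4/5 counts the guilty person among the five. Either way the answer is nowhere near 0.000001.
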
Sensitivity/Specificity/PPV
- Casscells et al. (1978): 60 students and staff at an elite medical school on the East Coast.
- If a test for a disease with a prevalence of 1/1000 has a false positive rate of 5%, what is the probability that a person who tests positive has the disease? Given the disease, the test is always positive.
- 18% gave the correct answer (approx. 2%); most answered 95%.
Sensitivity/Specificity Interpretation: Sensitivity <-> PPV
- Disease D has prevalence 2/10000. Test: P(+|D) = 0.999, P(-|ND) = 0.99.
- A subject tests +, with no other symptoms.
- Tempting: P(D|+) = 0.999, but P(D|+) = P(+|D)P(D)/P(+) = 0.999*0.0002/(0.999*0.0002 + 0.01*0.9998) = 0.0196, less than 2%.
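
Both positive predictive values, the Casscells question and the 2/10000 example, follow from the same application of Bayes' rule. The function name `ppv` and its parameter names are illustrative; this is a minimal sketch.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value P(D | +) via Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Casscells et al.: prevalence 1/1000, test always positive given disease, 5% false positives.
casscells = ppv(prevalence=0.001, sensitivity=1.0, specificity=0.95)

# Second example: prevalence 2/10000, P(+|D) = 0.999, P(-|ND) = 0.99.
rare_disease = ppv(prevalence=0.0002, sensitivity=0.999, specificity=0.99)

print(f"Casscells question : P(D|+) = {casscells:.4f}")
print(f"Prevalence 2/10000 : P(D|+) = {rare_disease:.4f}")
```

In both cases the low prevalence dominates: false positives from the large healthy group outnumber true positives by about 50 to 1.
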
Cryptographic Surveys
- Boss present, 100 workers to be asked: Do you like your boss? The boss is interested only in the proportion of YES.
- Cryptographic solution: flip a coin twice.
  - If the 1st flip is H, answer the question: Is the 2nd flip H?
  - If the 1st flip is T, answer the question: Do you like your boss?
- Solution: ½ p + ½ x ½ = observed proportion of YES, so p ≈ 2 x (observed proportion of YES) - ½.
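
The two-coin scheme and its estimator can be simulated as follows. This is a sketch: the true proportion 0.3, the seed, and the function names `simulate_survey` and `estimate_p` are assumptions made for illustration.

```python
import random


def estimate_p(observed_yes_proportion):
    """Invert P(YES) = 1/2 * p + 1/2 * 1/2 to recover p."""
    return 2.0 * observed_yes_proportion - 0.5


def simulate_survey(true_p, n_workers, seed=0):
    """Return the observed proportion of YES answers under the two-coin scheme."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n_workers):
        first_heads = rng.random() < 0.5
        second_heads = rng.random() < 0.5
        likes_boss = rng.random() < true_p
        # 1st flip H: answer "Is the 2nd flip H?"; 1st flip T: answer the real question.
        answer = second_heads if first_heads else likes_boss
        yes += answer
    return yes / n_workers


small = estimate_p(simulate_survey(true_p=0.3, n_workers=100))
large = estimate_p(simulate_survey(true_p=0.3, n_workers=100_000))
print(f"estimate with 100 workers     : {small:.3f}")
print(f"estimate with 100,000 workers : {large:.3f}")
```

No individual YES reveals anything about that worker, but the price of privacy is variance: with only 100 workers the estimate is noisy and can even fall outside [0, 1].
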
Rational Decisions: South Dakota Lottery
- Data for the 4th quarter of 1987: Total Revenue $11,812,905; Prize Payments $5,322,975.
- Joe Sixpack knows his $1 investment returns less than $0.50 on average (about $0.45), and he still plays. Why? Is he irrational?
- No. The value (utility) of money is not linear in dollars.
More reading…
- Hooke, R., 1983, How to Tell the Liars from the Statisticians; Marcel Dekker, Inc., New York, NY
- Jaffe, A. J. and H. F. Spirer, 1987, Misused Statistics; Marcel Dekker, Inc., NY
- Campbell, S. K., 1974, Flaws and Fallacies in Statistical Thinking; Prentice Hall, Inc., Englewood Cliffs, NJ
- Hollander, M. and Proschan, F., 1984, The Statistical Exorcist; Marcel Dekker, Inc., NY
- Goldacre, B., 2009, Bad Science; Fourth Estate, London