STATISTICAL METHODS FOR REDUCING BIAS IN WEB SURVEYS

13th September 2012, Myoung Ho Lee

Contents
§ Introduction
§ Web surveys
§ Methodology
  - Propensity Score Adjustment
  - Calibration (Rim weighting)
§ Case Study
§ Discussion and Conclusion

Introduction
• Trends in Data Collection
  Paper and Pencil => Telephone => Computer => Internet (Web)
• Internet penetration

Introduction
❑ Pros and Cons of Web surveys
• Pros
  - Low cost and speed
  - No interviewer effect
  - Visual, flexible and interactive
  - Respondents’ convenience
• Cons
  - Quality of sample estimates
✓ Web surveys may be a solution! But there are problems!

Introduction
❑ Previous Studies
• Harris Interactive (2000 ~ )
• Lee (2004), Lee and Valliant (2009)
• Huh and Cho (2009)
• Bethlehem (2010), etc.
✓ Lee and Valliant (2009): good performance in simulation
✓ But most other results do not seem to be as good.
  - Malhotra and Krosnick (2007), Huh and Cho (2009)

Web surveys
❑ Volunteer Panel Web Survey Protocol (Lee, 2004)
[Diagram labels: Under-coverage, Self-selection, Non-response]
✓ Challenge: Fix the anticipated biases in web survey estimates that result from under-coverage, self-selection and non-response

Methodology
Proposed Adjustment Procedure for Volunteer Panel Web surveys (Lee, 2004)

Methodology
❑ Propensity Score Adjustment (PSA)
• Original idea: comparison of two groups, treatment and control, in observational studies (Rosenbaum and Rubin, 1983)
  - by weighting using all auxiliary variables that are thought to account for the differences
• In the context of web surveys, this technique aims to correct for differences between offline and online people
  - differences that arise from the particular inclinations of people who participate in volunteer panel web surveys

Methodology
• “Webographic” variables: variables that overlap between the web survey and the reference survey
  - Intended to capture the difference between online and offline populations (Schonlau et al., 2007)
  - For example, “Do you feel alone?”, “In the last month have you read a book?” ... (Harris Interactive)

Methodology
• Propensity score: the probability of treatment assignment (z_i) given a set of covariates (x_i); the z_i are assumed to be independent given x_i
• ‘Strong ignorability assumption’: the response variable is conditionally independent of treatment assignment given the propensity score.
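In the notation of Rosenbaum and Rubin (1983), the two statements on this slide are usually written as below. The slide itself does not show the formula, so the symbols are assumptions for illustration: e(x_i) for the propensity score, y_i for the response variable, and the coding z_i = 1 for a web respondent and z_i = 0 for a reference survey respondent.

```latex
\begin{align}
  e(\mathbf{x}_i) &= \Pr(z_i = 1 \mid \mathbf{x}_i),
    \qquad 0 < e(\mathbf{x}_i) < 1, \\
  y_i \perp z_i \mid \mathbf{x}_i
    \quad &\Longrightarrow \quad
  y_i \perp z_i \mid e(\mathbf{x}_i).
\end{align}
```

The second line is the reason a single scalar score can stand in for the full covariate vector when forming weights or subclasses.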

Methodology
❑ Logistic regression model: the propensity score is modeled as a logistic function of the covariates
❑ Variable Selection
• Include variables related not only to treatment assignment but also to the response, in order to satisfy the ‘strong ignorability assumption’ (Rosenbaum and Rubin, 1984; Brookhart et al., 2006)
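A minimal sketch of estimating propensity scores with a logistic regression, assuming the model logit Pr(z = 1 | x) = b0 + b'x fitted to the combined web and reference sample. The function names, the toy data and the plain gradient-ascent fit are illustrative assumptions, not the procedure used in the case study; sampling weights are ignored here for brevity.

```python
import math


def propensity(beta, x):
    """Pr(z = 1 | x) under the logit model; beta[0] is the intercept."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X, z, lr=0.5, n_iter=2000):
    """Fit the logit model by batch gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            resid = zi - propensity(beta, xi)  # score contribution of unit i
            grad[0] += resid
            for j in range(p):
                grad[j + 1] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical combined sample with one binary webographic item x;
# z = 1 for web respondents, z = 0 for reference survey respondents.
X = [[1]] * 30 + [[0]] * 10 + [[1]] * 10 + [[0]] * 30
z = [1] * 40 + [0] * 40

beta = fit_logistic(X, z)
scores = [propensity(beta, xi) for xi in X]
```

With a single binary covariate the fitted scores approach the observed web shares in each group (30 of 40 and 10 of 40 in this toy data), which makes the sketch easy to check by hand.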

Methodology
❑ Variable Selection
• In practice, stepwise selection has often been used to develop good predictive models for treatment assignment
• Most previous web studies: all available covariates were used (5-30)
• Huh and Cho (2009): 9 or 7 out of 123 covariates were chosen based on their “subjective” views

Methodology
❑ Variable Selection
• Stepwise logistic regression using SIC
  - for a large number of covariates and little theoretical guidance
• LASSO (PROC GLMSELECT in SAS)
  - a good alternative to stepwise variable selection
• Boosted tree (“gbm” in R)
  - determines a set of split conditions

Methodology
❑ Applying methods for PSA
• Inverse propensity scores as weights
  - weights: the inverse of the estimated propensity scores
  - then, multiply them by the sampling weights
• Subclassification (Stratification)
  - grouping homogeneous people into each stratum
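The inverse weighting option can be sketched as follows. The slide does not show the weight formula, so this assumes the simplest reading, 1 / e(x_i) multiplied by the sampling weight; other forms such as (1 - e) / e also appear in the literature. The function name and inputs are hypothetical.

```python
def inverse_propensity_weights(scores, sampling_weights):
    """Return sampling weight / propensity score for each web respondent.

    Assumes every score lies strictly between 0 and 1; scores near 0
    produce very large weights.
    """
    weights = []
    for e, d in zip(scores, sampling_weights):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(d / e)
    return weights


# Two hypothetical web respondents with scores 0.5 and 0.25.
psa_weights = inverse_propensity_weights([0.5, 0.25], [1.0, 2.0])
```

A respondent who is unlikely to be in the web sample (small score) receives a large weight, which is why extreme scores can make the resulting estimates unstable.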

Methodology
• Subclassification (Stratification)
  1. Combine the reference and web data into one sample
  2. Estimate each unit's propensity score from the combined sample
  3. Partition the units into C subclasses according to their ordered propensity scores, so that each subclass has about the same number of units
  4. Compute an adjustment factor and apply it to all units in the c-th subclass
  5. Multiply the factor by the sampling weights to get the PSA weights
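Steps 3 to 5 can be sketched as below. The slide does not give the adjustment factor, so the sketch assumes one common form: the reference sample's weighted share of subclass c divided by the web sample's weighted share of subclass c. The function name and the toy data are hypothetical.

```python
def subclass_psa_weights(scores, is_web, weights, n_classes=5):
    """Subclassify the combined sample on propensity score and adjust web weights."""
    n = len(scores)
    # Step 3: order units by score and cut into classes of about equal size.
    order = sorted(range(n), key=lambda i: scores[i])
    cls = [0] * n
    for rank, i in enumerate(order):
        cls[i] = rank * n_classes // n

    total_web = sum(w for w, web in zip(weights, is_web) if web)
    total_ref = sum(w for w, web in zip(weights, is_web) if not web)

    # Step 4: adjustment factor per class (assumed form, see lead-in).
    factors = {}
    for c in range(n_classes):
        web_c = sum(weights[i] for i in range(n) if cls[i] == c and is_web[i])
        ref_c = sum(weights[i] for i in range(n) if cls[i] == c and not is_web[i])
        if web_c == 0:
            raise ValueError(f"subclass {c} contains no web respondents")
        factors[c] = (ref_c / total_ref) / (web_c / total_web)

    # Step 5: multiply the factor into the web units' sampling weights.
    psa = [weights[i] * factors[cls[i]] if is_web[i] else weights[i]
           for i in range(n)]
    return cls, factors, psa


# Hypothetical combined sample: 4 reference and 4 web units, 2 subclasses.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
is_web = [False, False, False, True, True, True, True, False]
cls, factors, psa = subclass_psa_weights(scores, is_web, [1.0] * 8, n_classes=2)
```

In this toy data the low-score subclass holds three reference units and one web unit, so the single web unit there is weighted up, while the three web units in the high-score subclass are weighted down; the total web weight is unchanged.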

Methodology
❑ Calibration (Rim weighting)
• Matches sample and population characteristics only with respect to the marginal distributions of selected covariates
• Little and Wu (1991)
  - Iterative algorithm that alternately adjusts the weights to each covariate's marginal distribution until convergence
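The iterative adjustment described above is commonly implemented as raking (iterative proportional fitting). The sketch below is a generic version under assumed inputs: each record is a dict of covariate categories and `targets` holds the population totals for each margin. All names and numbers are hypothetical, and the targets of every margin are assumed to sum to the same grand total.

```python
def rake(records, weights, targets, max_iter=100, tol=1e-8):
    """Adjust weights so weighted margins match the target totals."""
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, margin in targets.items():
            # Current weighted total in each category of this covariate.
            totals = {}
            for rec, wi in zip(records, w):
                totals[rec[var]] = totals.get(rec[var], 0.0) + wi
            # Scale every unit so this margin matches its target.
            for i, rec in enumerate(records):
                factor = margin[rec[var]] / totals[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:  # all margins already matched in this pass
            break
    return w


# Hypothetical sample of four respondents and two margins.
records = [
    {"gender": "M", "age": "young"},
    {"gender": "M", "age": "old"},
    {"gender": "F", "age": "young"},
    {"gender": "F", "age": "old"},
]
targets = {
    "gender": {"M": 60.0, "F": 40.0},
    "age": {"young": 30.0, "old": 70.0},
}
raked = rake(records, [1.0, 1.0, 1.0, 1.0], targets)
```

Only the margins are matched; the joint distribution of gender by age is left to whatever the starting weights imply, which is the limitation the slide points to with "only with respect to the marginal distributions".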

Case Study
• Reference survey: “2009 Social Survey” by Statistics Korea
  - Culture & Leisure, Income & Consumption, etc.
  - All persons aged 15+ in 17,000 households
  - Sample size: 37,049
  - Face-to-face mode
  - Post-stratification estimation
  - Assumed to be “True”

Case Study
• Web survey
  - Volunteers recruited from web sites (6,854 households)
  - Systematic sampling with unequal selection probabilities (inverse of rim weights using region, age, gender)
  - Sample size: 1,500 households and 2,903 respondents
  - Overlapping covariates: 123

Case Study – Model Selection
M1 = Stepwise(22), M2 = Stepwise(17), M3 = LASSO(12), M4 = Boosted tree(18)

Case Study
❑ Assessment methods
• 16 combinations: (Model 1, 2, 3 and 4) × (Inverse weighting and Subclassification) × (No Calibration and Rim weighting)
• 12 response variables
• Percentage of bias reduction
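The slide does not define the percentage of bias reduction. A common definition, assumed here, compares the absolute bias of the web estimate before and after adjustment, taking the reference survey estimate as the benchmark. The function name and the example figures are hypothetical.

```python
def pct_bias_reduction(unadjusted, adjusted, benchmark):
    """100 * (|bias before| - |bias after|) / |bias before|."""
    bias_before = abs(unadjusted - benchmark)
    bias_after = abs(adjusted - benchmark)
    if bias_before == 0:
        raise ValueError("unadjusted estimate already equals the benchmark")
    return 100.0 * (bias_before - bias_after) / bias_before


# Hypothetical proportions: web 0.50 before and 0.44 after adjustment,
# against a reference estimate of 0.40.
reduction = pct_bias_reduction(0.50, 0.44, 0.40)
```

Under this definition a value of 100 means the bias is removed entirely, 0 means no improvement, and a negative value means the adjustment made the estimate worse; the example above gives a reduction of roughly 60 percent.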

[Figure: Percentage of bias reduction. Labels: PSA alone, Calibration; Inverse weighting, Subclassification; M1, M2, M3, M4]

Discussion
• Why doesn’t PSA work well alone?
[Figure: Propensity scores for each survey in 5 strata in Model 1]

Discussion
❑ What are the possible solutions to fix poor PSA?
• Setting a maximum value for the weights
• Different subclassification algorithm
  - Formula for the variance of weights that depends on both the number of cases from each group within a stratum and the variability of propensity scores within the stratum
• Matching PSA
  - limited number of treated group members and a larger number of control group members
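The first option, capping the weights, can be sketched as follows. The slide names no specific trimming rule, so this assumes a simple one: cap each weight at a chosen maximum and redistribute the trimmed excess proportionally over the uncapped units so that the weight total is preserved. The function name and the cap value are hypothetical.

```python
def trim_weights(weights, cap, max_iter=50):
    """Cap weights at `cap` and spread the excess over the uncapped units."""
    w = list(weights)
    total = sum(w)
    if cap * len(w) < total:
        raise ValueError("cap is too low to preserve the weight total")
    for _ in range(max_iter):
        over = [i for i, wi in enumerate(w) if wi > cap]
        if not over:
            break
        for i in over:
            w[i] = cap
        free = [i for i, wi in enumerate(w) if wi < cap]
        free_total = sum(w[i] for i in free)
        if not free or free_total == 0:
            break
        # Scale the uncapped weights up so the overall total is restored.
        scale = 1.0 + (total - sum(w)) / free_total
        for i in free:
            w[i] *= scale
    return w


# One extreme PSA weight of 9 is capped at 6; the other three absorb the excess.
trimmed = trim_weights([1.0, 1.0, 1.0, 9.0], cap=6.0)
```

Trimming trades a little bias for lower variance: the extreme respondent no longer dominates the estimate, at the cost of departing from the weights the propensity model implies.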

Discussion
• Violation of some assumptions
  - ‘Strong ignorability assumption’
  - Missing at random (MAR)
  - Mode effects
• Variable selection (What are webographic variables?)
  - Models affect the performance of PSA significantly
  - Maybe expert knowledge is needed, not a purely statistical approach
  - Further studies are needed

Conclusion
• Web surveys have attractive advantages
• However, they suffer from bias due to self-selection, under-coverage and non-response
• According to my case study results,
  => It seems to be difficult to apply PSA to the “real world” just now
• Further research on webographic variables and different PSA methods is needed