- Number of slides: 34
Adjusting for coverage error in administrative sources in population estimation Owen Abbott Research, Development and Infrastructure Directorate
Agenda • Framework for population estimates • Where are we now • Census • Beyond 2011 • Recent work on estimation • Plans for future research • Summary
Introduction • Fundamental need for high quality population estimates (Local Authority by age and sex) • Currently obtained through decennial census and cohort component method for intervening years • Quality varies (and is not even) • This session reviews: • Making a population estimate with and without a census • Future plans for research
Framework for producing population estimates
Where are we now? • Reminder of how the framework was used in the 2011 Census • Outline of how we applied the framework with administrative data only
2011 Census framework for producing population estimates
The 2011 Census • Census Coverage Survey • Large (350k households) • Designed around expected coverage patterns • High quality matching • Automated and lots of clerical effort • Dual System Estimation • Bias adjustments: • Corrections for biases in the DSE • Overcoverage
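The slide names Dual System Estimation without stating the estimator. As a minimal sketch, assuming the textbook (Lincoln-Petersen) form and perfect matching, the estimate for one area is the census count times the survey count divided by the matched count. The function name and the counts below are hypothetical, not taken from the 2011 Census.

```python
def dual_system_estimate(n_census: int, n_survey: int, n_matched: int) -> float:
    """Textbook dual system estimate for a single area or stratum.

    Assumes the two lists are independent, matching is error-free and
    neither list contains erroneous records (assumptions, not ONS practice).
    """
    if n_matched <= 0:
        raise ValueError("need at least one matched record")
    return n_census * n_survey / n_matched


# Hypothetical counts: 900 census records, 450 coverage-survey records,
# 405 records found in both.
estimate = dual_system_estimate(900, 450, 405)
print(round(estimate))  # 1000
```

Production use would typically apply such an estimator within strata and then apply the bias adjustments listed on the slide; none of that is modelled here.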
Producing population estimates using linked administrative data
Population estimation without a census • Construction of SPDs from admin data • Reliance on matching • Developed rules using multiple sources • Large PCS similar to CCS • Can use web data collection • Similar estimation methodology • DSE based • Explored alternative weighting classes • Beginning to develop bias adjustments
Estimation methodology research • Coverage survey non-response is key issue • Have used DSE, but requires: • Accurate matching of persons • Independence • Overcoverage adjustments • Lots of other assumptions • Have been considering using weighting classes as an alternative
Estimation methodology research • Weighting Classes: • This approach requires addresses to be linked between survey and auxiliary • Then can use information about (survey) responding and non-responding addresses
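A weighting class adjustment of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the method used in the research: each sampled address is assumed to carry a class label derived from the linked auxiliary source, a design weight, a response flag and (for responders) a person count. All field names and figures are hypothetical.

```python
from collections import defaultdict


def weighting_class_estimate(addresses):
    """Population estimate with a weighting-class non-response adjustment.

    `addresses` is a list of dicts with hypothetical keys:
    'cls' (class from the auxiliary source), 'weight' (design weight),
    'responded' (bool) and 'persons' (count, used for responders only).
    """
    total_w = defaultdict(float)  # design weight of all sampled addresses
    resp_w = defaultdict(float)   # design weight of responding addresses
    for a in addresses:
        total_w[a["cls"]] += a["weight"]
        if a["responded"]:
            resp_w[a["cls"]] += a["weight"]

    for cls in total_w:
        if resp_w[cls] == 0:
            raise ValueError(f"class {cls!r} has no responding addresses")

    estimate = 0.0
    for a in addresses:
        if a["responded"]:
            # Responders are weighted up to represent non-responders in their class.
            adjustment = total_w[a["cls"]] / resp_w[a["cls"]]
            estimate += a["weight"] * adjustment * a["persons"]
    return estimate


sample = [
    {"cls": "A", "weight": 10, "responded": True, "persons": 2},
    {"cls": "A", "weight": 10, "responded": True, "persons": 3},
    {"cls": "A", "weight": 10, "responded": False, "persons": None},
    {"cls": "A", "weight": 10, "responded": False, "persons": None},
    {"cls": "B", "weight": 10, "responded": True, "persons": 1},
    {"cls": "B", "weight": 10, "responded": True, "persons": 4},
]
print(weighting_class_estimate(sample))  # 150.0
```

The adjustment removes non-response bias only to the extent that responders and non-responders within a class are alike, which is why the choice of classes matters.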
New developments - estimation
Plans for future research • 2021 Census • Administrative data based
Improvements for 2021 Census • Expand use of admin data collection • Aim to reduce variability in response rates • Use admin data to enhance base census data • NISRA did this in 2011 • Can use SPD construction ideas • Explore Weighting classes • What would 2011 estimate have looked like? • Revise sample design • Aim to reduce variability in quality across LAs
Further work on admin data based estimates • Continue to explore matching methods • Understanding and measuring matching error • Continue to learn more about key sources • List lag/inflation/cleaning/changes • Continue to explore ways of combining sources to construct SPDs • Develop signs of life indicators • Use of address register
Further work on alternative • Coverage survey • Sample design – clustered/unclustered? • Practicalities (e.g. timing) • Carry on work to explore estimation methodology • Comparing DSE vs Weighting Class • Performance in presence of matching error • Adjusting for erroneous inclusions • Adjusting for within-household non-response • Develop small area estimation method(s)
Key research questions • What will the coverage patterns be like in an online 2021 Census? • What are the coverage patterns in the evolving SPDs? • Where does administrative (or other) data have the most benefit (cost/quality)?
Summary • Population estimates are the key outputs • Need to focus on how these are delivered from an online census • AND carry on developing potential administrative based methods • Understanding and influencing the underlying coverage patterns is critical
Discussant Li-Chun Zhang University of Southampton & Statistics Norway
Population size estimation • Internationally speaking • England & Wales: options so far explored • Trimmed Dual-System Estimation (TDSE) • Modelling erroneous enumerations • Census 2021 and Beyond 2021
Internationally speaking • Register-based population counts • Negligible cost; no field work • ‘Near-perfect’ Central Population Register (CPR) • “Traditional” census • Census enumeration + 2 coverage surveys • Independent sample for under-coverage adjustment • Dependent sample for over-coverage adjustment • In-between • CPR-enumeration + 2 coverage surveys • Can afford much larger surveys
England & Wales • Dependent sampling of records from SPD deemed infeasible • Dependent sampling of addresses/postcodes from SPD deemed feasible • Independent under-coverage survey cannot yield valid “type 4” over-coverage estimates • “Type 4”: erroneous inclusion
Options explored: SPD, Weighting, DSE
Trimmed DSE (TDSE) • Score selection of SPD records → k • PCS matching → k = (k1, k0) • TDSE
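The slide does not give the TDSE formula. The sketch below assumes the form n(x - k)/(m - k1) used in the trimmed DSE literature, where x is the SPD count, n the PCS count, m the number of matches, k the number of trimmed SPD records, k1 the trimmed records that match the PCS and k0 those that do not. The symbols x, n and m and all counts are assumptions made for illustration.

```python
def trimmed_dse(x: int, n: int, m: int, k1: int, k0: int) -> float:
    """Trimmed dual system estimate, assuming the form n * (x - k) / (m - k1).

    x  : records on the SPD (may include erroneous enumerations)
    n  : records in the coverage survey (PCS)
    m  : SPD records matched to the PCS
    k1 : trimmed SPD records that matched the PCS
    k0 : trimmed SPD records that did not match the PCS
    """
    k = k1 + k0
    if k > x or k1 > m:
        raise ValueError("cannot trim more records than are available")
    if m - k1 <= 0:
        raise ValueError("no matched records left after trimming")
    return n * (x - k) / (m - k1)


# Hypothetical area with true size 1000: the SPD holds 900 genuine and
# 20 erroneous records, the PCS holds 900 records, 810 records match.
print(round(trimmed_dse(920, 900, 810, k1=0, k0=0)))   # 1022 (untrimmed DSE)
# Trim 29 scored records, 8 of which turn out to match the PCS.
print(round(trimmed_dse(920, 900, 810, k1=8, k0=21)))  # 1000
```

With no trimming the function reduces to the ordinary DSE, which is biased upwards by the erroneous SPD records; trimming mostly erroneous records pulls the estimate back towards the true size.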
TDSE: an illustration
TDSE: N=1000, high-quality scenario • Scoring rate: P(erroneous) high, say, 70% • Catch rate (PCS, SPD): high, say, 90% • Erroneous SPD enumeration: low, say, 2%
Stopping rule: r=50, N=1000
Stopping rule: r=250, N=1000
Stopping rule in expectation: N=1000
Scenario | Rates (%) | Initial DSE | Stoppage TDSE | Ideal SD(DSE) | Approx SD(TDSE) | No. errors | Expected selection
(1) | 70, 90 | 1022 | 1001 | 4 | 4 | 20 | 29
(2) | 70, 90 | 1056 | 1001 | 4 | 4 | 50 | 71
 | 70, 75, 70 | 1071 | 1001 | 12 | 13 | 50 | 71
 | 30, 75, 70 | 1071 | 1000 | 12 | 15 | 50 | 167
(3) | 70, 90 | 1278 | 1001 | 4 | 5 | 250 | 357
 | 70, 75 | 1332 | 1000 | 10 | 14 | 250 | 357
 | 30, 75, 70 | 1357 | 1006 | 12 | 51 | 250 | 833
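The Initial DSE and Expected selection figures in the stopping-rule table can be checked with a short calculation. This is a reading of the table, not something the slides state: it assumes the first rate is the scoring rate, the remaining rates are the PCS and SPD catch rates (a single value meaning both), the erroneous records sit only on the SPD and never match, and the two lists are independent. Function names are hypothetical.

```python
def expected_initial_dse(pop: int, errors: int, pcs_catch: float, spd_catch: float) -> float:
    """Expected untrimmed DSE when the SPD carries `errors` erroneous records."""
    spd_count = pop * spd_catch + errors   # genuine plus erroneous SPD records
    pcs_count = pop * pcs_catch            # survey records
    matched = pop * pcs_catch * spd_catch  # matches under independence
    return spd_count * pcs_count / matched  # equals pop + errors / spd_catch


def expected_selection(errors: int, scoring_rate: float) -> float:
    """Expected number of scored records needed to remove all erroneous ones."""
    return errors / scoring_rate


# (scoring rate, PCS catch, SPD catch, number of errors), read off the table rows.
rows = [
    (0.70, 0.90, 0.90, 20),
    (0.70, 0.90, 0.90, 50),
    (0.70, 0.75, 0.70, 50),
    (0.30, 0.75, 0.70, 50),
    (0.70, 0.90, 0.90, 250),
    (0.70, 0.75, 0.75, 250),
    (0.30, 0.75, 0.70, 250),
]
for score, pcs, spd, r in rows:
    dse = expected_initial_dse(1000, r, pcs, spd)
    sel = expected_selection(r, score)
    print(f"errors={r:3d}  initial DSE={dse:7.1f}  expected selection={sel:6.1f}")
```

Under these assumptions the calculation agrees with the table's Initial DSE column to within one unit and with the Expected selection column after rounding. The Stoppage TDSE and SD columns depend on the stopping rule itself and are not reproduced here.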
Modelling erroneous counts: 2021 • Model-A: P(erroneous | in Census and T-SPD) = P(erroneous | in Census but not in T-SPD) * P(erroneous | in T-SPD but not in Census) • Model-B: P(erroneous | in Census and T-SPD) = P(erroneous | in Census) * P(erroneous | in T-SPD) • (Can be fitted with PCS in addition)
Discrimination: Model A (left) B (right)
Beyond 2021 option: unwinding SPD? • SPD has multiple input datasets • Unwinding SPD, say, • SPD-I = PR, somewhat trimmed • SPD-II = everything else, somewhat trimmed • Less stringent model assumptions? SPD-III (Analogous: independence vs. null 2nd-order interaction)
Discrimination: Model A (left) B (right)
Investigations forward • Premise: no dependent sampling? • Weighting class adjustment • Nonresponse bias after reweighting acceptable? • SPDs: trimming & scoring • Connecting SPDs and TDSE-modelling • Early-stoppage once model captures remaining bias • Improve efficiency via bias-adjusted TDSE • Use SPDs to improve census 2021 estimates • Small-area smoothing of adjustments? • Future population statistics without census


