
  • Number of slides: 34

Adjusting for coverage error in administrative sources in population estimation
Owen Abbott
Research, Development and Infrastructure Directorate

Agenda
  • Framework for population estimates
  • Where are we now
  • Census
  • Beyond 2011
  • Recent work on estimation
  • Plans for future research
  • Summary

Introduction
  • Fundamental need for high quality population estimates (Local Authority by age and sex)
  • Currently obtained through decennial census and cohort component method for intervening years
  • Quality varies (and is not even)
  • This session reviews:
      • Making a population estimate with and without a census
      • Future plans for research

Framework for producing population estimates

Where are we now?
  • Reminder of how the framework was used in the 2011 Census
  • Outline how we applied the framework with administrative data only

2011 Census framework for producing population estimates

The 2011 Census
  • Census Coverage Survey
  • Large (350k households)
  • Designed around expected coverage patterns
  • High quality matching
  • Automated and lots of clerical effort
  • Dual System Estimation
  • Bias adjustments:
      • Corrections for biases in the DSE
      • Overcoverage
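As a rough illustration of the Dual System Estimation step named above, the sketch below uses the basic Lincoln-Petersen form: census count times coverage survey count, divided by the number of matched records. It is a minimal sketch only. The function name and the counts are hypothetical, and the production method applies the estimator within sampled areas and adds the bias and overcoverage adjustments listed on the slide, none of which are modelled here.

```python
def dual_system_estimate(census_count: int, survey_count: int, matched_count: int) -> float:
    """Basic dual system (Lincoln-Petersen) estimate of population size.

    Assumes the two lists are independent, matching is error-free and
    neither list contains erroneous records.
    """
    if matched_count <= 0:
        raise ValueError("need at least one matched record")
    return census_count * survey_count / matched_count


# Hypothetical counts for one estimation area (not taken from the presentation).
estimate = dual_system_estimate(census_count=900, survey_count=450, matched_count=405)
print(round(estimate))
```

The estimator is only as good as its assumptions, which is why accurate matching and the later bias adjustments matter so much in practice.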

Producing population estimates using linked administrative data

Population estimation without a census
  • Construction of SPDs from admin data
  • Reliance on matching
  • Developed rules using multiple sources
  • Large PCS similar to CCS
  • Can use web data collection
  • Similar estimation methodology
  • DSE based
  • Explored alternative weighting classes
  • Beginning to develop bias adjustments

Estimation methodology research
  • Coverage survey non-response is key issue
  • Have used DSE, but requires:
      • Accurate matching of persons
      • Independence
      • Overcoverage adjustments
      • Lots of other assumptions
  • Have been considering using weighting classes as an alternative

Estimation methodology research
  • Weighting Classes:
      • This approach requires addresses to be linked between survey and auxiliary data
      • Then can use information about (survey) responding and non-responding addresses
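The weighting class idea can be sketched as follows. Sampled addresses are grouped into classes using auxiliary information that is known for responders and non-responders alike, and the weights of responding addresses are inflated by the ratio of sampled to responding weight within each class. This is a generic textbook non-response adjustment, not necessarily the estimator used in the research described here. The field names (`cls`, `weight`, `responded`, `persons`) and the data are hypothetical.

```python
from collections import defaultdict


def weighting_class_total(addresses):
    """Estimate a population total with a weighting class non-response adjustment.

    Each address is a dict with hypothetical keys:
      cls       - weighting class derived from auxiliary (e.g. admin) data
      weight    - design weight of the sampled address
      responded - True if the survey obtained a response
      persons   - persons counted at the address (responders only)
    """
    sampled = defaultdict(float)
    responding = defaultdict(float)
    for a in addresses:
        sampled[a["cls"]] += a["weight"]
        if a["responded"]:
            responding[a["cls"]] += a["weight"]

    # A class with no responders cannot be adjusted; classes would need merging.
    for cls in sampled:
        if responding[cls] == 0:
            raise ValueError(f"no responding addresses in class {cls!r}")

    total = 0.0
    for a in addresses:
        if a["responded"]:
            adjustment = sampled[a["cls"]] / responding[a["cls"]]
            total += a["weight"] * adjustment * a["persons"]
    return total


# Hypothetical sample: two classes, half of the addresses respond in each.
sample = [
    {"cls": "flat", "weight": 10.0, "responded": True, "persons": 2},
    {"cls": "flat", "weight": 10.0, "responded": True, "persons": 3},
    {"cls": "flat", "weight": 10.0, "responded": False, "persons": None},
    {"cls": "flat", "weight": 10.0, "responded": False, "persons": None},
    {"cls": "house", "weight": 10.0, "responded": True, "persons": 4},
    {"cls": "house", "weight": 10.0, "responded": False, "persons": None},
]
print(weighting_class_total(sample))
```

The adjustment removes non-response bias only to the extent that responders and non-responders within a class are alike, so the choice of classes carries most of the weight.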

New developments - estimation

Plans for future research
  • 2021 Census
  • Administrative data based

Improvements for 2021 Census
  • Expand use of admin data collection
  • Aim to reduce variability in response rates
  • Use admin data to enhance base census data
  • NISRA did this in 2011
  • Can use SPD construction ideas
  • Explore Weighting classes
  • What would 2011 estimate have looked like?
  • Revise sample design
  • Aim to reduce variability in quality across LAs

Further work on admin data based estimates
  • Continue to explore matching methods
  • Understanding and measuring matching error
  • Continue to learn more about key sources
  • List lag/inflation/cleaning/changes
  • Continue to explore ways of combining sources to construct SPDs
  • Develop signs of life indicators
  • Use of address register

Further work on alternative
  • Coverage survey
  • Sample design – clustered/unclustered?
  • Practicalities (e.g. timing)
  • Carry on work to explore estimation methodology
      • Comparing DSE vs Weighting Class
      • Performance in presence of matching error
      • Adjusting for erroneous inclusions
      • Adjusting for within-household non-response
  • Develop small area estimation method(s)

Key research questions
  • What will the coverage patterns be like in an online 2021 Census?
  • What are the coverage patterns in the evolving SPDs?
  • Where does administrative (or other) data have the most benefit (cost/quality)?

Summary
  • Population estimates are the key outputs
  • Need to focus on how these are delivered from an online census
  • AND carry on developing potential administrative based methods
  • Understanding and influencing the underlying coverage patterns is critical

Discussant
Li-Chun Zhang
University of Southampton & Statistics Norway

Population size estimation
  • Internationally speaking
  • England & Wales: options so far explored
      • Trimmed Dual-System Estimation (TDSE)
      • Modelling erroneous enumerations
  • Census 2021 and Beyond 2021

Internationally speaking
  • Register-based population counts
      • Negligible cost; no field work
      • ‘Near-perfect’ Central Population Register (CPR)
  • “Traditional” census
      • Census enumeration + 2 coverage surveys
      • Independent sample for under-coverage adjustment
      • Dependent sample for over-coverage adjustment
  • In-between
      • CPR-enumeration + 2 coverage surveys
      • Can afford much larger surveys

England & Wales
  • Dependent sampling of records from SPD deemed infeasible
  • Dependent sampling of addresses/postcodes from SPD deemed feasible
  • Independent under-coverage survey cannot yield valid “type 4” over-coverage estimates
      • “Type 4”: erroneous inclusion

Options explored: SPD, Weighting, DSE

Trimmed DSE (TDSE)
  • Score selection of SPD records → k
  • PCS matching → k = (k1, k0)
  • TDSE
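The slide's own TDSE formula did not survive extraction, so the sketch below shows only one natural reading of the three steps: score and remove k records from the SPD, split them by PCS matching into k1 (matched, so presumably genuine) and k0 (unmatched), and apply the ordinary dual system estimator to the trimmed list. The reading of k1 and k0, the function name and all counts are assumptions made for illustration.

```python
def trimmed_dse(spd_count: int, pcs_count: int, matched_count: int, k1: int, k0: int) -> float:
    """Dual system estimate after trimming k = k1 + k0 scored records from the SPD.

    k1: trimmed records that matched the PCS (assumed reading of the slide)
    k0: trimmed records that did not match the PCS
    """
    trimmed_list = spd_count - (k1 + k0)
    trimmed_matches = matched_count - k1
    if trimmed_matches <= 0:
        raise ValueError("no matched records left after trimming")
    return trimmed_list * pcs_count / trimmed_matches


# Hypothetical setting: true population 1000, SPD holds 900 genuine records
# plus 50 erroneous ones, PCS catches 900 people, 810 records match.
untrimmed = trimmed_dse(950, 900, 810, k1=0, k0=0)
# Trim 70 scored records: 18 matched the PCS, 52 did not (50 of them erroneous).
trimmed = trimmed_dse(950, 900, 810, k1=18, k0=52)
print(round(untrimmed), round(trimmed))
```

In this constructed example the untrimmed estimate is inflated by the erroneous SPD records, and trimming removes the inflation at the cost of discarding a few genuine records.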

TDSE: an illustration

TDSE: N=1000, high-quality scenario
  • Scoring rate: P(erroneous) high, say, 70%
  • Catch rate (PCS, SPD): high, say, 90%
  • Erroneous SPD enumeration: low, say, 2%

Stopping rule: r=50, N=1000

Stopping rule: r=250, N=1000

Stopping rule in expectation: N=1000

      Rates (%)    Initial DSE  Stoppage TDSE  Ideal SD(DSE)  Approx SD(TDSE)  No. errors  Expected selection
(1)   70, 90       1022         1001           4              4                20          29
(2)   70, 90       1056         1001           4              4                50          71
      70, 75, 70   1071         1001           12             13               50          71
      30, 75, 70   1071         1000           12             15               50          167
(3)   70, 90       1278         1001           4              5                250         357
      70, 75       1332         1000           10             14               250         357
      30, 75, 70   1357         1006           12             51               250         833
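Two columns of this table can be reproduced with simple arithmetic, under a reading that is inferred from the numbers and not stated on the slide: the first rate is the scoring rate, the last is the SPD catch rate, the expected initial DSE is N inflated by the ratio of all SPD records to genuine SPD records, and the expected selection is the number of errors divided by the scoring rate. The function names are hypothetical. The row listed with rates "70, 75" is left out because its reading is ambiguous.

```python
def expected_initial_dse(n_pop: float, spd_catch: float, n_errors: float) -> float:
    """Expected DSE when the SPD holds n_errors erroneous, never-matched records."""
    genuine_in_spd = n_pop * spd_catch
    return n_pop * (genuine_in_spd + n_errors) / genuine_in_spd


def expected_selection(n_errors: float, scoring_rate: float) -> float:
    """Expected number of records to score and trim before all errors are removed."""
    return n_errors / scoring_rate


# (scoring rate, SPD catch rate, number of errors), as read from the table rows.
rows = [
    (0.70, 0.90, 20),
    (0.70, 0.90, 50),
    (0.70, 0.70, 50),
    (0.30, 0.70, 50),
    (0.70, 0.90, 250),
    (0.30, 0.70, 250),
]
for scoring, spd_catch, errors in rows:
    dse = expected_initial_dse(1000, spd_catch, errors)
    sel = expected_selection(errors, scoring)
    print(f"scoring={scoring:.0%} catch={spd_catch:.0%} errors={errors}: "
          f"initial DSE {round(dse)}, expected selection {round(sel)}")
```

Under this reading the rounded values agree with the Initial DSE and Expected selection columns for the six rows used, which suggests that the cost of a low scoring rate shows up mainly as a larger selection and not as a worse starting point.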

Modelling erroneous counts: 2021
  • Model-A: P(erroneous | in Census and T-SPD) = P(erroneous | in Census but not in T-SPD) * P(erroneous | in T-SPD but not in Census)
  • Model-B: P(erroneous | in Census and T-SPD) = P(erroneous | in Census) * P(erroneous | in T-SPD)
    (Can be fitted with PCS in addition)

Discrimination: Model A (left) B (right)

Beyond 2021 option: unwinding SPD?
  • SPD has multiple input datasets
  • Unwinding SPD, say,
      • SPD-I = PR, somewhat trimmed
      • SPD-II = everything else, somewhat trimmed
  • Less stringent model assumptions? SPD-III (Analogous: independence vs. null 2nd-order interaction)

Discrimination: Model A (left) B (right)

Investigations forward
  • Premise: no dependent sampling?
  • Weighting class adjustment
      • Nonresponse bias after reweighting acceptable?
  • SPDs: trimming & scoring
  • Connecting SPDs and TDSE-modelling
      • Early-stoppage once model captures remaining bias
      • Improve efficiency via bias-adjusted TDSE
  • Use SPDs to improve census 2021 estimates
  • Small-area smoothing of adjustments?
  • Future population statistics without census