Slide 1: UNECE Workshop on Census Technology for SPECA and CIS member countries (Astana, 7-8 June 2007)
Technology for census data coding, editing and imputation
Paolo Valente (UNECE)
Paolo Valente - UNECE Statistical Division

Slide 2: Content
1. Coding
2. Editing and imputation
Reference material:
- Handbook on Census Management for Population and Housing Censuses (Chapter IV, sections D-F)
- Handbook on Population and Housing Census Editing

Slide 3: 1. Census data coding
Questions:
1. How did you code the data in the last census?
2. Were you satisfied or not with coding?
3. What problems did you find in coding?
4. Any problems with specific [...]

Slide 4: Census data coding
- Data coding = assigning classification codes to the responses written on the census form
- Coding systems:
  a) Manual
  b) Computer-assisted
  c) Automatic
  d) Mix of a), b) and/or c)
- Coding methodologies:
  a) Simple (1 or 2 words): e.g. birth place
  b) Structured (more than 1 question): e.g. occupation
  c) Hierarchical: e.g. address

Slide 5: Manual data coding
- Clerks identify the code using “code books” and write it on the census form for later processing
- Pros:
  - Easy to implement
  - No technology needed
- Cons:
  - Time-consuming
  - Labor-intensive
  - Risk of inconsistency

Slide 6: Computer-assisted coding
- Assisted by a computerized system
- Computer-based code books
- How it works:
  1) The coder types only a few characters
  2) The system displays the list of matching entries
  3) The coder chooses the right code
  4) The code is automatically recorded by the system
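The four steps on Slide 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code book entries, the codes and the function names `matching_entries` and `record_code` are hypothetical, and real systems (CSPro, for example) provide their own lookup facilities.

```python
# Hypothetical computer-based code book: code -> label (made-up codes, not a real classification).
CODE_BOOK = {
    "101": "General medical practitioner",
    "102": "Nurse",
    "201": "Primary school teacher",
    "202": "Pre-school teacher",
}


def matching_entries(typed, code_book=CODE_BOOK):
    """Steps 1-2: return (code, label) pairs in which some word starts with the typed characters."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    return sorted(
        (code, label)
        for code, label in code_book.items()
        if any(word.startswith(prefix) for word in label.lower().split())
    )


def record_code(record, field, chosen_code, code_book=CODE_BOOK):
    """Steps 3-4: store the code chosen by the coder, rejecting codes that are not in the code book."""
    if chosen_code not in code_book:
        raise ValueError(f"unknown code: {chosen_code}")
    record[field] = chosen_code
    return record


# The coder types "tea", sees the candidate list, and picks one entry.
candidates = matching_entries("tea")
person = record_code({"person_id": 1}, "occupation_code", candidates[0][0])
print(candidates, person)
```

Because the coder picks from entries that already exist in the code book, invalid codes cannot be recorded, which is one source of the quality gain compared with manual coding.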

Slide 7: Computer-assisted coding
- Pros:
  - Efficiency
  - Good quality
  - Particularly suitable for structured coding (possibility to include coding rules)
- Cons:
  - Relatively complex system
  - Long time needed for development
  - Relatively high cost

Slide 8: Automatic coding
- Based on computerized algorithms
- No human intervention
- Text captured by ICR (intelligent character recognition) and matched against indexes
- A score is assigned by the system to the matched response:
  - If the score is above a certain level, the response is accepted
  - If the score is below that level, human intervention is needed (computer-assisted coding)
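The accept-or-refer rule on Slide 8 might look as follows. This is a minimal sketch, not the algorithm of ACTR or any other production system: the index, the 0.85 threshold and the use of `difflib.SequenceMatcher` as the similarity score are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical index: reference text -> classification code.
INDEX = {
    "kazakhstan": "KZ",
    "kyrgyzstan": "KG",
    "tajikistan": "TJ",
    "uzbekistan": "UZ",
}
ACCEPT_THRESHOLD = 0.85  # assumed acceptance level; real systems tune this per variable


def auto_code(response, index=INDEX, threshold=ACCEPT_THRESHOLD):
    """Match captured text against the index and return (code, score, status)."""
    text = response.strip().lower()
    best_entry, best_score = None, 0.0
    for entry in index:
        score = SequenceMatcher(None, text, entry).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return index[best_entry], best_score, "accepted"
    # Below the threshold: refer the response to computer-assisted coding.
    return None, best_score, "manual"


for captured in ["Kazakhstan", "Kazakstan", "abroad"]:
    print(captured, auto_code(captured))
```

A misspelled response such as "Kazakstan" still scores high enough to be accepted, while a response with no close index entry is referred to a human coder.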

Slide 9: Automatic coding
- Matching rates depend on the algorithms used and the type of variable
- Maximum matching rates in ideal circumstances:
  - For simple variables (birth place), approx. 80%
  - For complex variables (occupation, industry), approx. 50%
- All responses not matched have to be processed with computer-assisted coding
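The matching rates on Slide 9 translate directly into the residual workload for computer-assisted coding. The short calculation below uses the approximate maximum rates quoted on the slide; the total of one million responses is a hypothetical figure chosen only to make the arithmetic concrete.

```python
# Approximate maximum matching rates from the slide, in percent.
MAX_MATCH_RATE_PCT = {"birth place": 80, "occupation": 50}

TOTAL_RESPONSES = 1_000_000  # hypothetical number of write-in responses per variable


def residual_workload(total_responses, match_rate_pct):
    """Responses left for computer-assisted coding after automatic matching."""
    return total_responses * (100 - match_rate_pct) // 100


for variable, pct in MAX_MATCH_RATE_PCT.items():
    print(variable, residual_workload(TOTAL_RESPONSES, pct))
```

Even in ideal circumstances, half of the occupation responses in this example would still need a human coder, so automatic coding reduces but does not remove the need for a computer-assisted system.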

Slide 10: Automatic coding
- Pros:
  - High efficiency
  - Good quality (if the system is developed accurately)
  - Consistency
  - Particularly suitable for structured coding (possibility to include coding rules)
- Cons:
  - Very complex system
  - Long time needed for development
  - High cost
  - Risk of systematic errors in case of faults in the matching algorithms or indexes

Slide 11: Coding: practices in the 2000 round
- In general, CIS countries used manual coding
- About half of UNECE countries used automatic coding, in combination with computer-assisted or manual coding
- In most cases software was developed in-house
- Software for automatic coding:
  - ACTR (Automated Coding by Text Recognition), developed by Statistics Canada, also used by Italy and the UK
- Integrated software system, including computer-assisted coding: CSPro (US Census Bureau)
See “Measuring Population and Housing”, Chapter III

Slide 12: Coding in the 2010 census round
Questions:
1. What are your plans for coding the data of the next census?
2. Are you considering computer-assisted coding?
3. Why? ...or why NOT?

Slide 13: 2. Editing and imputation
Questions on editing:
1. Which data did you edit in the last census?
2. How did you edit the data?
3. Did you have any problems?

Slide 14: 2. Editing and imputation
Questions on imputation:
1. Did you impute any missing data?
If yes:
2. For which variables?
3. What method and software did you use?
4. Did you produce statistics on imputation rates?

Slide 15: Editing and imputation
- Editing = detecting and correcting errors in census data
- Imputation = assigning values to missing data
- The two concepts are related and the two terms are sometimes used in different ways

Slide 16: Editing and imputation
- Different types of errors:
  - Coverage errors (e.g. omissions, duplicates)
  - Enumerator errors
  - Respondent errors
  - Coding errors
  - Data entry errors
  but also…
  - Editing errors!

Slide 17: Editing and imputation
- It is important not only to detect errors, but also to identify their causes, in order to take appropriate measures and improve overall quality
- Objectives of editing and imputation:
  - Improve the quality of census data
  - Facilitate analysis of census data
  - Identify types and sources of errors

Slide 18: Editing and imputation
- Dilemma: what should be edited and what should NOT be edited?
- Complex editing systems can be difficult and expensive to implement, and in some cases may introduce distortions
Go for a relatively simple editing system!

Slide 19: Editing and imputation
In general, the editing system should be:
- Minimalist (only obvious errors)
- Automated (as much as possible)
- Systematic
- Compliant with other NSI procedures
- Compliant with international standards

Slide 20: Editing and imputation
General guidelines for editing:
- Make the fewest required changes possible
- Eliminate obvious inconsistencies
- Supply entries for erroneous or missing items by using other entries for the housing unit, person, or other persons in the household or comparable group as a guide

Slide 21: Editing and imputation
Example of inconsistent information 1:
- Reference person and spouse have the same sex

Slide 22: Editing and imputation
Example of inconsistent information 2:
- Excessive age difference between mother and children
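The two examples of inconsistent information on Slides 21 and 22 can be expressed as household-level checks. The sketch below is illustrative only: the record layout (`id`, `relationship`, `sex`, `age`, `mother_id`) and the age-gap bounds of 12 and 50 years are assumptions, not values taken from the presentation or the handbooks.

```python
# Assumed plausibility bounds for the mother-child age difference, in years.
MIN_MOTHER_CHILD_GAP = 12
MAX_MOTHER_CHILD_GAP = 50


def check_household(persons):
    """Return a list of (edit_name, person_id, other_person_id) for each failed check."""
    failures = []
    by_id = {p["id"]: p for p in persons}

    # Example 1: reference person and spouse report the same sex.
    reference = next((p for p in persons if p["relationship"] == "reference"), None)
    spouse = next((p for p in persons if p["relationship"] == "spouse"), None)
    if reference and spouse and reference["sex"] == spouse["sex"]:
        failures.append(("same_sex_spouses", reference["id"], spouse["id"]))

    # Example 2: age difference between mother and child outside the assumed bounds.
    for child in persons:
        mother = by_id.get(child.get("mother_id"))
        if mother is None:
            continue
        gap = mother["age"] - child["age"]
        if not MIN_MOTHER_CHILD_GAP <= gap <= MAX_MOTHER_CHILD_GAP:
            failures.append(("mother_child_age_gap", mother["id"], child["id"]))
    return failures


household = [
    {"id": 1, "relationship": "reference", "sex": "male", "age": 75},
    {"id": 2, "relationship": "spouse", "sex": "female", "age": 70},
    {"id": 3, "relationship": "child", "sex": "female", "age": 5, "mother_id": 2},
]
print(check_household(household))
```

The function only reports failed checks; deciding which of the conflicting items to change is a separate step.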

Slide 23: Editing and imputation
Editing approaches:
- Top-down: items are edited in sequence, from first to last
- Multiple-variable (Fellegi-Holt):
  1. A set of statements and relationships among variables is checked in the household
  2. The edit keeps track of all false statements
  3. The system assesses how best to change the data
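The three steps of the multiple-variable approach can be illustrated with a heavily simplified sketch. It checks a set of edit statements, keeps track of the failed ones, and then searches by brute force for the smallest set of fields that is involved in every failed edit. This is not a full Fellegi-Holt implementation (which also derives implied edits before localizing errors); the edit rules, field names and the minimum age of 15 are hypothetical.

```python
from itertools import combinations

# Each edit: (name, fields involved, predicate that is True when the record is consistent).
EDITS = [
    ("married_min_age", ("age", "marital_status"),
     lambda r: not (r["marital_status"] == "married" and r["age"] < 15)),
    ("employed_min_age", ("age", "employment"),
     lambda r: not (r["employment"] == "employed" and r["age"] < 15)),
]


def failed_edits(record, edits=EDITS):
    """Steps 1-2: evaluate every statement and keep track of the false ones."""
    return [(name, fields) for name, fields, holds in edits if not holds(record)]


def fields_to_change(record, edits=EDITS):
    """Step 3: smallest set of fields that touches every failed edit (brute-force search)."""
    failed = failed_edits(record, edits)
    if not failed:
        return ()
    candidates = sorted({field for _, fields in failed for field in fields})
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(set(fields) & set(subset) for _, fields in failed):
                return subset


record = {"age": 8, "marital_status": "married", "employment": "employed"}
print(failed_edits(record))
print(fields_to_change(record))
```

In this example both failed statements involve `age`, so changing that single field is preferred over changing both `marital_status` and `employment`, in line with the guideline of making the fewest changes possible.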

Slide 24: Editing and imputation
Imputation methods:
- Static imputation (or “cold deck”)
  - Used mainly for missing values
  - Value assigned from a predetermined set, or from a distribution of valid responses
  -> The set of values does not change over time
- Dynamic imputation (or “hot deck”)
  - Used for missing or inconsistent values
  - Value assigned from a “donor” with similar characteristics, which changes constantly
  -> Response imputations change over time
See “Handbook on Census Editing”, Ch. II.E and Annex V
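The difference between the two imputation methods on Slide 24 is easiest to see side by side. In the sketch below the cold deck is a fixed lookup table, while the hot deck starts from the same table and is overwritten by each valid record as the file is processed, so the donor keeps changing. The age groups, the deck values and the field names are hypothetical, and the sequential donor rule is only one of several possible hot-deck designs.

```python
def age_group(age):
    """Assumed matching characteristic: broad age group."""
    if age < 15:
        return "0-14"
    return "15-64" if age < 65 else "65+"


# Hypothetical predetermined values for marital status, by age group.
COLD_DECK = {"0-14": "never married", "15-64": "married", "65+": "widowed"}


def impute_cold_deck(records, deck=COLD_DECK):
    """Static imputation: the set of values never changes."""
    result = []
    for rec in records:
        rec = dict(rec)
        if rec["marital_status"] is None:
            rec["marital_status"] = deck[age_group(rec["age"])]
            rec["imputed"] = True
        result.append(rec)
    return result


def impute_hot_deck(records, initial_deck=COLD_DECK):
    """Dynamic imputation: the latest valid record in each group becomes the donor."""
    deck = dict(initial_deck)
    result = []
    for rec in records:
        rec = dict(rec)
        group = age_group(rec["age"])
        if rec["marital_status"] is None:
            rec["marital_status"] = deck[group]
            rec["imputed"] = True
        else:
            deck[group] = rec["marital_status"]  # update the donor value
        result.append(rec)
    return result


def imputation_rate(records):
    """Share of records that received an imputed value."""
    return sum(1 for r in records if r.get("imputed")) / len(records)


records = [
    {"age": 30, "marital_status": "single"},
    {"age": 40, "marital_status": None},
    {"age": 35, "marital_status": "divorced"},
    {"age": 50, "marital_status": None},
]
cold = impute_cold_deck(records)
hot = impute_hot_deck(records)
print([r["marital_status"] for r in cold])
print([r["marital_status"] for r in hot])
print(imputation_rate(hot))
```

With the cold deck both missing values receive the same predetermined answer, whereas with the hot deck each missing value takes the answer of the most recent donor in the same age group. The `imputation_rate` helper corresponds to the statistics on imputation rates raised in the questions on Slide 14.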

Slide 25: Editing and imputation
- Types of edits:
  - Fatal edits identify errors with certainty
  - Query edits identify suspected errors
- Structure edits
  - Check coverage and relations between different units: persons, households, housing units, enumeration areas etc.
- Edits for population and housing items
See “Handbook on Census Editing”, Chapters III, IV and V
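The distinction between fatal and query edits on Slide 25 can be sketched as two separate result lists for each record. The valid codes and the age limit of 105 are assumptions chosen for illustration; an actual census would take them from its own edit specifications.

```python
VALID_SEX = {"male", "female"}  # assumed valid codes for the sex item
QUERY_MAX_AGE = 105             # assumed age above which a value is suspicious, not impossible


def classify_edits(person):
    """Return failed edits split into 'fatal' (certain errors) and 'query' (suspected errors)."""
    fatal, query = [], []

    if person.get("sex") not in VALID_SEX:
        fatal.append("sex code missing or outside valid set")

    age = person.get("age")
    if age is None or age < 0:
        fatal.append("age missing or negative")
    elif age > QUERY_MAX_AGE:
        query.append("age unusually high")

    return {"fatal": fatal, "query": query}


print(classify_edits({"sex": "female", "age": 112}))
print(classify_edits({"sex": "x", "age": -3}))
```

Keeping the two lists separate lets a system correct fatal failures automatically while routing query failures to review or leaving them unchanged, consistent with the minimalist approach recommended on Slide 19.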

Slide 26: Editing and imputation
Practices in the 2000 round
- Most ECE countries (33 out of 40) performed computer-supported editing, including several CIS countries
- 22 countries performed automatic imputations
- Most countries developed specific software
- Some countries used SAS, Oracle, SQL, CSPro
See “Measuring Population and Housing”, Chapter III

Slide 27: Editing and imputation
Plans for the 2010 round
Questions:
- What are your plans for editing and imputation?
- What editing approaches/methods are you considering?

Slide 28: Editing and imputation
Plans for the 2010 round
Questions:
- For which variables would you consider imputation of missing values?