Quality Metrics for Assessing the Impact of Editing and Imputation on Economic Data
Broderick E. Oliver and Katherine Jenny Thompson
Office of Statistical Methods and Research for Economic Programs

Outline
• Motivation for the study
• Quality Metrics (Formulas)
• Quality Metrics (Actual Results)
• Future Research

Motivation
The Economic Directorate conducted a series of studies to evaluate the editing efficiency of selected surveys and censuses.
1. What value is added from subjecting the same record to multiple editing phases?
2. What is the impact of editing and imputation on the final data?

Development of Quality Metrics
• Assess overall changes to “reported” data at the:
  – Micro level
  – Macro level
• Examine:
  – the size of change to reported data
  – the source of change to reported data
• Determine which changes had the greatest impact on final tabulations

Key Terms
• Critical Item
• Reported Data
• Final Data
• Flag

Metric 1
• Item Level (Critical Items)
• Percentage of records with reported values whose value was changed by editing/imputation:
  $\text{Metric 1} = \frac{\sum_{i=1}^{n} y_i}{n} \times 100$
• Where:
  – $y_i = 1$ if reported value ≠ final value, 0 otherwise
  – $n$ = number of records
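The Metric 1 definition can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name, the use of `None` to mark a missing reported value, and the sample data are all assumptions made for the example.

```python
def metric1_percent_changed(reported, final):
    """Percentage of records with a reported value whose final value differs.

    `reported` and `final` are parallel lists for one critical item.
    A reported value of None (hypothetical convention) means the item
    was not reported, so the record is left out of n.
    """
    pairs = [(r, f) for r, f in zip(reported, final) if r is not None]
    n = len(pairs)  # number of records with a reported value
    if n == 0:
        return 0.0
    changed = sum(1 for r, f in pairs if r != f)  # sum of the y_i indicators
    return 100.0 * changed / n


# Made-up data: four records carry a reported value, two of them were changed.
reported = [100, 250, None, 4000, 75]
final = [100, 250, 60, 4, 80]
pct = metric1_percent_changed(reported, final)
print(pct)
```

The same arithmetic applied to the AWTS Sales counts in the Metric 1 illustration (238 changed out of 4,819 reported) gives 4.9 percent.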

Metric 2
• Item Level (Critical Items)
• The percentage of changes to the records with reported values that is attributable to analyst correction versus machine correction
• Where:
  – $a_i = 1$ if reported value ≠ final value and source is analyst correction, 0 otherwise
  – $m_i = 1$ if reported value ≠ final value and source is machine correction, 0 otherwise
  – $n$ = number of records
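A possible reading of Metric 2 is sketched below in Python. The slide's formula did not survive extraction, so the denominator is an assumption: shares are taken over the number of changed records, which matches the "Percent of Total" column in the Metric 2 illustrations (for example, 221 of 238 changed AWTS Sales records is 92.9 percent). The function name, the tuple layout, and the source labels are hypothetical.

```python
from collections import Counter


def metric2_source_shares(records):
    """Share of changes attributable to each source, in percent.

    `records` is an iterable of (reported, final, source) tuples for one
    critical item. Records without a reported value (None) or with no
    change are ignored. Assumption: the denominator is the number of
    changed records, not the number of all records.
    """
    counts = Counter(
        source
        for reported, final, source in records
        if reported is not None and reported != final
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: 100.0 * c / total for source, c in counts.items()}


# Made-up data: three analyst changes, one machine change.
records = [
    (100, 100, "none"),      # unchanged, ignored
    (5000, 5, "analyst"),
    (40, 42, "machine"),
    (7000, 7, "analyst"),
    (310, 300, "analyst"),
    (None, 12, "machine"),   # no reported value, ignored
]
shares = metric2_source_shares(records)
print(shares)
```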

Metric 3
• Item Level (Critical Items)
• The source of change of the reported data
• The size of change of the reported data
• The impact of the changes on the final tabulations

Metric 3: Tabular Format (Item Level)

| Source of Change (1) | Change Category (2) | No. of Records (3) | Tabulated (Weighted) Reported (4) | Tabulated (Weighted) Edited (5) | Percent Difference (6) | Sum of the Absolute Difference (7) | Average Absolute Difference (8) |
|---|---|---|---|---|---|---|---|
| Analyst Correction | 0 < R/E < 1.1 | n | x | y | (y-x)*100/x | z | z/n |
| | 1.1 ≤ R/E < 9 | | | | | | |
| | 9 ≤ R/E < 90 | | | | | | |
| | 90 ≤ R/E < 900 | | | | | | |
| | R/E ≥ 900 | | | | | | |
| No Change | R/E = 1 | | | | | | |
| Totals | | Total 3 | Total 4 | Total 5 | Percent Difference | | |
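The Metric 3 layout can be approximated with a small aggregation in Python. This is a sketch under several assumptions that the slides do not confirm: R/E is read as the ratio of the reported value to the edited value, lower category bounds are inclusive, values are positive, and the absolute difference is weighted like the tabulated totals. All function and field names are hypothetical.

```python
from collections import defaultdict

# Upper bounds of the change categories (column 2), checked in order.
BINS = [
    (1.1, "0 < R/E < 1.1"),
    (9, "1.1 <= R/E < 9"),
    (90, "9 <= R/E < 90"),
    (900, "90 <= R/E < 900"),
]


def change_category(reported, edited):
    """Assign a record to a change category; assumes positive values."""
    if reported == edited:
        return "R/E = 1"
    ratio = reported / edited
    for upper, label in BINS:
        if ratio < upper:
            return label
    return "R/E >= 900"


def metric3_table(records):
    """Aggregate (source, weight, reported, edited) tuples into table rows.

    Returns {(source, category): row}, where each row holds n (col 3),
    x (col 4), y (col 5), pct_diff (col 6), z (col 7), avg_abs_diff (col 8).
    """
    rows = defaultdict(lambda: {"n": 0, "x": 0.0, "y": 0.0, "z": 0.0})
    for source, weight, reported, edited in records:
        row = rows[(source, change_category(reported, edited))]
        row["n"] += 1
        row["x"] += weight * reported
        row["y"] += weight * edited
        row["z"] += weight * abs(reported - edited)
    for row in rows.values():
        row["pct_diff"] = (row["y"] - row["x"]) * 100 / row["x"]
        row["avg_abs_diff"] = row["z"] / row["n"]
    return dict(rows)


# Made-up data: two records off by a factor of 1,000 and one small change.
records = [
    ("Analyst Correction", 2.0, 5000.0, 5.0),
    ("Analyst Correction", 1.0, 3000.0, 3.0),
    ("Machine Impute", 1.0, 105.0, 100.0),
    ("No Change", 1.0, 50.0, 50.0),
]
table = metric3_table(records)
for key, row in table.items():
    print(key, row)
```

Grouping by both source and category is what lets the table show that a handful of records in the largest ratio category can dominate the percent difference in the tabulated totals.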

Metrics Applied to:
• Annual Wholesale Trade Survey (AWTS)
• Annual Survey of Manufactures (ASM)

Annual Wholesale Trade Survey (AWTS)
• Sample Survey
• Approximately 8,000 wholesale businesses
• Critical Items:
  – Sales
  – Total Purchases
  – Total Inventories
• Processed in Standard Economic Processing System (StEPS)

AWTS Editing/Imputation
• StEPS Automatic Processing Flow
  – Simple Imputation Module: Data “clean up”
  – Edit Module: Identifies “suspicious” values
  – General Imputation Module: Replaces “suspicious” values
• Item Flagging
  – Can identify four distinct sources of change:
    • Analyst Correction
    • Analyst Impute
    • Machine Correction
    • No Change
• “Cycling” between analyst and machine corrections

Annual Survey of Manufactures (ASM)
• Sample Survey
• 55,000 establishments
• Critical Items:
  – Cost of Materials
  – Employment
  – Annual Payroll
  – Receipts
• Processed in the Economic Census System
  – Plain Vanilla Editing Module

ASM Editing/Imputation
• ASM Automatic Processing Flow
  – Pre-editing Module: Data filling and clean up
  – Plain Vanilla Edit Modules
    • Ratio (editing/imputation)
    • Balancing (editing/imputation)
• Item Flagging
  – Can identify three sources of change:
    • Analyst correction/impute (cannot distinguish)
    • Machine impute
    • No change
• “Cycling” between analyst and machine

Illustration of Metric 1: AWTS

| Critical Item | No. of records with reported values | No. of records changed | Percent | Reported Amount (Weighted) In Millions | Edited Amount (Weighted) In Millions | Percent Difference |
|---|---|---|---|---|---|---|
| Sales | 4,819 | 238 | 4.9% | $41,156,147 | $2,156,439 | -93.9% |
| Purchases | 4,628 | 403 | 8.7% | $4,486,157 | $1,953,140 | -56.5% |
| Inventories | 4,334 | 326 | 7.5% | $21,392,659 | $256,920 | -98.8% |

– Relatively few of the reported values for each critical item changed.
– Changes to these records had a great impact on final tabulations.

Illustration of Metric 1: ASM

| Critical Item | No. of records with reported values | No. of records changed | Percent | Reported Amount (Weighted) In Millions | Edited Amount (Weighted) In Millions | Percent Difference |
|---|---|---|---|---|---|---|
| Cost of Materials | 35,908 | 3,520 | 9.8% | $2,157 | $1,936 | -10.2% |
| Employment | 31,603 | 4,032 | 12.8% | 12 | 6 | -44.6% |
| Annual Payroll | 30,756 | 454 | 1.5% | $293 | $291 | -0.4% |
| Receipts | 38,074 | 4,157 | 10.9% | $4,320 | $3,647 | -15.6% |

– Relatively few of the reported values for each critical item changed.
– Except for employment, changes to these records had a “small” impact on final tabulations.

Illustration of Metric 2: AWTS

| Critical Item | Source of Change | No. Records Changed | Percent of Total | Average Absolute Difference Between Reported and Edited Amount (In Millions) | Ratio of AC to AI and AC to MI |
|---|---|---|---|---|---|
| Sales | AC | 221 | 92.9% | $175,545 | ----- |
| | AI | 15 | 6.3% | $653 | 269/1 |
| | MI | 2 | 0.8% | $55 | 3150/1 |
| Purchases | AC | 363 | 90.1% | $7,404 | ----- |
| | AI | 37 | 9.2% | $289 | 26/1 |
| | MI | 3 | 0.7% | $79 | 94/1 |
| Inventories | AC | 285 | 87.4% | $74,196 | ----- |
| | AI | 15 | 4.6% | $73 | 1011/1 |
| | MI | 26 | 8.0% | $39 | 1914/1 |

AC = Analyst Correction; AI = Analyst Impute; MI = Machine Impute

Illustration of Metric 2: ASM

| Critical Item | Source of Change | No. Records Changed | Percent of Total | Average Absolute Difference | Ratio of AC to MI |
|---|---|---|---|---|---|
| Cost of Materials | AC | 940 | 26.7% | $188,435 | -- |
| | MI | 2,580 | 73.3% | $63,468 | 3/1 |
| Receipts | AC | 2,723 | 65.5% | $270,852 | -- |
| | MI | 1,434 | 34.5% | $74,262 | 4/1 |

AC = Analyst Correction; MI = Machine Impute

Key Findings With Metric 3: AWTS
• Analyst corrections accounted for the majority of the changes to all three critical items
• Correction of “rounding” errors:
  – Corrected by analysts
  – Most substantive impact on tabulations
  – Relatively few records

Key Findings With Metric 3: ASM
• A high percentage of changes to reported data fell into the “small change” categories.
  – For Cost of Materials, machine imputes made the majority of these small changes (74.7 percent).
  – For Receipts, analysts made the majority of these changes (68.4 percent).
• Correction of “rounding” errors:
  – Corrected equally by analyst and machine
  – Most substantive impact on tabulations
  – Relatively few records

Study Highlights/Key Findings
• Importance of rounding errors:
  – Small number of cases
  – Resolved generally by analysts in AWTS
  – Resolved by analysts and machine in ASM
• Large proportion of small changes in ASM:
  – Identified potential edit parameter problems

Advantages of Standardized Metrics
• Allowed for direct comparisons between different programs.
• Uncovered different areas of investigation in different programs.
• Facilitated “buy-in” from all parties via development process.
• Provides baseline measures for future investigation.

Future Research
• Apply metrics at various processing stages (AWTS).
• Apply metrics at industry level.
• Examine the number of times the records are subjected to changes.

Contact Information
broderick.e.oliver@census.gov
katherine.j.thompson@census.gov