Statistical data editing near the source using cloud computing concepts
George Pongas, Christine Wirtz (Eurostat)
MSIS 2011, 23-25 May 2011, Luxembourg
24/5/2011
Editing near the source
• Speeds up final delivery to users and institutions
• Checks and imputations are performed near the respondent
• Data knowledge is frequently more profound in the primary collector institutions
• Logical proximity is better than physical proximity: data and application sharing
Cloud and SOA in a few lines
• Cloud: separates ownership from usage of data storage, computing power, and application development and execution
• Cloud variants are IaaS, PaaS, SaaS
• Cloud architectures are:
  – Public
  – Private
  – Mixed
  – Community
• SOA: based on web technologies and independent software components that interlink on demand
Data Editing in Eurostat
• High volume of arrivals (>60,000 per year)
• Format heterogeneity
• Data checking absorbs a substantial volume of human resources
• Erroneous data imply communications with the Member States (MS)
• Eurostat as a rule does not impute…
• Interest in having a common distributed solution
Eurostat’s web-enabled system for editing: Editing Building Block (EBB)
• Completely metadata driven
• Exists in 2 versions:
  – PC version
  – Web-based version
• Technologies used:
  – ANTLR
  – Java
  – Tomcat or WebLogic
  – Hibernate
  – Postgres or Oracle
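To illustrate what "completely metadata driven" can mean in practice, here is a minimal sketch in Python. It is not EBB code: the metadata layout, the variable names (`country`, `turnover`, `comment`) and the function `check_record` are all assumptions made for this example. The point is only that the checks are described as data, so adding a variable or changing a bound does not require changing the checking code.

```python
from typing import Any

# Hypothetical variable metadata: names, types and bounds are illustrative only.
METADATA = [
    {"variable": "country", "type": "categorical", "allowed": {"LU", "BE", "FR"}},
    {"variable": "turnover", "type": "numeric", "min": 0.0},
    {"variable": "comment", "type": "text", "max_length": 20},
]


def check_record(record: dict[str, Any], metadata: list[dict[str, Any]]) -> list[str]:
    """Return error messages for one record, using only the metadata."""
    errors = []
    for spec in metadata:
        name = spec["variable"]
        value = record.get(name)
        if value is None:
            errors.append(f"{name}: missing")
        elif spec["type"] == "categorical":
            if value not in spec["allowed"]:
                errors.append(f"{name}: code not allowed")
        elif spec["type"] == "numeric":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: not numeric")
            elif "min" in spec and value < spec["min"]:
                errors.append(f"{name}: below minimum")
            elif "max" in spec and value > spec["max"]:
                errors.append(f"{name}: above maximum")
        elif spec["type"] == "text":
            if not isinstance(value, str):
                errors.append(f"{name}: not text")
            elif "max_length" in spec and len(value) > spec["max_length"]:
                errors.append(f"{name}: too long")
    return errors
```

For example, `check_record({"country": "XX", "turnover": -5, "comment": "ok"}, METADATA)` reports one error for the unknown country code and one for the negative turnover.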
EBB Information Flow
Implementation Details
EBB is written using a set of Web services of the following types:
  – Administration
  – Program
  – Job
EBB functionalities
• Support of categorical, text and numeric variables
• Separation of programmer and user interfaces
• Conditional and unconditional rules
• Multi-record rules
• Deterministic imputation
• Use of auxiliary data
• File operations
• Special functions (uniqueness, duplication checks...)
• Outliers (HB, Sigma Gap, Terror)
• Input/output of data/metadata
• Reporting
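Three of the functionalities listed above (a conditional rule, a deterministic imputation and a duplication check) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not EBB's rule language or API: the function names, the field names (`employees`, `wages`, `domestic`, `exports`, `total`) and the rules themselves are hypothetical.

```python
from collections import Counter


def conditional_rule(record):
    """Conditional rule (hypothetical): IF employees == 0 THEN wages must be 0."""
    if record["employees"] == 0:
        return record["wages"] == 0
    return True  # The condition does not apply, so the rule is satisfied.


def impute_total(record):
    """Deterministic imputation (hypothetical rule: total = domestic + exports).

    When total is missing and both components are present, exactly one value
    satisfies the rule, so it can be filled in without any statistical model.
    """
    components = (record.get("domestic"), record.get("exports"))
    if record.get("total") is None and None not in components:
        return {**record, "total": components[0] + components[1]}
    return record


def find_duplicates(records, key_fields):
    """Duplication check: return the key tuples that occur more than once."""
    counts = Counter(tuple(r[k] for k in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]
```

A rule such as `conditional_rule` looks at one record at a time, whereas `find_duplicates` needs the whole file, which is the distinction the list draws between single-record and multi-record rules.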
Usage until now
• Embedded in SAS (for microdata editing)
• For distribution to data providers as a standalone version
Domains:
  – FDI (foreign direct investments)
  – ITS (international trade in services)
  – SBS (structural business statistics)
  – CVTS (continuous vocational training survey), AES (adult education survey)