- Number of slides: 47
DATA COLLECTION STRATEGY AND INTEGRATED SOLUTIONS CES Seminar, Geneva, 13-15 June 2005 CODACMOS PROJECT CLUSTER OF DATA COLLECTION INTEGRATION AND METADATA SYSTEMS FOR OFFICIAL STATISTICS (1.11.2002 – 31.10.2004)
The CODACMOS problem How to improve processes between the Data Collectors and the Data Respondents, with the primary focus on respondent needs, in order to: a) improve data quality; b) make the task of electronic responding easier and more efficient; c) obey the rules of confidentiality; d) simplify the information asked for.
CODACMOS main objectives Review and rationalize the state of the art of the solutions proposed so far for electronic Data Collection and exchange, as well as the metadata involved.
CODACMOS main objectives Identify two experimental field areas and implement demonstrations for selected solutions (or models) on the integration of different data sources, such as existing archives/registers or other administrative data, collected for statistical purposes.
CODACMOS main objectives On the basis of the demonstrations and through a wide consultation, specify EU key issues for the standardization / harmonization of data collection models and methods and for the description of metadata standards.
The framework for data collection
The data to be collected from the respondents should fulfil the following conditions: Metadata should be harmonized between the data collectors: differences between questions on the same object should be avoided or at least explained
…the conditions (2) Metadata should be published. Data already provided by a respondent to one data collector should be obtained by other data collectors through secondary data collection.
…the conditions (3) Confidentiality of data (content during e-transport) should be ensured. Data should be available in the automated information systems of the businesses.
An important question to be answered is: “What do we expect from the politicians in the field of e-Government, from the NSIs and from Eurostat?” in order to facilitate an effective electronic data collection process that minimizes the administrative burden for respondents. To reach this goal, further work and research are necessary in the following areas:
WHAT IS NEEDED (1) A national portal for EDI (electronic data interchange) for reporting by businesses and households to government agencies: to be used by all governmental institutions collecting data from households, enterprises or other institutions.
WHAT IS NEEDED (2) Standard software modules for operation of administrative base registers: the project should cover primary EDI for establishing and operating administrative base registers for person, workplace and dwelling (land/property/building/dwelling/address).
WHAT IS NEEDED (3) Central metadata register of reporting obligations for businesses: to establish a central metadata register where all government institutions must register their metadata before they are permitted to capture the associated data from enterprises, including the organization around the metadata register (responsibilities for set-up and maintenance, etc.).
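The "register before you collect" rule can be illustrated with a minimal sketch. Everything here is hypothetical and not part of CODACMOS: the class names, the fields of an obligation, and the example institutions are assumptions chosen only to show the idea of a register that gates data capture and reveals which institution already holds a variable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReportingObligation:
    """Metadata for one item an institution wants to collect (hypothetical fields)."""
    institution: str
    variable: str
    definition: str


@dataclass
class MetadataRegister:
    """Toy central register: metadata must be registered before data capture is permitted."""
    _entries: dict = field(default_factory=dict)

    def register(self, obligation: ReportingObligation) -> None:
        # Key each entry by (institution, variable) so lookups are direct.
        self._entries[(obligation.institution, obligation.variable)] = obligation

    def may_collect(self, institution: str, variable: str) -> bool:
        # Capture is allowed only for metadata that has been registered.
        return (institution, variable) in self._entries

    def existing_holders(self, variable: str) -> list:
        # Institutions that already registered this variable:
        # candidates for secondary data collection.
        return sorted(inst for (inst, var) in self._entries if var == variable)


register = MetadataRegister()
register.register(ReportingObligation("tax_agency", "turnover", "Net turnover, calendar year"))
print(register.may_collect("tax_agency", "turnover"))  # True
print(register.may_collect("nsi", "turnover"))         # False
print(register.existing_holders("turnover"))           # ['tax_agency']
```

In this sketch, a second institution asking for "turnover" would first see that the tax agency already holds it, which is the point at which secondary collection could replace a new request to the enterprise.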
WHAT IS NEEDED (4) Metadata model for data from individuals: to establish a metadata model which all government agencies must use when they collect data directly from individuals or request such data from other government agencies.
WHAT IS NEEDED (5) Central metadata register of reporting for individuals: to establish a central metadata register where all government agencies must register their metadata before they request associated data from individuals.
WHAT IS NEEDED (6) A common EDI system for reporting enterprise accounts: a further initiative to create a European system covering enterprise accounts, internal procedures of the enterprise and direct reporting to government agencies such as the tax authority, the register of accounts for limited companies and the national statistical institute.
WHAT IS NEEDED (7) An integrated EDI system for reporting from local to central government: development of a European integrated EDI system for reporting by local and regional governments to central government agencies.
WHAT IS NEEDED (8) Standard modules for operating longitudinal databases: the databases are to be the source for statistics, analysis and administrative purposes. Standard modules for automatic editing and imputation for administrative and statistical sources. Furthermore…
WHAT IS RECOMMENDED Active participation of the NSIs, stimulated by Eurostat, in e-Government at European and national level is recommended in order to promote statistics; further support to extend the domain of the SDMX initiative is indispensable;
WHAT IS RECOMMENDED support for open source principles in the distribution and maintenance of common statistical software is needed; the NSIs and other data collectors should commit themselves to active participation in the IDABC program to promote statistics.
The integrated solutions An integrated solution has at least three aspects: Integration at data level - this should be considered a primary goal for primary and secondary data collectors. The tools supporting data integration are standards for both data and metadata;
The integrated solutions Integration at process level - is supported by data and metadata standards and by technology standards. Standards for classifications and code lists are important; standards for the technology used are less so;
The integrated solutions Integration at state level - is supported by the standards mentioned above and complemented by legislation and state-level regulations. Where legislation is lacking, mutual agreement among the institutions at state administration level on data mining or data transfer could substitute for it. However, data protection and other regulations may partly prohibit data sharing that is not provided for in legislation.
CODACMOS Models: Primary
CODACMOS Models: Secondary
The operational model of an integrated solution at state level
XML is becoming a standard, not only for Primary but also for Secondary Data Collection. XML/EDI combines the advantages of both XML and EDI.
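As an illustration of XML as an exchange format for collected data, the following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`Report`, `Item`, `respondent`, `period`, `variable`) and the sample values are invented for this example; they do not come from CODACMOS or from any XML/EDI standard.

```python
import xml.etree.ElementTree as ET


def build_report(respondent_id: str, period: str, values: dict) -> str:
    """Serialize one respondent's report as an XML message (hypothetical schema)."""
    root = ET.Element("Report", attrib={"respondent": respondent_id, "period": period})
    for name, value in values.items():
        item = ET.SubElement(root, "Item", attrib={"variable": name})
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_report(xml_text: str) -> dict:
    """Read the message back into a variable -> value mapping (values stay strings)."""
    root = ET.fromstring(xml_text)
    return {item.get("variable"): item.text for item in root.findall("Item")}


message = build_report("ENT-001", "2004-Q4", {"turnover": 125000, "employees": 12})
print(message)
print(parse_report(message))
```

The same message can be produced by a primary collector and parsed by a secondary one, provided both agree on the element names and on the meaning of each variable.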
Relation between XML/EDI, EDI and XML
Main inputs for the integrated metadata model Metadata items and models considered (STAT FIN; ONS; DESAN; NSSG; ISTAT; INFOSTAT). Metadata-enabled systems and technologies in use (COSSI, SIDI, SDOSIS, EXPLORIS, METIS, SIDP, IMP).
The metadata model considers the main stages of the statistical process The model considers the main stages of statistical data processing: i) data collection and analysis (main part), including harmonization; ii) data processing; and iii) the dissemination/output process.
The metadata model considers the main stages of the statistical process Semantic and documentation metadata as well as process metadata are considered. To avoid compromising data quality through data-metadata mismatches, the model defines a set of operators over the common domain of data and metadata.
The metadata model considers the main stages of the statistical process Logistic metadata are taken into account for the location and format of data.
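The idea of an operator defined over data and metadata together can be sketched as follows. This is only an illustration under assumptions: the `Dataset` fields and the `convert_unit` operator are hypothetical and are not taken from the CODACMOS Common Metadata Model. The point is that one operation changes the values and the describing metadata in the same step, so they cannot drift apart.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Dataset:
    values: tuple        # the data
    variable: str        # semantic metadata: what is measured
    unit: str            # semantic metadata: unit of measurement
    location: str        # logistic metadata: where the data are stored
    history: tuple = ()  # process metadata: operations applied so far


def convert_unit(ds: Dataset, factor: float, new_unit: str) -> Dataset:
    """Operator over data and metadata: rescale values, update unit and history together."""
    return replace(
        ds,
        values=tuple(v * factor for v in ds.values),
        unit=new_unit,
        history=ds.history + (f"convert {ds.unit} -> {new_unit}",),
    )


turnover = Dataset(values=(1.5, 2.25), variable="turnover", unit="kEUR",
                   location="file:turnover.csv")
in_euros = convert_unit(turnover, 1000, "EUR")
print(in_euros.values, in_euros.unit, in_euros.history)
```

Because `Dataset` is frozen, the original object is left untouched and the converted copy carries its own record of what was done to it.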
CODACMOS Demonstrations The difficult problems are to do with content, not technology. For example: Is useful secondary data available? Can parties agree to cooperate, or should the most important one impose a solution? Can parties agree on definitions? What do we do when existing definitions do not correspond?
What is demonstrated Data Quality Improvement: Administrative data can be of high quality, particularly where individual records are used regularly and the data subject or the data user has a reason to be concerned about correctness (as with bank and tax records). Using CAI techniques for data capture, where responses can be checked directly with the respondent, reduces errors, as does the use of prior (secondary) data about the respondent.
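A check of the kind described, made at capture time against prior data about the respondent, might look like the sketch below. The function name, the 50% tolerance and the wording of the prompts are assumptions made for illustration; the source does not specify any particular rule.

```python
from typing import Optional


def check_response(value: float, prior: Optional[float],
                   tolerance: float = 0.5) -> Optional[str]:
    """Return a prompt to show the respondent, or None if the answer passes the checks.

    `tolerance` is an assumed threshold: the relative change from the prior
    value above which the respondent is asked to confirm.
    """
    if value < 0:
        return "The value cannot be negative. Please check your answer."
    if prior is not None and prior > 0:
        change = abs(value - prior) / prior
        if change > tolerance:
            return (f"Your answer differs by {change:.0%} from the previously "
                    f"reported {prior:g}. Please confirm.")
    return None


print(check_response(12, 10))    # within tolerance: no prompt
print(check_response(100, 10))   # large change: respondent is asked to confirm
print(check_response(-1, None))  # invalid value: respondent is asked to correct it
```

The prompt is a request for confirmation rather than a rejection, since a large change may be genuine.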
What is demonstrated Technology improves timeliness and response rates by making it easier for respondents to respond. Quality information can be disseminated in the same way as metadata, thereby improving the quality of subsequent analysis. Measurement standards for quality data have been agreed elsewhere.
What is demonstrated Standards: Understanding of the breadth and depth of requirements for statistical metadata is spreading. It is important to think about processes that use metadata, where the processes can have inputs and outputs that are themselves metadata.
What is demonstrated A number of useful proposals for metadata structures already exist, and the extensions in the CODACMOS Common Metadata Model are useful. The idea of operators acting on data and metadata together is new. BUT… There is a continuing need to explicitly recognise different levels of abstraction within metadata proposals, and a continuing need for integration efforts, particularly to ensure that the needs of NSIs are covered.
What is demonstrated New Technologies: Using CAI and modern modes of data capture does work, BUT… it introduces new issues, so careful planning and design are needed; it introduces some new costs while reducing existing ones; it has impacts on response rates and answers (generally improvements) and on sample coverage, so it may introduce discontinuity in results.
What is demonstrated Security issues are covered by the requirements of other domains, and these include integrity and confidentiality. Data transfer works if the latest developments are followed. XML solves the plumbing problem for the exchange of complex information structures. Technology provides tools, not solutions.
CODACMOS FOLLOW-UP in ISTAT Connection of the statistical data collection process to the e-Government program. Common intra-sector project (Ministry of Finance and Social Security) on the feasibility of “Data collection by linking existing administrative sources (including registers) and combined use of administrative sources and statistical surveys (integration of primary and secondary)”.
CODACMOS FOLLOW-UP in ISTAT Automated data collection strategy and integration (the SISSIEI project experience should be analysed and become the starting point of a future integration strategy in ISTAT). Development of a strategy for metadata. Changes to the IT infrastructure.
CODACMOS FOLLOW-UP in ISTAT Standards: criteria for the evaluation of the use of standards (and of developed models, metadata in particular); feasibility study of the use of standardized formats: XBRL.
Further information CODACMOS WEB SITE: www.codacmos.eu.org Dipartimento della produzione statistica e il coordinamento tecnico scientifico, ISTAT TEL: +39 06 4673 2576/2575 FAX: +39 06 4673 2956 E-MAIL: [email protected] it
Thank you for your interest CoRD Meeting, Luxembourg, 11.05.2005