- Number of slides: 12
End-to-End Data Services: A Few Personal Thoughts. Unidata Staff Meeting, 2 September 2009
Vision “Unidata’s vision calls for providing comprehensive, well-integrated and end-to-end data services for the geosciences. These include an array of functions for collecting, finding, and accessing data; data management tools for generating, cataloging, and exchanging metadata; and submitting or publishing, sharing, analyzing, visualizing, and integrating data.” What does this vision statement mean to each of us?
Background • Providing real-time weather data (and related tools) was the primary reason Unidata was created, and it has been the bread and butter of Unidata’s mission for more than two decades. • But as our work has evolved, along with our community, it has become clear that merely providing real-time data or facilitating data access is not enough. • Hence the vision statement. • Let’s think about how some of the capabilities offered by Amazon, eBay, YouTube, and Flickr could be provided for geoscience “data”.
Objectives 1. Create “integrated” data services across all stages of the data life cycle, beginning with observations and ending with data curation/archiving. • A) Observations/Sensors → Ingest → Data collection systems → Data providers → Disseminate → Users (both end users and data archival systems) • B) From the beginning to the end of a workflow (LEAD example): • Observations → Ingest → Analysis/Assimilation → Prediction → Output → Dissemination → Users (both end users and data repositories)
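The LEAD-style workflow in objective 1B can be read as a chain of stages in which each stage consumes the previous stage's output. The following is a minimal Python sketch under that assumption only: the stage functions are hypothetical placeholders that merely record their own name, and the record layout is invented for illustration. It is not how LEAD or any Unidata system is actually implemented.

```python
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical record type flowing through the pipeline; keys are illustrative.
Record = Dict[str, object]
Stage = Callable[[Record], Record]


def make_stage(name: str) -> Stage:
    """Return a placeholder stage that appends its name to the record's history."""
    def stage(rec: Record) -> Record:
        history: List[str] = list(rec.get("history", []))
        history.append(name)
        # Build a new record instead of mutating the caller's input.
        return {**rec, "history": history}
    return stage


def run_pipeline(stages: List[Stage], rec: Record) -> Record:
    """Apply each stage in order, feeding the output of one into the next."""
    return reduce(lambda acc, stage: stage(acc), stages, rec)


# Stage names follow the LEAD example on the slide; observations and users
# are the endpoints, so they are the input and the consumer, not stages.
LEAD_WORKFLOW = [
    make_stage(n)
    for n in ["ingest", "analysis/assimilation", "prediction", "output", "dissemination"]
]

if __name__ == "__main__":
    result = run_pipeline(LEAD_WORKFLOW, {"source": "observations"})
    # prints "ingest -> analysis/assimilation -> prediction -> output -> dissemination"
    print(" -> ".join(result["history"]))
```

Because every stage has the same signature, a stage can be swapped out or a new one inserted without touching the others, which is one reading of “integrated” services across the life cycle.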
Imperatives • Integrated services do not imply a monolithic system, but a set of modular services that are configurable, flexible, extensible, and scalable. • We need to think about which [essential] services our users need and what their use cases are. – Users include students, faculty, scientists, data providers, outreach providers, and field project personnel – Use cases include classroom and lab use, research studies, weather websites, field projects, projects like LEAD, portals, and data centers – Both programmatic and interactive invocation • We may not work on all of the functionalities ourselves, but we need to facilitate as many of them as possible.
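One way to picture “a set of modular services that are configurable, flexible, extensible, and scalable”, with programmatic invocation, is a registry in which independent services are plugged in by name and chained by configuration. The sketch below is hypothetical: `ServiceRegistry`, `subset`, and `kelvin_to_celsius` are illustrative names invented here, not part of any Unidata product, and the two toy services only stand in for the real subsetting and unit-conversion services listed later.

```python
from typing import Any, Callable, Dict, List

Service = Callable[..., Any]


class ServiceRegistry:
    """Hypothetical registry: services are plugged in by name, not hard-wired."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, func: Service) -> None:
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = func

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        try:
            service = self._services[name]
        except KeyError:
            raise KeyError(f"unknown service: {name}") from None
        return service(*args, **kwargs)

    def chain(self, names: List[str], data: Any) -> Any:
        """Run a configured sequence of services, passing each output to the next."""
        for name in names:
            data = self.call(name, data)
        return data


def subset(values: List[float], start: int = 0, stop: int = 3) -> List[float]:
    # Toy subsetting service: keep a slice of a 1-D series.
    return values[start:stop]


def kelvin_to_celsius(values: List[float]) -> List[float]:
    # Toy unit-conversion service, rounded to 2 decimals to hide float noise.
    return [round(v - 273.15, 2) for v in values]


registry = ServiceRegistry()
registry.register("subset", subset)
registry.register("to_celsius", kelvin_to_celsius)

if __name__ == "__main__":
    temps_k = [273.15, 283.15, 293.15, 303.15]
    print(registry.chain(["subset", "to_celsius"], temps_k))  # prints [0.0, 10.0, 20.0]
```

Adding a service is one more `register` call; existing services and configured chains are untouched, which is the property that separates this style from a monolithic system.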
Strategies and Tactics • Integration is achieved through both loosely and tightly coupled components and services. • Incrementalism is the only practical option for a program like Unidata, where many technologies already exist and resources are scarce. • Leverage both our own technologies and those available from outside as much as possible.
What do I mean by Data? • Scientific data (binary, ASCII, netCDF, HDF, GRIB, XML, …) • Metadata (ASCII, XML, etc.) • Data in databases (e.g., SQL) • GIS data (Shapefiles, KML, etc.) • Derived products from scientific data • Ancillary data objects (images, videos, documents such as PDF, Word, HTML, PPT, etc.)
Integration Capabilities • Different data types (feeds, observations, platforms, model output, and GIS information) • Different data formats • Data on different projections • Distributed data holdings • Data operations (e.g., GDS, netCDF operators) • Metadata addition • Integration of scientific data with metadata content, documents, and other information
Not to Develop Ourselves, but Perhaps Provide Hooks To • Collaboration tools • Wikis • Forums • Blogs • Chat and IM/SMS • Social network apps (Facebook, Twitter, etc.) • RSS, email, and other notifications
A list of possible data services • Data collection service (routine ingest via LDM, FTP, etc.) • Data submission service • Metadata service for submitting, editing, and exchanging metadata • Cataloging service • Data discovery service • Monitoring and notification service for new data, metadata, and products • Data access services • Data delivery/transport services, including copying/moving data to other servers and personal space and streaming data on demand • Security and authentication services • Subsetting service, including capability for progressive disclosure • Aggregation services for data and metadata • Services for CF conformance checking • Decoding and data translation services • Unit conversion services • Visualization and product generation services • Data fusion and data manipulation services (e.g., netCDF operator services) • GIS services • Output handling services
IMO, Beyond Unidata’s Scope • Data mining • Ontologies • Brokering and workflow orchestration • Federation and mediation • Provenance • Curation and stewardship
Final Thoughts • It is important that we develop a consensus on what we mean by integrated, end-to-end data services, so we need to hear your thoughts. • Once we have an idea of what we want to build, we need to agree on how to go about building it. • Again, I believe in an incremental approach, but there may be other ideas. With RAMADDA, THREDDS, netCDF, and LDM, many of the pieces on which to build E2E data services already exist. • We need to identify the next steps and a concrete project for this potentially long journey. A pilot effort? A prototype?