

  • Number of slides: 41

The Earth System Grid (ESG) & The Community Data Portal (CDP) (NCAR's Data & GriD Efforts)
for
COMMISSION FOR BASIC SYSTEMS, INFORMATION SYSTEMS and SERVICES
INTERPROGRAMME TASK TEAM ON THE FUTURE WMO INFORMATION SYSTEM
KUALA LUMPUR, 20-24 OCTOBER 2003
Courtesy: Don Middleton, NCAR Scientific Computing Division
NCAR

“Atkins Report”
• “A new age has dawned…”
• “The Panel’s overarching recommendation is that the National Science Foundation should establish and lead a large-scale, interagency, and internationally coordinated Advanced Cyberinfrastructure Program (ACP) to create, deploy, and apply cyberinfrastructure in ways that radically empower all scientific and engineering research and allied education. We estimate that sustained new NSF funding of $1 billion per year is needed to achieve critical mass and to leverage the coordinated co-investment from other federal agencies, universities, industry, and international sources necessary to empower a revolution. The cost of not acting quickly or at a subcritical level could be high, both in opportunities lost and in increased fragmentation and balkanization of the research.”
  – Atkins Report, Executive Summary

The Earth System Grid (http://www.earthsystemgrid.org)
• U.S. DOE SciDAC-funded R&D effort: a “Collaboratory Pilot Project”
• Build an “Earth System Grid” that enables management, discovery, distributed access, processing, and analysis of distributed terascale climate research data
• Build upon Globus Toolkit and DataGrid technologies, and deploy them (rubber on the road)
• Potential broad application to other areas

ESG Team
• ANL: Ian Foster (PI), Veronika Nefedova, (John Bresnahan), (Bill Allcock)
• LBNL: Arie Shoshani, Alex Sim
• ORNL: David Bernholdt, Kasidit Chanchio, Line Pouchard
• LLNL/PCMDI: Bob Drach, Dean Williams (PI)
• USC/ISI: Ann Chervenak, Carl Kesselman, (Laura Pearlman)
• NCAR: David Brown, Luca Cinquini, Peter Fox, Jose Garcia, Don Middleton (PI), Gary Strand


Baseline Numbers
• T42 CCSM (current, 280 km): 7.5 GB/yr, 100 years -> 0.75 TB
• T85 CCSM (140 km): 29 GB/yr, 100 years -> 2.9 TB
• T170 CCSM (70 km): 110 GB/yr, 100 years -> 11 TB
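The per-run totals follow directly from the annual output rates. Below is a minimal Python sketch of that arithmetic. It assumes 100 simulated years per run and decimal units (1 TB = 1000 GB), and the dictionary and function names are illustrative only:

```python
# Annual CCSM output per resolution, in GB per simulated year (from the slide).
GB_PER_YEAR = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_volume_tb(gb_per_year: float, years: int = 100) -> float:
    """Total output of one run in TB, using decimal units (1 TB = 1000 GB)."""
    return gb_per_year * years / 1000.0


for resolution, rate in GB_PER_YEAR.items():
    print(f"{resolution}: {run_volume_tb(rate):.2f} TB per 100-year run")
```

Multiplying the T42 result by the ensemble factor of 10 reproduces the 7.5 TB capacity figure.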

Capacity-related Improvements
• Increased turnaround, model development, ensemble of runs
• Increase by a factor of 10, linear data
• Current T42 CCSM: 7.5 GB/yr, 100 years -> 0.75 TB × 10 = 7.5 TB

Capability-related Improvements
• Spatial resolution: T42 -> T85 -> T170
  – Increase by a factor of ~10-20, linear data
• Temporal resolution: study the diurnal cycle, 3-hour data
  – Increase by a factor of ~4, linear data
Figure: CCM3 at T170 (70 km)

Capability-related Improvements
• Quality: improved boundary layer, clouds, convection, ocean physics, land model, river runoff, sea ice
  – Increase by another factor of 2-3, data flat
• Scope: atmospheric chemistry (sulfates, ozone…), biogeochemistry (carbon cycle, ecosystem dynamics), middle atmosphere model…
  – Increase by another factor of 10+, linear data

Model Improvement Wishlist
• Grand total: increase compute by a factor of O(10000)
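One way to see where O(10000) comes from is to multiply the factors from the capacity and capability slides. The sketch below makes three assumptions:
• the factors compound independently;
• capacity counts toward the total;
• "10+" for scope is taken as exactly 10, so the upper bound is conservative.

The data-growth estimate keeps only the factors the slides mark as "linear data":

```python
from math import prod

# (low, high, grows_data) per improvement, read off the slides.
# Quality improvements are "data flat": more compute, same output volume.
FACTORS = {
    "capacity (ensembles, turnaround)": (10, 10, True),
    "spatial resolution T42 -> T170": (10, 20, True),
    "temporal resolution (3-hourly)": (4, 4, True),
    "physics quality": (2, 3, False),
    "scope (chemistry, biogeochemistry)": (10, 10, True),  # "10+" taken as 10
}


def combined(only_linear_data: bool = False) -> tuple[int, int]:
    """Multiply the low and high factors, assuming they compound independently."""
    chosen = [(lo, hi) for lo, hi, grows in FACTORS.values()
              if grows or not only_linear_data]
    return prod(lo for lo, _ in chosen), prod(hi for _, hi in chosen)


print("compute growth:", combined())                      # (8000, 24000)
print("data growth:", combined(only_linear_data=True))    # (4000, 8000)
```

Under these assumptions both compute bounds fall on the order of 10,000, consistent with the slide's grand total.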

ESG Scenario
• End 2002: 1.2 million files comprising ~75 TB of data at NCAR, ORNL, LANL, NERSC, and PCMDI
• End 2007: as much as 3 PB (3,000 TB) of data (!)
• Current practice is already broken; the future will be even worse if something isn't done…
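Two quantities are implied by the scenario numbers:
• the yearly growth factor needed to go from ~75 TB to 3 PB;
• the mean file size at the end of 2002.

The Python sketch below computes both. It assumes steady exponential growth between the end-2002 and end-2007 figures and uses decimal units:

```python
def annual_growth_factor(start_tb: float, end_tb: float, years: int) -> float:
    """Constant yearly multiplier that takes start_tb to end_tb in `years` years."""
    return (end_tb / start_tb) ** (1.0 / years)


def mean_file_size_mb(total_tb: float, n_files: int) -> float:
    """Average file size in MB, decimal units (1 TB = 10**6 MB)."""
    return total_tb * 1e6 / n_files


growth = annual_growth_factor(75, 3000, 5)
print(f"{growth:.2f}x per year")                              # 2.09x per year
print(f"{mean_file_size_mb(75, 1_200_000):.1f} MB per file")  # 62.5 MB per file
```

A factor of about 2.1 per year means the holdings would roughly double every year.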

ESG Scenario (cont.)
• Data
  – Different formats are converted to netCDF
  – netCDF is not standardized to the CF model
  – Different sites require knowledge of different methods of access
• Metadata
  – Most kept in online files separate from data and unsearchable unless one is “in the know”
  – Some kept in people's brains
• Access control
  – Manual
  – Not formalized
• Data requests
  – Beginnings of a formal process (e.g., the PCMDI model)
  – Beginnings of web portals
  – Far too much done by hand
  – Logging nearly non-existent

ESG: Challenges
• Enabling the simulation and data management team
• Enabling the core research community in analyzing and visualizing results
• Enabling broad multidisciplinary communities to access simulation results
We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

ESG: Strategies
• Move data a minimal amount; keep it close to its computational point of origin when possible
  – Data access protocols, distributed analysis
• When we must move data, do it fast and with a minimum of human intervention
  – Storage Resource Management, fast networks
• Keep track of what we have, particularly what's on deep storage
  – Metadata and Replica Catalogs
• Harness a federation of sites, web portals
  – Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
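The replica-catalog idea in the third strategy can be illustrated with a toy in-memory catalog. It maps a logical file name to its physical copies and picks the copy at the cheapest site. This is a hypothetical sketch: the class, method names, site costs, and URLs are assumptions, and it is not the Globus Replica Location Service (RLS) API.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy replica catalog mapping a logical file name (LFN) to physical copies.

    Illustrative only: this is not the Globus Replica Location Service API.
    """

    def __init__(self) -> None:
        # LFN -> list of (site, physical URL) pairs
        self._replicas: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def register(self, lfn: str, site: str, url: str) -> None:
        self._replicas[lfn].append((site, url))

    def locate(self, lfn: str) -> list[tuple[str, str]]:
        # .get() avoids inserting an empty entry for unknown names
        return list(self._replicas.get(lfn, []))

    def best_replica(self, lfn: str, site_cost: dict[str, float]) -> str:
        """Return the URL of the copy at the cheapest site (unknown sites cost inf)."""
        copies = self.locate(lfn)
        if not copies:
            raise KeyError(f"no replica registered for {lfn!r}")
        _site, url = min(copies, key=lambda c: site_cost.get(c[0], float("inf")))
        return url


# Hypothetical logical name, sites, and URLs (gsiftp:// is the GridFTP scheme).
catalog = ReplicaCatalog()
lfn = "pcm/atm/monthly/1990-01.nc"
catalog.register(lfn, "NCAR", "gsiftp://mss.ncar.example.org/pcm/1990-01.nc")
catalog.register(lfn, "ORNL", "gsiftp://hpss.ornl.example.org/pcm/1990-01.nc")
print(catalog.best_replica(lfn, {"NCAR": 5.0, "ORNL": 1.0}))
```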

Storage/Data Management
Figure: a client connects to HRM servers, each in front of a tera/petascale archive. Labels: tools for reliable staging, transport, and replication; server selection; control; monitoring.
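Here is a rough model of what an HRM does in front of a tape archive. Requests are served from a bounded disk cache when possible; otherwise the file is staged from the archive and older files are evicted. The class name, the sizes, and the least-recently-used eviction policy are assumptions chosen for illustration, not HRM's actual design:

```python
from collections import OrderedDict


class StagingCache:
    """Hypothetical HRM-like front end: a bounded disk cache over a tape archive."""

    def __init__(self, capacity_gb: float, archive: dict[str, float]) -> None:
        self.capacity_gb = capacity_gb
        self.archive = archive                   # file name -> size in GB ("on tape")
        self.disk: OrderedDict[str, float] = OrderedDict()  # LRU order, oldest first
        self.tape_reads = 0

    def used_gb(self) -> float:
        return sum(self.disk.values())

    def get(self, name: str) -> str:
        if name in self.disk:
            self.disk.move_to_end(name)          # mark as most recently used
            return "disk hit"
        size = self.archive[name]
        if size > self.capacity_gb:
            raise ValueError(f"{name} is larger than the whole cache")
        while self.used_gb() + size > self.capacity_gb:
            self.disk.popitem(last=False)        # evict the least recently used file
        self.disk[name] = size
        self.tape_reads += 1
        return "staged from tape"


cache = StagingCache(capacity_gb=10, archive={"a.nc": 4, "b.nc": 4, "c.nc": 4})
for name in ["a.nc", "b.nc", "a.nc", "c.nc", "b.nc"]:
    print(name, cache.get(name))
```

In this run only the third request (`a.nc` again) is served from disk. The last two requests each force an eviction.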

HRM aka “DataMover”
• Running well across DOE/HPSS systems
• New component built that abstracts the NCAR Mass Storage System
• Defining the next generation of requirements with the climate production group
• First “real” usage
“The bottom line is that it now works fine and is over 100 times faster than what I was doing before. As important as two orders of magnitude increase in throughput is, more importantly I can see a path that will essentially reduce my own time spent on file transfers to zero in the development of the climate model database.”
  – Mike Wehner, LBNL
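Part of what removes human intervention from bulk transfers is automatic recovery from transient failures. The Python sketch below is a generic retry-with-exponential-backoff wrapper. It illustrates the idea only and is not DataMover's implementation; `flaky_copy` is a hypothetical stand-in for a real file transfer.

```python
import time
from typing import Callable


def transfer_with_retry(transfer: Callable[[], None], max_attempts: int = 5,
                        base_delay_s: float = 1.0) -> int:
    """Call `transfer` until it succeeds; return the number of attempts used.

    Retries on OSError (which includes ConnectionResetError and TimeoutError)
    with exponential backoff, and re-raises the last error after max_attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            transfer()
            return attempt
        except OSError:
            if attempt >= max_attempts:
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))
            attempt += 1


# Simulated transfer that drops the connection twice, then succeeds.
outcomes = iter([True, True, False])


def flaky_copy() -> None:
    if next(outcomes):
        raise ConnectionResetError("simulated dropped connection")


attempts = transfer_with_retry(flaky_copy, base_delay_s=0.0)
print(f"succeeded after {attempts} attempts")
```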

OPeNDAP
An Open Source Project for a Network Data Access Protocol (originally DODS, the Distributed Oceanographic Data System)

Distributed Data Access Services
Figure labels:
• Application side: typical application, distributed application, netCDF lib, OPeNDAP client, ESG client
• Transport: OPeNDAP via HTTP, OPeNDAP via Grid
• Server side: ESG + DODS OPeNDAP server, ESG server; data (local), data (remote), big data (multiple remotes)
• OPeNDAP-g goals: transparency, performance, security, authorization, (processing)
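A DAP2 (OPeNDAP) subset request is an ordinary URL with three parts:
• the dataset URL;
• a response suffix (`.dds`, `.das`, or `.dods` for binary data);
• a constraint expression giving index ranges as `[start:stride:stop]`.

A small URL builder follows. The server, path, and variable name in the example are hypothetical:

```python
DAP2_RESPONSES = {"dds", "das", "dods"}


def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """One DAP2 index constraint; DAP2 orders it [start:stride:stop], stop inclusive."""
    return f"[{start}:{stride}:{stop}]"


def opendap_url(dataset_url: str, variable: str,
                ranges: list[tuple[int, ...]], response: str = "dods") -> str:
    """Build a DAP2 request for a rectangular subset of one variable.

    Each entry of `ranges` is (start, stop) or (start, stop, stride), one per dimension.
    """
    if response not in DAP2_RESPONSES:
        raise ValueError(f"unknown DAP2 response type: {response}")
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical request: first 12 time steps of a (time, lat, lon) field,
# every other longitude.
url = opendap_url("http://dods.example.org/pcm/tas.nc", "tas",
                  [(0, 11), (0, 63), (0, 127, 2)])
print(url)
# http://dods.example.org/pcm/tas.nc.dods?tas[0:1:11][0:1:63][0:2:127]
```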

ESG: NcML Core Schema
• For XML encoding of metadata (and data) of any generic netCDF file
• Objects: netCDF, dimension, variable, attribute
• Beta version reference implementation as a Java library (http://www.scd.ucar.edu/vets/luca/netcdf/extract_metadata.htm)
Figure: schema tree in which nc:netCDFType (netCDF) contains nc:dimension, nc:variable (nc:VariableType, with nc:values and nc:attribute), and nc:attribute.

Figure: ESG metadata object model (legend: abstract class, isA inheritance, association).
• Classes and attributes:
  – Object: [1] id
  – Person: [0,1] firstName, lastName, contact
  – Institution: [0,1] name, type, contact
  – Activity: [0,1] name, description, rights; [0,n] date type=, note, participant role=, reference uri=
  – Project: [0,n] topic type=; [0,1] funding
  – Campaign, Investigation, Ensemble, Experiment, Observation, Analysis
  – Simulation: [0,n] simulationInput type=, simulationHardware
  – Service: [0,1] name, description
  – Dataset: [0,1] type, conventions, timeCoverage, spaceCoverage; [0,n] date type=, format type= uri=
• Relations: isA, worksFor, participant role=, isPartOf, hasParent, hasChild, hasSibling, serviceId, generatedBy
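Here is a partial Python reading of the metadata model. It covers only the root object, activities, simulations, and datasets, with the isPartOf and generatedBy links. Attribute names are adapted to Python style, the cardinalities are simplified, and the identifiers in the example are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetaObject:
    """Abstract root of the model: every object carries exactly one id."""
    id: str


@dataclass
class Activity(MetaObject):
    name: Optional[str] = None
    description: Optional[str] = None
    participants: list[tuple[str, str]] = field(default_factory=list)  # (person id, role)
    is_part_of: Optional["Activity"] = None   # the diagram's isPartOf association


@dataclass
class Simulation(Activity):
    simulation_inputs: list[str] = field(default_factory=list)


@dataclass
class Dataset(MetaObject):
    conventions: Optional[str] = None
    formats: list[str] = field(default_factory=list)
    generated_by: Optional[Activity] = None   # the diagram's generatedBy association


def lineage(dataset: Dataset) -> list[str]:
    """Follow generatedBy, then isPartOf links, returning activity ids innermost first."""
    chain: list[str] = []
    activity = dataset.generated_by
    while activity is not None:
        chain.append(activity.id)
        activity = activity.is_part_of
    return chain


# Hypothetical identifiers.
project = Activity("pcm", name="Parallel Climate Model runs")
run = Simulation("pcm.b06", name="Control run", is_part_of=project,
                 participants=[("person-001", "dataManager")])
ds = Dataset("pcm.b06.atm.monthly", conventions="CF-1.0",
             formats=["netCDF"], generated_by=run)
print(lineage(ds))  # ['pcm.b06', 'pcm']
```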

ESG Metadata Progress
• Co-developed NcML with Unidata
  – CF conventions in progress, almost done
• Developed & evaluated a prototype metadata system
• Finalized an initial schema for PCM/CCSM
  – Address interoperability with federal standards and NASA/GCMD via the generation of DIF/FGDC/ISO
  – Address interoperability with digital libraries via the creation of Dublin Core
• Testing relational and native XML databases, and OGSA-DAI
• Exploratory work for a first-generation ontology
• Authoring of discovery metadata in progress
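The Dublin Core interoperability item amounts to a crosswalk from the project's own fields to Dublin Core elements. Below is a minimal sketch using the standard Dublin Core element namespace. The source field names and the mapping itself are assumptions for illustration. Unmapped fields (such as `conventions`) are simply dropped.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk: local metadata field -> Dublin Core element.
CROSSWALK = {
    "name": "title",
    "description": "description",
    "creator": "creator",
    "date": "date",
    "format": "format",
    "rights": "rights",
}


def to_dublin_core(record: dict[str, str]) -> str:
    """Emit one dc:* element per mapped field; fields without a mapping are dropped."""
    root = ET.Element("metadata")
    for field_name, dc_element in CROSSWALK.items():
        if field_name in record:
            element = ET.SubElement(root, f"{{{DC_NS}}}{dc_element}")
            element.text = record[field_name]
    return ET.tostring(root, encoding="unicode")


print(to_dublin_core({"name": "PCM monthly surface temperature",
                      "format": "netCDF", "conventions": "CF-1.0"}))
```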

ESG Topology
Figure labels:
• Sites: ANL, ISI, LBNL, LLNL, NCAR, ORNL
• Components:
  – ESG web portal (Tomcat/Struts), CAS, MyProxy
  – GRAM gatekeeper
  – OGSA-DAI with a MySQL RDBMS
  – RLS (cross-updated between sites)
  – GridFTP servers, HRM, HPSS, NCAR MSS, disk caches
  – LAS server
• Actions: authenticate, query, submit, execute, visualize

Collaborations & Relationships
• CCSM Data Management Group
• The Globus Project
• Other SciDAC projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, and High-Performance DataGrid Toolkit
• OPeNDAP/DODS (multi-agency)
• NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
• U.K. e-Science and British Atmospheric Data Centre
• NOAA NOMADS and CEOS-grid
• Earth Science Portal group (multi-agency, international)

Immediate Directions
• Broaden usage of DataMover and refine it
• Continue building metadata catalogs
• Revisit the overall security model and consider simplified approaches
• Redesign and implement the user interface
• Alpha version of OPeNDAP-g
  – Test and evaluate with client applications
• Develop automation for data publishing (GT3)
• Deploy for IPCC runs

The Community Data Portal (CDP)
“The dataportal has changed my life…” (Ben Kirtman, COLA)
• Provide a common portal to NCAR, UCAR, and university data
• Provide a sustainable cyberinfrastructure that dramatically lowers the cost of sharing data (there is HUGE interest in this)
• Directly couple to simulation systems and DataMonster
• Begin capturing rich metadata and catalog our scientific experiments for the world
• MSS -> A Petascale Mass Knowledge System
• Federate internationally (ESG, THREDDS, U.K. e-Science, NOMADS, PRISM, GEON, etc.)

Foster Revolutionary Change
Mass Storage System (1.5 PB) -> Petascale Knowledge Repository
Establish a new paradigm for managing and accessing scientific data based on semantic organization.

Community Data Portal
• Purpose:
  – Build an infrastructure using different methods for data exploration and delivery
  – Web-based retrieval and interactive analysis for MSS collections
  – Data sharing for multi-institution cooperative studies
  – Browse, select, compare, and download data sets, and specify data subsets using graphical or text entry, with a choice of output format
• Components:
  – User interface: Live Access Server (LAS)
  – Middleware: Ferret, NCL, GrADS
  – File service: local or DODS
• Status:
  – Pilot working (2 years), more middleware testing
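Specifying a subset ultimately means turning a latitude/longitude box into index ranges on the data grid. The sketch below handles a regular, ascending grid. The 2.5-degree grid and the requested box are hypothetical. Gaussian model grids such as T42 would instead need a lookup over the actual coordinate values.

```python
import math


def index_range(axis_start: float, step: float, n: int,
                lo: float, hi: float) -> tuple[int, int]:
    """Inclusive index range [first, last] of points axis_start + i*step (i = 0..n-1)
    that fall inside [lo, hi]. Assumes an ascending regular axis (step > 0)."""
    first = max(0, math.ceil((lo - axis_start) / step))
    last = min(n - 1, math.floor((hi - axis_start) / step))
    if first > last:
        raise ValueError("requested box contains no grid points")
    return first, last


# Hypothetical 2.5-degree grid: latitude -90..90 (73 points), longitude 0..357.5 (144 points).
lat = index_range(-90.0, 2.5, 73, 20.0, 50.0)
lon = index_range(0.0, 2.5, 144, 230.0, 300.0)
print(lat, lon)  # (44, 56) (92, 120)
```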

Data Access
Figure: Live Access Client -> Live Access Server -> Ferret, NCL, and other engines -> DODS -> data collections (massive data, simulation & retrospective: CSM, PCM, DSS, MM5, WRF, MICOM, CMIWG).

Example … Data Analysis

Live Access Server + NCL (GRIB Data)

Interface and Reanalysis 2 Sea Level Pressure

Community Data Portal Architecture (dataportal.ucar.edu)
Figure layers:
• User interface: UI
• Core services: catalog parsing & metadata ingestion, data search & discovery, catalog browsing, MSS data retrieval, data access (OPeNDAP, FTP, HTTP), data visualization (NCL, Ferret)
• Middleware: Struts, Tomcat, GDS, DODS aggregation server, LAS
• Hardware: RAID disks, MSS

Community Data Portal Metadata Software
Figure:
• THREDDS catalogs reference ESG, DC, NcML, and other metadata.
• A THREDDS catalog parser application parses the catalogs and does two things with each XML document:
  – it stores the full document in a native XML database (Xindice), displayed by an XML viewer web application with schema-specific stylesheets (future: advanced query with XPath/XQuery);
  – it shreds the document into tables in a relational database (MySQL) for simple SQL queries.
• A catalog browser web application uses the THREDDS catalogs and links to a search & discovery web application.
• Search results: a list of triplets (dataset id, metadata schema, metadata URL).
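The "shred into tables, simple SQL query" path in the diagram can be sketched with Python's built-in `sqlite3` standing in for MySQL. The catalog below is a simplified, namespace-free stand-in for a THREDDS catalog; its element and attribute names are assumptions. The search returns the (dataset id, metadata schema, metadata URL) triplets named on the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified, namespace-free catalog; element/attribute names are assumptions
# loosely modeled on THREDDS, not the real THREDDS catalog schema.
CATALOG = """<catalog>
  <dataset ID="pcm.atm.monthly" name="PCM atmosphere monthly means">
    <metadata schema="ESG" href="http://example.org/meta/pcm.atm.monthly.esg.xml"/>
    <metadata schema="DC" href="http://example.org/meta/pcm.atm.monthly.dc.xml"/>
  </dataset>
  <dataset ID="reanalysis.slp" name="Reanalysis sea level pressure">
    <metadata schema="NcML" href="http://example.org/meta/reanalysis.slp.ncml"/>
  </dataset>
</catalog>"""


def ingest(catalog_xml: str, db: sqlite3.Connection) -> int:
    """Shred every (dataset, metadata record) pair into one row; return rows added."""
    db.execute("CREATE TABLE IF NOT EXISTS metadata_index "
               "(dataset_id TEXT, title TEXT, metadata_schema TEXT, metadata_url TEXT)")
    root = ET.fromstring(catalog_xml)
    rows = [(ds.get("ID"), ds.get("name"), md.get("schema"), md.get("href"))
            for ds in root.iter("dataset") for md in ds.findall("metadata")]
    db.executemany("INSERT INTO metadata_index VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def search(db: sqlite3.Connection, keyword: str) -> list[tuple[str, str, str]]:
    """Simple SQL keyword search returning (dataset id, metadata schema, metadata URL)."""
    cur = db.execute(
        "SELECT dataset_id, metadata_schema, metadata_url FROM metadata_index "
        "WHERE title LIKE ? ORDER BY dataset_id, metadata_schema",
        (f"%{keyword}%",))
    return cur.fetchall()


db = sqlite3.connect(":memory:")
ingest(CATALOG, db)
for triplet in search(db, "pcm"):   # SQLite LIKE is case-insensitive for ASCII
    print(triplet)
```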

CDP Data/Catalog Contributors
• ACD: MOZART v2.1 standard run (Louisa Emmons)
• ATD: Radar almost ready for today!
• CGD: CAS satellite data example (Lesley Smith)
• CGD: CDAS and VEMAP data (Steve Aulenbach, Nan Rosenbloom, Dave Schimel)
• CGD: CCSM 1000-year run (Lawrence Buja)
• CGD: PCM 16 top datasets (Gary Strand)
• SCD: DSS full data holdings (Bob Dattore, Steve Worley)
• SCD: VETS example visualization catalog (Markus Stobbs, Luca Cinquini)
• COLA: Jennifer Adams, Jim Kinter, Brian Doty

Next Steps
• Recruiting (!)
  – One student for data ingest
  – One software engineer
• Systems: expanding storage by 20 TB (SCD cosponsor)
• Ongoing publication of datasets
• Publishing documents on plans, design, how to partner, standard services, and management procedures
• Building partnerships, DMWG meeting in August

Closing Thoughts
• Building a sustainable infrastructure for the long term
  – Difficult, expensive, and time-consuming
  – Requires longer-term projects
• Team-building is a critical process
  – Collaboration technologies really help
• Managing all the collaborations is a challenge
  – But extremely valuable
• Good progress, first real usage

Links
• Earth System Grid: www.earthsystemgrid.org
• Community Data Portal: dataportal.ucar.edu

END

We Will Examine Practically Every Aspect of the Earth System from Space in This Decade
• Longer-term missions: observation of key Earth system interactions
  – Aqua, Terra, Landsat 7, QuikSCAT, Aura, ICESat, Jason-1
• Exploratory: explore specific Earth system processes and parameters and demonstrate technologies
  – Triana, GRACE, VCL, SRTM, CloudSat, PICASSO, EO-1
Courtesy of Tim Killeen, NCAR

Characteristics of Infrastructure
• Essential
  – So important that it becomes ubiquitous
• Reliable
  – Example: the built environment of the Roman Empire
• Expensive
  – Nothing succeeds like excess (e.g., the Interstate system)
  – Inherently one-off (often, few economies of scale)
• Clear factorization between research and practice
  – Generally deploy what provably works

CDP Interactions & Opportunities
• COLA
• CGD/VEMAP
• ACD, HAO/WACCM
• CGD/CCSM, CAM
• CGD/CAS
• MMM/WRF
• UCAR/JOSS
• UCAR/Unidata
• CGD, SCD, CU/GridBGC
• NOAA/NOMADS
• GODAE
• HAO/TIEGCM, MLSO
• ATD/Radar, HIAPER
• ACD/Mozart, BVOC, Aqua proposal
• BioGeo/CDAS
• SCD/DSS
• DOE/Earth System Grid
• DLESE
• GIS Initiative