
  • Number of slides: 23

NPP Atmosphere PEATE: Climate Data Processing Made Easy
Scott Mindock, Atmosphere PEATE Team
Space Science and Engineering Center, University of Wisconsin-Madison
10 July 2008

Space Science and Engineering Center (SSEC)
The NPP Atmosphere PEATE is implemented within the framework and facilities of the Space Science and Engineering Center (SSEC) at the University of Wisconsin-Madison. SSEC has been successfully supporting operational, satellite-based remote-sensing missions since 1967, and its capabilities continue to evolve and expand to meet the demands and challenges of future missions.
1. Employs about 250 scientists, engineers, programmers, administrators and IT support staff.
2. Satellite missions currently supported:
  • GEO: GOES 10/11/12/R; Meteosat 7/9; MTSAT-1R; FY-2C/2D; Kalpana
  • LEO: NOAA 15/16/17/18, Terra, Aqua, NPP, NPOESS, FY-3, MetOp

Funding and Related Work
Atmosphere PEATE is funded under NASA Grant NNG05GN47A
  • Award Date: 10/07/2005
  • Grant Period: 08/15/2005 to 8/14/2008 (renewal in progress)
Related Work at SSEC:
  • CrIS SDR Cal/Val and Characterization (Revercomb, IPO)
  • VIIRS SDR and Cloud Cal/Val (Menzel, IPO)
  • VIIRS Algorithm Assessment (Heidinger, IPO)
  • International Polar Orbiter Processing Package (Huang, IPO)
  • VIIRS Instrument Characterization (Moeller, NASA)

Creating Climate Data Products (CDR) is hard!
Products track global trends
  • Calibration must be accurate (no calibration artifacts)
  • Algorithms must be fully verified with global data (no regional artifacts)
Data sets are large and hard to manage
Developing the CDRs is an iterative process
Large processing clusters are required
  • Programming requires a different skill set
  • Distributed systems are hard to test
Ongoing process
  • Requirements change
  • Technology changes
  • Staff changes

The process requires multiple computing systems
A single machine can be used for initial development, but cluster computing is needed to verify performance over the full globe.

CDR development is an iterative process
Initial development occurs on a single machine
Product verification requires data sets of increasing size
Increasing data set size increases computation time

Strategies for simplifying processing
Reduce or remove the “Move to Cluster” step
  • Make execution environments similar
  • Make data access patterns similar
Results in faster iterations

Strategies for managing the processing system
Use well-defined interfaces between subsystems
  • Decouples systems, which reduces the learning curve
  • Allows evolution of subsystems
  • Simplifies test and verification of software
Create configuration-driven subsystems
  • Simplifies deployment of subsystems
  • Allows operations to modify system behavior
Leverage automated testing technologies
  • Reduces the learning curve
  • Provides continuous test coverage
  • Captures requirements in executable form

The system: Atmosphere PEATE
Ingest: ING
  • Brings data into the Atmosphere PEATE
  • Supports FTP, HTTP and RSYNC
Data Management System: DMS
  • Stores data in the form of files
  • Provides a Web Service to locate, store and retrieve files
Computational Resource Grid: CRG
  • Provides a Web Service to locate, store and retrieve jobs
Algorithm: ALG
  • Consumes jobs
  • Runs algorithms in the form of binaries
Algorithm Rule Manager: ARM
  • Combines data with algorithms to produce jobs
  • Provides a Web Service interface to locate, store and retrieve rules

ING: Ingest, bring data into system
Configuration file
  • Allows operations to add new sites
  • Allows operations to maintain existing sites
Customization allowed in the form of scripts (Bash, Python)
  • QC
  • Quick Look
  • Metadata extraction
Notices missing or late data
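
The configuration-driven site list and the missing/late data check on this slide could be sketched roughly as follows. This is a minimal Python illustration, not the PEATE implementation: the `SiteConfig` fields, the `find_late_sites` helper, the site names and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SiteConfig:
    """One remote data source, as it might appear in an ingest configuration file."""
    name: str
    protocol: str                 # "ftp", "http" or "rsync"
    url: str                      # hypothetical location of the remote data
    expected_interval: timedelta  # how often new data is expected to arrive


def find_late_sites(sites, last_arrival, now):
    """Return the names of sites whose data is missing or older than expected."""
    late = []
    for site in sites:
        last = last_arrival.get(site.name)
        if last is None or now - last > site.expected_interval:
            late.append(site.name)
    return late


# Hypothetical configuration; operations would edit this list to add or maintain sites.
SITES = [
    SiteConfig("site_a", "ftp", "ftp://example.org/data/", timedelta(hours=2)),
    SiteConfig("site_b", "rsync", "rsync://example.org/data/", timedelta(hours=6)),
]

if __name__ == "__main__":
    now = datetime(2008, 7, 10, 12, 0)
    arrivals = {"site_a": datetime(2008, 7, 10, 9, 0)}  # site_b has never delivered
    print(find_late_sites(SITES, arrivals, now))
```

Keeping the site list as data rather than code is what would let operations add a source without touching the ingest logic.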

DMS: Stores Data and Products
Relieves scientists of having to manage data
  • Simple put and get functionality
Configuration file
  • Specifies fileservers and directories
  • Operations can add or remove fileservers
File system - holds files
Database - holds file information
Public Access - DMS interface
Worker - manages file system
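
The put/get idea can be illustrated with a small local stand-in. This sketch assumes nothing about the real DMS web service: the `SimpleDMS` class, its method names, the round-robin choice of fileserver and the use of a SHA-1 checksum are all hypothetical, and a plain dictionary plays the role of the database.

```python
import hashlib
import shutil
from pathlib import Path


class SimpleDMS:
    """Local stand-in for a data management service with put/get/locate."""

    def __init__(self, fileservers):
        # Directories standing in for configured fileservers.
        self.fileservers = [Path(p) for p in fileservers]
        # Stand-in for the database: file name -> (stored path, checksum).
        self.catalog = {}

    def put(self, source, name):
        """Copy a file into the store and record where it went."""
        server = self.fileservers[len(self.catalog) % len(self.fileservers)]
        server.mkdir(parents=True, exist_ok=True)
        stored = server / name
        shutil.copyfile(Path(source), stored)
        digest = hashlib.sha1(stored.read_bytes()).hexdigest()
        self.catalog[name] = (stored, digest)
        return digest

    def locate(self, name):
        """Return the stored path for a file name."""
        return self.catalog[name][0]

    def get(self, name, dest_dir):
        """Copy a stored file into the caller's working directory."""
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / name
        shutil.copyfile(self.locate(name), dest)
        return dest
```

The caller only ever handles names, so fileservers can be added or removed in the configuration without changing client code.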

CRG: Provides nodes with jobs
Provides a well-defined interface deployed as a web service
Accepts job requests
Provides job status
Monitors job state
Allows processing nodes to be added to or removed from the system
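
A job service of this kind can be pictured as a queue that nodes pull from. The sketch below is an in-memory illustration only; the `JobQueue` class, its method names and the four job states are assumptions, not the CRG interface.

```python
from collections import deque
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class JobQueue:
    """In-memory stand-in for a job service: accepts, hands out and tracks jobs."""

    def __init__(self):
        self._pending = deque()
        self._state = {}
        self._payload = {}

    def submit(self, job_id, payload):
        """Accept a job request."""
        self._payload[job_id] = payload
        self._state[job_id] = JobState.QUEUED
        self._pending.append(job_id)

    def next_job(self):
        """Called by a processing node; returns (job_id, payload) or None."""
        if not self._pending:
            return None
        job_id = self._pending.popleft()
        self._state[job_id] = JobState.RUNNING
        return job_id, self._payload[job_id]

    def report(self, job_id, success):
        """Called by a node when a job finishes."""
        self._state[job_id] = JobState.DONE if success else JobState.FAILED

    def status(self, job_id):
        return self._state[job_id]
```

Because nodes ask for work rather than being assigned it, a node can join or leave without the queue needing to know about it.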

AlgHost: Runs software that produces products
Recreates development environment
  • Retrieves data from DMS
  • Retrieves and runs software packages
  • Saves results to DMS, including products, stdout and stderr

Algorithm Script Structure
Cluster executes a bash script
Script is passed arguments
  • Software package directory
  • Working / output directory
  • Static ancillary directory
  • Dynamic ancillary directory
  • Input files
  • Output files
Software package is called from the script
Results are stored by the process that started the script
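
The calling side of this convention might look like the following Python sketch. The slide lists which arguments the script receives but not their exact form, so the positional order and the comma-joined file lists used here are assumptions, as are the function names `build_command` and `run_job`.

```python
import subprocess


def build_command(script, package_dir, work_dir, static_anc_dir,
                  dynamic_anc_dir, inputs, outputs):
    """Assemble the argument list for the algorithm's bash script.

    Assumed convention: six positional arguments in the order listed on the
    slide, with input and output file lists joined by commas.
    """
    return [
        "bash", str(script),
        str(package_dir),
        str(work_dir),
        str(static_anc_dir),
        str(dynamic_anc_dir),
        ",".join(str(p) for p in inputs),
        ",".join(str(p) for p in outputs),
    ]


def run_job(command, work_dir):
    """Run the script and capture stdout/stderr so the caller can store them."""
    result = subprocess.run(command, cwd=work_dir, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

Since the same script and arguments work on a single machine, the step from development to cluster involves no change to the algorithm code itself.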

ARM: Binds data to software packages
Provides a well-defined interface deployed as a web service
Assigns jobs to CRG
Monitors data in DMS
Monitors the status of jobs in CRG
Production rules can be added or removed dynamically by operations
Volatile logic lives here
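
One simple way to picture a production rule is a pattern that pairs newly arrived files with a software package. The sketch below assumes that form; the `Rule` fields, the `make_jobs` helper and the use of glob patterns are hypothetical and not taken from the ARM itself.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """A production rule: files matching `pattern` are processed by `package`."""
    name: str
    pattern: str   # glob pattern matched against file names (assumed form)
    package: str   # identifier of the software package to run


def make_jobs(rules, new_files):
    """Combine newly arrived files with matching rules to produce job requests."""
    jobs = []
    for filename in new_files:
        for rule in rules:
            if fnmatchcase(filename, rule.pattern):
                jobs.append({
                    "rule": rule.name,
                    "package": rule.package,
                    "input": filename,
                })
    return jobs
```

Because rules are plain data, operations could add or remove them at run time, which is consistent with keeping the volatile logic in one place.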

Strategies for managing the processing system (revisited)
Use well-defined interfaces between subsystems
  • Decouples systems, which reduces the learning curve
  • Allows evolution of subsystems
  • Simplifies test and verification of software
Create configuration-driven subsystems
  • Simplifies deployment of subsystems
  • Allows operations to modify system behavior
Leverage automated testing technologies
  • Reduces the learning curve
  • Provides continuous test coverage
  • Captures requirements in executable form

Development Process: Spiral method
[Diagram: spiral through Design, Implement, Build, Test and Deploy; a marker on the spiral denotes "Deploy to Operations"]

Testing Strategy
Employ standard software industry practices
  • Automate with Ant (Make-like, XML-based)
  • Test with JUnit (Java unit testing)
Increases system quality
  • Tests are reproducible
  • Tests are run more often than they would be if they were manual
  • Tests are improved over time
  • Tests are configurable
We don’t just build; the process includes testing and verification

Nightly Build
Builds system
Tests subsystems
Tests scenarios
Updates repositories
Logs results
Scenarios demonstrate requirements

Unit and Regression Testing
May use internal knowledge of interfaces for testing
Test and exercise public interfaces
Stress test interfaces
Evolve to test and verify bugs
Fixed defects have specific tests added
Tests run in nightly build
Tests verify release
Layered approach to testing
Everything tested, every night

Testing Scenarios (1 of 2)
Test ingest function
Test forward and redo functions
Reflect CDR development process

Testing Scenarios (2 of 2)
Documents
  • 3600-0003.080402.doc - Level 4 requirements
  • 3600-0004.060911.doc - Operations Concepts
Test plans are implemented as scenario tests
  • Tests correspond to Use Cases outlined in OpsCon
  • At least one test for each requirement set
  • Successful completion of a test verifies requirements by demonstration
Factors that determine success
  • Generation of expected products
  • Ability to track product heritage
  • Ability to reproduce results
  • Ability to uniquely identify products
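
The last three success factors (heritage, reproducibility, unique identification) are often approached by deriving a product identifier from everything that went into the product. The sketch below shows one such scheme as an assumption, not as the method used by the Atmosphere PEATE; the `product_id` function and the fields of its heritage record are hypothetical.

```python
import hashlib
import json


def product_id(package, version, input_checksums, parameters=None):
    """Derive a deterministic identifier and heritage record for a product.

    The same package, version, inputs and parameters always give the same
    identifier; changing any of them gives a different one.
    """
    record = {
        "package": package,
        "version": version,
        "inputs": sorted(input_checksums),  # order of inputs should not matter
        "parameters": parameters or {},
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    identifier = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return identifier, record
```

A scenario test could then rerun a job and check that the regenerated product carries the same identifier as the original.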

Conclusion: Climate Data Processing Is Easy
Ingest system makes it easy to add and manage data sources
  • Operators can control the system
  • Operators can monitor the system
The DMS makes it easy to maintain large data sets
  • Scientists can find data
  • Operators can add and remove servers
  • Operators can add and remove sites
The CRG and AlgHost make it easy to transfer CDR production from the development environment to the cluster environment
You still have to get the product correct!