- Number of slides: 23
NPP Atmosphere PEATE: Climate Data Processing Made Easy
Scott Mindock, Atmosphere PEATE Team
Space Science and Engineering Center, University of Wisconsin-Madison
10 July 2008
Space Science and Engineering Center (SSEC)
The NPP Atmosphere PEATE is implemented within the framework and facilities of the Space Science and Engineering Center (SSEC) at the University of Wisconsin-Madison. SSEC has been successfully supporting operational, satellite-based remote-sensing missions since 1967, and its capabilities continue to evolve and expand to meet the demands and challenges of future missions.
1. Employs ~250 scientists, engineers, programmers, administrators, and IT support staff.
2. Satellite missions currently supported:
• GEO: GOES 10/11/12/R; Meteosat 7/9; MTSAT-1R; FY-2C/2D; Kalpana
• LEO: NOAA 15/16/17/18, Terra, Aqua, NPP, NPOESS, FY-3, MetOp
Funding and Related Work
Atmosphere PEATE is funded under NASA Grant NNG05GN47A
• Award Date: 10/07/2005
• Grant Period: 08/15/2005 to 08/14/2008 (renewal in progress)
Related Work at SSEC:
• CrIS SDR Cal/Val and Characterization (Revercomb, IPO)
• VIIRS SDR and Cloud Cal/Val (Menzel, IPO)
• VIIRS Algorithm Assessment (Heidinger, IPO)
• International Polar Orbiter Processing Package (Huang, IPO)
• VIIRS Instrument Characterization (Moeller, NASA)
Creating Climate Data Records (CDRs) is hard!
Products track global trends
• Calibration must be accurate (no calibration artifacts)
• Algorithms must be fully verified with global data (no regional artifacts)
Data sets are large and hard to manage
Developing the CDRs is an iterative process
Large processing clusters are required
• Programming requires a different skill set
• Distributed systems are hard to test
Ongoing process
• Requirements change
• Technology changes
• Staff changes
The process requires multiple computing systems
A single machine can be used for initial development, but cluster computing is needed to verify performance over the full globe.
CDR development is an iterative process
• Initial development occurs on a single machine
• Product verification requires data sets of increasing size
• Increasing data set size increases computation time
Strategies for simplifying processing
Reduce or remove the “Move to Cluster” step
• Make execution environments similar
• Make data access patterns similar
Results in faster iterations
Strategies for managing the processing system
Use well-defined interfaces between subsystems
• Decouples subsystems, which reduces the learning curve
• Allows evolution of subsystems
• Simplifies test and verification of software
Create configuration-driven subsystems
• Simplifies deployment of subsystems
• Allows operations to modify system behavior
Leverage automated testing technologies
• Reduces the learning curve
• Provides continuous test coverage
• Captures requirements in executable form
The system: Atmosphere PEATE
Ingest (ING)
• Brings data into the Atmosphere PEATE
• Supports FTP, HTTP, and RSYNC
Data Management System (DMS)
• Stores data in the form of files
• Provides a web service to locate, store, and retrieve files
Computational Resource Grid (CRG)
• Provides a web service to locate, store, and retrieve jobs
Algorithm (ALG)
• Consumes jobs
• Runs algorithms in the form of binaries
Algorithm Rule Manager (ARM)
• Combines data with algorithms to produce jobs
• Provides a web service interface to locate, store, and retrieve rules
ING: Ingest, brings data into the system
Configuration file
• Allows operations to add new sites
• Allows operations to maintain existing sites
Customization allowed in the form of scripts (BASH, Python)
• QC
• Quick look
• Metadata extraction
Notices missing or late data
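To make the configuration-driven design concrete, here is a minimal sketch of what an ingest-site entry and its validation might look like. All field names (protocol, url, schedule, hooks) are assumptions for illustration, not the actual ING configuration format.

```python
# Hypothetical ingest-site configuration in the spirit of the slide:
# each site entry names a transfer protocol and optional hook scripts.
SITES = {
    "example-source": {
        "protocol": "ftp",              # ING supports FTP, HTTP, and RSYNC
        "url": "ftp.example.gov/npp",   # placeholder address
        "schedule": "*/15 * * * *",     # poll every 15 minutes
        "hooks": {                      # customization scripts (BASH, Python)
            "qc": "scripts/qc_check.sh",
            "quicklook": "scripts/quicklook.py",
            "metadata": "scripts/extract_meta.py",
        },
    },
}

SUPPORTED_PROTOCOLS = {"ftp", "http", "rsync"}

def validate_site(name, cfg):
    """Reject a site configuration that uses an unsupported protocol."""
    if cfg["protocol"] not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"{name}: unsupported protocol {cfg['protocol']!r}")
    return True
```

Because sites live in configuration rather than code, operations staff can add or retire a data source without a software release, which is the point the slide makes.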
DMS: Stores data and products
Relieves scientists of having to manage data
• Simple put and get functionality
Configuration file
• Specifies fileservers and directories
• Operations can add/remove fileservers
File system - holds files
Database - holds file information
Public Access - DMS interface
Worker - manages file system
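The put/get/locate idea can be sketched with an in-memory stand-in. This is not the real DMS web service; the method names and metadata-query shape are assumptions, and the dictionaries merely stand in for the fileservers and database the slide describes.

```python
class DMSClient:
    """In-memory sketch of the DMS interface: store files, find them
    later by metadata, retrieve them by name."""

    def __init__(self):
        self._store = {}   # file name -> bytes (stands in for fileservers)
        self._meta = {}    # file name -> metadata (stands in for the database)

    def put(self, name, data, **metadata):
        """Store a file and record its metadata."""
        self._store[name] = data
        self._meta[name] = metadata

    def get(self, name):
        """Retrieve a previously stored file."""
        return self._store[name]

    def locate(self, **query):
        """Return names of files whose metadata matches every query term."""
        return [n for n, m in self._meta.items()
                if all(m.get(k) == v for k, v in query.items())]
```

A scientist's workflow reduces to `put` after processing and `locate`/`get` before it, which is what "relieves scientists of having to manage data" amounts to in practice.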
CRG: Provides nodes with jobs
• Provides a well-defined interface deployed as a web service
• Accepts job requests
• Provides job status
• Monitors job state
• Allows processing nodes to be added to or removed from the system
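A toy queue illustrates the job lifecycle the slide lists: submit, claim, complete, with status visible throughout. The state names and method names are assumptions, not the real CRG web-service API.

```python
from collections import deque

class CRG:
    """Sketch of the CRG job interface: accept job requests, hand jobs
    to processing nodes on demand, and track each job's state."""

    def __init__(self):
        self._queue = deque()
        self._status = {}

    def submit(self, job_id, spec):
        """Accept a job request from the ARM."""
        self._queue.append((job_id, spec))
        self._status[job_id] = "queued"

    def next_job(self):
        """Called by a processing node to claim the next job."""
        job_id, spec = self._queue.popleft()
        self._status[job_id] = "running"
        return job_id, spec

    def complete(self, job_id, ok=True):
        """Record the outcome reported by the node."""
        self._status[job_id] = "done" if ok else "failed"

    def status(self, job_id):
        return self._status[job_id]
```

Because nodes pull work rather than having it pushed to them, adding or removing a processing node needs no central reconfiguration, which is how the last bullet is usually achieved.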
Alg. Host: Runs the software that produces products
Recreates the development environment
• Retrieves data from the DMS
• Retrieves and runs software packages
• Saves results to the DMS, including products, stdout, and stderr
Algorithm Script Structure
Cluster executes a bash script
Script is passed arguments:
• Software package directory
• Working / output directory
• Static ancillary directory
• Dynamic ancillary directory
• Input files
• Output files
Software package is called from the script
Results are stored by the process that started the script.
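A driver on the Alg. Host might invoke the script roughly as below, passing the arguments in the order the slide lists. The exact calling convention (argument order, comma-joined file lists, `run_algorithm` itself) is an assumption for illustration.

```python
import subprocess

def run_algorithm(script, pkg_dir, work_dir, static_anc, dyn_anc,
                  inputs, outputs):
    """Hypothetical sketch of how the Alg. Host could launch an
    algorithm bash script with the documented argument set."""
    argv = ["bash", script, pkg_dir, work_dir, static_anc, dyn_anc,
            ",".join(inputs), ",".join(outputs)]
    # Capture stdout/stderr so they can be saved to the DMS alongside
    # the products, as the Alg. Host slide describes.
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr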
ARM: Binds data to software packages
• Provides a well-defined interface deployed as a web service
• Assigns jobs to the CRG
• Monitors data in the DMS
• Monitors the status of jobs in the CRG
• Production rules can be added or removed dynamically by operations
• Volatile logic lives here
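The binding step can be sketched as pattern matching: each rule pairs files it recognizes in the DMS with a software package to form jobs. The rule fields (`input_pattern`, `package`) and glob-style matching are assumptions; real ARM rules may be richer.

```python
import fnmatch

def make_jobs(rules, available_files):
    """Sketch of the ARM binding step: for each production rule, pair
    every matching file with the rule's software package to form a job
    that can be handed to the CRG."""
    jobs = []
    for rule in rules:
        for f in available_files:
            if fnmatch.fnmatch(f, rule["input_pattern"]):
                jobs.append({"package": rule["package"], "input": f})
    return jobs
```

Keeping this "volatile logic" in data-driven rules rather than code is what lets operations add or remove production rules without redeploying the system.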
Strategies for managing the processing system (revisited)
Use well-defined interfaces between subsystems
• Decouples subsystems, which reduces the learning curve
• Allows evolution of subsystems
• Simplifies test and verification of software
Create configuration-driven subsystems
• Simplifies deployment of subsystems
• Allows operations to modify system behavior
Leverage automated testing technologies
• Reduces the learning curve
• Provides continuous test coverage
• Captures requirements in executable form
Development Process: Spiral method
Design → Implement → Test → Deploy (repeat)
Build = Deploy to Operations
Testing Strategy
Employ standard software industry practices
• Automate with ANT (a make-like, XML-based build tool)
• Test with JUnit (Java unit testing)
Increases system quality
• Tests are reproducible
• Tests are run more often than they would be if they were manual
• Tests are improved over time
• Tests are configurable
We don’t just build; the process includes testing and verification
Nightly Build
• Builds system
• Tests subsystems
• Tests scenarios
• Updates repositories
• Logs results
Scenarios demonstrate requirements
Unit and Regression Testing
• May use internal knowledge of interfaces for testing
• Tests and exercises public interfaces
• Stress tests interfaces
• Tests evolve to verify bug fixes: fixed defects have specific tests added
• Tests run in the nightly build
• Tests verify each release
Layered approach to testing
Everything tested, every night
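The deck's tests are written in JUnit; the same fixed-defect regression pattern looks like this in Python's `unittest`, shown here only as an analogue. The `granule_id` function and the defect it pins are invented for illustration.

```python
import unittest

def granule_id(platform, start_time):
    """Toy product-naming function standing in for real PEATE code."""
    # The fix under test: pad the time stamp to six digits.
    return f"{platform}.{start_time:06d}"

class TestGranuleIdRegression(unittest.TestCase):
    """Each fixed defect gets a specific test that then runs every night."""

    def test_short_times_are_zero_padded(self):
        # Hypothetical defect: times before 10:00:00 once produced
        # five-digit stamps; this assertion pins the fix.
        self.assertEqual(granule_id("npp", 93000), "npp.093000")
```

Once such a test lands in the nightly build it keeps the defect from silently returning, which is what "tests evolve to verify bug fixes" means operationally.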
Testing Scenarios (1 of 2)
• Test ingest function
• Test forward and redo functions
• Reflect the CDR development process
Testing Scenarios (2 of 2)
Documents
• 3600-0003.080402.doc - Level 4 requirements
• 3600-0004.060911.doc - Operations Concepts
Test plans are implemented as scenario tests
• Tests correspond to use cases outlined in the Operations Concepts document
• At least one test for each requirement set
• Successful completion of a test verifies requirements by demonstration
Factors that determine success
• Generation of expected products
• Ability to track product heritage
• Ability to reproduce results
• Ability to uniquely identify products
Conclusion: Climate Data Processing Is Easy
The ingest system makes it easy to add and manage data sources
• Operators can control the system
• Operators can monitor the system
The DMS makes it easy to maintain large data sets
• Scientists can find data
• Operators can add and remove servers
• Operators can add and remove sites
The CRG and Alg. Host make it easy to transfer CDR production from the development environment to the cluster environment
You still have to get the product correct!