- Number of slides: 4
Climate-SDM (1)
• Climate analysis use case
  – Described by: Marcia Branstetter
• Use case description
  – Data obtained from ESG
  – Analysis uses a sequence of steps, each running scripts in Ferret, CDAT, MATLAB, etc.
  – Need to run the same sequence of steps over many files
    • Sometimes changing the scripts
    • Sometimes adding or removing a step
  – Problem: need a workflow to run and track the analysis process
    • Need to collect provenance
    • Provenance should be rich enough for another person to run the same analysis
    • Analysis scripts can use various codes, such as Ferret, CDAT, MATLAB
    • Need to keep an audit trail, including interaction with external tools
  – Task: workflow of steps capturing software versions, scripts, input files, etc.
  – Goal: construct workflows that can be run repeatedly. Each workflow run writes a record of itself to a database, so anyone can reproduce the results or add to them, not necessarily on the same machine.
  – Tools to be used
    • Kepler: for composing workflows and writing provenance to a database
    • VisTrails: for keeping track of the evolution of workflows and associated provenance data
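The goal of writing a record of each workflow run to a database can be sketched as follows. This is only an illustrative Python sketch, not how Kepler or VisTrails store provenance: the `runs` table, its columns, and the helper names `open_store`, `describe_step`, `record_run`, and `load_run` are all assumptions made for the example.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open a SQLite database and create the (hypothetical) runs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               run_id INTEGER PRIMARY KEY AUTOINCREMENT,
               started_utc TEXT NOT NULL,
               steps_json TEXT NOT NULL
           )"""
    )
    return conn


def describe_step(tool, tool_version, script_text, input_files):
    """Capture what is needed to rerun one step: tool, version, script, inputs."""
    return {
        "tool": tool,
        "tool_version": tool_version,
        # Hash the script so later changes to it are detectable.
        "script_sha256": hashlib.sha256(script_text.encode("utf-8")).hexdigest(),
        "input_files": sorted(input_files),
    }


def record_run(conn, steps):
    """Write one workflow run (an ordered list of steps) and return its id."""
    cur = conn.execute(
        "INSERT INTO runs (started_utc, steps_json) VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), json.dumps(steps)),
    )
    conn.commit()
    return cur.lastrowid


def load_run(conn, run_id):
    """Read back the recorded steps for a run, or None if it does not exist."""
    row = conn.execute(
        "SELECT steps_json FROM runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Storing the ordered step list with tool versions and a script hash is one way to make a run repeatable by another person on another machine; a real deployment would also need to record where the input files themselves can be obtained.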
Climate-SDM (2)
• Scaling analysis process
  – Described by: Marcia Branstetter
• Use case description
  – Need to analyze 6-hourly data over 100 years for the atmosphere component
  – At T85 grid resolution, total volume is in the 10-100 TB range
  – Data resides on HPSS, on the order of 12,00 files, a few GBs each
  – A few TBs for the limited number of variables needed in the analysis
• Problem: extracting one or a few of the variables from HPSS
• Can this process be automated?
  – Task (longer term): automate the process using workflow tools
• Problem: parallelize analysis of large data
  – Task: use parallel statistics tools
• Goal: use Parallel R for such jobs
• Task already in progress
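The idea behind parallelizing a statistic over many files can be sketched as below. The slide's stated goal is Parallel R; this sketch substitutes Python's standard thread pool purely for illustration, and the function names and the in-memory lists standing in for per-file variable data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(values):
    """Return (count, sum) for one file's worth of values."""
    return len(values), sum(values)


def parallel_mean(chunks, max_workers=4):
    """Compute a global mean by combining per-chunk (count, sum) pairs.

    Each chunk stands in for one variable extracted from one file.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(partial_stats, chunks))
    total_count = sum(n for n, _ in parts)
    total_sum = sum(s for _, s in parts)
    if total_count == 0:
        raise ValueError("no data to average")
    return total_sum / total_count
```

Because each chunk is reduced to a small (count, sum) pair before the results are combined, no single worker needs to hold the full dataset, which is the property that matters at the data volumes described above.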
Climate-SDM (3)
• Earth System Grid
  – Described by: Dean Williams and Don Middleton
• Use case description
  – Two modes of getting data to users
    • Sets of files (using DataMover-Lite (DML))
    • Using tools that perform aggregation on the server side (OPeNDAP, CDAT, GrADS, LAS)
  – Currently only simple statistics are needed on the server side
  – Aggregation: hiding file structures on gateway searches is essential
  – Future needs as data scales
    • Composite products across multiple data nodes
    • Aggregation over multiple data nodes
    • Comparing model runs from different sites
  – Tracking of precise provenance of how data was generated is needed
• Task: using PnetCDF
  – CCSM4 on top of PnetCDF (already taking place)
  – netCDF-4 has new extended features, which may require similar features to be supported in PnetCDF
  – PnetCDF for post-processing (users still to be identified)
  – Other I/O-bound groups?
Climate-SDM (4)
• Earth System Grid (cont’d)
  – Described by: Dean Williams and Don Middleton
• Tasks: improve DML + SRM
  – Improve the DML interface
  – Use GridFTP-ssh in DML to speed transfers to the client
  – Explore use of GridFTP-ssh for SRMs
• Potential task: value-based searches
  – Very large communities performing impact studies
  – New community yet to be introduced to ESG
  – E.g., number of days with temperature > 120 F in some region
  – Currently they use GIS tools on highly summarized data
  – Potential need to perform value-based searches on the server side as data scales
• Potential task: compare simulated to observed data
  – Currently, ARM data is being converted to comply with the CF (Climate and Forecast) convention so it can be added to ESG holdings
  – The need to move data to a single site for comparison will require large-scale automated data movement
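The value-based search example from this slide (number of days with temperature > 120 F in some region) can be sketched as below. This is a minimal Python illustration, not an ESG interface: the record layout `(region, day, temp_f)` and the function name `hot_days_by_region` are assumptions.

```python
from collections import defaultdict


def hot_days_by_region(records, threshold_f=120.0):
    """Count distinct days per region with any reading above the threshold.

    records: iterable of (region, day, temp_f) tuples. A day may appear
    several times (e.g. sub-daily readings) but is counted only once.
    """
    hot = defaultdict(set)
    for region, day, temp_f in records:
        if temp_f > threshold_f:
            hot[region].add(day)
    return {region: len(days) for region, days in hot.items()}
```

A query of this kind returns only a small table of counts, which is why running it on the server side becomes attractive as data scales: the full temperature fields never have to be moved to the client.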