- Number of slides: 23
Lessons Learned in the Purdue TeraGrid Condor Pools With An Adventure in Light Weight Adaptation P. A. Cheeseman (aai@purdue.edu) Preston Smith (psmith@purdue.edu) Rosen Center for Advanced Computing Purdue University
Purdue Condor Pools • Rosen Center Clusters – Condor backfills among idle nodes in PBS clusters • Provided 5.5 million CPU-hours in 2006, all from idle nodes in clusters • Nature of Purdue pools makes for non-trivial chance of job eviction. More on this later. • Campus – Idle labs – Departments around campus
Purdue TeraGrid All in all, 6400 CPUs available! • Use on TeraGrid – 2.4 million hours in 2006 spent building a database of hypothetical zeolite structures – Solving the Football Pool Problem • Already in 2007: 5.5 million hours allocated – 4th largest single award in March allocations meeting • Condor provides TeraGrid unparalleled price/cycle – Similar throughput in terms of hours serviced with Cray XT3, DataStar, etc., for much less cost
Purdue TeraGrid - Challenges • Usage reporting – TeraGrid uploads per-job usage nightly. This proved challenging to collect with Condor – Perl scripting and data massaging to process history files and inject data into a database. – Usage reporting infrastructure (AMIE) unable to keep up with the deluge of job records. • … But that’s TeraGrid’s issue, not Condor’s. • TG implemented temporary solution - usage reporting up to date
Purdue TeraGrid - Usage Reporting • Detective work learning how to determine accurate job time – RemoteWallClockTime - CumulativeSuspensionTime • Not so useful for “charged time” (such as on TeraGrid) • Requires manually computing the difference of completion time and last start time. • Occasional bugs – Negative walltime numbers (or really large ones) • Usually in a job that has been condor_rm’d
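The "completion time minus last start time" rule, with a guard against the negative or very large values noted on this slide, might be sketched as follows. This is an illustrative sketch only. It assumes each job record is a dictionary of integer epoch seconds, that the relevant history attributes are named `CompletionDate` and `JobCurrentStartDate`, and that 30 days is a reasonable sanity ceiling. None of these choices come from the slides.

```python
from typing import Optional

# Assumed sanity ceiling (not from the slides): anything longer is treated as bogus.
MAX_PLAUSIBLE_SECONDS = 30 * 24 * 3600


def charged_walltime(job_ad: dict) -> Optional[int]:
    """Return seconds between the last start and completion, or None if implausible.

    The attribute names are assumptions about the job history record layout.
    """
    start = job_ad.get("JobCurrentStartDate")
    end = job_ad.get("CompletionDate")
    # A missing or zero timestamp (e.g. a removed job) cannot be charged reliably.
    if not start or not end:
        return None
    elapsed = int(end) - int(start)
    # Reject negative or absurdly large walltimes instead of reporting them.
    if elapsed < 0 or elapsed > MAX_PLAUSIBLE_SECONDS:
        return None
    return elapsed
```

Records that come back as `None` would be set aside for manual review rather than uploaded as usage.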
More on Usage Reporting • RCAC tracks job-level history. Similar history processing scripts used in campus grid as for TeraGrid • Difficult to locate every schedd and grab history from it – Even more complicated when some schedds whose usage we want to account for are under different administration. – Skate around with ssh keys to collect history files • We would love a centralized method to gather or record job history – Or condor_history outputting XML or GGF usage records.
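A history-processing script of the kind described here could begin with a small parser like the one below. The slides mention Perl; Python is used here purely for illustration. The sketch assumes each job record is a run of `Name = Value` lines closed by a line beginning with `***`, so check that against the history files actually collected.

```python
from typing import Dict, Iterable, Iterator


def parse_history(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one attribute dictionary per job record.

    Assumes records are terminated by a line starting with '***'.
    Values are kept as strings, with surrounding double quotes removed.
    """
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("***"):
            # End of the current record: emit it if it holds any attributes.
            if record:
                yield record
            record = {}
        elif " = " in line:
            name, value = line.split(" = ", 1)
            record[name] = value.strip('"')
    # Emit a trailing record that was not followed by a separator line.
    if record:
        yield record
```

Feeding it an open file handle, for example `parse_history(open(path))`, gives per-job dictionaries that can be merged across schedds before loading into a database.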
TeraGrid Projects Prof. Keith Cherkauer (Purdue University) Hydrologic Simulations, continuing enterprise, reasonably predictable impact. I/O to CPU on the order of 50-200 MB/hour. File system saturation a strong possibility. Prof. M. W. Deem (Rice University) Prof. D. J. Earl (University of Pittsburgh) Hypothetical Zeolite Structures Monte Carlo computation. Average time per set of ~1 hour with broad variance. I/O to Time on the order of 1-2 MB/hour on average.
Lesson Learned Early - Cherkauer Reality of leverage. 500+ jobs at 50-200 MB/hour can keep a single file system very busy. The Cherkauer application was identifiably a problem for a particular parallel filesystem that shall not be named, at more than ~200 jobs in simultaneous execution (10 GB/hour minimum). Problems were resolved by conversion to standard universe to enable longer duration and fewer jobs. • Eliminated system() calls. • Added code to locate data files per search path. Resulting code was usable under both vanilla and standard universes. Production runs presently being done in standard universe with file transfer.
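The "10 GB/hour minimum" figure follows from multiplying the job count by the per-job I/O rate. A small sketch of that arithmetic, with a hypothetical helper name and assuming 1 GB = 1000 MB:

```python
def aggregate_io_gb_per_hour(jobs: int, mb_per_job_hour: float) -> float:
    """Total file system load in GB/hour, assuming 1 GB = 1000 MB."""
    return jobs * mb_per_job_hour / 1000.0


# ~200 simultaneous jobs at the low end of 50 MB/hour each.
low_at_200 = aggregate_io_gb_per_hour(200, 50)

# 500 jobs across the quoted 50-200 MB/hour range.
low_at_500 = aggregate_io_gb_per_hour(500, 50)
high_at_500 = aggregate_io_gb_per_hour(500, 200)
```

With these inputs, 200 jobs give 10 GB/hour, and 500 jobs give between 25 and 100 GB/hour on a single file system.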
Outcome - Cherkauer • Procedures developed to set up submissions allow for jobs to be queued in digestible batches. • Procedures in ‘full automatic’ mode could be used to complete an entire problem while handling remote archive of results to avoid file system issues. • Computation known to require a month or more of time now completes in less than a day.
Database of Hypothetical Zeolite Structures Prototyping: • Trial group of ~100 parameter sets was used to prototype. • Initial live data group was 6707 parameter sets. • Each set processed by executing the program ~100 times (cycles). • Execution of application performed by script in vanilla universe. Script provided self-checkpoint capability and duration control. Prototyping Observations: • Early delivery rates of ~7200 hours/day easily achieved. • Ultimate number of sets to process was not well known. First estimate of ~500,000 grew to ~2,900,000 by 2007/02. • Eviction rates were unacceptably high (see Figure 1).
Database of Hypothetical Zeolite Structures Prototyping Observations: • Compute times per set varied from minutes to several hours (see Figures 2 and 3). • Execution speed strongly related to compiler. Intel compilers were known in advance to produce significantly faster code.
Database of Hypothetical Zeolite Structures Adaptation Issues: • Limiting job duration to eliminate runaways and limit eviction. • Increasing small job duration to lower overhead of handling. • Preemption tolerance (self checkpoint). • Fault tolerance. • Many issues were initially addressed via execution script. Adaptation to standard universe was thought to be a must.
Database of Hypothetical Zeolite Structures Workflow: • Groups delivered via HTTP from Prof. Earl’s web site. • Sets per group ranged from ~7000 to 30,000. • Results returned to Prof. Earl via ‘drop zone’ in archival storage for post-analysis until approximately 10/2006. Post-analysis was subsequently handled at Purdue in Condor. Processing at Purdue: • Steward procedures developed to feed jobs to Condor, monitor progress, validate results, resubmit unanticipated failure cases, and archive results for group. • Stewards were designed to process group in batches of ~2000 sets to allow processing within 6-8 GB of volatile storage.
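Splitting a group into batches of ~2000 sets, as the stewards did, can be illustrated with a short sketch. The function name and the list-of-names representation are hypothetical, and the actual steward scripts are not shown in the slides.

```python
from typing import Iterator, List, Sequence


def batches(sets: Sequence[str], size: int = 2000) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` parameter sets."""
    for i in range(0, len(sets), size):
        yield list(sets[i:i + size])


# A hypothetical group of 7000 parameter sets, the low end of the quoted range.
group = [f"set{i:05d}" for i in range(7000)]
batch_sizes = [len(b) for b in batches(group)]
```

Each batch would then be submitted, validated, and archived before the next is staged, which keeps the working set within the volatile storage limit.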
Database of Hypothetical Zeolite Structures Adapting for the Purdue Condor Pools • Eliminate need for execution side script to pave the way for standard universe execution. • Incorporate repetitive execution within core application. Address overhead of execution side script, multiple loads of core application, enable transition to standard universe. • Introduce self-imposed timing controls. Address inability to identify runaways among 1000s of jobs. • Embed reasonable self-checkpoint capability. Address both preemption and fault tolerance. • Introduce ability to tune average job duration to Condor pool conditions. Address eviction rate problem. • Any other code work required to achieve the points above. Some memory management work was expected.
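The combination of repeated cycles inside the application, a self-imposed time limit, and self-checkpointing can be sketched as below. This is not the project's code; the function name, the JSON checkpoint format, and the per-cycle callback are assumptions made for illustration.

```python
import json
import os
import time
from typing import Callable


def run_cycles(total_cycles: int, do_cycle: Callable[[int], None],
               checkpoint_path: str, budget_seconds: float) -> bool:
    """Run cycles until finished or the time budget is spent.

    Returns True when all cycles are complete, False when the job stopped
    early and should be resubmitted to resume from its checkpoint.
    """
    start = time.monotonic()
    next_cycle = 0
    # Resume from an earlier run (after eviction or a voluntary stop).
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            next_cycle = json.load(fh)["next_cycle"]
    while next_cycle < total_cycles:
        # Self-imposed duration control: stop before starting another cycle.
        if time.monotonic() - start >= budget_seconds:
            return False
        do_cycle(next_cycle)
        next_cycle += 1
        # Write the checkpoint to a temporary file, then rename it into place,
        # so an interruption never leaves a half-written checkpoint.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump({"next_cycle": next_cycle}, fh)
        os.replace(tmp, checkpoint_path)
    return True
```

Tuning `budget_seconds` to pool conditions is one way to express the "tune average job duration" goal: shorter budgets mean less work lost per eviction, at the cost of more job handling.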
Database of Hypothetical Zeolite Structures Notes on Condor Adaptation • Written adaptation plan reviewed by all concerned parties. • Adaptation work undertaken while production continued. • Modifications to existing code plus new code: ~30 routines. • Several hundred lines of non-comment code written. • Code revisions validated periodically by textual comparison of result files for 100 parameter sets from control case. • Adaptation period spanned compiler version changes. • Adapted code became production version 09/2006. • Approximately 325,000 sets completed before the adapted code went into production. • Code adaptation mandated changes to steward procedures.
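Validation by textual comparison against a control case could look like the following sketch. It assumes control and candidate result files share the same file names in two directories, which is a hypothetical layout not described in the slides.

```python
import filecmp
import os
from typing import List


def mismatched_results(control_dir: str, candidate_dir: str) -> List[str]:
    """Return names of control files whose candidate copy differs or is missing."""
    names = sorted(os.listdir(control_dir))
    # shallow=False forces a byte-by-byte comparison of file contents.
    _, mismatch, errors = filecmp.cmpfiles(
        control_dir, candidate_dir, names, shallow=False
    )
    return sorted(mismatch + errors)
```

An empty list means the revised code reproduced the control results exactly. Any returned name points to a result file to inspect by hand.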
Database of Hypothetical Zeolite Structures After adapting to Condor? • Execution times became manageable (see Figures 4 and 5). • Eviction rates fell to more controllable ratios (Figure 6). • Workflow became more automatic with ability to limit job duration (and exposure to various system hiccups). Recovering loss of a few hundred ‘short’ jobs was easier than recovering loss of the same number of ‘long’ jobs. • Application could run equally well in either standard or vanilla universe due to duration control. • Ultimate choice was to remain in vanilla universe to continue using Intel V9 compiler suite. • Front end load due to steward procedures was reduced due to less handling of intermediate semaphore and lock files.
Figure 2 Figure 3 Figure 4 Figure 5
Database of Hypothetical Zeolite Structures Figure 6
Database of Hypothetical Zeolite Structures Project to-date: • 1.5 million sets processed since 2006/02 including dry spells due to delays in workflow, exhaustion of allocation, and processing of renewal. • 2.4 million hours ‘officially’ delivered to the project since 2006/02 or ~250 hours per hour excluding dry spells. • Most recent throughput delivered 96,000 hours in a 226-hour time span or ~400 processor hours per hour. • Entire collaboration continues exclusively via e-mail. • Approximately 1.4 million sets remain.
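The throughput figures on this slide are delivered CPU-hours divided by elapsed wall-clock hours. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def hours_per_hour(cpu_hours: float, elapsed_hours: float) -> float:
    """Average number of processors kept busy over the period."""
    return cpu_hours / elapsed_hours


# Most recent run: 96,000 CPU-hours delivered over a 226-hour span.
recent = hours_per_hour(96_000, 226)

# Wall-clock hours implied by 2.4 million CPU-hours at ~250 hours per hour.
implied_active_hours = 2_400_000 / 250
```

The recent run works out to roughly 425 processor hours per hour, which the slide rounds to ~400. The long-run rate of ~250 implies about 9,600 active hours, or about 400 days, excluding dry spells.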
Database of Hypothetical Zeolite Structures Miscellany: • Throughout the project, emphasis was given to designing stewards to return results to Prof. Earl in the same arrangement as they were delivered to ease post-analysis. Revision of the data structure was never seriously undertaken, nor seen to be necessary. • While getting jobs into execution was initial primary concern, bulk of work in stewards ultimately centered on automating handling of results. • Various working file systems were tried during production. The present procedures operate using volatile storage for active computation, high capacity storage for staging, and long term (tape robot) storage for archival.
Database of Hypothetical Zeolite Structures More Miscellany: • Core of steward procedures composed of less than 10 scripts. • Many additional scripts written to gather post mortem data w.r.t. job cost, fault statistics, exhaustive result validation, and summaries. • DAGs were explored as a job metering tool but deferred due to problems not well understood and demands of production. Since the workflow didn’t implicitly require DAG features, the production methods were retained until a solid reason for using DAGs could be discerned. Additionally, ‘pre’ and ‘post’ procedures were known to be an undertaking as demanding as developing the stewards. • Adaptation of the stewards to other batch systems was done more easily than expected. Batch systems for which prototypes were done included PBS, LSF, and LoadLeveler.
Required Plug TeraGrid ‘07 • Right here at UW! – June 4-8, 2007 • Full analysis of Zeolite application, plus other Condor work from Purdue in proceedings • Condor tutorial and demonstrations • Come join us or even help!