SGE Training
NASA LaRC ASDC
Delivered May 5, 6, 7 2009
Chris Dwan, Bioteam
cdwan@bioteam.net

Bioteam Inc.
  • Independent Consulting Shop
    • Vendor/technology agnostic
  • Staffed by:
    • Scientists forced to learn High Performance IT
    • Many years of industry & academic experience
  • Our specialty: Bridging the gap between Science & IT

Session Goals
  • Introduce ASDC systems
  • Detailed introduction to the IBM system
  • Deliver Sun Grid Engine Training
  • Encourage follow up

Interactive / Small Group Goals
  • 1-2 hours
  • 1-5 people
  • Users log into systems.
  • Users type examples, run jobs.
  • If code is available, bring it.
  • If specific use cases exist, bring them.

Selected ASDC Systems

Selected ASDC Systems
  • Apple Cluster
    • Online and in use at SCF since 2007
    • ~40 dual processor OS X systems (80+ CPUs)
    • Access through manila and corregidor
  • Magneto
    • ~28 quad core Linux servers (100+ CPUs)
    • Online and in production use since 2006
  • New Magneto (ORR May 15)
    • Large, mixed purpose Linux cluster / file store
    • 176 CPUs dedicated to SCF
    • 576 CPUs dedicated to production
    • Disk based archive: 1.1 PB

Apple Cluster
  • Access:
    • LDAP account
    • manila or corregidor

NASA LaRC Science Directorate
  • Picture taken 9/2/08
  • 1.2 PB usable space
  • Fibre connected (384+ fibre ports)
  • 2,560 individual disk drives
    • 16 disks per chassis
    • 10 chassis per rack
    • 16 racks of disks
  • IBM Linux servers, mixed P6 and x86 CPUs to support legacy codes
  • Filesystem: IBM GPFS
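The drive count on this slide follows from the three per-unit figures. A quick check in shell arithmetic, using only the numbers quoted above (the variable names are purely illustrative):

```shell
#!/bin/sh
# Figures are taken from the slide; the variable names are illustrative.
disks_per_chassis=16
chassis_per_rack=10
racks=16

# 16 disks x 10 chassis x 16 racks
total_drives=$((disks_per_chassis * chassis_per_rack * racks))
echo "${total_drives} individual disk drives"
```

This prints 2560, which matches the "2,560 individual disk drives" bullet.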

Operational Readiness Review
Mid May 2009
Stay Tuned

Interactive hosts
  • bc201: instrument1-blue
  • bc202: instrument2-blue
  • bc203: erbe-blue
  • bc204: tisa1-blue
  • bc205: srb1-blue
  • bc206: srb2-blue
  • bc207: power1-blue
  • bc208: power2-blue
  • bc209: sarba-blue
  • bc210: consodine-blue
  • bc211: sofa-blue
  • bc212: cloudsa-blue
  • bc213: cloudsb-blue
  • bc214: inversion-blue

Sun Grid Engine Technical Introduction

Most “grids” look like this on paper…
[Diagram labels: Local Area Network, Dedicated File services, Portal node(s), Private Network, Compute Nodes]
Please do not copy, put online or redistribute
info@bioteam.net

… and in reality:
[Three image-only slides]

Sun Grid Engine History
http://blogs.sun.com/templedf/entry/a_little_history_lesson
  • 1996:
    • Codine 4.02
    • Grid Resource Director (GRD) 1.0
  • 2000:
    • SGE 5.2. Sun acquires Gridware Inc.
  • 2001:
    • SGE 5.3. Sun releases source code
    • Last version called GRD
  • 2004:
    • SGE(EE) vs. SGE
    • N1GE vs. SGE

Sun Grid Engine References
  • http://gridengine.sunsource.net/
    • Generally, the user manuals are awful
  • http://gridengine.info/
    • Very useful blog run by Chris Dagdigian
  • My slides / examples are going to be online in-house.
  • Deep, in-house expertise.

Compute Farm Logical View
[Diagram labels: User 1, User N, Distributed Resource Manager, Cluster Network]

Grid Engine does the following:
  • Accepts work requests (jobs) from users
  • Puts jobs in a pending area
  • Sends jobs from the pending area to the best available machine
  • Manages the job while it runs
  • Returns results and logs accounting data when the job is finished
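The lifecycle above can be tried with a very small job script. This is a minimal sketch, not one of the course examples: the file name hello.sh, the job name and the log file name are made up. Lines starting with "#$" are read by qsub as options.

```shell
#!/bin/sh
# Hypothetical job script (hello.sh); names below are illustrative.
#$ -N hello          # job name shown by qstat
#$ -cwd              # run in the directory the job was submitted from
#$ -j y              # merge stderr into stdout
#$ -o hello.log      # write job output to this file

# Grid Engine sets JOB_ID when the job runs on a compute node.
# The "local" fallback only exists so the script can be tried by hand.
job_id="${JOB_ID:-local}"
message="job ${job_id} ran on $(uname -n)"
echo "$message"
```

Submitting it with "qsub hello.sh" places the job in the pending area. "qstat" shows it while it is pending or running, and "qacct -j" followed by the job ID reports the accounting record once it has finished.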

Huh?
  • What you need to know:
    • Don’t worry about queues or specific machines.
    • All you need to do when submitting a job is describe the resources your job will need to run successfully.
    • Grid Engine will take care of the rest.
    • The ‘default’ settings are good enough for most cases.
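"Describe the resources" in practice means passing -l requests to qsub. The sketch below only assembles and prints such a command, so it can be read without a cluster. The script name analyze.sh and the values are assumptions. h_rt and mem_free are standard Grid Engine resource names, but which resources can be requested depends on how the site is configured.

```shell
#!/bin/sh
# Dry run: build a qsub command line and print it instead of submitting.
runtime="01:00:00"    # hard wall-clock limit (h_rt); illustrative value
memory="2G"           # free memory wanted on the host (mem_free); illustrative value
script="analyze.sh"   # hypothetical job script

cmd="qsub -cwd -l h_rt=${runtime} -l mem_free=${memory} ${script}"
echo "$cmd"
```

No queue or host name appears in the command: the scheduler picks a machine that satisfies the request. "qconf -sc" lists the resource names defined on a given cluster.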

Most useful SGE commands
  • qsub / qdel
    • Submit jobs & delete jobs
  • qstat & qhost
    • Status info for queues, hosts and jobs
  • qacct
    • Summary info and reports on completed jobs
  • qrsh
    • Get an interactive shell on a cluster node
    • Quickly run a command on a remote host
  • qmon
    • Launch the X11 GUI interface
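A sketch of the read-only status commands from this list, wrapped in a guard so the script does nothing harmful on a host without Grid Engine installed. The guard and the status variable are additions for illustration only.

```shell
#!/bin/sh
# Read-only status check; it submits and deletes nothing.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$(id -un)"   # your own pending and running jobs
    qstat -f               # full listing, queue by queue
    qhost                  # load and memory for each execution host
    status="sge"
else
    echo "Grid Engine commands not found on this host"
    status="no-sge"
fi
```

The remaining commands take a target: "qdel" and "qacct -j" take a job ID, and "qrsh" followed by a command (for example "qrsh uname -n") runs that command on a node chosen by the scheduler.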

Examples

Live Examples
  • Single job with resource requirements
  • Job dependency
  • Task array job
  • Demand a whole compute node
  • Consumable resources
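The live examples themselves are not part of this transcript. As a stand-in, here is a minimal sketch of a task array script. The file name array_demo.sh, the task range and the input file naming are all assumptions.

```shell
#!/bin/sh
# Hypothetical task array script (array_demo.sh).
#$ -N array_demo
#$ -cwd
#$ -t 1-10           # run ten tasks, numbered 1 through 10

# Grid Engine sets SGE_TASK_ID to a different value (1..10 here) in each task.
# The fallback of 1 only exists so the script can be tried outside the cluster.
task="${SGE_TASK_ID:-1}"
input="chunk.${task}.dat"   # hypothetical per-task input file

echo "task ${task} would process ${input}"
```

For a job dependency, one option is to name the first job ("qsub -N prep prep.sh") and submit the second with "qsub -hold_jid prep analyze.sh", so the second stays pending until the first finishes. Both script names are made up. Whole-node and consumable-resource requests typically also go through -l, but the resource names are defined by the site administrators, so check "qconf -sc" before relying on any particular name.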