
- Number of slides: 61
Enabling Grids for E-sciencE
Ab initio electronic structure calculations on the Grid
M. Sterzel – ACC “Cyfronet” AGH
COST Training School on Molecular and Material Science Grid Applications
Trieste, 30 March 2010
www.eu-egee.org
EGEE-III INFSO-RI-031688
EGEE and gLite are registered trademarks

Outlook
• Aim of this talk
  – To demonstrate the benefits of Grid computing in chemistry
• Topics
  – Work on the Grid vs. a cluster: a comparison
  – Chemical software packages available on the Grid
    § Parallel execution
  – Typical chemical computations on the Grid
    § Geometry optimizations
    § Numerical frequencies
    § Chemical reactions
  – In silico Lab
  – Final remarks

What is a/the Grid?
• A Grid is NOT
  – The next-generation Internet
  – A new operating system
  – Just
    § (a) a way to exploit unused cycles
    § (b) a new model of parallel computing
    § (c) a new model of P2P networking
• Definition (Vaidy Sunderam)
  – A paradigm/infrastructure that enables the sharing, selection, and aggregation of geographically distributed resources (computers, software, data(bases), people) [share (virtualized) resources]
  – ... depending on availability, capability, cost
  – ... for solving large-scale problems/applications
  – ... within virtual organizations [multiple administrative domains]
• Grid Checklist (Ian Foster): a Grid is a system that
  – Coordinates resources that are not subject to centralized control
  – Uses standard, open, general-purpose protocols and interfaces
  – Delivers nontrivial qualities of service

Authorization
• On a cluster: password
    okrucyusz:~ sterzel$ ssh ymsterze@ui
    Last login: Wed Mar 5 18:40:24 2008 from ha2rtr.agh.edu.pl
    [ymsterze@ui ymsterze]$
• On the Grid: certificate (Grid identity card)
  – The user has to be a member of at least one Virtual Organization
  – The Virtual Organization determines which resources the user can use
  – To access Grid resources the user needs to obtain a proxy:
    [ymsterze@ui ymsterze]$ voms-proxy-init -voms gaussian
    Your identity: /C=PL/O=GRID/O=Cyfronet/CN=Mariusz Sterzel
    Enter GRID pass phrase:
    Creating temporary proxy .... Done
    Contacting voms.cyf-kr.edu.pl:15001 [/C=PL/O=GRID/O=Cyfronet/CN=voms.cyf-kr.edu.pl] "gaussian" ..... Done
    Creating proxy ..... Done
    Your proxy is valid until Thu Mar 6 07:20:50 2008

EGEE
• Application domains: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …
• Scale: >250 sites, 48 countries, >150,000 CPUs, >50 petabytes, >15,000 users, >150 VOs, >150,000 jobs/day
Mariusz Sterzel, CGW'08, Kraków, 13 October 2008

Job file
• Grid JDL file
    Executable    = "/bin/bash";
    Arguments     = "$VO_GAUSSIAN_SW_DIR/g03/gaussian.x water.com";
    JobType       = "MPICH";
    NodeNumber    = 4;
    StdOutput     = "water.out";
    StdError      = "water.err";
    InputSandbox  = {"water.com"};
    OutputSandbox = {"water.out", "water.log", "water.err"};
• PBS file
    #!/bin/bash
    #PBS -l ncpus=4
    #PBS -q long
    #PBS -o water.out
    #PBS -e water.err
    #PBS -N my_job_name
    #PBS -M my@email
    #PBS -m e
    export g03root=/somewhere
    . $g03root/g03/bsd/g03.profile
    $g03root/g03 water.com

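The JDL file above is a flat list of `Name = value;` attributes, so it can be generated from a small description of the job. The following is a minimal sketch only: the helper names `format_value` and `render_jdl` are hypothetical, and the attribute values are the ones shown on the slide.

```python
def format_value(value):
    """Format a Python value in JDL syntax: quoted strings, {lists}, bare numbers."""
    if isinstance(value, str):
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(format_value(v) for v in value) + "}"
    return str(value)


def render_jdl(attributes):
    """Render a dict of attributes as 'Name = value;' lines (hypothetical helper)."""
    return "\n".join(
        f"{name} = {format_value(value)};" for name, value in attributes.items()
    )


# Attribute values copied from the water example on the slide.
water_job = {
    "Executable": "/bin/bash",
    "Arguments": "$VO_GAUSSIAN_SW_DIR/g03/gaussian.x water.com",
    "JobType": "MPICH",
    "NodeNumber": 4,
    "StdOutput": "water.out",
    "StdError": "water.err",
    "InputSandbox": ["water.com"],
    "OutputSandbox": ["water.out", "water.log", "water.err"],
}

if __name__ == "__main__":
    print(render_jdl(water_job))
```

Generating the file this way is mainly useful when many similar jobs differ only in the input file name.
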
Job management
• PBS
  – qsub – submit a job to a queue
  – qdel – delete a job from a queue
  – qstat – show job status in a queue
• EGEE Grid
  – glite-wms-job-submit – submit a job to the Grid
  – glite-wms-job-cancel – remove a job from the Grid
  – glite-wms-job-status – show the status of a job on the Grid
  – glite-wms-job-output – retrieve job files from the Grid
… just a few new commands …

Chemical software
• Freely available packages on the EGEE Grid:
  – GAMESS
  – DALTON
  – CPMD
  – Newton X
  – DL_POLY
  – NAMD
  – RWAVEP
  – GROMACS
  – Autodock
  – Tinker
  – Solvate
  – PIC-DMSC
  – MCGBgrid
  – QMC
  – ABCtraj
  – VENUS
  – CRBS
  – LM
  – COLUMBUS
  – DINX
  – Abinit
• Commercial packages on the EGEE Grid:
  – Gaussian
  – Turbomole
  – Wien2k

Gaussian VO
• Why Gaussian?
  – Large number of computational methods implemented
  – One of the first ab initio codes
  – The most popular among communities
  – User friendly
  – Available for many platforms, along with a GUI
• Gaussian VO
  – Established and operated by ACC CYFRONET
  – All license issues confirmed with Gaussian Inc.
  – Open to every EGEE user
  – Any computing centre with a site Gaussian license may support it (4 supporting centres, another 3 in line)
  – 30+ users since the start in September 2006
  – VO manager – Mariusz Sterzel (m.sterzel@cyfronet.pl)
  – Enabled for parallel execution on up to 8 processors

Turbomole – an alternative
Advantages:
  – Probably the fastest B3LYP implementation
  – Analytical gradients for excited states at the DFT and CC2 levels
  – Variety of fitting approaches speeding up calculations
  – Very good scalability in parallel execution
  – Extremely fast and very well parallelised (ri)CC2 and (ri)MP2
Disadvantages:
  – Limited number of DFT functionals (only “good” ones available)
  – No parallel version of analytical second derivatives
  – No parallel version of TDDFT
  – Only NMR chemical shifts implemented, no spin-spin couplings

Gaussian VO – participation
As a user:
• Register at: https://voms.cyf-kr.edu.pl:8443/voms/gaussian
• Wait for acceptance by the VOMS admin
• Run voms-proxy-init --voms gaussian and you are ready to use the program
As a participating centre:
• Just send an e-mail concerning participation to the VO manager
• After the license status at your centre has been confirmed with Gaussian Inc., detailed set-up information will be sent back to you
More details at: http://egee.grid.cyfronet.pl/Applications/gaussian-vo/

Grid and parallel execution
Past:
  – Serial jobs only
  – A job of MPICH type always enforced execution of mpirun
Present:
  – mpirun is no longer enforced
  – Instead, the wrapper script mpi-start can be executed; it automatically sets up the environment for the required MPI flavour
  – No possibility to request a desired number of processors on a WN
  … Unfortunately, not all sites are set up …
Current work:
  – Support for OpenMP jobs

Chemical software
• “Old codes”, mostly written in FORTRAN
• Serial; parallel execution added later (with exceptions)
• Different parallelization models used
• Low scalability in many cases
• Only selected computational methods parallelized
… all of which makes parallel Grid ports of chemical software even more complicated

Selected cases
• Gaussian
  – Parallelization via OpenMP or Linda
  – OpenMP – SMP machines or multiprocessor/multicore clusters, up to the number of processors/cores on a WN
  – Linda – allows parallel execution between nodes; requires an equal number of processors on each WN
  – Linda involves additional expense (commercial package) and is available only for specific platforms
• Turbomole
  – Uses MPI, currently HPMPI
  – No specific requirements
• GAMESS
  – Uses sockets, but MPI execution is possible (a little slower but more convenient for execution on a Grid)
• ADF
  – Uses MPI (MPICH, OpenMPI, HPMPI, …)
  – One of the best parallelized QC codes // to my knowledge ;-)

Grid solution
• Gaussian
  – Parallel execution via OpenMP on a single WN, up to the number of processors/cores available on that worker node
  – The necessary queue system set-up requires help from a site admin
  – Torque set-up:
    § Modification of /var/spool/pbs/torque.cfg to:
      SUBMITFILTER /var/spool/pbs/submit_filter.pl
  – Other settings are typical
    § The job has to be of MPICH type
    § The number of processors is controlled via the NodeNumber variable
    § The Gaussian %NProc directive is set automatically by the script executing Gaussian
  – Execution with 8 processors per job is possible

Sample script
    Executable    = "/bin/sh";
    Arguments     = "$GAUSSIAN_SW_DIR/gaussian.run myfile.com";
    JobType       = "MPICH";
    NodeNumber    = 8;
    InputSandbox  = {"myfile.com"};
    StdOutput     = "myfile.out";
    StdError      = "myfile.err";
    OutputSandbox = {"myfile.log", "myfile.chk", "myfile.out", "myfile.err"};
    Requirements  = other.GlueCEUniqueID == "ce.cyf-kr.edu.pl";

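The slides state that the script executing Gaussian sets the %NProc directive from the NodeNumber value of the job. The site script itself is not shown, so the following is only a sketch of what such a step might look like: the function name `set_nproc` is hypothetical, and it assumes the directive is written as a `%NProc=N` line at the top of the input file.

```python
def set_nproc(gaussian_input, nproc):
    """Return the input text with a single %NProc line set to nproc.

    Any existing line starting with %NProc (case-insensitive, which also
    covers variants such as %NProcShared) is dropped first. This is an
    assumption made for illustration, not the actual site script.
    """
    kept = [
        line
        for line in gaussian_input.splitlines()
        if not line.strip().lower().startswith("%nproc")
    ]
    return "\n".join([f"%NProc={nproc}"] + kept) + "\n"


if __name__ == "__main__":
    # Made-up input file; only the NodeNumber value 8 comes from the slide.
    sample = "%Chk=myfile.chk\n%NProc=2\n# BP86/6-31G(d) Opt\n\nwater\n\n0 1\n"
    print(set_nproc(sample, 8))
```

Keeping the user's input free of processor counts means the same file can run on worker nodes with different core counts.
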
Grid solution cont.
• Turbomole
  – No special set-up needed except a shared directory; the number of processors is discovered automatically by the Turbomole scripts
• NAMD
  – Similar to Turbomole. If the NAMD execution script was set up properly during installation, the necessary “node file” is created every time the program is executed
• GAMESS
  – Depends on the Grid port
  – In the MPI case no additional input is needed
  – The DDI case may require WN reconfiguration, especially if a job requests large DDI memory

PL-Grid – Polish Grid Infrastructure
• In addition to the above:
  – CPMD
  – Cfour
  – ACES III
  – Amber
  – OOMMF
  – ADF
  – Cadence
  – Fluent
  ... any application needed by users ...

Execution benchmarks
Scheduling time
  – MPI jobs
    § 4 proc./job – usually less than an hour
    § 8 proc./job – waiting time of up to 3-4 hours
  – OpenMP jobs
    § Job waiting time is much longer and depends heavily on site load
      • 4 proc./job – from less than an hour up to 6 hours
      • 8 proc./job – in some cases the waiting time exceeds 12 hours
Parallel job execution can be inefficient for short jobs (less than 24 h)

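The remark about short jobs can be illustrated with a simple estimate. The sketch below is an idealised model, not a measurement: it assumes the serial job would start immediately, that the parallel run time is the serial time divided by the processor count (scaled by an optional efficiency factor), and that the queue wait is simply added on top. The function name and the example job lengths are illustrative assumptions.

```python
def effective_speedup(serial_hours, nproc, wait_hours, efficiency=1.0):
    """Speedup seen by the user once the queue waiting time is included.

    Idealised assumption: the serial job starts without waiting, and the
    parallel run takes serial_hours / (nproc * efficiency).
    """
    parallel_run_hours = serial_hours / (nproc * efficiency)
    return serial_hours / (wait_hours + parallel_run_hours)


if __name__ == "__main__":
    # 8 processors and a 12-hour wait, the worst case quoted on the slide.
    for serial_hours in (24, 240):
        speedup = effective_speedup(serial_hours, nproc=8, wait_hours=12)
        print(f"{serial_hours:4d} h serial job: effective speedup {speedup:.1f}")
```

Under these assumptions a 24-hour job gains far less from 8 processors than a 240-hour job does, because the fixed waiting time dominates the short job.
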
Grid application in chemistry
• Tasks to which the Grid can be applied directly:
  – Conformational analysis
  – Numerical frequency computations
  – Zero-point vibrational averaging
  – Property computations for a series of geometries from a molecular dynamics simulation
  – Determination of chemical reaction paths
  – Determination of potential energy surfaces (PES)
  … all kinds of “brute force” tasks, or tasks which operate on huge data sets
• Other tasks
  – Computations need to be planned in order to maximize the benefits of Grid computing

An example
Geometry optimization:
  – Start: steep potential
    § Few steps needed
  – Stop: flat potential
    § Many steps needed
    § Energy and gradient convergence criteria have to be tightened

PCP complex
[Figures]

Plan
• Computations on the whole molecule are not possible
• The system needs to be modeled
  – For this we need:
    § Ground- and excited-state geometries and normal modes of chlorophyll a and peridinin
    § The geometry of the whole complex
    § A little programming (tetramer model, energy transfer model)
    … and a lot of luck

First step – Peridinin
[Figures]

Peridinin geometry
• The long peridinin chain makes the usual gradient-based optimization inefficient
• Instead we propose the following scheme:
  – MD for peridinin (force-field level)
  – Geometry pre-optimization for a series of MD snapshots (semi-empirical or ~RHF/STO-3G level)
  – “Final” geometry optimization for a few lowest-energy MD snapshots (CASSCF/PT2 level)
  – Verification of the minima by frequency calculations
  – Excited-state geometry optimization with the ground state as the starting point
  – Again, verification of the minima via vibrational analysis
  – Verification of the obtained data by comparison with experiment

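The step that passes only a few lowest-energy snapshots on to the expensive “final” optimization is a simple selection. A minimal sketch follows, assuming the pre-optimized energies have already been collected into a dictionary; the function name, snapshot labels, and energy values are all made up for illustration.

```python
import heapq


def lowest_energy_snapshots(energies, count):
    """Return the labels of the `count` snapshots with the lowest energy.

    `energies` maps a snapshot label to its pre-optimized energy.
    """
    return heapq.nsmallest(count, energies, key=energies.get)


if __name__ == "__main__":
    # Hypothetical energies (arbitrary units) after the cheap pre-optimization.
    preoptimized = {
        "snap_001": -1.20,
        "snap_002": -1.35,
        "snap_003": -1.10,
        "snap_004": -1.31,
    }
    print(lowest_energy_snapshots(preoptimized, 2))
```

Each selected snapshot can then be submitted as an independent Grid job, since the final optimizations do not depend on one another.
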
Chemical reactions
... a process that results in the interconversion of molecules
H–CN → CN–H

Chemical reactions
Points of interest:
  – Structure of substrates and products
  – Structure of the activated complex at the TS
  – Reaction path(s)
[Figure: energy along the reaction path, from substrate(s) through the transition state (TS) to product(s)]

Chemical reactions
[Figure: energy along the reaction path; labels: substrate(s) A + B, TS1, intermediate product C, TS2, product(s) D; E(TS1) > E(TS2)]

Chemical reactions
Possible reaction paths; more intermediate products
[Figure: energy along the reaction path]

Chemical reactions
Main problem – TS determination
[Figures: energy E along the ‘reaction coordinate’]
Artur Michalak

Chemical reactions
TS verification – vibrational analysis – one imaginary frequency!
i1203 cm⁻¹

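The “one imaginary frequency” criterion is easy to check automatically once the vibrational frequencies have been extracted from the output. The sketch below assumes the common convention of printing imaginary frequencies as negative numbers and assumes translations and rotations have already been removed from the list; the function names and the sample frequencies other than 1203 cm⁻¹ are hypothetical.

```python
def count_imaginary(frequencies_cm1):
    """Count imaginary modes, assumed to be reported as negative wavenumbers."""
    return sum(1 for frequency in frequencies_cm1 if frequency < 0.0)


def classify_stationary_point(frequencies_cm1):
    """Classify a stationary point by its number of imaginary frequencies."""
    imaginary = count_imaginary(frequencies_cm1)
    if imaginary == 0:
        return "minimum"
    if imaginary == 1:
        return "transition state"
    return "higher-order saddle point"


if __name__ == "__main__":
    # -1203.0 stands for the i1203 cm^-1 mode on the slide; the rest is made up.
    print(classify_stationary_point([-1203.0, 450.2, 2100.7]))
    print(classify_stationary_point([520.1, 1480.3, 3310.9]))
```

The same check with zero imaginary modes covers the verification of minima mentioned for the geometry optimizations.
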
Sample calculations
N2O breaking on oxide surfaces – possible mechanisms
• Electron transfer (e⁻ transfer)
  1. N2O + X
  2. N2O(s) + X → X+--N2O-
  4. X+--N2O- → X+--(O-) + N2
  5. (O-)--X+--(O-)
  6. (O-)--X+--(O-) → X + O2
Filip Zasada

Sample calculations
N2O breaking on oxide surfaces – possible mechanisms
• Oxygen transfer
  1. N2O(g) → N2O(s)
  2. N2O(s) + X → X--O--N2
  4. X-O
  4. O-X-O
  5. O--O-X
  6. X + O2

Sample calculations
[Figure: slab construction and matrix generation; labels: oxide slab, adsorbed N2O, vacuum]

Sample calculations
• Reaction paths:
[Figures]

Sample calculations
Computational details:
  – Gaussian 03 D.01
  – BP86 functional
  – Basis set of double-ζ quality
Timings:
  – CPU time for an SP calculation – approx. 7 hours
  – 15 paths
  – 10-50 energy points on each path
  – In total about 250 energy points calculated

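The timings above translate directly into a CPU budget. The sketch below multiplies the two figures from the slide and estimates the wall time when the points run as independent jobs; the number of concurrent jobs (50) is a hypothetical value chosen for illustration, and queue waiting time is ignored.

```python
import math


def total_cpu_hours(points, hours_per_point):
    """Total CPU time for independent single-point calculations."""
    return points * hours_per_point


def wall_time_hours(points, hours_per_point, concurrent_jobs):
    """Wall time if the points run in batches of `concurrent_jobs` (no queue wait)."""
    return math.ceil(points / concurrent_jobs) * hours_per_point


if __name__ == "__main__":
    cpu = total_cpu_hours(250, 7)  # 250 points, about 7 h each (from the slide)
    print(f"Total CPU time: {cpu} h, i.e. about {cpu / 24:.0f} days on one CPU")
    print(f"Wall time with 50 concurrent jobs: {wall_time_hours(250, 7, 50)} h")
```

Because every energy point is independent, the achievable wall time is limited mainly by how many jobs the Grid can run at once.
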
Conformational searches
• To determine the lowest-energy structure of (1R,2R)-1,2-bis-(N’-cyclohexyl-1’,4’,5’,8’-naphthalenetetracarboxydiimide)cyclohexane
• Possible issues
  – For short jobs it is convenient to group several geometries into one job to minimize the average job scheduling time

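Grouping several geometries into one job amounts to splitting the list of input files into fixed-size batches. A minimal sketch, with a hypothetical function name and made-up file names, could look like this:

```python
def group_geometries(geometry_files, per_job):
    """Split the list of input files into batches of at most `per_job` files."""
    if per_job < 1:
        raise ValueError("per_job must be at least 1")
    return [
        geometry_files[start:start + per_job]
        for start in range(0, len(geometry_files), per_job)
    ]


if __name__ == "__main__":
    # Ten made-up conformer inputs, four geometries per Grid job.
    conformers = [f"conf_{index:03d}.com" for index in range(1, 11)]
    for job_number, batch in enumerate(group_geometries(conformers, 4), start=1):
        print(f"job {job_number}: {', '.join(batch)}")
```

The batch size is a trade-off: larger batches reduce the number of scheduling waits, while smaller batches keep more jobs running in parallel.
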
Conformational searches
[Figure]

Harmonic frequencies
Numerical frequency computations for lycopene
  – 96 atoms, 2·3·96+1 = 577 independent computation steps
  – Software: GAMESS, version June 2005
  – VOCE VO resources used
  – Methodology: B3LYP, cc-pVDZ basis set
  – Computations done on approx. 100 processors (ia64 and i386)
  – CPU time for a single computation:
    § Intel Xeon 2.8 GHz – 14 h 39’
    § Intel Itanium 2 1.3 GHz – 12 h 8’
  – Total time:
    § Single CPU – 330 days (estimated)
    § EGEE Grid – 3 days

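The step count follows from displacing each of the 3N Cartesian coordinates in the positive and negative direction and adding the reference geometry. The sketch below reproduces that count and the single-CPU estimate from the per-step timings on the slide; the function names are hypothetical.

```python
def displaced_geometry_count(n_atoms):
    """Two displacements per Cartesian coordinate plus the reference geometry."""
    return 2 * 3 * n_atoms + 1


def single_cpu_days(n_steps, hours, minutes):
    """Sequential run time in days for n_steps computations of hours:minutes each."""
    return n_steps * (hours * 60 + minutes) / (60 * 24)


if __name__ == "__main__":
    steps = displaced_geometry_count(96)  # lycopene, 96 atoms
    print(f"Independent computations: {steps}")
    print(f"Xeon 2.8 GHz only:      {single_cpu_days(steps, 14, 39):.0f} days")
    print(f"Itanium 2 1.3 GHz only: {single_cpu_days(steps, 12, 8):.0f} days")
```

The two per-processor estimates, roughly 350 and 290 days, bracket the 330 days quoted on the slide for a single CPU.
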
In silico Lab
• Computational chemists use first-principles chemistry packages available on the Grid (Gaussian, GAMESS, TURBOMOLE)
• The Grid is still mainly available through command-line interfaces (voms-proxy-init, glite-wms-job-submit, glite-wms-job-status)
• Adapting to the Grid environment is difficult for non-experts
• Let's build a web-based problem-solving environment facilitating the use of the Grid
… but not only that

State-of-the-art
• WebMO
  – Desktop and Web access
  – Supports the main computational chemistry applications
  – Does not support Grid or local queue infrastructures
• ECCE – Extensible Computational Chemistry Environment
  – Desktop access only
  – Supports the main computational chemistry applications
  – Supports many queue management systems (PBS, LSF, NQE, etc.)
  – Does not use Grid infrastructures
• CCG – Computational Chemistry Grid
  – A virtual organization which is part of TeraGrid
  – GridChem – a Java client which can be run as a WebStart application
• P-GRADE – Framework for Development and Execution of Parallel Applications
  – A Grid-oriented solution

Description of the solution requirements
• Supports the main chemistry packages
  – Gaussian
  – GAMESS
  – TURBOMOLE
  – ...
• Is accessible through a web interface
  – All Grid-related operations should be embedded (proxy generation, job submission and status monitoring, LFC catalog operations)
  – Persists information about executed jobs between web sessions
• Enables user-centric rather than Grid-centric processing
  – It should not be yet another Grid job submission tool
  – Supports inter-application geometry passing
  – Provides automated report summaries
  – Hides Grid complexity
• Supports annotations through user free-text tagging

Description of the solution – architecture (1/2)
• Uses gLite for Grid job management
  – A set of Grid APIs is used (LFC, WMS, VOMS)
• Job submission and monitoring implemented as a separate layer for better error handling
  – GridSpace platform used
  – Each application backed by a separate script
• Application model realized using SINT (Semantic Integration Tool)
  – Basic concepts such as geometry, basic and detailed reports, input parameters, and annotations are modeled
• Interactive graphical user interfaces are used
  – GWT (Google Web Toolkit) is used as the user front-end technology

Description of the solution – architecture (2/2)
[Figure]

Demo
Switching to demo ...

References
T. Gubala, M. Bubak, P. M. A. Sloot: Semantic Integration of Collaborative Research Environments. In: M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009
M. Bubak et al.: Virtual Laboratory for Collaborative Applications. In: M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009

Summary
• Access to the Grid is easy and does not differ much from using a queue system
• The EGEE Grid offers a variety of software packages for chemical computations. Parallel execution can be made as simple for the user as serial execution is now
• It is always possible to find a solution for parallel execution, even if the computational platform does not directly support a certain parallelization model
• With a little planning, every computational chemistry task may benefit from the Grid platform