
SciDAC Progress Report: Algorithms and Parallel Methods for Reactive Atomistic Simulations (05.07.2009)

Project Accomplishments
• Novel algorithms (solvers, data structures) for reactive simulations
• Comprehensive validation
• Parallel formulation, implementation, performance characterization, and optimization
• Software release into the public domain
• Incorporation of solvers into LAMMPS

Project Accomplishments: Algorithms and Data Structures
• Optimal dynamic data structures for 2-, 3-, and 4-body interactions
• Krylov subspace solvers for charge equilibration
• Effective preconditioners (block Jacobi)
• Reuse of subspaces, selective orthogonalization
• Effective initialization strategies

Project Accomplishments: Comprehensive Validation
• In-house validation on:
– Bulk water
– Silica
– Hydrocarbons (hexane, cyclohexane)
• Collaborative validation on a number of other systems (please see software release)

Project Accomplishments: Parallel Implementation
• Highly optimized parallel formulation, validated on BG/L, Jaguar (XT4), and Ranger (Sun), among others. Optimizations for other platforms are under way.
• Parallel code in limited release (to Purdue, MIT, and NIST).

Project Accomplishments: Software Release
• Code release (limited public release)
– Purdue (Strachan et al., Si/Ge/Si nanorods)
– Caltech (Goddard et al., force field development)
– MIT (Buehler et al., silica/water)
– PSU (van Duin et al., force field development)
– USF (Pandit et al., silica/water interface)
– UIUC (Aluru et al.)
– Sandia (Thompson, LAMMPS development)
– Norwegian University of Science and Technology (IBM/AIX optimization)

Project Accomplishments: LAMMPS Development
• Charge equilibration implemented as a Fix in LAMMPS
• Fully validated for accuracy and performance
• Preliminary implementation of ReaxFF in LAMMPS
• Student at Sandia to complete the implementation over the summer

Project Accomplishments: Details
• The dominant computational cost is associated with the following force field computations:
– Bonded potential
– Non-bonded potential
– Neighbor potential
– Charge equilibration (QEq)

Project Accomplishments: Details
• Bonded, non-bonded, and neighbor potentials require efficient (dynamic) data structures. Their computation is also typically highly optimized through lookups.
• Charge equilibration minimizes electrostatic energy to compute partial charges on atoms. This can be linearized and solved at each timestep using iterative solvers such as CG and GMRES.
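Since the slide names CG as one of the iterative solvers, a minimal sketch of one linearized QEq solve may help. All names here are illustrative; the kernel H is assumed to be a dense symmetric positive-definite matrix, not the data structures of the released code.

```python
import numpy as np

def cg_solve(H, b, x0, tol=1e-10, max_iter=200):
    """Plain conjugate gradient for H x = b (H symmetric positive definite),
    as used for one linearized QEq solve per timestep."""
    x = x0.copy()
    r = b - H @ x                    # initial residual
    p = r.copy()
    rs = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(max_iter):
        Hp = H @ p
        alpha = rs / (p @ Hp)        # step length along search direction
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * bnorm:
            break                    # converged to requested tolerance
        p = r + (rs_new / rs) * p    # new conjugate search direction
        rs = rs_new
    return x
```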

Accurate Charge Equilibration is Essential to Modeling Fidelity

Computational Cost of Charge Equilibration
• Charge equilibration dominates overall cost at required (low) error tolerances and for larger systems.
• Efficient solvers for charge equilibration are critical.

Algorithms for Charge Equilibration
• At required tolerances and for larger systems (10^6 atoms and beyond), charge equilibration can take over 75% of total simulation time.
• Efficient algorithms for solving the linear system are essential.
• We implement a number of techniques to accelerate the solve:
– Effective preconditioners (nested, block Jacobi)
– Reuse of Krylov subspaces (solution spaces are not likely to change significantly across timesteps)
– Selective reorthogonalization (orthogonalization is the major bottleneck for scalability)
– Initial estimates through higher-order extrapolation (sketched below)
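As a concrete illustration of the last bullet, here is a sketch of initialization by extrapolation from previous solutions. The history length and the coefficients (standard linear/quadratic extrapolation on a uniform timestep) are assumptions for illustration, not taken from the released code.

```python
import numpy as np
from collections import deque

class QEqGuess:
    """Keep recent QEq solutions and extrapolate an initial guess for the
    next timestep (assumes a uniform timestep)."""

    def __init__(self, n_atoms, history=3):
        self.n = n_atoms
        self.hist = deque(maxlen=history)

    def push(self, x):
        """Record the converged solution from the current timestep."""
        self.hist.append(np.asarray(x, dtype=float).copy())

    def guess(self):
        h = list(self.hist)
        if not h:
            return np.zeros(self.n)               # cold start: neutral charges
        if len(h) == 1:
            return h[-1].copy()                   # reuse previous solution
        if len(h) == 2:
            return 2.0 * h[-1] - h[-2]            # linear extrapolation
        return 3.0 * h[-1] - 3.0 * h[-2] + h[-3]  # quadratic extrapolation
```

A better initial guess directly cuts the number of Krylov iterations per timestep, which is where the 75% of runtime cited above is spent.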

Algorithms for Charge Equilibration
• Accelerating GMRES/CG for charge equilibration:
– The kernel for the matrix is shielded electrostatics
– The electrostatics is cut off, typically at 7–10 Å
– An implicit block-Jacobi accelerator can be constructed from a near-field block (say, a 4 Å neighborhood); see the sketch after this list
– The inverse block can be explicitly computed and reused
– Alternately, an inner-outer scheme successively increases the cutoff and uses the shorter cutoff to precondition the outer, longer-cutoff solve
– Both schemes are implemented in parallel and show excellent scaling; relative performance is system dependent
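A sketch of the explicit block-Jacobi variant described above. How atoms are grouped into near-field (~4 Å) blocks is elided here, and treating the blocks as disjoint index sets is a simplifying assumption for illustration.

```python
import numpy as np

def build_block_jacobi(H, blocks):
    """Precompute explicit inverses of the diagonal blocks of H.
    `blocks` is a list of disjoint index arrays, e.g. atoms grouped by a
    short (~4 Å) neighborhood; the inverses can be reused across timesteps
    while the neighborhoods remain valid."""
    return [(idx, np.linalg.inv(H[np.ix_(idx, idx)])) for idx in blocks]

def apply_block_jacobi(inv_blocks, r):
    """Apply z = M^{-1} r for the block-diagonal preconditioner M,
    inside a preconditioned CG/GMRES iteration."""
    z = np.zeros_like(r)
    for idx, inv in inv_blocks:
        z[idx] = inv @ r[idx]
    return z
```

The inner-outer alternative replaces the explicit inverses with a few iterations on the shorter-cutoff kernel, trading setup cost for per-iteration work; which wins is system dependent, as the slide notes.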

Serial and Parallel Performance

Single Processor Performance Profiling
• Memory usage and runtimes (NVE water; 648, 6540, 13080, and 26160 atoms)
• Relative cost of various phases

Single Processor Performance Profiling
• Our code is highly optimized. In comparison to a traditional (non-reactive) MD code (GROMACS), our code was only 3x slower (tested on water and hexane).
• Our code has a very low memory footprint. This is essential, since it allows us to scale problems to larger instances, facilitating scalability to large machine configurations.

Parallel Performance
• A number of optimizations have been implemented:
– Trading off redundant computation for messages
– Efficient use of shadow domains and the midpoint method for minimizing redundant computation
– Reducing the number of orthogonalizations in charge equilibration
– Platform-specific optimizations

Parallel Performance
• Performance characterized primarily on two platforms:
– The code achieved 81% efficiency on 1024 cores of Ranger at approximately 6100 atoms/core (1.9 s/timestep for a 6.2M-atom system)
– The code achieved 77% efficiency on 8K cores of a BG/L at approximately 600 atoms/core (1.1 s/timestep for a 4.8M-atom system)

Ongoing Work: Near Term (12 months)
– Integrating our reactive atomistic framework into LAMMPS (graduate student Metin Aktulga spending the summer with Aidan Thompson and Steve Plimpton at Sandia)
– Parallelizing the GMRES QEq fix for LAMMPS
– Sampling techniques for force-field optimization

Ongoing Work: Medium to Long Term (24-36 months)
– Advanced accelerators for QEq (multipole-type hierarchical preconditioners)
– Platform-specific optimizations (Tesla/GPU, Roadrunner)
– Supporting hybrid force fields (reactive and non-reactive force fields)
– Novel solvers, in particular SPIKE-based techniques

Additional Material

Charge Equilibration (QEq) Method
• Expand the electrostatic energy as a Taylor series in charge around the neutral charge.
• Identify the term linear in charge as the electronegativity of the atom, and the quadratic term as the electrostatic potential and self energy.
• Using these, solve for the partial charges by setting the partial derivatives of the electrostatic energy equal across atoms.
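The expansion itself appears as an image in the original deck; in standard QEq notation (χ_i the electronegativity, J_i the self-Coulomb term, q_i the partial charge, H_ij the shielded Coulomb coupling), the truncated series reads:

```latex
E(q_1,\dots,q_n) \;\approx\;
  \sum_{i}\Big( E_{i0} + \chi_i\, q_i + \tfrac{1}{2} J_i\, q_i^{2} \Big)
  \;+\; \sum_{i<j} H_{ij}\, q_i\, q_j
```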

QEq Method
• We need to minimize the electrostatic energy E(q_1, …, q_n) above with respect to the partial charges, where H is the Coulomb kernel (diagonal J_i, shielded off-diagonal terms), subject to charge neutrality: Σ_i q_i = 0.

QEq Method
• Introducing a Lagrange multiplier μ for the neutrality constraint, the stationarity conditions are χ_i + Σ_j H_ij q_j = μ for every atom i, i.e., q = H⁻¹(μ·1 − χ).

QEq Method
• From charge neutrality, we get: μ = (1ᵀH⁻¹χ) / (1ᵀH⁻¹1).

QEq Method
• Let s = −H⁻¹χ and t = −H⁻¹1, where H s = −χ and H t = −1.

QEq Method
• Substituting back, we get: q_i = s_i − (Σ_j s_j / Σ_j t_j) t_i. We need to solve 2n equations with kernel H for s_i and t_i.

QEq Method
• Observations:
– H is dense
– The diagonal term is J_i
– The shielding term is short-range
– The long-range behavior of the kernel is 1/r
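Putting the derivation together, a self-contained sketch follows. The shielded-kernel form below (a ReaxFF-style pairwise shielding with no taper function) and the dense direct solves are illustrative assumptions; the production code uses optimized kernels and preconditioned Krylov solvers.

```python
import numpy as np

def qeq_charges(pos, chi, J, gamma, cutoff=10.0):
    """Solve the QEq system for partial charges.
    pos: (n, 3) positions in Å; chi: electronegativities; J: diagonal
    self-Coulomb terms; gamma: shielding parameters (assumed form)."""
    pos = np.asarray(pos, dtype=float)
    chi = np.asarray(chi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    n = chi.size
    H = np.diag(np.asarray(J, dtype=float))
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(pos[i] - pos[j])
            if r < cutoff:
                gij = np.sqrt(gamma[i] * gamma[j])
                # shielded kernel: ~1/r at long range, finite as r -> 0
                H[i, j] = H[j, i] = (r**3 + gij**-3) ** (-1.0 / 3.0)
    s = np.linalg.solve(H, -chi)             # H s = -chi
    t = np.linalg.solve(H, -np.ones(n))      # H t = -1
    mu = s.sum() / t.sum()
    return s - mu * t                        # charges sum to zero by construction
```

A quick sanity check on the result is that abs(q.sum()) vanishes to machine precision, since q = s − μt is built to satisfy the neutrality constraint exactly.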

Validation: Water System

Validation: Hexane and Cyclohexane
• Hexane (@200 K) and cyclohexane (@300 K), liquid phase
• ~10000 atoms randomly placed around lattice points in a cube
• NVT (@200 K for hexane, @300 K for cyclohexane); the cube is shrunk by 1 Å on each side after every 7500 steps, as another way to measure density (sketched below)
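A sketch of the density bookkeeping implied by the shrinking-cube protocol; the starting edge length and average atomic mass below are placeholders, not values from the slide.

```python
# Density bookkeeping for the shrinking-cube runs described above.
AMU_PER_A3_TO_G_PER_CM3 = 1.66054   # 1 amu/Å^3 = 1.66054 g/cm^3

def density_g_cm3(total_mass_amu, edge_A):
    """Mass density of a cubic box (masses in amu, edge length in Å)."""
    return AMU_PER_A3_TO_G_PER_CM3 * total_mass_amu / edge_A**3

total_mass = 10000 * 14.0  # ~10000 atoms; placeholder average mass (amu)
edge = 60.0                # placeholder starting edge length (Å)
for stage in range(10):
    # ... run 7500 NVT steps at this volume, then sample ...
    print(f"edge = {edge:5.1f} Å   density = {density_g_cm3(total_mass, edge):6.3f} g/cm^3")
    edge -= 1.0            # cube shrunk by 1 Å on each side after every 7500 steps
```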

Validation: Silica-Water