Task-Farming Parallelisation of Py-ChemShell for Nanomaterials
eCSE08-014
Key Personnel
PI/Co-Is: Thomas Keal, Paul Sherwood - STFC Daresbury; Alexey Sokol, Richard Catlow - University College London
Technical: You Lu - STFC Daresbury, Matthew Farrow - University College London
Relevant Documents
eCSE Technical Report: Task-Farming Parallelisation of Py-ChemShell for Nanomaterials: an ARCHER eCSE Project
Project summary
ChemShell is a software package for combined quantum mechanical and molecular mechanical (QM/MM) calculations, used to simulate chemical reactions in complex systems. QM/MM is a state-of-the-art approach to computational chemistry, whose pioneers were awarded the 2013 Nobel Prize in Chemistry. This project aimed to parallelise QM/MM calculations in the newly developed version of ChemShell, which is written in the Python programming language. During the project we completed the code development for "task-farmed" parallel calculations, which allow us to model very large chemical systems. Our benchmark calculations demonstrate that the new code greatly improves computational efficiency. For example, a QM/MM calculation on an MgO cluster of many thousands of atoms (magnesium oxide, a very common compound also used for relief of heartburn and stomach pain) ran 5.7 times faster without using any additional computational resources. We have also probed the maximum size of nanoparticle that we can feasibly calculate with the new code on the ARCHER supercomputer. Thanks to the work carried out in this project, we are now able to simulate over 160,000 atoms of ZrO2 (zirconium dioxide, see Figure A below), an inorganic material widely used in ceramics that has even been patented by Apple Inc. for use in millions of their portable devices.
The resulting implementation will enable researchers to address scientific problems that are beyond the reach of existing software, for example studies of reactions catalysed by realistic nanomaterials and prediction of the properties of novel nanoparticles. In addition, the new code will benefit more members of the scientific community because it is completely free of charge to use, whereas the old Tcl-based ChemShell is free only to users based in the UK. Because the new ChemShell's source code is available to everyone (it is "open-source") and is developed in Python, one of the most popular and easy-to-learn programming languages, we expect it will also attract more developers to explore further scientific possibilities or to use the code for teaching purposes.
Achievement of objectives
We have successfully implemented a task-farming parallelisation framework in the Python-based version of ChemShell, including new features and support for codes that go beyond the original Tcl-based version of the code. Our specific achievements against the original objectives are as follows:
Objective A: Implementation of a task-farming parallel framework in the Python-based version of ChemShell using workgroups defined by MPI communicators. Sharing of the MPI workgroup environment with external codes such as GAMESS-UK and GULP to perform multiple energy and gradient calculations simultaneously, to allow task-farming of common chemical tasks such as finite-difference gradients and nudged elastic band (NEB) optimisation in DL_FIND. Demonstration of a speedup factor of over 4 for finite-difference gradient evaluation in typical QM/MM calculations.
Achievement A: We have completed the implementation of task-farming parallelism in Py-ChemShell as part of the general-purpose parallel module. We have also implemented a module to perform task-farmed finite-difference gradients. The parallel scaling performance reported in the technical report demonstrates a speedup factor of over 4 at the theory level of GAMESS-UK/GULP.
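The idea behind a task-farmed finite-difference gradient can be illustrated with a minimal sketch in plain Python. This is not the Py-ChemShell API: the function names, the toy energy function, the equal-sized workgroups and the round-robin task assignment are all assumptions made for illustration. The processes are divided into workgroups (in a real MPI code the workgroup index would typically be used as the "colour" passed to MPI_Comm_split), each workgroup evaluates its own share of the displaced-geometry energies, and the results are then combined into a central-difference gradient.

```python
# Illustrative sketch only: plain Python that mimics how ranks and tasks
# could be divided between workgroups. Not the Py-ChemShell API.


def workgroup_of(rank, n_ranks, n_groups):
    """Return the workgroup index ("colour") for an MPI-style rank."""
    if n_ranks % n_groups != 0:
        raise ValueError("this sketch assumes equal-sized workgroups")
    return rank // (n_ranks // n_groups)


def tasks_for_group(group, n_groups, n_tasks):
    """Round-robin assignment of task indices to one workgroup (assumed scheme)."""
    return list(range(group, n_tasks, n_groups))


def energy(coords):
    """Toy energy function standing in for a QM/MM energy evaluation."""
    return sum((i + 1) * x * x for i, x in enumerate(coords))


def displaced_energy(coords, task, step):
    """Task 2k is a +step displacement of coordinate k; task 2k+1 is -step."""
    k, sign = divmod(task, 2)
    shifted = list(coords)
    shifted[k] += step if sign == 0 else -step
    return energy(shifted)


def farmed_gradient(coords, n_groups, step=1e-4):
    """Central-difference gradient with tasks split between workgroups."""
    n_tasks = 2 * len(coords)
    results = {}
    # Each outer iteration stands for one workgroup working independently;
    # in a real parallel run the groups would execute concurrently.
    for group in range(n_groups):
        for task in tasks_for_group(group, n_groups, n_tasks):
            results[task] = displaced_energy(coords, task, step)
    # Gather step: combine the +/- energies into gradient components.
    return [(results[2 * k] - results[2 * k + 1]) / (2 * step)
            for k in range(len(coords))]
```

Because every displaced-geometry energy is independent of the others, the tasks can be shared out between workgroups in any order without changing the resulting gradient; only the final gather step needs all of the results.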
Objective B: Parallelisation of the ChemShell routines that set up QM/MM model clusters from a periodic input structure, including parallel computation of electrostatic potential, field and field gradients on centres using Ewald summation and fitting of point charges around the cluster using this data to reproduce the missing periodic electrostatic interactions. Support for task-farming the cluster set up process to enable QM/MM calculations with multiple QM regions. Demonstration of parallel scaling on a target system containing 100,000 atoms, with a speedup factor of over 2 compared to the sequential calculation.
Achievement B: We built a QM/MM ZrO2 nanoparticle model containing over 1.6 x 10^5 atoms, including its electrostatic potential, which is well beyond the capability of Tcl-ChemShell. The parallel capability for cluster set-up has first been implemented in a prototype CONSTRUCT code (version 90), where it is currently being optimised and tested. The prototype includes support for computationally intensive Ewald summations over 3D and 2D periodic systems. A bottleneck in the fitting of point charges has been removed by using appropriate linear algebra routines, and the code has been further parallelised at the stage where the nanoparticle electrostatic potential and its derivatives are calculated at sites of interest. This work enables QM/MM model set-up for systems with massive unit cells and/or large active QM and MM regions. The code will be transferred to Py-ChemShell on completion of validation tests.
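The point-charge fitting step can be viewed as a linear least-squares problem: find charges at fixed sites around the cluster whose Coulomb potential reproduces a target potential at a set of sample points. The sketch below is a hedged illustration of that idea only. It assumes atomic units, fits the potential alone (not the field or field gradients), and solves the normal equations with a small Gaussian elimination routine. The report does not state which linear algebra routines CONSTRUCT uses, and all function names here are hypothetical.

```python
import math

# Illustrative sketch only: least-squares fit of point charges to a target
# electrostatic potential. Not the CONSTRUCT or Py-ChemShell implementation.


def coulomb_matrix(sample_points, charge_sites):
    """A[i][j] = 1 / |r_i - R_j| (atomic units assumed)."""
    return [[1.0 / math.dist(p, s) for s in charge_sites]
            for p in sample_points]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_charges(sample_points, charge_sites, target_potential):
    """Minimise |A q - v|^2 via the normal equations (A^T A) q = A^T v."""
    a = coulomb_matrix(sample_points, charge_sites)
    m = len(charge_sites)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(m)]
           for i in range(m)]
    atv = [sum(row[i] * v for row, v in zip(a, target_potential))
           for i in range(m)]
    return solve_linear(ata, atv)
```

For large numbers of fitting sites, a production code would more likely call an optimised library solver than hand-written elimination; the sketch only shows the structure of the problem.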
Objective C: Support for task-farming with codes that use the Global Arrays (GA) library such as NWChem, implemented by creating GA processor groups corresponding to the MPI workgroups and passing these to NWChem as a directly-linked library. Demonstration of a comparable speedup factor (>4) to GAMESS-UK for finite-difference gradient evaluation using NWChem.
Achievement C: We have fully enabled the set-up of a GA-based task-farming parallel computational environment within the Py-ChemShell parallel module. Benchmarks at the NWChem/GULP level of theory show that task-farmed calculations with GA-powered NWChem can shorten the computational time, although the speedup factor is smaller than for GAMESS-UK/GULP because the CPU time for NWChem dominates the total QM/MM time.
Objective D: Creation of an interface to DL_POLY 4 in ChemShell including the facility to directly link the code and share the MPI environment. Demonstration of successful task-farming using ChemShell/NWChem/DL_POLY 4 with speedup factor comparable to ChemShell/NWChem/GULP above.
Achievement D: We have created a Py-ChemShell interface to DL_POLY 4, which is not available in the original Tcl-based version of ChemShell, and successfully linked it in as a shared object library to support parallel execution of DL_POLY. NWChem/DL_POLY 4 benchmarks have been carried out for which the observed scalability is similar to NWChem/GULP and lower than GAMESS-UK/GULP, again because the performance of NWChem is the dominant factor.
Objective E: Enabling of QM/MM catalysis studies of nanomaterials using ChemShell/NWChem/DL_POLY 4. Demonstration of benchmark task-farming calculations beyond the capabilities of the previous Tcl-based version of ChemShell (systems of larger than 100,000 atoms in the MM environment), with single and multiple QM regions.
Achievement E: We have built a QM/MM ZrO2 nanoparticle model containing over 1.6 x 10^5 atoms and successfully run a hybrid QM/MM calculation including the electrostatic potential of the whole particle at the NWChem/DL_POLY 4 level of theory. Our parallel benchmarks against the sequential calculation show that this type of calculation is highly scalable. Work is in progress to test task-farming parallelism of the hybrid QM/MM model set-up, in order to support models with multiple independent and interacting QM regions; at present this uses the CONSTRUCT prototype.
Summary of the software
ChemShell (www.chemshell.org) is a computational chemistry environment for multiscale modelling. While it supports standard quantum chemical or force field calculations, its main strength lies in hybrid QM/MM calculations. The concept is to leave the time-consuming energy evaluation to external specialised codes, while ChemShell takes over higher level tasks, communication and data handling.
The original Tcl-based version of ChemShell is a well-established module on ARCHER, regularly appearing in the Top 10 codes by usage published on the ARCHER website, and was the 4th most used materials chemistry code in 2017. Py-ChemShell is a major redevelopment of the code to use Python as the user interface. The redevelopment work was completed in 2017 and a citable publication is in preparation.
The Py-ChemShell code is hosted in a Git repository on CCPForge (ccpforge.cse.rl.ac.uk/gf/project/chemsh-py). Registered users can download the latest version via a download portal at www.chemshell.org. Compared to Tcl-ChemShell, which is closed-source and free to use only for academics based in the UK, Py-ChemShell is free and open-source software released under the GNU Lesser General Public License version 3 (LGPL v3). Note that Py-ChemShell does not come with the licensed external quantum or molecular mechanics programs (for example GAMESS-UK, NWChem and GULP), which need to be obtained separately. The exception is libraries released under similar open-source licences, such as DL_FIND.
The first alpha release of Py-ChemShell was made in December 2017. The first full release is planned for Spring 2018, after which Py-ChemShell will be made available as a module for ARCHER users.