ARCHER White Papers
White Papers produced by the ARCHER Service.
Analysis of parallel I/O use on the UK national supercomputing service, ARCHER using Cray's LASSi and EPCC SAFE
Version 1.0, May 21, 2019
Andrew Turner, EPCC, The University of Edinburgh
Dominic Sloan-Murphy, EPCC, The University of Edinburgh
Karthee Sivalingam, Cray European Research Lab
Harvey Richardson, Cray European Research Lab
Julian Kunkel, Department of Computer Science, University of Reading
In this paper, we describe how we have used a combination of the LASSi tool (developed by Cray) and the SAFE software (developed by EPCC) to collect and analyse Lustre I/O performance data for all jobs running on the UK national supercomputing service, ARCHER; and to provide reports on I/O usage for users in our standard reporting framework. We also present results from analysis of parallel I/O use on ARCHER and analysis on the potential impact of different applications on file system performance using metrics we have derived from the LASSi data. We show that the performance data from LASSi reveals how the same application can stress different components of the file system depending on how it is run, and how the LASSi risk metrics allow us to identify use cases that could potentially cause issues for global I/O performance and work with users to improve their I/O use. We use the IO-500 benchmark to help us understand how LASSi risk metrics correspond to observed performance on the ARCHER file systems. We also use LASSi data imported into SAFE to identify I/O use patterns associated with different research areas, understand how the research workflow gives rise to the observed patterns and project how this will affect I/O requirements in the future. Finally, we provide an overview of likely future directions for the continuation of this work.
Performance of HPC Application Benchmarks across UK National HPC services: single node performance
Version 1: March 29, 2019
DOI: 10.5281/zenodo.2616549
Andrew Turner, EPCC, The University of Edinburgh
In this report we compare the performance of different processor architectures for different application benchmarks. To reduce the complexity of the comparisons, we restrict the results in this report to single-node runs only. This allows us to compare the performance of the different compute node architectures without the additional complexity of also comparing different interconnect technologies and topologies. Multi-node comparisons will be the subject of a future report. The architectures compared in this report cover three generations of Intel Xeon CPUs, Marvell Arm ThunderX2 CPUs and NVIDIA GPUs.
Benefits of the ARCHER eCSE Programme
Version 1.0, July 30, 2018
Lorna Smith, Alan Simpson, Chris Johnson, Xu Guo, Neelofer Banglawala, EPCC, The University of Edinburgh
The eCSE programme has allocated funding to the UK computational science community through a series of funding calls over a period of 5 years. The goal throughout has been to deliver a funding programme that is fair, transparent, objective and consistent. The projects funded through this programme were selected to contribute to the following broad aims:
- To enhance the quality, quantity and range of science produced on the ARCHER service through improved software;
- To develop the computational science skills base, and provide expert assistance embedded within research communities, across the UK;
- To provide an enhanced and sustainable set of HPC software for UK science.
The eCSE programme is a significant source of funding for the Research Software Engineering community and all UK Higher Education Institutions are able to apply for funding. This document provides more detail on the programme, looking at how the funding has been spent and examining the various benefits realised from the programme.
A survey of application memory usage on a national supercomputer: an analysis of memory requirements on ARCHER
Version 1.0, October 3, 2017
Andy Turner, EPCC, The University of Edinburgh
Simon McIntosh-Smith, Department of Computer Science, University of Bristol
In this short paper we set out to provide a set of modern data on the actual memory per core and memory per node requirements of the most heavily used applications on a contemporary, national-scale supercomputer. This report is based on data from the UK national supercomputing service, ARCHER, a 118,000 core Cray XC30, in the 1 year period from 1 July 2016 to 30 June 2017 inclusive. Our analysis shows that 80% of all usage on ARCHER has a maximum memory use of 1 GiB/core or less (24 GiB/node or less) and that there is a trend to larger memory use as job size increases. Analysis of memory use by software application type reveals differences in memory use between periodic electronic structure, atomistic N-body, grid-based climate modelling, and grid-based CFD applications. We present an analysis of these differences, and suggest further analysis and work in this area. Finally, we discuss the implications of these results for the design of future HPC systems, in particular the applicability of high bandwidth memory type technologies.
Source data (CSV format):
- Overall Memory Usage Statistics
- VASP Memory Usage Statistics
- CASTEP Memory Usage Statistics
- CP2K Memory Usage Statistics
- GROMACS Memory Usage Statistics
- LAMMPS Memory Usage Statistics
- NAMD Memory Usage Statistics
- Met Office UM Memory Usage Statistics
- MITgcm Memory Usage Statistics
- SBLI Memory Usage Statistics
- OpenFOAM Memory Usage Statistics
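As an illustration of the sort of analysis behind the headline figures, the sketch below computes a usage-weighted cumulative memory distribution from one of the CSV files listed above. The file name and column names (`max_mem_gib_per_core`, `core_hours`) are hypothetical placeholders for illustration only, not the actual schema of the published data.

```python
# Hypothetical sketch: usage-weighted cumulative distribution of per-core memory.
# The file name and column names are assumptions, not the published CSV schema.
import pandas as pd

df = pd.read_csv("overall_memory_usage.csv")           # placeholder file name
df = df.sort_values("max_mem_gib_per_core")            # assumed column: GiB per core
cum_usage = df["core_hours"].cumsum() / df["core_hours"].sum()

# Fraction of total usage with a maximum memory footprint of 1 GiB/core or less
frac_1gib = cum_usage[df["max_mem_gib_per_core"] <= 1.0].max()
print(f"Usage at <= 1 GiB/core: {frac_1gib:.0%}")
```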
Parallel I/O Performance Benchmarking and Investigation on Multiple HPC Architectures
Version 1.4, June 29, 2017
Andy Turner, Xu Guo, Dominic Sloan-Murphy, Juan Rodriguez Herrera, EPCC, The University of Edinburgh
Chris Maynard, Met Office, United Kingdom
Bryan Lawrence, The University of Reading
Solving the bottleneck of I/O is a key consideration when optimising application performance, and an essential step in the move towards exascale computing. Users must be informed of the I/O performance of existing HPC resources in order to make best use of the systems and to be able to make decisions about the direction of future software development effort for their application. This paper therefore presents benchmarks for the write capabilities of ARCHER, comparing them with those of the Cirrus, COSMA, COSMA6, UK-RDF DAC, and JASMIN systems, using MPI-IO and, in selected cases, the HDF5 and NetCDF parallel libraries.
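The benchmarks in the paper are full applications and dedicated I/O benchmark codes; as a rough illustration of the kind of collective MPI-IO write they exercise, here is a minimal Python/mpi4py sketch (the file name and transfer size are arbitrary, and this is not the paper's benchmark code):

```python
# Minimal collective MPI-IO write sketch (illustrative only, not the paper's benchmark).
# Run with e.g.: mpirun -n 4 python mpiio_write.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1024 * 1024                      # doubles written per rank (arbitrary)
data = np.full(n_local, rank, dtype="d")

fh = MPI.File.Open(comm, "iobench.dat", MPI.MODE_CREATE | MPI.MODE_WRONLY)
t0 = MPI.Wtime()
fh.Write_at_all(rank * data.nbytes, data)  # each rank writes its own contiguous block
fh.Close()
elapsed = comm.reduce(MPI.Wtime() - t0, op=MPI.MAX, root=0)

if rank == 0:
    gib = size * data.nbytes / 2**30
    print(f"Wrote {gib:.2f} GiB in {elapsed:.2f} s ({gib / elapsed:.2f} GiB/s)")
```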
Using Dakota on ARCHER
Version 1.0, March 22, 2017
Gordon Gibb, EPCC, The University of Edinburgh
Dakota[1] is a toolkit that automates running a series of simulations whose input parameters can be varied in order to determine their effects on the simulation results. In particular, Dakota can be used to determine optimal parameter values, or quantify a model’s sensitivity to varying parameters.
This white paper describes how to use Dakota on ARCHER.
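As a plain illustration of the workflow Dakota automates (this is not Dakota itself, nor its input syntax), the Python sketch below sweeps a single parameter over a set of values and runs a hypothetical simulation executable for each; Dakota layers study design, optimisation and sensitivity analysis on top of this basic pattern. The executable name, input file format and output parsing are invented for the example.

```python
# Hand-rolled parameter sweep illustrating what Dakota automates; the executable
# name, input template and output parsing below are hypothetical placeholders.
import subprocess

results = {}
for viscosity in [0.1, 0.2, 0.5, 1.0]:
    with open("sim.in", "w") as f:
        f.write(f"viscosity = {viscosity}\n")        # write the varied parameter
    subprocess.run(["./my_simulation", "sim.in"], check=True)
    with open("sim.out") as f:
        results[viscosity] = float(f.read())         # assume a single scalar result

best = min(results, key=results.get)                 # e.g. pick the minimising value
print(f"Best viscosity: {best} (objective {results[best]})")
```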
UK National HPC Benchmarks
Version 1.0, March 16, 2017
Andy Turner, EPCC, The University of Edinburgh
This paper proposes an updated set of benchmarks for the UK National HPC Service based on historical use patterns and consultations with users.
Implementation of Dual Resolution Simulation Methodology in LAMMPS
Version 1.2, October 19, 2016
Iain Bethune, EPCC, The University of Edinburgh
Sophia Wheeler, Sam Genheden and Jonathan Essex, The University of Southampton
This white paper describes the implementation in LAMMPS of the Dual Resolution force-field ELBA. In particular, symplectic and time-reversible integrators for coarse-grained beads are provided for NVE, NVT and NPT molecular dynamics simulations and a new weighted load balancing scheme allows for improved parallel scaling when multiple timestepping (r-RESPA) is used. The new integrators are available in the 30th July 2016 release of LAMMPS and the load balancer in the lammps-icms branch. A version of LAMMPS with all of this functionality has been installed on ARCHER as the module 'lammps/elba' and is available to all users.
Invertastic: Large-scale Dense Matrix Inversion
Version 1.0, June 15, 2016
Alan Gray, EPCC, The University of Edinburgh
This white paper introduces Invertastic, a relatively simple application designed to invert an arbitrarily large dense symmetric positive definite matrix using multiple processors in parallel. This application may be used directly (e.g. for genomic studies where the matrix represents the genetic relationships between multiple individuals), or instead as a reference or template for those wishing to implement large-scale linear algebra solutions using parallel libraries such as MPI, BLACS, PBLAS, ScaLAPACK and MPI-IO. The software is freely available on GitHub and as a centrally available package on ARCHER (at /work/y07/y07/itastic).
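For readers who want the underlying idea without the parallel machinery, a serial sketch of inverting a symmetric positive definite matrix via a Cholesky factorisation is shown below. NumPy/SciPy stand in for the distributed ScaLAPACK routines Invertastic uses; the problem size and test matrix are arbitrary.

```python
# Serial sketch of SPD matrix inversion via Cholesky; Invertastic does the
# equivalent at scale using distributed ScaLAPACK routines.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
n = 500
a = rng.standard_normal((n, n))
spd = a @ a.T + n * np.eye(n)              # construct a symmetric positive definite matrix

factor = cho_factor(spd)                   # Cholesky factorisation
inverse = cho_solve(factor, np.eye(n))     # solve against the identity to form the inverse

print(np.allclose(spd @ inverse, np.eye(n), atol=1e-8))
```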
VOX-FE: New functionality for new communities
Version 1.1, June 10, 2016
Neelofer Banglawala and Iain Bethune, EPCC, The University of Edinburgh
Michael Fagan and Richard Holbrey, The University of Hull
This white paper describes new functionality implemented in the VOX-FE finite element bone modelling package funded by ARCHER eCSE project 04-11. In particular, we describe new features in the GUI for setting up realistic muscle-wrapping boundary conditions, improvements to the performance of the solver by using ParMETIS to generate optimal partitioning of the model, and better automation for the dynamic remodelling process. The VOX-FE code is freely available from https://sourceforge.net/projects/vox-fe/ under a BSD licence.
Using NetCDF with Fortran on ARCHER
Version 1.1, January 29, 2016
Toni Collis, EPCC, The University of Edinburgh
This paper explains one particular approach to parallel IO based on the work completed in an ARCHER funded eCSE on the TPLS software package [2]: using NetCDF. There are multiple resources available online for using NetCDF, but the majority focus on software written in C. This guide aims to help users of ARCHER who have software written in modern Fortran (90/95 onwards) to take advantage of NetCDF using parallel file reads and writes.
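The white paper itself covers the modern Fortran API; as a rough analogue of the per-rank collective write pattern it describes, a Python/mpi4py + netCDF4 sketch is shown below. It assumes a parallel-enabled netCDF4/HDF5 build, and the variable layout is invented for illustration.

```python
# Rough Python analogue of a collective parallel NetCDF write; the white paper
# covers the equivalent Fortran (nf90_*) calls. Requires parallel-enabled netCDF4.
from mpi4py import MPI
import numpy as np
from netCDF4 import Dataset

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
nx_local = 100                                   # points owned by each rank (arbitrary)

nc = Dataset("field.nc", "w", parallel=True, comm=comm, info=MPI.Info())
nc.createDimension("x", nx_local * size)
var = nc.createVariable("field", "f8", ("x",))
var.set_collective(True)                         # collective I/O for this variable

start = rank * nx_local                          # each rank writes only its own slice
var[start:start + nx_local] = np.full(nx_local, rank, dtype="f8")
nc.close()
```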
Voxel-based finite element modelling with VOX-FE2
Version 1.0, 20 May 2015
Neelofer Banglawala and Iain Bethune, EPCC, The University of Edinburgh
Michael Fagan and Richard Holbrey, The University of Hull
This white paper summarises the work of an ARCHER eCSE project to redevelop the VOX-FE voxel finite element modelling package to improve its capabilities, performance and usability. We have developed a new GUI, implemented as a Paraview plugin, a new solver which uses PETSc and demonstrated how iterative remodelling simulations can be run on ARCHER.
Parallel Software usage on UK National HPC Facilities
Version 1.0, 23 Apr 2015
Andy Turner, EPCC, The University of Edinburgh
Data and analysis of parallel application usage on the UK National HPC facilities HECToR and ARCHER, including:
- Trends in application usage over time: which applications have declined in use and which have become more important to particular research communities; and why might this be?
- Trends in the sizes of jobs: which applications have been able to increase their scaling properties in line with architecture changes and which have not? Can we identify why this is the case?
- Changes in research areas on the systems: which areas have appeared/increased and which have declined?
Supplementary Data
- HECToR Phase 2a Usage Data (txt)
- HECToR Phase 2b Usage Data (txt)
- HECToR Phase 3 Usage Data (txt)
- ARCHER Usage Data (txt)
Using RSIP Networking with Parallel Applications on ARCHER Phase 2
Version 1.0, 7 Apr 2015
Iain Bethune, EPCC, The University of Edinburgh
Instructions on how to use RSIP to enable TCP/IP communications between parallel jobs on the compute nodes and the login nodes (and beyond). Two case-study applications are shown: parallel visualisation using ParaView, and path integral molecular dynamics with i-PI and CP2K.
Monitoring the Cray XC30 Power Management Hardware Counters
Version 1.3, 19 Dec 2014
Michael Bareford, EPCC, The University of Edinburgh
A guide to monitoring code power usage via the Cray XC30 power management hardware counters, including the impact of compiler choice and parallel programming model on power consumption on ARCHER.
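As one hedged illustration of the kind of data involved, Cray XC compute nodes expose node-level power and energy readings through a sysfs interface; the sketch below simply samples those files. This assumes the /sys/cray/pm_counters interface is present and is only an example of reading the counters, not the monitoring approach used in the paper.

```python
# Illustrative sampling of Cray XC power management counters via sysfs.
# Assumes the node exposes /sys/cray/pm_counters; not the paper's own tooling.
import time
from pathlib import Path

PM_DIR = Path("/sys/cray/pm_counters")

for _ in range(5):
    for counter in ("power", "energy"):
        path = PM_DIR / counter
        if path.exists():
            print(f"{counter}: {path.read_text().strip()}")
    time.sleep(1)
```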
Performance of Parallel IO on ARCHER
Version 1.1, 15 Jun 2015
David Henty, Adrian Jackson, EPCC, The University of Edinburgh
Charles Moulinec, Vendel Szeremi, STFC Daresbury Laboratory
Performance benchmarks and advice for parallel IO on ARCHER. This is work in progress and will be continually updated.
What's with all this Python import scaling, anyhow?
Version 1.0, 17 Dec 2014
Nick Johnson, EPCC, The University of Edinburgh
Addressing the issue of poor scaling performance of Python module imports. This is work in progress and will be continually updated.
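To make the symptom concrete, a minimal sketch that times a heavyweight import on every MPI rank and reports the spread is given below. This is only an illustration of the effect at increasing process counts, not the measurement tooling used in the paper.

```python
# Time a heavyweight import on every MPI rank and report the spread across ranks.
# Illustrates the import-at-scale symptom only; not the paper's measurement tooling.
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
t0 = time.time()
import numpy  # noqa: F401  - the import whose cost we are measuring
elapsed = time.time() - t0

times = comm.gather(elapsed, root=0)
if comm.Get_rank() == 0:
    print(f"ranks={comm.Get_size()}  "
          f"min={min(times):.3f}s  max={max(times):.3f}s  "
          f"mean={sum(times) / len(times):.3f}s")
```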