Nektar++ on the Imperial College HPC cluster
Basic info about the HPC cluster
Imperial College’s HPC system contains computing resources suitable for a broad range of jobs, so node sizes and interconnects vary. The cluster is useful for serial runs, small and large parallel runs, parameter sweep studies, and so on. There are also some dedicated private queues to which some users may have access.
The official documentation for Imperial College’s HPC systems can be found here:
Above is a diagram explaining the layout of the Imperial College HPC machine. It differs from your personal computer or workstation in that the interaction, the computation and the storage are disaggregated and distributed across a number of different machines, or “nodes”.
When you log in to the cluster you are allocated a shared resource called a login node. Many users can be assigned to the same node, so it is imperative that no resource-intensive jobs are run here; doing so slows the machine down for all users on the same login node.
The compute nodes themselves are headless, meaning we cannot visually interact with them. Use of them is dictated by a scheduler that allocates resources to jobs according to node availability and the priority and size of jobs.
The Research Data Store holds the majority of the storage for the cluster. You are assigned a specific allocation; additional project allocations can be purchased or created through the RCS website.
Access to the cluster
To get access, you should contact your supervisor (see request access for details).
Before you start using the HPC cluster, you are also expected to be familiar with the Linux command line, SSH, and the PBSPro job submission system.
Once your supervisor has given you access, you can log in with your Imperial College credentials by typing the following command into a Linux shell or macOS terminal
ssh username@login.hpc.imperial.ac.uk
Imperial College’s HPC cluster is only accessible from computers that are connected to the Imperial College network. If this is not the case, see remote access for more information on how to access the HPC cluster when you are outside campus.
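As an optional convenience, you can add an entry to your SSH configuration file so that you do not have to type the full host name every time. A minimal sketch, assuming the alias hpc and a placeholder username:
# ~/.ssh/config
Host hpc
    HostName login.hpc.imperial.ac.uk
    User username
With this in place, ssh hpc is equivalent to the full command above.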
When you access the HPC cluster, you will connect to a login node. The login nodes are shared by all users, and you should therefore not run any compute-intensive tasks on these nodes. Instead, the login nodes should only be used to
- download and compile Nektar++ (git works for cloning),
- prepare a PBS script to submit and run jobs,
- submit jobs to the queue.
Compilation instructions
Before you start compiling Nektar++, you need to load the following modules
module load intel-suite/2017.6 \
cmake/3.18.2 \
mpi/intel-2019.8.254 \
flex \
fftw/3.3.3-double \
hdf5/1.8.15-parallel \
scotch/6.0.3-nothread \
boost/1.66.0
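Since the same modules are needed every time you compile or run Nektar++, it can be convenient to keep the command in a small script and source it when required. A minimal sketch, assuming the file name $HOME/nektar_modules.sh (the name is arbitrary):
# $HOME/nektar_modules.sh
module load intel-suite/2017.6 cmake/3.18.2 mpi/intel-2019.8.254 flex \
            fftw/3.3.3-double hdf5/1.8.15-parallel scotch/6.0.3-nothread boost/1.66.0
You can then load everything in one go with
source $HOME/nektar_modules.sh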
It is advised that you install Nektar++ in your $HOME directory, which is the directory you enter when you access the cluster using ssh. To organise all your files, you can create a new directory for all your programs
mkdir $HOME/Programs
You can now download and install Nektar++ in your Programs directory by following these instructions
- Go to the Programs directory
cd $HOME/Programs
- Clone the Nektar++ repository
git clone https://gitlab.nektar.info/nektar/nektar.git nektar++
- Create a directory where Nektar++ will be installed
cd nektar++
mkdir build
cd build
- If you plan to run Nektar++ on CX1 or CX2, use the following command to configure the build
CC=mpicc CXX=mpicxx cmake \
-DNEKTAR_USE_FFTW=ON \
-DNEKTAR_USE_MKL=ON \
-DNEKTAR_USE_MPI=ON \
-DNEKTAR_USE_HDF5=ON \
-DNEKTAR_USE_SYSTEM_BLAS_LAPACK=OFF \
-DNEKTAR_TEST_FORCEMPIEXEC=ON \
-DCMAKE_CXX_FLAGS=-std=c++11\ -O3\ -xSSE4.2\ -axAVX,CORE-AVX-I,CORE-AVX2 \
-DCMAKE_C_FLAGS=-O3\ -xSSE4.2\ -axAVX,CORE-AVX-I,CORE-AVX2 \
..
If you only want to compile a specific solver, or further configure the build, you can use ccmake [FLAGS] ../ instead. For details on how to configure Nektar++, see the user guide available at https://www.nektar.info/getting-started/documentation.
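For example, to build only a subset of the solvers you can open the interactive configuration interface from the build directory and toggle the per-solver options; the exact option names (of the form NEKTAR_SOLVER_*) depend on your Nektar++ version, so treat them as an assumption and check the list that ccmake presents:
cd $HOME/Programs/nektar++/build
ccmake ..
# Inside ccmake: move to an option, press Enter to toggle it,
# then press 'c' to configure and 'g' to generate the build files.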
- If you plan to run Nektar++ on the new CX3 nodes, which use AMD processors, use the following command to configure the build
CC=mpicc CXX=mpicxx cmake \
-DNEKTAR_USE_FFTW=ON \
-DNEKTAR_USE_MKL=ON \
-DNEKTAR_USE_MPI=ON \
-DNEKTAR_USE_HDF5=ON \
-DNEKTAR_USE_SYSTEM_BLAS_LAPACK=OFF \
-DNEKTAR_TEST_FORCEMPIEXEC=ON \
-DCMAKE_CXX_FLAGS=-std=c++11\ -O3\ -mavx \
-DCMAKE_C_FLAGS=-O3\ -mavx \
..
- After the build has been configured using cmake, you can compile the code
make -j4 install
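If the build succeeds, the solver executables end up under build/dist/bin, which is the path used later in this page. A quick sanity check of the installation (path as assumed above):
ls $HOME/Programs/nektar++/build/dist/bin
You should see executables such as IncNavierStokesSolver and CompressibleFlowSolver in the output.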
Note: Do not run the regression tests directly on the login node, as the parallel tests will not be allowed to run. Instead, follow the instructions below.
Running regression tests
It is not possible to start the regression tests on the login node. Instead you must submit a job using the following script
#!/bin/sh
#PBS -l walltime=0:30:00
#PBS -l select=1:ncpus=2:mpiprocs=2:mem=1024mb
# Path to Nektar++ installation
NEKTAR_BUILD=$HOME/Programs/nektar++/build
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${NEKTAR_BUILD}/ThirdParty/dist/lib
module load intel-suite/2017.6 cmake/3.18.2 mpi/intel-2019.8.254 flex fftw/3.3.3-double hdf5/1.8.15-parallel scotch/6.0.3-nothread boost/1.66.0
cd ${NEKTAR_BUILD}
ctest | tee reg_output.txt
cp reg_output.txt $HOME/
Assuming the script file is called runtests.pbs, you can submit a job to run the tests as follows
qsub runtests.pbs
This submits the regression tests job to an execution queue. Check the status of your job(s) using
qstat
You will find the results of the regression tests in the reg_output.txt file, which has been copied to your $HOME directory.
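Since the full ctest output can be long, a quick way to check whether anything went wrong is to search the log for failures, for example:
grep -i fail $HOME/reg_output.txt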
The PBS queue system will execute your job once resources are available. The key things you specify when submitting a job are the resource allocation (memory, runtime, number of nodes) and the storage areas your job uses ($HOME, $WORK, …).
Running jobs
To execute Nektar++ in parallel on the HPC cluster, you need to create and submit a PBS script. The PBS script tells the queue system, or scheduler, which resources you need, how long your simulation will take, and how to execute your code. Depending on your job size, your job will be assigned to one of six main queues:
- High Throughput
- High End
- High Memory
- GPU
- Large
- Capability
Several other queues are available, depending on your department and the research undertaken. Running qstat with the -q flag will list all available queues. Under the state column, S means the queue is suspended, whereas R means the queue is operational and running.
Your job is automatically assigned to one of these queues based on the resources you request at the top of your PBS script.
#!/bin/bash
#PBS -l select=N:ncpus=X:mem=Ygb
#PBS -l walltime=HH:MM:SS
Here, N denotes the number of nodes and X the number of processors per node (X=24 or 28). The total number of MPI processes that will be launched is therefore N*X. Your selection of N and X, the amount of memory requested, and the requested wall clock time determine which queue your job is put into. For further documentation, see job size guidance. As suggested there, most jobs will use either 24- or 28-core nodes. For capability jobs (i.e. 72+ nodes), 28-core nodes should be specified. Otherwise, specifying 24 cores is preferred so as not to limit the hardware the scheduler can allocate.
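For example, the following header (with purely illustrative numbers) requests two 24-core nodes with 64 GB of memory per node for eight hours, i.e. 48 MPI processes in total:
#!/bin/bash
#PBS -l select=2:ncpus=24:mem=64gb
#PBS -l walltime=08:00:00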
Once you have specified the resources you need, make sure all modules are loaded. To do this, simply add the following lines to your PBS script
# Load modules
module load intel-suite/2017.6 cmake/3.18.2 mpi/intel-2019.8.254 flex fftw/3.3.3-double hdf5/1.8.15-parallel scotch/6.0.3-nothread boost/1.66.0
These are the same modules as we loaded before when compiling Nektar++.
Next, we will create symbolic links to the input files
# Link input files to temporary directory
ln -f -s ${PBS_O_WORKDIR}/*.xml ${TMPDIR}
ln -f -s ${PBS_O_WORKDIR}/*.rst ${TMPDIR}
ln -f -s ${PBS_O_WORKDIR}/*.nekg ${TMPDIR}
The environment variables $PBS_O_WORKDIR and $TMPDIR point to the directory from which you submitted the job and the local directory where the cluster executes your code, respectively. Therefore, if you submit the job from the directory where you store all the Nektar++ input files, the cluster knows where to find them without you having to specify absolute paths.
Finally, we will add the actual command for executing Nektar++
# Setup Environment Variables
NEK_PATH=$HOME/Programs/nektar++
INC_SOLVER=$NEK_PATH/build/dist/bin/IncNavierStokesSolver
COMP_SOLVER=$NEK_PATH/build/dist/bin/CompressibleFlowSolver
JOB_NAME=name-of-input-file
# Export Third Party Libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$NEK_PATH/build/ThirdParty/dist/lib
# Execute Nektar++
mpiexec $INC_SOLVER ${JOB_NAME}.xml --io-format Hdf5 > $PBS_O_WORKDIR/output.txt
Note that you don’t have to specify the number of MPI processes. If you don’t use the HDF5 I/O format, you can also remove the associated flag.
Once the job is finished, you need to move the results back into the directory from which you submitted the job. To do this, add the following lines to your PBS script
# Copy results back
rsync -aP ${JOB_NAME}*.chk $PBS_O_WORKDIR/
rsync -aP ${JOB_NAME}.fld $PBS_O_WORKDIR/
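Putting the pieces above together, a complete job script might look like the following sketch; the resource numbers and the job name are placeholders that you should adapt to your own case:
#!/bin/bash
#PBS -l select=2:ncpus=24:mem=64gb
#PBS -l walltime=08:00:00

# Load modules
module load intel-suite/2017.6 cmake/3.18.2 mpi/intel-2019.8.254 flex fftw/3.3.3-double hdf5/1.8.15-parallel scotch/6.0.3-nothread boost/1.66.0

# Link input files to temporary directory
ln -f -s ${PBS_O_WORKDIR}/*.xml ${TMPDIR}
ln -f -s ${PBS_O_WORKDIR}/*.rst ${TMPDIR}
ln -f -s ${PBS_O_WORKDIR}/*.nekg ${TMPDIR}

# Setup environment variables
NEK_PATH=$HOME/Programs/nektar++
INC_SOLVER=$NEK_PATH/build/dist/bin/IncNavierStokesSolver
JOB_NAME=name-of-input-file
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$NEK_PATH/build/ThirdParty/dist/lib

# Execute Nektar++
mpiexec $INC_SOLVER ${JOB_NAME}.xml --io-format Hdf5 > $PBS_O_WORKDIR/output.txt

# Copy results back
rsync -aP ${JOB_NAME}*.chk $PBS_O_WORKDIR/
rsync -aP ${JOB_NAME}.fld $PBS_O_WORKDIR/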
To submit your job, change directory to the directory where the input files for Nektar++ are stored. After this, run the following command
qsub ~/job_script.pbs
Here, we assume that the PBS script file is called job_script.pbs and that it is stored under the $HOME directory.
Additional documentation for PBS job scripts
Specifying number of MPI processes/node
Specifying a particular number of MPI processes per node is trivial through the use of the mpiprocs PBS option:
#!/bin/bash
#PBS -l select=3:ncpus=12:mpiprocs=2:mem=1024MB:icib=true
#PBS -l walltime=00:30:00
This will request three 12-core nodes with InfiniBand, but spawn only two MPI processes on each of the nodes for a total of 6 processes. This is useful when shared-memory parallelism is used within each process.
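For instance, for a hybrid MPI+OpenMP executable (a standard Nektar++ build as described above is pure MPI, so the executable name here is only a placeholder) the matching thread count could be set like this:
# Two MPI ranks per 12-core node, six OpenMP threads per rank (illustrative values)
export OMP_NUM_THREADS=6
mpiexec ./my_hybrid_executable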
Specifying an email to be sent when the job’s status changes
It is possible to receive emails when the job status changes in the queue via the -m option:
#!/bin/bash
#PBS -m abe
#PBS -M user_email_address
Here, the argument to the -m option is one or more of the characters {n,a,b,e}, where n means no mail is sent:
- a — send mail when job is aborted,
- b — send mail when job begins,
- e — send mail when job terminates.
The default is “a” if not specified.
Checking on the status of your job
To check on the status of your job you need to log in to the node that is running it. To find out which node this is, first type
qstat -u myusername
to identify the job ID (MyJobID), and then type
qstat -f MyJobID
This will return details of your running job. In this information there is an item called exec_host which tells you which nodes your job is running on, e.g. cx1-135-19-1.
To check the estimated start time of larger jobs – the -T and -w flags may be useful. For example:
qstat -T -w
Next, ssh into this node and cd to /tmp/pbs.myJobID.cx1/, where you will find your job executing.
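In summary, the sequence for inspecting a running job looks like this, where MyJobID and the node name are placeholders taken from the qstat output:
qstat -u myusername                  # list your jobs and note MyJobID
qstat -f MyJobID | grep exec_host    # find the node, e.g. cx1-135-19-1
ssh cx1-135-19-1
cd /tmp/pbs.MyJobID.cx1/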
MPI Debugging
To check what resources are being allocated, which ranks go to which nodes, and so on, specify the following before calling mpirun.
export I_MPI_DEBUG=100
Note that pbsdsh2 is a distributed shell command used to ensure that each node gets a copy of the appropriate input files. The \ prior to * is necessary to escape the asterisk and ensure it is not expanded by the local shell when the command is passed through pbsdsh2.
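As a hedged illustration of the kind of command this refers to (verify the exact pbsdsh2 usage against the RCS documentation before relying on it), copying the XML input files to the local temporary directory on every allocated node might look like:
pbsdsh2 cp ${PBS_O_WORKDIR}/\*.xml ${TMPDIR}/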