Nektar++ on ARCHER2

The ARCHER2 national supercomputer is a world-class advanced computing resource and the successor to ARCHER. This guide provides basic instructions for compiling the Nektar++ stable release or master branch on ARCHER2.

To log into ARCHER2 you should use the address:

ssh [userID]@login.archer2.ac.uk

Compilation Instructions

ARCHER2 uses a module-based system to manage the software environment. To compile Nektar++ on ARCHER2, we need to select the GNU compiler suite and load the required modules. Note that git is available on the system by default.

A brief summary of module commands on ARCHER2 (from the ARCHER2 documentation):

module list [name] – List modules currently loaded in your environment, optionally filtered by [name]
module avail [name] – List modules available, optionally filtered by [name]
module spider [name][/version] – Search available modules (including hidden modules) and provide information on modules
module load name – Load the module called name into your environment
module remove name – Remove the module called name from your environment
module swap old new – Swap module new for module old in your environment
module help name – Show help information on module name
module show name – List what module name actually does to your environment

These module commands are described in more detail in the ARCHER2 documentation. To set up the build environment for Nektar++, run:

export CRAY_ADD_RPATH=yes
module swap PrgEnv-cray PrgEnv-gnu 
module load cray-fftw
module load cray-hdf5
module load cmake

These commands can be put in a file to avoid typing them in every session. One approach is to add them to the .profile file in your home directory; if the file does not exist, you can create it (note the dot at the start of .profile). A better approach is to create a bash script, for example named loadMyModules. Create an empty file by running touch loadMyModules in the terminal, open it in your preferred text editor, add the following lines (together with any other modules you need) and save it. Note that the shebang (#!) line must be the first line of the file.

#!/bin/bash

export CRAY_ADD_RPATH=yes
module swap PrgEnv-cray PrgEnv-gnu 
module load cray-fftw
module load cray-hdf5
module load cmake

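After saving the file, load the modules into your current shell by sourcing it: source loadMyModules (or equivalently . loadMyModules). Do not run it as ./loadMyModules after chmod +x: an executed script runs in a child shell, so any modules it loads are discarded when the script exits.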
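Whether such a script is executed or sourced matters, because a script started as a separate program runs in a child shell and cannot change the environment of the shell that launched it. The sketch below demonstrates this without ARCHER2, using a hypothetical variable DEMO_VAR in place of the variables that module load sets; running the file with sh behaves like ./loadMyModules in this respect.

```shell
# Minimal demonstration of executing vs sourcing a script.
# DEMO_VAR is a hypothetical stand-in for the variables "module load" changes.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
export DEMO_VAR=loaded
EOF

unset DEMO_VAR
sh "$demo_script"          # executed: runs in a child shell, change is lost
echo "after executing: DEMO_VAR='${DEMO_VAR:-}'"   # prints: after executing: DEMO_VAR=''

. "$demo_script"           # sourced: runs in the current shell ('.' is POSIX 'source')
echo "after sourcing:  DEMO_VAR='${DEMO_VAR:-}'"   # prints: after sourcing:  DEMO_VAR='loaded'

rm -f "$demo_script"
```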

Note that after the modules are loaded, the system may print several warnings and informational messages about environment variables being unloaded and reloaded. These messages can safely be ignored; press q to skip to the end of the messages.

To clone the repository over SSH, first create a public/private SSH key pair and add it to your GitLab account. Instructions on creating an SSH key can be found at "Generating a new SSH key pair". This step can be skipped if your SSH keys have already been set up, or if you clone over HTTPS as in the clone command given here.

The code must be compiled and run from your work directory, which is located at /work/project_code/project_code/user_name. For example, for the project code e01 and username mlahooti, the work directory is /work/e01/e01/mlahooti. Alternatively, run echo $HOME, which in this example prints /home/e01/e01/mlahooti, and replace /home/ with /work/ to obtain your work directory.
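The /home/ to /work/ substitution can be scripted with shell parameter expansion. The function name work_dir_from_home is hypothetical, and the sketch assumes the home directory follows the /home/project_code/project_code/user_name layout described above.

```shell
# Hypothetical helper: derive the ARCHER2 work directory from a home directory
# path by replacing the leading /home/ with /work/.
work_dir_from_home() {
    home_path=$1
    case $home_path in
        /home/*) printf '/work/%s\n' "${home_path#/home/}" ;;
        *) echo "unexpected home directory: $home_path" >&2; return 1 ;;
    esac
}

# Usage on ARCHER2: cd "$(work_dir_from_home "$HOME")"
work_dir_from_home /home/e01/e01/mlahooti   # prints /work/e01/e01/mlahooti
```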

Enter the work directory and clone the Nektar++ code into a folder, e.g. nektarpp:

cd /work/e01/e01/mlahooti
git clone https://gitlab.nektar.info/nektar/nektar.git nektarpp

After the code is cloned, enter the nektarpp folder, create a build directory and enter it:

cd nektarpp
mkdir build
cd build

The above three steps can also be done with a single command:
cd nektarpp && mkdir build && cd build

From within the build directory, run the configure command. Note the use of CC and CXX to select the ARCHER2-specific compiler wrappers:

CC=cc CXX=CC cmake -DNEKTAR_USE_SYSTEM_BLAS_LAPACK=OFF -DNEKTAR_USE_MPI=ON -DNEKTAR_USE_HDF5=ON -DNEKTAR_USE_FFTW=ON -DTHIRDPARTY_BUILD_BOOST=ON ..

cc and CC are the Cray C and C++ compiler wrappers; the underlying compiler is determined by the loaded PrgEnv module (PrgEnv-gnu here).
NEKTAR_USE_SYSTEM_BLAS_LAPACK is disabled because, by default, the wrappers link against the Cray LibSci package, which contains optimized versions of BLAS and LAPACK and does not require any additional arguments to cc.
HDF5 is the better output format on ARCHER2, since the default output can create a large number of files and we often run out of the file-count limit of the quota. It is best to use the cray-hdf5 module available on ARCHER2, which must be loaded as explained above (module load cray-hdf5). If for any reason you prefer not to use the ARCHER2 HDF5, you can build the version shipped with Nektar++ as a third-party library, either by adding -DTHIRDPARTY_BUILD_HDF5=ON to the command above or by enabling it with ccmake .. after the first configuration.
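If you switch between the cray-hdf5 module and the bundled HDF5, the options can be kept in one place. This is a minimal sketch: the function nektar_cmake_options and its thirdparty-hdf5 argument are hypothetical conveniences, and the options it prints are exactly those of the configure command above plus -DTHIRDPARTY_BUILD_HDF5=ON.

```shell
# Hypothetical helper: print the configure options used above, one per line.
# Passing "thirdparty-hdf5" adds -DTHIRDPARTY_BUILD_HDF5=ON so that the HDF5
# shipped with Nektar++ is built instead of using the cray-hdf5 module.
nektar_cmake_options() {
    printf '%s\n' \
        -DNEKTAR_USE_SYSTEM_BLAS_LAPACK=OFF \
        -DNEKTAR_USE_MPI=ON \
        -DNEKTAR_USE_HDF5=ON \
        -DNEKTAR_USE_FFTW=ON \
        -DTHIRDPARTY_BUILD_BOOST=ON
    if [ "${1:-}" = "thirdparty-hdf5" ]; then
        printf '%s\n' -DTHIRDPARTY_BUILD_HDF5=ON
    fi
}

# Usage from inside the build directory (the options contain no spaces, so the
# unquoted command substitution splits them into separate arguments):
#   CC=cc CXX=CC cmake $(nektar_cmake_options) ..
#   CC=cc CXX=CC cmake $(nektar_cmake_options thirdparty-hdf5) ..
nektar_cmake_options thirdparty-hdf5
```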

At this point you can run ccmake .. to, for example, disable unnecessary solvers. Then run make as usual to compile and install the code:

make -j 4 install

NOTE: Do not try to run regression tests – the binaries at this point are cross-compiled for the compute nodes and should not execute on the login nodes.

Building with a different GCC version than the default

The instructions above build Nektar++ with the default GCC, gcc/11.2.0. It is also possible to build Nektar++ with a newer or older version of GCC. To list the available versions, run module -r spider '.*gcc.*'; to get details for a specific version, for example gcc/12.2.0, run module spider gcc/12.2.0. Follow the instructions printed on the terminal to load the version you need. In general, you need to run the following:

module swap PrgEnv-cray PrgEnv-gnu
module load <any other required modules here>
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH

Do not forget to add the GCC library directory to LD_LIBRARY_PATH in the job script as well. You can find information about your compiler by running the following command:

module show gcc

This prints several lines to the screen. For example, for the default GCC (module show gcc or module show gcc/11.2.0), the output includes the following:

whatis("Defines the system paths and environment variables needed for the GNU Compiling Environment.")
prepend_path("MODULEPATH","/opt/cray/pe/lmod/modulefiles/compiler/gnu/8.0")
prepend_path("MODULEPATH","/opt/cray/pe/lmod/modulefiles/mix_compilers")
prepend_path("PATH","/opt/cray/pe/gcc/11.2.0/bin")
prepend_path("MANPATH","/opt/cray/pe/gcc/11.2.0/snos/share/man")
prepend_path("INFOPATH","/opt/cray/pe/gcc/11.2.0/snos/share/info")
prepend_path("LD_LIBRARY_PATH","/opt/cray/pe/gcc/11.2.0/snos/lib64")
setenv("GCC_PATH","/opt/cray/pe/gcc/11.2.0")
setenv("GCC_PREFIX","/opt/cray/pe/gcc/11.2.0")
setenv("GCC_VERSION","11.2.0")
setenv("GNU_VERSION","11.2.0")
setenv("CRAY_LMOD_COMPILER","gnu/8.0")
prepend_path("MODULEPATH","/opt/cray/pe/lmod/modulefiles/comnet/gnu/8.0/ofi/1.0")
prepend_path("LMOD_CUSTOM_PATHS","COMPILER/work/y07/shared/archer2-lmod/utils/compiler/gnu/8.0")
prepend_path("MODULEPATH","/work/y07/shared/archer2-lmod/utils/compiler/gnu/8.0")

From this output, the entry we need is the LD_LIBRARY_PATH directory of the GCC installation. To make this GCC version's runtime libraries available, add the following line to the job script:

export LD_LIBRARY_PATH=/opt/cray/pe/gcc/11.2.0/snos/lib64:$LD_LIBRARY_PATH
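The library directory can also be extracted from the module show output with sed instead of being read manually. This sketch assumes Lmod-style prepend_path(...) lines like those shown above, and the helper name ld_paths_from_module_show is hypothetical. Lmod usually writes module show output to standard error, which is why the usage line redirects it with 2>&1.

```shell
# Hypothetical helper: print every directory that a modulefile prepends to
# LD_LIBRARY_PATH, reading "module show" output on standard input.
# Assumes one Lmod statement per line, e.g.
#   prepend_path("LD_LIBRARY_PATH","/opt/cray/pe/gcc/11.2.0/snos/lib64")
ld_paths_from_module_show() {
    sed -n 's/.*prepend_path("LD_LIBRARY_PATH","\([^"]*\)").*/\1/p'
}

# Usage on ARCHER2:
#   module show gcc/11.2.0 2>&1 | ld_paths_from_module_show
printf '%s\n' \
    'prepend_path("PATH","/opt/cray/pe/gcc/11.2.0/bin")' \
    'prepend_path("LD_LIBRARY_PATH","/opt/cray/pe/gcc/11.2.0/snos/lib64")' |
    ld_paths_from_module_show   # prints /opt/cray/pe/gcc/11.2.0/snos/lib64
```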

Running jobs on ARCHER2

ARCHER2 uses Slurm for job submission, which differs from the PBS system used on Imperial College's CX1 and CX2 clusters. Nektar++ must be built in the work directory, and jobs must also be submitted from the work directory.
ARCHER2 supports several Quality of Service (QoS) levels, which define the types of jobs that can be run: standard, short, long, highmem, taskfarm, largescale, lowpriority and serial. Apart from highmem and serial, all of these QoSs run on the standard partition; lowpriority jobs can run on either the highmem or the standard partition, depending on the job. A detailed description of each QoS, including the maximum number of jobs allowed in the queue, the maximum number of nodes and the maximum wall time, can be found in the ARCHER2 documentation on running jobs. A summary of the most commonly used QoSs is given below:

Standard: allows a maximum of 1024 nodes, each supporting 128 tasks (processes), with a maximum wall time of 24 hours. Up to 64 jobs of this type can be queued and 16 can run simultaneously. This is the most commonly used QoS.
Short: allows a maximum of 32 nodes with a maximum wall time of 20 minutes. Up to 16 jobs can be queued and 4 can run simultaneously. Jobs in the short QoS can only be submitted from Monday to Friday.
Long: allows a maximum of 64 nodes, with a wall time of more than 24 hours and at most 48 hours. Up to 16 jobs of this type can be queued and 16 can run simultaneously.
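The limits above can be encoded in a small helper that picks a QoS for a given job size. This is only a sketch: suggest_qos is a hypothetical function that covers just the three QoSs summarized here, takes the wall time in minutes, prefers short whenever a job fits its limits, and ignores queue limits and the weekday-only rule for short jobs. Check the ARCHER2 documentation for current values.

```shell
# Hypothetical helper encoding only the standard/short/long limits listed above.
# Usage: suggest_qos NODES WALLTIME_MINUTES
suggest_qos() {
    nodes=$1 minutes=$2
    if [ "$nodes" -le 32 ] && [ "$minutes" -le 20 ]; then
        echo short                      # <= 32 nodes, <= 20 minutes
    elif [ "$nodes" -le 1024 ] && [ "$minutes" -le 1440 ]; then
        echo standard                   # <= 1024 nodes, <= 24 hours
    elif [ "$nodes" -le 64 ] && [ "$minutes" -le 2880 ]; then
        echo long                       # <= 64 nodes, > 24 hours and <= 48 hours
    else
        echo "no matching QoS among standard/short/long" >&2
        return 1
    fi
}

suggest_qos 2 860    # 2 nodes for 14 h 20 min: prints standard
```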

A Slurm job script must specify the number of nodes, the number of tasks per node, the number of CPUs per task, the wall time, the budget ID, the partition, the quality of service (QoS), the number of OpenMP threads, the job environment and the execution command. It can optionally also set a user-supplied job name to make the job easier to identify.

The job script can be generated with the bolt module as follows; replace [arguments...] with the program executable and its arguments. For more help, run bolt -h in the terminal.

module load bolt
bolt -n [parallel tasks] -N [parallel tasks per node] -d [number of threads per task] -t [wallclock time (h:m:s)] -o [script name] -j [job name] -A [project code]  [arguments...]

As an example, suppose Nektar++ is installed in /work/e01/e01/mlahooti/nektarpp and the simulation is a 3D homogeneous 1D (2.5D) case with HomModesZ=8. We want to run it on 256 processes (2 nodes with 128 processes each) for 14 hours and 20 minutes with HDF5 output, under the job name firstTest, using the budget with project code project_id.
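For this example, the bolt command described above could be invoked roughly as follows. The sketch assumes bolt takes the executable and its arguments at the end of the command line, as in the usage shown above; the script it generates may differ in detail from the hand-written one below, so review it before submitting.

```shell
# Sketch: generate a job script for the example above with bolt.
# The path and budget code are the example values; replace them with your own.
NODES=2
TASKS_PER_NODE=128
NTASKS=$((NODES * TASKS_PER_NODE))     # 256 MPI processes in total
NEK_BUILD=/work/e01/e01/mlahooti/nektarpp/build/dist/bin

if command -v bolt >/dev/null 2>&1; then
    bolt -n "$NTASKS" -N "$TASKS_PER_NODE" -d 1 -t 14:20:0 \
         -o myjob.slurm -j firstTest -A project_id \
         "$NEK_BUILD/IncNavierStokesSolver" naca0012.xml session.xml --npz 4 -i Hdf5
else
    echo "bolt not found; on ARCHER2 run 'module load bolt' first" >&2
fi
```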
Here is an example Slurm script for a standard job:

#!/bin/bash

# Slurm job options (job-name, compute nodes, job time)
#SBATCH --job-name=firstTest
#SBATCH --time=14:20:0
#SBATCH --nodes=2
#SBATCH --tasks-per-node=128
#SBATCH --cpus-per-task=1

# Replace [budget code] below with your budget code (e.g. t01)
#SBATCH --account=project_id
#SBATCH --partition=standard
#SBATCH --qos=standard
#SBATCH --distribution=block:block 
#SBATCH --hint=nomultithread

# Setup the job environment (this module needs to be loaded before any other modules)
module load epcc-job-env

# Set the number of threads to 1
#   This prevents any threaded system libraries from automatically 
#   using threading.
export OMP_NUM_THREADS=1

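# Nektar++ build directory (matching the installation path above) and solver executables
export NEK_DIR=/work/e01/e01/mlahooti/nektarpp/build
export NEK_BUILD=$NEK_DIR/dist/bin
# GCC runtime, third-party and Nektar++ library directories, prepended once to the existing path
export LD_LIBRARY_PATH=/opt/cray/pe/gcc/11.2.0/snos/lib64:$NEK_DIR/ThirdParty/dist/lib:$NEK_DIR/dist/lib64:$LD_LIBRARY_PATH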

# Launch the parallel job

srun $NEK_BUILD/IncNavierStokesSolver naca0012.xml session.xml --npz 4 -i Hdf5 &> runlog

For convenience, the script contains two export commands that define the NEK_DIR and NEK_BUILD environment variables: the former is the path to the Nektar++ build directory and the latter the location of the solver executables. The third export adds the library directories to LD_LIBRARY_PATH, the search path for shared libraries, in which entries are separated by colons (:).
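Repeated prepends to LD_LIBRARY_PATH can easily leave duplicate entries, or an empty entry when the variable starts out unset. A small helper can guard against both; prepend_ld_path is a hypothetical name, and the usage comment assumes NEK_DIR is set as in the job script above.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH only once.
# Directories already listed are skipped, and no stray ':' is added when
# LD_LIBRARY_PATH is initially unset or empty.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;    # already present: nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Usage in the job script (each call puts its directory first, so this order
# reproduces gcc:ThirdParty:dist/lib64:<existing entries>):
#   prepend_ld_path "$NEK_DIR/dist/lib64"
#   prepend_ld_path "$NEK_DIR/ThirdParty/dist/lib"
#   prepend_ld_path /opt/cray/pe/gcc/11.2.0/snos/lib64
```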

To submit the job, assuming the above script is saved in a file named myjob.slurm, run the following command:
sbatch myjob.slurm
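To reuse the job ID later with squeue, scancel or scontrol, it can be captured at submission time. sbatch normally prints a line of the form Submitted batch job <ID>; the hypothetical helper job_id_from_sbatch extracts the number from it. Slurm's sbatch --parsable option, which prints the job ID in a machine-readable form, is an alternative.

```shell
# Hypothetical helper: extract the job ID from "Submitted batch job <ID>".
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# Only attempts a submission where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch myjob.slurm | job_id_from_sbatch)
    echo "submitted job $jobid"
    squeue -j "$jobid"
fi
```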

The job status can be monitored using squeue -u $USER.
Running this command prints information such as the following, where the ST column gives the status of the job. Here PD means the job is pending (waiting for resource allocation); other common states are R, F, CG, CD and CA, meaning running, failed, completing, completed and cancelled, respectively.

JOBID     PARTITION   NAME         USER       ST       TIME     NODES NODELIST(REASON)
121062    standard     myJob-1   mlahooti     PD       0:00        4 (Priority)
121064    standard     myJob-2   mlahooti     PD       0:00        4 (Priority)
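With many jobs in the queue, a count per state can be easier to read. The sketch below uses squeue's -h (no header) and -o '%i %t' (job ID and compact state code) options; count_by_state is a hypothetical helper name.

```shell
# Hypothetical helper: count jobs per state from "JOBID STATE" lines on stdin.
count_by_state() {
    awk '{ n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Only queries Slurm where squeue is available.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -h -o '%i %t' | count_by_state
fi
```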

A job can be cancelled with the scancel job-ID command, where job-ID is the ID of the job; for example, the ID of the first job above is 121062.
Detailed information about a particular job, including its estimated start time, can be obtained via:
scontrol show job -dd job-ID
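If only the estimated start time is of interest, it can be filtered out of the scontrol output. This sketch assumes the usual StartTime=<value> field in that output (which may read Unknown until the scheduler has made an estimate); start_time_from_scontrol is a hypothetical helper, and 121062 is the example job ID from above.

```shell
# Hypothetical helper: print the StartTime value from "scontrol show job" output.
start_time_from_scontrol() {
    sed -n 's/.*StartTime=\([^ ]*\).*/\1/p'
}

# Only queries Slurm where scontrol is available.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show job -dd 121062 | start_time_from_scontrol
fi
```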

NOTE: It is highly recommended to check that the job script is error-free before submitting it. The checkScript command checks the integrity of the job script, reports any errors and estimates the budget the job will consume. Run the following command in the directory from which you will submit the job:

checkScript myjob.slurm
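As a rough local complement to checkScript, the sketch below lists which of the #SBATCH options discussed earlier are missing from a script. The helper missing_sbatch_options is hypothetical; it only checks that each option is present, not that its value is valid, and it accepts both the --tasks-per-node and --ntasks-per-node spellings.

```shell
# Hypothetical pre-check: print the name of each expected #SBATCH option
# that does not appear in the given job script (prints nothing if all are set).
missing_sbatch_options() {
    script=$1
    for opt in job-name time nodes tasks-per-node cpus-per-task account partition qos; do
        # "n?" also accepts --ntasks-per-node for the tasks-per-node check
        grep -Eq -- "^#SBATCH --n?$opt=" "$script" || echo "$opt"
    done
}

# Usage: missing_sbatch_options myjob.slurm
```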