Nektar++ on ARCHER2
The ARCHER2 national supercomputer is a world class advanced computing resource and is the successor to ARCHER. This guide is intended to provide basic instructions for compiling the Nektar++ stable release or master branch on the ARCHER2 system.
ARCHER2 uses a module-based system to load system software. To compile Nektar++ on ARCHER2 we need to choose the GNU compiler suite and load the required modules. Note that cmake is loaded automatically when you log in to ARCHER2; its default version at the time of writing is 3.18.4. Git is also available automatically.
Basic module commands are briefly explained here.
export CRAY_ADD_RPATH=yes
module restore PrgEnv-gnu
module load cray-fftw
These commands can be put in your shell startup file to avoid typing them in each session. Note that after running the module restore command, the system prints several warnings and information messages about system environment variables that are being unloaded and newly loaded. You can simply ignore these messages: just type q to get to the end of the messages and then load fftw.
To clone the repository, first create a public/private SSH key pair and add it to GitLab. Instructions on creating an SSH key can be found at Generating a new SSH key pair. If the SSH keys have already been set up, this step can be skipped.
The code must be compiled and run from the work directory, which is at /work/project_code/project_code/user_name. For example, for the project code e01 and username mlahooti, the work directory can be accessed at /work/e01/e01/mlahooti. You can also run echo $HOME, which in this example prints /home/e01/e01/mlahooti, and change the /home/ part to /work/ to get the path of your work directory.
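The substitution of /home/ by /work/ can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name home_to_work and the variable WORK_DIR are hypothetical names chosen for illustration and are not part of ARCHER2 or Nektar++.

```shell
# Hypothetical helper: map a /home/... path to the matching /work/... path.
home_to_work() {
    case "$1" in
        /home/*) printf '/work/%s\n' "${1#/home/}" ;;  # swap the leading /home/ for /work/
        *)       printf '%s\n' "$1" ;;                 # leave any other path unchanged
    esac
}

# Derive the work directory from $HOME and print it.
WORK_DIR=$(home_to_work "$HOME")
echo "$WORK_DIR"
```

With HOME set to /home/e01/e01/mlahooti this prints /work/e01/e01/mlahooti, after which cd "$WORK_DIR" enters the work directory.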
Enter the work directory and clone the Nektar++ code into a folder, e.g. nektarpp
cd /work/e01/e01/mlahooti
git clone https://gitlab.nektar.info/nektar/nektar.git nektarpp
After the code is cloned, enter the nektarpp folder, make a build directory and enter it
cd nektarpp
mkdir build
cd build
The above three steps can be done with a single line command too
cd nektarpp && mkdir build && cd build
From within the build directory, run the configure command. Note the use of CC and CXX to select the special ARCHER-specific compilers.
CC=cc CXX=CC cmake -DNEKTAR_USE_SYSTEM_BLAS_LAPACK=OFF -DNEKTAR_USE_MPI=ON -DNEKTAR_USE_HDF5=ON -DNEKTAR_USE_FFTW=ON -DTHIRDPARTY_BUILD_BOOST=ON -DTHIRDPARTY_BUILD_HDF5=ON -DTHIRDPARTY_BUILD_FFTW=ON ..
- cc and CC are the C and C++ compiler wrappers for the Cray utilities; the underlying compiler is determined by the loaded programming environment module.
- NEKTAR_USE_SYSTEM_BLAS_LAPACK is disabled since, by default, we can use the libsci package, which contains optimized versions of BLAS and LAPACK and does not require any additional arguments to cc.
- HDF5 is a better output option on ARCHER2, since we often hit the file-count limit of the quota. Setting this option from within ccmake has led to problems, however, so make sure to specify it on the cmake command line as above. Further, the HDF5 version on ARCHER2 is not supported at the moment, so here it is built as a third-party library.
- We are currently not using the system boost, since it does not appear to use C++11 and therefore causes compilation errors.
At this point you can run ccmake .. to, for example, disable unnecessary solvers. Now run make as usual to compile the code:
make -j 4 install
NOTE: Do not try to run regression tests – the binaries at this point are cross-compiled for the compute nodes and will not execute properly on the login nodes.
Running jobs on ARCHER2
ARCHER2 uses Slurm for job submission, which is different from the PBS scheduler used on Imperial College CX1 and CX2. Nektar++ must be built in the work directory, and jobs must also be submitted from the work directory.
ARCHER2 supports three Quality of Service (QoS) levels, which determine the type of job that can be run: standard, short and long. All of these QoS levels are on the standard partition. A brief overview is provided below; a detailed description can be found in the ARCHER2 documentation on running jobs on ARCHER2.
- Standard: the standard QoS allows a maximum of 940 nodes, where each node can support 128 tasks (processes). The maximum wall time for this category is 24 hours. This is the most commonly used QoS.
- Short: the short QoS allows a maximum of 8 nodes with a maximum wall time of 20 minutes. Jobs with the short QoS can only be submitted Monday to Friday.
- Long: the long QoS allows a maximum of 64 nodes with a maximum wall time of 48 hours. The minimum wall time for long QoS jobs is 24 hours.
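The limits in this list can be checked before a job script is written. The sketch below is only an illustration, assuming a POSIX shell and using the node and wall-time limits exactly as listed above; check_qos is a hypothetical helper, not an ARCHER2 command, and it does not check the weekday restriction of the short QoS.

```shell
# Hypothetical helper: check a request against the QoS limits listed above.
# Usage: check_qos QOS NODES WALLTIME_MINUTES
check_qos() {
    qos=$1; nodes=$2; minutes=$3
    case "$qos" in
        standard) max_nodes=940; min_minutes=0;    max_minutes=1440 ;;  # up to 24 hours
        short)    max_nodes=8;   min_minutes=0;    max_minutes=20   ;;  # up to 20 minutes
        long)     max_nodes=64;  min_minutes=1440; max_minutes=2880 ;;  # 24 to 48 hours
        *)        echo "unknown QoS: $qos"; return 2 ;;
    esac
    if [ "$nodes" -le "$max_nodes" ] &&
       [ "$minutes" -ge "$min_minutes" ] &&
       [ "$minutes" -le "$max_minutes" ]; then
        echo "ok"
    else
        echo "outside $qos limits"
        return 1
    fi
}

check_qos standard 2 860   # 2 nodes for 14 h 20 min (860 minutes)
```

The limits themselves may change over time, so the ARCHER2 documentation remains the authoritative source for the numbers.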
A Slurm job script must contain the number of nodes, number of tasks per node, number of CPUs per task, wall time, budget ID, partition type, quality of service (QoS), number of OpenMP threads, job environment and execution command. It can also optionally include a user-supplied job name for easier identification of the job.
The job script can be produced using the bolt module as follows. Note that [arguments...] should be replaced with the program executable and its arguments. For more help, run bolt -h in the terminal.
module load bolt
bolt -n [parallel tasks] -N [parallel tasks per node] -d [number of threads per task] -t [wallclock time (h:m:s)] -o [script name] -j [job name] -A [project code] [arguments...]
As an example, suppose Nektar++ is installed in /work/e01/e01/mlahooti/nektarpp and the simulation is a 3D homogeneous 1D (2.5D) simulation with HomModesZ=8. We want to run the simulation on 256 processes, i.e. 2 nodes with 128 processes each, for 14 hours and 20 minutes with the Hdf5 output format. We also want to give the job a name, e.g. firstTest, and charge it to the budget of project project_id.
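The node count and time limit for this example follow from simple arithmetic, sketched below for a POSIX shell. The variable names are chosen here for illustration only; the values are those of the example.

```shell
# Values taken from the example above.
tasks=256            # total MPI processes
tasks_per_node=128   # processes per node
hours=14
minutes=20

# Round up so that a partially filled node is still counted.
nodes=$(( (tasks + tasks_per_node - 1) / tasks_per_node ))

# Slurm accepts the time limit as hours:minutes:seconds.
walltime=$(printf '%d:%02d:%02d' "$hours" "$minutes" 0)

echo "--nodes=$nodes --tasks-per-node=$tasks_per_node --time=$walltime"
```

For the values above this gives 2 nodes and a time limit of 14:20:00.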
Here is an example of a Slurm script for a standard job:
#!/bin/bash

# Slurm job options (job-name, compute nodes, job time)
#SBATCH --job-name=firstTest
#SBATCH --time=14:20:0
#SBATCH --nodes=2
#SBATCH --tasks-per-node=128
#SBATCH --cpus-per-task=1

# Replace project_id below with your budget code (e.g. t01)
#SBATCH --account=project_id
#SBATCH --partition=standard
#SBATCH --qos=standard

# Setup the job environment (this module needs to be loaded before any other modules)
module load epcc-job-env

# Set the number of threads to 1
# This prevents any threaded system libraries from automatically
# using threading.
export OMP_NUM_THREADS=1

# Path to the Nektar++ build directory and to the solver executables
export NEK_DIR=/work/e01/e01/mlahooti/nektarpp/build
export NEK_BUILD=$NEK_DIR/dist/bin
export LD_LIBRARY_PATH=/opt/gcc/10.1.0/snos/lib64:$NEK_DIR/ThirdParty/dist/lib:$NEK_DIR/dist/lib64:$LD_LIBRARY_PATH

# Launch the parallel job
srun --distribution=block:block --hint=nomultithread $NEK_BUILD/IncNavierStokesSolver naca0012.xml session.xml --npz 4 -i Hdf5 &> runlog
In the above script, note the module load epcc-job-env command, which sets up the job environment and must be present in the script.
Further, for convenience the script contains two export commands that define the NEK_DIR and NEK_BUILD environment variables; the former is the path to the Nektar++ build directory and the latter is the location of the solver executables. Additionally, the third export adds the library locations to LD_LIBRARY_PATH, where each library path is separated from the others by a colon (:). The library path for gfortran is also exported, since without it the run terminated with an error that gfortran could not be found.
To submit the job, assuming the above script is saved in a file named myjob.slurm, run sbatch myjob.slurm.
The job status can be monitored using
squeue -u $USER
Running this command prints the following information on the screen, where ST is the status of the job. Here PD means the job is waiting for resource allocation; other common statuses are R, F, CG, CD and CA, which mean running, failed, in the process of completing, completed and cancelled respectively.
JOBID  PARTITION NAME    USER     ST TIME NODES NODELIST(REASON)
121062 standard  myJob-1 mlahooti PD 0:00 4     (Priority)
121064 standard  myJob-2 mlahooti PD 0:00 4     (Priority)
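A listing like this can be filtered by status with standard tools. The sketch below assumes the default column layout shown above, with the status in the fifth column; jobs_in_state is a hypothetical helper name, and the sample text is copied from the listing rather than produced by a live query.

```shell
# Hypothetical helper: print the IDs of jobs whose ST column matches a state code.
# Reads squeue-style output on standard input; the first line is the header.
jobs_in_state() {
    awk -v state="$1" 'NR > 1 && $5 == state { print $1 }'
}

# Sample input copied from the listing above.
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
121062 standard myJob-1 mlahooti PD 0:00 4 (Priority)
121064 standard myJob-2 mlahooti PD 0:00 4 (Priority)'

printf '%s\n' "$sample" | jobs_in_state PD
```

On the system itself, the output of squeue would be piped into the function in place of the sample text.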
A job can be cancelled using the scancel job-ID command, where job-ID is the ID of the job. For example, the job ID of the first job above is 121062.
Further, detailed information about a particular job, including its estimated start time, can be obtained via
scontrol show job -dd job-ID
NOTE: It is highly recommended that the job script is checked for errors before submitting it to the system. The checkScript command checks the integrity of the job script, shows any errors and estimates the budget it will consume. Run the following command in the directory from which you want to submit the job to check the script