Nektar++ on ARCHER2
The ARCHER2 national supercomputer is a world-class advanced computing resource and the successor to ARCHER. This guide provides basic instructions for compiling the Nektar++ stable release or master branch on the ARCHER2 system.
To log into ARCHER2 you should use the address:
ARCHER2 uses a module-based system to load system software. To compile Nektar++ on ARCHER2 we need to choose the GNU compiler suite and load the required modules. Note that git is automatically available on the system.
A brief summary of module commands on ARCHER2 (from the ARCHER2 documentation):
- module list [name] – List modules currently loaded in your environment, optionally filtered by [name]
- module avail [name] – List modules available, optionally filtered by [name]
- module spider [name][/version] – Search available modules (including hidden modules) and provide information on modules
- module load name – Load the module called name into your environment
- module remove name – Remove the module called name from your environment
- module swap old new – Swap module new for module old in your environment
- module help name – Show help information on module name
- module show name – List what module name actually does to your environment
Basic module commands are briefly explained here.
export CRAY_ADD_RPATH=yes
module swap PrgEnv-cray PrgEnv-gnu
module load cray-fftw
module load cmake
These commands can be stored in a file to avoid typing them in every session. One approach is to put these lines in the .profile file in your home directory; if the file is not present, you can create it. Note the dot in front of .profile. A better way is to create a bash script as shown below. Let's name it loadMyModules. To create it, type
touch loadMyModules
in the terminal and press enter. This creates an empty file named loadMyModules. Open the file using your preferred text editor, put the following lines in it, together with any other modules you need to load, and save it. Note that the shebang (#!) must be on the first line.
#!/bin/bash
export CRAY_ADD_RPATH=yes
module swap PrgEnv-cray PrgEnv-gnu
module load cray-fftw
module load cmake
After saving the file, make it executable by running
chmod +x loadMyModules
Now you can load your modules by running the file as:
Note that after running it, the system may print several warnings and information messages about environment variables that are being unloaded and newly loaded. You can simply ignore these messages. Just type q to get to the end of the messages.
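One point worth illustrating when running such a file: settings made by a script only persist if the file is read into the current shell (with . or, in bash, source), whereas executing it as a separate program changes only a child process. The following is a minimal sketch of that difference. It uses a hypothetical temporary file in place of loadMyModules and sets only a variable, so it does not need the module system.

```bash
#!/bin/bash
# Hypothetical stand-in for loadMyModules: it only exports one variable,
# so the effect can be observed without the module system.
demo_file="$(mktemp)"
echo 'export CRAY_ADD_RPATH=yes' > "$demo_file"

unset CRAY_ADD_RPATH

# Executing the file runs it in a child process; the current shell is unchanged.
bash "$demo_file"
after_exec="${CRAY_ADD_RPATH:-unset}"

# Reading the file into the current shell makes the setting persist.
# (". file" is the portable spelling of bash's "source file".)
. "$demo_file"
after_source="${CRAY_ADD_RPATH:-unset}"

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -f "$demo_file"
```

Under this assumption, the module file would be loaded into an interactive session with source loadMyModules rather than by executing it.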
To clone the repository, first create a public/private SSH key pair and add it to GitLab. Instructions on creating an SSH key can be found at Generating a new SSH key pair. If the SSH keys have already been set up, this step can be skipped.
The code must be compiled and run from the work directory, which is at /work/project_code/project_code/user_name. For example, for the project code e01 and username mlahooti, the work directory is /work/e01/e01/mlahooti. You can also run echo $HOME, which in this example prints /home/e01/e01/mlahooti, and change the /home/ part to /work/ to obtain the path of your work directory.
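The /home/ to /work/ substitution described above can also be scripted. This is a small sketch only; the function name work_dir_from_home is a hypothetical helper and not part of ARCHER2 or Nektar++.

```bash
#!/bin/bash
# Hypothetical helper: map a /home/... path to the matching /work/... path.
work_dir_from_home() {
    local home_dir="$1"
    case "$home_dir" in
        /home/*) printf '/work/%s\n' "${home_dir#/home/}" ;;
        *) echo "not under /home: $home_dir" >&2; return 1 ;;
    esac
}

work_dir_from_home /home/e01/e01/mlahooti   # prints /work/e01/e01/mlahooti
```

On the system itself, cd "$(work_dir_from_home "$HOME")" would then change to the work directory, assuming the home directory follows the layout shown above.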
Enter the work directory and clone the Nektar++ code into a folder, e.g. nektarpp:
cd /work/e01/e01/mlahooti
git clone https://gitlab.nektar.info/nektar/nektar.git nektarpp
After the code is cloned, enter the nektarpp folder, make a build directory and enter it:
cd nektarpp
mkdir build
cd build
The above three steps can also be done with a single command:
cd nektarpp && mkdir build && cd build
From within the build directory, run the configure command. Note the use of CC and CXX to select the special ARCHER-specific compilers.
CC=cc CXX=CC cmake -DNEKTAR_USE_SYSTEM_BLAS_LAPACK=OFF -DNEKTAR_USE_MPI=ON -DNEKTAR_USE_HDF5=ON -DNEKTAR_USE_FFTW=ON -DTHIRDPARTY_BUILD_BOOST=ON -DTHIRDPARTY_BUILD_HDF5=ON ..
- cc and CC are the C and C++ wrappers for the Cray utilities and are determined by the PrgEnv module.
- SYSTEM_BLAS_LAPACK is disabled since, by default, we can use the libsci package, which contains an optimized version of BLAS and LAPACK and does not require any additional arguments to cc.
- HDF5 is a better output option to use on ARCHER2 since we often hit the file-count limit of the quota. However, setting this option from within ccmake has led to problems, so make sure to specify it on the cmake command line as above. Further, the HDF5 version on ARCHER2 is not supported at the moment, so here it is built as a third-party library.
- We are currently not using the system Boost since it does not appear to use C++11, which causes compilation errors.
At this point you can run ccmake .. to, e.g., disable unnecessary solvers. Now run make as usual to compile the code:
make -j 4 install
NOTE: Do not try to run regression tests – the binaries at this point are cross-compiled for the compute nodes and will not execute properly on the login nodes.
Building using a newer compiler than the default
Using the above instructions, you can build Nektar++ with gcc/10.2.0. It is also possible to build Nektar++ using a newer version of GCC, namely gcc/11.2.0. You can switch to this compiler and its related build environment using the following commands:
module load cpe/21.09
module swap PrgEnv-cray PrgEnv-gnu
module load <any other required modules here>
Do not forget to add GCC to the path in the job script. Generally, you can find information about your compiler by running the following command:
module show gcc
This will print several lines to the screen, including the following information:
whatis("Defines the system paths and environment variables needed for the GNU Compiling Environment.")
From the above output, the GCC installation path is the information we need in order to add our version of GCC to the path; hence we need to add the following line to our job script:
Running jobs on ARCHER2
ARCHER2 uses Slurm for job submission, which is different from the PBS system used on Imperial College CX1 and CX2. Nektar++ must be built in the work directory, and jobs must also be submitted from the work directory.
ARCHER2 supports several Quality of Service (QoS) levels, which define the type of job that can be run: standard, short, long, highmem, taskfarm, largescale, lowpriority and serial. Except for the highmem and serial QoSs, these QoSs are on the standard partition. The lowpriority QoS can be run on both the highmem and standard partitions, depending on the job. A detailed description of these QoSs, including the maximum number of jobs allowed in the queue, the maximum number of nodes and the maximum wall time for each, can be found in the ARCHER2 documentation on running jobs. A summary of the most common QoSs is provided below:
- Standard: the standard QoS allows a maximum of 1024 nodes, where each node can support 128 tasks (processes). The maximum wall time for this category is 24 hours. 64 jobs of this type can be queued and 16 jobs can run simultaneously. This is the most commonly used QoS.
- Short: the short QoS allows a maximum of 32 nodes with a maximum wall time of 20 minutes. 16 jobs can be queued and at most 4 jobs can run simultaneously. Jobs with the short QoS can only be submitted Monday to Friday.
- Long: the long QoS allows a maximum of 64 nodes with a maximum wall time of 48 hours and a minimum wall time greater than 24 hours. 16 jobs of this type can be queued and 16 jobs can run simultaneously.
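The limits summarised above can be checked before a job script is written. The sketch below encodes only the node and wall time limits quoted in this list; the function names to_seconds and check_qos are hypothetical helpers, not ARCHER2 tools, and the limits on the real system may change.

```bash
#!/bin/bash
# Convert a wall time given as h:m:s into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# check_qos QOS NODES WALLTIME
# Succeeds if the request fits the limits quoted in the summary above.
check_qos() {
    local qos="$1" nodes="$2" secs max_nodes min_secs max_secs
    secs="$(to_seconds "$3")"
    case "$qos" in
        standard) max_nodes=1024; min_secs=0;     max_secs=86400 ;;   # up to 24 h
        short)    max_nodes=32;   min_secs=0;     max_secs=1200 ;;    # up to 20 min
        long)     max_nodes=64;   min_secs=86400; max_secs=172800 ;;  # over 24 h, up to 48 h
        *) echo "unknown QoS: $qos" >&2; return 2 ;;
    esac
    [ "$nodes" -le "$max_nodes" ] && [ "$secs" -gt "$min_secs" ] && [ "$secs" -le "$max_secs" ]
}

if check_qos standard 2 14:20:0; then
    echo "2 nodes for 14:20:0 fits the standard QoS"
fi
```

For instance, check_qos short 2 14:20:0 fails because 14 hours and 20 minutes exceeds the 20 minute limit of the short QoS.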
A Slurm job script must contain the number of nodes, number of tasks per node, number of CPUs per task, wall time, budget ID, partition, quality of service (QoS), number of OpenMP threads, job environment and execution command. It can also optionally contain a user-supplied job name for easier identification of the job.
The job script can be produced using the bolt module as follows. Note that [arguments...] should be replaced with the program executable and its arguments. For more help you can run bolt -h in the terminal.
module load bolt
bolt -n [parallel tasks] -N [parallel tasks per node] -d [number of threads per task] -t [wallclock time (h:m:s)] -o [script name] -j [job name] -A [project code] [arguments...]
As an example, suppose that Nektar++ is installed in /work/e01/e01/mlahooti/nektarpp and the simulation is a 3D homogeneous 1D (2.5D) simulation with HomModesZ=8. We want to run the simulation on 256 processes, i.e. 2 nodes with 128 processes each, for 14 hours and 20 minutes, with the Hdf5 output format. We also want to assign a name to the job, e.g. firstTest, and we suppose that the budget to be charged is project_id.
Here is an example of a Slurm script for a standard job:
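The node count in this example follows from dividing the number of processes by the 128 tasks available per node and rounding up. A small sketch of that arithmetic is given below; nodes_needed is a hypothetical helper name.

```bash
#!/bin/bash
# nodes_needed TASKS [TASKS_PER_NODE]
# Round up TASKS / TASKS_PER_NODE; 128 tasks per node is assumed by default.
nodes_needed() {
    local tasks="$1" per_node="${2:-128}"
    echo $(( (tasks + per_node - 1) / per_node ))
}

nodes_needed 256   # prints 2
```

A request that is not a multiple of 128, e.g. 129 processes, still needs a second node, which is why the division is rounded up.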
#!/bin/bash

# Slurm job options (job-name, compute nodes, job time)
#SBATCH --job-name=firstTest
#SBATCH --time=14:20:0
#SBATCH --nodes=2
#SBATCH --tasks-per-node=128
#SBATCH --cpus-per-task=1

# Replace project_id below with your budget code (e.g. t01)
#SBATCH --account=project_id
#SBATCH --partition=standard
#SBATCH --qos=standard

# Setup the job environment (this module needs to be loaded before any other modules)
module load epcc-job-env

# Set the number of threads to 1
# This prevents any threaded system libraries from automatically
# using threading.
export OMP_NUM_THREADS=1

# Nektar++ build directory and location of the solver executables
export NEK_DIR=/work/e01/e01/mlahooti/nektarpp/build
export NEK_BUILD=$NEK_DIR/dist/bin

# Library search path: GCC runtime (gfortran), third-party and Nektar++ libraries
export LD_LIBRARY_PATH=/opt/gcc/10.2.0/snos/lib64:$NEK_DIR/ThirdParty/dist/lib:$NEK_DIR/dist/lib64:$LD_LIBRARY_PATH

# Launch the parallel job
srun --distribution=block:block --hint=nomultithread $NEK_BUILD/IncNavierStokesSolver naca0012.xml session.xml --npz 4 -i Hdf5 &> runlog
In the above script, note the line module load epcc-job-env, which sets up the job environment and must be present in the script. On the full system, the gcc version is 10.2.0; on the 4-cabinet system, it is 10.1.0.
Further, for convenience the script contains two export commands which define the NEK_DIR and NEK_BUILD environment variables; the former is the path to the Nektar++ build directory and the latter is the location of the solver executables. Additionally, the last export adds the library locations to the library search path, where each path is separated from the others by a colon (:). I also exported the library path for gfortran, since otherwise the run terminated with an error that gfortran could not be found.
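Because the library search path is a colon-separated list, repeatedly prepending the same directory makes it grow with duplicates. The following sketch shows one way to prepend a directory only when it is not already listed; prepend_path is a hypothetical helper and the directories are placeholders.

```bash
#!/bin/bash
# prepend_path DIR CURRENT
# Print CURRENT with DIR added at the front, unless DIR is already an entry.
prepend_path() {
    local dir="$1" current="$2"
    case ":$current:" in
        *":$dir:"*) echo "$current" ;;                  # already present
        *)          echo "${dir}${current:+:$current}" ;; # avoid a trailing colon
    esac
}

prepend_path /opt/example/lib64 "/usr/lib:/lib"   # prints /opt/example/lib64:/usr/lib:/lib
```

In a job script this could be used as export LD_LIBRARY_PATH="$(prepend_path "$NEK_DIR/dist/lib64" "$LD_LIBRARY_PATH")", assuming NEK_DIR is defined as in the example above.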
To submit the job, assuming the above script is saved in a file named myjob.slurm run the following command
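Slurm batch scripts are submitted with the sbatch command. As a minimal sketch, the hypothetical wrapper submit_job below checks that the script file exists before passing it to sbatch; the wrapper itself is an illustration and not an ARCHER2 tool.

```bash
#!/bin/bash
# submit_job SCRIPT
# Submit a Slurm job script with sbatch after checking that the file exists.
submit_job() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "job script not found: $script" >&2
        return 1
    fi
    sbatch "$script"
}
```

With the file name used above, the call would be submit_job myjob.slurm, which is equivalent to running sbatch myjob.slurm directly.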
The job status can be monitored using
squeue -u $USER
Running this command prints the following information on the screen, where ST is the status of the job. Here PD means the job is waiting for resource allocation; other common statuses are R, F, CG, CD and CA, which mean running, failed, in the process of completing, completed and cancelled, respectively.
JOBID   PARTITION  NAME     USER      ST  TIME  NODES  NODELIST(REASON)
121062  standard   myJob-1  mlahooti  PD  0:00  4      (Priority)
121064  standard   myJob-2  mlahooti  PD  0:00  4      (Priority)
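Output in this format can be summarised with standard text tools. The sketch below counts the jobs in a given state by reading the fifth column (ST); count_state is a hypothetical helper, and the sample text is the listing shown above rather than live squeue output.

```bash
#!/bin/bash
# count_state STATE
# Read squeue-style output on stdin, skip the header line and count the
# jobs whose ST column (column 5) equals STATE.
count_state() {
    awk -v st="$1" 'NR > 1 && $5 == st { n++ } END { print n + 0 }'
}

sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
121062 standard myJob-1 mlahooti PD 0:00 4 (Priority)
121064 standard myJob-2 mlahooti PD 0:00 4 (Priority)'

printf '%s\n' "$sample" | count_state PD   # prints 2
```

On the system, squeue -u $USER | count_state PD would report how many of your jobs are still waiting, assuming the default squeue column layout shown above.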
A job can be cancelled using the scancel job-ID command, where job-ID is the ID of the job. For example, the job ID of the first job above is 121062.
Further, detailed information about a particular job, including the estimation for start time can be obtained via
scontrol show job -dd job-ID
NOTE: It is highly recommended to check that the job script is error-free before submitting it to the system. The checkScript command checks the integrity of the job script, shows any errors and estimates the budget it will consume. Run the following command in the directory from which you want to submit the job to check the script: