Dear admins and users,

I'm using the Wien2k DFT code, which is basically a suite of subprograms, each doing one specific thing, glued together by various C shell scripts. These are parallelized at a hybrid MPI/OpenMP level (and some parts just spawn multiple non-MPI processes with OpenMP threads). The problem is that the various subtasks need different numbers of MPI processes and OpenMP threads for optimal speed.

I call the software via the csh script run_lapw, which in turn calls programs like lapw0, lapw1, lapw2, and mixer. The Wien2k C shell glue dispatches the parallelized subprograms based on some internal configuration files, which I generate on the fly from the actually allocated nodes as reported by SLURM. It sets OMP_NUM_THREADS properly and calls mpirun with the correct number of processes and a proper -machinefile <file>, generated from the SLURM allocation and the optimal configuration for each subprogram.

Some specific examples of optimal parallelization on a 48-CPU node for the different subprograms called from inside the run_lapw script:
- 8 MPI processes with 12 OpenMP threads each
- 32 non-communicating processes with 2 OpenMP threads each
- 49 MPI processes without threading
- a single process with 48 OpenMP threads

However, what I sometimes see is that the mpirun calls (Intel MPI) from the C shell scripts are intercepted and modified, resulting in multiple processes (or their threads) being bound to a single CPU and suboptimal performance.

So basically I need a way to tell SLURM to just allocate me a full node and not mess with the mpirun calls from inside the csh scripts, or to not do any CPU pinning... Any advice would be appreciated.
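For reference, here is a minimal sketch of the kind of job script I'm using; the exact directives are my current guess at how to get the whole node to myself (which is part of the question), and the 48-core node is just my case:

    #!/bin/csh
    #SBATCH --nodes=1            # one full node
    #SBATCH --exclusive          # no other jobs on the node
    #SBATCH --ntasks=1           # a single task; Wien2k dispatches everything itself
    #SBATCH --cpus-per-task=48   # all 48 CPUs of the node

    # ask SLURM-aware launchers inside the job not to bind at all
    setenv SLURM_CPU_BIND none

    run_lapw -p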
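The on-the-fly machinefile generation is essentially the following (a simplified sketch of what my glue around run_lapw does; the file names are examples, and lapw1_mpi/lapw1.def stand for whatever binary and input the Wien2k scripts actually pass to mpirun):

    # list the allocated hosts, one per line
    scontrol show hostnames $SLURM_JOB_NODELIST > nodes.txt
    set node = `head -1 nodes.txt`

    # e.g. the "8 MPI processes x 12 OpenMP threads" layout on one 48-CPU node:
    echo "${node}:8" > machinefile_lapw1    # Intel MPI host:ranks format
    setenv OMP_NUM_THREADS 12
    mpirun -np 8 -machinefile machinefile_lapw1 lapw1_mpi lapw1.def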
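What I suspect (but have not confirmed) is that Intel MPI's mpirun detects the SLURM environment, bootstraps the ranks via srun, and thereby inherits SLURM's CPU binding. The workarounds I know of on the Intel MPI side are below; I'd still prefer a clean solution on the SLURM side:

    # force mpirun to launch the ranks via ssh instead of srun,
    # so SLURM's binding is never applied to them
    setenv I_MPI_HYDRA_BOOTSTRAP ssh

    # or keep the srun bootstrap but switch off Intel MPI's own pinning
    setenv I_MPI_PIN off

    # or, for the hybrid runs, pin one rank per OpenMP domain
    setenv I_MPI_PIN_DOMAIN omp

Best regards
Pavel Ondračka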