Problems with parallelization of the Wien2k DFT code
Dear admins and users,

I'm using the Wien2k DFT code, which is basically a suite of subprograms, each doing one specific thing, glued together by various C shell scripts. They are parallelized at a hybrid MPI/OpenMP level (and some parts just spawn multiple non-MPI processes with OpenMP threads). The problem is that the various subtasks need different numbers of MPI processes and OpenMP threads for optimal speed.

I call the software via the csh script run_lapw, which in turn calls programs like lapw0, lapw1, lapw2 and mixer. The Wien2k C shell glue dispatches the parallelization of the subprograms based on some internal configuration files, which I generate on the fly from the actual allocated nodes as reported by SLURM. It sets OMP_NUM_THREADS properly and calls mpirun with the correct number of processes and a proper -machinefile <file>, generated from the SLURM allocation and the optimal configuration for each subprogram.

Some specific examples of optimal parallelization on a 48-CPU node for the different subprograms called from inside the run_lapw script:
- 8 MPI processes with 12 OpenMP threads each
- 32 non-communicating processes with 2 OpenMP threads each
- 49 MPI processes without threading
- a single process with 48 OpenMP threads

However, what I sometimes see is that the mpirun calls (Intel MPI) from the C shell scripts are intercepted and modified, resulting in multiple processes (threads) being bound to a single CPU and suboptimal performance. So basically I need a way to tell SLURM to just allocate me a full node and not interfere with the mpirun calls from inside the csh scripts, or at least not do any CPU pinning...

Any advice would be appreciated.

Best regards
Pavel Ondračka
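For reference, a minimal sketch of the kind of batch script I have in mind (the node size, the Intel MPI pinning variable, and the plain "run_lapw -p" invocation are just illustrative assumptions; the actual .machines / machinefile generation is omitted):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --exclusive            # request the whole 48-CPU node
    #SBATCH --ntasks-per-node=1    # a single task; run_lapw/mpirun handle placement
    #SBATCH --cpus-per-task=48

    # Ask Intel MPI not to pin ranks, so the per-subprogram layouts
    # from the generated -machinefile are respected (assuming this is
    # enough to stop the rebinding I'm seeing; forcing the ssh bootstrap
    # via I_MPI_HYDRA_BOOTSTRAP=ssh might be another option):
    export I_MPI_PIN=off

    # here the .machines file for the current allocation would be
    # generated from SLURM_JOB_NODELIST before starting the SCF cycle
    run_lapw -p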