Hi,

as far as I understand your scenario, it seems somewhat similar to what I have been working on. We essentially have a long list of commands (different binaries run with different arguments) that we need to run and collect the outputs of. Our main restriction is that array jobs only allow for 1000 jobs.

What we do is the following:
- Create a file of all the commands, one command per line
- Create an array job that executes all commands in slices
- Collect the results from the outputs

Our batch file roughly looks like this:

    min=$SLURM_ARRAY_TASK_MIN
    max=$SLURM_ARRAY_TASK_MAX
    cur=$SLURM_ARRAY_TASK_ID

    # Read from stdin so that wc prints only the line count, not the file name
    tasks=$(wc -l < joblist)
    jobcount=$(( max - min + 1 ))
    # Commands per array task, rounded up
    slicesize=$(( (tasks + jobcount - 1) / jobcount ))
    # 1-based line range of joblist handled by this array task
    start=$(( (cur - min) * slicesize + 1 ))
    end=$(( start + slicesize - 1 ))
    # The last slice may be shorter than the others
    if [ "$end" -gt "$tasks" ]; then end=$tasks; fi

    for i in $(seq ${start} ${end}); do
        cmd=$(sed -n "${i}p" < joblist)
        echo "Executing $cmd"
        echo "# START ${i} #"
        # No core dumps, soft CPU time limit of 120 seconds
        ulimit -c 0 && ulimit -S -t 120 && $cmd ; rc=$?
        echo "# END ${i} #"
    done

Note that time limits must be implemented manually (here via ulimit). We then submit this file with --wait.

Does this help? (and cover your use case?)

Best,
Gereon

On 2/15/19 3:45 PM, Johannes Sauer wrote:
I looked into this further. I don't think srun can be used like this, since I believe it also blocks when used inside sbatch.
But I think I can just let the simulation controller run sbatch directly for each simulation and use only a single batch script, which calls the controller again. Mapping to the correct simulation is then done via the job name or other parameters. This is also very similar to how it works for LSF at the moment.
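
A minimal sketch of the job-name mapping described above, not the actual controller. The "sim_" prefix, the script name run_simulation.sh and both helper functions are hypothetical and only serve as illustration. The controller would run the command printed by submit_cmd once per simulation, and the single batch script would recover the simulation id from $SLURM_JOB_NAME before calling the controller again.

```shell
#!/usr/bin/env bash
# Sketch only: JOB_PREFIX, run_simulation.sh and both helpers are
# hypothetical names, not part of Slurm or of the real controller.

JOB_PREFIX="sim_"

# Controller side: build the sbatch command line for one simulation.
# The command is printed, not executed, so it can be inspected first.
submit_cmd() {
    local sim_id="$1"
    printf 'sbatch --job-name=%s%s run_simulation.sh\n' "$JOB_PREFIX" "$sim_id"
}

# Batch script side: strip the prefix from the job name to get the
# simulation id. Inside a job, pass "$SLURM_JOB_NAME" as the argument.
sim_id_from_job_name() {
    local job_name="$1"
    printf '%s\n' "${job_name#"$JOB_PREFIX"}"
}
```

Inside run_simulation.sh, the call would look like sim_id_from_job_name "$SLURM_JOB_NAME". Further parameters could be passed as extra arguments after the script name in the sbatch call instead of being encoded in the job name.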
On 2/15/19 2:54 PM, Johannes Sauer wrote:
Hi,
for our simulations we have a simulation manager. For LSF this used to issue a bsub command for each simulation. We're not using array jobs for this as it has some more requirements.
I cannot simply replace bsub with srun; I need to use sbatch.
I believe this should work: Run the simulation manager with sbatch, then it should be able to do srun for the different simulations.
Will this work?
Best
Johannes
_______________________________________________ claix18-slurm-pilot mailing list -- claix18-slurm-pilot@lists.rwth-aachen.de To unsubscribe send an email to claix18-slurm-pilot-leave@lists.rwth-aachen.de
-- M.Sc. Johannes Sauer Researcher
Institut fuer Nachrichtentechnik RWTH Aachen University Melatener Str. 23 52074 Aachen Tel +49 241 80-27678 Fax +49 241 80-22196 sauer@ient.rwth-aachen.de http://www.ient.rwth-aachen.de
-- Gereon Kremer Lehr- und Forschungsgebiet Theorie Hybrider Systeme RWTH Aachen Tel: +49 241 80 21243