Dear Philipp,

first of all, please excuse the delay, but I have been ill the last two days.

On 1/21/19 8:31 PM, Philipp Berger wrote:
> Dear list,
> I am looking to run large numbers of jobs as an array job using SLURM.
> I created a job file "jobs.txt" containing one configuration per line. Using SLURM_ARRAY_TASK_ID, I select the appropriate line for the current task and execute the corresponding configuration in the batch script.
> My test file currently holds 1308 configurations, which I was unable to submit using sbatch, as the MaxArrayTask variable seems to be set to 1001. What is the optimal/proposed way of scheduling large numbers of configurations?

Yes, that is correct, and it is also the right way to submit such jobs. You could e.g. split jobs.txt into two parts, or you could submit a second array job which chooses the configuration line by SLURM_ARRAY_TASK_ID plus an offset. This is, by the way, the same as it has been with LSF for years.

> I need one core (or thread?!) per configuration (#SBATCH --cpus-per-task 1 and #SBATCH --ntasks-per-core 1 (are both necessary?)) and 20G of memory per configuration (#SBATCH --mem-per-cpu=20480M).

As we are still within the acceptance phase of the cluster, the configuration is not yet the final one. We will have to test one or two things ourselves before we have a configuration for the production cluster. The goal is to schedule by core, not by thread, since many applications (especially memory-intensive ones) do not profit from hyperthreading. We would like to have cgroups created which enclose the hyperthread of each core. That way it would be possible to use hyperthreading at no additional "cost" in the accounting, but by default at most 20 tasks will be scheduled on a 20-core host. For the moment, I would just use #SBATCH --ntasks=1.

> My hope was to build these large configuration lists and then later submit them to our own batch system so that it would use all available nodes to crunch all configurations, blockwise serialised, as fast as possible. Is that not what SLURM array jobs are supposed to do?

Yes, that is exactly what array jobs are supposed to do, but they are limited in size.
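To make the offset idea concrete, here is a minimal sketch of such a batch script. The line-selection via sed and the OFFSET variable are my illustration, not tested configuration; the jobs.txt fallback at the top exists only so the sketch runs stand-alone outside SLURM.

```shell
#!/usr/bin/env bash
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=20480M

# Outside a SLURM array job, fall back to task id 1 for demonstration.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"

# OFFSET defaults to 0; the second submission sets it to reach the
# configurations beyond the array size limit.
OFFSET="${OFFSET:-0}"
LINE=$(( TASK_ID + OFFSET ))

# Stand-alone demo only: create a tiny jobs.txt if none exists.
[ -f jobs.txt ] || printf 'config-A\nconfig-B\nconfig-C\n' > jobs.txt

# Pick the configuration for this task (one configuration per line).
CONFIG=$(sed -n "${LINE}p" jobs.txt)
echo "task ${TASK_ID}: ${CONFIG}"
# ...run the actual program with $CONFIG here...
```

The 1308 configurations could then be submitted in two parts, e.g. `sbatch --array=1-1000 job.sh` followed by `sbatch --export=ALL,OFFSET=1000 --array=1-308 job.sh` (job.sh being a hypothetical name for the script above).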
> Kind regards,
> Philipp
Best
Marcus
_______________________________________________
claix18-slurm-pilot mailing list -- claix18-slurm-pilot@lists.rwth-aachen.de
To unsubscribe send an email to claix18-slurm-pilot-leave@lists.rwth-aachen.de
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner@itc.rwth-aachen.de
www.itc.rwth-aachen.de