Dear Philipp,

First of all, please excuse the delay; I have been ill for the last two days.

On 1/21/19 8:31 PM, Philipp Berger wrote:
Dear list,

I am looking to run large numbers of jobs as an Array Job using SLURM.

I created a job file "jobs.txt", containing one configuration per line.
Using SLURM_ARRAY_TASK_ID I select the appropriate line for the current
task and execute the corresponding configuration in the batch script.

My test file currently holds 1308 configurations, which I was unable to
submit using sbatch, as the MaxArrayTask variable seems to be set to
1001. What is the optimal/proposed way of scheduling large numbers of
configurations?
Yes, that is correct, and it is also the right way to submit such jobs. You could, for example, split jobs.txt into two parts, or you could submit a second array job that chooses the configuration line by SLURM_ARRAY_TASK_ID plus an offset.
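A minimal sketch of the offset approach (the job-script name, the OFFSET variable, and the demo contents of jobs.txt are illustrative assumptions, not your actual setup):

```shell
# Demo stand-in for the real jobs.txt (one configuration per line):
printf 'config-A\nconfig-B\nconfig-C\n' > jobs.txt

# The two submissions together would cover all 1308 lines
# (shown as comments, since sbatch is only available on the cluster):
#   sbatch --array=1-1000 job.sh
#   sbatch --array=1-308 --export=ALL,OFFSET=1000 job.sh

# Inside job.sh, the configuration line is selected with the offset applied:
LINE=$(( ${SLURM_ARRAY_TASK_ID:-1} + ${OFFSET:-0} ))
CONFIG=$(sed -n "${LINE}p" jobs.txt)
echo "$CONFIG"
```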
This is, by the way, the same as it has been with LSF for years now.
I need one core (or thread?!) per configuration (#SBATCH --cpus-per-task
1 and #SBATCH --ntasks-per-core 1 (are both necessary?)), 20G of memory
per configuration (#SBATCH --mem-per-cpu=20480M).
As we are still within the acceptance phase of the cluster, the configuration is not yet final. We will have to test one or two things ourselves before we arrive at a configuration for the production cluster. The goal is to schedule by core, not by thread, since many applications (especially memory-intensive ones) do not profit from hyperthreading. We would like to have cgroups created that enclose both hyperthreads of each core. That way it would be possible to use hyperthreading at no additional "cost" in the accounting, but by default at most 20 tasks will be scheduled on a 20-core host. For the moment, I would just use #SBATCH --ntasks=1.
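A minimal job-script header along these lines might look as follows (job name and the final command are placeholders; the memory value is the one from your mail):

```shell
#!/usr/bin/env bash
#SBATCH --job-name=configs      # illustrative name
#SBATCH --array=1-1000          # task IDs index lines in jobs.txt
#SBATCH --ntasks=1              # one task per array element, as suggested
#SBATCH --cpus-per-task=1       # one core per configuration
#SBATCH --mem-per-cpu=20480M    # 20G per configuration

# Pick this task's configuration line and run it:
CONFIG=$(sed -n "${SLURM_ARRAY_TASK_ID}p" jobs.txt)
./run_configuration "$CONFIG"   # placeholder for the actual command
```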
My hope was to build these large configuration lists and then later
submit them to our own batch system so that it would use all available
nodes to crunch all configurations blockwise serialised as fast as
possible. Is that not what the SLURM Array Jobs are supposed to do?
Yes, that is exactly what array jobs are supposed to do, but they are limited in size.

Kind regards,

Philipp

Best
Marcus



_______________________________________________
claix18-slurm-pilot mailing list -- claix18-slurm-pilot@lists.rwth-aachen.de
To unsubscribe send an email to claix18-slurm-pilot-leave@lists.rwth-aachen.de

-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner@itc.rwth-aachen.de
www.itc.rwth-aachen.de