Re: Starting large amounts of jobs
Hi Gereon,

if you are worried about load balancing in scenario 1, what you could do is use a central synchronization tool, such as a database, from which the submitted jobs atomically fetch one task at a time and execute it. Once there are no more tasks to fetch from the DB, the job ends. I'm not sure, though, which network requests the cluster's firewall allows, and it would be more effort to set up.
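Something like the following worker loop, sketched here with flock(1) on a shared task file as a simpler stand-in for the database (the paths are made up, and it assumes file locking works reliably on the cluster filesystem):

```bash
#!/usr/bin/env bash
# Hypothetical worker job: claim one task at a time from a shared task file
# until it is empty, then exit. Each submitted job runs this same loop.
TASKFILE=/work/tasks.txt      # assumed: one "./binary input-file" command per line
LOCK=$TASKFILE.lock

claim_task() {
    # Atomically pop the first line of the task file under an exclusive lock.
    (
        flock -x 9
        head -n 1 "$TASKFILE"
        sed -i '1d' "$TASKFILE"
    ) 9>"$LOCK"
}

while true; do
    task=$(claim_task)
    [ -z "$task" ] && break   # nothing left to do -> job ends
    eval "$task"
done
```

With a real database the claim would instead be a single atomic update, but the control flow of each job stays the same.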
Greetings,
Eugen

On Mon, Feb 25, 2019 at 6:14 PM Gereon Kremer <gereon.kremer@cs.rwth-aachen.de> wrote:

Hello,
following the discussion at the end of today's workshop, I tried out how the scheduler behaves when issuing a larger number of jobs (Marcus essentially told me I could use approach 3 as detailed below). To frame my question, here is what I want to do and how I try to do it (the numbers are just to give the order of magnitude):
# Problem

10 binaries, 10k input files. Run every binary on every input file and collect all the results (= parse stdout).
It seems array jobs are the tool for that; however, the size of an array job is capped at 1000, apparently because larger array jobs make the scheduler slow.
# Approach 1

- Create one file with 10*10k lines (./binary input-file)
- Create one array job with 1000 array tasks
- Let ID be the index of the current array task
- Identify the slice (10*10k) / 1000 * ID .. (10*10k) / 1000 * (ID + 1)
- Execute all lines from the slice sequentially
- Pro: Only one job, no scheduling hassle on the user side.
- Con: weird script logic, 100 individual tasks bundled into each scheduled array task, sometimes bad load balancing (i.e. one array task takes way longer than the others)
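For concreteness, a job script for approach 1 could look roughly like this (the file name and the slice arithmetic are assumptions, not the actual script):

```bash
#!/usr/bin/env bash
#SBATCH --array=0-999
# Sketch of approach 1: one array job with 1000 tasks; each task runs its
# 100-line slice of the 100k-line task file sequentially.

TASKFILE=tasks.txt              # assumed: 10*10k lines of "./binary input-file"
TOTAL=$(wc -l < "$TASKFILE")    # 100000
PER_TASK=$(( TOTAL / 1000 ))    # 100 lines per array task

START=$(( SLURM_ARRAY_TASK_ID * PER_TASK + 1 ))   # sed line numbers are 1-based
END=$(( START + PER_TASK - 1 ))

sed -n "${START},${END}p" "$TASKFILE" | while read -r cmd; do
    eval "$cmd"
done
```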
# Approach 2

- Create (10*10k)/1000 files, each containing 1000 lines
- Create as many array jobs, one per file
- Each array task loads the ID'th line from its respective file and executes it
- Push all these jobs to the scheduler
- Pro: Easier logic in each script
- Con: Multiple jobs; I have to take care of submitting them and waiting for the results in parallel.
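Hypothetically, assuming each chunk file is handed to its job via --export (the names here are made up):

```bash
#!/usr/bin/env bash
#SBATCH --array=1-1000
# Sketch of approach 2: one such array job per 1000-line chunk file;
# each array task executes exactly one line of its chunk.
# Assumed submission loop:
#   for f in chunks/*.txt; do sbatch --export=ALL,CHUNK="$f" run_chunk.sh; done

cmd=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$CHUNK")
eval "$cmd"
```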
# Approach 3

- Create 10*10k jobs and let the scheduler deal with it
- Every job executes one task (./binary input-file)
- Pro: very simple jobs and scripts
- Con: huge number of jobs; can the scheduler handle that?
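The submission side of approach 3 would be nothing more than a plain loop (directory names are placeholders; the one-minute limit reflects the per-task cap mentioned further down):

```bash
#!/usr/bin/env bash
# Sketch of approach 3: one sbatch call per (binary, input) pair,
# i.e. 10*10k individual jobs. This is the pattern that runs into the
# submission errors quoted below.
for bin in ./binaries/*; do
    for input in ./inputs/*; do
        sbatch --time=00:01:00 --wrap="$bin $input"
    done
done
```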
I'm using approach 1 already and it works reasonably well. That being said, the script logic is rather involved and the load balancing is not that great: I routinely have a handful of array tasks at the end that run 10 minutes or so longer than all the others, even though a single task is capped at one minute. This is pretty annoying. Also, we are exploring what the best practice should be here...
I just tried approach 2 and it did not go too well, even for only about 12k tasks. To test the scaling, I made every array job 100 tasks in size, so I tried to schedule about 120 jobs. While it went well for about 75 jobs, sbatch started to come back with the following afterwards:
sbatch: error: Slurm temporarily unable to accept job, sleeping and retrying
and quickly afterwards:
sbatch: error: Batch job submission failed: Resource temporarily unavailable
I then tried to "relax" a bit and added a one-second delay between the calls to sbatch... and it does not change anything. Thus I don't have a lot of hope for approach 3...
Any comments or ideas?
Best,
Gereon
--
Gereon Kremer
Lehr- und Forschungsgebiet Theorie Hybrider Systeme
RWTH Aachen
Tel: +49 241 80 21243
Dear Eugen,

while this would potentially solve our problem, we _do not want to write our own scheduler_! That is what SLURM should do.

We are still a bit puzzled as to why our use case is so outlandish - our initial expectation was to find matrix-job support in SLURM. Our array job is already the result of us projecting our matrix job (solvers x problems x configurations) down into a single-column vector. Ideally, that would not be necessary. But okay, this we can deal with.

This whole striping & scheduling business, on the other hand... In my mind, "hiding" jobs (or rather, granularity) from the scheduler can only lead to problems -- and it adds complexity on the user side which, again, can only lead to problems and sub-par performance.

Kind regards,
Philipp
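As an aside, the matrix-to-vector projection mentioned above is plain index arithmetic; a hypothetical illustration with made-up dimension sizes:

```bash
# Map a flat array-task index back to (solver, problem, configuration).
# S, P, C are example sizes, not the real ones.
S=10; P=10000; C=3
ID=$SLURM_ARRAY_TASK_ID            # 0 .. S*P*C - 1
solver=$((  ID / (P * C) ))
problem=$(( (ID / C) % P ))
config=$((  ID % C ))
echo "task $ID -> solver=$solver problem=$problem config=$config"
```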