Hi,

I'm running a rather large job array on the integrated hosting part (in the moves account). Our understanding is that all the hardware we contributed to the IH is shared among the jobs of this account, yet far fewer (array) jobs are running than I would expect. Right now there is only a single job array running for this account.

The job array has 6000 individual jobs, each needs a single core (I don't set any arguments affecting core selection) and runs for up to four minutes. Hence Slurm should have a rather easy time keeping every core busy. Given that we should have 7 nodes with 48 cores each, I expect the number of running jobs to be at least 200-300 or so (depending on how many jobs terminate very quickly and how long Slurm takes to start new ones).

However, `squeue -A moves -t R` shows that the number of running jobs is usually around 20-30, sometimes below 10, and never seems to exceed 50.

Are there any limits on how many jobs are run concurrently?

If yes: what are they? Please increase them appropriately, at least for IH accounts, so that we can actually use our hardware...

If no: what is going on here? I don't set any particular options in the job; the constraints are -C hpcwork -C skx8160 (a sketch of the submission script is appended below). sinfo tells me that the respective nodes are all available (mix or idle).

Best,
Gereon

--
Gereon Kremer
Lehr- und Forschungsgebiet Theorie Hybrider Systeme
RWTH Aachen
Tel: +49 241 80 21243
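For reference, the submission script is essentially the following sketch. The time limit, output path, and the binary it runs are placeholders of mine; the actual script only differs in the workload it starts.

```
#!/usr/bin/env bash
#SBATCH --account=moves
#SBATCH --array=0-5999                    # 6000 individual jobs
#SBATCH --constraint=hpcwork&skx8160      # the actual submission passes -C hpcwork -C skx8160
#SBATCH --time=00:05:00                   # generous; each job finishes within about four minutes
#SBATCH --output=logs/array_%A_%a.out     # %A = array job id, %a = array task id
# No core-selection options are set, so each task gets the default single core.

# The binary and input layout stand in for the actual workload.
./run_case "inputs/case_${SLURM_ARRAY_TASK_ID}"
```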
Hi,

upon closer inspection I found the following: Slurm seems to start new jobs not when others terminate, but only every 75-90 seconds, and when it does, it starts only about 30 new jobs (per user? per account? globally?). That limits me to roughly one job every 3 seconds right now.

Is there anything we can do about that? (What I checked from the user side is appended below.)

Best,
Gereon
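What I can inspect from the user side is roughly the following (a sketch; I'm assuming the relevant limits would show up in the scheduler parameters, in our association, or in the QOS):

```
# Scheduler settings that govern how many jobs are started per scheduling pass
# (e.g. default_queue_depth, sched_interval, bf_interval, bf_max_job_start).
scontrol show config | grep -E 'SchedulerType|SchedulerParameters'

# Limits on our association.
sacctmgr show assoc where account=moves \
    format=Account,User,Partition,QOS,MaxJobs,MaxSubmitJobs,GrpTRES

# Limits attached to the QOS itself.
sacctmgr show qos format=Name,GrpJobs,MaxJobsPU,MaxSubmitJobsPU,MaxTRESPU
```

If the relevant setting is only visible server-side, I of course cannot see it from here.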