Hi,

I'm running a rather large job array on the integrated hosting part (in the moves account). Our understanding is that all the hardware we contributed to the IH is shared among the jobs of this account, yet far fewer (array) jobs are running than I would expect. Right now there is only a single job array running for this account.

The job array has 6000 individual jobs, each needs a single core (I don't set any arguments affecting core selection) and runs for up to four minutes. Hence Slurm should have a rather easy time keeping every core busy. Given that we should have 7 nodes with 48 cores each, I expect the number of running jobs to be at least 200-300 or so (depending on how many jobs terminate very quickly and how long Slurm takes to start new ones).

However, `squeue -A moves -t R` shows that the number of running jobs is usually around 20-30, sometimes below 10, and never seems to exceed 50.

Are there any limits on how many jobs are run concurrently?

If yes: what are they? Please increase them appropriately, at least for IH accounts, so that we can actually use our hardware...

If no: what is going on here? I don't set any particular options in the job; the constraints are -C hpcwork -C skx8160 (a sketch of the submission script is appended below). sinfo tells me that the respective nodes are all available (mix or idle).

Best,
Gereon

--
Gereon Kremer
Lehr- und Forschungsgebiet Theorie Hybrider Systeme
RWTH Aachen
Tel: +49 241 80 21243
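For reference, the submission script is essentially the following sketch. The time limit, output path, and the binary it runs are placeholders of mine; the actual script only differs in the workload it starts.

```
#!/usr/bin/env bash
#SBATCH --account=moves
#SBATCH --array=0-5999                    # 6000 individual jobs
#SBATCH --constraint=hpcwork&skx8160      # the actual submission passes -C hpcwork -C skx8160
#SBATCH --time=00:05:00                   # generous; each job finishes within about four minutes
#SBATCH --output=logs/array_%A_%a.out     # %A = array job id, %a = array task id
# No core-selection options are set, so each task gets the default single core.

# The binary and input layout stand in for the actual workload.
./run_case "inputs/case_${SLURM_ARRAY_TASK_ID}"
```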
Hi,

upon closer inspection I found the following: Slurm seems to start new jobs not when others terminate, but only every 75-90 seconds, and when it does, it starts only about 30 new jobs (per user? per account? globally?). That limits me to roughly one job every 3 seconds right now.

Is there anything we can do about that? (What I checked from the user side is appended below.)

Best,
Gereon
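What I can inspect from the user side is roughly the following (a sketch; I'm assuming the relevant limits would show up in the scheduler parameters, in our association, or in the QOS):

```
# Scheduler settings that govern how many jobs are started per scheduling pass
# (e.g. default_queue_depth, sched_interval, bf_interval, bf_max_job_start).
scontrol show config | grep -E 'SchedulerType|SchedulerParameters'

# Limits on our association.
sacctmgr show assoc where account=moves \
    format=Account,User,Partition,QOS,MaxJobs,MaxSubmitJobs,GrpTRES

# Limits attached to the QOS itself.
sacctmgr show qos format=Name,GrpJobs,MaxJobsPU,MaxSubmitJobsPU,MaxTRESPU
```

If the relevant setting is only visible server-side, I of course cannot see it from here.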