Hi,

Is the multifactor priority plugin enabled? If so, the scheduling priority can be affected by several factors. What I see at the moment is:

sprio -w                
          JOBID PARTITION   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS
        Weights                           10000      10000      10000     100000          1
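
To make the weights above a bit more concrete, here is a rough sketch of how I understand the multifactor plugin to combine them: each factor is normalized to a value between 0.0 and 1.0, multiplied by its weight, and the products are summed. This is a simplification (it leaves out further terms such as nice, TRES, association and site factors). The function name job_priority and the example factor values are my own assumptions for illustration, not taken from our configuration.

```python
# Weights copied from the `sprio -w` output above.
WEIGHTS = {
    "age": 10000,
    "fairshare": 10000,
    "jobsize": 10000,
    "partition": 100000,
    "qos": 1,
}


def job_priority(factors, weights=WEIGHTS):
    """Simplified weighted sum of normalized priority factors.

    `factors` maps a factor name to a value in [0.0, 1.0].
    Hypothetical helper: terms such as nice or TRES are ignored.
    """
    for name, value in factors.items():
        if name not in weights:
            raise KeyError(f"unknown factor: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name} must lie in [0, 1], got {value}")
    # Factors that are not supplied count as 0.
    return sum(weights[name] * factors.get(name, 0.0) for name in weights)


# Made-up factor values, for illustration only.
example = {
    "age": 0.5,
    "fairshare": 0.25,
    "jobsize": 0.125,
    "partition": 1.0,
    "qos": 0.0,
}
print(job_priority(example))
```

With these weights the partition term dominates: a fairshare factor anywhere between 0 and 1 moves the result by at most 10000, while the partition factor can contribute up to 100000.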

I was wondering what affects the fairshare. I believe it is bad if my jobs use (much) less memory than requested. What about the requested time? Does it also influence the fairshare?

Best

Johannes

PS: You can display your own calculated factors with sshare -l -a

On 5/23/19 9:02 AM, Pavel Ondračka wrote:
On Thu, 2019-05-23 at 08:20 +0200, Marcus Wagner wrote:
Hi,

we had a problem with the scheduler (at least) yesterday, which
resulted in it requeuing jobs that it wanted to start.
This led to a cluster that was using only one sixth of its
capacity.
I had to rewrite the whole prolog part of the scheduler; now it is
performant again.
This should also decrease the probability of the hated "socket
send/receive" errors.

The queue is much smaller now
$> squeue -t pd | wc -l
1251

This can also be seen in the following picture:

OK, thank you for the fix.

Nonetheless, the length of the queue, and therefore how long users
need to wait, is nothing we can influence. It's you, the users, who
submit jobs.

I can understand that, and in no way was I suggesting that the long
queue is your fault. If you got this feeling from my email, then I
apologize.

What about the start time estimates? Any chance to get this working?

I would also really appreciate some more info about the job scheduling
priority, but this has low priority ATM I guess.

Regarding the accounting, there may be a misunderstanding that we do
not record the data.
The problem is that the tools needed to do the final accounting need
rewriting. But SLURM does not behave the way we expected, so I'm again
and again distracted from continuing my work on the accounting.
It is not simply a matter of switching on accounting.

So just to make this clear, you do record the used hours, it is just
not possible to show them at the moment (e.g., with the
r_batch_submission)? So can I somehow tell if I'm ATM burning CPU hours
from last month's/this month's/next month's quota (can I ATM use all of
my/my project's CPU hours without knowing)?

BTW, I share your sentiment towards SLURM. It also distracts me from
my real work way more than I would like; I'm missing the old
scheduler already ;-)

Best regards
Pavel
_______________________________________________
claix18-slurm-pilot mailing list -- claix18-slurm-pilot@lists.rwth-aachen.de
To unsubscribe send an email to claix18-slurm-pilot-leave@lists.rwth-aachen.de
-- 
M.Sc. Johannes Sauer
Researcher

Institut fuer Nachrichtentechnik
RWTH Aachen University
Melatener Str. 23
52074 Aachen
Tel +49 241 80-27678
Fax +49 241 80-22196
sauer@ient.rwth-aachen.de
http://www.ient.rwth-aachen.de