7 May
2019
10:32 a.m.
Hi,

I've lately noticed some of my jobs failing (timing out) with:

srun: Job 1692770 step creation temporarily disabled, retrying
srun: error: Unable to create step for job 1692770: Unable to contact slurm controller (connect failure)

Any ideas what could be going wrong? I've been running similar jobs for a long time, and this type of failure seems quite recent...

Best regards
Pavel