Hi Johannes,
yes, we are also seeing these messages in the log files, but we have not been able to solve the issue so far.
Best
Marcus
On 5/3/19 10:49 AM, Johannes Sauer wrote:
Dear all,
From time to time I get errors similar to this one when submitting jobs:
RuntimeError: Execution of 'sbatch -t 3600 --mem-per-cpu=10G --account=rwth0333 --job-name ChairliftRide_8192x4096_QP22_FTBE0to32 -o log/ChairliftRide_8192x4096_QP22_FTBE0to32.queue_out.log -e log/ChairliftRide_8192x4096_QP22_FTBE0to32.queue_out.log rz_start_anysim.sh' exited with status != 0 (1): sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
Is anyone else having this problem? Submitting the same job again works fine. It looks like the controller cannot handle the load of submissions?
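Since resubmitting works, a retry wrapper around the submission call might serve as a stopgap. The following is only a minimal sketch under assumptions: the function name `submit_with_retry`, the attempt count, and the backoff delays are hypothetical and not part of any existing tool. It retries only when stderr contains the socket timeout message quoted above. One caveat: a submission that timed out may still have been accepted by the controller, so it is probably worth checking `squeue` for the job name before resubmitting to avoid duplicates.

```python
import subprocess
import time

# Message taken from the sbatch error quoted above.
TRANSIENT_MARKER = "Socket timed out on send/recv operation"


def submit_with_retry(cmd, max_attempts=5, base_delay=2.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run a submission command, retrying only on the socket timeout.

    `run` and `sleep` are injectable so the logic can be tested
    without a real Slurm installation (hypothetical design choice).
    """
    for attempt in range(1, max_attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip()
        transient = TRANSIENT_MARKER in result.stderr
        if not transient or attempt == max_attempts:
            # Give up: either a different error or retries exhausted.
            raise RuntimeError(
                f"Execution of {' '.join(cmd)!r} exited with status "
                f"{result.returncode}: {result.stderr.strip()}"
            )
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

It could be called with the same arguments as before, for example `submit_with_retry(["sbatch", "-t", "3600", "--mem-per-cpu=10G", "rz_start_anysim.sh"])`. The backoff is meant to avoid adding further load to a controller that already seems busy.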
Best
Johannes
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner@itc.rwth-aachen.de
www.itc.rwth-aachen.de