Hi Johannes,
yes, we are also seeing these messages in the log files, but we have not been able to resolve the issue so far.
Best Marcus
On 5/3/19 10:49 AM, Johannes Sauer wrote:
Dear all,
from time to time I get errors similar to this one when submitting jobs:
RuntimeError: Execution of 'sbatch -t 3600 --mem-per-cpu=10G --account=rwth0333 --job-name ChairliftRide_8192x4096_QP22_FTBE0to32 -o log/ChairliftRide_8192x4096_QP22_FTBE0to32.queue_out.log -e log/ChairliftRide_8192x4096_QP22_FTBE0to32.queue_out.log rz_start_anysim.sh' exited with status != 0 (1): sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
Is anyone else having this problem? Submitting the same job again works fine. Could it be that the controller cannot handle the load of submissions?
Best
Johannes
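
Since resubmitting the same job works, one possible stopgap is to retry the submission automatically when this specific message appears. The following is only a minimal sketch under assumptions, not a fix for the underlying controller problem: the function name submit_with_retry, the attempt limit, and the backoff delays are hypothetical choices, and the only thing taken from the report above is the error text.

```python
import subprocess
import time

# Error text copied from the failing sbatch call reported above.
TRANSIENT_MARKER = "Socket timed out on send/recv operation"


def submit_with_retry(cmd, max_attempts=5, base_delay=2.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run a submission command, retrying only on the socket timeout error.

    cmd is an argument list such as ["sbatch", "-t", "3600", "job.sh"].
    max_attempts and base_delay are illustrative defaults (assumptions).
    run and sleep are injectable so the logic can be tested without Slurm.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip()
        transient = TRANSIENT_MARKER in result.stderr
        if not transient or attempt == max_attempts:
            # Any other error, or too many timeouts: give up and report it.
            raise RuntimeError(
                f"{cmd[0]} failed (status {result.returncode}): "
                f"{result.stderr.strip()}"
            )
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

One caveat: a timeout on the client side does not necessarily mean the controller rejected the job, so a blind retry might submit it twice. Checking the queue for the job name (for example with squeue --name) before resubmitting would be a sensible extra step.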