Hi all,
I am using a piece of software that internally uses tensorflow for
model training and evaluation. In the training phase I'd like to
benefit from GPGPU and am therefore using a GPU version of tensorflow.
In the inference phase the GPU would be heavily underutilized, and
evaluating the models on CPU is totally acceptable. In that case the
GPU version of tensorflow automatically detects that no GPU is present
and runs all computation on the CPU. It is, however, still linked
against the CUDA libraries and crashes if they are not present.
My problem now is that on the non-GPU part of the Claix18 cluster no
CUDA libraries seem to be available.
Steps to reproduce:
$ ssh login18-1 ls /usr/local_rwth/sw/cuda/8.0.44    # No such file or directory
$ ssh login18-g-1 ls /usr/local_rwth/sw/cuda/8.0.44  # Directory contents
$ ssh login ls /usr/local_rwth/sw/cuda/8.0.44        # Directory contents
$ sbatch --wrap="ls /usr/local_rwth/sw/cuda/8.0.44"  # No such file or directory
$ sbatch --gres=gpu:1 --wrap="ls /usr/local_rwth/sw/cuda/8.0.44"  # Directory contents
(I have taken this path from the environment variable LD_LIBRARY_PATH
after loading the module cuda/80 on login18-g-1)
I realize that I could compile my software twice, one version using
tensorflow-gpu and one using tensorflow(-cpu). But I think that having
the cuda module available on CPU nodes as well would be useful in other
settings too (e.g. compiling GPU software without the need to block a
GPU slot).
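For concreteness, a sketch of the dual-build workaround I would like to
avoid; the wrapper and binary names are hypothetical, and the lib64 path
is only my guess at where libcudart.so would live:

# choose the build depending on whether the CUDA runtime is present
if [ -e /usr/local_rwth/sw/cuda/8.0.44/lib64/libcudart.so ]; then
    exec ./train_eval-gpu "$@"   # tensorflow-gpu build, needs the CUDA libraries
else
    exec ./train_eval-cpu "$@"   # tensorflow(-cpu) build, runs everywhere
fi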
So I would like to ask whether the cuda modules on CPU nodes have
simply been forgotten, or whether this was a deliberate design
decision. In the latter case I'd like to open a discussion about
changing that decision.
Or am I missing some crucial point in the module system?
Thank you very much and best regards
Wilfried Michel
Hi all,
is the binding of processes to cores automatic with SLURM?
The option --cpu-bind=cores is not recognized.
I tried to force the binding by using
#SBATCH -B 2:24:1
but it says that no such nodes are available.
However,
#SBATCH -B 2:12:1
unexpectedly works fine!
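For reference, a sketch of what I expected to work. My (possibly wrong)
understanding is that --cpu-bind is an option of srun rather than
sbatch, so it would have to go on the srun line inside the job script
(./my_program is just a placeholder):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
srun --cpu-bind=cores ./my_program   # older SLURM versions spell it --cpu_bind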
Thanks!
Best,
Julien
--
**************************************************
Dr. Julien GUÉNOLÉ
Research Group Head
----------------------
Institute of Physical Metallurgy and Metal Physics
RWTH Aachen University
Kopernikusstrasse 14
52074 Aachen, GERMANY
----------------------
Room 202
Phone [office] +49 241 80 26866
Email guenole(a)imm.rwth-aachen.de
Web http://www.julien-guenole.fr
Twitter @nanouayeur
**************************************************
Dear users,
we have reached another important milestone: we have implemented
pam_slurm_adopt, which means that you can ssh to a compute node where
your job is running. This also means that you can ssh from one compute
node of a job to another compute node of the same job.
* CFX users, please test multinode jobs again; they should be
functional now.
* starccm+ users, please test your jobs again; they should be
functional now.
* This milestone is also a prerequisite for full X11 forwarding.
** You can do something like that already now: submit a job with e.g.
"sleep 100000" as the command, then do a "ssh -Y <nodename>". There
you can start the GUI, be it xterm or whatever else (see the sketch
below).
** Full X11-Forwarding would mean something like
sbatch --x11 xterm (in short)
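As a concrete sketch of the interim workflow from above (the node name
still has to be looked up via squeue):

$ sbatch --wrap="sleep 100000"   # keep a job alive on some compute node
$ squeue -u $USER                # look up which node the job is running on
$ ssh -Y <nodename>              # pam_slurm_adopt adopts this session into the job
$ xterm                          # start the GUI from inside the job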
Kind regards
Marcus
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de
Hi,
(as of now) we mainly use the system for benchmarking, that is, we
measure how long a solver takes to run on a particular input. I'm not
so much interested in the result of the solver, but rather in the
runtime.
I honestly don't care much about a difference of a second, or whether
we use wall-clock or CPU time. After our solver has loaded the input
file (of only a few KB) there is no further IO and the whole process is
entirely CPU-bound -- so we assume 100% CPU load and treat CPU time and
wall-clock time as essentially the same. All tasks are run with a
timeout (here: two minutes + 3 seconds grace time, accounting for CPU
vs. wall clock etc., measured with date) and a memout (here: 8GB).
The corresponding part of the script looks like this (with $cmd being
the command that is run):
start=`date +"%s%3N"`   # wall-clock start, milliseconds since the epoch
# no core dumps; soft limits: 8 GiB virtual memory, 123 s CPU time
ulimit -c 0 && ulimit -S -v 8388608 && ulimit -S -t 123 && time $cmd
end=`date +"%s%3N"`     # wall-clock end, milliseconds since the epoch
echo "time: $(( end - start ))"   # elapsed wall-clock time in ms
I however observe that from time to time a task takes way longer than
it should, i.e. the time that is printed is way beyond 120 seconds. I
currently have an example above 5 minutes and have already seen
instances of almost 10 minutes.
About every second run or so (one run being an array job with 1000
individual jobs running 12 tasks each) I hit a case where one
individual task takes way longer. The time output then looks like this:
122.24s user 0.53s system 40% cpu 5:05.67 total
Unfortunately I cannot really reproduce it: it happens with seemingly
random inputs and only once or twice per run. It does, however, happen
rather consistently every second run or so.
Running this particular input (on the login node) works just fine, with
100% CPU load, and is stopped by ulimit after 123 seconds.
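In case it helps, a minimal sketch of a guard I am considering,
assuming coreutils' timeout(1) is available on the nodes; the 150 s
wall-clock budget is an arbitrary choice of mine:

timeout -s KILL 150 $cmd   # kill the task once it exceeds 150 s wall-clock,
                           # even if it is descheduled and only at 40% CPU load
echo "exit: $?"            # 137 (128+9) indicates the wall-clock kill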
As I run multiple tasks within one array job, this also leads to the
array job being cancelled (as I compute the overall time limit from the
timeouts and assume that every task actually finishes within its
timeout), for example:
slurmstepd: error: *** JOB 491231 ON nihm017 CANCELLED AT
2019-03-11T23:57:48 DUE TO TIME LIMIT ***
(The issue happened about 6-8 minutes earlier than this message)
Can you trace this to something happening on the nodes? Or do I simply
have to rerun things until it no longer happens?
Best,
Gereon
--
Gereon Kremer
Lehr- und Forschungsgebiet Theorie Hybrider Systeme
RWTH Aachen
Tel: +49 241 80 21243
Hi all,
apparently we have a limit of 100 concurrent jobs per user in slurm.
This seems reasonable for the (shared) main cluster, as we wouldn't be
scheduled for more than that anyway while other users want to use the
system as well.
The situation is somewhat different for the integrated hosting part,
however (though this comes with a few questions from my side):
My understanding is that we have exclusive access to our hardware. (Is
this the case? Or do we only have "prioritized" access, and the
hardware is used by others as well when idle?)
Anyway, we would expect that a user (from our ih project) can use all
our hardware provided that no other user (from our ih project) is using
it as well.
If we do the math, however: we provide more than 300 cores, but only
100 jobs are scheduled. At the same time we will probably have only one
or two users using our partition quite frequently, essentially wasting
time...
Long story short:
Could we increase this limit (at least for ih partitions)?
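For reference, a sketch of how I believe the current limit could be
inspected, assuming it is configured as an association limit (sacctmgr
output may be restricted for normal users):

$ sacctmgr show assoc where user=$USER format=User,Account,MaxJobs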
Best,
Gereon
--
Gereon Kremer
Lehr- und Forschungsgebiet Theorie Hybrider Systeme
RWTH Aachen
Tel: +49 241 80 21243
Hi,
I've recently noticed that some of my jobs hang (or at least take many
hours to finish, while normally they run around 3 hours). The
problematic part seems to be the rsync calls.
Since I have bad experience with the throughput and stability of the
network discs, I do something like this in my job files:
CASE=$(basename $SLURM_SUBMIT_DIR)
TMP=/w0/tmp/slurm_$USER.$SLURM_JOB_ID/       # per-job scratch on the local disc
rsync -a $SLURM_SUBMIT_DIR $TMP              # stage the input to the local drive
cd $TMP/$CASE
# DO WORK
rsync -a --exclude '*.dayfile' $TMP/$CASE/ $SLURM_SUBMIT_DIR  # copy results back
scp $TMP/$CASE/*.dayfile $SLURM_SUBMIT_DIR/
rm -rf $TMP/$CASE/
i.e., I copy the data to the local drive to speed calculations up.
Mostly it works OK; however, sometimes the job hangs. Connecting to the
node with "srun --jobid <jobid> --pty /bin/zsh" and running ps ux then
shows:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
sl119982 212008 0.0 0.0 124916 2032 ? S 11:56 0:00 /bin/zsh /var/spool/slurm/job437663/slurm_script
sl119982 218531 0.0 0.0 118248 1532 ? S 12:01 0:00 rsync -a --exclude *.dayfile /w0/tmp/slurm_sl119982.437663//6-4/ /rwthfs/rz/cluster/work/sl119982/TiAlON/x_0.6667_y_0.0000_g_0.0625/6-4
sl119982 218532 0.0 0.0 117928 876 ? S 12:01 0:00 rsync -a --exclude *.dayfile /w0/tmp/slurm_sl119982.437663//6-4/ /rwthfs/rz/cluster/work/sl119982/TiAlON/x_0.6667_y_0.0000_g_0.0625/6-4
sl119982 218533 0.0 0.0 118188 768 ? D 12:01 0:00 rsync -a --exclude *.dayfile /w0/tmp/slurm_sl119982.437663//6-4/ /rwthfs/rz/cluster/work/sl119982/TiAlON/x_0.6667_y_0.0000_g_0.0625/6-4
sl119982 223991 0.0 0.0 127240 2440 pts/0 Ss 12:13 0:00 /bin/zsh
sl119982 224467 0.0 0.0 115588 2220 pts/0 S 12:13 0:00 bash
sl119982 233247 0.0 0.0 155380 1928 pts/0 R+ 12:28 0:00 ps ux
There are 3?!? rsync processes running and all are sleeping? I have no
idea what is going on, so I tried to attach to one of the rsync
processes:
gdb attach 218533
bt
#0 0x00002b2909102620 in __close_nocancel () from /lib64/libc.so.6
#1 0x0000564544f79de6 in recv_files ()
#2 0x0000564544f84161 in do_recv ()
#3 0x0000564544f849ac in start_server ()
#4 0x0000564544f84af5 in child_main ()
#5 0x0000564544fa3ce9 in local_child ()
#6 0x0000564544f67e9b in main ()
And actually, after detaching from the process it somehow got going
again, switched back to running status, and everything finished.
Any ideas? The jobid of the last stuck job was 437663 if anyone wants
to investigate. I'll send more jobids when I see this again...
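In the meantime, a sketch of a workaround I am considering: guarding
the copy-back with a hard timeout, assuming coreutils' timeout(1) is
available (the 600 s budget is an arbitrary choice):

# kill the copy-back if it stalls, instead of hanging the whole job
timeout -s KILL 600 rsync -a --exclude '*.dayfile' $TMP/$CASE/ $SLURM_SUBMIT_DIR \
  || echo "rsync copy-back failed or timed out (exit $?)" >&2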
Best regards
Pavel
Hi,
I have a question about hyperthreading. Previously, when I wanted to
allocate a full node with hyperthreading, I did this:
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-core=2
#SBATCH --ntasks-per-node=96
#SBATCH --mem=180G
This no longer works, i.e., setting ntasks-per-node to anything higher
than 48 yields
sbatch: error: Batch job submission failed: Requested node
configuration is not available
even when --ntasks-per-core=2 is set. Any ideas?
How can I allocate a full node with hyperthreading?
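For reference, an alternative header I have been experimenting with;
whether --hint is honoured here depends on the Slurm version and the
site configuration, so this is an assumption rather than a known fix:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=96
#SBATCH --hint=multithread   # ask Slurm to use both hardware threads per core
#SBATCH --mem=180G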
Best regards
Pavel Ondračka
Hi,
I used --gres=gpu:1 to ask for a node with a GPU available, then I got
the following message:
sbatch: error: Batch job submission failed: Requested node
configuration is not available
Does this mean that the GPU cluster is not ready?
Is it temporary, or is there a plan to bring the GPU cluster back?
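A minimal sketch of how one might check whether any GPU nodes are
visible at all (the grep pattern is a guess at how the GRES is named):

$ sinfo -o "%P %G %D %T" | grep -i gpu   # partition, GRES, node count, node state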
Best wishes,
Li
*______________________________*
Zhijian Li
Institute for Computational Genomics
RWTH Aachen University
Pauwelsstrasse 19
52074 Aachen, Germany
Hi,
I observe a weird behaviour when using different paths to a binary and
an input file.
As we know, $HOME and $WORK resolve to /home/.../ and /work/..., though
/home/ and /work/ are symlinks to /rwthfs/rz/cluster/...
So it should not make a difference, right?
I have a (statically linked) binary that behaves in a certain way (that
I want to debug...). If I call it with the canonical paths I get:
% time /rwthfs/rz/cluster/home/gk809425/smtrat_aklima/build/smtrat_2 /rwthfs/rz/cluster/work/gk809425/benchmarks/QF_NRA/hycomp/ball_count_2d_hill.01.seq_lazy_linear_enc_lemmas_global_4.smt2
(error "expected sat, but returned unsat")
/rwthfs/rz/cluster/home/gk809425/smtrat_aklima/build/smtrat_2  64.78s user 0.14s system 99% cpu 1:05.08 total
So it terminates after about 65 seconds (repeatably).
Now I use the non-canonical paths:
% pwd
/home/gk809425/smtrat_aklima/build
% time ./smtrat_2 $WORK/benchmarks/QF_NRA/hycomp/ball_count_2d_hill.01.seq_lazy_linear_enc_lemmas_global_4.smt2
This does not terminate for more than four minutes...
It is also CPU-bound, so it does not seem to be waiting for IO.
Just to be sure: both commands were executed in the same session, so
the environment is identical in terms of loaded modules, env variables,
etc.
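For what it's worth, a quick sketch of the checks one could run to rule
out the symlinks themselves (readlink -f canonicalizes a path; I would
expect both spellings to resolve to the same files):

% readlink -f ./smtrat_2   # should print the /rwthfs/rz/cluster/... path
% readlink -f $WORK/benchmarks/QF_NRA/hycomp/ball_count_2d_hill.01.seq_lazy_linear_enc_lemmas_global_4.smt2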
Can anyone guess what is going on here?
Best,
Gereon
--
Gereon Kremer
Lehr- und Forschungsgebiet Theorie Hybrider Systeme
RWTH Aachen
Tel: +49 241 80 21243