Dear list,
apparently the Clang installation is built against the headers of the
default GCC 4.8.5, which limits its understanding of C++ to C++11. Would
it be possible to at least target the headers of GCC 5? That would at
least support C++14. Of course, with GCC 8.2.0 being available as well,
C++17 should also be possible ;)
Kind regards,
Philipp
--
Philipp Berger https://moves.rwth-aachen.de/people/berger/
Software Modeling and Verification Group
RWTH Aachen University Phone +49/241/80-21206
Ahornstraße 55, 52056 Aachen, Germany
Dear all,
I made a mistake yesterday and misinterpreted the TIME column of
'squeue'. For PENDING jobs it shows the requested time; for RUNNING
jobs it shows the time since job start.
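For reference, the two values can also be displayed side by side using
standard squeue format codes (this invocation of course needs a running
SLURM installation):

```shell
# %i = job ID, %t = state, %M = time used (running jobs),
# %l = requested time limit
squeue -o "%.10i %.4t %.12M %.12l"
```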
The following jobs would run into the 9 o'clock maintenance window, so
I also had to requeue them:
111600,116276,116231,116287,119637,119647,118940,119631
Best
Marcus
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de
Dear all,
we will have to do another maintenance. It will begin tomorrow at 9:00.
Sorry for the short notice.
Longer-running jobs had to be requeued. These are the following JobIDs:
117826, 116379
Best
Marcus
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de
Hi all,
I have a question: how do I do X11 forwarding during interactive sessions?
What I do:
srun --nodes=4 --account=jara0172 --time=06:00:00 --exclusive --x11 --pty /usr/local_rwth/bin/zsh
But if I try to run xclock, I get an error:
X11 connection rejected because of wrong authentication.
Error: Can't open display: localhost:99.0
Best regards,
Uliana
---------
Dr. Uliana Alekseeva
Forschungszentrum Jülich
Institute for Advanced Simulation
D-52425 Jülich
Phone: (49) 2461 61 3429
IT Center RWTH Aachen University
HPC Group
Seffenter Weg 23
52074 Aachen
Phone: (49) 241 80 29711
Hi,
today the jobs I submit to CLAIX-2018 exit within one second.
The new modules, like intelmpi/2019 and openmpi/3.1.3, do not exist on
the back-end host executing the batch script,
nrm020.hpc.itc.rwth-aachen.de:
+(0):ERROR:0: Unable to locate a modulefile for 'openmpi/3.1.3'
Instead, the module list on the back end seems to be the one of the old
LSF cluster parts (the CLAIX-2018 login node still offers all new
modules).
Pinning the partition with --partition=c18m does not change this for me.
Example Job with error: Job ID 90942
Best,
Jonas
Dear list,
I am looking to run large numbers of jobs as an Array Job using SLURM.
I created a job file "jobs.txt", containing one configuration per line.
Using SLURM_ARRAY_TASK_ID I select the appropriate line for the current
task and execute the corresponding configuration in the batch script.
My test file currently holds 1308 configurations, which I was unable to
submit using sbatch, as the MaxArrayTask variable seems to be set to
1001. What is the recommended way of scheduling large numbers of
configurations?
I need one core (or thread?) per configuration (#SBATCH
--cpus-per-task=1 and #SBATCH --ntasks-per-core=1; are both
necessary?) and 20G of memory per configuration
(#SBATCH --mem-per-cpu=20480M).
My hope was to build these large configuration lists and then submit
them to the batch system so that it uses all available nodes to work
through all configurations in blocks, as fast as possible. Is that not
what SLURM array jobs are supposed to do?
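For what it's worth, the line-selection step described above could be
sketched like this (jobs.txt is the file from this mail; the array
bounds, the local fallback, and the directive values are illustrative
assumptions, not a verified site configuration):

```shell
#!/usr/bin/env bash
#SBATCH --array=1-1000          # assumed chunking to stay below the reported 1001 limit
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=20480M

# Outside of SLURM (e.g. when testing locally), fall back to task 1.
TASK=${SLURM_ARRAY_TASK_ID:-1}

# Select the line of jobs.txt that belongs to this array task.
CONFIG=$(sed -n "${TASK}p" jobs.txt)

# Run the configuration for this task.
eval "$CONFIG"
```

A second submission with --array=1001-1308 would then cover the
remaining configurations, assuming the limit cannot be raised.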
Kind regards,
Philipp
--
Philipp Berger https://moves.rwth-aachen.de/people/berger/
Software Modeling and Verification Group
RWTH Aachen University Phone +49/241/80-21206
Ahornstraße 55, 52056 Aachen, Germany
-------- Forwarded Message --------
Subject: IntelMPI/2019 problems with our code
Date: Fri, 18 Jan 2019 11:44:57 +0100
From: Jonas Becker <jonas.becker2(a)rwth-aachen.de>
To: Marcus Wagner <wagner(a)itc.rwth-aachen.de>
Hi,
our simulation aborts on CLAIX-2018 when using the intelmpi/2019 or
intelmpi/2018 module. The new OpenMPI module works for us.
The master rank waits for data from the worker ranks in a busy loop:
    while (!flag) {
        if (is_chkpt_time()) M_write();
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                   &flag, &stat);
    }
The problem occurs reproducibly about 9 minutes after the start of the
batch job, even with only 2 MPI tasks (one master and one worker
process). During the first 9 minutes, the simulation works flawlessly.
All calls use the MPI_COMM_WORLD communicator and work up to this
point. Then the simulation aborts:
Abort(635909) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Iprobe:
Invalid communicator, error stack:
PMPI_Iprobe(123): MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG,
MPI_COMM_WORLD, flag=0x7ffe77c98c30, status=0x7ffe77c98c1c) failed
PMPI_Iprobe(90).: Invalid communicator
Do some of you use intelmpi/2018 or intelmpi/2019 successfully on the
CLAIX-2018 cluster?
Our MPI code may contain a few mistakes that trip up Intel MPI but not
Open MPI. If simulations longer than 10 minutes work for everyone else,
we will look into finding and fixing the problem in our own code.
Another thing: every time Intel MPI is run in CLAIX-2018 batch mode, it
prints this warning:
MPI startup(): I_MPI_JOB_STARTUP_TIMEOUT environment variable is not
supported.
MPI startup(): To check the list of supported variables, use the
impi_info utility or refer to
https://software.intel.com/en-us/mpi-library/documentation/get-started.
Best,
Jonas
On 1/17/19 10:04 AM, Marcus Wagner wrote:
> Dear all,
>
> we canceled the current maintenance and scheduled a new one for
> tomorrow at 9 o'clock. So all jobs that finish before that time will
> run now.
>
> By the way, this list was not intended to be just an announcement
> list from our side. Weren't there any problems? Is everything clear
> regarding SLURM and CLAIX18 for you?
>
> If that is the case, we are really happy, but I can barely believe it.
>
> Best
> Marcus
Best
Marcus
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de
Dear all,
we canceled the current maintenance and scheduled a new one for
tomorrow at 9 o'clock. So all jobs that finish before that time will
run now.
By the way, this list was not intended to be just an announcement list
from our side. Weren't there any problems? Is everything clear
regarding SLURM and CLAIX18 for you?
If that is the case, we are really happy, but I can barely believe it.
Best
Marcus
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de
Good morning everyone,
we have to perform maintenance now, so no new jobs will be scheduled
onto the hosts. Running jobs will not be affected.
With kind regards
Marcus Wagner
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de