Hi,
a few questions/points from my side:
- Claix 18 is using Intel OPA. What network topology is used? I guess it
is a Fat tree. Is it blocking or non-blocking? Is it 1:2 blocking as on
Claix 16?
- Have you configured topology-aware resource allocation within the
SLURM scheduler, in other words, does the scheduler know the topology
and try to minimize the hop count?
- I assume TurboBoost is enabled by default? Is it possible (or will it
be possible in the future) to provide an option to switch TurboBoost
off? On JURECA, for example, TurboBoost can be disabled with `#SBATCH
--disable-turbomode` for measurements.
Alternatively, would it be possible to set frequencies with likwid? (See
the sketch at the end of this mail for what I have in mind.)
- I tried to run jobs with 256 nodes. I am getting an MPI error (cf. job
92414):
nrm008.hpc.itc.rwth-aachen.de.233340PSM2 no hfi units are active (err=23)
[245] MPI startup(): tmi fabric is not available and fallback fabric is
not enabled
Any ideas where this is coming from? Should I manually adjust
I_MPI_FABRICS? I don't want to enable the fallback fabric, since that
would be TCP and would significantly impact performance. Affected jobs
are not canceled but run into the time limit.
These errors only occur on the nrm nodes, not on the ncm nodes. Could
there be a problem with the nrm nodes? Currently I am using the
partition c18m, which contains both node types. How can I select only
ncm nodes? (See the sketch at the end of this mail for what I would try.)
- Historic jobs cannot be viewed with scontrol or sstat (is that
expected, i.e. do these only cover jobs that are still running?); sacct,
on the other hand, works (see also the sacct call at the end of this
mail). For example:
$ scontrol show job 90949
slurm_load_jobs error: Invalid job id specified
$ sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j 90945
AveCPU AvePages AveRSS AveVMSize JobID
---------- ---------- ---------- ---------- ------------
sstat: error: couldn't get steps for job 90945
- Some of my jobs don't appear in the queue and are not scheduled, even
though sbatch returns `Submitted batch job 92365`.
- You are using a hand-built module system. One issue with this approach
is that dependencies are not resolved properly. For example, loading the
python module does something unexpected. A short example:
$ module load intel; ldd main.x | grep mkl
intel/19.0 already loaded, doing nothing [ WARNING ]
libmkl_intel_lp64.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00002ac58b9ee000)
libmkl_core.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_core.so (0x00002ac58c53c000)
libmkl_intel_thread.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_thread.so (0x00002ac5906c8000)
$ module load python; ldd main.x | grep mkl
Loading python 2.7.12 [ OK ]
The SciPy Stack available: http://www.scipy.org/stackspec.html
Build with GCC compilers.
libmkl_intel_lp64.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_lp64.so (0x00002abea70ad000)
libmkl_core.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_core.so (0x00002abea7bcb000)
libmkl_intel_thread.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_thread.so (0x00002abea96ba000)
Using mkl_get_version_string() shows that the MKL picked up via the
python module is version 2017.0.0 instead of the expected 2019.0.1 that
should be loaded.
A different approach to a hand-built module system would be to generate
the modules with EasyBuild, which avoids such issues.
The blueprint paper can be found here:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7830454
JSC has published their EasyBuild configuration on GitHub:
https://github.com/easybuilders/JSC
The config files from HPC-UGent are also publicly available.
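Regarding the TurboBoost point above: what I have in mind with likwid is
roughly the following (only a sketch; whether a likwid module exists and
whether the frequency interface is accessible to users on the CLAIX18
compute nodes are assumptions on my part, and the target frequency is
just an example):
$ module load likwid              # assuming such a module is provided
$ likwid-setFrequencies -p        # print the current frequency settings
$ likwid-setFrequencies -f 2.3    # pin all cores to 2.3 GHz, effectively disabling TurboBoost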
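Regarding the PSM2/tmi errors and the node selection above, this is what
I would try next unless you advise otherwise (the fabric value and the
feature-based selection are untested guesses on my side):
# in the job script, select the fabric explicitly instead of relying on the fallback logic
export I_MPI_FABRICS=shm:tmi      # or shm:ofi with intelmpi/2019; which value is correct here is exactly my question
# check whether the ncm/nrm node types are distinguished by a SLURM feature ...
$ sinfo -p c18m -N -o "%N %f"
# ... and if so, request it in the job script (the feature name is a placeholder)
#SBATCH --constraint=<feature-of-ncm-nodes>
Alternatively, excluding the nrm nodes via --exclude with a host name
pattern might work, but I don't know the exact host ranges.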
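Regarding scontrol/sstat above: for completed jobs I can get similar
information out of the accounting database, e.g.:
$ sacct -j 90945 --format=JobID,State,Elapsed,AveCPU,AveRSS,AveVMSize,MaxRSS
so this is more a question of whether the scontrol/sstat behaviour is
intended.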
Best,
Sebastian
Hi all,
I have looked into the network performance on CLAIX18. I have measured
latency and bandwidth for intra- and inter-node communication, using the
Intel IMB PingPong benchmark compiled with the modules intel/19.0 and
intelmpi/2019. To get sufficient statistics I submitted 64 jobs with 1
node using 2 tasks and 64 jobs with 2 nodes using 1 task each,
respectively. The scheduler started the jobs on different sets of nodes.
I have attached the results, showing the configuration and the average,
minimum and maximum of the measurements.
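For reference, the batch script for the 2-node case looked roughly like
this (a sketch from memory; IMB-MPI1 is the benchmark binary I compiled
myself, and whether srun or the site's MPI wrapper is the preferred
launcher here I am not sure about):
#!/usr/local_rwth/bin/zsh
#SBATCH --partition=c18m
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
module load intel/19.0
module load intelmpi/2019
srun ./IMB-MPI1 PingPong
The 1-node case used --nodes=1 and --ntasks-per-node=2 instead.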
Let's first look at the inter-node communication: I measured an average
latency of 2.12 usec; in the best case 1 usec, in the worst case 7.1
usec. The bandwidth is on average 6488 Mbytes/sec, with a maximum of
11995 Mbytes/sec and a minimum of 2483 Mbytes/sec.
The latency for intra-node communication looks okay; the bandwidth,
however, shows variation.
On average these results don't correspond to the values advertised by
Intel. Either I have done something wrong, I haven't understood the
topology, or there is a problem with the machine.
Have you run such a benchmark as well? Can you observe something similar?
@Marcus: To get a better understanding of the machine, could you please
share a bit more information on the network topology:
- How many levels does the tree have?
- On which level is the tree pruned?
- Could you send me the connectivity file / connection map, e.g. a list
of the cables connecting the nodes, edge and core switches? I would like
to add the hop-count information to my results. (I have a script for
computing the hop count from a connection map; depending on the format I
just need to adjust the reading routine.)
Cheers,
Sebastian
Dear list,
apparently the Clang installation is built against the headers of the
default GCC 4.8.5, which limits its understanding of C++ to C++11. Would
it be possible to target at least the GCC 5 headers? That would enable
C++14. Of course, with GCC 8.2.0 being available as well, C++17 should
also be possible ;)
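As a stopgap on our side, something along these lines seems plausible
with upstream Clang (the module name and the way of deriving the GCC
prefix are assumptions on my part, not verified on the cluster):
$ module load gcc/8.2.0    # or however GCC 8.2.0 is provided
$ clang++ --gcc-toolchain=$(dirname $(dirname $(which g++))) -std=c++17 test.cpp -o test
But a system-wide fix in the Clang module would of course be nicer.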
Kind regards,
Philipp
--
Philipp Berger https://moves.rwth-aachen.de/people/berger/
Software Modeling and Verification Group
RWTH Aachen University Phone +49/241/80-21206
Ahornstraße 55, 52056 Aachen, Germany
Hi all,
I have a question: how do I do X11 forwarding in interactive sessions?
What I do:
srun --nodes=4 --account=jara0172 --time=06:00:00 --exclusive --x11 --pty /usr/local_rwth/bin/zsh
But if I try to run xclock, I get an error:
X11 connection rejected because of wrong authentication.
Error: Can't open display: localhost:99.0
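In case it helps with the diagnosis, these are the checks I would run
inside the interactive shell (just a sketch; I have not drawn any
conclusion from them yet):
$ echo $DISPLAY     # shows localhost:99.0, matching the error above
$ xauth list        # does an xauth cookie for that display exist on the compute node?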
Best regards,
Uliana
---------
Dr. Uliana Alekseeva
Forschungszentrum Jülich
Institute for Advanced Simulation
D-52425 Jülich
Phone: (49) 2461 61 3429
IT Center RWTH Aachen University
HPC Group
Seffenter Weg 23
52074 Aachen
Phone: (49) 241 80 29711
Hi,
today the jobs I submit to CLAIX-2018 exit within one second.
The new modules, like intelmpi/2019 and openmpi/3.1.3, do not exist on
the back-end host executing the batch script,
nrm020.hpc.itc.rwth-aachen.de:
+(0):ERROR:0: Unable to locate a modulefile for 'openmpi/3.1.3'
Instead, the module list on the back end seems to be that of the old LSF
cluster parts. (The CLAIX-2018 login node still offers all the new
modules.)
Explicitly setting the --partition=c18m flag does not change this for me.
Example Job with error: Job ID 90942
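For anyone who wants to check this on their own account, a minimal batch
script along these lines should show it (a sketch, not the exact script
I used):
#!/usr/local_rwth/bin/zsh
#SBATCH --partition=c18m
#SBATCH --time=00:05:00
hostname
echo $MODULEPATH
module avail 2>&1 | grep -E 'intelmpi|openmpi'
Comparing the MODULEPATH printed here with the one on the login node
should show the discrepancy.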
Best,
Jonas
Dear list,
I am looking to run large numbers of jobs as an Array Job using SLURM.
I created a job file "jobs.txt", containing one configuration per line.
Using SLURM_ARRAY_TASK_ID I select the appropriate line for the current
task and execute the corresponding configuration in the batch script.
My test file currently holds 1308 configurations, which I was unable to
submit with sbatch, as the maximum array size (MaxArraySize?) seems to
be set to 1001. What is the recommended way of scheduling large numbers
of configurations?
I need one core (or thread?!) per configuration (#SBATCH --cpus-per-task=1
and #SBATCH --ntasks-per-core=1; are both necessary?) and 20G of memory
per configuration (#SBATCH --mem-per-cpu=20480M).
My hope was to build these large configuration lists and then submit
them to the batch system, so that it uses all available nodes to work
through all configurations, block-wise and serialized, as fast as
possible. Isn't that what SLURM array jobs are supposed to do?
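For context, a sketch of my current batch script (run_config.sh is a
placeholder for whatever executes one configuration; the 1-1000 split is
the workaround I am considering, not something I have verified against
the site limits):
#!/usr/local_rwth/bin/zsh
#SBATCH --array=1-1000            # first block; a second submission with an offset would cover 1001-1308
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=20480M
#SBATCH --time=01:00:00
# pick the line of jobs.txt that belongs to this array task
CONFIG=$(sed -n "${SLURM_ARRAY_TASK_ID}p" jobs.txt)
./run_config.sh $CONFIG
For the second block one would add an offset, e.g.
CONFIG=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1000))p" jobs.txt) with
--array=1-308.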
Kind regards,
Philipp
--
Philipp Berger https://moves.rwth-aachen.de/people/berger/
Software Modeling and Verification Group
RWTH Aachen University Phone +49/241/80-21206
Ahornstraße 55, 52056 Aachen, Germany
-------- Forwarded Message --------
Subject: IntelMPI/2019 problems with our code
Date: Fri, 18 Jan 2019 11:44:57 +0100
From: Jonas Becker <jonas.becker2(a)rwth-aachen.de>
To: Marcus Wagner <wagner(a)itc.rwth-aachen.de>
Hi,
our simulation aborts on CLAIX-2018 when using the intelmpi/2019 or
intelmpi/2018 module. The new OpenMPI module works for us.
The master rank waits for data from the worker ranks in a busy loop:
    while (!flag) {
        if (is_chkpt_time()) M_write();
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &stat);
    }
The problem occurs pretty much reproducibly ~9 minutes after the start
of the batch job. It occurs even with 2 MPI tasks (one master and one
worker process). During the first 9 minutes, the simulation works flawlessly.
All calls are in the MPI_COMM_WORLD communicator and work up to this
point. Then the simulation aborts:
Abort(635909) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Iprobe:
Invalid communicator, error stack:
PMPI_Iprobe(123): MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG,
MPI_COMM_WORLD, flag=0x7ffe77c98c30, status=0x7ffe77c98c1c) failed
PMPI_Iprobe(90).: Invalid communicator
Does anyone here successfully use intelmpi/2018 or intelmpi/2019 on the
CLAIX-2018 cluster?
Our MPI code may contain a few mistakes that throw off Intel MPI but not
OpenMPI. If simulations longer than 10 minutes are possible for everyone
else, we will look into finding and fixing the problem in our own code.
Another thing: every time Intel MPI is run in CLAIX-2018 batch mode, it
prints this warning:
MPI startup(): I_MPI_JOB_STARTUP_TIMEOUT environment variable is not
supported.
MPI startup(): To check the list of supported variables, use the
impi_info utility or refer to
https://software.intel.com/en-us/mpi-library/documentation/get-started.
Best,
Jonas
On 1/17/19 10:04 AM, Marcus Wagner wrote:
> Dear all,
>
>
> we canceled the current maintenance, but scheduled a new one for
> tomorrow at 9 o'clock. So all jobs which finish before that time will
> run now.
>
>
> Btw.
>
> this list was not intended as an announcement-only list from our side.
> Weren't there any problems? Is everything clear regarding SLURM and
> CLAIX18 for you?
>
> If that is the case, we are really happy, but I can hardly believe it.
>
>
>
> Best
> Marcus
>
>
Best
Marcus
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de
Dear all,
we canceled the current maintenance, but scheduled a new one for
tomorrow at 9 o'clock. So all jobs which finish before that time will
run now.
Btw.
this list was not intended as an announcement-only list from our side.
Weren't there any problems? Is everything clear regarding SLURM and
CLAIX18 for you?
If that is the case, we are really happy, but I can hardly believe it.
Best
Marcus
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de
Good morning everyone,
we have to perform maintenance now, so no new jobs will be scheduled
onto the hosts. Running jobs will not be affected.
With kind regards
Marcus Wagner
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner(a)itc.rwth-aachen.de
www.itc.rwth-aachen.de