Hi,

today the jobs I submit to CLAIX-2018 exit within one second. The new modules, like intelmpi/2019 and openmpi/3.1.3, do not exist on the back-end host executing the batch script, nrm020.hpc.itc.rwth-aachen.de:

+(0):ERROR:0: Unable to locate a modulefile for 'openmpi/3.1.3'

Instead, the module list on the back end seems to be the one of the old LSF cluster parts (the CLAIX-2018 login node does still offer all new modules). Fixing the --partition=c18m flag does not change this for me.

Example job with error: Job ID 90942

Best,
Jonas
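P.S. For reference, a minimal job script that reproduces the problem (a sketch; the partition and module name are the ones above, everything else is a placeholder):

    #!/usr/bin/env bash
    #SBATCH --partition=c18m
    #SBATCH --ntasks=1

    hostname                    # prints the back-end host, e.g. nrm020
    module load openmpi/3.1.3   # fails with the ERROR:0 message above
    module list                 # shows only the old module tree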
Hi Jonas,

thank you for the hint, that has been fixed now.

Best
Marcus
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner@itc.rwth-aachen.de
www.itc.rwth-aachen.de
Hi all,

I am trying to use ANSYS CFX on CLAIX-18, and seem to have a similar problem with (or without) module files on the back end. I get an error message like this:

"<IBM Platform MPI>: : warning, dlopen of libhwloc.so failed /opt/MPI/openmpi-3.1.3/linux/intel_16.0.8.266/lib/linux_amd64/libhwloc.so: cannot open shared object file: No such file or directory" (e.g. job no. 92761)

Submitting the same batch script on cluster-linux.rz.rwth-aachen.de while having loaded the same modules, everything works fine. Is there still something missing?

Best,
Johannes
Hi all,

On 01/24/2019 04:16 PM, Janssen, Johannes wrote:
I am trying to use ANSYS CFX on CLAIX-18, and seem to have a similar problem with (or without) module files on the back end. I get an error message like this:
“<IBM Platform MPI>: : warning, dlopen of libhwloc.so failed /opt/MPI/openmpi-3.1.3/linux/intel_16.0.8.266/lib/linux_amd64/libhwloc.so: cannot open shared object file: No such file or directory” (e.g. job no. 92761)
I am really puzzled. *IBM Platform MPI* - ???

The path

/opt/MPI/openmpi-3.1.3/linux/intel_16.0.8.266/lib/linux_amd64/libhwloc.so

seems to be a combination of $MPI_ROOT (after loading the 'intel/16.0' module instead of 'intel', and 'openmpi' instead of 'intelmpi') and some suffix '.../lib/linux_amd64/libhwloc.so' which does not exist. Note that the 'linux_amd64' subdirectory is not and never was available in any openmpi variant...
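One can see how this path is composed (a sketch; the $MPI_ROOT value is inferred from the error path above, the ls failure from the missing subdirectory):

    $ module load intel/16.0 openmpi/3.1.3
    $ echo $MPI_ROOT
    /opt/MPI/openmpi-3.1.3/linux/intel_16.0.8.266
    $ ls $MPI_ROOT/lib/linux_amd64
    ls: cannot access '/opt/MPI/openmpi-3.1.3/linux/intel_16.0.8.266/lib/linux_amd64': No such file or directory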
I am not an ANSYS specialist, but I would say the error is not in Open MPI itself but in ANSYS.

Best,
Paul Kapinos

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
Hi all,

this is an old discussion between me and Paul. It has to do with the default loaded modules, besides the fact that you load modules other than our defaults. ISV software often brings its own MPI with it, because it is thoroughly tested with these versions. Loading other MPIs might disturb the ISV software.

It seems that IBM MPI respects $MPI_ROOT, which is set by the openmpi module. So the path

/rwthfs/rz/SW/ANSYS/v192/commonfiles/MPI/IBM/9.1.4.3/linx64/lib/linux_amd64/libhwloc.so

becomes

/opt/MPI/openmpi-3.1.3/linux/intel_16.0.8.266/lib/linux_amd64/libhwloc.so

So, there are a few options you could try:

1. Unload the loaded MPI module, export the environment variable R_NODEFMODS with some value (e.g. true), and unset all variables which have been set by the MPI module.
2. Load the intelmpi module instead of openmpi and use "Intel MPI Distributed Parallel" as the start method (see the sketch below).
3. As in 1., unload the loaded MPI module, export R_NODEFMODS, and unset all variables set by the MPI module; then choose the start method depending on the ANSYS version: "Platform MPI Local Parallel" for versions before v18.0, "IBM MPI Local Parallel" for v18.0 and later.

Best
Marcus

P.S. I just saw that the HP MPI start method is deprecated and unsupported on v17.0 and no longer exists on v19.2, so 1. might not be a real option.
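P.P.S. A minimal sketch of a job script for option 2. The ansys module name, the task count, and the cfx5solve arguments are assumptions; adjust them to your installation and case:

    #!/usr/bin/env bash
    #SBATCH --partition=c18m
    #SBATCH --ntasks=8

    # Option 2: replace the default-loaded Open MPI with Intel MPI
    module unload openmpi
    module load intelmpi

    # The ANSYS module name is assumed; check 'module avail'
    module load ansys/19.2

    # Run the CFX solver with the Intel MPI start method.
    # 'mycase.def' is a placeholder for your definition file.
    cfx5solve -def mycase.def \
              -part $SLURM_NTASKS \
              -start-method "Intel MPI Distributed Parallel"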
participants (4)
- Janssen, Johannes
- Jonas Becker
- Marcus Wagner
- Paul Kapinos