On 01/24/2019 01:59 PM, Sebastian Achilles wrote:
> - You are using a hand-built module system.
We do.
> One issue with this approach is that dependencies are not resolved properly.

In your case below: that is not a bug, that is a feature.
> For example, loading the module python does something unexpected. A short example:
> $ module load intel; ldd main.x | grep mkl
> intel/19.0 already loaded, doing nothing [ WARNING ]
>   libmkl_intel_lp64.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00002ac58b9ee000)
>   libmkl_core.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_core.so (0x00002ac58c53c000)
>   libmkl_intel_thread.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_thread.so (0x00002ac5906c8000)
Yes, sir: your binary uses the MKL from the loaded Intel compiler [via LD_LIBRARY_PATH]. In general you have to load the *same* modules for running a binary as were used at compile+link time. That ensures the same environment for running your main.x executable as at build time; only if your binary *and all resolved libraries* are unchanged are your results 99.9% reproducible. (The remaining 0.1% goes to the Linux updates you cannot avoid; yes, we have been in the situation where we had to recompile parts of the software stack after a 'minor OS upgrade'.)

On the other hand, you often want updates: in 99% of cases you can use the same binary with newer versions of the libraries, which is why we generally update minor versions (bug-fix releases) without notice to the users. When a major version changes (like the cluster going from the intel/16 to the intel/19 compiler), your binary *could* stay runnable, but we do not promise it. Instead you must either
- load the old modules (typically supported for a limited time as a workaround), or
- RECOMPILE your application (you do not still drive your grandpa's VW Bug, do you?)
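The mechanics behind "same modules at run time as at build time" are just LD_LIBRARY_PATH ordering; a minimal sketch of what a `module load intel` effectively does (the MKL path is the one from the ldd output above, everything else is standard dynamic-linker behaviour):

```shell
# The module prepends the compiler's MKL directory to LD_LIBRARY_PATH.
# ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} appends the old value only if it was set.
export LD_LIBRARY_PATH="/opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# The dynamic linker searches these directories left to right, so the
# first entry wins for any libmkl_*.so that main.x needs:
echo "${LD_LIBRARY_PATH%%:*}"
# prints: /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin
```

This is why running the binary under a different set of loaded modules silently swaps the libraries underneath it.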
> $ module load python; ldd main.x | grep mkl
Rhetorical question: would you please tell us what you want to do?
[1] run your main.x application, or
[2] use Python, and maybe NumPy, SciPy and so on?

If [2]: would you love it if the MKL version linked into the NumPy of that Python installation were changed, switched and twisted after almost any module command the user issued? I believe you would call that behaviour
"does something unexpected", wouldn't you? So in the case of Python we decided for 'reproducibility' instead of 'interchangeability', and that is why the MKL from python is prepended. But even here you have the freedom to [break your environment by] changing the LD_LIBRARY_PATH environment variable, telling a Python/NumPy that was not built for MKL 2019 to use it anyway.
If [1]: WHY THE HELL DID YOU LOAD A PYTHON MODULE instead of running your application in the environment you used to build it? (Try loading the matlab module and calling the 'kate' text editor. It won't start. Oh no! There are incompatible software products in this world!)
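Why the python module shadows the compiler's MKL can be sketched with plain environment-variable operations (the paths are taken from the ldd outputs in this thread; the two exports stand in for the respective module loads):

```shell
# 'module load intel': the compiler's MKL directory is on the path.
export LD_LIBRARY_PATH="/opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin"

# 'module load python': the python module *prepends* its own MKL directory.
export LD_LIBRARY_PATH="/usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib:$LD_LIBRARY_PATH"

# The loader searches left to right, so python's bundled MKL now wins,
# even for a binary that was linked against the intel module's MKL:
echo "${LD_LIBRARY_PATH%%:*}"
# prints: /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib
```

Prepending (rather than appending) is the deliberate choice here: it pins NumPy/SciPy to the MKL they were built against, at the cost of shadowing the MKL of any other loaded module.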
> Loading python 2.7.12 [ OK ]
> The SciPy Stack available: http://www.scipy.org/stackspec.html
> Build with GCC compilers.
>   libmkl_intel_lp64.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_lp64.so (0x00002abea70ad000)
>   libmkl_core.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_core.so (0x00002abea7bcb000)
>   libmkl_intel_thread.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_thread.so (0x00002abea96ba000)
> Using mkl_get_version_string() shows that the python MKL is version 2017.0.0 instead of the expected 2019.0.1 version that should be loaded.
Yes, sir. But instead of using MKL after loading a python module, you could think about dividing your tasks into tiny, small environments. KISS (Keep It Simple and Stupid) is a good idea in general, and especially in the software business. On the other hand, we always enjoy investigating all the cases where a user shot himself in the foot yet again by 'just small environment changes': sourcing that file from Joe, or (a running gag) module commands hard-coded into the environment, for example:
Unloading openmpi 1.10.4 [ OK ]
Unloading Intel Suite 16.0.2.181 [ OK ]
+(0):ERROR:0: Unable to locate a modulefile for 'intel/17.0.4.196'
/opt/MPI/openmpi-1.10.4/linux/none [ ERROR ]
No openmpi/1.10.4 for none compiler available [ ERROR ]
Loading MISC environment [ OK ]
Loading gnuplot 5.2.2 [ OK ]

or

Sie sind mit dem Knoten 'login18-1' verbunden [you are connected to the node 'login18-1'] ....
Unloading intelmpi 2018.4.274 [ OK ]
Unloading Intel Suite 19.0.1.144 [ OK ]
+(0):ERROR:0: Unable to locate a modulefile for 'intel/17.0.4.196'
No intelmpi/2018.4.274 for none compiler available, abort. [ ERROR ]

... so we do not prohibit it yet: we just *do not recommend* steps which lead to some Interesting Investigations some time later. That secures our jobs.
> A different approach to a hand-built module system would be using easybuild to create the module system. This would avoid such issues.
We know about easybuild (a note from Jülich: "one would have to strike the 'easy'"). Would you point us to a recipe to build ABINIT with the Intel 19 compiler? (*) Or to build BOOST with the PGI compilers? (still in my mailbox, damn). Or CP2K using Intel MPI with the ScaLAPACK from MKL? (A developer got an account on our cluster because we have that bunch of compilers available...) Or maybe the latest patch of VASP with the Wannier library (the discussion with the developers died down some time ago...)?

(*) https://forum.abinit.org/viewtopic.php?f=17&t=4007

All jesting aside, we *considered* introducing easybuild. It needs time [which we do not have] and effort, especially if you want the Latest and Greatest versions (and our users DO!): you will first invest time to learn easybuild, then to bring it into some readable, not blown-up state, then to fix the application itself, then to fix easybuild (Oh My Dear, It's Python!), then to repeat... some time later.

Have a nice evening
Paul Kapinos

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915