General questions and problems with 256 node jobs
Hi,

a few questions/points from my side:

- Claix 18 is using Intel OPA. What network topology is used? I guess it is a fat tree. Is it blocking or non-blocking? Is it 1:2 blocking as on Claix 16?

- Have you configured topology-aware resource allocation within the SLURM scheduler, in other words does the scheduler know the topology and try to minimize the hop count?

- I assume TurboBoost is enabled by default? Is it possible (or will it be possible in the future) to include an option to switch TurboBoost off? E.g. on JURECA it is possible to disable TurboBoost with `#SBATCH --disable-turbomode` for measurements. Otherwise, is it possible to set frequencies with likwid?

- I tried to run jobs with 256 nodes. I am getting an MPI error (cf. job 92414):

  nrm008.hpc.itc.rwth-aachen.de.233340 PSM2 no hfi units are active (err=23)
  [245] MPI startup(): tmi fabric is not available and fallback fabric is not enabled

  Any ideas where this is coming from? Should I manually adjust I_MPI_FABRICS? I don't want to set the fallback fabric, since this would be TCP and would significantly impact performance. Affected jobs are not canceled but run into the time limit. These errors only occur on the nrm nodes, not on the ncm nodes. Could there be a problem with the nrm nodes? Currently I am using the partition c18m, which contains both node types. How can I select only ncm nodes?

- Historic jobs cannot be viewed with scontrol or sstat. sacct, on the other hand, works. For example:

  $ scontrol show job 90949
  slurm_load_jobs error: Invalid job id specified

  $ sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j 90945
      AveCPU   AvePages     AveRSS  AveVMSize        JobID
  ---------- ---------- ---------- ---------- ------------
  sstat: error: couldn't get steps for job 90945

- Some of my jobs don't appear in the queue and are not scheduled, even though sbatch returns `Submitted batch job 92365`.

- You are using a hand-built module system. One issue with this approach is that dependencies are not resolved properly. For example, loading the python module does something unexpected. A short example:

  $ module load intel; ldd main.x | grep mkl
  intel/19.0 already loaded, doing nothing             [ WARNING ]
      libmkl_intel_lp64.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00002ac58b9ee000)
      libmkl_core.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_core.so (0x00002ac58c53c000)
      libmkl_intel_thread.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_thread.so (0x00002ac5906c8000)

  $ module load python; ldd main.x | grep mkl
  Loading python 2.7.12                                [ OK ]
  The SciPy Stack available: http://www.scipy.org/stackspec.html
  Build with GCC compilers.
      libmkl_intel_lp64.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_lp64.so (0x00002abea70ad000)
      libmkl_core.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_core.so (0x00002abea7bcb000)
      libmkl_intel_thread.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_thread.so (0x00002abea96ba000)

  Using mkl_get_version_string() shows that the python MKL is version 2017.0.0 instead of the expected 2019.0.1 that should be loaded.

  A different approach to a hand-built module system would be using EasyBuild to create the module system. This would avoid such issues.
  The blueprint paper can be found here: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7830454
  JSC has published their EasyBuild configuration on GitHub: https://github.com/easybuilders/JSC
  The config files from HPC-UGent are also publicly available.

Best,
Sebastian
Hi Sebastian, On 1/24/19 1:59 PM, Sebastian Achilles wrote:
Hi,
a few questions/points from my side:
- Claix 18 is using Intel OPA. What network topology is used? I guess it is a Fat tree. Is it blocking or non-blocking? Is it 1:2 blocking as on Claix 16?
To make it short: Fat Tree, right, blocking, yes.
- Have you configured topology-aware resource allocation within the SLURM scheduler, in other words does the scheduler know the topology and try to minimize the hop count?
Since we are still in the acceptance phase, this is not the case yet, but it will be in the future.
- I assume TurboBoost is enabled by default? Is it possible (or will it be possible in the future) to include an option to switch TurboBoost off? E.g. on JURECA it is possible to disable TurboBoost with `#SBATCH --disable-turbomode` for measurements. Otherwise, is it possible to set frequencies with likwid?
I'm not sure if TurboBoost is activated; normally we try to fix the frequency to the maximum. Perhaps Sascha and/or Paul can answer this part.
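Until such an option exists, the clock can usually be pinned per job; a minimal sketch, assuming the Slurm cpu-freq support is enabled on CLAIX and that likwid is installed on the compute nodes (the 2.1 GHz value is only an example):

  # ask Slurm to run srun-launched steps at a fixed frequency (kHz, or Low/Medium/High)
  #SBATCH --cpu-freq=2100000

  # or set it inside the job with likwid (frequency in GHz)
  likwid-setFrequencies -f 2.1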
- I tried to run jobs with 256 nodes. I am getting an MPI error (cf. job 92414):

  nrm008.hpc.itc.rwth-aachen.de.233340 PSM2 no hfi units are active (err=23)
  [245] MPI startup(): tmi fabric is not available and fallback fabric is not enabled
Any ideas where this is coming from? Should I manually adjust I_MPI_FABRICS? I don't want to set the fallback fabric, since this would be TCP and would significantly impact performance. Affected jobs are not canceled but are running into the time limit.
These errors only occur on the nrm nodes, not on the ncm nodes. Could there be a problem with the nrm nodes? Currently I am using the partition c18m, which contains both node types. How can I select only ncm nodes?
The nrm nodes are the first batch of the new Tier3 system. Seems something is still odd with them; I took them out of the partition again.
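As a stop-gap on the submit side, one can pin the Intel MPI fabric and keep jobs off the nrm nodes explicitly; a minimal sketch, assuming shm:tmi is the intended provider for OPA here and using a purely hypothetical nrm node range:

  # keep the job off the nrm nodes (node range is a placeholder)
  #SBATCH --exclude=nrm[001-064]

  # pin the OPA fabric instead of relying on autodetection
  export I_MPI_FABRICS=shm:tmi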
- Historic jobs cannot be viewed with scontrol or sstat. sacct, on the other hand, works. For example:

  $ scontrol show job 90949
  slurm_load_jobs error: Invalid job id specified
Not sure about that one; I always thought you could only get details for jobs that squeue shows, which means completed jobs will not be shown. I could find nothing about that in the man page, though.
$ sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j 90945
    AveCPU   AvePages     AveRSS  AveVMSize        JobID
---------- ---------- ---------- ---------- ------------
sstat: error: couldn't get steps for job 90945
Excerpt from the man page:

  DESCRIPTION
         Status information for running jobs invoked with Slurm.

So, with sstat, you can only observe running jobs.
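For completed jobs the corresponding statistics come from the accounting database via sacct instead; a minimal sketch using standard sacct format fields and the job ID from the example above:

  $ sacct -j 90945 --format=JobID,JobName,Elapsed,State,ExitCode,MaxRSS,AveRSS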
- Some of my jobs don't appear in the queue and are not scheduled, even though sbatch returns `Submitted batch job 92365`
[2019-01-24T12:16:54.089] _slurm_rpc_submit_batch_job: JobId=92365 InitPrio=103522 usec=8187
[2019-01-24T12:16:54.227] email msg to : Slurm Job_id=92365 Name=1D-NEGF_execute_1 Began, Queued time 00:00:00
[2019-01-24T12:16:54.227] sched: Allocate JobId=92365 NodeList=nrm023 #CPUs=24 Partition=c18m
[2019-01-24T12:16:54.365] prolog_running_decr: Configuration for JobId=92365 is complete
[2019-01-24T12:16:56.296] _job_complete: JobId=92365 WEXITSTATUS 127
[2019-01-24T12:16:56.296] email msg to : Slurm Job_id=92365 Name=1D-NEGF_execute_1 Failed, Run time 00:00:02, FAILED, ExitCode 127
[2019-01-24T12:16:56.296] _job_complete: JobId=92365 done

$> sacct -j 92365
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
92365        1D-NEGF_e+       c18m    default         24     FAILED    127:0
92365.batch       batch               default         24     FAILED    127:0
92365.extern     extern               default         24  COMPLETED      0:0
92365.0      1D-NEGF-M+               default         24     FAILED    127:0

It has been immediately scheduled and failed. You should have received an email.
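A hedged side note: exit status 127 is the shell's "command not found" code, so the job's output files and the module loads in the job script are the first places to look; for example (the output file name assumes the Slurm default of slurm-<jobid>.out):

  # inspect what the batch step actually printed
  $ less slurm-92365.out

  # in the job script, load the same modules used at build time before srun/mpiexec
  module load intel intelmpi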
- You are using a hand-built module system. One issue with this approach is that dependencies are not resolved properly. For example, loading the python module does something unexpected. A short example:
$ module load intel; ldd main.x | grep mkl
intel/19.0 already loaded, doing nothing             [ WARNING ]
    libmkl_intel_lp64.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00002ac58b9ee000)
    libmkl_core.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_core.so (0x00002ac58c53c000)
    libmkl_intel_thread.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_thread.so (0x00002ac5906c8000)
$ module load python; ldd main.x | grep mkl
Loading python 2.7.12                                [ OK ]
The SciPy Stack available: http://www.scipy.org/stackspec.html
Build with GCC compilers.
    libmkl_intel_lp64.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_lp64.so (0x00002abea70ad000)
    libmkl_core.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_core.so (0x00002abea7bcb000)
    libmkl_intel_thread.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_thread.so (0x00002abea96ba000)
Using mkl_get_version_string() shows that the python mkl is version 2017.0.0 instead of the expected 2019.0.1 version that should be loaded.
A different approach to a hand-built module system would be using EasyBuild to create the module system. This would avoid such issues.
The blueprint paper can be found here: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7830454
JSC has published their EasyBuild configuration on GitHub: https://github.com/easybuilders/JSC
The config files from HPC-UGent are also publicly available.
Hi Paul, I think this is your part.
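Regarding the MKL version mismatch above: whichever libmkl_core.so is resolved by ldd usually embeds its version banner, so it can be checked straight from the shell; a minimal sketch, assuming the library contains the usual "Math Kernel Library" version string (not verified on this system):

  # find the libmkl_core.so the binary will actually use, then print its version banner
  $ lib=$(ldd main.x | awk '/libmkl_core/ {print $3}')
  $ strings "$lib" | grep -m1 -i "Math Kernel Library"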
Best, Sebastian
Best, Marcus
_______________________________________________ claix18-slurm-pilot mailing list -- claix18-slurm-pilot@lists.rwth-aachen.de To unsubscribe send an email to claix18-slurm-pilot-leave@lists.rwth-aachen.de
--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner@itc.rwth-aachen.de
www.itc.rwth-aachen.de
Hi Marcus,

thank you very much for your answer!

Do the nrm nodes have a different configuration compared to the ncm nodes? I am still wondering why my job sometimes fails when I just submit the same job multiple times. Most of the jobs that failed ran on the nrm nodes (I am getting a `Bus error`).

I am not specifying the `#SBATCH --mem` option, since I assume that I will get the whole memory of that node. Is this correct? This is how I am used to using SLURM on JURECA and JUWELS, and this is how I understood the documentation: "NOTE: A memory size specification of zero is treated as a special case and grants the job access to all of the memory on each node. If the job is allocated multiple nodes in a heterogeneous cluster, the memory limit on each node will be that of the node in the allocation with the smallest memory size (same limit will apply to every node in the job's allocation)."

Are nodes in SLURM on CLAIX18 scheduled exclusively? So when I request a certain number of nodes, is it ensured that I am the only user running on these nodes?

Have you implemented any kind of default CPU binding or pinning? Or does the user have to specify this in their job scripts?

Best,
Sebastian
Hi Sebastian, On 1/28/19 9:09 AM, Sebastian Achilles wrote:
Hi Marcus,
thank you very much for your answer!
Do the nrm nodes have a different configuration compared to the ncm nodes? I am still wondering why my job sometimes fails when I just submit the same job multiple times. Most of the jobs that failed ran on the nrm nodes (I am getting a `Bus error`).
No, the nrm nodes have the very same configuration. It seems as if they had not been installed completely/tested thoroughly before; that is why I took them out of service for the moment. I am a bit puzzled about the 'Bus error', though. Do you have a bit more detail for me, e.g. the job ID, or where and when the job ran?
I am not specifying the `#SBATCH --mem` option, since I assume that I will get the whole memory of that node. Is this correct? This is how I am used to using SLURM on JURECA and JUWELS, and this is how I understood the documentation: "NOTE: A memory size specification of zero is treated as a special case and grants the job access to all of the memory on each node. If the job is allocated multiple nodes in a heterogeneous cluster, the memory limit on each node will be that of the node in the allocation with the smallest memory size (same limit will apply to every node in the job's allocation)."
We try to prohibit "--mem" as an option, as we would like users to request memory per task. So yes, please do not use --mem. We might consider allowing "--mem=0", though; we are not sure yet and will have to discuss this internally.
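A minimal sketch of the per-task style of request (the 3800M value is only a placeholder, not the real per-core limit of the c18m nodes):

  #SBATCH --ntasks=48
  #SBATCH --mem-per-cpu=3800M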
Are nodes in SLURM on CLAIX18 scheduled exclusively? So when I request a certain number of nodes, is it ensured that I am the only user running on these nodes?
All JARA jobs are scheduled exclusively. We will schedule exclusively if you need more than one node (as is currently done on LSF), but this is not active yet. We are discussing using exclusive=user as the default, so that only jobs of the same user can share a node. This can be overridden with #SBATCH --exclusive.
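For reference, the two variants mentioned above would look like this in a job script (whether --exclusive=user may be set by users once the site default exists is an assumption):

  # whole nodes reserved for this job only
  #SBATCH --exclusive

  # or: share nodes only with my own jobs
  #SBATCH --exclusive=user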
Have you implemented any kind of default CPU binding or pinning? Or does the user have to specify this in their job scripts?
Not yet, as we did not want to disturb the benchmarks by NEC on CLAIX18. As the GPU nodes are still being benchmarked, and this is a cluster-wide option, we will not activate anything at the moment. So you will have to do this in your script.

Best
Marcus
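Until a cluster-wide default is activated, binding can be requested per job; a minimal sketch of the usual knobs (which of them the site will later set by default is unknown):

  # let srun bind one task per core
  srun --cpu-bind=cores ./main.x

  # for Intel MPI started via mpiexec, pinning is controlled by environment variables
  export I_MPI_PIN=1
  export I_MPI_PIN_DOMAIN=core

  # for OpenMP threads inside a task
  export OMP_PLACES=cores
  export OMP_PROC_BIND=close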
Best, Sebastian
--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner@itc.rwth-aachen.de
www.itc.rwth-aachen.de
On 01/24/2019 01:59 PM, Sebastian Achilles wrote:
- You are using a hand-built module system.
We do.
One issue with this approach is that dependencies are not resolved properly.

In your case below: that is not a bug, that is a feature.
For example loading the module python does something unexpected. A short example:
$ module load intel; ldd main.x | grep mkl
intel/19.0 already loaded, doing nothing             [ WARNING ]
    libmkl_intel_lp64.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00002ac58b9ee000)
    libmkl_core.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_core.so (0x00002ac58c53c000)
    libmkl_intel_thread.so => /opt/intel/Compiler/19.0/1.144/rwthlnk/mkl/lib/intel64_lin/libmkl_intel_thread.so (0x00002ac5906c8000)
Yessir, your binary uses the MKL from the loaded Intel compiler [via LD_LIBRARY_PATH]. In general you have to load the *same* modules for running the binary as were used at compile+link time. That ensures the same environment for running your main.x executable as at build time - only if your binary *and all resolved libs* are unchanged are your results 99.9% reproducible. (The remaining 0.1% goes to the Linux updates you cannot avoid; yes, we have had situations where we had to recompile parts of the software after a 'minor OS upgrade'.)

On the other hand, you often want updates; in 99% of cases you can use the same binary with newer versions of libraries - that is why we generally update minor versions (bug-fix releases) without notice to the users. When the major version changes (like the cluster going from the intel/16 to the intel/19 compiler), your binary *could* stay runnable, but we do not promise it - instead you must either
- get the old modules (typically supported for a limited time as a workaround), or
- RECOMPILE your application (you do not still drive your grandpa's VW Bug, do you?)
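In job-script terms that principle simply means repeating the build-time module set before the run; a minimal sketch (the module versions are examples, use exactly those main.x was built with):

  #!/usr/bin/env bash
  #SBATCH --job-name=main
  # load exactly the modules used at compile+link time
  module load intel/19.0 intelmpi/2019
  srun ./main.x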
$ module load python; ldd main.x | grep mkl
Rhetorical question: would you please tell us what you want to do:
[1] - run the main.x application, or
[2] - use Python and maybe NumPy, SciPy and so on?

If [2], would you love it if the MKL version linked against the NumPy in that Python installation were changed, switched and twisted after almost any module command the user issued? I believe you called that behaviour
does something unexpected.

... didn't you? So in the case of Python we decided for 'reproducibility' instead of 'interchangeability', and that is why the MKL from python is prepended. But even here you have the freedom to [break your environment by] changing the LD_LIBRARY_PATH envvar and telling Python/NumPy to use an MKL/2019 it was not built for.
If [1], then WHY THE HELL DID YOU LOAD A PYTHON MODULE?? Instead of running your application in the environment you used to build it? (Try to load the matlab module and call the 'kate' text editor. It won't start. Oh no! There are incompatible software products in this world!)
Loading python 2.7.12                                [ OK ]
The SciPy Stack available: http://www.scipy.org/stackspec.html
Build with GCC compilers.
    libmkl_intel_lp64.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_lp64.so (0x00002abea70ad000)
    libmkl_core.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_core.so (0x00002abea7bcb000)
    libmkl_intel_thread.so => /usr/local_rwth/sw/python/2.7.12/x86_64/extra/lib/libmkl_intel_thread.so (0x00002abea96ba000)
Using mkl_get_version_string() shows that the python mkl is version 2017.0.0 instead of the expected 2019.0.1 version that should be loaded.
Yessir. But instead of using MKL after loading a python module, you could think about splitting your tasks into tiny, small environments. KISS: Keep It Simple and Stupid is in general a good idea, especially in the software business.

On the other hand, we always enjoy investigating all the cases where the user shot himself in the foot yet again with 'just small environment changes' (sourcing that file from Joe, or - a running gag - some module commands hard-coded into the environment); for example:
  Unloading openmpi 1.10.4                             [ OK ]
  Unloading Intel Suite 16.0.2.181                     [ OK ]
  +(0):ERROR:0: Unable to locate a modulefile for 'intel/17.0.4.196'
  /opt/MPI/openmpi-1.10.4/linux/none                   [ ERROR ]
  No openmpi/1.10.4 for none compiler available        [ ERROR ]
  Loading MISC environment                             [ OK ]
  Loading gnuplot 5.2.2                                [ OK ]

or

  Sie sind mit dem Knoten 'login18-1' verbunden ....
  Unloading intelmpi 2018.4.274                        [ OK ]
  Unloading Intel Suite 19.0.1.144                     [ OK ]
  +(0):ERROR:0: Unable to locate a modulefile for 'intel/17.0.4.196'
  No intelmpi/2018.4.274 for none compiler available, abort. [ ERROR ]

... so we do not prohibit it yet; we just *do not recommend* steps which lead to some Interesting Investigations somewhen later. That secures our jobs.
A different approach to a hand-built module system would be using EasyBuild to create the module system. This would avoid such issues.
We know about EasyBuild (a note from Jülich: one would have to strike the 'easy').

Would you point us to a recipe to build ABINIT with the Intel 19 compiler? (*)
(*) https://forum.abinit.org/viewtopic.php?f=17&t=4007
Or to build BOOST with PGI compilers? (still in my mailbox, damn). Or CP2K using Intel MPI with ScaLAPACK from MKL? (A developer got an account on our cluster because we have that bunch of compilers available...) Maybe the latest patch of VASP with the Wannier library (the discussion with the developers died some time ago..)

All jesting aside, we *considered* introducing EasyBuild. It needs time [which we do not have] and effort, especially if you want to get the Latest and Greatest versions (and our users DO!) - you will first invest time to learn EasyBuild, then to bring it into some readable, not blown-up state, then to fix the application itself, then to fix EasyBuild (Oh My Dear, It's Python!!), then to repeat... Somewhen later.

Have a nice evening
Paul Kapinos

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
Dear Paul,

In my opinion it is a common use case on an HPC system to run a computation first and perform post-processing afterwards.

What I want to do: I want to have one fixed set of modules (intel/19.0, intelmpi/2019, and python with numpy as well as the current MKL version). This is not possible by default at the moment.

I know my example was shortened. Obviously one should compile+link with the same module set loaded. But nevertheless, if you load intel or intel + python you will get a different MKL version. For scientific computing it is very important to choose and know the correct MKL version, as it might affect performance and lead to different performance results.

I am very much aware of the advantages and disadvantages of EasyBuild. And I know that it is not that easy to use, as I have used it myself. In the end it was just a suggestion.

Have a nice evening,
Sebastian
Hello again, On 01/24/2019 06:20 PM, Sebastian Achilles wrote:
In my opinion it is a common use case on an HPC system to run a computation first and perform post-processing afterwards.

Yes, of course. But must it always be done *in the same environment*?
BTW: if you run a [serial] post-processing Python task in the same [1000-rank parallel batch job] environment, you will earn a lot of love from the administrators [and waste thousands of core-hours on unused resources].
What I want to do: I want to have one fixed set of modules (intel/19.0, intelmpi/2019, and python with numpy as well as the current MKL version). This is not possible by default at the moment.

This will not be possible in the future either, as your 'fixed set of modules' will be obsolete by fall at the latest. The world keeps changing.
I know my example was shortened. Obviously one should compile+link with the same module set loaded. But nevertheless, if you load intel or intel + python you will get a different MKL version.
*and you get the same MKL version as was used for NumPy, SciPy and friends in that Python installation*. Why? Because we want to ensure that the Python user, who likely uses NumPy and SciPy and thereby MKL, always gets the same version of MKL, regardless of which Intel compiler version is loaded. THE SAME THING YOU WANT TO HAVE FOR YOUR APPLICATION! At this point I will stop the discussion of further possibilities and just say that we cannot support every combination of software and modules; you can, however, build any of those combinations on your own.
For scientific computing it is very important to choose and know the correct MKL version, as it might affect performance and lead to different performance results.

Yes. So avoid loading Python when you rely on a specific minor revision of MKL in your non-Python application. [Avoid carrying your snowboard when going for a swim...] OR, if you need Python, load and use it with the *stable* version of MKL it is linked against. OR install your own version of any software, use it in the way you like, and give up our support.
(By the way, the number of users/use cases where the MKL version really changes the behaviour of the application is quite limited. I would say 99% of users are not affected by this.)
I am very much aware of the advantages and disadvantages of Easybuild. And I know that it is not that easy to use, as I have used it myself.
In the end it was just a suggestion.
Have a nice evening, Sebastian
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
Participants (3): Marcus Wagner, Paul Kapinos, Sebastian Achilles