Re: Multi-Node ANSYS simulations
Dear Thomas,
could you send me an example so that I can test this myself?
In our opinion, direct ssh access is not a real option. I will have to see whether there is a workaround.
Best Marcus
On 12.02.2019 at 17:22, Gier, Thomas wrote:
Hello Marcus,
unfortunately I get essentially the same problem. srun spawns one instance of the CFX solver per core, and every instance tries to use all cores.
Since they still try to communicate with the other node over ssh, the result is the same error as below.
Regards,
Thomas
*From:* Marcus Wagner [mailto:wagner@itc.rwth-aachen.de]
*Sent:* Tuesday, 12 February 2019 15:21
*To:* claix18-slurm-pilot@lists.rwth-aachen.de
*Subject:* [claix18-slurm-pilot] Re: Multi-Node ANSYS simulations
Dear Thomas,
could you please test the following:
srun cfx5solve -batch -parallel -partition $SLURM_NTASKS -def job.def -par-dist "$CFXHOSTS" -start-method "Intel MPI Distributed Parallel".
Best Marcus
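The thread does not show how $CFXHOSTS in the command above is built. A minimal sketch, assuming the solver accepts a comma-separated host list of the form host*count for -par-dist: the helper name cfx_par_dist is hypothetical, and the commented usage assumes the job was submitted with --ntasks-per-node so that SLURM_NTASKS_PER_NODE is set.

```shell
# Hypothetical helper (not part of ANSYS or Slurm): build a host list of the
# form "host1*N,host2*N" from a tasks-per-node count and a list of host names.
# Usage: cfx_par_dist TASKS_PER_NODE HOST...
cfx_par_dist() {
    tasks_per_node=$1
    shift
    list=""
    for host in "$@"; do
        # Append "host*tasks"; a comma is added only after the first entry.
        list="${list:+$list,}${host}*${tasks_per_node}"
    done
    printf '%s\n' "$list"
}

# Inside a Slurm job the host names could come from the allocation, e.g.
# (assumption: SLURM_NTASKS_PER_NODE is set by --ntasks-per-node):
#   CFXHOSTS=$(cfx_par_dist "$SLURM_NTASKS_PER_NODE" \
#       $(scontrol show hostnames "$SLURM_JOB_NODELIST"))
```

For example, cfx_par_dist 4 ncm0790 ncm0791 prints ncm0790*4,ncm0791*4 (the first host name is made up for illustration).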
On 2/12/19 11:10 AM, Gier, Thomas wrote:
Hello,
I'm having issues running ANSYS CFX calculations across multiple nodes.
Single-node simulations run fine, but multi-node configurations crash because ssh connections are being denied:
" +--------------------------------------------------------------------+
  |  An error has occurred in cfx5solve:                               |
  |                                                                    |
  |  Remote connection to ncm0791.hpc.itc.rwthaachen.de                |
  |  (ncm0791.hpc.itc.rwth-aachen.de) could not be started, or exited  |
  |  with return code 255. It gave the following output:               |
  |                                                                    |
  |  Permission denied (publickey,gssapi-keyex,gssapi-with-mic,pass-   |
  |  word,hostbased).                                                  |
  |                                                                    |
  |  Check that you have typed the hostname correctly, and that you    |
  |  have an account "tg084461" on the specified host with access      |
  |  permission from this host. You can use the following command to   |
  |  check the connection to a UNIX machine:                           |
  |                                                                    |
  |  ssh ncm0791.hpc.itc.rwth-aachen.de uname                          |
  +--------------------------------------------------------------------+"
Am I missing something in my submission script, or is this a cluster config issue?
Regards,
Thomas Gier
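The error box suggests running ssh <host> uname to check the connection. A small sketch that repeats this check for several hosts without prompting for a password, which would show whether node-to-node ssh is denied in general: the function name check_nodes and the SSH_CMD override are hypothetical and only there so the loop can be tried without a real ssh connection.

```shell
# Hypothetical diagnostic: run "uname" on each given host over ssh and report
# which hosts accept a non-interactive login.
# SSH_CMD is an illustrative override of the ssh binary (defaults to ssh).
check_nodes() {
    ssh_cmd=${SSH_CMD:-ssh}
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "$host" uname >/dev/null 2>&1; then
            echo "ok $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Inside a job the hosts could be taken from the allocation, e.g.:
#   check_nodes $(scontrol show hostnames "$SLURM_JOB_NODELIST")
```

A FAIL line for every remote node would suggest that ssh between compute nodes is blocked by the cluster configuration and not by the submission script.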
_______________________________________________
claix18-slurm-pilot mailing list -- claix18-slurm-pilot@lists.rwth-aachen.de
To unsubscribe send an email to claix18-slurm-pilot-leave@lists.rwth-aachen.de
--
Marcus Wagner, Dipl.-Inf.
IT Center
Department: Systems and Operations
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner@itc.rwth-aachen.de
www.itc.rwth-aachen.de
participants (2)
- Gier, Thomas
- Marcus Wagner