
Hi all,

I have looked into the network performance on CLAIX18 and measured latency and bandwidth for intra-node and inter-node communication. I used the Intel IMB PingPong benchmark, compiled with the modules intel/19.0 and intelmpi/2019. To get sufficient statistics I submitted 64 jobs on 1 node with 2 tasks, and 64 jobs on 2 nodes with 1 task each. The scheduler started the jobs on different sets of nodes. I have attached the results, showing the configuration and the average, min and max of each measurement.

Let's first look at the inter-node communication. I measured an average latency of 2.12 usec; in the best case I measured 1 usec and in the worst case 7.1 usec. The bandwidth is 6488 Mbytes/sec on average, with a maximum of 11995 Mbytes/sec and a minimum of 2483 Mbytes/sec.

The latency for intra-node communication looks okay; however, the bandwidth shows variation.

On average these results don't correspond to the advertised values from Intel. Either I have done something wrong, I haven't understood the topology, or there is a problem with the machine. Have you run such a benchmark as well? Do you observe something similar?

@Marcus: To get a better understanding of the machine, could you please share a bit more information on the network topology:
- How many levels does the tree have?
- On which level is the tree pruned?
- Could you send me the connectivity file / connection map, e.g. a list of cables connecting the nodes, edge and core switches?

I would like to add the hop-count information to my results. (I have a script for computing the hop count from a connection map; depending on the format, I just need to adjust the reading routine.)

Cheers,
Sebastian
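The aggregation of the 64 runs per configuration into average, min and max could be scripted roughly as below. This is a minimal sketch, not the method used for the attached results. It assumes each job's output comes from a PingPong-only run (e.g. `IMB-MPI1 PingPong`) and contains the four-column table `#bytes #repetitions t[usec] Mbytes/sec`. Taking latency from the smallest message size and bandwidth as the peak over all sizes are also assumptions; the names `parse_pingpong`, `avg_min_max` and `summarize_jobs` are hypothetical.

```python
import re
from statistics import mean

# Assumed IMB PingPong table row: #bytes  #repetitions  t[usec]  Mbytes/sec
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_pingpong(text):
    """Return (latency_usec, peak_bandwidth_mbytes) for one job's output.

    Assumption: latency is t[usec] of the smallest message size and
    bandwidth is the peak Mbytes/sec over all message sizes.
    """
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            nbytes, _reps, t_usec, mbytes = match.groups()
            rows.append((int(nbytes), float(t_usec), float(mbytes)))
    if not rows:
        raise ValueError("no PingPong table rows found")
    latency = min(rows, key=lambda row: row[0])[1]
    bandwidth = max(row[2] for row in rows)
    return latency, bandwidth


def avg_min_max(values):
    """Average, minimum and maximum of a sequence of measurements."""
    return mean(values), min(values), max(values)


def summarize_jobs(outputs):
    """Aggregate latency and bandwidth over the outputs of all jobs."""
    latencies, bandwidths = zip(*(parse_pingpong(text) for text in outputs))
    return {
        "latency_usec": avg_min_max(latencies),
        "bandwidth_mbytes": avg_min_max(bandwidths),
    }
```

With one output file per job (a hypothetical layout), a call such as `summarize_jobs(Path(p).read_text() for p in paths)` would give the average, min and max for one configuration; running it separately for the 1-node and 2-node job sets keeps intra-node and inter-node numbers apart.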
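Computing a hop count from a connection map amounts to a shortest-path search over the cable graph. The sketch below is not Sebastian's script; it assumes a hypothetical plain-text cable list with one `end_a end_b` pair per line, and it counts hops as the number of switches traversed (links minus one), which is only one possible convention. Compute nodes passed in `hosts` are never used as transit points, since traffic should not be routed through another node.

```python
from collections import defaultdict, deque


def parse_cable_list(text):
    """Hypothetical format: one cable per line as 'end_a end_b', '#' starts a comment."""
    cables = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            cables.append((fields[0], fields[1]))
    return cables


def build_graph(cables):
    """Undirected adjacency sets built from (end_a, end_b) cable pairs."""
    graph = defaultdict(set)
    for a, b in cables:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def switches_between(graph, src, dst, hosts=frozenset()):
    """Switches on a shortest path from src to dst (breadth-first search).

    Returns links - 1, or None if dst is unreachable. Nodes in `hosts`
    (other than src) are reached but never expanded further.
    """
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == dst:
                    return dist[v] - 1
                if v not in hosts:
                    queue.append(v)
    return None
```

In a two-level tree, two nodes on the same edge switch would give 1 and two nodes on different edge switches connected via one core switch would give 3; only the reading routine `parse_cable_list` would need to change for the real connection-map format.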