OpenMP parallelization of an OpenMesh app
Hi All,

I have implemented a simulation as an OpenMesh app. In general, I have calculations on n separate meshes and occasionally a joint calculation which affects all of the meshes. I was trying to parallelize the simulation via OpenMP, assigning a separate core to each of the meshes. For testing, I am running in a regime where there are no joint calculations so all the calculations should be completely independent. Yet, none of my CPUs run at 100%, they are at around 80%. When I run it on a single core, it's constantly near 100%.

I was wondering whether there are any global shared variables which prevent such parallelization or what needs to be modified in the cmake file. Currently, I do

set(CMAKE_CXX_FLAGS_RELEASE "-O3 -fopenmp")

and it seems to compile and use the defined number of cores. Is there anything else to be done? Note that I do not want to parallelize OpenMesh itself, I just want to parallelize my own little app, a single for loop.

Thank you,
Botond
Hi Botond,

You probably should not look at the % utilization of each core. Instead, use the OMP_NUM_THREADS environment variable to run a scaling experiment: measure elapsed time with 1, 2, 4, 8, etc. threads, up to the number of physical cores, to see how close you get to perfect scaling. You also need to be careful to "prime the OpenMP pump" before your timing loop, because OpenMP starts up its threads in a thread pool.

Getting perfect scaling is always tricky with OpenMP or other multi-threading approaches. There can be software issues such as memory contention and locks, or overheads that depend on your hardware. A tool like Intel VTune Amplifier or another thread profiler is essential for understanding the issue and finding the bottlenecks.

Andrew

On Mon, Dec 21, 2020 at 10:00 AM <btyukodi@brandeis.edu> wrote:
_______________________________________________
OpenMesh mailing list -- openmesh@lists.rwth-aachen.de
To unsubscribe send an email to openmesh-leave@lists.rwth-aachen.de
https://lists.rwth-aachen.de/postorius/lists/openmesh.lists.rwth-aachen.de
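The scaling experiment suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `update_mesh`, `time_sweep`, and `scaling_experiment` are hypothetical names, the per-mesh work is a dummy arithmetic loop standing in for the real OpenMesh calculation, and the thread count is set with the `num_threads` clause rather than through OMP_NUM_THREADS so that one run can cover several thread counts. It must be compiled with `-fopenmp`; without that flag the pragma is ignored and every run is serial.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-mesh update; replace with the real work.
// It only reads and writes its own state, so iterations are independent.
inline double update_mesh(double state, int steps) {
    for (int s = 0; s < steps; ++s) {
        state = 0.5 * state + 1.0;  // dummy arithmetic, converges toward 2.0
    }
    return state;
}

// Runs one sweep over all meshes with the requested number of threads and
// returns the elapsed wall-clock time in seconds.
inline double time_sweep(std::vector<double>& meshes, int threads, int steps) {
    (void)threads;  // silences "unused" warnings when built without OpenMP
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(meshes.size());
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        meshes[i] = update_mesh(meshes[i], steps);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Times the same sweep once per thread count, starting from fresh state each
// time, and returns the elapsed seconds in the same order as thread_counts.
inline std::vector<double> scaling_experiment(std::size_t n_meshes,
                                              const std::vector<int>& thread_counts,
                                              int steps) {
    std::vector<double> seconds;
    for (int t : thread_counts) {
        std::vector<double> meshes(n_meshes, 0.0);
        seconds.push_back(time_sweep(meshes, t, steps));
    }
    return seconds;
}
```

Calling `scaling_experiment(8, {1, 2, 4, 8}, 50000000)` and dividing the first entry by each later one gives the speedup per thread count; values close to the thread count indicate near-perfect scaling. The step count is arbitrary and should be large enough that each run takes well over a few milliseconds.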
Hi Andrew,

Thank you for your answer. Indeed, there are many points where this could go wrong. For instance, I just noticed that using the rand() function on separate threads (which I do) may cause locks.

I guess my question is then rather whether OpenMesh itself uses any variables or algorithms that would cause locks when operating on different mesh objects, or whether I can be almost sure that it is my own poor algorithm that causes this. I will certainly look into thread profiling tools; they sound tremendously useful. And yes, scaling is the way to be precise about this. I'm not sure, though, what you mean by "priming the OpenMP pump"...

Thanks again,
Botond
Hi Botond,

Ah yes, our friend rand() is a problem in multithreaded code. You might want to look at some of the C++ random number generators in <random>.

What I meant by "prime the pump" is that, when doing OpenMP benchmarks, the runtime creates its threads the first time an OpenMP loop is encountered. This can really bias benchmarks. So to prime the pump, you should run a simple OpenMP loop before the one you want to time, so that all the OpenMP threads are up before you start the benchmark.

Andrew

On Mon, Dec 21, 2020 at 1:12 PM <btyukodi@brandeis.edu> wrote:
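Both suggestions from this reply could be sketched as below, again only as an illustration. The names `make_engines`, `warm_up_openmp`, and `random_sweep` are hypothetical, and giving each mesh its own `std::mt19937_64` engine is one possible way to use <random> without shared generator state, not something prescribed in the thread. The code needs `-fopenmp` to actually run in parallel; without it the pragmas are ignored and it runs serially.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// One independent engine per mesh, so no generator state is shared between
// threads (unlike rand(), which uses hidden global state).
inline std::vector<std::mt19937_64> make_engines(std::size_t n_meshes,
                                                 std::uint32_t base_seed) {
    std::vector<std::mt19937_64> engines;
    engines.reserve(n_meshes);
    for (std::size_t i = 0; i < n_meshes; ++i) {
        // Mix the mesh index into the seed so every engine gets its own stream.
        std::seed_seq seq{base_seed, static_cast<std::uint32_t>(i)};
        engines.emplace_back(seq);
    }
    return engines;
}

// "Primes the pump": a trivial parallel region that makes the runtime create
// its threads before any timed loop runs. Returns how many threads took part
// (1 when built without OpenMP).
inline int warm_up_openmp() {
    int team_size = 0;
    #pragma omp parallel reduction(+ : team_size)
    {
        team_size += 1;
    }
    return team_size;
}

// Dummy per-mesh update: adds one uniform random number in [0, 1) to each
// mesh value, drawing only from that mesh's own engine.
inline void random_sweep(std::vector<double>& meshes,
                         std::vector<std::mt19937_64>& engines) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(meshes.size());
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        meshes[i] += uniform(engines[i]);
    }
}
```

In a benchmark, `warm_up_openmp()` would be called once before the clock starts. Because each engine is tied to a mesh rather than to a thread, the random sequence seen by a given mesh does not depend on how many threads are used, which also keeps runs reproducible across a scaling experiment.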
participants (2)
- Andrew Cunningham
- btyukodi@brandeis.edu