For parallel particle codes that have to be written quickly (while retaining flexibility), task-based parallelism doesn’t always work well. The usual approach in those situations is some form of domain decomposition, together with a good deal of fine-grained code for communication between processes. One tries to balance communication against computation while keeping the computational load evenly distributed. As a rule of thumb, less communication is better.
One approach (among many) for parallelizing a particle-based code, starting from a serial version, consists of the following steps:
Creating the particles on the root/master process.
Scattering the particles to various processes.
Communicating ghost regions at processor boundaries.
Migrating particles that have crossed processor boundaries to the appropriate process.
In the interest of simplicity, we ignore the communication of interparticle forces.
Creating and scattering particles
Particles are created on the master process (P0) and then transferred to other parallel processes during the “scatter” operation. In the animation below, we assume that there are nine processes - P0 through P8. The domain is decomposed into nine squares and the contents of each square are sent to the appropriate process.
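The decomposition into squares can be illustrated with a small sketch. It assumes a rectangular 2D domain split into a regular grid of patches that are numbered row by row; the `Decomposition2D` struct and the `patchIndex` function are hypothetical names used only for illustration and are not taken from the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical description of a rectangular 2D domain split into nx x ny patches.
struct Decomposition2D {
  double xmin, ymin;  // lower corner of the domain
  double xmax, ymax;  // upper corner of the domain
  int nx, ny;         // number of patches in x and y
};

// Return the index (0 .. nx*ny - 1) of the patch that contains (x, y).
// Assumption: patches are numbered row by row, index = iy * nx + ix.
inline int patchIndex(const Decomposition2D& d, double x, double y) {
  assert(d.nx > 0 && d.ny > 0 && d.xmax > d.xmin && d.ymax > d.ymin);
  const double dx = (d.xmax - d.xmin) / d.nx;  // patch width
  const double dy = (d.ymax - d.ymin) / d.ny;  // patch height
  int ix = static_cast<int>(std::floor((x - d.xmin) / dx));
  int iy = static_cast<int>(std::floor((y - d.ymin) / dy));
  // Clamp so that points on (or outside) the domain boundary
  // are assigned to the nearest edge patch.
  ix = std::min(std::max(ix, 0), d.nx - 1);
  iy = std::min(std::max(iy, 0), d.ny - 1);
  return iy * d.nx + ix;
}
```

With nine processes and a 3 x 3 grid, the returned index can serve directly as the rank of the destination process, under the assumption that patch k is owned by process Pk.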
A possible MPI implementation of the scattering process is described below. For convenience we use the boost::mpi wrappers around MPI calls in most cases. However, some MPI calls do not have associated Boost calls and we have to use the MPI calls directly.
The first step is to set up the MPI communicator and determine the rank (and MPI coordinates in a virtual Cartesian topology) of the current process:
In the above, the IntVec class is an std::array<int, 3>.
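To make the relation between a rank and its Cartesian coordinates concrete, here is an MPI-free sketch of the mapping. It assumes the row-major rank ordering that the MPI standard prescribes for Cartesian topologies (the last coordinate varies fastest); the functions `rankToCoords` and `coordsToRank` are hypothetical helpers written for illustration, not part of MPI or Boost. In an actual run, the MPI library itself provides this mapping.

```cpp
#include <array>
#include <cassert>

using IntVec = std::array<int, 3>;

// Convert a rank into Cartesian coordinates for a process grid of size
// dims[0] x dims[1] x dims[2]. Assumption: row-major ordering, i.e. the
// last coordinate varies fastest as the rank increases.
inline IntVec rankToCoords(int rank, const IntVec& dims) {
  assert(dims[0] > 0 && dims[1] > 0 && dims[2] > 0);
  assert(rank >= 0 && rank < dims[0] * dims[1] * dims[2]);
  IntVec coords;
  coords[2] = rank % dims[2];
  coords[1] = (rank / dims[2]) % dims[1];
  coords[0] = rank / (dims[1] * dims[2]);
  return coords;
}

// Inverse mapping: Cartesian coordinates back to a rank.
inline int coordsToRank(const IntVec& coords, const IntVec& dims) {
  for (int i = 0; i < 3; ++i) {
    assert(coords[i] >= 0 && coords[i] < dims[i]);
  }
  return (coords[0] * dims[1] + coords[1]) * dims[2] + coords[2];
}
```

For the nine-process example, a process grid of 3 x 3 x 1 places rank 4 at coordinates (1, 1, 0), the centre of the grid.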
The scatter operation
In the scatter operation, each particle is assigned to a patch, and the particles belonging to each patch are then sent to the appropriate process using the asynchronous isend operation:
Here ParticlePArray is a std::vector<ParticleP> and ParticleP is a std::shared_ptr<Particle>. The Particle class contains particle data. For simplicity, we do not consider the performance implications of an array of structures (as used in this implementation) versus a structure of arrays (which is often more efficient).
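The assignment step that precedes the sends can be sketched without any MPI calls. The sketch below assumes a 2D domain [0, lx) x [0, ly) divided into nx x ny equal patches numbered row by row; the `Particle` struct shown here is a stripped-down, hypothetical stand-in that stores only a position, and `binParticles` is an illustrative name rather than a function from the actual code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical minimal particle: only a 2D position is stored.
struct Particle {
  std::array<double, 2> pos;
};
using ParticleP = std::shared_ptr<Particle>;
using ParticlePArray = std::vector<ParticleP>;

// Sort particles into one ParticlePArray per patch.
// Assumption: patch index = iy * nx + ix on a regular nx x ny grid.
inline std::vector<ParticlePArray> binParticles(const ParticlePArray& particles,
                                                double lx, double ly,
                                                int nx, int ny) {
  assert(nx > 0 && ny > 0 && lx > 0.0 && ly > 0.0);
  std::vector<ParticlePArray> bins(static_cast<std::size_t>(nx) * ny);
  for (const auto& p : particles) {
    int ix = static_cast<int>(p->pos[0] / lx * nx);
    int iy = static_cast<int>(p->pos[1] / ly * ny);
    // Clamp so that particles on or beyond the domain boundary
    // end up in the nearest edge patch.
    ix = std::min(std::max(ix, 0), nx - 1);
    iy = std::min(std::max(iy, 0), ny - 1);
    bins[static_cast<std::size_t>(iy) * nx + ix].push_back(p);
  }
  return bins;
}
```

In a scatter built on this sketch, the master process would keep its own bin and send each remaining bin to the process that owns the corresponding patch. Because the bins hold shared pointers, it is the particle data they point to, not the pointers themselves, that would have to be serialized and sent.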
In the next part of this series, we will discuss two approaches for inter-patch communication for particle-based simulations.
If you have questions/comments/corrections, please contact banerjee at parresianz dot com dot zen (without the dot zen).