Turbulence Simulation using many Graphics Processors
ORAL
Abstract
Unsteady simulations of turbulence are performed using up to 64 graphics processors on the NSF XSEDE supercomputer Lincoln, located at NCSA. For a $512^3$ simulation, 16 GPUs (Tesla S1070) run about 45 times faster than the same number of CPU cores (quad-core Intel Harpertown processors) on the same machine. The code is optimized to use the fast on-chip shared memory of the GPUs and to overlap communication with computation. Results show that computation is now so fast that, even for large problems with up to 8 million unknowns per GPU, MPI communication time controls the scaling behavior of the CFD algorithm.
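The figure of about 8 million unknowns per GPU is consistent with a $512^3$ grid split across 16 GPUs ($512^3/16 = 2^{23} \approx 8.4$ million points), assuming one unknown per grid point. The short Python sketch below makes that arithmetic explicit and adds an idealized model of why overlapping cannot hide communication once it outlasts the interior computation. The slab decomposition, the one-cell halo, and the helper names `points_per_gpu`, `slab_halo_ratio`, and `step_time` are hypothetical illustrations, not the authors' code.

```python
"""Back-of-the-envelope workload model for an n^3 grid split across GPUs.

Hypothetical assumptions (not stated in the abstract): a 1-D slab
decomposition along z, one unknown per grid point, and a one-cell-deep
halo exchanged across each interior slab face.
"""


def points_per_gpu(n: int, num_gpus: int) -> int:
    """Grid points owned by each GPU when an n^3 grid is split evenly."""
    total = n ** 3
    if total % num_gpus != 0:
        raise ValueError("grid does not divide evenly across GPUs")
    return total // num_gpus


def slab_halo_ratio(n: int, num_gpus: int) -> float:
    """Halo points exchanged per owned point for an interior slab.

    An interior slab of thickness n // num_gpus exchanges one n*n face
    with each of its two neighbours along z.
    """
    if n % num_gpus != 0:
        raise ValueError("slab thickness must be an integer")
    thickness = n // num_gpus
    halo = 2 * n * n
    owned = n * n * thickness
    return halo / owned


def step_time(t_interior: float, t_boundary: float, t_comm: float,
              overlap: bool) -> float:
    """Idealized wall time of one time step.

    Without overlap, the halo exchange and all computation run back to back.
    With overlap, the interior update runs while halos are in flight, and
    the boundary update starts only after the halos have arrived.
    """
    if overlap:
        return max(t_interior, t_comm) + t_boundary
    return t_interior + t_boundary + t_comm


if __name__ == "__main__":
    print(points_per_gpu(512, 16))   # 8388608 points per GPU
    print(slab_halo_ratio(512, 16))  # 0.0625 halo points per owned point
    # Hypothetical timings: once t_comm exceeds t_interior, overlap only
    # hides the interior work and communication sets the step time.
    print(step_time(1.0, 0.1, 2.0, overlap=True))
    print(step_time(1.0, 0.1, 2.0, overlap=False))
```

In this model, adding GPUs shrinks `t_interior` roughly in proportion to the owned points, while for slabs the halo size per GPU stays fixed at $2n^2$, so the exchange eventually dominates; this is one simple way to see how communication, rather than computation, can come to govern scaling.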
*This work is supported by the Department of Defense and Oak Ridge National Laboratory.