Turbulence Simulation Using Many Graphics Processors

ORAL

Abstract

Unsteady simulations of turbulence are performed using up to 64 graphics processors on the NSF XSEDE supercomputer Lincoln, located at NCSA. For a $512^3$ simulation, 16 GPUs (Tesla S1070) run about 45 times faster than the same number of CPU cores of the quad-core Intel Harpertown processors on the same machine. The code is optimized to use the fast shared memory on the GPUs and to overlap communication with computation. Results show that the computation is now so fast that, even for large problems with up to 8 million unknowns per GPU, the MPI communication time controls the scaling behavior of the CFD algorithm.
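As a rough illustration of the problem sizes quoted above, the following sketch counts the cells owned by each GPU and the halo cells it must exchange over MPI. It is not taken from the authors' code. It assumes a one-dimensional slab decomposition of the cubic grid, one unknown per grid cell, a one-cell-wide halo, and periodic boundaries so that every slab has two neighbours. The function name `slab_partition` and its parameters are hypothetical.

```python
def slab_partition(n, gpus, halo_width=1):
    """Return (owned cells, halo cells) per GPU for an n^3 grid split into slabs.

    Assumptions (illustrative only): 1D slab decomposition, periodic
    boundaries, one unknown per cell, halo of `halo_width` cell layers.
    """
    if n % gpus != 0:
        raise ValueError("n must be divisible by the GPU count in this sketch")
    # Each GPU owns a slab of n x n x (n / gpus) cells.
    owned = n * n * (n // gpus)
    # With more than one GPU, each slab exchanges a halo with two neighbours.
    neighbours = 2 if gpus > 1 else 0
    halo = neighbours * halo_width * n * n
    return owned, halo


if __name__ == "__main__":
    for gpus in (16, 64):
        owned, halo = slab_partition(512, gpus)
        print(f"{gpus} GPUs: {owned} cells per GPU, "
              f"{halo} halo cells, ratio {halo / owned:.4f}")
```

Under these assumptions, a $512^3$ grid on 16 GPUs gives $2^{23} = 8{,}388{,}608$ cells per GPU, consistent with the figure of about 8 million unknowns per GPU. The halo size per GPU does not shrink as GPUs are added in a slab decomposition, so the ratio of communicated to computed cells grows with the GPU count. This is one plausible reason for communication to dominate scaling once the computation itself is fast.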

*This work is supported by the Department of Defense and Oak Ridge National Laboratory.

Authors

  • Ali Khajeh-Saeed

    • University of Massachusetts Amherst, Mechanical and Industrial Engineering, Amherst, MA 01003, United States
    • Aerospace Engineering Department, Sharif University of Technology, Tehran, Iran
  • J. Blair Perot

    • University of Massachusetts Amherst, Mechanical and Industrial Engineering, Amherst, MA 01003, United States