Abstract:
In recent years, heterogeneous clusters equipped with accelerators have become widely used as high-performance computing systems. In such clusters, inter-node communication among accelerators requires several memory copies via CPU memory, and the resulting communication latency causes severe performance degradation. To address this problem, we propose the Tightly Coupled Accelerators (TCA) architecture, which reduces the communication latency between accelerators on different nodes. The TCA architecture communicates directly via the PCIe protocol, which eliminates protocol overhead, such as that associated with InfiniBand (IB) and MPI, as well as memory copy overhead. We constructed the HA-PACS/TCA cluster, which is equipped with the TCA communication board (PEACH2 board) as a proprietary interconnect for GPUs, to enable direct GPU-to-GPU communication across nodes. Our performance evaluation on HA-PACS/TCA shows that the TCA interconnect achieves a latency of 2.3 µs and a bandwidth of 2.7 GB/s for GPU-to-GPU communication.
Authors: Toshihiro Hanawa, University of Tokyo; Yuetsu Kodama, University of Tsukuba; Taisuke Boku, University of Tsukuba; Mitsuhisa Sato, University of Tsukuba