64-GPU AI Computing Performance Comparison Test: H3C RoCE Network (S9825-G & S9855-G Series Switches) vs. InfiniBand Network

Sponsor: New H3C Technologies Co., Ltd
Abstract

The H3C S9825-G and S9855-G series are next-generation, high-performance, high-density 400GE/100GE Ethernet switches that H3C designed for high-end data centers and AIGC computing scenarios. They support redundant, hot-swappable power supplies and fans.

Tolly evaluated the performance of the NVIDIA Collective Communication Library (NCCL) and of the Llama3 large language model on 64 GPUs under two network architectures. The tests compared an RDMA over Converged Ethernet (RoCE) network built on H3C S9825-8C-G and S9855-32DH-G switches with an InfiniBand (IB) network built on NVIDIA QM9700 switches.
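
For readers unfamiliar with how NCCL collective performance is usually reported, the sketch below shows the bus-bandwidth convention used by NVIDIA's open-source nccl-tests benchmarks for all-reduce. The abstract does not state which metric or tool was used, so this is an assumption for illustration only. The function name and the sample inputs are hypothetical and are not figures from the report.

```python
def allreduce_bus_bandwidth(size_bytes: int, seconds: float, ranks: int) -> float:
    """Return all-reduce bus bandwidth in GB/s (nccl-tests convention).

    Assumption: the all-reduce correction factor 2 * (n - 1) / n is applied
    to the algorithm bandwidth, as nccl-tests does. The report abstract
    does not say which metric was measured.
    """
    if ranks < 2:
        raise ValueError("all-reduce needs at least 2 ranks")
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    # Algorithm bandwidth: payload size divided by elapsed time, in GB/s.
    alg_bw = size_bytes / seconds / 1e9
    # Bus bandwidth normalizes for rank count so results are comparable
    # across cluster sizes.
    return alg_bw * 2 * (ranks - 1) / ranks


if __name__ == "__main__":
    # Hypothetical inputs: 8 GB payload, 0.05 s per operation, 64 GPUs.
    print(f"{allreduce_bus_bandwidth(8 * 10**9, 0.05, 64):.2f} GB/s")
```

Computing the same quantity for runs on each fabric, at the same message size and rank count, gives a like-for-like basis for comparing a RoCE network with an IB network.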

In the RoCE network, the H3C S9825-8C-G switch functioned as the spine device, while the H3C S9855-32DH-G switch served as the leaf device connecting to the servers.

The test results for NCCL and the large language model Llama3 indicate that, under the same workload scenarios, RoCE delivers performance comparable to IB and provides a consistent user experience.