64-GPU AI Computing Performance Comparison Test: H3C RoCE Network (S12500CR & S9855-G Series Switches) vs. InfiniBand Network
Abstract
The H3C S12500CR is a next-generation flagship switch launched by H3C Technologies Co., Ltd. for AI computing and large-scale model scenarios. It adopts a CLOS+ orthogonal hardware architecture that achieves rate convergence between network and computing nodes and provides a 100% lossless data channel for networking and AI computing. It supports high-density, high-speed interface cards, meeting the requirements of ultra-large-scale data centers and AIGC computing networks for high-density, non-blocking server access.
The H3C S9855-G series switches are next-generation high-performance, high-density 400GE/100GE Ethernet switches designed by H3C for high-end data centers and AIGC computing scenarios. They support redundant hot-swappable power supplies and fans. The S9855-G can be deployed in next-generation data center core and aggregation networks, connecting upstream to S12500 series core switches via 400GE links and downstream to 400GE/200GE/100GE switches, providing high-bandwidth, large-capacity server access.
Tolly evaluated two workloads on a 64-GPU cluster under different network architectures: the NVIDIA Collective Communication Library (NCCL) and the training of a large-scale model (Llama3). Specifically, the test compared an RDMA over Converged Ethernet (RoCE) network built with H3C S12504CR, S12508CR, and S9855-32DH-G switches against an InfiniBand (IB) network built with NVIDIA QM9700 switches. Both the RoCE and IB networks adopted the multi-track topology shown in Figure 1. In the RoCE network, the H3C S12504CR and S12508CR switches functioned as spine devices, while the H3C S9855-32DH-G switch served as the leaf device connecting the servers.
The NCCL and Llama3 test results demonstrated that, under the same workload scenarios, RoCE delivers performance comparable to IB and ensures a consistent user experience.
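This abstract does not say which collectives or metrics were measured in the NCCL comparison. As a rough illustration only, the sketch below assumes an all-reduce benchmark and uses the bus-bandwidth convention documented by the open-source nccl-tests project, where algorithm bandwidth is scaled by 2(n-1)/n for n ranks. The function name and the sample message size and timing are hypothetical and are not taken from the report.

```python
def allreduce_bus_bandwidth(size_bytes: int, elapsed_s: float, n_ranks: int) -> float:
    """Return all-reduce bus bandwidth in GB/s (nccl-tests convention).

    Hypothetical helper for illustration; not part of NCCL or of the report.
    """
    if n_ranks < 2:
        raise ValueError("all-reduce needs at least two ranks")
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    # Algorithm bandwidth: bytes reduced per second, expressed in GB/s.
    alg_bw = size_bytes / elapsed_s / 1e9
    # Scale by 2(n-1)/n so the figure reflects per-link traffic
    # and can be compared across different rank counts.
    return alg_bw * 2 * (n_ranks - 1) / n_ranks


if __name__ == "__main__":
    # Made-up numbers: an 8 GiB all-reduce across 64 GPUs taking 50 ms.
    bw = allreduce_bus_bandwidth(8 * 1024**3, 0.050, 64)
    print(f"bus bandwidth: {bw:.1f} GB/s")
```

Computing the same figure for the RoCE and IB runs at each message size gives a like-for-like basis for a "comparable performance" claim, since the scaling factor depends only on the rank count (64 in both setups).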