64 GPU AI Computing Performance Comparison Test H3C DDC-based RoCE Switch Network vs. InfiniBand Network
Abstract
DDC (Distributed Disaggregated Chassis) is a network architecture that breaks away from the traditional centralized chassis switch design. It takes a distributed, disaggregated approach to enhance the flexibility and scalability of data center networks. Built on hardware technologies such as VOQ (Virtual Output Queue) and cell-based switching, DDC improves link utilization and throughput between the NCP (Network Cloud Packet) and NCF (Network Cloud Fabric) nodes, meeting the stringent requirements of HPC (High-Performance Computing) and AI workloads for low forwarding latency and low packet loss.
Tolly evaluated the performance of the NVIDIA Collective Communication Library (NCCL) and the Llama3 large language model on 64 GPUs under different network architectures. Specifically, the tests compared RDMA over Converged Ethernet (RoCE) with InfiniBand (IB) in a 64-GPU environment. Within the RoCE network, Tolly engineers also assessed the advantages of H3C’s DDC technology over traditional ECMP (Equal-Cost Multi-Path) load balancing.
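To illustrate why the comparison between hash-based ECMP and cell-based forwarding matters, the following toy model contrasts how the two approaches spread traffic over parallel links. It is a minimal sketch under stated assumptions, not H3C's implementation and not the Tolly test method: the flow list, the link count, the 256-byte cell size, and the use of CRC32 as the hash are all hypothetical choices made for illustration.

```python
import zlib


def ecmp_link_loads(flows, num_links):
    """Per-flow hashing: all bytes of a flow go to one link picked by a hash of its 5-tuple."""
    loads = [0] * num_links
    for five_tuple, size in flows:
        key = "|".join(map(str, five_tuple)).encode()
        loads[zlib.crc32(key) % num_links] += size
    return loads


def cell_spray_link_loads(flows, num_links, cell_size=256):
    """Cell spraying: each flow is cut into cells that are placed round-robin on all links."""
    loads = [0] * num_links
    next_link = 0
    for _, size in flows:
        remaining = size
        while remaining > 0:
            chunk = min(cell_size, remaining)
            loads[next_link] += chunk
            next_link = (next_link + 1) % num_links
            remaining -= chunk
    return loads


# Hypothetical workload: 8 equally sized flows over 8 parallel links.
# 4791 is the UDP destination port used by RoCEv2; 17 is the UDP protocol number.
NUM_LINKS = 8
flows = [
    ((f"10.0.0.{i}", f"10.0.1.{i}", 49152 + i, 4791, 17), 1_024_000)
    for i in range(8)
]

ecmp = ecmp_link_loads(flows, NUM_LINKS)
spray = cell_spray_link_loads(flows, NUM_LINKS)

print("ECMP per-link bytes: ", ecmp)
print("Spray per-link bytes:", spray)
print("ECMP busiest link / average:", max(ecmp) / (sum(ecmp) / NUM_LINKS))
```

In this model the sprayed load is even by construction, while the ECMP load depends on where the hash happens to place each flow, so a small number of large flows (typical of collective communication traffic) can collide on one link and leave others idle.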
The test results for NCCL and the large language model Llama3 indicate that DDC-based RoCE delivers performance comparable to IB and provides a consistent user experience in the same workload scenarios.
The NCCL Alltoall test results demonstrate that DDC offers a significant advantage in bus bandwidth (busbw) compared to the traditional ECMP hash-based approach.
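For readers unfamiliar with the busbw metric, the sketch below shows how it is commonly derived for an Alltoall operation, following the convention documented in the nccl-tests project: algorithm bandwidth is the per-rank buffer size divided by elapsed time, and bus bandwidth scales it by (n-1)/n for n ranks. The function name and the input numbers are hypothetical and are not results from this report.

```python
def alltoall_busbw(bytes_per_rank, seconds, num_ranks):
    """Return (algbw, busbw) in GB/s for an Alltoall, per the nccl-tests convention.

    algbw = size / time
    busbw = algbw * (n - 1) / n, since each rank keeps 1/n of its data locally.
    """
    if seconds <= 0 or num_ranks < 1:
        raise ValueError("seconds must be positive and num_ranks at least 1")
    algbw = bytes_per_rank / seconds / 1e9
    busbw = algbw * (num_ranks - 1) / num_ranks
    return algbw, busbw


# Made-up example: 1 GB per rank exchanged in 50 ms across 64 GPUs.
algbw, busbw = alltoall_busbw(1_000_000_000, 0.05, 64)
print(f"algbw = {algbw:.4f} GB/s, busbw = {busbw:.4f} GB/s")
```

Because the correction factor depends only on the rank count, busbw values measured on different fabrics with the same 64-GPU setup can be compared directly.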