
64-GPU AI Performance Comparison Test and Automated O&M Test: H3C RoCE Network AD-DC Path Navigation Solution vs. Traditional ECMP Solution

Sponsor: New H3C Technologies Co., Ltd

Abstract

New H3C Technologies commissioned Tolly to evaluate the performance and operational benefits of its AD-DC path navigation solution in a 64-GPU AI environment. The project compared AD-DC against traditional ECMP load balancing under the same RoCE-based training scenario, measuring collective-communication bandwidth and validating automated operations and maintenance (O&M) capabilities for fault detection and root-cause analysis.


The test bed used eight servers, each equipped with eight NVIDIA H20 GPUs and eight 400G NICs, for a total of 64 GPUs. The network fabric used H3C S9827-128DH switches in both spine and leaf roles (two spine switches and four leaf switches) in a RoCE architecture. The software stack included Ubuntu 22.04.4, CUDA 12.4, and NCCL 2.22.3. Tolly compared two traffic-engineering methods on the same hardware: H3C’s AD-DC path navigation, which actively plans service traffic paths based on topology and traffic characteristics, and a traditional ECMP hash-based approach.
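
The difference between hash-based and planned path selection can be illustrated with a minimal sketch. This is not H3C's algorithm: the report does not describe AD-DC's internals or the switches' hash function, so the uplink count, the flow tuples, the SHA-256 stand-in hash, and the round-robin planner below are all assumptions made for illustration. The only fixed value is UDP destination port 4791, which RoCEv2 uses.

```python
import hashlib
from collections import Counter

def ecmp_pick(flow, n_links):
    """Pick an uplink by hashing the flow tuple (stand-in for a switch's ECMP hash)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links

def planned_pick(flow_index, n_links):
    """Assign flows round-robin, a simple stand-in for globally planned paths."""
    return flow_index % n_links

def max_link_load(assignments):
    """Number of flows on the busiest uplink."""
    return max(Counter(assignments).values())

# Hypothetical scenario: 8 equal-rate flows leaving one leaf over 8 uplinks.
N_LINKS = 8
flows = [(f"10.0.0.{i}", f"10.0.1.{i}", 49152 + i, 4791) for i in range(8)]

ecmp = [ecmp_pick(f, N_LINKS) for f in flows]
planned = [planned_pick(i, N_LINKS) for i in range(len(flows))]

print("ECMP busiest uplink carries", max_link_load(ecmp), "flow(s)")
print("Planned busiest uplink carries", max_link_load(planned), "flow(s)")
```

With as many flows as uplinks, the planned assignment places exactly one flow per uplink, while a hash can map two or more flows to the same uplink and leave others idle. AI training traffic tends to consist of a small number of large flows, which is the case where such collisions matter most.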


In NCCL Ring-AllReduce testing, AD-DC showed clear performance gains over ECMP at larger message sizes. Average bus bandwidth increased from 212.644 GB/s with ECMP to 233.262 GB/s with AD-DC, a 9.70% improvement overall. While the two methods were close at smaller message sizes, AD-DC’s advantage became substantial as transfer sizes increased, reaching 10.43% at 512 MB, 11.29% at 1 GB, 17.32% at 8 GB, and 22.76% at 16 GB. Tolly presents this as evidence that globally coordinated path planning can improve traffic load sharing and reduce congestion in large-scale AI communication workloads.
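
The overall improvement figure can be checked from the two averages quoted above. The sketch below also shows the bus-bandwidth conversion that the nccl-tests documentation gives for AllReduce (algorithm bandwidth scaled by 2(n-1)/n). That the report's numbers were produced with this exact conversion is an assumption, and the function names are illustrative.

```python
def improvement_pct(baseline, candidate):
    """Relative gain of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0

def allreduce_bus_bw(size_bytes, seconds, n_ranks):
    """AllReduce bus bandwidth as defined by nccl-tests: algbw * 2(n-1)/n."""
    alg_bw = size_bytes / seconds
    return alg_bw * 2 * (n_ranks - 1) / n_ranks

# Average bus bandwidth from the report, in GB/s.
ECMP_AVG = 212.644
ADDC_AVG = 233.262

print(f"{improvement_pct(ECMP_AVG, ADDC_AVG):.2f}%")
```

For 64 ranks the scaling factor is 2 × 63 / 64 ≈ 1.97, so bus bandwidth is close to twice the algorithm bandwidth.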


The report also covers operations and maintenance capabilities. Tolly verified that AD-DC can monitor key lossless-network metrics, including optical module status, switch interface bandwidth utilization, interface byte rate, PFC pause-frame counts, and ECN metrics. According to the screenshots and table on pages 3 through 6 of the report, the system can also compare time periods at both the fabric level and the switch-interface-queue level, correlating changes in NIC throughput, PFC frame rate, and ECN packet activity to help identify faults and their root causes during model training.
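
The idea of comparing two time periods across correlated metrics can be sketched as follows. This is an illustration only, not the AD-DC implementation: the metric names, sample values, and thresholds are hypothetical and do not come from the report.

```python
from statistics import mean

def compare_windows(baseline, current):
    """Percent change of each metric's mean between two time windows."""
    changes = {}
    for name, base_samples in baseline.items():
        base = mean(base_samples)
        cur = mean(current[name])
        changes[name] = (cur - base) / base * 100.0
    return changes

def looks_congested(changes, drop_pct=10.0, rise_pct=50.0):
    """Flag a window where throughput fell while PFC and ECN activity rose.

    The thresholds are illustrative defaults, not values from the report.
    """
    return (
        changes["nic_throughput_gbps"] < -drop_pct
        and changes["pfc_pause_fps"] > rise_pct
        and changes["ecn_marked_pps"] > rise_pct
    )

# Hypothetical samples for one switch interface queue.
baseline = {
    "nic_throughput_gbps": [380, 384, 382],
    "pfc_pause_fps": [10, 12, 14],
    "ecn_marked_pps": [100, 110, 90],
}
current = {
    "nic_throughput_gbps": [300, 310, 290],
    "pfc_pause_fps": [120, 130, 110],
    "ecn_marked_pps": [900, 1000, 1100],
}

changes = compare_windows(baseline, current)
print(changes, looks_congested(changes))
```

A throughput drop that coincides with a rise in PFC pause frames and ECN marking on the same queue points toward congestion on that path rather than, for example, a slowdown on the host side.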


Overall, the report positions H3C’s AD-DC path navigation as both a performance optimization and an operational intelligence layer for RoCE-based AI fabrics. In this evaluation, it improved NCCL collective bandwidth over traditional ECMP while also providing richer observability for maintaining stability in large-scale model training environments.