Dell PowerSwitch Z9664F-ON 400GbE Ethernet AI Fabric Performance
Abstract
AI training involves massive amounts of data being processed in parallel by GPUs across a collection of high-end servers. Network designers need confidence that their Ethernet networking fabric can handle the demands of AI processing and perform at or near wire speed. The Dell PowerSwitch line offers high performance and high port density and can serve as a network fabric for AI.
Dell Inc. commissioned Tolly to benchmark a 400GbE network fabric consisting of 10 Dell PowerSwitch Z9664F-ON switches running Enterprise SONiC Distribution by Dell Technologies version 4.3.0 in a rail-optimized topology. Eight Dell PowerEdge servers, each outfitted with eight GPUs and eight 400GbE network interfaces, were connected to the switch fabric and tasked with running various AI workloads across it.
Three separate AI workloads were run across the converged Ethernet fabric: 1) RDMA throughput measured using the perftest load generator, 2) the NVIDIA Collective Communications Library (NCCL) test, and 3) LLM fine-tuning. All test traffic was generated from eight Dell PowerEdge XE9680 servers, each outfitted with eight NICs and eight NVIDIA H100 GPUs.
The Dell PowerSwitch Ethernet fabric provided zero-loss handling of AI traffic from the 64 connected NICs/GPUs, delivering ~391Gbps of perftest RDMA throughput and ~390GBps of NCCL benchmark inter-node throughput.
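The headline figures can be sanity-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the report: it assumes the ~391Gbps perftest result is a per-link figure compared against the nominal 400Gbps line rate of a single 400GbE port, and the function names are hypothetical.

```python
# Nominal line rate of one 400GbE port, in Gbps.
LINE_RATE_GBPS = 400.0


def endpoint_count(servers: int, nics_per_server: int) -> int:
    """Total NIC/GPU endpoints attached to the fabric."""
    return servers * nics_per_server


def line_rate_fraction(measured_gbps: float,
                       line_rate_gbps: float = LINE_RATE_GBPS) -> float:
    """Measured throughput as a fraction of the nominal line rate.

    Assumption: measured_gbps is a per-link value, so it is compared
    against the rate of a single port rather than the whole fabric.
    """
    return measured_gbps / line_rate_gbps


endpoints = endpoint_count(servers=8, nics_per_server=8)
fraction = line_rate_fraction(391.0)
print(f"{endpoints} endpoints, {fraction:.2%} of line rate")
```

Under that per-link assumption, the perftest result corresponds to a little under 98% of line rate across the 64 endpoints, which is consistent with the "at or near wire speed" goal stated above.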