AI Model Training Performance: Huawei Xinghe Intelligent Fabric vs. InfiniBand Network
Sponsor: Huawei Technologies Co., Ltd.
Document Number: 224132
Publication Date: 11/11/2024
Page Count: 4
Abstract
The development of AI places higher demands on networks. Data center networks need to adapt more quickly to AI workload requirements and improve communication between compute nodes in order to achieve better model training performance.
Tolly evaluated the performance of Huawei Xinghe Intelligent Fabric and compared it with an InfiniBand (IB) network, using the same AI computing resources, across NCCL, Llama 2, and GPT-3 workloads. Huawei Xinghe Intelligent Fabric delivered an average effective bandwidth 7.91% higher than the IB network. The AI accelerator NSLB algorithm of Huawei Xinghe Intelligent Fabric achieves global load balancing, which distributes network load more evenly than traditional hash-based algorithms and thereby supports higher performance when training large language models.
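The difference between per-flow hashing and load balancing with a global view can be illustrated with a minimal sketch. This is not Huawei's NSLB algorithm, whose details are not given here. It assumes equal-size flows and a simple "least-loaded link" rule, and the function names, flow labels, and link count are all hypothetical.

```python
import zlib

def hash_assign(flows, num_links):
    """Per-flow hashing: each flow picks a link from its own hash only,
    so several flows can collide on one link while others stay idle."""
    loads = [0] * num_links
    for flow in flows:
        link = zlib.crc32(flow.encode()) % num_links
        loads[link] += 1
    return loads

def global_assign(flows, num_links):
    """Global view (illustrative assumption): each flow is placed on the
    link that currently carries the fewest flows."""
    loads = [0] * num_links
    for _ in flows:
        link = loads.index(min(loads))
        loads[link] += 1
    return loads

# Hypothetical flow identifiers; real fabrics hash on packet header fields.
flows = [f"10.0.0.{i}:4791->10.0.1.{i % 8}:4791" for i in range(16)]
hashed = hash_assign(flows, 4)
balanced = global_assign(flows, 4)
print("hash-based loads:", hashed)
print("global loads:    ", balanced)
```

With equal-size flows, the global rule keeps link loads within one flow of each other, whereas the hash-based result depends on how the hashes happen to fall. Training traffic tends to consist of a small number of large flows, which is the case where hash collisions are most costly.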