AI Model Training Performance: Huawei Xinghe Intelligent Fabric vs. InfiniBand Network

Sponsor: Huawei Technologies Co., Ltd.

Document Number: 224132

Publication Date: 11/11/2024

Page Count: 4

Abstract

AI development places higher demands on networks. Data center networks need to adapt more quickly to AI workload requirements and improve communication between compute nodes in order to achieve better model training performance.


Tolly evaluated the performance of Huawei Xinghe Intelligent Fabric and compared it with an InfiniBand (IB) network in environments with the same AI computing resources, using NCCL, Llama 2, and GPT-3 workloads. Huawei Xinghe Intelligent Fabric delivered an average effective bandwidth 7.91% higher than the IB network. The AI accelerator NSLB algorithm of Huawei Xinghe Intelligent Fabric achieves global load balancing, which distributes network load more evenly than traditional hash algorithms and thereby supports higher performance when training large language models.
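
To illustrate why global load balancing can distribute traffic more evenly than hash-based path selection, the following minimal Python sketch compares the two approaches on a toy topology. It is not Huawei's NSLB algorithm, whose details are not given here. The flow names, flow sizes, link count, the CRC32 hash, and the greedy least-loaded placement are all assumptions chosen for illustration.

```python
import zlib

def hash_assign(flows, num_links):
    """Pick a link per flow from a hash of its identifier (ECMP-style sketch)."""
    loads = [0] * num_links
    for flow_id, size in flows:
        # The hash ignores current link load, so several flows can collide.
        link = zlib.crc32(flow_id.encode()) % num_links
        loads[link] += size
    return loads

def global_assign(flows, num_links):
    """Greedy stand-in for global balancing: largest flows first,
    each placed on the currently least-loaded link."""
    loads = [0] * num_links
    for _flow_id, size in sorted(flows, key=lambda f: f[1], reverse=True):
        link = loads.index(min(loads))
        loads[link] += size
    return loads

# Hypothetical workload: eight equal-size flows over four links.
flows = [(f"gpu{i}->gpu{(i + 1) % 8}", 100) for i in range(8)]
hash_loads = hash_assign(flows, 4)
global_loads = global_assign(flows, 4)

print("hash  :", hash_loads, "max link load =", max(hash_loads))
print("global:", global_loads, "max link load =", max(global_loads))
```

In this toy setup the greedy placement puts two flows on every link, whereas the hash-based result depends on how the flow identifiers happen to hash and may leave some links more heavily loaded than others. The most heavily loaded link tends to bound collective communication time, which is one plausible reason a more even distribution supports higher effective bandwidth.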
