Authors
Guanbin Xu, Zhihao Le, Yinhe Chen, Zhiqi Lin, Zewen Jin, Youshan Miao, Cheng Li
Abstract
Collective communication libraries are pivotal in optimizing the performance of distributed and parallel deep neural network (DNN) training. AutoCCL is an automated framework that tunes collective communication configurations to significantly improve training throughput.