Publication

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training

nnScaler advocates a more general approach that enables domain experts to construct their own search spaces of parallelization plans via three primitives, op-trans, op-assign, and op-order, which capture model transformation and the spatio-temporal scheduling of transformed models; together, these primitives can express any parallelization plan. Compared to DeepSpeed, Megatron-LM, and Alpa, nnScaler achieves up to 3.5x speedup on popular DNN models.

OSDI 2024 / July 2024
deep learning, parallelization, DNN training, distributed systems

Authors

Zhiqi Lin, Youshan Miao, Quanlu Zhang, Fan Yang, Yi Zhu, Cheng Li, Saeed Maleki, Xu Cao, Ning Shang, Yilei Yang, Weijiang Xu, Mao Yang, Lintao Zhang, Lidong Zhou

Abstract

nnScaler enables domain experts to construct custom search spaces for parallel DNN training via three primitives, achieving up to 3.5x speedup over existing solutions such as DeepSpeed, Megatron-LM, and Alpa.
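
As a conceptual illustration of the three primitives, the minimal Python sketch below shows how op-trans, op-assign, and op-order might compose into a simple tensor-parallel-style plan. All class and function names here are hypothetical stand-ins for exposition; they do not reflect nnScaler's actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical illustration of the three primitives described in the paper;
# these names and structures are not nnScaler's real API.

@dataclass
class Op:
    name: str
    device: Optional[int] = None

@dataclass
class Graph:
    ops: list  # list of Op

def op_trans(op, num):
    """op-trans: rewrite one operator into `num` equivalent sub-operators,
    e.g. a matmul split along one tensor dimension."""
    return [Op(f"{op.name}/shard{i}") for i in range(num)]

def op_assign(op, device):
    """op-assign: place a (sub-)operator on a device (spatial scheduling)."""
    op.device = device

def op_order(ops):
    """op-order: fix an execution order among operators sharing a device
    (temporal scheduling); here, simply order by name."""
    return sorted(ops, key=lambda o: o.name)

def build_plan(graph, num_devices):
    """Compose the three primitives into a tensor-parallel-style plan."""
    placed = []
    for op in graph.ops:
        shards = op_trans(op, num_devices)   # transform each operator
        for dev, shard in enumerate(shards): # place shards across devices
            op_assign(shard, dev)
        placed.extend(shards)
    # Produce an ordered per-device schedule.
    return {d: op_order([o for o in placed if o.device == d])
            for d in range(num_devices)}

if __name__ == "__main__":
    g = Graph(ops=[Op("matmul1"), Op("matmul2")])
    for dev, ops in build_plan(g, num_devices=2).items():
        print(f"device {dev}:", [o.name for o in ops])
```

The intent of this sketch is only to show the division of labor the paper describes: op-trans defines how operators may be rewritten, op-assign places the results in space, and op-order sequences them in time, so different compositions of the same three primitives can express data-, tensor-, or pipeline-style parallelism.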