Publication

MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism

It remains challenging to train billion-scale DNN models on a single modern multi-GPU server due to the GPU memory wall. MPress removes this barrier by leveraging memory-saving inter-operator parallelism, enabling users to train previously infeasible large models without a distributed cluster.

HPCA 2023 / February 2023
model training, GPU memory, inter-operator parallelism, LLM

Authors

Quan Zhou, Haiquan Wang, Xiaoyan Yu, Cheng Li, Youhui Bai, Feng Yan, Yinlong Xu

Abstract

MPress leverages memory-saving inter-operator parallelism to democratize billion-scale DNN model training on a single multi-GPU server, overcoming the GPU memory wall without requiring a distributed cluster.
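
To illustrate the general idea of inter-operator parallelism that MPress builds on (this is a minimal sketch, not MPress's implementation), the PyTorch example below partitions a model's layers into two stages placed on different GPUs, so each device holds only its own stage's parameters and activations. The model structure, sizes, and names are hypothetical, and the code assumes a machine with at least two CUDA GPUs.

    # Minimal sketch of inter-operator (model) parallelism: layers are
    # partitioned into stages across two GPUs, so each device stores only
    # the parameters and activations of its own stage. Illustrative only;
    # assumes at least two CUDA devices are available.
    import torch
    import torch.nn as nn

    class TwoStageModel(nn.Module):
        def __init__(self, hidden=4096, layers_per_stage=4):
            super().__init__()
            # Stage 0 lives on GPU 0, stage 1 on GPU 1.
            self.stage0 = nn.Sequential(
                *[nn.Linear(hidden, hidden) for _ in range(layers_per_stage)]
            ).to("cuda:0")
            self.stage1 = nn.Sequential(
                *[nn.Linear(hidden, hidden) for _ in range(layers_per_stage)]
            ).to("cuda:1")

        def forward(self, x):
            x = self.stage0(x.to("cuda:0"))
            # Activations cross the inter-operator boundary between devices.
            x = self.stage1(x.to("cuda:1"))
            return x

    if __name__ == "__main__":
        model = TwoStageModel()
        opt = torch.optim.SGD(model.parameters(), lr=1e-3)
        x = torch.randn(8, 4096)
        y = model(x)            # forward pass spans both GPUs
        loss = y.square().mean()
        loss.backward()         # autograd routes gradients back across devices
        opt.step()

Because no single GPU has to hold the full set of parameters, optimizer states, and activations, this style of partitioning raises the largest model size trainable on one multi-GPU server.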