Authors
Quan Zhou, Haiquan Wang, Xiaoyan Yu, Cheng Li, Youhui Bai, Feng Yan, Yinlong Xu
Abstract
MPress leverages memory-saving inter-operator parallelism to democratize billion-scale DNN model training on a single multi-GPU server, overcoming the GPU memory wall without requiring a distributed cluster: by partitioning a model's operators across the server's GPUs, no single device needs to hold the entire model's memory footprint.
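To make the core idea concrete, below is a minimal PyTorch sketch of inter-operator (layer-wise) parallelism in general, not MPress's actual implementation: consecutive stages of a model are placed on different GPUs so each device holds only a slice of the parameters and activations. The class name `TwoStageModel`, the layer sizes, and the two-GPU layout are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of inter-operator parallelism (not MPress's code):
# each stage of the model lives on a different GPU, so per-device memory
# holds only that stage's parameters and activations.
class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Stage 0 on GPU 0, stage 1 on GPU 1 (requires a 2-GPU server).
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # Activations cross device boundaries between stages.
        return self.stage1(x.to("cuda:1"))

# Usage: a forward pass whose memory load is split across the two GPUs.
model = TwoStageModel()
out = model(torch.randn(8, 1024))
```

This splits the model rather than the data, which is why it saves memory per device; the cost is the inter-GPU activation transfers at stage boundaries.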