Authors
Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng Li
Abstract
DHeLlam targets communication bottlenecks in distributed LLM training by introducing automatic micro-batch co-execution, improving overall training efficiency.