MLSys @ USTC

Publications

Research on distributed training, tensor core optimization, and efficient AI systems.

2025
IEEE TPDS 2025 · February 2025

GLPilot: Efficient Distributed GNN Training With Learnable Embeddings

Chengru Yang, Chaoyi Ruan, Chengjie Tang, Ping Gong, Shiyi Wang, Xiang Song, Cheng Li

GLPilot introduces a staleness-bounded embedding buffering mechanism to reduce remote fetches in distributed GNN training with learnable vertex embeddings.

GNN · distributed training · graph learning · embeddings
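A minimal sketch of the staleness-bounded buffering idea: cached remote embeddings are reused until their fetch step lags the current training step by more than a fixed bound, which is when a remote fetch happens again. All names here (`StalenessBuffer`, `fetch_remote`) are illustrative, not GLPilot's actual API.

```python
class StalenessBuffer:
    """Hypothetical staleness-bounded embedding cache (illustrative only)."""

    def __init__(self, staleness_bound, fetch_remote):
        self.bound = staleness_bound
        self.fetch_remote = fetch_remote  # callable: vertex_id -> embedding
        self.cache = {}                   # vertex_id -> (embedding, step_fetched)
        self.remote_fetches = 0

    def get(self, vertex_id, step):
        entry = self.cache.get(vertex_id)
        if entry is not None and step - entry[1] <= self.bound:
            return entry[0]  # still fresh enough: serve the local copy
        # Stale or missing: fetch from the remote owner and refresh the cache.
        emb = self.fetch_remote(vertex_id)
        self.remote_fetches += 1
        self.cache[vertex_id] = (emb, step)
        return emb
```

With a staleness bound of 2, repeated accesses to the same vertex across steps 0..3 trigger only two remote fetches instead of four, which is the source of the communication savings.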
2024
OSDI 2024 · July 2024

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training

Zhiqi Lin, Youshan Miao, Quanlu Zhang, Fan Yang, Yi Zhu, Cheng Li, Saeed Maleki, Xu Cao, Ning Shang, Yilei Yang, Weijiang Xu, Mao Yang, Lintao Zhang, Lidong Zhou

nnScaler generates efficient parallelization plans for DNN training using three primitives that can express the model transformation and spatiotemporal scheduling of any parallelization plan.

deep learning · parallelization · DNN training · distributed systems
2023
SOSP 2023 · October 2023

SPFresh: Incremental In-Place Update for Billion-Scale Vector Search

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, Mao Yang

SPFresh supports efficient in-place vector updates for billion-scale ANNS through the LIRE lightweight incremental rebalancing protocol.

vector search · ANNS · in-place update · billion-scale
EuroSys 2023 · May 2023

FrozenHot Cache: Rethinking Cache Management for Modern Hardware

Ziyue Qiu, Juncheng Yang, Juncheng Zhang, Cheng Li, Xiaosong Ma, Qi Chen, Mao Yang, Yinlong Xu

FrozenHot improves cache scalability on modern multi-core hardware by separating a frozen hot segment that eliminates per-access synchronization overhead.

cache · storage · scalability · multi-core
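The core idea can be sketched as a two-part cache: hot objects live in a frozen, read-only map that is served without any lock or LRU bookkeeping, while everything else goes through a conventional lock-protected LRU. This is a simplified illustration, not the paper's implementation.

```python
import threading
from collections import OrderedDict


class FrozenHotCache:
    """Illustrative two-segment cache in the spirit of FrozenHot (sketch only)."""

    def __init__(self, frozen_items, dynamic_capacity):
        self.frozen = dict(frozen_items)  # immutable after construction
        self.dynamic = OrderedDict()      # classic LRU for the long tail
        self.capacity = dynamic_capacity
        self.lock = threading.Lock()

    def get(self, key):
        # Fast path: frozen hot objects need no lock and no metadata update.
        if key in self.frozen:
            return self.frozen[key]
        with self.lock:  # slow path: locked LRU with recency update
            if key in self.dynamic:
                self.dynamic.move_to_end(key)
                return self.dynamic[key]
        return None

    def put(self, key, value):
        with self.lock:
            self.dynamic[key] = value
            self.dynamic.move_to_end(key)
            if len(self.dynamic) > self.capacity:
                self.dynamic.popitem(last=False)  # evict least-recent
```

Because hot-object hits skip the lock and the recency bookkeeping entirely, the fast path scales with core count instead of serializing on cache metadata.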
2021
SC 2021 · November 2021

Lunule: An Agile and Judicious Metadata Load Balancer for CephFS

Yiduo Wang, Cheng Li, Xinyang Shao, Youxu Chen, Fang Yan, Yinlong Xu

Lunule proposes an imbalance factor model for accurate metadata load balancing in CephFS, enabling agile and efficient rebalancing.

CephFS · metadata · load balancing · distributed file system
SOSP 2021 · October 2021

Gradient Compression Supercharged High-Performance Data Parallel DNN Training

Youhui Bai, Cheng Li, Quan Zhou, Jun Yi, Ping Gong, Feng Yan, Ruichuan Chen, Yinlong Xu

This paper introduces Tensor Homomorphic Compression (THC) enabling direct aggregation of compressed gradients, supercharging data parallel DNN training.

gradient compression · DNN training · distributed systems · communication
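"Direct aggregation of compressed gradients" means the compression commutes with addition: workers' compressed codes can be summed as-is and decompressed once. A toy sketch of that property, using fixed-scale uniform quantization (an assumed stand-in, not THC's actual scheme):

```python
import numpy as np


def quantize(grad, scale, levels=256):
    # Map floats to small integers under a scale shared by all workers.
    # Because every worker uses the same scale, integer codes from
    # different workers can be summed directly before decompression.
    half = levels // 2
    return np.clip(np.rint(grad / scale), -half, half - 1).astype(np.int32)


def dequantize(q, scale):
    return q.astype(np.float64) * scale


# Two workers compress locally; the server sums the integer codes and
# dequantizes once, instead of decompressing each gradient first.
g1 = np.array([0.10, -0.20, 0.033])
g2 = np.array([0.05, 0.07, -0.01])
scale = 0.01
aggregated = dequantize(quantize(g1, scale) + quantize(g2, scale), scale)
```

Here `aggregated` stays within one quantization step of the true sum `g1 + g2`, while the aggregation itself runs entirely on compressed integer codes.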
2018
USENIX ATC 2018 · July 2018

Fine-grained Consistency for Geo-Replicated Systems

Cheng Li, Nuno Preguiça, Rodrigo Rodrigues

This paper presents PoR consistency, a novel fine-grained consistency definition that generalizes the trade-off between performance and coordination in geo-replicated systems.

consistency · geo-replication · distributed systems