MLSys @ USTC

Projects

Systems for distributed LLM training and efficient AI infrastructure.

Training | Active

TWIST

High-Efficiency LLM Training System via Strand Interleaving on NVIDIA Hopper GPUs.

llm, training, distributed-systems, hopper

Systems | Active

Adacluster

Adaptive clustering system for large-scale ML workloads with dynamic resource scheduling.

ml-systems, scheduling, resource-management