
Teng Ma / 马腾

Staff Engineer & Research Leader, Alibaba Apsara Lab

I was born in Anhui, China. I am a Staff Engineer (P8) and Research Leader in the Operating System Lab (OSLab) MLSys group at Alibaba Cloud. I received my Ph.D. from Tsinghua University, where I was advised by Prof. Yongwei Wu and Prof. Kang Chen and worked closely with Dr. Mingxing Zhang and Prof. Xuehai Qian. I then did postdoctoral research at CASIA and Alibaba, advised by Dr. Zhengyu He and Prof. Zhaoxiang Zhang. I also spent half a year as a visiting student in Prof. Shan Lu's group at the University of Chicago. I build innovative software systems that exploit new hardware and kernel features in emerging architectures such as memory disaggregation and LLM serving. I maintain and contribute to open-source projects including Mooncake, SGLang, Dynamo, RBG, and AIGW.

News

43+ Publications · 680+ Citations · h-index 13 · 5K+ GitHub Stars

Publications

43+ papers at top systems venues: SOSP (2), ASPLOS (2), ATC (4), EuroSys (2), VLDB (2), ICDE, TPDS (3), ToN, ToS, DAC (2), CLUSTER (2), SC, INFOCOM. The full list is on Google Scholar.

2026
Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference
X Sun, S Chen, P Ma, ..., T Ma, ...
arXiv 2026
ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL
W Gao, Y Zhao, ..., T Ma, ...
arXiv 2026
LightDSA: Enabling Efficient DSA Through Hardware-Aware Transparent Optimization
Y Wang*, T Ma*, Y Luo, D He, Z Liu, Y Chai
EuroSys 2026 Co-first
Bridging the GPU Utilization Gap: Predictive Multi-Dimensional Resource Scheduling for AI Workloads
Y Lu, D He, T Ma, Z Liu, L Ruan, J Jiang, Y Wu
EuroSys 2026
Pooling Engram Conditional Memory in Large Language Models using CXL
R Ma, T Ma*, Z Su, H Zha, et al.
EuroMLSys 2026 Corresponding
TENT: A Declarative Slice Spraying Engine for Data Movement in Disaggregated LLM Serving
F Ren, R Qin, T Ma, S Cai, et al.
FAISys Workshop 2026
Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service
X Zheng, Z Wang, R Ma, ..., T Ma, ...
arXiv 2026
Shard: A Scalable and Resize-Optimized Hash Index on Disaggregated Memory
H Zha, T Ma*, B Lu, Y Wang, et al.
VLDB 2026 Corresponding
SHMemora: Protective Key-Value Store on Distributed Shared Memory
J Luo, S Lin, Y Xu, S Liu, J Xia, D Liu, Z Liu, H Zhang, T Ma*, S Deng*
ICDE 2026 Corresponding
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
R Qin, Z Li, W He, J Cui, H Tang, F Ren, T Ma, S Cai, Y Zhang, M Zhang
ACM ToS 2026
2025
Tokencake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications
Z Bian, F Wu, T Ma, Y Zhuo
arXiv 2025
TokenSim: Enabling Hardware and Software Exploration for LLM Inference Systems
F Wu, Z Bian, G Duan, T Xu, J Wu, T Ma, Y Yao, R Gong, Y Zhuo
APPT 2025
MemTunnel: A CXL-based Rack-Scale Host Memory Pooling Architecture for Cloud Service
T Guan, Y Guan, Z Du, ..., T Ma, Z Liu, et al.
IEEE TPDS 2025
DSA-2LM: A CPU-Free Tiered Memory Architecture with Intel DSA
R Liu, T Ma*, M Zhang, J Huang, et al.
ATC 2025 Corresponding
Utilizing Contrastive Learning for Locating Network Anomalies in Real-time Conferencing
T Ma, D He, Z Ming, J Xu, L Cui, Y Chai
ICME 2025 1st Author
CXL-Interplay: Unraveling and Characterizing CXL Interference in Modern Computer Systems
S Mao, J Luo, ..., Z Liu, T Ma*, S Deng*
DAC 2025 Corresponding
RAGNAR: Exploring Volatile-Channel Vulnerabilities on RDMA NIC
Y Xu, Y Fan, T Ma, S Deng
DAC 2025
2024
TrEnv: Transparently Share Serverless Execution Environments Across Different Functions and Nodes
J Huang, M Zhang*, T Ma*, Z Liu, S Lin, K Chen, et al.
SOSP 2024 Corresponding
HydraRPC: RPC in the CXL Era
T Ma, Z Liu, C Wei, J Huang, Y Zhuo, et al.
ATC 2024 1st Author
Revisiting Distributed Memory in the CXL Era
T Ma, M Zhang, K Chen, J Huang, Z Liu, Y Wu
HotInfra 2024 1st Author
ZERO+: Monitoring Large-Scale Cloud-Native Infrastructure Using One-Sided RDMA
Z Song, J Wu, T Ma*, Z Wang, L Kong, et al.
IEEE/ACM ToN 2024 Corresponding
Diagnosing Application-network Anomalies for Millions of IPs in Production Clouds
Z Wang, H Hu, L Kong, ..., T Ma, et al.
ATC 2024
LogGenius: An Unsupervised Log Parsing Framework with Zero-shot Prompt Engineering
X Yu, S Nong, D He, W Zheng, T Ma, N Liu, J Li, G Xie
ICWS 2024
2023
Partial Failure Resilient Memory Management System for (CXL-based) Distributed Shared Memory
M Zhang*, T Ma*, J Hua, Z Liu, K Chen, et al.
SOSP 2023 Co-first
Efficient Scheduler Live Update for Linux Kernel with Modularization
T Ma, S Chen, Y Wu, E Deng, Z Song, Q Chen, M Guo
ASPLOS 2023 1st Author
DySched: Relieving Large-Scale RDMA Incast for Cloud-Native Applications
J Wu, Z Wang, T Ma, L Kong, et al.
ISPA 2023
LigBee: Symbol-Level Cross-Technology Communication from LoRa to ZigBee
Z Wang, L Kong, ..., T Ma, Z Song, Z Liu, G Chen
IEEE INFOCOM 2023
2022
A Survey of Storage Systems in the RDMA Era
S Ma*, T Ma*, K Chen, Y Wu
IEEE TPDS 2022 Co-first
Zero Overhead Monitoring for Cloud-native Infrastructure using RDMA
Z Wang, T Ma, L Kong, et al.
ATC 2022
Log-ROC: Log Structured RAID on Open-Channel SSD
T Ma, Z Li, N Liu
ICCD 2022 1st Author
SeqDLM: A Sequencer-based Distributed Lock Manager for Efficient Shared File Access
Q Chen, S Ma, K Chen, T Ma, X Liu, D Chen, Y Wu, Z Chen
SC 2022
2021
Thinking More about RDMA Memory Semantics
T Ma, K Chen, S Ma, Z Song, Y Wu
CLUSTER 2021 1st Author
HybridSkipList: A Case Study of Designing Distributed Data Structure with Hybrid RDMA
T Ma, N Liu, D He
COMPSAC 2021 1st Author
CUBIST: High-Quality 360-Degree Video Streaming via Tile-based Edge Caching
D He, T Ma, J Jiang, C Westphal, et al.
ICWS 2021
2020
AsymNVM: An Efficient Framework for Implementing Persistent Data Structures on Asymmetric NVM Architecture
T Ma, M Zhang, K Chen, X Qian, Z Song, Y Wu
ASPLOS 2020 1st Author
2019
X-RDMA: Effective RDMA Middleware in Large-scale Production
T Ma, T Ma, Z Song, J Li, H Chang, K Chen, H Jiang, Y Wu
CLUSTER 2019 1st Author
2018
RF-RPC: Remote Fetching RPC Paradigm for RDMA-Enabled Network
Y Wu, T Ma, M Su, M Zhang, K Chen, Z Guo
IEEE TPDS 2018
Large Scale Communication in Cloud Needs Hybrid RDMA Schema
T Ma, M Zhang, Z Song, M Liu, K Chen, Y Wu
OSDI 2018 (Poster) 1st Author
NVM Allocator in Disaggregation Era
T Ma, M Zhang, D He, K Chen, Y Wu
OSDI 2018 (Poster) 1st Author
2016
Measuring and Optimizing Distributed Array Programs
M Zhang, Y Wu, K Chen, T Ma, W Zheng
VLDB 2016

Open Source

Mooncake · 5K+ GitHub stars

Maintainer · PyTorch Foundation Project

KVCache-centric disaggregated architecture for LLM serving. 172 contributors. Adopted by Alibaba, Ant, JD, Tencent, iFLYTEK, Meituan.

LLM Inference · KV Cache · Disaggregation · RDMA

SGLang

Committer · 6K+ LOC

High-performance LLM serving. Co-designed PD/EPD Disaggregation, HiCache, Checkpoint Engine, Sparse KVCache.

LLM Serving · HiCache · PD Disagg

Dynamo

Committer · NVIDIA

NVIDIA's inference framework. Transfer Engine integration for training-inference heterogeneous communication.

NVIDIA · Inference · Elastic EP

RBG & AIGW

Committer

AI serving gateway and routing. Designed the full AI serving stack (SGLang + Mooncake + AIGW + RBG), recognized as "Most Influential Open Source Project" by InfoQ.

AI Gateway · Routing · InfoQ

LMSys Projects

(Co-)led multiple projects

HiCache, PD Disaggregation, EPD Disaggregation, Chunked Pipeline Parallel, rfork, GB200 Deployment, Kimi K2.

LMSys Blog · Production

Contributor Projects

Contributor · VERL, LMCache, vLLM, LMDeploy, ROLL, Checkpoint Engine

End-to-end training frameworks, KV cache connectors, and inference engine integrations across the LLM ecosystem.

Training · vLLM · RL

Talks & Presentations

2026.06
Memory-Aware KVCache Optimization for Large Models
AICon Global AI Development and Application Conference · Shanghai
2026.03
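Roundtable Moderator, "A New Intelligent-Computing Ecosystem: How Heterogeneous AI Compute Infrastructure Drives LLM Deployment Across All Scenarios"
OpenAnolis x SGLang Intelligent Computing MeetUp · Shanghai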
2025.10
LLM Inference Systems: From Homogeneous to Disaggregated
CNCC 2025 Open-Source AI Infrastructure Forum
2025.08
Mooncake: A KVCache-centric Inference Optimization Approach for Long Contexts
AICon Global AI Development and Application Conference · Shenzhen
2025.08
Foundational Software and the Open-Source Ecosystem amid the Evolution of AI Inference Systems
CCF China Open Source Conference · Shanghai
2025.05
Latest Progress of the Mooncake Project
SGLang x MUSA Meetup (Moore Threads) · Beijing
2025.02
SGLang and LMCache: Efficient PD Disaggregation Frameworks Built on Mooncake
GDC 2025 Global Developer Conference · Shanghai
2024.12
Revisiting Distributed Memory in the CXL Era
UT Arlington, TX, USA
2024.07
HydraRPC: RPC in the CXL Era
SAMSUNG · San Jose, CA, USA
2023.03
Efficient Scheduler Live Update for Linux Kernel with Modularization
ASPLOS 2023 · Vancouver, Canada
2022.05
Plugsched: A Safe and Efficient Live Update Approach for Cloud OS Scheduler
ChinaSys 2022 · Guiyang
2020.09
Designing Persistent Data Structures on Asymmetric NVM Architectures
Xia Peisu Young Scholars Forum · CAS, Beijing
2020.03
AsymNVM: Persistent Data Structures on Asymmetric NVM Architecture
ASPLOS 2020 · Lausanne, Switzerland
2019.09
X-RDMA: Effective RDMA Middleware in Large-scale Production
IEEE CLUSTER 2019 · Albuquerque, USA
2018.10
Large Scale Communication in Cloud Needs Hybrid RDMA Schema
OSDI 2018 · Carlsbad, USA
2018.08
Report on the HPC-AI Competition
NSCC · Singapore

Experience

Dec 2023 – Present
Staff Engineer (P8), Research Leader · Alibaba Apsara Lab
OSLab, MLSys · Beijing
Leading LLM training/inference infrastructure. Managing an 8-person engineering team.
Jun 2021 – Dec 2023
Senior Engineer (P7) · Alibaba DAMO Academy
OSLab, MLSys · Beijing
Leveraging new hardware (RDMA/NVM/CXL/DSA) for Function Computing & LLM Serving.
Jul 2021 – Dec 2023
Postdoc · Institute of Automation, CAS & Alibaba
Advisors: Prof. Zhaoxiang Zhang & Dr. Zhengyu He (Ant Group CTO)
Sep 2018 – Nov 2019
Research Intern · Alibaba Inc.
Designed the RDMA middleware X-RDMA, deployed on 3000+ nodes in Alibaba Cloud.
Sep 2016 – Jun 2021
Ph.D. in Computer Science · Tsinghua University
Advisors: Prof. Yongwei Wu (IEEE Fellow) & Prof. Kang Chen
Jan 2020 – May 2020
Visiting Ph.D. Student · University of Chicago
Advisor: Prof. Shan Lu
Sep 2012 – Jun 2016
B.S. in Computer Science · USTB
University of Science and Technology Beijing

Honors & Awards

OS2ATC Open Source Contribution Award · 2026
ChinaSys Best Poster (5/88) · 2024
CCF TCSS Outstanding Doctoral Dissertation · 2023
China Postdoctoral Science Foundation (80K RMB) · 2023
AliStar Fellowship · 2021
Top 10 Research Intern, Alibaba · 2020
Huawei Scholarship · 2020
Renmin Scholarship (First Prize) · 2019
Sohu Scholarship · 2019
1st Prize, 6th RDMA Programming Competition · 2018
1st Place, APAC HPC-AI Competition · 2018
3rd Prize, 5th RDMA Programming Competition · 2017
National Scholarship · 2015
Bronze, ACM-ICPC Asia Shanghai Regional · 2015

Academic Service

Patents

US Patent 10,613,992 · US Patent 11,237,925 · CN202410712069 · CN202311072436 · CN202310271211 · CN202211510303 · CN202210682464 · CN202210557119 · CN202310573810 · CN202010427533