|
Huiqiang Jiang obtained his Master's Degree in Software Engineering from Peking University, where he worked with Prof. Xiang Jing.
He was also a research intern in the KC Group at Microsoft Research Asia (19/6-21/3), working with Börje Karlsson and Guoxin Wang, and in the search group at Ant Group (20/6-20/8).
He was a Research SDE in the Shanghai System Group at Microsoft Research Asia (21/7-25/7).
Huiqiang's research focuses primarily on system-algorithm co-design, in particular efficient methods for accelerating inference and training, with an emphasis on LLMs. Topics include dynamic sparse attention (MInference, RetrievalAttention, MMInference), KV cache-centric analysis (SCBench), prompt compression (LLMLingua), speculative decoding, model compression, sparse inference (PIT), neural architecture search, and efficient tuning. He is also interested in addressing typical challenges in natural language processing.
I'm actively seeking research interns to collaborate on efficient LLM methods. If you're interested in these research topics, please contact me at iofu728[aT]gmail[DoT]com.
Selected Publications
† equal contribution, ‡ student I advised, § corresponding.
NLP & MLSys
-
Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling
Yucheng Li†‡, Huiqiang Jiang†§, Yang Xu, Jianxin Yang, Yi Zhang, Yizhong Cao, Yuhao Shen, Fan Zhou, Rui Men, Jianwei Zhang, An Yang, Bowen Yu, Bo Zheng, Fei Huang, Junyang Lin, Dayiheng Liu, Jingren Zhou.
-
FlashQLA: Flash Qwen Linear Attention
Chengruidong Zhang, Xi Lin, Huiqiang Jiang, Zekun Wang, Xiao Li, Yizhong Cao, Bohan Zhuang, Rui Men, Jianwei Zhang, Bo Zheng, Junyang Lin, Dayiheng Liu, Jingren Zhou.
[Code]
-
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Chujie Zheng, Kai Dang, Bowen Yu, Mingze Li, Huiqiang Jiang, Junrong Lin, Yuqiong Liu, Hao Lin, Chencan Wu, Feng Hu, An Yang, Jingren Zhou, Junyang Lin.
-
MTraining: Efficient Distributed Training for Ultra-Long Contexts via Dynamic Sparse Attention
Wenxuan Li†‡, Chengruidong Zhang†, Huiqiang Jiang†, Yucheng Li, Yuqing Yang, Lili Qiu.
In Proc. of MLSys'26
-
SortedRL: Accelerating RL Training for LLMs through Online Length-aware Scheduling
Yiqi Zhang†‡, Huiqiang Jiang†, Xufang Luo†, Zhihe Yang, Chengruidong Zhang, Yifei Shen, Dongsheng Li, Yuqing Yang, Lili Qiu, Yang You.
In ICML Workshop on Efficient Systems for Foundation Models (ES-FoMo), 2025
-
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Weiming Zhang, Yuqing Yang, Fan Yang, Mao Yang.
In Proc. of VLDB'26
[Code]
[Project Page]
-
MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention
Yucheng Li‡, Huiqiang Jiang§, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu.
In Proc. of ICML'25
[Code]
[Project Page]
-
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Yucheng Li‡, Huiqiang Jiang§, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu.
In Proc. of ICLR'25
[Code]
[Project Page]
[Dataset]
-
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu‡, Meng Chen, Baotong Lu, Huiqiang Jiang, Zhenhua Han, Qianxi Zhang, Qi Chen, Chengruidong Zhang, Bailu Ding, Kai Zhang, Chen Chen, Fan Yang, Yuqing Yang, Lili Qiu.
In Proc. of NeurIPS'25
also appeared in NeurIPS Workshop ENLSP-IV (Best Paper Award), 2024
[Code]
-
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang†§, Yucheng Li†‡, Chengruidong Zhang†, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu.
In Proc. of NeurIPS'24 (Spotlight)
[Code]
[Project Page]
[Demo]
-
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang.
In Proc. of ACL'24 Findings
[Code]
[Project Page]
[Demo]
-
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu.
In Proc. of ACL'24
[Code]
[Project Page]
[Demo]
-
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu.
In Proc. of EMNLP'23 (Oral)
[Code]
[Project Page]
[Demo]
-
PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Ningxin Zheng†, Huiqiang Jiang†, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, Yuqing Yang, Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou.
In Proc. of SOSP'23
[Code]
Selected Honors & Awards
- Best Paper Award, ENLSP-IV @ NeurIPS'24, 2024.
- Top Reviewer, NeurIPS, 2024.
- Microsoft Global Hackathon Executive Challenge Winner, 2023 and 2024.
- Microsoft Machine Learning, AI & Data Science Conference Distinguished Contribution Award, 2024.
Academic Service
- Area Chair: NeurIPS (26), ICML (26), ICLR (26), ARR (25)
- Conference Reviewer: ICLR (24/25), NeurIPS (24/25), ICML (25), MLSys (26), ARR (23-25), KDD (25), AAAI (26), EMNLP (23), COLING (24/25)
- Journal Reviewer: TMLR, TASLP, TIST, TMC
Last Updated: Jun, 2026
|