Huiqiang Jiang (姜慧强)

Research Manager in Qwen,
a fake MLSys/NLPer Google Scholar,
Research focus on Efficient Methods (in LLMs)

An unpopular blogger Blog & Zhihu
A programming enthusiast @iofu728

Phone: +86 178 xxxx xxxx
Email: iofu728[aT]gmail[DoT.]com


Huiqiang Jiang obtained his Master's degree in Software Engineering from Peking University, where he worked with Prof. Xiang Jing. He was also a research intern in the KC Group at Microsoft Research Asia (19/6-21/3), working with Börje Karlsson and Guoxin Wang, and in the search group at Ant Group (20/6-20/8). He was a Research SDE in the Shanghai System Group at Microsoft Research Asia (21/7-25/7).
Huiqiang's research primarily focuses on system-algorithm co-design, particularly efficient methods to accelerate inference and training, with an emphasis on LLMs. Topics include dynamic sparse attention (MInference, RetrievalAttention, MMInference), KV cache-centric analysis (SCBench), prompt compression (LLMLingua), speculative decoding, model compression, sparse inference (PIT), neural architecture search, and efficient tuning. Additionally, he is interested in addressing typical challenges in natural language processing.

I'm actively seeking research interns to collaborate on efficient LLM methods. If you're interested in these research topics, please contact me at iofu728[aT]gmail[DoT]com.

Selected Publications

† equal contribution, ‡ student I advised, § corresponding.

NLP & MLSys

  1. Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling
    Yucheng Li, Huiqiang Jiang§, Yang Xu, Jianxin Yang, Yi Zhang, Yizhong Cao, Yuhao Shen, Fan Zhou, Rui Men, Jianwei Zhang, An Yang, Bowen Yu, Bo Zheng, Fei Huang, Junyang Lin, Dayiheng Liu, Jingren Zhou.

  2. FlashQLA: Flash Qwen Linear Attention
    Chengruidong Zhang, Xi Lin, Huiqiang Jiang, Zekun Wang, Xiao Li, Yizhong Cao, Bohan Zhuang, Rui Men, Jianwei Zhang, Bo Zheng, Junyang Lin, Dayiheng Liu, Jingren Zhou.
    [Code]

  3. Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
    Chujie Zheng, Kai Dang, Bowen Yu, Mingze Li, Huiqiang Jiang, Junrong Lin, Yuqiong Liu, Hao Lin, Chencan Wu, Feng Hu, An Yang, Jingren Zhou, Junyang Lin.

  4. MTraining: Efficient Distributed Training for Ultra-Long Contexts via Dynamic Sparse Attention
    Wenxuan Li, Chengruidong Zhang, Huiqiang Jiang, Yucheng Li, Yuqing Yang, Lili Qiu.
    In Proc. of MLSys'26

  5. SortedRL: Accelerating RL Training for LLMs through Online Length-aware Scheduling
    Yiqi Zhang, Huiqiang Jiang, Xufang Luo, Zhihe Yang, Chengruidong Zhang, Yifei Shen, Dongsheng Li, Yuqing Yang, Lili Qiu, Yang You.
    In ICML Workshop Efficient Systems for Foundation Models (Es-FoMo), 2025

  6. RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
    Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Weiming Zhang, Yuqing Yang, Fan Yang, Mao Yang.
    In Proc. of VLDB'26
    [Code] [Project Page]

  7. MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention
    Yucheng Li, Huiqiang Jiang§, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu.
    In Proc. of ICML'25
    [Code] [Project Page]

  8. SCBench: A KV Cache-Centric Analysis of Long-Context Methods
    Yucheng Li, Huiqiang Jiang§, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu.
    In Proc. of ICLR'25
    [Code] [Project Page] [Dataset]

  9. RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
    Di Liu, Meng Chen, Baotong Lu, Huiqiang Jiang, Zhenhua Han, Qianxi Zhang, Qi Chen, Chengruidong Zhang, Bailu Ding, Kai Zhang, Chen Chen, Fan Yang, Yuqing Yang, Lili Qiu.
    In Proc. of NeurIPS'25
    also appeared in NeurIPS Workshop ENLSP-IV (Best Paper Award), 2024
    [Code]

  10. MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
    Huiqiang Jiang§, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu.
    In Proc. of NeurIPS'24 (Spotlight)
    [Code] [Project Page] [Demo]

  11. LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
    Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang.
    In Proc. of ACL'24 Findings
    [Code] [Project Page] [Demo]

  12. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
    Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu.
    In Proc. of ACL'24
    [Code] [Project Page] [Demo]

  13. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
    Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu.
    In Proc. of EMNLP'23 (Oral)
    [Code] [Project Page] [Demo]

  14. PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
    Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, Yuqing Yang, Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou.
    In Proc. of SOSP'23
    [Code]

Selected Honors & Awards

  • Received the Best Paper Award at ENLSP-IV @ NeurIPS'24, 2024.
  • Named a Top Reviewer at NeurIPS, 2024.
  • Received the Microsoft Global Hackathon Executive Challenge Winner Award, 2023, 2024.
  • Received the Microsoft Machine Learning, AI & Data Science Conference Distinguished Contribution Award, 2024.

Academic Service

  • Area Chair: NeurIPS (26), ICML (26), ICLR (26), ARR (25)
  • Conference Reviewer: ICLR (24/25), NeurIPS (24/25), ICML (25), MLSys (26), ARR (23-25), KDD (25), AAAI (26), EMNLP (23), COLING (24/25)
  • Journal Reviewer: TMLR, TASLP, TIST, TMC

Last Updated: Jun, 2026