BIO
I am a third-year Ph.D. student in the School of Computer Science at Carnegie Mellon University (CMU), specializing in machine learning and software engineering. My research focuses on Large Language Model post-training: making LLMs better (more closely aligned with domain-specific tasks), faster (more efficient in training and inference), and cheaper (requiring fewer GPU hours and less GPU memory).
At CMU, I am advised by Prof. Heather Miller. Previously, I earned my master's degree in Computer Engineering from New York University, advised by Prof. Anna Choromanska and Prof. Parijat Dube. I received my B.S. in Computer Science and Engineering from The Chinese University of Hong Kong (CUHK), where I worked with Prof. David (Dapeng) Zhang and Prof. Rui Huang. Before starting my Ph.D., my research focused mainly on distributed machine learning systems.
- (This personal website was last updated in May 2025.)
News
- 5-2025: I will intern at AWS AI Labs (Amazon) this summer, working on speculative decoding enabled by LLM post-training via latent-space reasoning.
- 2-2025: Open-sourced SMT. We implemented SMT in two frameworks: DeepSpeed and the Hugging Face Trainer.
- 2-2025: Our paper “SMT: Fine-Tuning Large Language Models with Sparse Matrices” has been accepted to ICLR 2025.
- 5-2024: Our paper “Adjacent Leader Decentralized Stochastic Gradient Descent” has been accepted to ECAI 2024.
- 4-2024: Our paper “Multi-View Radar Autoencoder for Self-Supervised Automotive Radar Representation Learning” has been accepted to the IEEE Intelligent Vehicles Symposium (IV) 2024.
- 2023: I started my Ph.D. journey at CMU.
- 2-2023: Open-sourced a general codebase for implementing any (de)centralized, (a)synchronous distributed SGD algorithm when the model fits on a single machine, accompanying a paper that proposes a novel distributed SGD algorithm.
Selected Publications
Haoze He, Juncheng Billy Li, Xuan Jiang, Heather Miller, “SMT: Fine-Tuning Large Language Models with Sparse Matrices”, International Conference on Learning Representations (ICLR), Accepted, Jan. 2025. [code]
Haoze He, Jing Wang, Anna Choromanska, “Adjacent Leader Decentralized Stochastic Gradient Descent”, European Conference on Artificial Intelligence (ECAI), Accepted, June 2024. [code]
My full publication list can be found on my Google Scholar profile.
Academic Blog
- Peter Zhong, Haoze He, Omar Khattab, Christopher Potts, Matei Zaharia, Heather Miller, “A Guide to Large Language Model Abstractions”, Jan. 2024.
Education
- Ph.D. in Machine Learning and Software Engineering at Carnegie Mellon University, 2023–present
- GPA: 4.16/4.0, Rank: top 1%
- M.S. in Computer Engineering at New York University, 2021–2023
- GPA: 3.93/4.0, Rank: top 1%
- B.S. in Computer Science and Engineering at The Chinese University of Hong Kong, 2016–2020
Work Experience
- Applied Research Scientist Intern, Amazon AWS AI Labs, Summer 2025
- Teaching Assistant, Carnegie Mellon University, LTI at SCS, Large Language Model Systems (11-868), Spring 2025
- Research Assistant, Carnegie Mellon University, S3D at SCS, 2023–present
- Research Assistant, New York University, Tandon School of Engineering, 2022–2023
Awards
- Presidential Fellowship, Carnegie Mellon University, Nov. 2024
Service
- Reviewer, International Conference on Learning Representations (ICLR) — 2025, 2026
- Reviewer, International Joint Conference on Neural Networks (IJCNN) — 2025
- Reviewer, AAAI Conference on Artificial Intelligence (AAAI) — 2025
- Reviewer, International Conference on Acoustics, Speech, and Signal Processing (ICASSP) — 2022–2025
- Reviewer, International Conference on Computer Vision (ICCV) Workshops — 2023
Open Source for the Community
- Built and maintain an open-source website for the NYU EECS/DS community that helps 150+ NYU students each semester. The website summarizes the open-source courses in NYU EECS/DS, provides links and repositories for each course, lists the workload, and shares course experiences for reference. Anyone from the NYU community is welcome to fork and contribute!