Xiaofeng Hou
I am an assistant professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University (SJTU). At SJTU, I am a member of the Emerging Parallel Computing Center (EPCC) and the Sustainable Architecture and Intelligence Laboratory (SAIL). Before joining SJTU, I worked with Prof. Kwang-Ting CHENG at the AI Chip Center for Emerging Smart Systems (ACCESS), Hong Kong University of Science and Technology (HKUST). I received my PhD from Shanghai Jiao Tong University under the joint supervision of Prof. Chao Li and Prof. Minyi Guo, and my BS from Dalian University of Technology.
My research addresses critical computing challenges in the era of AI. I specialize in architecture-system co-design for highly efficient intelligent computing. My work spans autonomous edge devices to hyperscale datacenters, with a focus on:
- Efficient Multi-Modal LLM Serving: developing novel hardware and software solutions to reduce the latency and energy cost of large-model inference.
- Automated Architecture/System Design: using automated methods to discover and implement optimal computer architectures and systems for emerging applications.
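Now Recruiting: PhD Students, Master's Students, and Undergraduate Interns. Join us in building the next generation of efficient intelligent computing systems.
My group focuses on energy-efficient, sustainable AI computing, tackling the compute and energy challenges of the large-model era through hardware-software co-design. We recruit PhD students, Master's students, and undergraduate interns on an ongoing basis. Current research projects include:
- Multi-modal model inference acceleration: making text-to-image and text-to-video models run faster and more energy-efficiently;
- System optimization for sparse Mixture-of-Experts (MoE) large models: breaking through the GPU memory wall to efficiently serve sparse large models with trillions of parameters;
- Edge and on-device inference acceleration: enabling low-power local AI in edge scenarios such as vehicles and satellites;
- Microservice-based large-model systems: building flexible, highly available distributed AI serving systems.
Interested students are welcome to email me (hou-xf at cs.sjtu.edu.cn).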
News
| Date | News |
|---|---|
| Nov 9, 2025 | Two papers accepted to AAAI 2026 (The 40th Annual AAAI Conference on Artificial Intelligence)! Preprint coming soon. |
| Oct 27, 2025 | Paper MoE-APEX: An Efficient MoE Inference System with Adaptive Precision Expert Offloading accepted to ASPLOS 2026 (The ACM International Conference on Architectural Support for Programming Languages and Operating Systems)! Preprint coming soon. |
| Apr 27, 2025 | Paper SpaceExit: Enabling Efficient Adaptive Computing in Space with Early Exits accepted to USENIX ATC 2025 (The 2025 USENIX Annual Technical Conference)! Preprint coming soon. |
| Jan 29, 2025 | Paper EXIST: Enabling Extremely Efficient Intra-Service Tracing Observability in Datacenters accepted to ASPLOS 2025 (The 2025 International Conference on Architectural Support for Programming Languages and Operating Systems)! Preprint coming soon. |
| Oct 21, 2024 | Two papers were selected as Best Paper Nominees at ICCD 2024 (IEEE International Conference on Computer Design)! Preprint coming soon. |
| Aug 21, 2024 | TACO and RTSS have accepted our work on system optimizations for heterogeneous autonomous driving platforms! Preprint coming soon. |
| Jul 11, 2024 | Paper A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things was selected for the Best Paper Session in ISCA 2024 (International Symposium on Computer Architecture)! Preprint coming soon. |
| Jun 22, 2024 | Paper CPM: A Cross-layer Power Management Facility to Enable Highly-efficient Real-time AIoT System received the Best Paper Honorable Mention Award in IWQoS 2024 (IEEE/ACM International Symposium on Quality of Service)! Preprint coming soon. |
| Mar 23, 2024 | We will host the First International Workshop on Acceleration and Optimization of Multi-modal Computing (AOMC 2024), co-located with ISCA 2024 in Buenos Aires, Argentina, in June 2024. |
| Mar 21, 2024 | Paper A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things accepted to ISCA (The 51st International Symposium on Computer Architecture)! Preprint coming soon. |
| Sep 29, 2023 | Paper SMG: A System-level Modality Gating Facility for Fast and Energy-Efficient Multimodal Computing accepted to RTSS (IEEE Real-Time Systems Symposium)! Preprint coming soon. |
| Sep 11, 2023 | The project code and tutorials for MMBench are now available! |
| Aug 22, 2023 | Paper MMBench: Benchmarking End-to-End Multi-modal DNNs and Understanding Their Hardware-Software Implications accepted to IISWC (IEEE International Symposium on Workload Characterization), the top conference on workload characterization. |
| May 4, 2023 | Paper MMExit: Enabling Fast and Efficient Multi-modal DNN Inference with Adaptive Network Exits was selected as one of the Best Paper Nominees (4/164) at Euro-Par 2023! |
| May 1, 2023 | Paper MMExit: Enabling Fast and Efficient Multi-modal DNN Inference with Adaptive Network Exits accepted to Euro-Par (The International European Conference on Parallel and Distributed Computing)! Preprint coming soon. |
| Mar 11, 2023 | Paper Architecting Efficient Multi-modal AIoT Systems accepted to ISCA (The 50th International Symposium on Computer Architecture)! Preprint coming soon. |
| Oct 19, 2022 | Paper Characterizing and Understanding End-to-End Multi-modal Neural Networks on GPUs accepted to IEEE CAL (The IEEE Computer Architecture Letters)! Preprint coming soon. |
| Mar 12, 2022 | Paper Enabling Efficient Request Management through Microservice Level Parallelism accepted to IPDPS (The IEEE International Parallel and Distributed Processing Symposium)! Preprint coming soon. |
| Nov 26, 2021 | DataCLUE was covered by AINLP! |
| Nov 18, 2021 | The preprint of DataCLUE, a benchmark suite for data-centric NLP, is now available on arXiv! Project code coming soon! |
| Jan 19, 2021 | I have joined ACCESS as a Post-doctoral Fellow, working with Prof. CHENG! |
| Aug 25, 2020 | I passed my PhD defense! Preprint of the dissertation coming soon. |
| Aug 21, 2020 | Paper ANT-Man: Towards Agile Power Management in the Microservice Era accepted to SC (The International Conference for High Performance Computing, Networking, Storage, and Analysis)! Preprint coming soon. |
| Oct 2, 2019 | I started as a Visiting Scholar (Junior Specialist) at the University of California, Riverside, advised by Prof. Shaolei Ren. |
| Aug 9, 2018 | Paper Power Grab in Aggressively Provisioned Data Centers: What is the Risk and What Can Be Done About It received the Best Paper Award (4/264) in ICCD (International Conference on Computer Design)! Preprint coming soon. |
Selected Publications
- [ASPLOS 2026] MoE-APEX: An Efficient MoE Inference System with Adaptive Precision Expert Offloading. In International Conference on Architectural Support for Programming Languages and Operating Systems, 2026.
- [USENIX ATC 2025] SpaceExit: Enabling Efficient Adaptive Computing in Space with Early Exits. In USENIX Annual Technical Conference, 2025.
- [ASPLOS 2025] EXIST: Enabling Extremely Efficient Intra-Service Tracing Observability in Datacenters. In International Conference on Architectural Support for Programming Languages and Operating Systems, 2025.
- [ICCD 2024, Best Paper Nominee] AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs. In IEEE International Conference on Computer Design, 2024.
- [ICCD 2024, Best Paper Nominee] Continuous Energy Efficiency Optimization for Autonomous Embedded Systems Using Shadow Cycles. In IEEE International Conference on Computer Design, 2024.
- [TC 2024] Improving Efficiency in Multi-modal Autonomous Embedded Systems through Adaptive Gating. In IEEE Transactions on Computers, 2024.
- [RTSS 2024] Jigsaw: Taming BEV-centric Perception on Dual-SoC for Autonomous Driving. In IEEE Real-Time Systems Symposium, 2024.
- [TACO 2024] A2: Towards Accelerator Level Parallelism for Autonomous Micromobility Systems. In ACM Transactions on Architecture and Code Optimization, 2024.
- [TPDS 2024] WASP: Efficient Power Management Enabling Workload-Aware, Self-Powered AIoT Devices. In IEEE Transactions on Parallel and Distributed Systems, 2024.
- [IWQoS 2024, Best Paper Nominee] CPM: A Cross-layer Power Management Facility to Enable Highly-efficient Real-time AIoT Systems. In IEEE/ACM International Symposium on Quality of Service, 2024.
- [ISCA 2024, Best Paper Nominee] A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things. In Proceedings of the 51st Annual International Symposium on Computer Architecture, 2024.
- [RTSS 2023] SMG: A System-level Modality Gating Facility for Fast and Energy-Efficient Multimodal Computing. In IEEE Real-Time Systems Symposium, 2023.
- [Euro-Par 2023, Best Paper Nominee] MMExit: Enabling Fast and Efficient Multi-modal DNN Inference with Adaptive Network Exits. In European Conference on Parallel Processing, 2023.
- [ISCA 2023] Architecting Efficient Multi-modal AIoT Systems. In Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023.
- [SC 2020] ANT-Man: Towards Agile Power Management in the Microservice Era. In International Conference for High Performance Computing, Networking, Storage and Analysis, 2020.
- [ICCD 2018, Best Paper Award] Power Grab in Aggressively Provisioned Data Centers: What is the Risk and What Can Be Done About It. In International Conference on Computer Design, 2018.
- [ISCA 2016] Power Attack Defense: Securing Battery-Backed Data Centers. In International Symposium on Computer Architecture, 2016.