Welcome to the multimOdal peRception, reasonIng, and decisiON (Orion) Lab at HIT(SZ), led by Prof. Rui Shao.
We study intelligent agents based on Multimodal Large Language Models (MLLMs) that can perceive, reason, and act through interaction with the world.
Looking for self-motivated Ph.D./M.S./undergraduate students. [2026 Master's/Ph.D. admissions: 3-4 Master's students and 2 Ph.D. students]
Looking for postdocs in MLLMs, Embodied AI, and Agents.
News
- 07/2025: Two papers on MLLMs are accepted by ACM MM 2025.
- 06/2025: Three papers on MLLMs and AI agents are accepted by ICCV 2025.
- 06/2025: One paper on Audio-Visual Multimodal Large Language Models is accepted by TPAMI.
- 05/2025: One paper on GUI agents is accepted by ACL 2025 (Main).
- 05/2025: Invited to serve as ICMR 2025 Panel Co-Chair and BMVC 2025 Area Chair.
- 05/2025: One paper on Robot Skill Learning is accepted by ICML 2025 as a Spotlight (2.6%).
- 02/2025: Three papers on egocentric video MLLMs, MLLM agents, and embodied MLLMs are accepted by CVPR 2025.
- 01/2025: One paper on Smartphone Multimodal Agents is accepted by ICLR 2025 as a Spotlight (5.1%).
- 12/2024: The extension of our ECCV 2022 paper (SeqDeepFake) has been accepted by IJCV.
- 10/2024: We have created the JiuTian-VL GitHub organization, which will host all information about our JiuTian MLLM.
- 10/2024: I have one paper on adapters for large vision models accepted by IJCV.
- 10/2024: Two papers on MLLMs are accepted by NeurIPS 2024, including contributions from an undergraduate.
- 07/2024: One paper on Audio-Visual Multimodal Large Language Models is accepted by ECCV 2024.
- 02/2024: Our Multimodal Large Language Model (MLLM), JiuTian-LION, has been accepted by CVPR 2024.
- 02/2024: The extension of our CVPR 2023 paper has been accepted by TPAMI.
- 08/2023: We have built the GitHub Repo for our Multimodal Large Language Model (MLLM), JiuTian. Enjoy it!
- 04/2023: I have released the code and dataset of our CVPR 2023 work in our GitHub Repo. Enjoy it!
- 02/2023: I have one paper accepted by CVPR 2023. Code and dataset will be released soon. Please stay tuned!
- 07/2022: I have one paper accepted by ECCV 2022. We have released the code and dataset on our project page.
- 05/2022: I have released the code of Federated Generalized Face Presentation Attack Detection in TNNLS 2022.
- 04/2022: I have one paper accepted by TNNLS.
- 03/2022: I have released the code of Open-set Adversarial Defense with Clean-Adversarial Mutual Learning in IJCV 2022.
- 01/2022: The extension of our ECCV 2020 paper has been accepted by IJCV.
- 08/2020: I have released the code of Open-set Adversarial Defense in ECCV 2020.
- 07/2020: I have one paper accepted by ECCV 2020. See you online!
- 11/2019: I have one paper accepted by AAAI 2020. See you in New York City, USA!
- 02/2019: I have one paper accepted by CVPR 2019. See you in Long Beach, USA!
- 02/2019: One paper is accepted by TIE.
- 08/2018: I have one paper accepted by TIFS.
- 07/2018: I have released the code of Hierarchical Adversarial Deep Domain Adaptation in ACM MM 2018.
- 07/2018: I have one paper accepted by ACM MM 2018. See you in Seoul, Korea!
- 08/2018: I have a new homepage.
About Me
I am currently a Professor at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen). Prior to that, I was a postdoc at Nanyang Technological University, Singapore, working with Prof. Ziwei Liu and Prof. Chen Change Loy. I received my PhD degree from the Department of Computer Science, Hong Kong Baptist University in 2021, supervised by Prof. Pong C. Yuen, and my bachelor's degree from the School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC) in 2015. I also spent a memorable high-school period at Shenzhen Foreign Languages School, and I visited Johns Hopkins University for 6 months in 2020.
I am interested in computer vision and multimodal learning. My current research focuses on Multimodal Large Language Models (MLLMs) (e.g., JiuTian MLLM) and their applications in Embodied AI.
Services
- ICMR 2025 Panel Co-Chair
- Area Chair: ACM Multimedia 2024, BMVC 2024, BMVC 2025