Rui Shao (邵睿)  

Professor


School of Computer Science and Technology,

Harbin Institute of Technology (Shenzhen)

Email: shaorui[AT]hit.edu.cn, rshaojimmy[AT]gmail.com

[Google Scholar] [GitHub] [LinkedIn] [CV] [CV (Chinese)] [Homepage (Chinese)]


About Me

I am currently a Professor at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen). Prior to that, I was a postdoc at Nanyang Technological University, Singapore, working with Prof. Ziwei Liu and Prof. Chen Change Loy.

I received my Ph.D. degree from the Department of Computer Science, Hong Kong Baptist University in 2021, supervised by Prof. Pong C. Yuen, and my bachelor's degree from the School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC) in 2015. I also spent memorable high-school years at Shenzhen Foreign Languages School. I visited Johns Hopkins University for 6 months in 2020.

I am interested in computer vision and multimodal learning. My current research focuses on Multimodal Large Language Models (MLLMs) (e.g., the "JiuTian" MLLM) and their trustworthiness.

Looking for self-motivated Ph.D./M.S./undergraduate students. [2024 M.S./Ph.D. admissions: 4-5 M.S. students, 2 Ph.D. students]
Looking for postdocs in computer vision and MLLM.

Biography

  • 2021-2023, Research Fellow, MMLab@NTU, Singapore
  • 2021.7-2021.11, Researcher, SenseTime, Shenzhen, China, contributing to the MMHuman3D codebase
  • 2017-2021, Ph.D., Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
  • 2020.2-2020.7, Visiting Scholar, Johns Hopkins University, Baltimore, U.S., working with Prof. Vishal M. Patel
  • 2011-2015, B.S., School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China

    Pre-prints (* denotes corresponding authors)

    Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought

    Zaijing Li, Rui Shao*, Gongwei Chen, Dongmei Jiang, Liqiang Nie

    [arXiv]

    Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding

    Renshan Zhang, Yibo Lyu, Rui Shao*, Gongwei Chen, Weili Guan, Liqiang Nie*

    [arXiv]

    Selected Publications (* denotes corresponding authors)

    Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

    Zaijing Li, Yuquan Xie, Rui Shao*, Gongwei Chen, Dongmei Jiang, Liqiang Nie*

    Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.

    [arXiv] [Code] [Project Page]

    MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

    Leyang Shen, Gongwei Chen, Rui Shao*, Weili Guan, Liqiang Nie*

    Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.

    [arXiv] [Code]

    LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

    Gongwei Chen, Leyang Shen, Rui Shao*, Xiang Deng, Liqiang Nie*

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

    [arXiv] [Code] [Project Page] [Press]

    Robust Sequential DeepFake Detection

    Rui Shao, Tianxing Wu, Ziwei Liu

    International Journal of Computer Vision (IJCV), 2025

    [arXiv] [Code]

    DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection

    Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu

    International Journal of Computer Vision (IJCV), 2025

    [arXiv] [Code]

    Detecting and Grounding Multi-Modal Media Manipulation and Beyond

    Rui Shao, Tianxing Wu, Jianlong Wu, Liqiang Nie, Ziwei Liu

    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

    [arXiv] [Code] [Project Page] [Press]

    CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

    Qilang Ye, Zitong Yu, Rui Shao, Xinyu Xie, Philip Torr, Xiaochun Cao

    European Conference on Computer Vision (ECCV), 2024.

    [arXiv] [Code] [Project Page]

    Detecting and Grounding Multi-Modal Media Manipulation

    Rui Shao, Tianxing Wu, Ziwei Liu

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

    [arXiv] [Code] [Project Page] [Press]

    Video Infilling with Rich Motion Prior

    Xinyu Hou, Liming Jiang, Rui Shao, Chen Change Loy

    British Machine Vision Conference (BMVC), 2023.

    [arXiv] [Code]

    Detecting and Recovering Sequential DeepFake Manipulation

    Rui Shao, Tianxing Wu, Ziwei Liu

    European Conference on Computer Vision (ECCV), 2022.

    [arXiv] [Code] [Project Page] [Poster] [Press1] [Press2] [Press3] [Press4]

    Open-set Adversarial Defense with Clean-Adversarial Mutual Learning

    Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel

    International Journal of Computer Vision (IJCV), 2022

    [arXiv] [PDF] [Code]

    Federated Generalized Face Presentation Attack Detection

    Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel

    IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022.

    [arXiv] [PDF] [Code]

    Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation

    Rui Shao, Bochao Zhang, Pong C. Yuen, Vishal M. Patel

    IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021

    [arXiv] [PDF]

    Focusing on Clinically Interpretable Features: Selective Attention Regularization for Liver Biopsy Image Classification

    Chong Yin, Siqi Liu, Rui Shao, Pong C. Yuen

    Medical Image Computing and Computer Assisted Interventions (MICCAI), 2021

    [PDF]

    Open-set Adversarial Defense

    Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel

    European Conference on Computer Vision (ECCV), 2020

    [arXiv] [PDF] [Code]

    Regularized Fine-grained Meta Face Anti-spoofing

    Rui Shao, Xiangyuan Lan, Pong C. Yuen

    Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020

    [arXiv] [PDF] [Poster] [Code]

    Multi-adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection

    Rui Shao, Xiangyuan Lan, Jiawei Li, Pong C. Yuen

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

    [PDF] [Poster] [Code] [Model]

    Joint Discriminative Learning of Deep Dynamic Textures for 3D Mask Face Anti-spoofing

    Rui Shao, Xiangyuan Lan, Pong C. Yuen

    IEEE Transactions on Information Forensics and Security (TIFS), 2019

    [PDF] [Code]

    Adversarial Auto-encoder for Unsupervised Deep Domain Adaptation

    Rui Shao, Xiangyuan Lan

    IET Image Processing (IET-IPR), 2019

    [PDF]

    Feature Constrained by Pixel: Hierarchical Adversarial Deep Domain Adaptation

    Rui Shao, Xiangyuan Lan, Pong C. Yuen

    ACM International Conference on Multimedia (ACM MM), 2018

    [PDF] [Poster] [Code]


    Deep Convolutional Dynamic Texture Learning with Adaptive Channel-discriminability for 3D Mask Face Anti-spoofing

    Rui Shao, Xiangyuan Lan, Pong C. Yuen

    International Joint Conference on Biometrics (IJCB), 2017

    [PDF]


    Learning Modality-Consistency Feature Templates: A Robust RGB-Infrared Tracking System

    Xiangyuan Lan, Mang Ye, Rui Shao, Bineng Zhong, Pong C. Yuen, Huiyu Zhou

    IEEE Transactions on Industrial Electronics (TIE), 2019

    [PDF]

    Services

    Area Chair:
  • ACM MM 2024
  • BMVC 2024

    Collaborators

  • Prof. Ziwei Liu, Nanyang Technological University
  • Prof. Vishal M. Patel, Johns Hopkins University
  • Dr. Xiangyuan Lan, Peng Cheng Laboratory
  • Dr. Pramuditha Perera, Johns Hopkins University, AWS AI Lab