Selected Publications

* denotes corresponding authors. The full list is available on DBLP and Google Scholar.

2025

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Renshan Zhang, Rui Shao*, Gongwei Chen, Kaiwen Zhou, Weili Guan, Liqiang Nie*

International Conference on Computer Vision (ICCV), 2025.

[arXiv] [Code] [Project Page]


Less is More: Empowering GUI Agent with Context-Aware Simplification

Gongwei Chen, Xurui Zhou, Rui Shao*, Yibo Lyu, Kaiwen Zhou, Shuai Wang, WenTao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie*

International Conference on Computer Vision (ICCV), 2025.

[arXiv] [Code] [Project Page]


Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jiaer Xia, Bingkui Tong, Yuhang Zang, Rui Shao, Kaiyang Zhou

International Conference on Computer Vision (ICCV), 2025.

[arXiv] [Code]


PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning

Yibo Lyu, Rui Shao*, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie*

ACM International Conference on Multimedia (ACM MM), 2025.

[arXiv] [Code] [Project Page]


EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent Reasoning

Yijie Zhu, Yibo Lyu, Zitong Yu*, Rui Shao*, Kaiyang Zhou, Liqiang Nie

ACM International Conference on Multimedia (ACM MM), 2025.

[arXiv] [Code] [Project Page]


CAT+: Investigating and Enhancing Audio-visual Understanding in Large Language Models

Qilang Ye, Zitong Yu, Rui Shao, Yawen Cui, Xiangui Kang, Xin Liu, Philip Torr, Xiaochun Cao

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

[arXiv] [Code] [Project Page]


GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent

Bin Xie, Rui Shao*, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Jie Liu, Min Zhang, Liqiang Nie

Annual Meeting of the Association for Computational Linguistics (ACL), 2025.

[arXiv] [Code] [Project Page]


STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization

Hao Li, Qi Lv, Rui Shao*, Xiang Deng, Yinchuan Li, Jianye Hao, Liqiang Nie

International Conference on Machine Learning (ICML), 2025. Spotlight (2.6%)

[arXiv] [Code]


LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Wei Li, Bing Hu, Rui Shao*, Leyang Shen, Liqiang Nie

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[arXiv] [Code]


Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Zaijing Li, Yuquan Xie, Rui Shao*, Gongwei Chen, Dongmei Jiang, Liqiang Nie*

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[arXiv] [Code] [Project Page]


Spatial-Temporal Graph Diffusion Policy with Kinematics Modeling for Bimanual Robotic Manipulation

Qi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, Liqiang Nie

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[arXiv] [Code]


SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Li Yixing, Xurui Zhou, Weiwen Liu, Shuai Wang, Kaiwen Zhou, Rui Shao*, Liqiang Nie, Yasheng Wang, Jianye Hao, Jun Wang, Kun Shao*

The Thirteenth International Conference on Learning Representations (ICLR), 2025. Spotlight (5.1%)

[arXiv] [Code] [Project Page]


Robust Sequential DeepFake Detection

Rui Shao, Tianxing Wu, Ziwei Liu

International Journal of Computer Vision (IJCV), 2025

[arXiv] [PDF] [Code]


DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection

Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu

International Journal of Computer Vision (IJCV), 2025

[arXiv] [PDF] [Code]


2024

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

Zaijing Li, Yuquan Xie, Rui Shao*, Gongwei Chen, Dongmei Jiang, Liqiang Nie*

Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.

[arXiv] [Code] [Project Page]


MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

Leyang Shen, Gongwei Chen, Rui Shao*, Weili Guan, Liqiang Nie*

Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.

[arXiv] [Code]


LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Gongwei Chen, Leyang Shen, Rui Shao*, Xiang Deng, Liqiang Nie*

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[arXiv] [Code] [Project Page] [Press]


Detecting and Grounding Multi-Modal Media Manipulation and Beyond

Rui Shao, Tianxing Wu, Jianlong Wu, Liqiang Nie, Ziwei Liu

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

[arXiv] [Code] [Project Page] [Press]


CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Qilang Ye, Zitong Yu, Rui Shao, Xinyu Xie, Philip Torr, Xiaochun Cao

European Conference on Computer Vision (ECCV), 2024.

[arXiv] [Code] [Project Page]


Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought

Zaijing Li, Rui Shao*, Gongwei Chen, Dongmei Jiang, Liqiang Nie

arXiv, 2024.

[arXiv]


Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding

Renshan Zhang, Yibo Lyu, Rui Shao*, Gongwei Chen, Weili Guan, Liqiang Nie*

arXiv, 2024.

[arXiv]


2023

Detecting and Grounding Multi-Modal Media Manipulation

Rui Shao, Tianxing Wu, Ziwei Liu

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[arXiv] [Code] [Project Page] [Press]


Video Infilling with Rich Motion Prior

Xinyu Hou, Liming Jiang, Rui Shao, Chen Change Loy

British Machine Vision Conference (BMVC), 2023.

[arXiv] [Code]


2022

Detecting and Recovering Sequential DeepFake Manipulation

Rui Shao, Tianxing Wu, Ziwei Liu

European Conference on Computer Vision (ECCV), 2022.

[arXiv] [Code] [Project Page] [Poster] [Press1] [Press2] [Press3] [Press4]


Open-set Adversarial Defense with Clean-Adversarial Mutual Learning

Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel

International Journal of Computer Vision (IJCV), 2022

[arXiv] [PDF] [Code]


Federated Generalized Face Presentation Attack Detection

Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022.

[arXiv] [PDF] [Code]


2021

Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation

Rui Shao, Bochao Zhang, Pong C. Yuen, Vishal M. Patel

IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021

[arXiv] [PDF]


Focusing on Clinically Interpretable Features: Selective Attention Regularization for Liver Biopsy Image Classification

Chong Yin, Siqi Liu, Rui Shao, Pong C. Yuen

Medical Image Computing and Computer Assisted Intervention (MICCAI), 2021

[PDF]


2020

Open-set Adversarial Defense

Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel

European Conference on Computer Vision (ECCV), 2020

[arXiv] [PDF] [Code]


Regularized Fine-grained Meta Face Anti-spoofing

Rui Shao, Xiangyuan Lan, Pong C. Yuen

Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020

[arXiv] [PDF] [Poster] [Code]


2019

Multi-adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection

Rui Shao, Xiangyuan Lan, Jiawei Li, Pong C. Yuen

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

[PDF] [Poster] [Code] [Model]


Joint Discriminative Learning of Deep Dynamic Textures for 3D Mask Face Anti-spoofing

Rui Shao, Xiangyuan Lan, Pong C. Yuen

IEEE Transactions on Information Forensics and Security (TIFS), 2019

[PDF] [Code]


Adversarial Auto-encoder for Unsupervised Deep Domain Adaptation

Rui Shao, Xiangyuan Lan

IET Image Processing (IET-IPR), 2019

[PDF]


Learning Modality-Consistency Feature Templates: A Robust RGB-Infrared Tracking System

Xiangyuan Lan, Mang Ye, Rui Shao, Bineng Zhong, Pong C. Yuen, Huiyu Zhou

IEEE Transactions on Industrial Electronics (TIE), 2019

[PDF]


2018 and before

Feature Constrained by Pixel: Hierarchical Adversarial Deep Domain Adaptation

Rui Shao, Xiangyuan Lan, Pong C. Yuen

ACM International Conference on Multimedia (ACM MM), 2018

[PDF] [Poster] [Code]


Deep Convolutional Dynamic Texture Learning with Adaptive Channel-discriminability for 3D Mask Face Anti-spoofing

Rui Shao, Xiangyuan Lan, Pong C. Yuen

International Joint Conference on Biometrics (IJCB), 2017

[PDF]


© 2025 OrionLab