* denotes corresponding authors. Full list available at DBLP, Google Scholar.
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Renshan Zhang, Rui Shao*, Gongwei Chen, Kaiwen Zhou, Weili Guan, Liqiang Nie*
International Conference on Computer Vision (ICCV), 2025.
[arXiv] [Code] [Project Page]
Less is More: Empowering GUI Agent with Context-Aware Simplification
Gongwei Chen, Xurui Zhou, Rui Shao*, Yibo Lyu, Kaiwen Zhou, Shuai Wang, WenTao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie*
International Conference on Computer Vision (ICCV), 2025.
[arXiv] [Code] [Project Page]
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
Jiaer Xia, Bingkui Tong, Yuhang Zang, Rui Shao, Kaiyang Zhou
International Conference on Computer Vision (ICCV), 2025.
PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning
Yibo Lyu, Rui Shao*, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie*
ACM International Conference on Multimedia (ACM MM), 2025.
[arXiv] [Code] [Project Page]
EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent Reasoning
Yijie Zhu, Yibo Lyu, Zitong YU*, Rui Shao*, Kaiyang Zhou, Liqiang Nie
ACM International Conference on Multimedia (ACM MM), 2025.
[arXiv] [Code] [Project Page]
CAT+: Investigating and Enhancing Audio-visual Understanding in Large Language Models
Qilang Ye, Zitong Yu, Rui Shao, Yawen Cui, Xiangui Kang, Xin Liu, Philip Torr, Xiaochun Cao
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
[arXiv] [Code] [Project Page]
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
Bin Xie, Rui Shao*, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Jie Liu, Min Zhang, Liqiang Nie
Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
[arXiv] [Code] [Project Page]
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
Hao Li, Qi Lv, Rui Shao*, Xiang Deng, Yinchuan Li, Jianye Hao, Liqiang Nie
International Conference on Machine Learning (ICML), 2025. Spotlight (2.6%)
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Wei Li, Bing Hu, Rui Shao*, Leyang Shen, Liqiang Nie
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao*, Gongwei Chen, Dongmei Jiang, Liqiang Nie*
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[arXiv] [Code] [Project Page]
Spatial-Temporal Graph Diffusion Policy with Kinematics Modeling for Bimanual Robotic Manipulation
Qi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, Liqiang Nie
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Li Yixing, Xurui Zhou, Weiwen Liu, Shuai Wang, Kaiwen Zhou, Rui Shao*, Liqiang Nie, Yasheng Wang, Jianye Hao, Jun Wang, Kun Shao*
The Thirteenth International Conference on Learning Representations (ICLR), 2025. Spotlight (5.1%)
[arXiv] [Code] [Project Page]
Robust Sequential DeepFake Detection
Rui Shao, Tianxing Wu, Ziwei Liu
International Journal of Computer Vision (IJCV), 2025
DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection
Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu
International Journal of Computer Vision (IJCV), 2025
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Zaijing Li, Yuquan Xie, Rui Shao*, Gongwei Chen, Dongmei Jiang, Liqiang Nie*
Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
[arXiv] [Code] [Project Page]
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen, Gongwei Chen, Rui Shao*, Weili Guan, Liqiang Nie*
Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen, Leyang Shen, Rui Shao*, Xiang Deng, Liqiang Nie*
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[arXiv] [Code] [Project Page] [Press]
Detecting and Grounding Multi-Modal Media Manipulation and Beyond
Rui Shao, Tianxing Wu, Jianlong Wu, Liqiang Nie, Ziwei Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
[arXiv] [Code] [Project Page] [Press]
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye, Zitong Yu, Rui Shao, Xinyu Xie, Philip Torr, Xiaochun Cao
European Conference on Computer Vision (ECCV), 2024.
[arXiv] [Code] [Project Page]
Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought
Zaijing Li, Rui Shao*, Gongwei Chen, Dongmei Jiang, Liqiang Nie
arXiv, 2024.
[arXiv]
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding
Renshan Zhang, Yibo Lyu, Rui Shao*, Gongwei Chen, Weili Guan, Liqiang Nie*
arXiv, 2024.
[arXiv]
Detecting and Grounding Multi-Modal Media Manipulation
Rui Shao, Tianxing Wu, Ziwei Liu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[arXiv] [Code] [Project Page] [Press]
Video Infilling with Rich Motion Prior
Xinyu Hou, Liming Jiang, Rui Shao, Chen Change Loy
British Machine Vision Conference (BMVC), 2023.
Detecting and Recovering Sequential DeepFake Manipulation
Rui Shao, Tianxing Wu, Ziwei Liu
European Conference on Computer Vision (ECCV), 2022.
[arXiv] [Code] [Project Page] [Poster] [Press1] [Press2] [Press3] [Press4]
Open-set Adversarial Defense with Clean-Adversarial Mutual Learning
Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel
International Journal of Computer Vision (IJCV), 2022
Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation
Rui Shao, Bochao Zhang, Pong C. Yuen, Vishal M. Patel
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021
Focusing on Clinically Interpretable Features: Selective Attention Regularization for Liver Biopsy Image Classification
Chong Yin, Siqi Liu, Rui Shao, Pong C. Yuen
Medical Image Computing and Computer Assisted Interventions (MICCAI), 2021
[PDF]
Open-set Adversarial Defense
Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel
European Conference on Computer Vision (ECCV), 2020
Regularized Fine-grained Meta Face Anti-spoofing
Rui Shao, Xiangyuan Lan, Pong C. Yuen
Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020
Multi-adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection
Rui Shao, Xiangyuan Lan, Jiawei Li, Pong C. Yuen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Joint Discriminative Learning of Deep Dynamic Textures for 3D Mask Face Anti-spoofing
Rui Shao, Xiangyuan Lan, Pong C. Yuen
IEEE Transactions on Information Forensics and Security (TIFS), 2019
Adversarial Auto-encoder for Unsupervised Deep Domain Adaptation
Rui Shao, Xiangyuan Lan
IET Image Processing (IET-IPR), 2019
[PDF]
Learning Modality-Consistency Feature Templates: A Robust RGB-Infrared Tracking System
Xiangyuan Lan, Mang Ye, Rui Shao, Bineng Zhong, Pong C. Yuen, Huiyu Zhou
IEEE Transactions on Industrial Electronics (TIE), 2019
[PDF]
Feature Constrained by Pixel: Hierarchical Adversarial Deep Domain Adaptation
Rui Shao, Xiangyuan Lan, Pong C. Yuen
ACM International Conference on Multimedia (ACM MM), 2018
Deep Convolutional Dynamic Texture Learning with Adaptive Channel-discriminability for 3D Mask Face Anti-spoofing
Rui Shao, Xiangyuan Lan, Pong C. Yuen
International Joint Conference on Biometrics (IJCB), 2017
[PDF]