Ji Zhang

I am Ji Zhang (张继), an Assistant Professor in the School of Computing and Artificial Intelligence at Southwest Jiaotong University (SWJTU). I received my Ph.D. degree in 2024 from the School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), where I was very fortunate to be advised by Prof. Jingkuan Song and Prof. Lianli Gao.

My research interests include Robotics, Computer Vision, and Few-shot Learning. I am particularly interested in designing advanced algorithms that exploit robotic foundation models (e.g., vision-language-action (VLA) models) to tackle challenging real-world robotic applications. If you are also interested in related topics, please do not hesitate to reach out.

Scholar  /  Github  /  Email  /  WeChat  /  Chinese Homepage


News

[04/2025] I have been invited to be a PC member for NeurIPS'25.
[03/2025] I have been invited to be a PC member for ICCV'25 and ACM MM'25.
[02/2025] Our paper about vision-language models (VLMs) was accepted by CVPR'25.
[11/2024] I have been invited to be a PC member for CVPR'25.
[08/2024] I have been invited to be a PC member for ICLR'25.
[02/2024] Our paper about vision-language models (VLMs) was accepted by CVPR'24.
[10/2023] Our paper about out-of-distribution (OOD) detection was accepted by IEEE TIP.
[10/2023] I was awarded the "Graduate Chinese National Scholarship".
[07/2023] Our paper about few-shot learning was accepted by ICCV'23.
[04/2023] Our paper about few-shot learning was accepted by ICML'23.
[06/2022] Our papers about few-shot learning and continual learning were accepted by ACM MM'22.
[03/2022] Our paper about meta-learning was accepted by IEEE TCSVT.
[06/2021] Our paper about meta-learning was accepted by ACM MM'21.

Selected Papers

       (* Equal contribution. # Corresponding author)
InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning
Ji Zhang*, Shihan Wu*, Xu Luo, Hao Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song
arXiv, 2025
[Paper][Project]

Mitigating the adverse effects of spurious correlations by boosting the spatial reasoning ability of vision-language-action (VLA) models.

Policy Contrastive Decoding for Robotic Foundation Models
Shihan Wu*, Ji Zhang*, Xu Luo, Junlin Xie, Lianli Gao, Heng Tao Shen, Jingkuan Song
arXiv, 2025
[Paper][Project]

A simple, training-free, easy-to-implement and plug-and-play scheme for addressing the spurious correlation issue in robot policies.

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
Shihan Wu, Ji Zhang#, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[Paper][Code]

Achieving effective and efficient adaptation of large pre-trained vision-language models.

DePT: Decoupled Prompt Tuning
Ji Zhang*, Shihan Wu*, Lianli Gao, Heng Tao Shen, Jingkuan Song
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[Paper][Code]

Overcoming the base-new tradeoff (BNT) problem of existing prompt tuning methods.

From Global to Local: Multi-scale Out-of-distribution Detection
Ji Zhang, Lianli Gao, Bingguang Hao, Hao Huang, Jingkuan Song, Heng Tao Shen
IEEE Transactions on Image Processing (TIP), 2023
[Paper][Code]

Leveraging both global visual information and local region details of images to maximally benefit OOD detection.

DETA: Denoised Task Adaptation for Few-shot Learning
Ji Zhang, Lianli Gao, Xu Luo, Heng Tao Shen, Jingkuan Song
IEEE International Conference on Computer Vision (ICCV), 2023
[Paper][Code]

Tackling both the X-noise (i.e., image noise) and the Y-noise (i.e., label noise) in a unified framework for test-time few-shot tasks.

A Closer Look at Few-shot Classification Again
Xu Luo*, Hao Wu*, Ji Zhang, Lianli Gao, Jing Xu, Jingkuan Song
International Conference on Machine Learning (ICML), 2023
[Paper][Code]

Empirically showing that training and test-time adaptation algorithms in few-shot classification can be disentangled.

Free-lunch for Cross-domain Few-shot Learning: Style-aware Episodic Training with Robust Contrastive Learning
Ji Zhang, Jingkuan Song, Lianli Gao, Heng Tao Shen
ACM International Conference on Multimedia (ACM MM), 2022
[Paper][Code]

Addressing the side effects of the style shift between tasks from the source and target domains.

Progressive Meta-learning with Curriculum
Ji Zhang, Jingkuan Song, Lianli Gao, Ye Liu, Heng Tao Shen
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2022
[Paper][Code]

An extended version of the ACM MM'21 paper, in which the curriculum is integrated as a regularization term into the objective so that the meta-learner can adaptively measure the hardness of tasks.

Curriculum-based Meta-learning
Ji Zhang, Jingkuan Song, Yazhou Yao, Lianli Gao
ACM International Conference on Multimedia (ACM MM), 2021
[Paper][Code]

Progressively improving the meta-learner by performing episodic training on simulated tasks from easy to hard, i.e., in a curriculum learning manner.

Academic Service

I serve as a reviewer for several top conferences and journals, e.g., CVPR, ICCV, NeurIPS, ICLR, ACM MM, AAAI, IEEE TPAMI, and IEEE TIP.

This well-designed template is borrowed from Jon Barron.