You Qin

Hello! I’m You, a Ph.D. student in Computer Science at the National University of Singapore, where I’m fortunate to be advised by Prof. Roger Zimmermann.

My research focuses on multimodal foundation models (including diffusion alignment, video understanding, and audio-visual generation), with a particular interest in approaches that are simple, scalable, and generalisable.

I am currently a research intern at Tencent Hunyuan, where I work with Dr. Chunyu Wang and Dr. Linqing Wang on diffusion-model alignment. Before that, I was a research associate at the Intelligent Machine Perception Lab at SUTD with Prof. Na Zhao, and a research intern at the Next++ Sea Joint Lab at NUS with Prof. Wei Ji.

Feel free to reach out if you would like to chat or discuss any ideas! You can find my contact information at the bottom of this page.

selected works

NeurIPS 2026
Under Review

SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

You Qin, Linqing Wang, Hao Fei, Roger Zimmermann, Liefeng Bo, Qinglin Lu, Chunyu Wang

Conference on Neural Information Processing Systems, 2026

A scalable, reward-free post-training framework for rectified-flow diffusion models. Open-sourced at HY-SOAR (400+ ★).

arXiv Code Website HF

arXiv Survey

Audio-Visual Intelligence in Large Foundation Models

You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei

arXiv:2605.04045 · 56 pages, 16 figures · 2026

arXiv Awesome List

ICLR 2025

Generalized Video Moment Retrieval

You Qin, Qilong Wu, Yicong Li, Wei Ji, Li Li, Pengcheng Cai, Lina Wei, Roger Zimmermann

In International Conference on Learning Representations, 2025

PDF

ICCV 2025

Secure On-Device Video OOD Detection Without Backpropagation

Li Li, Peilin Cai, Yuxiao Zhou, Zhiyu Ni, Renjie Liang, You Qin, Yi Nian, Zhengzhong Tu, Xiyang Hu, Yue Zhao

In International Conference on Computer Vision, 2025

arXiv Code

IEEE TMM 2026

Grounding is All You Need? Dual Temporal Grounding for Video Dialog

You Qin, Wei Ji, Xinze Lan, Hao Fei, Xun Yang, Dan Guo, Roger Zimmermann, Lizi Liao

IEEE Transactions on Multimedia, 2026

arXiv

AAAI 2024

Panoptic Scene Graph Generation with Semantics-prototype Learning

Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann

In AAAI Conference on Artificial Intelligence, 2024

arXiv Code

ACM MM 2023

Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation

Li Li*, Chenwei Wang*, You Qin, Wei Ji, Renjie Liang

In ACM International Conference on Multimedia, 2023