You Qin

School of Computing, National University of Singapore

Hello! I’m You, a Ph.D. student in Computer Science at the National University of Singapore, where I’m fortunate to be advised by Prof. Roger Zimmermann.

My research focuses on multimodal foundation models — including diffusion alignment, video understanding, and audio-visual generation — with an interest in approaches that are simple, scalable, and generalisable.

I am currently a research intern at Tencent Hunyuan, where I work with Dr. Chunyu Wang and Dr. Linqing Wang on diffusion-model alignment. Before that, I was a research associate at the Intelligent Machine Perception Lab at SUTD with Prof. Na Zhao, and a research intern at the Next++ Sea Joint Lab at NUS with Prof. Wei Ji.

Feel free to reach out if you would like to chat or discuss any ideas! You can find my contact information at the bottom of this page.

selected works

  1. NeurIPS 2026
    Under Review
    SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models
    You Qin, Linqing Wang, Hao Fei, Roger Zimmermann, Liefeng Bo, Qinglin Lu, Chunyu Wang
    Conference on Neural Information Processing Systems, 2026
    A scalable, reward-free post-training framework for rectified-flow diffusion models. Open-sourced at HY-SOAR (400+ ★).
  2. arXiv Survey
    Audio-Visual Intelligence in Large Foundation Models
    You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei
    arXiv:2605.04045 · 56 pages, 16 figures · 2026
  3. ICLR 2025
    Generalized Video Moment Retrieval
    You Qin, Qilong Wu, Yicong Li, Wei Ji, Li Li, Pengcheng Cai, Lina Wei, Roger Zimmermann
    In International Conference on Learning Representations, 2025
  4. ICCV 2025
    Secure On-Device Video OOD Detection Without Backpropagation
    Li Li, Peilin Cai, Yuxiao Zhou, Zhiyu Ni, Renjie Liang, You Qin, Yi Nian, Zhengzhong Tu, Xiyang Hu, Yue Zhao
    In International Conference on Computer Vision, 2025
  5. IEEE TMM 2026
    Grounding is All You Need? Dual Temporal Grounding for Video Dialog
    You Qin, Wei Ji, Xinze Lan, Hao Fei, Xun Yang, Dan Guo, Roger Zimmermann, Lizi Liao
    IEEE Transactions on Multimedia, 2026
  6. AAAI 2024
    Panoptic Scene Graph Generation with Semantics-prototype Learning
    Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann
    In AAAI Conference on Artificial Intelligence, 2024
  7. ACM MM 2023
    Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation
    Li Li*, Chenwei Wang*, You Qin, Wei Ji, Renjie Liang
    In ACM International Conference on Multimedia, 2023