You Qin
Hello! I’m You, a Ph.D. student in Computer Science at the National University of Singapore, where I’m fortunate to be advised by Prof. Roger Zimmermann.
My research focuses on multimodal foundation models — including diffusion alignment, video understanding, and audio-visual generation — with an interest in approaches that are simple, scalable, and generalisable.
I am currently a research intern at Tencent Hunyuan, where I work with Dr. Chunyu Wang and Dr. Linqing Wang on diffusion-model alignment. Before that, I was a research associate at the Intelligent Machine Perception Lab at SUTD with Prof. Na Zhao, and a research intern at the Next++ Sea Joint Lab at NUS with Prof. Wei Ji.
Feel free to reach out if you would like to chat or discuss any ideas! You can find my contact information at the bottom of this page.
selected works
- NeurIPS 2026: Under review
- arXiv Survey: Audio-Visual Intelligence in Large Foundation Models. arXiv:2605.04045 · 56 pages, 16 figures · 2026
- ICLR 2025: Generalized Video Moment Retrieval. In International Conference on Learning Representations, 2025.
- IEEE TMM 2026: Grounding is All You Need? Dual Temporal Grounding for Video Dialog. IEEE Transactions on Multimedia, 2026.
- ACM MM 2023: Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation. In ACM International Conference on Multimedia, 2023.