About
I am a Ph.D. student in Computer Science at the National University of Singapore, researching at the Next++ Research Center with Prof. Roger Zimmermann. My work focuses on multimodal understanding — teaching machines to ground language in visual experiences, generate structured scene representations, and reason across vision and language.
My research interests include video moment retrieval, scene graph generation, temporal grounding, and cross-modal diffusion models.
News
- 2025 One paper on Video Moment Retrieval accepted at ICLR 2025.
- 2025 One paper on Video OOD Detection accepted at ICCV 2025.
- 2025 One paper on Video Dialog Grounding accepted at IEEE Transactions on Multimedia.
- 2024 Recognized as Outstanding Reviewer at ACM Multimedia 2024.
- 2024 Started Ph.D. in Computer Science at the National University of Singapore.
- 2024 Two papers accepted at ICASSP 2024.
- 2024 One paper on Panoptic Scene Graph Generation accepted at AAAI 2024.
- 2023 One paper accepted at ACM Multimedia 2023.
Research Experience
Next++ Sea Joint Lab, National University of Singapore
Research Intern
- Multi-modal Information Retrieval for Panoptic Scene Graph Generation
- Multi-modal Understanding for Video Moment Retrieval
- Described Spatial-Temporal Video Detection (DSTVD) benchmark and framework
Intelligent Machine Perception Lab, SUTD
Research Associate
- Fully Sparse Multi-modal 3D Object Detection with Dynamic Prompting
- Pretrained Diffusion for Single-view 3D Scene Generation
Publications
ICCV 2025
Secure On-Device Video OOD Detection Without Backpropagation
International Conference on Computer Vision
IEEE TMM 2025
Grounding is All You Need? Dual Temporal Grounding for Video Dialog
IEEE Transactions on Multimedia
AAAI 2024
Panoptic Scene Graph Generation with Semantics-prototype Learning
AAAI Conference on Artificial Intelligence
ICASSP 2024
MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
IEEE International Conference on Acoustics, Speech and Signal Processing
ICASSP 2024
Domain-wise Invariant Learning for Panoptic Scene Graph Generation
IEEE International Conference on Acoustics, Speech and Signal Processing
ACM MM 2023
Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation
ACM International Conference on Multimedia
Preprints & Under Review
CVPR 2026
Contextual Hashing Meets Lightweight Convolution: Accelerating Retrieval and Refining Localization for Video Corpus Moment Retrieval
— Under Review
IEEE TMM
Dynamic Graph-enhanced Event Refinement for Temporal Sentence Grounding of Micro-moments
— Under Major Revision
IEEE TGRS
SRDiff: A Cross-Modal Diffusion Model for Satellite-to-Radar Transformation in Precipitation Nowcasting
— Under Review
Academic Service
Conference Reviewer: NeurIPS 2025 · ICLR 2025 · ICCV 2025 · ACM Multimedia 2023, 2024 (Outstanding Reviewer, 2024)
Education
National University of Singapore
Ph.D. in Computer Science
Aug 2024 — Present
National University of Singapore
Master of Computing, General Track
GPA: 4.50 / 5.0
Aug 2022 — Jan 2024
University of Electronic Science and Technology of China
B.Sc. in Mathematics for Information & Computing Science
GPA: 3.71 / 4.0
Sep 2018 — Jun 2022