I'm a CS PhD candidate at the University of Texas at Dallas, advised by Dr. Yunhui Guo. I'm also a part of the Data Efficient Intelligent Learning Lab. I broadly study computer vision and representation learning.
In the initial years of my PhD, I explored continual learning, adaptive ML, and on-device ML, and designed efficient schemes to make models robust and adaptable in open and dynamic environments. My research is grounded in the belief that AI systems must continually adapt to the world rather than remain stagnant. I also spent the summer of 2025 at Dolby Laboratories working on robust audio-visual learning in continual learning settings.
Lately, I've also been trying to break into the Physical AI space, especially for autonomous driving. There's a great deal of work to be done in building systems that can perceive, reason, plan, and generalize in safety-critical environments.
Before my PhD, I earned a Master's in Electrical Engineering from USC and a Bachelor's degree from IIIT Bhubaneswar, India. I've been fortunate to work with Dr. Yonggang Shi (USC), Dr. Shri Narayanan (USC), Dr. Ren Hongliang (NUS), Dr. Prasanta Kumar Ghosh (IISc), and Dr. Aurobinda Routray (IIT-Kharagpur).
I have published at top-tier ML/computer vision/signal processing venues such as TMLR, ICCV, NeurIPS (3x), AAAI, ECCV (2x), and ICASSP (2x).
If my research interests align with yours, I'd love to chat, discuss, and explore potential collaborations. Feel free to reach out!
A selective parameter retrieval mechanism that dynamically retrieves, plugs in, and adapts cross-modal fusion parameters for challenging audio-visual continual test-time adaptation (CTTA). Largely mitigates catastrophic forgetting of source knowledge.
A comprehensive survey on continual test-time adaptation and future directions.
Variational unlearning framework for the unlearning of user-specific classes/concepts in pre-trained diffusion models.
A comprehensive benchmark designed to evaluate the test-time robustness of audio-visual models. We hope this will drive future research on robust, adaptable audio-visual systems in real-world settings.
A voxel-centric submodular approach tailored for active LiDAR semantic segmentation.
Bimodal online test-time adaptation method to improve CLIP's robustness to common corruptions. Also extends to domain generalization settings.
Adaptive learning rate continual test-time adaptation method based on model prediction uncertainty and parameter sensitivity to rapid distributional shifts.
Effectiveness of pre-trained self-supervised learning representations for acoustic-to-articulatory inversion of dysarthric speech.
Source code by Jon Barron, with a few other added elements.