Sarthak Kumar Maharana

Email: sarthakmaharana9811@gmail.com
CV & Google Scholar & GitHub & LinkedIn

I'm a CS PhD candidate at the University of Texas at Dallas, advised by Dr. Yunhui Guo, and part of the Data Efficient Intelligent Learning Lab. I broadly study computer vision and representation learning.

In the early years of my PhD, I explored continual, adaptive, and on-device ML, and designed efficient schemes to make models robust and adaptable in open and dynamic environments. My research is grounded in the belief that AI systems must continually adapt to the world rather than remain static. I also spent the summer of 2025 at Dolby Laboratories, working on robust audio-visual learning in continual learning settings.

Lately, I've also been trying to break into the Physical AI space, especially autonomous driving. There's a great deal of work to be done in building systems that can perceive, reason, plan, and generalize in safety-critical environments.

Before my PhD, I completed my Master's in Electrical Engineering at USC and my Bachelor's degree at IIIT Bhubaneswar, India. I've been fortunate to work with Dr. Yonggang Shi (USC), Dr. Shri Narayanan (USC), Dr. Ren Hongliang (NUS), Dr. Prasanta Kumar Ghosh (IISc), and Dr. Aurobinda Routray (IIT-Kharagpur).

I have published at top-tier ML, computer vision, and signal processing venues such as TMLR, ICCV, NeurIPS (3x), AAAI, ECCV (2x), and ICASSP (2x).

If my research interests align with yours, I'd love to chat, discuss, and explore potential collaborations. Feel free to reach out!

News

Papers (Preprints included)

Audio-Visual Continual Test-Time Adaptation without Forgetting
Sarthak Kumar Maharana, Akshay Mehra, Bhavya Ramakrishna, Yunhui Guo, Guan-Ming Su
In ECCV, 2026 & ICML Continual Adaptation at Scale: Towards Sustainable AI (CATS) Workshop 2026

A selective parameter retrieval mechanism that dynamically retrieves, plugs in, and adapts cross-modal fusion parameters for challenging audio-visual CTTA. It largely mitigates catastrophic forgetting of source knowledge.

Continual Test-Time Adaptation in Computer Vision: Methods, Benchmarks, and Future Directions
Under Review

A comprehensive survey of continual test-time adaptation methods and benchmarks, with a discussion of future directions.

Variational Diffusion Unlearning: A Variational Inference Framework for Unlearning in Diffusion Models under Data Constraints
Subhodip Panda, MS Varun, Shreyans Jain, Sarthak Kumar Maharana, Prathosh AP
In TMLR, 2026 & NeurIPS Safe Generative AI Workshop, 2024

A variational framework for unlearning user-specific classes or concepts in pre-trained diffusion models.

AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
In NeurIPS (Datasets and Benchmarks), 2025 & DataMFM Workshop at CVPR 2026

A comprehensive benchmark designed to evaluate the test-time robustness of audio-visual models. We hope this will drive future research on robust, adaptable audio-visual systems in real-world settings.

SELECT: A Submodular Approach for Active LiDAR Semantic Segmentation
Ruiyu Mao, Sarthak Kumar Maharana, Xulong Tang, Yunhui Guo
Under Review

A voxel-centric submodular approach tailored for active LiDAR semantic segmentation.

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky, Rogerio Schmidt Feris, Yunhui Guo
In ICCV, 2025

A bimodal online test-time adaptation method that improves CLIP's robustness to common corruptions. It also extends to domain generalization settings.

PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation
Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo
In AAAI, 2025 (Oral)

A continual test-time adaptation method with adaptive learning rates, based on model prediction uncertainty and parameter sensitivity to rapid distributional shifts.

STONE: A Submodular Optimization Framework for Active 3D Object Detection
Ruiyu Mao, Sarthak Kumar Maharana, Rishabh K Iyer, Yunhui Guo
In NeurIPS, 2024

A submodular optimization scheme that handles data imbalance and label distribution coverage in active 3D object detection.

Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo
In ECCV, 2024

A novel watermarking technique based on multi-view data for defending against model extraction attacks.

Acoustic-to-Articulatory Inversion for Dysarthric Speech: Are Pre-Trained Self-Supervised Representations Favorable?
In ICASSP 2024 Workshop on Self-supervision in Audio, Speech, and Beyond (SASB), 2024

A study of the effectiveness of pre-trained self-supervised learning representations for acoustic-to-articulatory inversion of dysarthric speech.

Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data
Sarthak Kumar Maharana, Aravind Illa, Renuka Mannem, Yamini Bellur, Veeramani Preethish Kumar, Seena Vengalil, Kiran Polavarapu, Nalini Atchayaram, Prasanta Kumar Ghosh
In ICASSP, 2021

Joint and multi-corpus training for acoustic-to-articulatory inversion of dysarthric speech, using x-vectors, under low-resource data conditions.

Academic/Volunteer Work

Miscellaneous


Source code by Jon Barron, with a few other added elements.