Sarthak Kumar Maharana

Email: sarthakmaharana9811@gmail.com
CV & Google Scholar & GitHub & LinkedIn

I'm a CS PhD candidate at the University of Texas at Dallas, advised by Dr. Yunhui Guo, and a member of the Data Efficient Intelligent Learning Lab. I broadly study computer vision with an emphasis on continual learning. My research is grounded in the belief that AI systems must continually adapt to the world rather than remain stagnant. Lately, I'm interested in a couple of new directions:

    (1) Designing efficient continual learning systems capable of modeling long sequences. There is a pressing need for AI systems that scale as sub-quadratic alternatives to the ubiquitous transformer.
    (2) Enabling high-fidelity continual multimodal generation for customization and personalization.
In the early years of my PhD, I explored continual learning at test time and designed efficient schemes to make models robust and adaptable in open, dynamic environments. I spent Summer 2025 at Dolby Laboratories, where I worked on robust audio-visual learning in continual learning settings.

Before starting my PhD, I completed my Master's in Electrical Engineering at the University of Southern California (USC) and a Bachelor's degree in Electrical and Electronics Engineering at IIIT Bhubaneswar (IIIT-Bh), India. During my Master's, I worked with Dr. Yonggang Shi; before that, I worked with Dr. Shri Narayanan. As an undergraduate, I was fortunate to work with Dr. Ren Hongliang (NUS), Dr. Prasanta Kumar Ghosh (IISc), and Dr. Aurobinda Routray (IIT-Kharagpur).

I have published at top-tier ML, computer vision, and signal processing venues such as TMLR, ICCV, NeurIPS (3x), AAAI, ECCV, and ICASSP (2x).

If any of these research directions interest you, I'm happy to chat, discuss, and explore potential collaborations. Feel free to reach out.

News

Papers (Preprints included)

Audio-Visual Continual Test-Time Adaptation without Forgetting
Sarthak Kumar Maharana, Akshay Mehra, Bhavya Ramakrishna, Yunhui Guo, Guan-Ming Su
Under Review

A selective parameter retrieval mechanism to dynamically retrieve, plug in, and adapt cross-modal fusion parameters for challenging audio-visual CTTA, substantially mitigating catastrophic forgetting of source knowledge.

Continual Test-Time Adaptation: A Comprehensive Survey
Under Review

A comprehensive survey on continual test-time adaptation.

Variational Diffusion Unlearning: A Variational Inference Framework for Unlearning in Diffusion Models under Data Constraints
Subhodip Panda, MS Varun, Shreyans Jain, Sarthak Kumar Maharana, Prathosh AP
In TMLR, 2026, and the NeurIPS Safe Generative AI Workshop, 2024

A variational inference framework for unlearning user-specific classes/concepts in pre-trained diffusion models.

AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
In NeurIPS (Datasets and Benchmarks), 2025

A comprehensive benchmark designed to evaluate the test-time robustness of audio-visual models. We hope this will drive future research on robust, adaptable audio-visual systems in real-world settings.

SELECT: A Submodular Approach for Active LiDAR Semantic Segmentation
Ruiyu Mao, Sarthak Kumar Maharana, Xulong Tang, Yunhui Guo
Under Review

A voxel-centric submodular approach tailored for active LiDAR semantic segmentation.

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky, Rogerio Schmidt Feris, Yunhui Guo
In ICCV, 2025

A bimodal online test-time adaptation method that improves CLIP's robustness to common corruptions; it also extends to domain generalization settings.

PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation
Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo
In AAAI, 2025 (Oral)

A continual test-time adaptation method with adaptive learning rates driven by model prediction uncertainty and parameter sensitivity to rapid distribution shifts.

STONE: A Submodular Optimization Framework for Active 3D Object Detection
Ruiyu Mao, Sarthak Kumar Maharana, Rishabh K Iyer, Yunhui Guo
In NeurIPS, 2024

A submodular optimization scheme to handle data imbalance and label distributional coverage for active 3D object detection.

Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo
In ECCV, 2024

A watermarking technique based on multi-view data for defending deep neural networks against model extraction attacks.

Acoustic-to-Articulatory Inversion for Dysarthric Speech: Are Pre-Trained Self-Supervised Representations Favorable?
In ICASSP 2024 Workshop on Self-supervision in Audio, Speech, and Beyond (SASB), 2024

A study of the effectiveness of pre-trained self-supervised representations for acoustic-to-articulatory inversion of dysarthric speech.

Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data
Sarthak Kumar Maharana, Aravind Illa, Renuka Mannem, Yamini Bellur, Veeramani Preethish Kumar, Seena Vengalil, Kiran Polavarapu, Nalini Atchayaram, Prasanta Kumar Ghosh
In ICASSP, 2021

Joint and multi-corpus training for acoustic-to-articulatory inversion of dysarthric speech, using x-vectors, under low-resource data conditions.

Academic/Volunteer Work

Miscellaneous


Source code by Jon Barron, with a few added elements.