Sarthak Kumar Maharana

I'm a second-year CS PhD student at the University of Texas at Dallas (UTD), advised by Dr. Yunhui Guo. Before this, I obtained my MS in Electrical Engineering from the University of Southern California (USC) and a Bachelor's degree (with honors) in Electrical and Electronics Engineering from IIIT Bhubaneswar (IIIT-Bh), India.

My research interests broadly encompass multimodal learning (Vision + X), with a particular focus on efficiently adapting models to distributional shifts and improving their robustness and generalization at test time. Additional interests lie in data-centric ML (continual, few-shot, and transfer learning) and human-centered AI. I am also engaged in projects on active learning and machine unlearning in generative models.

During my Master's, I worked closely with Dr. Yonggang Shi, and previously with Dr. Shri Narayanan. As an undergraduate, I was fortunate to work with Dr. Ren Hongliang (NUS), Dr. Prasanta Kumar Ghosh (IISc), and Dr. Aurobinda Routray (IIT Kharagpur).

I have published at top-tier machine learning, computer vision, and signal processing conferences such as NeurIPS (2x), AAAI, ECCV, and ICASSP (2x).

I'm happy to chat and discuss potential collaborations. Feel free to contact me.

Email  /  CV  /  Google Scholar  /  GitHub  /  LinkedIn

profile photo

Dec '24  

PALM has been accepted to AAAI 2025 for an Oral presentation!

Nov '24  

Serving as a CVPR 2025 reviewer.

Oct '24  

Variational Diffusion Unlearning (VDU) has been accepted to the NeurIPS 2024 Safe Generative AI Workshop!

Sep '24  

Our paper on submodular optimization for active 3D object detection has been accepted to NeurIPS 2024!

Aug '24  

Serving as a reviewer for ICLR 2025.

Jul '24  

Our paper on DNN watermarking has been accepted to ECCV 2024!

May '24  

Serving as a reviewer for BMVC 2024.

Mar '24  

Serving as a reviewer for CVPR 2024 Workshop on Test-Time Adaptation: Model, Adapt Thyself! (MAT).

Feb '24  

Serving as a reviewer for ECCV 2024.

Jan '24  

Our paper on SSL features for dysarthric speech has been accepted to the SASB workshop @ ICASSP 2024!

Jan '24  

I am glad to have been selected to attend the MLx Representation Learning and Generative AI Oxford Summer School.

First-author works are highlighted.

Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky, Rogerio Feris, Yunhui Guo
Under Review

[arXiv]

A bimodal test-time adaptation method designed to improve CLIP's robustness to common image corruptions.

PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation
Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo
In AAAI 2025 (Oral)

[Paper] [Project] [Code]

A continual test-time adaptation method that adapts learning rates based on model prediction uncertainty and parameter sensitivity to rapid distributional shifts.

Variational Diffusion Unlearning: A Variational Inference Framework for Unlearning in Diffusion Models
Subhodip Panda, MS Varun, Shreyans Jain, Sarthak Kumar Maharana, Prathosh AP
In NeurIPS Safe Generative AI Workshop 2024

[Paper]

A variational inference framework for unlearning user-specific classes and concepts in pre-trained diffusion models (DDPMs).

STONE: A Submodular Optimization Framework for Active 3D Object Detection
Ruiyu Mao, Sarthak Kumar Maharana, Rishabh K Iyer, Yunhui Guo
In NeurIPS 2024

[Paper] [Code]

A submodular optimization scheme to handle data imbalance and label distributional coverage for active 3D object detection.

Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo
In ECCV 2024

[Paper] [Code]

A novel watermarking technique based on multi-view data for defending against model extraction attacks.

Acoustic-to-Articulatory Inversion for Dysarthric Speech: Are Pre-Trained Self-Supervised Representations Favorable?
Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava
In ICASSP 2024 Workshop on Self-supervision in Audio, Speech, and Beyond (SASB)

[Paper] [Poster]

Effectiveness of pre-trained self-supervised learning representations for acoustic-to-articulatory inversion of dysarthric speech.

Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data
Sarthak Kumar Maharana, Aravind Illa, Renuka Mannem, Yamini Bellur, Veeramani Preethish Kumar, Seena Vengalil, Kiran Polavarapu, Nalini Atchayaram, Prasanta Kumar Ghosh
In ICASSP 2021

[BibTeX] [Paper] [Code] [Video]

Joint and multi-corpus training, using x-vectors, for acoustic-to-articulatory inversion of dysarthric speech under low-resource data conditions.

Harmonics analysis of a PV integrated hysteresis current control inverter connected with grid and without grid
Jayanta Kumar Sahu, Sudhakar Sahu, J.P. Patra, Sarthak Kumar Maharana, Bhagabat Panda
In ICSSIT 2019

[BibTeX] [Paper]

Analysis of harmonics in a PV-integrated hysteresis current control inverter, both with and without grid connection.

  • Reviewer - CVPR 2025, ICLR 2025, NeurIPS Workshops 2024, BMVC 2024, ECCV 2024, CVPR Workshops 2024, AAAI 2024
  • Building CORD.ai, a deep learning research community, as a core member and volunteer researcher.
  • I'm a cis male.
  • I consider myself lucky to have grown up in two beautiful Indian cities, Bangalore and Bhubaneswar, which shaped much of my character. I've also spent two quality years in the vibrant, diverse, gently warm, and sprawling city of Los Angeles, California, and I absolutely look forward to living in new places and experiencing different cultures.
  • I'm a HUGE fan of the classical formats of cricket. You'd often find me watching old test match highlights or SRT straight drives; nothing gets more sublime than that, I bet! I don't consider IPL/T20 cricket a thing AT ALL.
  • Mobile photography is something of a side gig for me; my phone comes out the moment I catch sight of a beautiful view.
  • I also spend a lot of time on quality humor, dark humor in particular. We could talk about that later.

Source code by Jon Barron.