Sarthak Kumar Maharana

I'm a first-year Computer Science PhD student at the University of Texas at Dallas (UTD), advised by Dr. Yunhui Guo. Before this, I obtained my MS in Electrical Engineering from the University of Southern California (USC) and a Bachelor of Technology (BTech, Honors) in Electrical and Electronics Engineering from the International Institute of Information Technology Bhubaneswar (IIIT-Bh), India.

During my master's, I worked closely with Dr. Yonggang Shi. Before that, I also worked with Dr. Shrikanth Narayanan. During my undergraduate studies, I was fortunate to work with Dr. Ren Hongliang (NUS), Dr. Prasanta Kumar Ghosh (IISc), and Dr. Aurobinda Routray (IIT-Kharagpur).

My current research focuses on computer vision and learning. Specifically, I'm interested in making models robust and adaptive to rapid distributional shifts (continual/lifelong learning). With an eye toward privacy and real-world machine perception systems, I'm also actively working at the intersection of continual learning and machine unlearning.

I'm happy to chat and discuss potential collaborations. Feel free to contact me.

Email  /  CV  /  Google Scholar  /  Github  /  LinkedIn


May '24  

Serving as a reviewer for BMVC 2024!

Mar '24  

Serving as a reviewer for CVPR 2024 Workshop on Test-Time Adaptation: Model, Adapt Thyself! (MAT)!

Feb '24  

Serving as a reviewer for ECCV 2024!

Jan '24  

Our paper on SSL features for dysarthric speech has been accepted to the SASB workshop @ ICASSP 2024!

Jan '24  

I'm glad to have been selected to attend the MLx Representation Learning and Generative AI Oxford Summer School.

Aug '23  

Started PhD @ UTD.

May '23  

Graduated from USC with an MS in Electrical Engineering.

Mar '23  

Accepted the CS PhD offer from UTD.

Aug '21  

Started MS in Electrical Engineering at USC.

Jun '21  

Virtually presented the paper on acoustic-to-articulatory inversion of dysarthric speech at IEEE ICASSP 2021.

Mar '21  

Our paper got accepted to IEEE ICASSP 2021!

Jun '20  

Graduated from IIIT-Bh with a BTech (Honors) in Electrical and Electronics Engineering.
  • Continual/Lifelong learning.
  • Data and parameter-efficient deep learning, model robustness, and adaptation.
  • General ML and computer vision.
  • Human-centered AI, which includes multi-modal machine learning with applications to speech and medical images.
PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation
Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo

[Paper]

Real-world vision models in dynamic environments face rapid shifts in domain distributions, leading to decreased recognition performance. Continual test-time adaptation (CTTA) directly adjusts a pre-trained source discriminative model to these changing domains using test data. A highly effective CTTA method involves applying layer-wise adaptive learning rates, and selectively adapting pre-trained layers. However, it suffers from the poor estimation of domain shift and the inaccuracies arising from the pseudo-labels. In this work, we aim to overcome these limitations by identifying layers through the quantification of model prediction uncertainty without relying on pseudo-labels. We utilize the magnitude of gradients as a metric, calculated by backpropagating the KL divergence between the softmax output and a uniform distribution, to select layers for further adaptation. Subsequently, for the parameters exclusively belonging to these selected layers, with the remaining ones frozen, we evaluate their sensitivity in order to approximate the domain shift, followed by adjusting their learning rates accordingly. Overall, this approach leads to a more robust and stable optimization than prior approaches. We conduct extensive image classification experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C and demonstrate the efficacy of our method against standard benchmarks and prior methods.

Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo

[Paper]

With the increasing prevalence of Machine Learning as a Service (MLaaS) platforms, there is a growing focus on deep neural network (DNN) watermarking techniques. These methods are used to facilitate the verification of ownership for a target DNN model to protect intellectual property. One of the most widely employed watermarking techniques involves embedding a trigger set into the source model. Unfortunately, existing methodologies based on trigger sets are still susceptible to functionality-stealing attacks, potentially enabling adversaries to steal the functionality of the source model without a reliable means of verifying ownership. In this paper, we first offer a novel view of trigger set-based watermarking methods from a feature learning perspective. Specifically, we demonstrate that by selecting data exhibiting multiple features, also referred to as multi-view data, it becomes feasible to effectively defend against functionality-stealing attacks. Based on this perspective, we introduce a novel watermarking technique based on Multi-view dATa, called MAT, for efficiently embedding watermarks within DNNs. This approach involves constructing a trigger set with multi-view data and incorporating a simple feature-based regularization method for training the source model. We validate our method across various benchmarks and demonstrate its efficacy in defending against model extraction attacks, surpassing relevant baselines by a significant margin.

Acoustic-to-Articulatory Inversion for Dysarthric Speech: Are Pre-Trained Self-Supervised Representations Favorable?
Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava
IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) 2024

[Paper] [Poster]

Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic to the articulatory space. Signal-processing features like MFCCs have been widely used for the AAI task. For subjects with dysarthric speech, AAI is challenging because of imprecise and indistinct pronunciation. In this work, we perform AAI for dysarthric speech using representations from pre-trained self-supervised learning (SSL) models. We demonstrate the impact of different pre-trained features on this challenging AAI task under low-resource conditions. In addition, we condition the extracted SSL features on x-vectors to train a BLSTM network. In the seen case, we experiment with three AAI training schemes (subject-specific, pooled, and fine-tuned).

Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data
Sarthak Kumar Maharana, Aravind Illa, Renuka Mannem, Yamini Bellur, Veeramani Preethish Kumar, Seena Vengalil, Kiran Polavarapu, Nalini Atchayaram, Prasanta Kumar Ghosh
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021

[BibTeX] [Paper] [Code] [Video]

In this work, we focus on estimating articulatory movements from acoustic features, known as acoustic-to-articulatory inversion (AAI), for dysarthric patients with amyotrophic lateral sclerosis (ALS). Unlike healthy subjects, there are two potential challenges involved in AAI on dysarthric speech. Due to speech impairment, the pronunciation of dysarthric patients is unclear and inaccurate, which could impact the AAI performance. In addition, acoustic-articulatory data from dysarthric patients is limited due to the difficulty in recording. These challenges motivate us to utilize cross-corpus acoustic-articulatory data. In this study, we propose an AAI model by conditioning speaker information using x-vectors at the input, and multi-target articulatory trajectory outputs for each corpus separately.

Harmonics analysis of a PV integrated hysteresis current control inverter connected with grid and without grid
Jayanta Kumar Sahu, Sudhakar Sahu, J.P Patra, Sarthak Kumar Maharana, Bhagabat Panda
IEEE International Conference on Smart Systems and Inventive Technology (ICSSIT) 2019

[BibTeX] [Paper]

Generally, two devices are responsible for the generation of time-variant power: alternators and inverters. Harmonics are the unwanted signals generally created at the output of the inverter. In this paper, hysteresis current control (HCC) inverters are described. Here, the HCC inverters are integrated with a photovoltaic panel and operated both with and without a grid connection. The HCC inverters are connected to the grid with the help of a phase-locked loop. Finally, the total harmonic distortion is calculated for each case, and the results are compared on that basis.

The University of Texas at Dallas
Graduate Research Assistant
Richardson, TX
Aug 2023 - Present

  • Supervisor - Dr. Yunhui Guo
  • Activities -
    • Currently working on problems related to efficient model fine-tuning and continual test-time domain adaptation.

University of Southern California
Student Researcher
Los Angeles, CA
May 2022 - July 2023

  • Supervisor - Dr. Yonggang Shi
  • Activities -
    • Developed an end-to-end general software tool to automate the reconstruction of fiber bundles in the brainstem of the human brain, using diffusion MRI images, for the HCP Aging dataset (to be publicly released soon).
    • Leveraged deep-learning-based registration and label fusion methods to automatically generate the anatomical ROIs that are critical for fiber bundle reconstruction.

University of Southern California
Student Researcher
Los Angeles, CA
Dec 2021 - Dec 2022

  • Supervisor - Dr. Shrikanth (Shri) Narayanan
  • Activities -
    • Performed speaker recognition from rt-MRI videos, based on an unsupervised disentangled representation learning scheme.
    • Contributed to generating embeddings from 2D sagittal-view rt-MRI videos to distinguish between speakers based on their articulatory representations from vocal tract landmarks.

National University of Singapore
Part-time Research Assistant
Remote
July 2020 - Apr 2021

  • Supervisor - Dr. Ren Hongliang
  • Activities -
    • Experimented with different encoder-decoder architectures (e.g., LinkNet) by plugging in spatio-temporal modules (e.g., ConvLSTM) to perform pixel-wise prediction of the needle trajectory in ultrasound images during a kidney biopsy.
    • Proposed the integration of a DGMN (Dynamic Graph Message Passing) network in DGCN (Dual Graph Convolutional Network), for efficient semantic segmentation, to model long-range dependencies in an OCT image.

Indian Institute of Science
Bachelor's Thesis and Student Researcher
Bangalore, India
Dec 2019 - Sep 2020

  • Supervisor - Dr. Prasanta Ghosh
  • Activities -
    • Studied the performance of an acoustic-to-articulatory inversion (AAI) model on dysarthric speech when the model was trained in a corpus-dependent manner, using either a matched low-resource dysarthric corpus or a mismatched cross-corpus with rich acoustic-articulatory data.
    • Investigated the benefit of utilizing cross-corpus acoustic-articulatory data using transfer learning and joint-training techniques for the articulatory predictions of dysarthric subjects.

Indian Institute of Technology Kharagpur
Summer Research Intern
Kharagpur, India
May 2019 - Jul 2019

  • Supervisor - Dr. Aurobinda Routray
  • Activities -
    • Developed an in-house, multi-phase template-matching algorithm to detect breaths in speech recordings using end-to-end deep neural networks.
    • As post-processing, employed a heuristic to join closely spaced predicted breath segments and removed segments below a certain threshold, to reduce misclassification errors.

  • Fully funded tuition, with a stipend, to pursue CS PhD at UTD.
  • Governing Body Merit Scholarship (April 2021).
    • Awarded to the top 3 students of each department at IIIT-Bh. Received for the academic year 2019-2020.
  • Indian Academy of Sciences (IAS) - Summer Research Fellowship (April 2019).
    • An annual research fellowship program (<10% selection rate) conducted by the Indian Academy of Sciences, under IISc Bangalore.
  • Reviewer - ECCV 2024, CVPR Workshops 2024
  • Building CORD.ai, a deep learning research community, as a core member and volunteer researcher.
  • Course Mentor/Grader for graduate level EE 541: An Introduction to Deep Learning (Spring 2022).
  • USC IEEE Graduate Society - Member; helped strengthen the academic and social growth of members and hosted workshops.
  • PyCon India 2020 - Content writer for social media handles; helped the promotions team reach out to organizations and colleges, interacted with individuals who have contributed to the language, and worked on creating virtual swag.
  • I'm a cis male.
  • I consider myself lucky to have grown up in two beautiful cities in India, Bangalore and Bhubaneswar, which have infused in me a lot of character and development. I've also spent two quality years in the vibrant, diverse, gently warm, and sprawling city of Los Angeles, California. I absolutely look forward to staying in new places and experiencing different cultures.
  • I'm a HUGE fan of the classical formats of cricket. You'd often find me watching old Test match highlights or SRT straight drives. Nothing can get more sublime than that. I bet! I don't consider IPL/T20 cricket a thing AT ALL.
  • I think mobile photography is like a side gig for me? My phone instantly comes out the moment my eyes catch sight of a beautiful view.
  • I also spend a lot of time on quality humor, dark humor in particular. We could talk about that later.

Source code by Jon Barron.