PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation

The University of Texas at Dallas, Richardson, TX
AAAI 2025 (Oral)

Our Framework for Continual Test-Time Adaptation


PALM framework: At time step t, an input batch xk is processed by the model, parameterized by θt. The KL divergence between the softmax predictions and a uniform distribution is backpropagated, and the layers whose gradient norm is ≤ η are selected; the gradient magnitude quantifies the model's prediction uncertainty. The parameter sensitivities of these selected layers, serving as an indicator of domain shift, are then computed to adjust their learning rates. Finally, with the optimization objective, the model is updated using the adjusted per-parameter learning rates.
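The layer-selection step above can be sketched in a minimal, self-contained form. The snippet below uses a hypothetical two-layer numpy MLP as a stand-in for a pre-trained model: it backpropagates KL(softmax ‖ uniform) by hand and keeps the layers whose gradient norm falls at or below a threshold η. The network sizes, the threshold value, and the manual-backprop setup are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP (hypothetical stand-in for a pre-trained source model).
W1 = rng.normal(0, 0.5, (8, 16))   # layer 1 weights
W2 = rng.normal(0, 0.5, (16, 4))   # layer 2 weights (4 classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl_to_uniform_grads(x):
    """Forward pass, then backprop the mean KL(softmax || uniform) to each layer."""
    h_pre = x @ W1
    h = np.maximum(h_pre, 0.0)                 # ReLU
    logits = h @ W2
    p = softmax(logits)
    K = p.shape[1]
    kl = (p * np.log(K * p + 1e-12)).sum(axis=1)        # per-sample KL
    # dKL/dlogits = p * (log(K p) - KL), from the softmax Jacobian.
    dlogits = p * (np.log(K * p + 1e-12) - kl[:, None])
    gW2 = h.T @ dlogits / len(x)
    dh = (dlogits @ W2.T) * (h_pre > 0)
    gW1 = x.T @ dh / len(x)
    return {"W1": gW1, "W2": gW2}

x = rng.normal(size=(32, 8))                   # one unlabeled test batch
grads = kl_to_uniform_grads(x)
norms = {name: float(np.linalg.norm(g)) for name, g in grads.items()}
eta = 0.05                                     # hypothetical threshold; a tuned hyperparameter
selected = [name for name, n in norms.items() if n <= eta]
print(norms, selected)
```

In a real PyTorch model, the same thing is one `loss.backward()` followed by a per-layer check of `param.grad.norm()`; the hand-derived gradient here just keeps the sketch dependency-free.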

Abstract

Real-world vision models in dynamic environments face rapid shifts in domain distributions, leading to degraded recognition performance. Using only unlabeled test data, continual test-time adaptation (CTTA) directly adjusts a pre-trained source discriminative model to these changing domains. A highly effective class of CTTA methods applies layer-wise adaptive learning rates to selectively adapt pre-trained layers. However, such methods suffer from poor estimation of domain shift and from inaccuracies arising from pseudo-labels. This work overcomes these limitations by identifying layers for adaptation via quantifying model prediction uncertainty, without relying on pseudo-labels. We use the magnitude of gradients as a metric, calculated by backpropagating the KL divergence between the softmax output and a uniform distribution, to select layers for further adaptation. Then, for the parameters belonging exclusively to these selected layers, with the remaining ones frozen, we evaluate their sensitivity to approximate the domain shift and adjust their learning rates accordingly. We conduct extensive image classification experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C, demonstrating the superior efficacy of our method compared to prior approaches.

CTTA Experimental Results

At each time step t, a task consisting of K batches drawn from a certain data distribution arrives. At t=0, the model is initialized with the source weights θs. It is then gradually adapted online to each incoming batch xk of the current task, updating the model parameters to θt. The source dataset is unavailable during this adaptation due to privacy and storage constraints. For the experiments, each dataset contains 15 corruption styles as tasks (e.g., Gaussian noise, shot noise, . . .) at 5 severity levels indicating corruption strength. The model is evaluated on the CIFAR-10C, CIFAR-100C, and ImageNet-C datasets; the mean errors (%) are shown below.
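The online adaptation loop, together with the sensitivity-based learning-rate adjustment described above, can be sketched as follows. This is a toy illustration on a single parameter tensor: the sensitivity estimate |θ · ∇θ| (a first-order estimate of the loss change if the parameter were zeroed), the EMA smoothing factor, and the normalization of learning rates are all assumptions for illustration; consult the paper for the exact update rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# One parameter tensor from a selected layer (toy stand-in).
theta = rng.normal(size=(16, 4))
base_lr = 1e-3
ema_sens = np.zeros_like(theta)    # smoothed sensitivity estimate
beta = 0.9                         # hypothetical EMA factor

def adapt_step(theta, grad, ema_sens):
    # Parameter sensitivity: first-order estimate of the change in loss
    # if this parameter were removed, |theta * grad|.
    sens = np.abs(theta * grad)
    ema_sens = beta * ema_sens + (1 - beta) * sens
    # Scale per-parameter learning rates by relative sensitivity.
    # (The direction and normalization of this scaling are illustrative
    # assumptions, not the paper's exact rule.)
    lr = base_lr * ema_sens / (ema_sens.max() + 1e-12)
    theta = theta - lr * grad
    return theta, ema_sens

for _ in range(5):                          # K batches arriving online
    grad = rng.normal(size=theta.shape)     # stand-in for a real test-time gradient
    theta, ema_sens = adapt_step(theta, grad=grad, ema_sens=ema_sens)
```

The key point the sketch captures is that adaptation is strictly online: each batch is seen once, the source data is never revisited, and the learning rate of every parameter is modulated by its accumulated sensitivity.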

Ablation Results

BibTeX

If our work is of interest to you, consider citing it.
@inproceedings{maharana2024palm,
  title={PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation},
  author={Maharana, Sarthak Kumar and Zhang, Baoming and Guo, Yunhui},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}