Zero-shot cross-domain crowd counting is a challenging task in which a crowd counting model is trained on a source domain (i.e., a training dataset) and then evaluated on an unseen target domain (i.e., a different test dataset), with no additional labeled or unlabeled target data available for fine-tuning. The generalization performance of existing crowd counting methods is typically limited by the large gap between the source and target domains. Here, we propose a novel Crowd Counting framework built upon an external Momentum Template, termed C2MoT, which encodes domain-specific information via an external template representation. Specifically, the Momentum Template (MoT) is learned with momentum updates during offline training and is then dynamically updated for each test image during online cross-dataset evaluation. Thanks to the dynamically updated MoT, C2MoT generates dense target correspondences that explicitly account for head regions, and then predicts the density map from the normalized correspondence map. Experiments on large-scale datasets show that C2MoT achieves leading zero-shot cross-domain crowd counting performance without model fine-tuning, and even outperforms domain adaptation methods that fine-tune on target domain data. Moreover, C2MoT obtains state-of-the-art counting performance on the source domain.
- Dynamic Momentum Adaptation for Zero-Shot Cross-Domain Crowd Counting.
In: ACM Multimedia (MM), Oct 2021.