
This paper addresses the underexplored problem of crowd localization from density maps in crowd counting tasks. While modern deep learning methods excel at predicting density maps, they often rely on simplistic localization strategies such as detecting local peaks. To overcome this limitation, we introduce Optimal Transport Minimization (OT-M), an iterative, parameter-free algorithm that recovers object locations by minimizing the Sinkhorn distance between a predicted density map and a target point map. OT-M can be applied to any density map without additional training.
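To make the idea concrete, the following is a minimal NumPy sketch of an alternating scheme in the spirit of OT-M; it is not the authors' implementation. It assumes a squared Euclidean cost, a log-domain Sinkhorn solver, and a barycentric update of the point locations. The names `sinkhorn_plan` and `ot_m_sketch`, and the parameters `eps`, `n_sinkhorn`, `n_outer`, and `init`, are hypothetical choices made only for illustration.

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def sinkhorn_plan(a, b, cost, eps=1.0, n_sinkhorn=200):
    """Entropy-regularized transport plan between histograms a (n,) and b (m,)."""
    log_a, log_b = np.log(a), np.log(b)
    M = -cost / eps
    f = np.zeros_like(a)  # log-scaling for rows
    g = np.zeros_like(b)  # log-scaling for columns
    for _ in range(n_sinkhorn):
        g = log_b - _logsumexp(M + f[:, None], axis=0)
        f = log_a - _logsumexp(M + g[None, :], axis=1)
    return np.exp(M + f[:, None] + g[None, :])

def ot_m_sketch(density, n_points=None, init=None, n_outer=20, eps=1.0, seed=0):
    """Alternate between a Sinkhorn plan and a point update (illustrative only)."""
    h, w = density.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    mass = density.ravel().astype(float)
    keep = mass > 0                      # drop empty pixels so log(a) is finite
    pixels, mass = pixels[keep], mass[keep]
    a = mass / mass.sum()                # density map as a probability histogram

    if init is not None:
        points = np.asarray(init, dtype=float).copy()
    else:
        if n_points is None:
            # Assumption: the point count is the rounded sum of the density map.
            n_points = max(int(round(mass.sum())), 1)
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(pixels), size=n_points, p=a)
        points = pixels[idx] + rng.normal(scale=0.5, size=(n_points, 2))
    b = np.full(len(points), 1.0 / len(points))  # equal mass per point

    for _ in range(n_outer):
        # Transport step: plan between density pixels and current points.
        cost = ((pixels[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        plan = sinkhorn_plan(a, b, cost, eps=eps)
        # Minimization step: for a squared Euclidean cost, the minimizing
        # location of each point is the plan-weighted mean of the pixels.
        points = (plan.T @ pixels) / (plan.sum(axis=0)[:, None] + 1e-12)
    return points
```

With a random initialization, a scheme like this can settle in a poor local minimum, so in practice one might initialize from local peaks of the density map; that choice is also an assumption here and is not taken from the paper.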
Building on OT-M, we develop a semi-supervised crowd counting framework that leverages hard pseudo-labels (discrete point maps generated by OT-M) instead of the soft pseudo-labels (density maps) used in prior work. These hard labels provide stronger supervision and enable the use of advanced density-to-point loss functions, such as the Generalized Loss (GL), on unlabeled data. This aligns our semi-supervised counting approach with common practices in other semi-supervised learning domains, where hard labels are typically preferred for consistency and training stability.
To address the potential noise and inaccuracies in pseudo-labels, we further propose a confidence-weighted Generalized Loss that dynamically downweights unreliable predictions based on the agreement between teacher and student models. This robust loss mitigates the negative impact of erroneous pseudo-labels during training. Comprehensive experiments demonstrate that our overall approach achieves state-of-the-art performance in both crowd localization accuracy and semi-supervised counting, highlighting the effectiveness of OT-M and the proposed loss formulation.
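The downweighting idea can be illustrated with a small sketch. The abstract does not give the weighting formula, so everything below is an assumption made for illustration: agreement is measured as how close the student's density mass around each teacher pseudo-label point is to one person, and the hypothetical functions `agreement_weights` and `weighted_point_loss` stand in for the corresponding parts of the confidence-weighted Generalized Loss.

```python
import numpy as np

def agreement_weights(student_density, teacher_points, radius=2.0):
    """Hypothetical per-point confidence in (0, 1].

    The weight is 1 when the student places exactly one unit of mass within
    `radius` pixels of a teacher point, and decays as the mass departs from 1.
    """
    h, w = student_density.shape
    ys, xs = np.mgrid[0:h, 0:w]
    weights = []
    for py, px in teacher_points:
        mask = (ys - py) ** 2 + (xs - px) ** 2 <= radius ** 2
        local_mass = student_density[mask].sum()
        weights.append(np.exp(-abs(local_mass - 1.0)))
    return np.array(weights)

def weighted_point_loss(per_point_loss, weights):
    """Confidence-weighted average of per-point loss terms (illustrative)."""
    per_point_loss = np.asarray(per_point_loss, dtype=float)
    return float(np.sum(weights * per_point_loss) / (np.sum(weights) + 1e-12))
```

In this sketch, a pseudo-label point that the student does not support contributes less to the total, which is the qualitative behavior described above; the actual agreement measure in the paper may differ.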
Selected Publications
In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Jun 2023 (highlight). [code]
