Crowd counting has many real-world applications, such as crowd control, traffic scheduling, and retail shop management. Recently, multi-view crowd counting (MVCC) has been proposed to fuse multiple camera views to mitigate the shortcomings of single-view image counting, such as the limited field-of-view of a single camera. Current MVCC methods rely on camera calibrations (both intrinsic and extrinsic camera parameters) to project features or density map predictions from the individual camera views onto a common ground plane for fusion. Camera calibration is also required to obtain the ground-truth people locations on the ground plane to build scene-level density maps for supervision. However, requiring camera calibrations limits MVCC in real application scenarios. Therefore, it is important to explore calibration-free multi-view counting methods.
To extend MVCC to more practical situations, in this paper we propose a calibration-free multi-view crowd counting (CF-MVCC) method, which obtains the scene-level count as a weighted summation over the predicted density maps from the camera views. The weight map applied to each density map accounts for the number of cameras in which each pixel is visible (to avoid double counting) and the confidence of each pixel's prediction (to down-weight poorly predicted regions, such as those far from the camera). The weight maps are generated from estimated pairwise homographies at test time, so CF-MVCC can be applied to a novel scene without camera calibrations.
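To make the weighted-summation idea concrete, below is a minimal sketch of how per-view density maps could be fused into a scene-level count. The function name, tensor layout, and the particular way confidence and visibility are combined (confidence divided by the visibility count) are all illustrative assumptions, not the paper's exact formulation; in the actual method the weights are produced by a learned module from estimated homographies.

```python
import numpy as np

def fused_count(density_maps, visibility, confidence):
    """Hypothetical sketch of weighted-summation fusion (not the paper's exact weights).

    density_maps: (V, H, W) single-view density predictions d_i
    visibility:   (V, H, W) number of cameras that see each pixel (>= 1),
                  assumed precomputed from pairwise homographies
    confidence:   (V, H, W) per-pixel confidence scores c_i in [0, 1]
    """
    # Assumed combination: scale each pixel's confidence by 1 / visibility,
    # so a person seen by multiple cameras is not double counted.
    weights = confidence / visibility
    # Scene-level count = sum over views and pixels of w_i * d_i.
    return float((weights * density_maps).sum())
```

For example, a person visible in two views (visibility = 2) with unit density and full confidence in each contributes 0.5 from each view, giving a total count of 1.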
In summary, our contributions are three-fold:
- We propose a calibration-free multi-view counting model (CF-MVCC) that extends MVCC to more unconstrained scenarios, as it can be applied to new scenes without camera calibrations.
- The proposed method directly estimates the scene crowd count from single-view density map predictions, without pixel-level supervision, via weight maps with confidence scores guided by camera-view content and distance information.
- We conduct extensive experiments on multi-view counting datasets, achieving better performance than calibration-free baselines and performance that is promising relative to fully calibrated MVCC methods. Furthermore, our model trained on a large synthetic dataset transfers to novel real scenes via domain adaptation.
- Code coming soon!