Prompt-Based Counting

We propose a fixed-point approach to unified prompt-based counting: a single framework that counts objects specified by diverse prompt types, including bounding boxes, points, and natural language text, within one model architecture. Unlike prior category-agnostic counting methods, which are typically limited to one prompt modality, our approach aims to generate accurate density maps for the objects of interest indicated by any of these prompt forms, making it more flexible and easier to use in real-world applications.

Central to our method is a fixed-point inference mechanism: given an image and a prompt, we iteratively refine a latent representation of the target object’s density map until it reaches a stable fixed point that is consistent with both the visual content and the semantic or spatial guidance from the prompt. This iterative refinement is trained with a contrastive learning objective that encourages the model to distinguish correct prompt-density alignments from distractors, thereby improving generalization across prompt types and object categories.
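The idea of iterating to a fixed point can be illustrated with a toy sketch. This is not the paper's model: the update rule `refine`, the feature vectors, and the `weight` parameter are hypothetical stand-ins chosen so that the iteration is a simple contraction and provably converges. It only shows the general pattern of repeating a refinement step until the latent stops changing, then reading off a count as the sum of the resulting density values.

```python
def refine(z, image_feat, prompt_feat, weight=0.5):
    """One hypothetical refinement step (an assumption, not the paper's update).

    Blends the current latent with a prompt-gated image response.
    For 0 <= weight < 1 this map is a contraction, so iteration converges.
    """
    return [
        weight * zi + (1.0 - weight) * max(0.0, xi * pi)
        for zi, xi, pi in zip(z, image_feat, prompt_feat)
    ]


def fixed_point(image_feat, prompt_feat, tol=1e-8, max_iters=200):
    """Iterate refine() until successive latents differ by less than tol."""
    z = [0.0] * len(image_feat)
    for step in range(1, max_iters + 1):
        z_next = refine(z, image_feat, prompt_feat)
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            return z, step
    return z, max_iters


# Toy inputs: four "pixels" of image response and a prompt mask (both made up).
image_feat = [0.9, 0.1, 0.8, 0.0]
prompt_feat = [1.0, 0.0, 1.0, 1.0]

density, steps = fixed_point(image_feat, prompt_feat)
count = sum(density)  # density-map counting: the count is the sum of the map
print(f"converged in {steps} steps, estimated count = {count:.3f}")
```

In this toy setting the fixed point is simply the prompt-gated response itself; in a learned model the refinement step would be a neural network and the fixed point would have no closed form, which is why an iterative solver is needed.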

Extensive experiments on benchmarks such as FSC-147 demonstrate that our unified model achieves state-of-the-art performance across all prompt modalities without requiring separate networks or fine-tuning per prompt type. By unifying box, point, and text prompts under a common fixed-point formulation, our work represents a significant step toward more adaptable and practical visual counting systems.

 

Selected Publications

A Fixed-Point Approach to Unified Prompt-Based Counting.
Wei Lin and Antoni B. Chan,
In: AAAI Conference on Artificial Intelligence (AAAI), Vancouver, Feb 2024. [supplemental | code]