DropMAE: Ablation Study

Comparison with Other Pre-training Approaches:

Table 1: Comparison of pre-training methods on the downstream visual object tracking (VOT) and video object segmentation (VOS) tasks, evaluated on GOT-10k and DAVIS-17, respectively. All methods use the ViT-B/16 model with 224×224 input images for pre-training. Pre-training time is measured on 64 NVIDIA V100 GPUs. The best two results are shown in red and blue.

Ablation study of the dropout ratio P on the GOT-10k (VOT) and DAVIS-17 (VOS) datasets:

Usage of Various Source Training Datasets:
