A Comparative Survey of Deep Active Learning

We present a comprehensive comparative survey of Deep Active Learning (DAL) for classification tasks, based on our DeepAL+ toolbox, which contains nineteen highly-cited DAL approaches. We first survey and categorize DAL-related works and then perform comparative experiments across ten frequently used image classification datasets and five text classification datasets. Our work is the most extensive comparative study of DAL to date. Additionally, we explore factors that influence the efficacy of DAL, such as the batch size and the number of training epochs. Our findings provide useful references for researchers designing DAL experiments or building DAL-based applications. Our contributions are:

  1. We construct a user-friendly toolbox for Deep Active Learning: DeepAL+.
  2. We conduct the largest comparative experiments so far, comparing nineteen highly-cited DAL methods across ten image classification and five text classification datasets.
  3. We identify a new branch of DAL methods: enhanced DAL methods. We discuss the potential of such methods to surpass the usual upper bound of DAL performance, i.e., the performance of a model trained on the full dataset.
  4. We conduct comparative experiments on more complicated tasks, such as Medical Image Analysis and the WILDS-series datasets, which contain samples with distribution shift.
  5. We explore factors that influence the computation/timing cost of DAL processes, e.g., the number of training epochs and the batch size.
  6. We explore how using basic learners with or without pre-training influences DAL performance.
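
For readers new to the setting, the pool-based loop shared by most DAL methods, and the places where the query batch size and the number of training epochs enter it, can be sketched as follows. This is a minimal illustration and not DeepAL+ code: the function names (`train`, `active_learning`), the 1-D logistic-regression stand-in for a deep model, the entropy-based query rule, and all parameter values are assumptions chosen for brevity.

```python
import math
import random


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def entropy(p):
    """Binary predictive entropy; largest when p is close to 0.5."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def train(labeled, epochs, lr=0.5):
    """Fit a 1-D logistic regression with plain SGD (stand-in for a deep model)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):          # training cost grows with `epochs`
        for x, y in labeled:
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b


def active_learning(pool, oracle, n_init, query_batch, n_rounds, epochs, seed=0):
    """Pool-based active learning with entropy (uncertainty) sampling."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)

    # Start from a small, randomly chosen labeled set.
    labeled = [(x, oracle(x)) for x in unlabeled[:n_init]]
    unlabeled = unlabeled[n_init:]
    model = train(labeled, epochs)

    for _ in range(n_rounds):
        w, b = model
        # Rank the unlabeled pool by predictive entropy, most uncertain first.
        unlabeled.sort(key=lambda x: entropy(sigmoid(w * x + b)), reverse=True)
        # Query `query_batch` samples per round and ask the oracle for labels.
        queried, unlabeled = unlabeled[:query_batch], unlabeled[query_batch:]
        labeled += [(x, oracle(x)) for x in queried]
        # Retrain from scratch on the enlarged labeled set.
        model = train(labeled, epochs)

    return model, labeled


# Toy usage: points in [-1, 1], labeled 1 when positive (hypothetical data).
pool = [i / 50 - 1 for i in range(101)]
model, labeled = active_learning(
    pool, oracle=lambda x: 1 if x > 0 else 0,
    n_init=10, query_batch=5, n_rounds=4, epochs=20,
)
```

In this sketch the total labeling budget is `n_init + n_rounds * query_batch`, while the training cost of each round scales with `epochs` times the current labeled-set size, which is why both settings affect the overall cost of a DAL run.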


  • Comparative experiments on CV tasks:
  • Comparative experiments on NLP tasks: