10 November 2022
Evaluation of query strategy of active learning for text classification
Lijie Huang
Proceedings Volume 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022); 1234820 (2022) https://doi.org/10.1117/12.2641410
Event: 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 2022, Zhuhai, China
Abstract
In many applications, classifying text documents is expensive and time-consuming. Active Learning can reach the same accuracy using only part of the dataset. This paper evaluates probability-based query strategies of Active Learning for text classification. Random Sampling, Least Confidence Sampling, Margin Sampling, Entropy Sampling, Density Weighted Entropy Sampling, Variance Sampling, and Query by Committee (QBC) are compared on the 20 Newsgroups dataset and its subsets, with three predictive models (Decision Tree, Naive Bayes, and Logistic Regression) as the estimator. The evaluation results can guide the choice of a query strategy that ensures both efficiency and accuracy when classifying text documents with Active Learning. Density Weighted Entropy Sampling and QBC achieve the highest prediction accuracy after 1000 iterations. The study confirms that the smoothness of the learning curve depends largely on the estimator, and that the construction of the QBC committee influences the performance of QBC. The time consumption for querying is also compared. Expected Error Reduction Sampling spends too much time on querying, which makes it impossible to implement in a real-world scenario.
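As an illustration of the probability-based strategies named in the abstract, the following is a minimal sketch of the usual textbook definitions of Least Confidence, Margin, and Entropy Sampling. It is not the author's implementation: the function names, the `query` helper, and the toy probability matrix are all assumptions made for this example, and the matrix stands in for the class probabilities an estimator would produce on the unlabeled pool.

```python
import numpy as np

def least_confidence(proba):
    """Uncertainty as 1 minus the top class probability (higher = more uncertain)."""
    return 1.0 - proba.max(axis=1)

def margin(proba):
    """Gap between the two most probable classes (smaller = more uncertain)."""
    ordered = np.sort(proba, axis=1)
    return ordered[:, -1] - ordered[:, -2]

def entropy(proba):
    """Shannon entropy of the predicted distribution (higher = more uncertain)."""
    eps = 1e-12  # avoids log(0) for zero-probability classes
    return -(proba * np.log(proba + eps)).sum(axis=1)

def query(proba, strategy):
    """Return the index of the pool sample the given strategy would query next."""
    if strategy == "least_confidence":
        return int(np.argmax(least_confidence(proba)))
    if strategy == "margin":
        return int(np.argmin(margin(proba)))
    if strategy == "entropy":
        return int(np.argmax(entropy(proba)))
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical class probabilities for three unlabeled documents (illustration only).
proba = np.array([
    [0.90, 0.05, 0.05],
    [0.50, 0.45, 0.05],
    [0.40, 0.30, 0.30],
])

for name in ("least_confidence", "margin", "entropy"):
    print(name, query(proba, name))
```

On this toy input, Least Confidence and Entropy Sampling both select the third document, while Margin Sampling selects the second, which shows that the strategies can disagree even on the same predicted probabilities.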
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Lijie Huang "Evaluation of query strategy of active learning for text classification", Proc. SPIE 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 1234820 (10 November 2022); https://doi.org/10.1117/12.2641410
KEYWORDS
Statistical modeling
Data modeling
Error analysis
Machine learning
Mathematical modeling
Performance modeling
Distance measurement