Evaluation of query strategy of active learning for text classification

Lijie Huang

doi:10.1117/12.2641410

10 November 2022

Proceedings Volume 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022); 1234820 (2022) https://doi.org/10.1117/12.2641410
Event: 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 2022, Zhuhai, China

Abstract

In many applications, classifying text documents is expensive and time-consuming. Active Learning can reach the same accuracy while using only part of the dataset. This paper evaluates probability-based query strategies of Active Learning for text classification. Random Sampling, Least Confidence Sampling, Margin Sampling, Entropy Sampling, Density Weighted Entropy Sampling, Variance Sampling, and Query by Committee (QBC) are compared on the 20 Newsgroups dataset and its subsets, with three predictive models (Decision Tree, Naive Bayes, and Logistic Regression) as the estimator. The results can guide the choice of a query strategy that is both efficient and accurate when classifying text documents with Active Learning. Density Weighted Entropy Sampling and QBC reach the highest prediction accuracy after 1000 iterations. The study confirms that the smoothness of the learning curve depends largely on the estimator, and that the construction of the QBC committee influences its performance. Query time is also compared. Expected Error Reduction Sampling takes too long to query, which makes it impossible to use in a real-world scenario.
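
As a rough illustration of three of the probability-based strategies named in the abstract, the sketch below scores unlabeled documents by Least Confidence, Margin, and Entropy using their standard textbook definitions. It is not the paper's implementation: the function names, the `query` helper, and the class probabilities in `pool_probs` are all hypothetical and chosen only to show that the strategies can select different documents.

```python
import math

def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most probable classes; smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy in nats; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def query(pool_probs, strategy):
    """Return the index of the pool item the given strategy would query next."""
    if strategy == "least_confidence":
        scores = [least_confidence(p) for p in pool_probs]
    elif strategy == "margin":
        # Negate so that the smallest margin gets the highest score.
        scores = [-margin(p) for p in pool_probs]
    elif strategy == "entropy":
        scores = [entropy(p) for p in pool_probs]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical class probabilities an estimator might predict for three
# unlabeled documents (made-up values, not results from the paper).
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.50, 0.45, 0.05],  # two classes nearly tied
    [0.40, 0.30, 0.30],  # probability spread over all classes
]

for name in ("least_confidence", "margin", "entropy"):
    print(name, "->", query(pool_probs, name))
```

On this toy pool, Margin picks the second document (the near tie between two classes), while Least Confidence and Entropy pick the third (the flattest distribution), which is one reason such strategies can produce different learning curves.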

Citation

Lijie Huang "Evaluation of query strategy of active learning for text classification", Proc. SPIE 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 1234820 (10 November 2022); https://doi.org/10.1117/12.2641410

PROCEEDINGS
9 PAGES

KEYWORDS

Statistical modeling

Data modeling

Error analysis

Machine learning

Mathematical modeling

Performance modeling

Distance measurement
