Feature selection is a core research topic in text categorization: the selected feature subset directly influences categorization results. First, word frequency and document frequency are analyzed. Then, a category concentration degree based on word frequency and document frequency is proposed. Next, set covering is introduced into rough set theory, and an attribute reduction algorithm based on minimal set covering is given. Finally, a new feature selection method combining the proposed category concentration degree with the provided attribute reduction algorithm is presented. The method first uses the category concentration degree to select features, filtering out terms to reduce the sparsity of the feature space, and then employs the attribute reduction algorithm to eliminate redundancy, so that a more representative feature subset is obtained. Experimental results show that the presented feature selection method outperforms three classical feature selection methods, information gain (IG), χ² statistics (CHI), and mutual information (MI), in time performance, macro-average F1, and micro-average F1.
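The abstract does not give the details of the reduction algorithm, but attribute reduction via minimal set covering is commonly approximated with the greedy set-cover heuristic: each candidate feature (attribute) "covers" a set of items it can discern, and features are chosen until everything is covered. A minimal sketch under that assumption (the function name and example data are illustrative, not from the paper):

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation of a minimal set cover.

    universe: set of elements that must be covered.
    subsets:  dict mapping a candidate feature name to the set of
              elements that feature covers.
    Returns the list of chosen feature names.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered elements.
        best = max(subsets, key=lambda s: len(subsets[s] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical example: four candidate features covering five elements.
universe = {1, 2, 3, 4, 5}
subsets = {"a": {1, 2, 3}, "b": {2, 4}, "c": {4, 5}, "d": {5}}
print(greedy_set_cover(universe, subsets))  # -> ['a', 'c']
```

The greedy heuristic does not always find the truly minimal cover, but it runs in polynomial time and achieves a logarithmic approximation bound, which is why it is the standard practical choice for set-cover-based reduction.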