Paper
27 March 2001 Value-balanced agglomerative connectivity clustering
Gunjan K. Gupta, Joydeep Ghosh
Author Affiliations +
Abstract
In this paper we propose a new clustering framework for transactional data-sets involving large numbers of customers and products. Such transactional data pose particular issues such as very high dimensionality (greater than 10,000), and sparse categorical entries, that have been dealt with more effectively using a graph-based approach to clustering such as ROCK. But large transactional data raises certain other issues such as how to compare diverse products (e.g. milk vs. cars) cluster balancing and outlier removal, that need to be addressed. We first propose a new similarity measure that takes the value of the goods purchased into account, and form a value-based graph representation based on this similarity measure. A novel value-based balancing criterion that allows the user to control the balancing of clusters, is then defined. This balancing criterion is integrated with a value-based goodness measure for merging two clusters in an agglomerative clustering routine. Since graph-based clustering algorithms are very sensitive to outliers, we also propose a fast, effective and simple outlier detection and removal method based on under-clustering or over- partitioning. The performance of the proposed clustering framework is compared with leading graph-theoretic approaches such as ROCK and METIS.
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Gunjan K. Gupta and Joydeep Ghosh "Value-balanced agglomerative connectivity clustering", Proc. SPIE 4384, Data Mining and Knowledge Discovery: Theory, Tools, and Technology III, (27 March 2001); https://doi.org/10.1117/12.421079
Lens.org Logo
CITATIONS
Cited by 15 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Binary data

Data acquisition

Data mining

Detection and tracking algorithms

Fuzzy logic

Knowledge discovery

Computer engineering

RELATED CONTENT


Back to Top