Automatic similarity detection and clustering of data

Craig Einstein; Peter Chin

doi:10.1117/12.2267844

22 May 2017 Automatic similarity detection and clustering of data

Craig Einstein, Peter Chin

Proceedings Volume 10185, Cyber Sensing 2017; 101850K (2017) https://doi.org/10.1117/12.2267844
Event: SPIE Defense + Security, 2017, Anaheim, CA, United States

Abstract

An algorithm was created which identifies the number of unique clusters in a dataset and assigns the data to the clusters. A cluster is defined as a group of data which share similar characteristics. Similarity is measured using the dot product between two vectors where the data are input as vectors. Unlike other clustering algorithms such as K-means, no knowledge of the number of clusters is required. This allows for an unbiased analysis of the data. The automatic cluster detection algorithm (ACD), is executed in two phases: an averaging phase and a clustering phase. In the averaging phase, the number of unique clusters is detected. In the clustering phase, data are matched to the cluster to which they are most similar. The ACD algorithm takes a matrix of vectors as an input and outputs a 2D array of the clustered data. The indices of the output correspond to a cluster, and the elements in each cluster correspond to the position of the datum in the dataset. Clusters are vectors in N-dimensional space, where N is the length of the input vectors which make up the matrix. The algorithm is distributed, increasing computational efficiency

Citation Download Citation

Craig Einstein and Peter Chin "Automatic similarity detection and clustering of data", Proc. SPIE 10185, Cyber Sensing 2017, 101850K (22 May 2017); https://doi.org/10.1117/12.2267844

ACCESS THE FULL ARTICLE

INSTITUTIONAL
Select your institution to access the SPIE Digital Library.

SELECT YOUR INSTITUTION

PERSONAL
Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.

PERSONAL SIGN IN

No SPIE Account? Create one

PURCHASE THIS CONTENT

SUBSCRIBE TO DIGITAL LIBRARY

50 downloads per 1-year subscription

Members: $195

Non-members: $335 ADD TO CART

25 downloads per 1 - year subscription

Members: $145

Non-members: $250 ADD TO CART

PURCHASE SINGLE ARTICLE

Includes PDF, HTML & Video, when available