KEYWORDS: Internet, Visualization, Visual analytics, Analytical research, Social network analysis, Algorithm development, Data mining, Optical character recognition, Detection and tracking algorithms, Standards development
In previous papers, we documented success in determining the key people of interest from a large corpus of real-world evidence. Our recent efforts focus on exploring additional domains and data sources. Internet data sources such as email, web pages, and news feeds make it easier to gather a large corpus of documents for various domains, but detecting people of interest in these sources introduces new challenges. Analyzing these massive sources magnifies entity resolution problems and demands a storage management strategy that supports efficient algorithmic analysis and visualization techniques. This paper discusses the techniques we used to analyze the ENRON email repository, which are also applicable to analyzing web pages returned from our "Buddy" meta-search engine.
KEYWORDS: Optical character recognition, Algorithm development, Data modeling, Error analysis, Data conversion, Analytical research, Logic, Data mining, Receivers, Standards development
In previous work, we introduced a new paradigm called Uni-Party Data Community Generation (UDCG) and a new methodology to discover social groups (a.k.a. community models) called Link Discovery based on Correlation Analysis (LDCA). We further advanced this work by experimenting with a corpus of evidence obtained from a Ponzi scheme investigation. That work identified several UDCG algorithms, developed what we called "Importance Measures" to compare the accuracy of the algorithms based on ground truth, and presented a Concept of Operations (CONOPS) that criminal investigators could use to discover social groups. However, that work used a rather small random sample of manually edited documents because the evidence contained far too many OCR and other extraction errors. Deferring the evidence extraction errors allowed us to continue experimenting with UDCG algorithms, but it meant that we used only a small fraction of the available evidence. In an attempt to discover techniques that are more practical in the near term, our most recent work focuses on being able to use an entire corpus of real-world evidence to discover social groups. This paper discusses the complications of extracting evidence, suggests a method of performing name resolution, presents a new UDCG algorithm, and discusses our future direction in this area.
KEYWORDS: Data mining, Human-machine interfaces, Analytical research, Social networks, Algorithm development, Optical character recognition, Distance measurement, Visualization, Information security, Defense and security
The challenge of identifying important individuals and their group membership is a continuing and ever-growing problem. In recent years, the data mining community has been identifying and discussing a new paradigm of data analysis using uni-party data. Within this paradigm, a methodology known as Link Discovery based on Correlation Analysis (LDCA) defines a process to compensate for the lack of relational data. CORAL, a specific implementation of LDCA, demonstrated the value of this methodology by identifying, with limited success, suspects involved in a Ponzi scheme. This paper introduces several new algorithms and analyzes their ability to generate a prioritized ranking of individuals involved in the Ponzi scheme based on their individual activity. To compare the accuracy of the algorithms, we present their experimental results and conclude with a discussion of open issues and future activities.