A visualization approach to automatic text documents categorization based on HAC

Rayner Alfred, Mohd Norhisham Bin Razali, Suraya Alias and Kim On Chin (2014) A visualization approach to automatic text documents categorization based on HAC. In: The 8th International Conference on Knowledge Management in Organizations.

Abstract

The ability to visualize documents as clusters is essential. Even the best data summarization technique will mislead if its output is poorly represented or visualized. In many studies, clustering techniques are applied and the results are presented simply as documents grouped into clusters. In some cases, however, users may want to know the relationships that exist between clusters. To illustrate these relationships, a hierarchical agglomerative clustering (HAC) technique can be applied to build a dendrogram. The dendrogram displays the relationship between a cluster and its sub-clusters, so users are able to view how the clusters relate to one another. In addition, the terms or features that characterize each cluster can be displayed to help users understand the contents of the whole collection of text documents stored in the database. In this paper, a Text Analyzer (VisualText) is proposed that automates the categorization of text documents based on a visualization approach using the Hierarchical Agglomerative Clustering technique. This paper also studies the effect of using different inter-cluster proximities on the quality of the clusters produced. The cophenetic correlation coefficient is measured to evaluate the quality of the clusters produced using three different inter-cluster distance measures.
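
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the VisualText implementation: the abstract does not name the three inter-cluster proximities, so single, complete and average linkage are assumed here, and the toy term-count vectors, the Euclidean distance and the function names `hac` and `cophenetic_correlation` are all hypothetical choices made for illustration. The sketch runs a naive agglomerative clustering, records the height at which each pair of documents is first merged (the cophenetic distance), and correlates those heights with the original pairwise distances.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hac(points, linkage="average"):
    """Naive HAC. Returns (merges, dist, coph), where dist and coph map
    each pair (i, j), i < j, to its original and cophenetic distance."""
    n = len(points)
    dist = {(i, j): euclidean(points[i], points[j])
            for i, j in combinations(range(n), 2)}

    def d(i, j):
        return dist[(i, j) if i < j else (j, i)]

    def cluster_distance(a, b):
        pairs = [d(i, j) for i in a for j in b]
        if linkage == "single":      # closest pair of members
            return min(pairs)
        if linkage == "complete":    # farthest pair of members
            return max(pairs)
        if linkage == "average":     # mean over all member pairs
            return sum(pairs) / len(pairs)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [[i] for i in range(n)]
    merges, coph = [], {}
    while len(clusters) > 1:
        # Find the two closest clusters under the chosen linkage.
        (a, b), height = min(
            (((x, y), cluster_distance(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        # Pairs split across the two clusters first join at this height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = height
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges, dist, coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    keys = sorted(dist)
    x = [dist[k] for k in keys]
    y = [coph[k] for k in keys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical term-count vectors for five short documents.
    docs = [(2, 0, 1, 0), (3, 0, 0, 0), (0, 2, 0, 1), (0, 3, 0, 2), (1, 1, 1, 1)]
    for method in ("single", "complete", "average"):
        _, dist, coph = hac(docs, method)
        print(method, round(cophenetic_correlation(dist, coph), 3))
```

A coefficient closer to 1 indicates that the dendrogram preserves the original pairwise distances more faithfully, which is how the linkage choices can be compared. The `merges` list holds the information a dendrogram plot would need: which clusters were joined and at what height.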

Item Type: Conference or Workshop Item (UNSPECIFIED)
Uncontrolled Keywords: Interactive Visualization, Hierarchical Agglomerative Clustering, Text Analyzer, Text Categorization, Data Summarization, Cophenetic Correlation Coefficient
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: FACULTY > Faculty of Computing and Informatics
Depositing User: Unnamed user with email storage.bpmlib@ums.edu.my
Date Deposited: 26 Nov 2015 05:22
Last Modified: 09 Nov 2017 08:13
URI: http://eprints.ums.edu.my/id/eprint/12240
