Enrichment of BOW representation with syntactic and semantic background knowledge

Rayner Alfred, Patricia Anthony, Suraya Alias, Asni Tahir, Kim On Chin and Lau Hui Keng (2013) Enrichment of BOW representation with syntactic and semantic background knowledge. Communications in Computer and Information Science, 24. pp. 283-292. ISSN 1865-0929

Abstract

The basic Bag of Words (BOW) representation, which is generally used in text document clustering and categorization, loses important syntactic and semantic information contained in the documents. This loss is particularly problematic when documents contain many stop words or are short. In this paper, we study the contribution of incorporating syntactic features and semantic knowledge into the representation when clustering a text corpus. We assess the quality of the resulting clusters by analyzing their internal structure with the Davies-Bouldin Index (DBI). We compare four text representations: the standard BOW representation, BOW integrated with syntactic features, BOW integrated with semantic background knowledge, and BOW integrated with both syntactic features and semantic background knowledge. The experimental results show that integrating semantic and syntactic information into the standard BOW representation improves the quality of the clusters produced.
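As a rough illustration of the evaluation idea in the abstract, the sketch below computes the Davies-Bouldin Index for a toy corpus under a plain BOW representation and under a BOW representation enriched with concept labels. It is not the authors' implementation: the four documents, the cluster labels, the `concept_of` mapping and the `enrich` helper are all hypothetical, and the choice of Euclidean distance and simple concept appending is an assumption, since this record does not describe how the paper integrates background knowledge. A lower DBI indicates more compact, better separated clusters.

```python
import math
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Term-count vector for one document over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]


def enrich(tokens, concept_of):
    """Append a concept label for every token that has one.

    Hypothetical stand-in for semantic background knowledge.
    """
    return tokens + [concept_of[t] for t in tokens if t in concept_of]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def davies_bouldin(vectors, labels):
    """Davies-Bouldin Index; assumes at least two clusters with distinct centroids."""
    clusters = {}
    for vector, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vector)
    ids = sorted(clusters)
    centers = {i: centroid(clusters[i]) for i in ids}
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {
        i: sum(euclidean(v, centers[i]) for v in clusters[i]) / len(clusters[i])
        for i in ids
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in ids
            if j != i
        )
        for i in ids
    ]
    return sum(worst_ratios) / len(ids)


def dbi_for(documents, labels):
    vocabulary = sorted({term for doc in documents for term in doc})
    vectors = [bow_vector(doc, vocabulary) for doc in documents]
    return davies_bouldin(vectors, labels)


# Toy data, invented for illustration only.
documents = [
    ["car", "engine"],
    ["automobile", "engine"],
    ["apple", "fruit"],
    ["banana", "fruit"],
]
labels = [0, 0, 1, 1]  # assumed cluster assignment
concept_of = {
    "car": "VEHICLE",
    "automobile": "VEHICLE",
    "apple": "FOOD",
    "banana": "FOOD",
}

plain_dbi = dbi_for(documents, labels)
enriched_dbi = dbi_for([enrich(doc, concept_of) for doc in documents], labels)

print("plain BOW DBI:   ", round(plain_dbi, 4))
print("enriched BOW DBI:", round(enriched_dbi, 4))
```

In this toy case the shared concept labels push the two cluster centroids further apart without changing the within-cluster scatter, so the enriched representation yields the lower index. That mirrors the direction of the reported result but says nothing about its size on real corpora.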

Item Type: Article
Uncontrolled Keywords: clustering, bag of words, syntactic features, semantic background knowledge, automatic text categorization, knowledge management
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: FACULTY > Faculty of Computing and Informatics
Depositing User: Unnamed user with email storage.bpmlib@ums.edu.my
Date Deposited: 13 Nov 2015 02:58
Last Modified: 12 Oct 2017 07:23
URI: http://eprints.ums.edu.my/id/eprint/12243
