Dimensionality reduction in data summarization approach to learning relational data

Rayner Alfred and Chung, Seng Kheau and Lau, Hui Keng (2013) Dimensionality reduction in data summarization approach to learning relational data. Intelligent Information and Database Systems, 7802. pp. 166-175.

Official URL: Http://dx.doi.org/10.1007/978-3-642-36546-1_18

Abstract

Due to the growing amount of digital data stored in relational databases, new approaches are required for learning from relational data. The DARA algorithm, designed to summarize data, is one of the approaches introduced in relational data mining to handle data with one-to-many relations. DARA transforms data stored in relational databases into a vector space representation by applying information retrieval theory. Experimental results have shown DARA to be very effective in learning relational data. However, DARA suffers from a major drawback when the cardinalities of attributes are very high, because the size of the vector space representation depends on the number of unique values that exist for all attributes in the dataset. This paper investigates how two techniques affect the predictive accuracy of the overall classification task: discretizing the computed term magnitudes, and applying a feature selection process that reduces the cardinalities of attributes in the relational datasets. This involves finding the best set of relevant features with which to summarize the data, where feature selection is performed based on the previously computed term magnitudes. The results show that the predictive accuracy of the classification task can be improved by improving the quality of the summarized data, and that this quality can be enhanced by appropriately discretizing the computed term magnitudes and by selecting only a certain percentage of the attributes.
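The abstract describes DARA only at a high level. As a rough illustration, and not the authors' implementation, the Python sketch below makes several assumptions: that the "information retrieval theory" step is TF-IDF weighting of `attribute=value` terms collected from the rows linked to each target record; that discretization uses equal-width bins over all term weights; and that feature selection ranks attributes by the summed weight of their terms and keeps a fixed fraction of them. The toy `customers` table and all function names (`record_terms`, `tfidf_vectors`, `discretize`, `select_attributes`, `keep_attributes`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each target record (a customer) is linked to many
# rows of a non-target table (its purchases), i.e. a one-to-many relation.
customers = {
    "c1": [{"product": "tv", "store": "north", "channel": "web"},
           {"product": "radio", "store": "north", "channel": "web"}],
    "c2": [{"product": "tv", "store": "north", "channel": "web"}],
    "c3": [{"product": "phone", "store": "south", "channel": "web"}],
}


def record_terms(rows):
    """Flatten the rows linked to one target record into a bag of
    'attribute=value' terms (one term per attribute value per row)."""
    return Counter(f"{attr}={val}" for row in rows for attr, val in row.items())


def tfidf_vectors(bags):
    """Assumed weighting: term frequency within the record's bag times
    log(N / number of records whose bag contains the term)."""
    n = len(bags)
    df = Counter(term for bag in bags.values() for term in bag)
    vectors = {}
    for rid, bag in bags.items():
        total = sum(bag.values())
        vectors[rid] = {t: (c / total) * math.log(n / df[t]) for t, c in bag.items()}
    return vectors


def discretize(vectors, k):
    """Replace each weight with one of k equal-width bins (0 .. k-1),
    computed over the range of all weights in the dataset."""
    weights = [w for vec in vectors.values() for w in vec.values()]
    lo, hi = min(weights), max(weights)
    width = (hi - lo) / k or 1.0  # guard against all weights being equal
    return {rid: {t: min(int((w - lo) / width), k - 1) for t, w in vec.items()}
            for rid, vec in vectors.items()}


def select_attributes(vectors, fraction):
    """Assumed selection rule: score each attribute by the summed weight of
    its terms across all records, keep the top `fraction` (at least one)."""
    score = defaultdict(float)
    for vec in vectors.values():
        for term, w in vec.items():
            score[term.split("=", 1)[0]] += w
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked[:max(1, math.ceil(len(ranked) * fraction))]


def keep_attributes(vectors, attrs):
    """Drop every term whose attribute was not selected."""
    return {rid: {t: w for t, w in vec.items() if t.split("=", 1)[0] in attrs}
            for rid, vec in vectors.items()}


bags = {rid: record_terms(rows) for rid, rows in customers.items()}
vectors = tfidf_vectors(bags)
selected = select_attributes(vectors, 0.6)
# Bins are computed over all attributes, then unselected terms are dropped.
summary = keep_attributes(discretize(vectors, k=3), selected)
print(selected)        # ['product', 'store']
print(summary["c3"])   # {'product=phone': 2, 'store=south': 2}
```

In this toy example the `channel` attribute has the same value in every row, so its IDF, and therefore every weight it contributes, is zero; it is the attribute that feature selection drops. The discretized, reduced vectors in `summary` are what a downstream classifier would consume in place of the full vector space.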

Item Type: Article
Uncontrolled Keywords: Relational Data Mining, Data Summarization, Clustering, Dimensionality Reduction, Discretization Numbers, Feature Selection.
Subjects: Q Science > QA Mathematics
Divisions: FACULTY > Faculty of Computing and Informatics
ID Code: 12254
Deposited By: IR Admin
Deposited On: 13 Nov 2015 11:13
Last Modified: 13 Nov 2015 11:13
