Optimized feature construction methods for data summarizations of relational data

Sze, Florence Sia Fui (2014) Optimized feature construction methods for data summarizations of relational data. Masters thesis, Universiti Malaysia Sabah.


Abstract

Many approaches have been developed to discover knowledge (i.e. useful information) from data stored in multiple tables of a relational database. The Dynamic Aggregation of Relational Attributes (DARA) algorithm is one such approach; it summarizes data stored in a target table that has a one-to-many relationship with data stored in a non-target table. DARA transforms the relational representation of the data into a vector space representation, and a clustering process is then applied to group the data by the similarity of their characteristics. The summarized data can then be fed to any classification algorithm to perform the classification task. A classification task is commonly performed to discover frequent patterns in the data that can be used to classify new, unknown data. In DARA, the predictive accuracy of the classification task can be affected by the descriptive accuracy of the summarized data. That descriptive accuracy is, in turn, highly influenced by how non-target records are represented in the vector space model. Feature construction has been shown to enrich the representation of non-target records and thus to improve the descriptive accuracy of the summarized data. However, the existing feature construction method does not explore all possible representations of the records. In this thesis, novel feature construction methods are introduced, and the question of whether the descriptive accuracy of the summarized data can benefit from them is investigated. The proposed framework applies a genetic algorithm that incorporates several feature scoring measures to optimize the process of feature construction. This thesis also presents a study of a method to improve the descriptive accuracy of the DARA algorithm by generating multiple instances of the summarized data.
The empirical results show that the predictive accuracy can be improved, and thereby that the descriptive accuracy of the summarized data can benefit from the proposed methods. The proposed methods provide a wider search space of useful ways to represent records in the non-target table.
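
The abstract does not give DARA's encoding, so the following is only a minimal sketch of the general idea: non-target records linked to one target record are treated as a bag of patterns and turned into a weighted vector that a clustering step could then compare. The TF-IDF weighting, the cosine similarity, the function names summarize and cosine, and the toy rows table are all assumptions made for illustration, not the thesis's method. The features argument stands in for feature construction: choosing which attributes are combined into a pattern changes how each record is represented.

```python
import math
from collections import Counter, defaultdict


def summarize(non_target_rows, key, features):
    """Build one TF-IDF vector per target record from its non-target rows.

    Assumption: each non-target row becomes one pattern formed by joining
    the values of the chosen attributes (a simple constructed feature).
    """
    bags = defaultdict(Counter)
    for row in non_target_rows:
        pattern = "|".join(str(row[f]) for f in features)
        bags[row[key]][pattern] += 1

    n = len(bags)
    # Document frequency: number of target records whose bag holds the pattern.
    df = Counter(p for bag in bags.values() for p in bag)
    return {
        tid: {p: tf * math.log(n / df[p]) for p, tf in bag.items()}
        for tid, bag in bags.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical non-target table: several rows per target record ("cust").
rows = [
    {"cust": 1, "category": "book", "channel": "web"},
    {"cust": 1, "category": "book", "channel": "web"},
    {"cust": 1, "category": "toy", "channel": "shop"},
    {"cust": 2, "category": "book", "channel": "web"},
    {"cust": 2, "category": "toy", "channel": "shop"},
    {"cust": 3, "category": "food", "channel": "shop"},
]

# Patterns built from two attributes combined into one constructed feature.
vectors = summarize(rows, key="cust", features=["category", "channel"])
sim_1_2 = cosine(vectors[1], vectors[2])  # records sharing both patterns
sim_1_3 = cosine(vectors[1], vectors[3])  # records sharing no pattern
```

In this toy table, target records 1 and 2 share both patterns and come out highly similar, while record 3 shares none with record 1. Passing a different features list, for example only "channel", changes the patterns and therefore the similarities, which is the kind of representation choice a search procedure such as a genetic algorithm could explore.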

Item Type:	Thesis (Masters)
Keyword:	Knowledge discovery, Relational database, Dynamic Aggregation of Relational Attributes, Vector space representation, Clustering, Classification task
Subjects:	Q Science > QA Mathematics > QA1-939 Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Department:	SCHOOL > School of Engineering and Information Technology
Depositing User:	DG MASNIAH AHMAD -
Date Deposited:	13 Jan 2025 16:02
Last Modified:	13 Jan 2025 16:02
URI:	https://eprints.ums.edu.my/id/eprint/42534
