Rayner Alfred and Dimitar Kazakov (2007) A clustering approach to generalized pattern identification based on multi-instanced objects with DARA. In: Communications of the Eleventh East-European Conference on Advances in Databases and Information Systems, September 29 - October 3, 2007, Varna, Bulgaria.
Clustering is an essential data mining task with a wide range of applications. Traditional clustering algorithms are based on a vector space model representation. A relational database system, however, often contains multi-relational information spread across multiple relations (tables). To cluster such data, one must either restrict the analysis to a single representation or construct a feature space comprising all possible representations of the data stored in multiple tables. In this paper, we present a data summarization approach, borrowed from information retrieval theory, to clustering in a multi-relational environment. We find that the data summarization technique can be used here to capture the typically high volume of multiple instances and the numerous forms of patterns. Our experiments demonstrate a technique for clustering data in a multi-relational environment and report evaluation results on the mutagenesis dataset. In addition, we evaluate how varying the number of features considered in clustering affects classification performance.
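As a rough illustration of the idea of borrowing a vector space representation from information retrieval, the sketch below treats each object's related rows as a bag of patterns, weights them with TF-IDF, and compares objects by cosine similarity. It is a minimal sketch under stated assumptions, not the authors' DARA implementation: the function names `summarize` and `cosine`, the pattern encoding, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def summarize(records_by_object):
    """Map each object to a TF-IDF weighted vector over its patterns.

    records_by_object: dict of object id -> list of pattern strings,
    one per related row (hypothetical encoding, assumed for this sketch).
    """
    n = len(records_by_object)
    # Document frequency: in how many objects does each pattern occur?
    df = Counter()
    for patterns in records_by_object.values():
        df.update(set(patterns))
    vectors = {}
    for obj, patterns in records_by_object.items():
        tf = Counter(patterns)  # term frequency within this object
        vectors[obj] = {p: tf[p] * math.log(n / df[p]) for p in tf}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: each object owns several rows in a related table,
# and each row has already been encoded as a single pattern string.
objects = {
    "m1": ["c-22", "c-22", "h-3"],
    "m2": ["c-22", "h-3", "h-3"],
    "m3": ["o-40", "n-38"],
}
vectors = summarize(objects)
sim_12 = cosine(vectors["m1"], vectors["m2"])
sim_13 = cosine(vectors["m1"], vectors["m3"])
```

Once every object is a single fixed-length vector, any conventional distance-based clustering algorithm can be applied to the summarized data. Note that with this weighting a pattern shared by every object receives weight zero.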
|Item Type:||Conference Paper (UNSPECIFIED)|
|Uncontrolled Keywords:||Relational Data Mining, Distance-based, Clustering, Multiple Instance, Relational Databas|
|Subjects:||Q Science > QA Mathematics > QA76 Computer software|
|Divisions:||FACULTY > Faculty of Computing and Informatics|
|Deposited By:||IR Admin|
|Deposited On:||17 Nov 2015 15:24|
|Last Modified:||17 Nov 2015 15:24|