Discretization numbers for multiple-instances problem in relational database

Alfred, Rayner and Kazakov, Dimitar L. (2007) Discretization numbers for multiple-instances problem in relational database. In: 11th East European Conference on Advances in Databases and Information Systems (ADBIS 2007), 29 September - 3 October 2007, Varna, Northern Bulgaria.

Full text not available from this repository.

Abstract

Handling numerical data stored in a relational database differs from handling numerical data stored in a single table, because an individual record can occur multiple times in a non-target table and the relations between tables are non-determinate. Most traditional data mining methods deal only with a single table and discretize columns that contain continuous numbers into nominal values. In a relational database, multiple records with numerical attributes are stored separately from the target table, and these records are usually associated with a single structured individual stored in the target table. Numbers in multi-relational data mining (MRDM) are often discretized, after considering the schema of the relational database, in order to reduce the continuous domains to more manageable symbolic domains of low cardinality; the resulting loss of precision is assumed to be acceptable. In this paper, we consider different alternatives for dealing with continuous attributes in MRDM. The discretization procedures considered include algorithms that do not depend on the multi-relational structure of the data as well as algorithms that are sensitive to this structure. In our experiments, we study the effect of taking the one-to-many association into consideration when discretizing continuous numbers. We implement a new discretization method, called the entropy-instance-based discretization method, and evaluate it with respect to C4.5 on three variants of a well-known multi-relational database (Mutagenesis), in which numeric attributes play an important role. The empirical results demonstrate that entropy-based discretization can be improved by taking the multiple-instance problem into consideration. © Springer-Verlag Berlin Heidelberg 2007.
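
The abstract does not spell out the entropy-instance-based algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a generic entropy-minimizing binary cut over one numeric attribute, and a hypothetical `per_individual` switch that counts each individual at most once on each side of a cut, as one simple way of accounting for the one-to-many association. The record layout `(individual_id, value, class_label)` and the function names are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_cut(records, per_individual=False):
    """Return (cut, weighted_entropy) for the best binary cut point.

    records: list of (individual_id, value, class_label) tuples, where
        several records may belong to the same individual.
    per_individual: hypothetical option for this sketch. If True, an
        individual contributes its label at most once to each side of
        the cut, instead of once per record.
    """
    values = sorted({v for _, v, _ in records})
    best = (None, float("inf"))
    # Candidate cuts are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        left = [(i, c) for i, v, c in records if v <= cut]
        right = [(i, c) for i, v, c in records if v > cut]
        if per_individual:
            # Collapse repeated records of the same individual.
            left = list(set(left))
            right = list(set(right))
        n = len(left) + len(right)
        # Class entropy of each side, weighted by the side's size.
        score = sum(
            len(side) / n * entropy([c for _, c in side])
            for side in (left, right)
        )
        if score < best[1]:
            best = (cut, score)
    return best


# Toy data: two individuals, each with two records in a non-target table.
toy = [
    ("m1", 1.0, "pos"), ("m1", 2.0, "pos"),
    ("m2", 3.0, "neg"), ("m2", 4.0, "neg"),
]
record_level = best_cut(toy)
individual_level = best_cut(toy, per_individual=True)
```

On this toy data both variants pick the cut that separates the two individuals cleanly. On data where one individual owns many more records than the others, the two counting schemes can weight the sides of a cut differently, which is the kind of effect the abstract refers to.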

Item Type: Conference or Workshop Item (UNSPECIFIED)
Keyword: Discretization, Entropy-based, Genetic algorithm, Multiple instance, Semi-supervised clustering
Subjects: Q Science > QA Mathematics > QA1-939 Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Department: SCHOOL > School of Engineering and Information Technology
Depositing User: ADMIN ADMIN
Date Deposited: 20 May 2011 16:39
Last Modified: 30 Dec 2014 14:52
URI: https://eprints.ums.edu.my/id/eprint/2829
