Enhancing Malay stemming algorithm with background knowledge

Leong, Leow Ching, Surayaini Basri, and Rayner Alfred (2012) Enhancing Malay stemming algorithm with background knowledge. In: 12th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2012, 3-7 September 2012, Kuching, Sarawak.

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/978-3-642-32695-0_68

Abstract

Stemming is the process of reducing inflected words to their root form. A stemming algorithm for the Malay language is important, especially for building an effective information retrieval system. Although several Malay stemmers exist, such as Othman's and Fatimah's algorithms, none is complete: each fails to stem some Malay words correctly, so there is still room for improvement. Implementing a perfect stemmer for Malay is difficult because of the complexity of Malay word morphology. This paper presents a new approach to stemming Malay words that achieves a higher percentage of correctly stemmed words. In the proposed approach, called a Malay stemmer with background knowledge, additional background knowledge is supplied to increase stemming accuracy. Besides a reference dictionary that contains all root words, a second dictionary that contains all affixed words is added. These two files are the background knowledge that serves as a reference for the stemming process. Rule Frequency Order (RFO) is applied as the base stemming algorithm because of its high accuracy in stemming Malay words. The results obtained show that the proposed stemmer with background knowledge produces fewer errors than previously published stemmers that do not apply any background knowledge when stemming Malay words.
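The idea of consulting two dictionaries before and during rule-based affix removal can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the dictionary contents, the rule set, or the rule order, so the word lists, the `RULES` list, its ordering, and the assumption that the affixed-word dictionary maps each affixed word to its root are all hypothetical.

```python
# Toy background knowledge. The real dictionaries are not given in the
# abstract; these entries are placeholders for illustration only.
ROOT_WORDS = {"makan", "ajar", "main", "baca"}

# Assumption: the affixed-word dictionary maps each listed affixed word
# to its root. The abstract only says it "contains all affixed words".
AFFIXED_WORDS = {"belajar": "ajar", "pelajar": "ajar"}

# Hypothetical (prefix, suffix) rules, tried in an assumed frequency
# order. The actual RFO rule list is not stated in the abstract.
RULES = [
    ("", "an"),
    ("me", ""),
    ("di", ""),
    ("ber", ""),
    ("pe", "an"),
    ("mem", "kan"),
]


def stem(word):
    """Return the root of `word`, or `word` unchanged if none is found."""
    word = word.lower()

    # 1. A word that is already a root needs no stemming.
    if word in ROOT_WORDS:
        return word

    # 2. Known affixed words are resolved directly from the dictionary.
    if word in AFFIXED_WORDS:
        return AFFIXED_WORDS[word]

    # 3. Otherwise try each rule in order and accept the first candidate
    #    that the root dictionary confirms.
    for prefix, suffix in RULES:
        if len(word) <= len(prefix) + len(suffix):
            continue
        if word.startswith(prefix) and word.endswith(suffix):
            candidate = word[len(prefix):len(word) - len(suffix)]
            if candidate in ROOT_WORDS:
                return candidate

    # 4. No rule produced a known root, so leave the word as it is.
    return word
```

In this toy setup, `stem("belajar")` is resolved by the affixed-word lookup even though no rule in `RULES` strips the `bel-` prefix, which is the kind of case where a second dictionary can reduce errors compared with rules plus a root dictionary alone.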

Item Type: Conference Paper (UNSPECIFIED)
Uncontrolled Keywords: Affixes, Background Knowledge, Malay stemming, Rule Frequency Order, Rule-based Affix Elimination
Subjects: Q Science > QA Mathematics
Divisions: SCHOOL > School of Engineering and Information Technology
ID Code: 5288
Deposited By: IR Admin
Deposited On: 31 Oct 2012 17:08
Last Modified: 08 Sep 2014 12:40
