Enhancing the natural language processing for Malay language stemming, identifying and correcting misspelled words identifying neologism

Surayaini Basri (2015) Enhancing the natural language processing for Malay language stemming, identifying and correcting misspelled words identifying neologism. Masters thesis, Universiti Malaysia Sabah.

Abstract

Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. A good NLP approach is needed because NLP applications are used across a wide variety of industries to solve critical knowledge problems, such as providing new insights gleaned from massive collections of unstructured content (social media, news, patent filings, financial disclosures, etc.). Weak NLP support for a language can cause irrelevant information to be retrieved. The lack of work on more effective algorithms for stemming, identifying misspelled words, and identifying neologisms has affected the efficiency of retrieving relevant information or articles in the Malay language. This is because Malay has a complex morphological structure that differs from that of other languages, so the standard NLP approaches used for other languages cannot easily be applied to processing and retrieving relevant information or articles in Malay. This work focuses on improving the Malay stemming process, introducing a new approach to identifying and correcting typos or misspelled words, and proposing a solution for identifying neologisms. Improving the Malay stemming process enables information retrieval to be performed more effectively by identifying more affixed words, because not all affixed words are stored in the standard Malay dictionary. Identifying and correcting typos or misspelled words prevents the information retrieval system from ignoring important words merely because they are misspelled. Finally, identifying neologisms may assist lexicographers in identifying new words that can be considered for inclusion in the lexicon.
Based on the experiments conducted, the proposed approaches are shown to be useful in improving NLP for the Malay language.
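
The abstract does not describe the thesis's algorithms, so the following is only a minimal Python sketch of how the three tasks can fit together, not the author's method. It assumes a tiny hypothetical lexicon (`LEXICON`), a small illustrative subset of Malay affixes (`PREFIXES`, `SUFFIXES`), and a similarity cutoff of 0.8 chosen arbitrarily; the function names `stem` and `classify` are likewise hypothetical. A word is first stemmed by affix stripping against the lexicon; if that fails, it is treated as a misspelling when a close lexicon entry exists, and otherwise flagged as a neologism candidate.

```python
from difflib import get_close_matches

# Tiny illustrative lexicon of Malay root words (an assumption, not the thesis's dictionary).
LEXICON = {"makan", "ajar", "baca", "main"}

# Illustrative subset of Malay prefixes and suffixes (not an exhaustive list).
PREFIXES = ("mem", "men", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "an", "i")


def stem(word):
    """Return a lexicon root reachable by stripping at most one
    prefix and one suffix from `word`, or None if there is none."""
    word = word.lower()
    if word in LEXICON:
        return word
    prefix_options = [""] + [p for p in PREFIXES if word.startswith(p)]
    suffix_options = [""] + [s for s in SUFFIXES if word.endswith(s)]
    for p in prefix_options:
        for s in suffix_options:
            candidate = word[len(p):len(word) - len(s)]
            if candidate in LEXICON:
                return candidate
    return None


def classify(word):
    """Label a word as known, misspelled, or a neologism candidate."""
    root = stem(word)
    if root is not None:
        return ("known", root)
    # Fall back to approximate matching against the lexicon (cutoff is arbitrary).
    matches = get_close_matches(word.lower(), sorted(LEXICON), n=1, cutoff=0.8)
    if matches:
        return ("misspelled", matches[0])
    return ("neologism_candidate", word.lower())


for w in ["makanan", "membaca", "diajarkan", "makn", "selfie"]:
    print(w, classify(w))
```

This sketch ignores the sound changes that accompany some Malay prefixes (for example, "menulis" is formed from the root "tulis"), so naive affix stripping would not recover such roots; handling cases like this is part of what makes Malay stemming harder than plain prefix and suffix removal.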

Item Type: Thesis (Masters)
Keyword: Natural Language Processing, Natural languages, Malay language
Subjects: Q Science > QA Mathematics > QA1-939 Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science > QA76.75-76.765 Computer software
Department: FACULTY > Faculty of Computing and Informatics
Depositing User: DG MASNIAH AHMAD
Date Deposited: 09 Aug 2024 08:18
Last Modified: 09 Aug 2024 08:18
URI: https://eprints.ums.edu.my/id/eprint/39472
