Web-based mel frequency cepstral coefficients and hidden Markov model for Mandarin speech recognition system

Tseu Geanna Afera (2022) Web-based mel frequency cepstral coefficients and hidden Markov model for Mandarin speech recognition system. Universiti Malaysia Sabah. (Unpublished)

WEB-BASED MEL FREQUENCY CEPSTRAL COEFFICIENTS AND HIDDEN MARKOV MODEL FOR MANDARIN SPEECH RECOGNITION SYSTEM.24pages.pdf (296kB)
WEB-BASED MEL FREQUENCY CEPSTRAL COEFFICIENTS AND HIDDEN MARKOV MODEL FOR MANDARIN SPEECH RECOGNITION SYSTEM.pdf (2MB), restricted to registered users only

Abstract

Learning a new language can be difficult for adults, especially the speaking aspect, because accurate pronunciation is hard to master. Mandarin Chinese is a tonal language: each character is pronounced with one of five tones. One of the most effective ways to learn correct Mandarin pronunciation is to practice reading and speaking with a teacher who can listen and give feedback. However, this is not always practical, because a teacher cannot attend to every student in the classroom, especially in a large class. A web-based Mandarin speech recognition system is therefore proposed to help address this problem. The objectives of this project are to investigate Mel Frequency Cepstral Coefficients (MFCC) and the Hidden Markov Model (HMM) for the proposed web-based Mandarin speech recognition system, and to develop and evaluate a prototype of that system. The system targets students at Universiti Malaysia Sabah who are currently taking beginner-level Mandarin. The prototype includes several phrases against which the speaker's pronunciation is checked. It uses an HMM as the machine learning model and is implemented as a web application in PHP and Python. Users can record their speech and view the accuracy through the prediction output. In training, the MFCC and HMM pipeline produced a speech model with accuracy probabilities ranging from 0.80 to 0.94. In testing, the predicted phrase sometimes differed from the spoken phrase, but overall the accuracy probabilities ranged from 0.53 to 1.0, so the system can serve as a basic speech recognizer.
Future work includes gamifying the system to make it more engaging and training the speech model on word phonemes to improve its performance.
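The abstract names MFCC features and HMMs but does not describe how a recorded utterance is scored against the phrases. A common arrangement for isolated-phrase recognition, assumed here and not confirmed by this record, is to train one HMM per phrase and pick the phrase whose model assigns the highest likelihood to the utterance's MFCC frames. The pure-Python sketch below implements the log-space forward algorithm for HMMs with one diagonal-Gaussian emission per state. `GaussianHMM`, `classify`, the phrase labels and every parameter value are hypothetical, and MFCC extraction itself is not shown (toy 2-D vectors stand in for MFCC frames). It also normalises the per-phrase likelihoods into scores that sum to 1; whether the thesis's "accuracy probability" is computed this way is not stated.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical type alias: one frame stands in for one MFCC feature vector.
Frame = Sequence[float]


def safe_log(p: float) -> float:
    """log(p), mapping zero probabilities to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values)) for a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian_diag(x: Frame, mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


class GaussianHMM:
    """Hypothetical HMM with one diagonal Gaussian emission per state."""

    def __init__(self, start, trans, means, variances):
        self.start = start          # start[i]     = P(first state is i)
        self.trans = trans          # trans[i][j]  = P(next state j | current state i)
        self.means = means          # means[i]     = mean feature vector of state i
        self.variances = variances  # variances[i] = per-dimension variances of state i

    def _emit(self, x: Frame, state: int) -> float:
        """Log emission density of frame x in the given state."""
        return log_gaussian_diag(x, self.means[state], self.variances[state])

    def log_likelihood(self, frames: List[Frame]) -> float:
        """Forward algorithm in log space: returns log P(frames | model)."""
        n = len(self.start)
        alpha = [safe_log(self.start[i]) + self._emit(frames[0], i) for i in range(n)]
        for x in frames[1:]:
            alpha = [
                logsumexp([alpha[i] + safe_log(self.trans[i][j]) for i in range(n)])
                + self._emit(x, j)
                for j in range(n)
            ]
        return logsumexp(alpha)


def classify(frames: List[Frame], models: Dict[str, GaussianHMM]) -> Tuple[str, Dict[str, float]]:
    """Pick the phrase whose HMM best explains the frames; also return normalised scores."""
    scores = {name: model.log_likelihood(frames) for name, model in models.items()}
    total = logsumexp(list(scores.values()))
    probs = {name: math.exp(s - total) for name, s in scores.items()}
    return max(probs, key=probs.get), probs


# Toy two-state, left-to-right phrase models over 2-D features (all values invented).
ni_hao = GaussianHMM([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]],
                     [[0.0, 0.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]])
xie_xie = GaussianHMM([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]],
                      [[5.0, 5.0], [6.0, 6.0]], [[0.5, 0.5], [0.5, 0.5]])

utterance = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.1], [1.0, 0.9]]
best, probs = classify(utterance, {"ni hao": ni_hao, "xie xie": xie_xie})
print(best, {name: round(p, 3) for name, p in probs.items()})
```

For real use, the toy vectors would be replaced by frames from an MFCC front end (for example `librosa.feature.mfcc`) and the parameters would be learned from recordings (for example with hmmlearn's `GaussianHMM.fit`). The normalised scores are relative to the phrases in the model set: they sum to 1 even when no model fits the utterance well.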

Item Type: Academic Exercise
Keyword: Speech recognition, Machine learning, Hidden Markov Model, Mel frequency cepstral coefficients, Mandarin Chinese
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania > PL1-8844 Languages of Eastern Asia, Africa, Oceania > PL1001-3208 Chinese language and literature > PL1001-1960 Chinese language
Q Science > QA Mathematics > QA1-939 Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science > QA76.75-76.765 Computer software
Department: FACULTY > Faculty of Computing and Informatics
Depositing User: DG MASNIAH AHMAD
Date Deposited: 18 Jul 2022 19:51
Last Modified: 18 Jul 2022 19:51
URI: https://eprints.ums.edu.my/id/eprint/33332
