Khmer battambang font type

5/28/2023

This thesis explores using phonological principles of Khmer to build a model that can automatically transduce orthographic native Khmer words into a phonemic transcription and a close phonetic transcription. The approach chosen for this research involves two processes: (1) converting the orthographic words into a phonemic transcription, which represents careful speech, and (2) converting the phonemic transcription into a phonetic transcription, which represents casual speech.

Three datasets were created to manually train and test the model, and two Thrax grammars were written to carry out the two processes. Dataset 01 is a manually selected list of 140 words that covers most spelling and pronunciation cases in native Khmer words. Dataset 02 is a list of 7,654 words drawn from the official Khmer monolingual dictionary published in 1967. Dataset 02 and Dataset 03 serve as testing datasets. Phonemic and phonetic transcriptions of each word in Datasets 01 and 02 were created manually, based on existing phonological principles and regularities postulated by previous scholars.
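The two-process design described above can be pictured as a cascade of ordered rewrite rules, where the output of the first stage feeds the second. The sketch below is only an illustration in plain Python, not the thesis's Thrax grammars: the rule tables, the function names (`apply_rules`, `transduce`, `accuracy`), and the ASCII toy symbols are all hypothetical and do not represent actual Khmer orthography or phonology.

```python
from typing import List, Tuple

# A rule is (pattern, replacement). Rules are applied in list order.
Rule = Tuple[str, str]

# HYPOTHETICAL toy rules for illustration only; not real Khmer rules.
# Stage 1: orthographic form -> phonemic transcription (careful speech).
ORTHO_TO_PHONEMIC: List[Rule] = [
    ("ph", "f"),
    ("c", "k"),
]

# Stage 2: phonemic transcription -> phonetic transcription (casual speech).
PHONEMIC_TO_PHONETIC: List[Rule] = [
    ("ka", "kə"),
]


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Apply each rule in order, replacing every occurrence of its pattern."""
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)
    return text


def transduce(word: str) -> Tuple[str, str]:
    """Run both stages and return (phonemic, phonetic)."""
    phonemic = apply_rules(word, ORTHO_TO_PHONEMIC)
    phonetic = apply_rules(phonemic, PHONEMIC_TO_PHONETIC)
    return phonemic, phonetic


def accuracy(gold: List[Tuple[str, str]]) -> float:
    """Share of words whose predicted phonemic form matches the gold form.

    `gold` is a list of (orthographic word, manually created phonemic form).
    """
    if not gold:
        return 0.0
    correct = sum(1 for word, expected in gold if transduce(word)[0] == expected)
    return correct / len(gold)


if __name__ == "__main__":
    print(transduce("phoca"))
    print(accuracy([("phoca", "foka"), ("cat", "kat")]))
```

Keeping the two stages as separate rule tables mirrors the split between careful and casual speech: the phonemic output can be checked against manually created transcriptions on its own, before the second stage is applied.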