HYBRID DISTANCE-STATISTICAL-BASED PHRASE ALIGNMENT FOR ANALYZING PARALLEL TEXTS IN STANDARD MALAY AND MALAY DIALECTS

Jasmina Khaw Yen Min (Tunku Abdul Rahman University); Tien Ping Tan; Bali Ranaivo-Malancon

doi:10.22452/mjcs.vol37no1.5

Keywords:

Malay dialects; Parallel text; Word alignment

Abstract

Parallel text corpora are essential resources in linguistics and natural language processing, especially in translation and multilingual information retrieval. However, publicly available parallel text corpora are limited to certain genres, types and domains. Parallel dialect texts are scarcer still, even though they are important for the analysis and study of dialects. Collecting them is challenging because dialects are used mainly in speech, very few written dialect texts exist, and most dialects have no standard orthography. The contributions of this paper are threefold. First, the paper describes a methodology for acquiring a parallel text corpus of Standard Malay and Malay dialects, particularly Kelantan Malay and Sarawak Malay. Second, we propose a hybrid distance-based and statistical-based alignment algorithm to align words and phrases in the parallel text. The results show that the proposed alignment algorithm achieves precision and recall above 95%, outperforming the state-of-the-art GIZA++. Third, the alignments obtained were compared to identify the lexical similarities and differences between Standard Malay and the two studied Malay dialects, contributing valuable insights into the linguistic variations within the Malay language family.
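The abstract does not specify how the distance-based and statistical components are combined. Purely as an illustration of the general idea, the sketch below assumes one simple combination: each (source, target) word pair is scored as a weighted mix of orthographic similarity (1 minus normalised Levenshtein distance) and a Dice co-occurrence coefficient, and words are then linked greedily one-to-one. This is not the authors' algorithm. The function names, the weight `alpha`, the cut-off `min_score` and the toy sentence pairs are all hypothetical, the pairs are assumed for illustration rather than taken from the paper's corpus, and the sketch produces only single-word links, not phrases. A small precision/recall helper mirrors the kind of evaluation the abstract reports.

```python
from collections import Counter
from itertools import product

# Hypothetical sketch, NOT the authors' method: mixes an edit-distance score
# with a Dice co-occurrence score, then links words greedily one-to-one.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """1 - normalised edit distance, in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def dice_scores(corpus):
    """Dice coefficient 2*c(e,f) / (c(e) + c(f)) over sentence-level co-occurrence."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s, t = set(src), set(tgt)
        src_count.update(s)
        tgt_count.update(t)
        pair_count.update(product(s, t))
    return {(e, f): 2 * c / (src_count[e] + tgt_count[f])
            for (e, f), c in pair_count.items()}


def align(src, tgt, dice, alpha=0.5, min_score=0.3):
    """Greedy one-to-one word alignment; alpha weights distance vs. statistics."""
    candidates = []
    for i, e in enumerate(src):
        for j, f in enumerate(tgt):
            score = (alpha * string_similarity(e, f)
                     + (1 - alpha) * dice.get((e, f), 0.0))
            candidates.append((-score, i, j))
    candidates.sort()  # highest score first; ties broken by position
    used_src, used_tgt, links = set(), set(), set()
    for neg_score, i, j in candidates:
        if -neg_score < min_score:
            break  # remaining pairs are too weak to link
        if i not in used_src and j not in used_tgt:
            links.add((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return links


def precision_recall(predicted, gold):
    """Precision and recall of predicted links against a gold alignment."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy Standard Malay / Sarawak Malay pairs, assumed for illustration only.
    corpus = [("saya makan nasi".split(), "kamek makan nasi".split()),
              ("saya tidak makan".split(), "kamek sik makan".split())]
    dice = dice_scores(corpus)
    links = align(*corpus[1], dice)
    print(sorted(links))                                      # [(0, 0), (1, 1), (2, 2)]
    print(precision_recall(links, {(0, 0), (1, 1), (2, 2)}))  # (1.0, 1.0)
```

In this toy run, identical cognates such as *makan* are linked by both components, while dissimilar-looking pairs such as *tidak*/*sik* are recovered mainly through the co-occurrence statistic; that complementarity is the intuition behind mixing the two signals. GIZA++, by contrast, relies on the IBM statistical translation models alone.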
