In language documentation, many 'texts' are collected in indigenous languages and then translated into more widely spoken languages. These texts are unstructured and often contain spelling variations. This project applies topic modelling to such texts from an endangered Austronesian language of Papua New Guinea. What topics can be discovered in these texts? Are the topics the same in the indigenous-language texts and in their creole translations? What kind of topic modelling works best for this kind of data, which is common in language documentation? Discovering shared topics can help in understanding community concerns and important aspects of culture. These, in turn, can be used to bring attention to the needs of the indigenous people of the area.
Dr. Danielle Barth
ANU College of Asia and the Pacific
Familiarity with machine learning. Good Python coding skills are a plus!
Gain a good understanding of machine learning models for natural language processing, and learn how to implement and apply these techniques in a research project.
Natural Language Processing, Machine Learning, Topic Modelling