Business Speak, Doctors' Latin, and Lawyers' French: Improving Information Access for People Unfamiliar with Professional Jargon

Research areas

Temporary Supervisor

Professor Hanna Suominen


Motivation: Good text readability is important for efficient communication and for reducing the risk of misunderstanding.

Approach: Enriching the presentation and linkage of clinical records, health pamphlets, medical papers, legislation, patents, regulations, financial statements, quarterly reports, taxation rules, and other difficult, verbose, and scattered information resources for auditors, laypeople, scientists, and other people unfamiliar with professional jargon.


The project methodology draws on Information Visualisation, Logic, Machine Learning (ML), Natural Language Processing (NLP), and Web Search, either by studying a small set of methods in depth or a larger set in combination. Research on state-of-the-art deep, transfer, and active learning methods is called for to minimise the amount of annotated data needed for method setup whilst maximising processing correctness and system adaptability. Multi-modal aspects are considered to address visual text summarisation and content explanation. The project is truly interdisciplinary and tightly connected to authentic data, real-life applications, and business/provider internships. Experimental and theoretical work (that is, applied and fundamental research) go hand in hand, with the emphasis depending on the student's individual interests and expertise.


This project will appeal to students with excellent skills in experimentation, programming, and teamwork. Preference will be given to students who have completed, or are currently taking, the units Artificial Intelligence, Document Analysis, and/or Machine Learning at The ANU, or similar units elsewhere.

Background Literature

See, for example, the following recent paper: Kelly L, Goeuriot L, Suominen H, Névéol A, Palotti J, Zuccon G. Overview of the CLEF eHealth Evaluation Lab 2016. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2016). Lecture Notes in Computer Science, vol. 9822, pp. 255-266, 2016.


This student project is part of the activities of the NLP Team within the ML Group at The Australian National University (ANU) and Data61 in Canberra, the capital of Australia. The OECD Regional Well-Being Report 2014 rated Canberra the most liveable city in the world.


In 2014, the ML Group was ranked among the top five in the world in ML, the others being Microsoft Research, the Max Planck Institute Tübingen, the University of California, Berkeley, and the University of Cambridge. According to the QS World University Rankings 2015-16, The ANU ranks within the top 20 universities globally, with an overall score of 91.0 out of 100.0 (19th), whilst the next-best Australian university scored 83.1 (42nd). For the Field of Research (FoR) code Artificial Intelligence and Image Processing, which covers ML and NLP and falls under Information and Computing Sciences, The ANU obtained the top score of 5 out of 5 in the Excellence in Research for Australia (ERA) evaluations in both 2010 and 2012.


The NLP Team is experienced in developing powerful, low-cost techniques to transform free-form text into structured representations. Our deep and transfer ML methods can use fewer than a hundred expert-annotated sentences to achieve performance comparable to that of state-of-the-art systems initialised with ten times more data. Similarly, our language processing methods have ranked among the best in the ALTA, CLEF, and TREC shared tasks on automated understanding, use, summarisation, and translation in the difficult genres of "Doctors' Latin" in electronic health records and "Lawyers' French" in patents.


Artificial Intelligence, Big Data, Comprehension, Data Storage and Retrieval, Evaluation, Information Visualisation, Logic, Machine Learning, Natural Language Processing, Record Linkage, Software Design, User-Computer Interface, Web Search

Updated: 1 June 2019