Research Graph Foundation Internship: Knowledge Mining

9 May 2024

This position is offered through the ANU Computing Internship (COMP3820 / COMP4820 / COMP8830).


Research Graph Foundation

The Research Graph Foundation is a not-for-profit and collaborative venture to connect scholarly records across global research repositories. The Foundation contributes to developing capabilities that transform disconnected and siloed research activities into a connected network of scholarly works.

**Project**\

Knowledge Mining: Using Prompt Engineering and Large Language Models to analyse and categorise articles

This internship project aims to use AI-based technology to assist in the automatic classification and categorisation of research articles. Classifying and categorising research and development outputs is a critical step in processes such as bibliometrics and research and development impact analysis at the national and international level. The multidimensionality of these outputs, their differences in structure and content, and the associated complexity make this categorisation task a great challenge. The current process relies on time-consuming semi-manual steps, which can introduce inaccuracies. We use large language models (LLMs) and Retrieval Augmented Generation (RAG) to extract information from documents.

There are two main components to this internship:

(a) Learning, reading and writing about prompt engineering, large language models, and AI tools.

(b) Using Python code and advanced prompt engineering methods with GPT or Llama 3 to process and classify millions of articles.
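As a rough illustration of the second component, the sketch below builds a classification prompt for one article and maps the model's reply back onto a fixed label set. It is only a sketch under stated assumptions: the category list, the function names and the `fake_llm` stand-in are hypothetical and are not taken from the project, and a real run would replace `fake_llm` with a call to the chosen model.

```python
from typing import Callable

# Hypothetical label set, for illustration only; the project's categories are not specified here.
CATEGORIES = ["Computer Science", "Medicine", "Physics", "Other"]


def build_prompt(title: str, abstract: str, categories: list[str]) -> str:
    """Compose a zero-shot prompt that restricts the answer to one known label."""
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "Classify the research article below into exactly one category.\n"
        f"Categories:\n{options}\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        "Answer with the category name only."
    )


def parse_label(reply: str, categories: list[str], fallback: str = "Other") -> str:
    """Map a free-text model reply onto a known label, or the fallback if none matches."""
    cleaned = reply.strip().lower()
    for category in categories:
        if category.lower() in cleaned:
            return category
    return fallback


def classify(
    title: str,
    abstract: str,
    llm: Callable[[str], str],
    categories: list[str] = CATEGORIES,
) -> str:
    """Classify one article using any function that maps a prompt to a reply."""
    return parse_label(llm(build_prompt(title, abstract, categories)), categories)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption); always gives the same answer."""
    return " Physics.\n"


if __name__ == "__main__":
    print(classify("Quantum error correction", "We study surface codes.", fake_llm))
```

Passing the model in as a plain function keeps the prompt and parsing logic independent of any particular provider, so the same code could be tried against different models.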

Successful applicants will be exposed to the end-to-end process of a research project and gain experience in analysing large volumes of document data. The project is managed by the Research Graph Foundation and will be supported by academics at a number of Australian research institutions.

All materials generated by interns, including articles and source code, will be publicly accessible under a Creative Commons licence.

**Required technical skills**\

Required: Experience using Python for data analytics, and a functional understanding of large language models and prompt engineering.

Preferred: Experience with Linux environments and graph databases.

**Required professional/other skills**\

Ability to work independently and take initiative while knowing when to ask for help and communicate with others. Curiosity and diligence are needed for a research project, as is the ability to collaborate with a small team.

**Delivery Mode**\


**Type of internship**\

Unpaid placement.

**How to apply**\

Applications are invited from students who have already passed the eligibility checks for the Computing Internship courses COMP3820, COMP4820 or COMP8830. Further information about the Computing Internships can be found on the Computing Internship page.

You can nominate multiple preferred Internship projects and host organisations through the one application form.

The closing date for Expressions of Interest for internship projects is 19 May 2024. Students who have passed the eligibility checks will have received the application form.

