Context-aware document analysis

Description

Document management is a vital task in any enterprise. In many domains, large numbers of documents written in natural language are available, but they are poorly organised and the relations between them are not recorded. Organising this volume of information manually is not feasible. A method that helps extract the similarities between documents is a first step toward an autonomous framework for managing documents.
In NLP, distributed representations of text (such as word embeddings) have been widely studied. In this approach, a vector represents a natural language element, such as a word, phrase, paragraph, or even a whole document, and is intended to capture the semantics of that element.
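
To make the idea of comparing documents through vectors concrete, here is a minimal sketch. It assumes a simple bag-of-words count vector as a stand-in for a learned embedding, so it captures word overlap rather than semantics; the example texts and the function names `bag_of_words` and `cosine_similarity` are illustrative only and are not part of the project.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse term-count vector.

    This is a stand-in for a learned embedding: it only counts words.
    """
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical example documents.
docs = {
    "a": "neural word embeddings for document similarity",
    "b": "document similarity with word embeddings",
    "c": "annual budget report for the finance office",
}
vectors = {name: bag_of_words(text) for name, text in docs.items()}

sim_ab = cosine_similarity(vectors["a"], vectors["b"])
sim_ac = cosine_similarity(vectors["a"], vectors["c"])
print(sim_ab > sim_ac)  # documents a and b share more vocabulary than a and c
```

With learned embeddings the vectors would be dense and the similarity could reflect meaning beyond shared words, but the comparison step would look much the same.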
In addition, each document has metadata that relates it to other entities, including people (e.g., authors) and other documents (e.g., cited papers). Modelling documents by taking into account both their content and their context is essential.
In this project, we seek to address the issue of modelling documents based on their content and context.
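
One possible way to picture combining content and context is sketched below. It is an assumption for illustration, not the project's method: content similarity is the cosine between two given document vectors, context similarity is the Jaccard overlap of authors and cited documents, and the two are mixed with a hypothetical weight `alpha`. The record layout (`vector`, `authors`, `cites`) and all example values are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_items(doc):
    """Tag metadata entries so an author name cannot collide with a paper id."""
    authors = {("author", name) for name in doc["authors"]}
    cited = {("cites", paper) for paper in doc["cites"]}
    return authors | cited


def combined_similarity(doc1, doc2, alpha=0.5):
    """Weighted mix of content (vector) and context (metadata) similarity.

    alpha is a hypothetical weight: 1.0 uses content only, 0.0 context only.
    """
    content = cosine(doc1["vector"], doc2["vector"])
    context = jaccard(context_items(doc1), context_items(doc2))
    return alpha * content + (1 - alpha) * context


# Hypothetical records: x and y share content, x and z share context.
paper_x = {"vector": [1.0, 0.0], "authors": {"alice"}, "cites": {"p1", "p2"}}
paper_y = {"vector": [1.0, 0.0], "authors": {"bob"}, "cites": {"p9"}}
paper_z = {"vector": [0.0, 1.0], "authors": {"alice"}, "cites": {"p1", "p2"}}

print(combined_similarity(paper_x, paper_y))
print(combined_similarity(paper_x, paper_z))
```

A fixed linear mix is only the simplest option; a learned joint representation of text and metadata would be another way to approach the same question.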

Requirements

Programming skills in Python

Solid background in Machine Learning or Natural Language Processing

Gain

Experience in developing a solution to an open research question

Skills in conducting research in ML and NLP

Keywords

Natural language processing

Updated: 10 August 2021 / Responsible Officer: Dean, CECS / Page Contact: CECS Marketing