Genome sequencing has created both opportunities and challenges for gaining new insights in biology and biomedical research. Genome sequencing data usually consist of millions or billions of short DNA sequences, called reads, that are randomly drawn from genomes. Genome assembly is the task of putting these reads back together into a single genome.
Various graph models have been proposed to convert genome sequences into graphs for genome assembly. In this project, we will investigate how to use these graph models of sequences and how to incorporate graph representation learning to address the challenges of comparing large-scale sequences.
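To make the idea of turning sequences into a graph concrete, here is a minimal sketch of one widely used graph model, the de Bruijn graph. The project description does not say which graph models will be studied, so the choice of model, the function name `build_de_bruijn_graph`, the parameter `k`, and the toy reads are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Build a simple de Bruijn graph from a collection of reads.

    Nodes are (k-1)-mers. Each distinct k-mer found in a read contributes
    one directed edge from its (k-1)-length prefix to its (k-1)-length suffix.
    This is an illustrative sketch: it ignores sequencing errors,
    reverse complements, and k-mer multiplicities.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical toy reads that overlap by three bases.
reads = ["ACGT", "CGTA"]
graph = build_de_bruijn_graph(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy case the two reads share the k-mer `CGT`, so their paths merge in the graph. A representation like this could then serve as input to a graph representation learning method, although how best to do that is exactly the open question the project poses.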
- Algorithmic and programming skills.