Project Overview

Version control platforms such as GitHub and GitLab are essential for open-source software, whether it’s “traditional” (like a mobile game or ANU’s website) or scientific (like an ML library or a statistical package). Nevertheless, the people working on those projects differ: their technical backgrounds and purposes are not always alike, and their usage of GitHub may vary.

This project will carry out an exhaustive analysis of version control practices across 2000 Python projects. You’ll compare commit frequency; issue length, sentiment, and frequency; pull-request usage; GitHub Actions usage; and so on.
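As a rough illustration of the kind of per-project metrics involved, the sketch below computes commit frequency and mean issue length using only the Python standard library. The repository names, timestamps, issue texts, and helper functions are all hypothetical placeholders; a real study would collect this data from the projects themselves and would add sentiment scoring on top.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical sample data (not from any real project).
commits = {
    "traditional/app": ["2024-01-01T10:00:00", "2024-01-02T11:30:00", "2024-01-15T09:00:00"],
    "scientific/lib": ["2024-01-03T14:00:00", "2024-01-24T16:45:00"],
}
issues = {
    "traditional/app": ["Crash on login", "Button misaligned on small screens"],
    "scientific/lib": ["Gradient is NaN when the input contains zeros"],
}


def commits_per_week(timestamps):
    """Count commits per ISO (year, week) pair."""
    counts = Counter()
    for ts in timestamps:
        iso = datetime.fromisoformat(ts).isocalendar()
        counts[(iso[0], iso[1])] += 1
    return counts


def mean_commits_per_active_week(timestamps):
    """Average number of commits over the weeks that had at least one commit."""
    counts = commits_per_week(timestamps)
    return mean(counts.values()) if counts else 0.0


def mean_issue_length(bodies):
    """Average issue length, measured in whitespace-separated words."""
    return mean(len(body.split()) for body in bodies) if bodies else 0.0


# One summary row per repository, ready to compare across project types.
summary = {
    repo: {
        "commits_per_active_week": mean_commits_per_active_week(commits[repo]),
        "mean_issue_words": mean_issue_length(issues.get(repo, [])),
    }
    for repo in commits
}

for repo, metrics in summary.items():
    print(repo, metrics)
```

Averaging over active weeks only is one possible design choice; dividing by the full project lifetime instead would penalise projects with long quiet periods, so the definition of "frequency" is worth fixing early in the analysis.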


  • Good Python knowledge, including scikit-learn, numpy, pandas, dataset wrangling, and text cleaning.
  • Visualisation knowledge (seaborn, ggplot).
  • Ability to critically analyse large quantities of data, wrangle datasets, and extract insights.
  • ML/DL knowledge is not required, as you will only use sentiment analysis (e.g., textblob/nltk).
  • Knowledge of GitHub and version control is fundamental.

