Systems for Data-Intensive Applications

Shoaib Akram

15 Dec 2024

Keywords

search engines, NoSQL databases, systems programming, Java runtime, persistence, memory management, NVMe SSDs

Description

Data-intensive applications are a critical component of modern online services. At Vertically Integrated Computer Systems (VICS), we aim to optimize such applications, and the hardware and software infrastructure that underpins them, for performance and cost/energy efficiency. Our recent emphasis has been on memory and storage hardware. We are especially interested in optimizing key-value stores, relational databases, and enterprise search engines. We propose new system designs for these applications with a focus on memory management, storage engines, and scheduling. We implement our ideas and prototypes mainly in C/C++ or Java. We also propose general optimizations in hardware (microarchitecture) and software (runtime environments and operating systems). We study hardware modifications in an architectural simulator written in C++.

We always look forward to offering projects to interested students with a solid background in computer systems. We especially welcome inquiries from prospective honors students who wish to engage in our research projects.

Please email Shoaib.Akram@anu.edu.au for an appointment. See below for more details.

Past Projects

Here are selected research papers and honors theses our past students have published.

Process of picking an honors thesis topic

We intentionally do not offer a list of definite topics with expected or predictable outcomes and results. In our group, picking a research topic begins with a conversation. We initially choose a broad area for discussion based on the student’s interests and then narrow down the specific details after some literature review and discussions. The conversations are partly driven by students’ prior project experiences and interest in specific coursework. We develop hypotheses and formulate theories. We then plan experimental infrastructure to validate or refute the theories. Our projects are not purely implementation or software engineering exercises; each one aims to answer an important research question.

One aspect of our research is unique and important: we build stuff and do rigorous performance evaluations of our newly developed prototypes! As you can see in the above examples, each student proposes a new system, which they design and develop from scratch, e.g., SPIRIT, HyperCache, or APCache.

Possible Tasks Include

Our projects intersect with computer architecture, operating systems, runtime environments, and data-intensive applications. The tasks and skills required for your specific project may differ. Some examples:

  • writing customized memory allocators for emerging applications (e.g., LLM training) and storage technologies (e.g., persistent memory)
  • developing/hacking database systems for fast storage devices (e.g., using the asynchronous disk I/O APIs in Linux)
  • rigorous performance evaluation and hacking of the Java runtime environment (OpenJDK)
  • conducting computer architecture simulation studies for newly proposed microarchitectures
  • writing Linux kernel code (e.g., new OS modules or modifying the page cache)

Example Topics

  • Rethinking enterprise search on fast storage devices
  • A framework for investigating database schedulability
  • Intelligent OS scheduling for data-intensive applications
  • Exploiting Intel memory compression accelerators for databases

Requirements

Background knowledge in computer systems (such as COMP2300 and COMP2310) and a keen interest in building data-intensive applications. Good programming skills (C/C++ or Java) are required.
