Blazingly Fast Real-Time Analytics over Big Streaming Data

Join us and build the real-time search and analytic engines of the future!

Shoaib Akram

23 Feb 2023

Today, blazingly fast and cost-efficient real-time search and analytics over rapidly generated social media and web content is critical to the success of many enterprises, including Google, LinkedIn, Twitter, and Meta. Real-time search demands fast indexing of massive datasets, which stresses both memory and persistent storage, and running search queries concurrently over the index adds further pressure on search infrastructure. Unfortunately, the contemporary hardware/software stack for real-time search is not prepared for the expected response times and the growth in social media datasets.

The volume of data is growing exponentially, doubling every year. Technology scaling, on the other hand, can deliver only 10% more memory capacity every year. These trends place a lot of pressure on existing data-centric infrastructures for real-time analytics.
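To make the gap concrete, the short sketch below compounds the two rates quoted above (data doubling yearly, memory capacity growing 10% yearly). The function name and the idea of expressing the gap as a single ratio are illustrative assumptions, not part of our analysis.

```python
def capacity_gap(years, data_growth=2.0, memory_growth=1.10):
    """Return how many times faster data has grown than memory after `years`.

    Assumes both rates stay constant, which is a simplification.
    """
    return (data_growth / memory_growth) ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} year(s): data outgrows memory by "
              f"{capacity_gap(years):.1f}x")
```

Under these assumptions the data-to-memory ratio grows by a factor of roughly 1.8 each year, so after five years the data is about 20 times larger relative to the available memory capacity.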

At the Vertically Integrated Computer Systems (VICS) research group, we are investigating a new real-time search and analytic engine stack that uses emerging memory technologies with scalable and cost-efficient capacity to power real-time analytics over large datasets. Our initial prototype builds a huge search index in memory over streaming datasets. It offers concurrent indexing and query evaluation. The index is persistent in memory, and the system can be gracefully shut down and restarted without loss of work.
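For readers new to the area, the following is a minimal sketch of an in-memory inverted index that accepts new documents while queries run. It is a toy illustration only and does not reflect the design of our prototype: the class name `InMemoryIndex`, the whitespace tokenizer, and the single coarse lock are all assumptions made for brevity.

```python
import threading
from collections import defaultdict


class InMemoryIndex:
    """Toy inverted index: term -> set of document ids (hypothetical design)."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._lock = threading.Lock()  # one coarse lock; real engines do better

    def add(self, doc_id, text):
        """Index one document from the stream."""
        terms = set(text.lower().split())
        with self._lock:
            for term in terms:
                self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        with self._lock:
            sets = [self._postings.get(term, set()) for term in terms]
            return set.intersection(*sets)  # a new set, safe to use unlocked


if __name__ == "__main__":
    index = InMemoryIndex()
    index.add(1, "real-time search over streams")
    index.add(2, "real-time analytics over streams")
    print(index.search("real-time streams"))
    print(index.search("analytics"))
```

A single lock serializes indexing and querying, which is the simplest way to keep the structure consistent; reducing that contention is exactly the kind of problem a real engine has to solve.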

The key advantage of our initial analytic engine is that query evaluation is extremely fast and there is minimal interaction with the operating system during live operation. The engine uses a novel memory management approach that combines pre-allocation with user-level memory management, so memory is managed without OS interaction. As a result, the overall system is both more secure and better performing.
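The general idea of pre-allocation with user-level management can be sketched as a bump allocator: reserve one large buffer at startup, then hand out slices of it without asking the system for more memory. The `Arena` class below is a hypothetical illustration, not our engine's allocator, and Python itself still performs allocations behind the scenes, so it shows only the shape of the technique.

```python
class Arena:
    """Bump allocator over a single buffer reserved up front (illustrative)."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)      # the one up-front allocation
        self._view = memoryview(self._buf)
        self._top = 0                        # offset of the next free byte

    def alloc(self, size, align=8):
        """Return a writable view of `size` bytes, aligned to `align` bytes."""
        start = (self._top + align - 1) // align * align
        end = start + size
        if end > len(self._buf):
            raise MemoryError("arena exhausted")
        self._top = end
        return self._view[start:end]

    def used(self):
        """Number of bytes consumed so far, including alignment padding."""
        return self._top


if __name__ == "__main__":
    arena = Arena(1024)
    block = arena.alloc(10)
    block[:5] = b"hello"
    arena.alloc(4)
    print(bytes(block[:5]), arena.used())
```

A bump allocator never frees individual blocks, which keeps allocation to a few arithmetic operations; a production design would also need a reclamation strategy.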

We have a range of student projects to enhance the capabilities of our analytic engine.

  • We are looking to add a high-performance result cache for storing the results of expensive and long-running queries.
  • We are looking to enhance the capacity of the analytic engine by adding NVMe SSD storage.
  • We are looking to make the analytic engine tolerant to unexpected crashes and shutdowns.
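As a starting point for thinking about the result cache project, here is a minimal sketch of a least-recently-used (LRU) cache keyed by query string. LRU is just one possible eviction policy, chosen here for illustration; the class name `ResultCache` and its interface are assumptions, and the project itself may call for something quite different.

```python
from collections import OrderedDict


class ResultCache:
    """LRU cache mapping a query string to its results (hypothetical API)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._entries = OrderedDict()  # oldest entry first

    def get(self, query):
        """Return cached results, or None on a miss."""
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def put(self, query, results):
        """Store results, evicting the least recently used entry if full."""
        self._entries[query] = results
        self._entries.move_to_end(query)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)


if __name__ == "__main__":
    cache = ResultCache(capacity=2)
    cache.put("expensive query A", [1, 2, 3])
    cache.put("expensive query B", [4])
    cache.get("expensive query A")        # A is now the most recent entry
    cache.put("expensive query C", [5])   # evicts B
    print(cache.get("expensive query B"))
```

A cache over a live index also has to decide when a stored result becomes stale as new documents arrive, which this sketch does not address.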

The projects require excellent programming skills and curiosity about systems research. An interest in hardware-level performance analysis is a plus.

These projects are suitable for both course research projects and honours thesis projects.

Please get in touch with Shoaib Akram at shoaib.akram@anu.edu.au.
