Full PhD Scholarship: Improving the Query Efficiency or Scalability of Vector Databases

Introduction

In the midst of the AI revolution, many applications rely on vector embeddings to capture semantic information. A vector database is designed to store and index these embeddings for fast retrieval and similarity search.

Query processing in vector databases, such as Approximate Nearest Neighbor (ANN) search, is a significant driver of clicks, views, and sales on platforms such as Google and Amazon. To make vector search fast, several families of algorithms have been proposed, including Locality Sensitive Hashing (LSH), the IVFSQ index, and Hierarchical Navigable Small World (HNSW) graphs. Nevertheless, with the rapid development of representation learning in AI, query volumes are growing and vector databases can become massive, which places higher demands on query efficiency and scalability. This project therefore focuses on improving query efficiency or scalability by exploring cutting-edge algorithms and techniques for vector databases.

The specific research topic is flexible and will be decided through discussion. Applicants interested in this project are welcome to propose and discuss related ideas.

References

[1] Pan, James, Jianguo Wang, and Guoliang Li. “Vector Database Management Techniques and Systems.” In Companion of the International Conference on Management of Data (SIGMOD). 2024.

[2] Li, Wen, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. “Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement.” IEEE Transactions on Knowledge and Data Engineering 32, no. 8 (2019): 1475-1488.

[3] Chen, Qi, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. “SPANN: Highly-efficient billion-scale approximate nearest neighbor search.” arXiv preprint arXiv:2111.08566 (2021).

Requirements

A background in data structures and graph theory is preferred. Programming experience in C++ is essential. See here for more detailed requirements and guidelines on PhD applications.

Timing

The starting date is flexible, preferably before mid-2025. Applications will be considered until the position is filled.

Contact

If you are interested in this project, please contact Dr. Mengxuan Zhang.