When Learned Index Meets Blockchain: Design, Algorithms, and Performance Evaluation

Project: Research project

Project Details


In recent years, blockchain has been widely adopted for many decentralized applications. It enables otherwise untrusted peers to collectively maintain a verifiable distributed database. However, to preserve data integrity and support data provenance, every blockchain node must store and maintain the entire ledger state, which incurs high storage costs and prolongs data search time.

This project aims to explore emerging learned index technologies to optimize blockchain system performance. Recent studies on learned indexes have shown their advantages over traditional indexes such as the B+-tree and the R-tree. At its core, a learned index replaces the directing keys in each index node with a learned model that predicts where a search key is located, which reduces storage overhead and improves search efficiency. However, existing learned indexes cannot be applied directly to blockchain systems because of three challenges. First, existing learned indexes do not support data authentication and provenance, which are essential to blockchain systems. Second, blockchain uses long hash strings as indexing keys, which differ from the numerical keys that existing learned indexes are built for. Third, existing learned indexes target read-optimized, in-memory databases, whereas blockchain systems feature frequent state updates and disk-based storage.
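
To make the core idea concrete, the following Python sketch shows a single-model learned index over sorted numeric keys. A least-squares line stands in for the directing keys by predicting a key's position, and the largest prediction error recorded at build time bounds a final binary search. The class name `LinearLearnedIndex` and its methods are hypothetical; this illustrates the general technique only, not the index proposed in this project, and it deliberately omits authentication, hash keys, and updates.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical single-model learned index over sorted numeric keys."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys bounds every search.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Rounded model output, clamped to a valid array position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of `key` in sorted order, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search only inside the model's error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None


idx = LinearLearnedIndex([3, 8, 15, 21, 40, 41, 57, 90])
print(idx.lookup(41))  # 5
print(idx.lookup(42))  # None
```

Practical learned indexes usually stack or partition several such models; a single line is used here only to keep the mechanism visible.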

To address these challenges, this project proposes a novel column-based Merkle learned index for blockchain systems. More specifically, we plan to design (1) a two-level column-based Merkle index that supports efficient data authentication and provenance; (2) two dedicated structures and learned models, each tailored to one of the two index levels; and (3) a multivariate linear regression model that learns the distribution of hash string keys. Beyond the basic design, we also plan to investigate efficient algorithms for searching and maintaining the proposed learned index, and to develop several optimization techniques that balance storage cost against search performance. Finally, we will build a proof-of-concept prototype system and evaluate it with real-world blockchain workloads to assess the practicality of the proposed solutions.
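
One possible way to apply multivariate linear regression to hash string keys is sketched below in Python. It assumes fixed-length lowercase hexadecimal hashes (so lexicographic order equals numeric order), splits the leading hex digits into fixed-width chunks that serve as numeric features, and fits a least-squares model from those features to sorted positions through the normal equations. The featurization, the names `hash_features`, `solve`, and `HashKeyLearnedIndex`, and the small ridge term are all illustrative assumptions; the project's actual model and its two-level Merkle structure are not reproduced here.

```python
import bisect
import hashlib


def hash_features(h, chunks=3, width=8):
    """Hypothetical featurization: a bias term plus the first `chunks`
    fixed-width hex segments of the hash, each scaled to [0, 1)."""
    scale = 16 ** width
    return [1.0] + [int(h[i * width:(i + 1) * width], 16) / scale for i in range(chunks)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class HashKeyLearnedIndex:
    """Hypothetical learned index over hex hash strings using a multivariate linear model."""

    def __init__(self, hashes, chunks=3, ridge=1e-9):
        if not hashes:
            raise ValueError("need at least one hash")
        self.keys = sorted(hashes)
        self.chunks = chunks
        xs = [hash_features(h, chunks) for h in self.keys]
        d = chunks + 1
        # Normal equations (X^T X + ridge * I) w = X^T y, with y = sorted position.
        xtx = [
            [sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0) for j in range(d)]
            for i in range(d)
        ]
        xty = [sum(x[i] * p for p, x in enumerate(xs)) for i in range(d)]
        self.w = solve(xtx, xty)
        # Worst-case prediction error over stored keys bounds the final search.
        self.max_err = max(abs(self._predict(h) - p) for p, h in enumerate(self.keys))

    def _predict(self, h):
        x = hash_features(h, self.chunks)
        pos = round(sum(wi * xi for wi, xi in zip(self.w, x)))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, h):
        """Return the sorted position of hash `h`, or None if absent."""
        guess = self._predict(h)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, h, lo, hi)
        return i if i < hi and self.keys[i] == h else None


hashes = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(1000)]
index = HashKeyLearnedIndex(hashes)
target = hashlib.sha256(b"42").hexdigest()
print(index.lookup(target) == sorted(hashes).index(target))  # True
```

Because lexicographic order is decided mostly by the leading digits, the fitted weight on the first chunk tends to dominate, while later chunks can only refine predictions slightly.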

Building on our extensive research experience in blockchain data management and query processing, we expect the outcome of this project to accelerate the growth and adoption of blockchain technologies and decentralized services in the relevant industries.
Effective start/end date: 1/01/23 → …

