Full Publications

A complete list of my publications is also available on my Google Scholar profile.

Selected Publications:

Journal Articles


SlimChain: Scaling Blockchain Transactions through Off-Chain Storage and Parallel Processing

Published in Proceedings of the VLDB Endowment (PVLDB), 2021

Blockchain technology has emerged as the cornerstone of many decentralized applications operating among otherwise untrusted peers. However, it is well known that existing blockchain systems do not scale well. Transactions are often executed and committed sequentially in order to maintain the same view of the total order. Furthermore, it is necessary to duplicate both transaction data and their executions in every node in the blockchain network for integrity assurance. Such storage and computation requirements put significant burdens on the blockchain system, not only limiting system scalability but also undermining system security and robustness by making the network more centralized. To tackle these problems, in this paper, we propose SlimChain, a novel blockchain system that scales transactions through off-chain storage and parallel processing. Advocating a stateless design, SlimChain maintains only the short commitments of ledger states on-chain while dedicating transaction executions and data storage to off-chain nodes. To realize SlimChain, we propose new schemes for off-chain smart contract execution, on-chain transaction validation, and state commitment. We also propose optimizations to reduce network transmissions and a new sharding technique to improve system scalability further. Extensive experiments are conducted to validate the performance of the proposed SlimChain system. Compared with existing systems, SlimChain reduces the on-chain storage requirements by 97% to 99%, while also improving the peak throughput by 1.4× to 15.6×.

Recommended citation: Cheng Xu, Ce Zhang, Jianliang Xu, and Jian Pei. (2021). "SlimChain: Scaling Blockchain Transactions through Off-Chain Storage and Parallel Processing." VLDB 21.
Download Paper | Download Slides

Conference Papers


COLE: A Column-based Learned Storage for Blockchain Systems

Published in 22nd USENIX Conference on File and Storage Technologies (FAST 24), 2024

Blockchain systems suffer from high storage costs as every node needs to store and maintain the entire blockchain data. After investigating Ethereum’s storage, we find that the storage cost mostly comes from the index, i.e., Merkle Patricia Trie (MPT). To support provenance queries, MPT persists the index nodes during the data update, which adds too much storage overhead. To reduce the storage size, an initial idea is to leverage the emerging learned index technique, which has been shown to have a smaller index size and more efficient query performance. However, directly applying it to the blockchain storage results in even higher overhead owing to the requirement of persisting index nodes and the learned index’s large node size. To tackle this, we propose COLE, a novel column-based learned storage for blockchain systems. We follow the column-based database design to contiguously store each state’s historical values, which are indexed by learned models to facilitate efficient data retrieval and provenance queries. We develop a series of write-optimized strategies to realize COLE in disk environments. Extensive experiments are conducted to validate the performance of the proposed COLE system. Compared with MPT, COLE reduces the storage size by up to 94% while improving the system throughput by 1.4×-5.4×.

Recommended citation: Ce Zhang, Cheng Xu, Haibo Hu, and Jianliang Xu. (2024). "COLE: A Column-based Learned Storage for Blockchain Systems." FAST 24.
Download Paper | Download Slides

Authenticated Keyword Search in Scalable Hybrid-Storage Blockchains

Published in 37th IEEE International Conference on Data Engineering (ICDE 21), 2021

Blockchain has emerged as a promising solution for secure data storage and retrieval for decentralized applications. To scale blockchain systems, a prevailing approach is to employ a hybrid storage model, where only small meta-data are stored on-chain while the raw data are outsourced to an off-chain storage service provider. The key issue for query processing in such a system is the design of a gas-efficient authenticated data structure (ADS) to authenticate the query results. In this paper, we study novel ADS schemes for authenticated keyword search in hybrid-storage blockchains. We first propose the suppressed Merkle inverted (\(Merkle^{inv}\)) index, which maintains only a partial ADS structure on-chain that can be securely updated with a logarithmic-sized cryptographic proof. Moreover, we propose a Chameleon inverted (\(Chameleon^{inv}\)) index that leverages the chameleon vector commitment to achieve a constant maintenance cost. It is further optimized with Bloom filters to enhance the query and verification performance. We prove the security of the proposed ADS schemes and evaluate their performance using real datasets on the Ethereum platform. Experimental results show that, compared to a baseline solution, the proposed \(Merkle^{inv}\) and \(Chameleon^{inv}\) indexes reduce the average on-chain maintenance cost from US$11.21 down to US$2.69 and US$0.24, respectively, without sacrificing much query performance.

Recommended citation: Ce Zhang, Cheng Xu, Haixin Wang, Jianliang Xu, and Byron Choi. (2021). "Authenticated Keyword Search in Scalable Hybrid-Storage Blockchains." ICDE 21.
Download Paper | Download Slides

vChain: Enabling Verifiable Boolean Range Queries over Blockchain Databases

Published in 2019 ACM SIGMOD International Conference on Management of Data (SIGMOD 19), 2019

Blockchains have recently been under the spotlight due to the boom of cryptocurrencies and decentralized applications. There is an increasing demand for querying the data stored in a blockchain database. To ensure query integrity, the user can maintain the entire blockchain database and query the data locally. However, this approach is not economical, if not infeasible, because of the blockchain’s huge data size and considerable maintenance costs. In this paper, we take the first step toward investigating the problem of verifiable query processing over blockchain databases. We propose a novel framework, called vChain, that alleviates the storage and computing costs of the user and employs verifiable queries to guarantee the results’ integrity. To support verifiable Boolean range queries, we propose an accumulator-based authenticated data structure that enables dynamic aggregation over arbitrary query attributes. Two new indexes are further developed to aggregate intra-block and inter-block data records for efficient query verification. We also propose an inverted prefix tree structure to accelerate the processing of a large number of subscription queries simultaneously. Security analysis and empirical study validate the robustness and practicality of the proposed techniques.

Recommended citation: Cheng Xu, Ce Zhang, and Jianliang Xu. (2019). "vChain: Enabling Verifiable Boolean Range Queries over Blockchain Databases." SIGMOD 19.
Download Paper | Download Slides

GEM^2-Tree: A Gas-Efficient Structure for Authenticated Range Queries in Blockchain

Published in 35th IEEE International Conference on Data Engineering (ICDE 19), 2019

Blockchain technology has attracted much attention due to the great success of cryptocurrencies. Owing to its immutability property and consensus protocol, blockchain offers a new solution for trusted storage and computation services. To scale up the services, prior research has suggested a hybrid storage architecture, where only small meta-data are stored on-chain and the raw data are outsourced to off-chain storage. To protect data integrity, a cryptographic proof can be constructed online for queries over the data stored in the system. However, the previous schemes only support simple key-value queries. In this paper, we take the first step toward studying authenticated range queries in the hybrid-storage blockchain. The key challenge lies in how to design an authenticated data structure (ADS) that can be efficiently maintained by the blockchain, in which a unique gas cost model is employed. By analyzing the performance of the existing techniques, we propose a novel ADS, called \(GEM^2\)-tree, which is not only gas-efficient but also effective in supporting authenticated queries. To further reduce the ADS maintenance cost without sacrificing much query performance, we also propose an optimized structure, \(GEM^{2*}\)-tree, by designing a two-level index structure. Theoretical analysis and empirical evaluation validate the performance of the proposed ADSs.

Recommended citation: Ce Zhang, Cheng Xu, Jianliang Xu, Yuzhe Tang, and Byron Choi. (2019). "GEM^2-Tree: A Gas-Efficient Structure for Authenticated Range Queries in Blockchain." ICDE 19.
Download Paper | Download Slides