TY - JOUR
T1 - G-CRS
T2 - GPU Accelerated Cauchy Reed-Solomon Coding
AU - Liu, Chengjian
AU - WANG, Qiang
AU - CHU, Xiaowen
AU - LEUNG, Yiu Wing
N1 - Funding Information:
We would like to thank the anonymous reviewers for their valuable comments. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. This work is partially supported by Shenzhen Basic Research Grant SCI-2015-SZTIC-002 and Hong Kong ITF Grant ITS/443/16FX.
PY - 2018/7/1
Y1 - 2018/7/1
N2 - Recently, erasure coding has been extensively deployed in large-scale storage systems to replace data replication. With the increase in disk I/O throughput and network bandwidth, the performance of erasure coding becomes a major bottleneck of erasure-coded storage systems. In this paper, we propose a graphics processing unit (GPU)-based implementation of erasure coding named G-CRS, which employs the Cauchy Reed-Solomon (CRS) code, to overcome the aforementioned bottleneck. To maximize the coding performance of G-CRS, we designed and implemented a set of optimization strategies, such as a compact structure to store the bitmatrix in GPU constant memory, efficient data access through shared memory, and decoding parallelism, to fully utilize the GPU resources. In addition, we derived a simple yet accurate performance model to demonstrate the maximum coding performance of G-CRS on GPU. We evaluated the performance of G-CRS through extensive experiments on modern GPU architectures such as Maxwell and Pascal, and compared with other state-of-the-art coding libraries. The evaluation results revealed that the throughput of G-CRS was 10 times faster than most of the other coding libraries. Moreover, G-CRS outperformed PErasure (a recently developed, well optimized CRS coding library on the GPU) by up to 3 times in the same architecture.
AB - Recently, erasure coding has been extensively deployed in large-scale storage systems to replace data replication. With the increase in disk I/O throughput and network bandwidth, the performance of erasure coding becomes a major bottleneck of erasure-coded storage systems. In this paper, we propose a graphics processing unit (GPU)-based implementation of erasure coding named G-CRS, which employs the Cauchy Reed-Solomon (CRS) code, to overcome the aforementioned bottleneck. To maximize the coding performance of G-CRS, we designed and implemented a set of optimization strategies, such as a compact structure to store the bitmatrix in GPU constant memory, efficient data access through shared memory, and decoding parallelism, to fully utilize the GPU resources. In addition, we derived a simple yet accurate performance model to demonstrate the maximum coding performance of G-CRS on GPU. We evaluated the performance of G-CRS through extensive experiments on modern GPU architectures such as Maxwell and Pascal, and compared with other state-of-the-art coding libraries. The evaluation results revealed that the throughput of G-CRS was 10 times faster than most of the other coding libraries. Moreover, G-CRS outperformed PErasure (a recently developed, well optimized CRS coding library on the GPU) by up to 3 times in the same architecture.
KW - Cauchy reed-solomon code
KW - distributed storage system
KW - erasure coding
KW - graphics processing unit
UR - http://www.scopus.com/inward/record.url?scp=85041234931&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2018.2791438
DO - 10.1109/TPDS.2018.2791438
M3 - Journal article
AN - SCOPUS:85041234931
SN - 1045-9219
VL - 29
SP - 1484
EP - 1498
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 7
ER -