TY - JOUR
T1 - ESetStore
T2 - An erasure-coded storage system with fast data recovery
AU - Liu, Chengjian
AU - Wang, Qiang
AU - Chu, Xiaowen
AU - Leung, Yiu Wing
AU - Liu, Hai
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for their valuable comments. This work was supported by Hong Kong Innovation and Technology Fund ITS/443/16FX. Chengjian Liu and Qiang Wang contributed equally to this work.
PY - 2020/9/1
Y1 - 2020/9/1
N2 - Erasure codes have been used extensively in large-scale storage systems to reduce the storage overhead of triplication-based storage systems. One key performance issue introduced by erasure codes is the long time needed to recover from a single failure, which occurs constantly in large-scale storage systems. We present ESetStore, a prototype erasure-coded storage system that aims to achieve fast recovery from failures. ESetStore is novel in the following aspects. We proposed a data placement algorithm named ESet for our ESetStore that can aggregate adequate I/O resources from available storage servers to recover from each single failure. We designed and implemented efficient read and write operations on our erasure-coded storage system via effective use of available I/O and computation resources. We evaluated the performance of ESetStore with extensive experiments on a cluster with 50 storage servers. The evaluation results demonstrate that our recovery performance can obtain linear performance growth by harvesting available I/O resources. With our defined parameter recovery I/O parallelism under some mild conditions, we can achieve optimal recovery performance, in which ESet enables minimal recovery time. Rather than being an alternative to improve recovery performance, our work can be an enhancement for existing solutions, such as Partial-parallel-repair (PPR), to further improve recovery performance.
AB - Erasure codes have been used extensively in large-scale storage systems to reduce the storage overhead of triplication-based storage systems. One key performance issue introduced by erasure codes is the long time needed to recover from a single failure, which occurs constantly in large-scale storage systems. We present ESetStore, a prototype erasure-coded storage system that aims to achieve fast recovery from failures. ESetStore is novel in the following aspects. We proposed a data placement algorithm named ESet for our ESetStore that can aggregate adequate I/O resources from available storage servers to recover from each single failure. We designed and implemented efficient read and write operations on our erasure-coded storage system via effective use of available I/O and computation resources. We evaluated the performance of ESetStore with extensive experiments on a cluster with 50 storage servers. The evaluation results demonstrate that our recovery performance can obtain linear performance growth by harvesting available I/O resources. With our defined parameter recovery I/O parallelism under some mild conditions, we can achieve optimal recovery performance, in which ESet enables minimal recovery time. Rather than being an alternative to improve recovery performance, our work can be an enhancement for existing solutions, such as Partial-parallel-repair (PPR), to further improve recovery performance.
KW - Erasure coded storage systems
KW - ESet
KW - ESetStore
KW - Fast data recovery
UR - http://www.scopus.com/inward/record.url?scp=85084111916&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2020.2983411
DO - 10.1109/TPDS.2020.2983411
M3 - Journal article
AN - SCOPUS:85084111916
SN - 1045-9219
VL - 31
SP - 2001
EP - 2016
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 9
M1 - 9051846
ER -