Apache Ozone: A Distributed Object Storage System with Erasure Coding

Abstract

Apache Ozone is a highly scalable distributed object storage system that also provides a file system interface. Distributed storage systems typically use replication to provide high reliability, and Ozone supports the replication model as well. However, replication is expensive in terms of storage space and other resources (e.g., network bandwidth). Erasure Coding (EC) is a proven technique for reducing storage space and throughput requirements, and Apache Ozone now supports it. With EC in place, Apache Ozone can reduce storage cost by ~50% compared to traditional 3-way replication while providing the same level of reliability. In Apache Ozone, the unit of replication is a Container, which is a logical batch of data blocks. EC uses the same Container abstraction, but with d data Containers and p parity Containers (d > p) placed on distinct storage nodes. The actual data block chunks are stored in order in the blocks of the d data Containers, and the encoded parity chunks are stored in the blocks of the p parity Containers. In this talk we take a deep dive into the EC architecture, covering the data layout and decoding as well. We will also discuss some of the design challenges we faced and how we solved them.
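As a rough illustration of the ~50% figure and of chunks being placed "in order" across the d data Containers, here is a minimal Python sketch. The scheme d = 6, p = 3 is an assumption chosen for the arithmetic (the abstract does not name a specific scheme), the round-robin placement is one reading of "in order", and the function names are hypothetical, not part of any Ozone API.

```python
def storage_overhead(data_units: int, extra_units: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_units + extra_units) / data_units


def stripe_layout(num_chunks: int, d: int) -> list[list[int]]:
    """Assign chunk indices 0..num_chunks-1 round-robin to d data Containers.

    Assumption: "in order" means chunk i goes to data Container i mod d.
    Parity chunks are not computed here; a real system would derive them
    with an erasure code such as Reed-Solomon.
    """
    layout: list[list[int]] = [[] for _ in range(d)]
    for chunk in range(num_chunks):
        layout[chunk % d].append(chunk)
    return layout


# 3-way replication: one original plus two extra copies.
replication = storage_overhead(1, 2)
# Assumed EC scheme: d = 6 data Containers, p = 3 parity Containers.
ec = storage_overhead(6, 3)
# Fraction of raw storage saved relative to 3-way replication.
savings = 1 - ec / replication

print(f"replication: {replication:.1f}x, EC: {ec:.1f}x, savings: {savings:.0%}")
print(stripe_layout(8, 3))
```

Under these assumptions, replication stores 3.0 bytes per user byte and EC stores 1.5, which is where a saving of about 50% comes from.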
