Apache Ozone is a highly scalable distributed object storage system that also provides a file system interface. Distributed storage systems typically use replication to provide high reliability, and Ozone supports the replication model as well. However, replication is expensive in terms of storage space and other resources (for example, network bandwidth). Erasure Coding (EC) is a proven technique for reducing storage space and throughput requirements, and Apache Ozone now implements EC support. With EC in place, Apache Ozone can reduce storage cost by ~50% compared to traditional 3-way replicated storage while providing the same level of reliability. In Apache Ozone, the replication unit is a Container, which is a logical batch of data blocks. EC uses the same Container abstraction, but uses d data Containers and p parity Containers (d > p) and places them on distinct storage nodes. The actual data block chunks are stored in order in the blocks of the d data Containers, and the encoded parity chunks are stored in the blocks of the p parity Containers. In this talk we dive deep into the detailed EC architecture, covering the data layout and decoding as well. We will also discuss some of the design challenges we faced and how we solved them.
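The storage arithmetic and the "in order" chunk placement described above can be illustrated with a short sketch. This is not Ozone code: the function names are hypothetical, the d = 6, p = 3 scheme is an assumed example, and the round-robin placement is one reading of how chunks are laid out across the d data Containers. Parity computation is left out entirely.

```python
from fractions import Fraction


def ec_overhead(d: int, p: int) -> Fraction:
    """Raw bytes stored per byte of user data with d data and p parity units."""
    if not d > p > 0:
        raise ValueError("expected d > p > 0")
    return Fraction(d + p, d)


def savings_vs_replication(d: int, p: int, copies: int = 3) -> Fraction:
    """Fraction of raw storage saved by EC(d, p) relative to N-way replication."""
    return 1 - ec_overhead(d, p) / Fraction(copies)


def stripe_layout(num_chunks: int, d: int) -> list[list[int]]:
    """Place data chunk indices into d data containers in order (round-robin).

    Assumption: "in order" means chunk i goes to data container i mod d.
    """
    containers: list[list[int]] = [[] for _ in range(d)]
    for chunk in range(num_chunks):
        containers[chunk % d].append(chunk)
    return containers


if __name__ == "__main__":
    # Assumed example scheme: 6 data + 3 parity containers.
    print(ec_overhead(6, 3))             # 3/2 raw bytes per user byte
    print(savings_vs_replication(6, 3))  # 1/2, i.e. 50% less than 3 copies
    print(stripe_layout(8, 3))           # [[0, 3, 6], [1, 4, 7], [2, 5]]
```

Under these assumptions, a 6 + 3 scheme stores 1.5 bytes per user byte instead of 3, which matches the ~50% figure quoted above; other choices of d and p give different savings.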
Apache Ozone: A Distributed Object Storage System with Erasure Coding
With the ongoing work in the CS TWG, the chairs will present the latest updates from the membership of the working group.
- Scott Shadley, Solidigm Technology, SNIA
Learn what is happening in NVMe to support Computational Storage devices.
Computational Storage is a new field that is addressing performance and scaling issues for compute with traditional server architectures.
NVMe and SNIA are both working on standards related to Computational Storage. A question that is continually asked is whether these efforts are compatible or at odds with each other.
This presentation looks at a computational storage use case within the Human Cell Atlas genomics research, finds that the deployed hardware CS engine is insufficient, and explains why this is the case.
Computational Storage offers near-data acceleration, and it is gaining popularity with recent commercialization and standardization efforts.
- Changwoo Min, Virginia Tech
The exploration of computation near flash storage has been prompted by the advent of network-attached flash-based storage enclosures operating at tens of gigabytes/sec, server memory bandwidths str
- Sean Gibb, Eideticom
- Andrew Maier, Eideticom
Power consumption of data center systems is currently one of the biggest concerns, and green computing is a main industry interest.
- Yangwook Kang, Samsung Semiconductor, Inc.
Large-scale data analytics, machine learning, and big data applications often require the storage of a massive amount of data.
Computational storage in general can bring unique benefits in increasing the efficiency of CPU utilization in a data processing system.
We examine the benefits of using computational storage devices like Xilinx SmartSSD to offload the compression to achieve an ideal compression scheme where higher compression ratios are achieved wi