Debugging of Flash Issues Observed in Hyperscale Environment at Scale

Abstract

This talk is a deep dive into the methodology and tooling we use at Meta to improve the debuggability of failures in our datacenters, especially failures of components such as SSDs, where privacy requirements may prohibit us from sending components back for failure analysis (FA) or from adding custom instrumentation in our datacenters. In particular, we will describe how the tracewatch tool, coupled with the Latency Monitoring log page, lets us trigger trace collection on failures using BPF-based triggers. We will present the retrace tool, which can then be used to analyze the captures in a variety of formats, convert between formats, and filter down to the stack of a single I/O from the application layer down to the drive. We will also present dialog, our collection mechanism for file-system-based logging, and related topics such as the sanitization process. Finally, we will discuss how we are collaborating with the industry to design efficient logging built into flash drives.
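To make the idea of triggering trace collection on a failure more concrete, here is a minimal sketch in plain Python. It is not the tracewatch or retrace implementation, and it uses no BPF: it only illustrates the general pattern of keeping a short history of I/O events, snapshotting that history when a latency threshold is crossed, and then filtering the capture down to a single I/O. Every name in it (`IoEvent`, `LatencyTrigger`, `filter_single_io`, the field names, the threshold) is a hypothetical chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class IoEvent:
    """One observed I/O event; all fields are hypothetical, for illustration."""
    io_id: int
    layer: str        # e.g. "app", "fs", "block", "nvme"
    latency_us: int


class LatencyTrigger:
    """Keep the most recent events and snapshot them when one is too slow."""

    def __init__(self, threshold_us: int, history: int = 4) -> None:
        self.threshold_us = threshold_us
        # A bounded deque drops the oldest event once `history` is reached.
        self.recent: Deque[IoEvent] = deque(maxlen=history)

    def observe(self, event: IoEvent) -> Optional[List[IoEvent]]:
        """Record an event; return a capture if it crosses the threshold."""
        self.recent.append(event)
        if event.latency_us > self.threshold_us:
            return list(self.recent)
        return None


def filter_single_io(capture: List[IoEvent], io_id: int) -> List[IoEvent]:
    """Reduce a capture to the events that belong to one I/O."""
    return [e for e in capture if e.io_id == io_id]
```

In this sketch, calling `observe` on each event returns `None` until an event exceeds `threshold_us`, at which point the caller receives the buffered history and can pass it to `filter_single_io` to isolate the slow I/O.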

Vineet Parekh

Meta

Venkat Ramesh
Meta

Related Sessions

Storage Architecture

The Challenges in Creating a Clustered Software Defined Fileserver from Scratch on HCI

This presentation will outline the trials and tribulations encountered over the last 5 years creating the first software defined Fileserver on our hyperconverged platform.

Dan Chilton

Nutanix

Will Strickland
Nutanix

Storage Architecture

24G SAS Advancements for Hyperscale Environments

Capacity requirements and power consumption are becoming increasingly challenging for hyperscale deployments.

Rick Kutcipal

SCSI Trade Association

Storage Architecture

Building Flash-aware Applications with the Software-Enabled Flash™ SDK

Software-Enabled Flash™ (SEF) technology, a vendor-neutral Linux Foundation open source project, fundamentally redefines how flash memory is used for cloud and enterprise applications, providing st

Rory Bolt

KIOXIA America

Storage Architecture

Next Generation Architecture for Scale-out Block Storage

We are in the midst of a major technology shift as Storage Devices and Networks are outpacing general purpose Compute.

Jaspal Kohli

Fungible

Storage Architecture

An approach for impact analysis of flash behavior on QoS in DC/Enterprise SSDs

In the consumer and enterprise worlds, SSD performance is the main quality constraint. SSD performance parameters are classified in terms of IOPS, throughput, latency, and quality of service (QoS).

Yogesh Khurana

Samsung Electronics (SSIR)

Ravishankar Singh
Samsung Electronics (SSIR)

Storage Architecture

SNIA SDXI Internals and its Journey Towards Standardizing Memory to Memory Data Movement

Software memory copies have been the gold standard for applications performing memory data movement or operations in system memory.

William Moyes

Advanced Micro Devices

Storage Architecture

SODA Architecture for Data and Storage Management

SODA Foundation is an open source project that aims to foster an ecosystem of open source data management and storage software for data autonomy.

Lawrence Lai

Futurewei Technologies, Inc.

Rakesh Jain
IBM

Storage Architecture

Building an Object based STaaS solution with Poseidon Storage

Samsung recently contributed the Poseidon project, an OCP-based industry collaboration among a component vendor (Samsung), a system vendor (Inspur), and data centers.

Swati Chawdhary

Samsung

Storage Architecture

Why KV SSD will replace ZNS

In this presentation we will discuss why KV SSD is the ultimate storage solution, and why it will replace ZNS.

Andy Tomlin

QiStor

Storage Architecture

The Path to Autonomous Storage is Broken

Today, storage and memory hierarchies are manually tuned and sized at design time. But tomorrow’s workloads are increasingly dynamic, multi-tenant and variable.

Eric Wright

Magnition

Storage Architecture

Accelerating operations on persistent memory device via hardware based memory offloading technique

With more and more fast devices (especially persistent memory, aka.

Ziye Yang

Intel

Storage Architecture

Improving flash storage on Android phones

There has been tremendous growth in the use of smartphones. Today, there are more than 130 million Android users in the world. Android smartphones leverage flash storage.

Tejas Chopra

Netflix, Inc.

Storage Architecture

A New Adapter for Zoned Namespace SSD

This talk introduces the characteristics of ZNS SSDs and the current Linux software ecosystem for them, and describes xZTL, an adapter that enables the host to access ZNS SSDs easily.

Hui Qi

Samsung R&D Institute China Xi'an (SRCX)

Bhanu Gollapudi
Samsung Electronics
