Fabric Attached Memory – Hardware and Software Architecture

Abstract

HPC architectures increasingly handle workloads whose working data set cannot be easily partitioned or is too large to fit into node-local memory. We have defined a system architecture and a software stack that hold large data sets in fabric-attached memory (FAM) accessible to all compute nodes across a Slingshot-connected HPC cluster, providing a new approach to handling such data sets. Emerging AI and data analytics workloads are becoming increasingly important for HPC architectures because HPC clusters provide the computational capabilities needed at scale; however, a divide still exists between traditional HPC, AI, and data analytics applications because the three communities use very different programming models. The architecture leverages emerging hardware capabilities such as CXL, along with ideas from both HPC and high-performance data analytics software, to support AI and data analytics on HPC clusters. This presentation will cover the architecture and the software stack, and will demonstrate their value through a use case: an Arkouda-based proxy application for real-time data analytics.

David Emberson
Hewlett Packard Enterprise