Smart Storage Management: Towards Higher HDFS Storage Efficiency – Wei Zhou
Data volumes of all kinds have increased dramatically in recent years, and new storage devices (NVMe SSDs, flash SSDs, etc.) can be used to improve data access performance. HDFS provides mechanisms such as HDFS Cache, Heterogeneous Storage Management (HSM) and Erasure Coding (EC) for this purpose, but defining and adjusting different storage strategies for different data in a dynamic environment remains a major challenge.
To overcome this challenge and improve the storage efficiency of HDFS, we will introduce a comprehensive solution, Smart Storage Management (SSM), in Apache Hadoop. SSM collects HDFS operation data and system state information from the cluster and extracts “data access patterns” from these metrics. Based on these patterns, SSM automatically makes sophisticated use of the mechanisms above to optimize HDFS storage efficiency.
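To make the "metrics → access pattern → storage strategy" loop more concrete, here is a minimal sketch. It is not SSM's implementation. It assumes the access pattern is simply a per-file read count over a sliding time window, and that the strategy is a choice among the real HDFS storage policies `ALL_SSD`, `HOT` (all replicas on DISK) and `COLD` (all replicas on ARCHIVE). The `AccessEvent` type, the thresholds and the helper functions are hypothetical. In a real cluster the chosen policy would still have to be applied through HDFS, for example with `hdfs storagepolicies -setStoragePolicy`, followed by the Mover.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record of one file read, as it might be collected from the cluster.
@dataclass(frozen=True)
class AccessEvent:
    path: str
    timestamp: float  # seconds

# Hypothetical thresholds; in practice these would be configurable rules.
HOT_THRESHOLD = 10   # reads per window at or above this -> fast storage
COLD_THRESHOLD = 0   # reads per window at or below this -> archival storage


def access_counts(events, now, window):
    """Extract a simple access pattern: reads per file in [now - window, now]."""
    counts = Counter()
    for e in events:
        if now - window <= e.timestamp <= now:
            counts[e.path] += 1
    return counts


def choose_policy(count):
    """Map a file's recent read count to an HDFS storage policy name."""
    if count >= HOT_THRESHOLD:
        return "ALL_SSD"
    if count <= COLD_THRESHOLD:
        return "COLD"
    return "HOT"


def plan(events, files, now, window):
    """Return the storage policy this sketch would pick for each file."""
    counts = access_counts(events, now, window)
    return {f: choose_policy(counts.get(f, 0)) for f in files}


if __name__ == "__main__":
    events = [AccessEvent("/data/a", 100.0)] * 12 + [AccessEvent("/data/b", 90.0)]
    print(plan(events, ["/data/a", "/data/b", "/data/c"], now=100.0, window=60.0))
```

With the sample events, `/data/a` (12 recent reads) maps to `ALL_SSD`, `/data/b` (1 read) stays `HOT`, and `/data/c` (no reads in the window) maps to `COLD`. A real system would also need to consider cache placement and erasure coding, and would have to avoid moving data back and forth when a file's access count hovers near a threshold.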