(Cassandra NGCC) Apache Cassandra on-disk storage model is based on immutable SSTables that are, depending on the compaction strategy, optimized for some general use cases. The main issue with optimally reading on-disk data is that a single partition data most of the time needs to be fetched from multiple partitions. None of the existing compaction strategies are optimized to provide a single file read guarantees on cold data. Partition based compaction strategy addresses that issue with effectively compacting all data that belongs to a single partition into the same SSTable. In this presentation, we will talk about proposed implementation details for such compaction strategy and a couple of solutions addressing the issues like write amplification and higher disk io requirements compared to default size tiered compaction.
Cassandra: Partition Management Matija Gobec Asias He
September 12, 2019