The voice of The Apache Software Foundation

Cassandra: Partition Management Matija Gobec Asias He

September 12, 2019

(Cassandra NGCC) The Apache Cassandra on-disk storage model is based on immutable SSTables that are, depending on the compaction strategy, optimized for certain general use cases. The main obstacle to reading on-disk data optimally is that the data for a single partition usually has to be fetched from multiple SSTables. None of the existing compaction strategies is optimized to guarantee single-file reads on cold data. A partition-based compaction strategy addresses that issue by compacting all data that belongs to a single partition into the same SSTable. In this presentation, we will talk about proposed implementation details for such a compaction strategy and a couple of solutions to issues such as write amplification and higher disk I/O requirements compared to the default size-tiered compaction.
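
To make the idea concrete, here is a minimal toy sketch in Python. It is not Cassandra code and not the implementation proposed in the talk. It assumes each SSTable can be modeled as a plain dictionary from partition key to columns, and every name in it (`sstables_touched`, `merge_fragments`, `compact_by_partition`, `partitions_per_table`) is hypothetical. It only shows how a partition's fragments scattered across several SSTables can be merged so that a later read consults a single file.

```python
from collections import defaultdict

# Toy model (an assumption, not Cassandra's real format): each SSTable is a dict
#   partition key -> {column name: (write timestamp, value)}


def sstables_touched(sstables, key):
    """Count how many SSTables a read of `key` has to consult."""
    return sum(1 for table in sstables if key in table)


def merge_fragments(fragments):
    """Merge the fragments of one partition; the newest timestamp wins per column."""
    merged = {}
    for fragment in fragments:
        for column, (ts, value) in fragment.items():
            if column not in merged or ts > merged[column][0]:
                merged[column] = (ts, value)
    return merged


def compact_by_partition(sstables, partitions_per_table=2):
    """Rewrite the input so every partition lives in exactly one output SSTable."""
    # Collect every fragment of every partition across all input SSTables.
    fragments = defaultdict(list)
    for table in sstables:
        for key, fragment in table.items():
            fragments[key].append(fragment)

    # Emit output SSTables holding whole partitions only, in key order.
    output, current = [], {}
    for key in sorted(fragments):
        current[key] = merge_fragments(fragments[key])
        if len(current) == partitions_per_table:
            output.append(current)
            current = {}
    if current:
        output.append(current)
    return output


sstables = [
    {"user:1": {"name": (1, "Ann")}, "user:2": {"name": (1, "Bob")}},
    {"user:1": {"email": (2, "ann@example.org")}},
    {"user:1": {"name": (3, "Anna")}, "user:3": {"name": (3, "Cy")}},
]
compacted = compact_by_partition(sstables)

print(sstables_touched(sstables, "user:1"))   # 3 SSTables before compaction
print(sstables_touched(compacted, "user:1"))  # 1 SSTable after compaction
```

Note that this sketch rewrites every input SSTable in full, which is exactly the kind of write amplification and extra disk I/O the abstract mentions as a cost to be addressed.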
