[database] Difference between partition key, composite key and clustering key in Cassandra?

The primary key in Cassandra usually consists of two parts - Partition key and Clustering columns.

primary_key((partition_key), clustering_col )

Partition key - The first part of the primary key. The main aim of a partition key is to identify the node which stores the particular row.

CREATE TABLE phone_book ( phone_num int, name text, age int, city text, PRIMARY KEY ((phone_num, name), age);

Here, (phone_num, name) is the partition key. While inserting the data, the hash value of the partition key is generated and this value decides which node the row should go into.

Consider a 4 node cluster, each node has a range of hash values it can store. (Write) INSERT INTO phone_book VALUES (7826573732, ‘Joey’, 25, ‘New York’);

Now, the hash value of the partition key is calculated by Cassandra partitioner. say, hash value(7826573732, ‘Joey’) ? 12 , now, this row will be inserted in Node C.

(Read) SELECT * FROM phone_book WHERE phone_num=7826573732 and name=’Joey’;

Now, again the hash value of the partition key (7826573732,’Joey’) is calculated, which is 12 in our case which resides in Node C, from which the read is done.

  1. Clustering columns - Second part of the primary key. The main purpose of having clustering columns is to store the data in a sorted order. By default, the order is ascending.

There can be more than one partition key and clustering columns in a primary key depending on the query you are solving.

primary_key((pk1, pk2), col 1,col2)