[database] What is the recommended way to delete a large number of items from DynamoDB?

I'm writing a simple logging service in DynamoDB.

I have a logs table that is keyed by a user_id hash and a timestamp (Unix epoch int) range.

When a user of the service terminates their account, I need to delete all items in the table, regardless of the range value.

What is the recommended way of doing this sort of operation (Keeping in mind there could be millions of items to delete)?

My options, as far as I can see are:

A: Perform a Scan operation, calling delete on each returned item, until no items are left

B: Perform a BatchGet operation, again calling delete on each item until none are left

Both of these look terrible to me as they will take a long time.

What I ideally want to do is call LogTable.DeleteItem(user_id) - Without supplying the range, and have it delete everything for me.

The answer is


The answer of this question depends on the number of items and their size and your budget. Depends on that we have following 3 cases:

1- The number of items and size of items in the table are not very much. then as Steffen Opel said you can Use Query rather than Scan to retrieve all items for user_id and then loop over all returned items and either facilitate DeleteItem or BatchWriteItem. But keep in mind you may burn a lot of throughput capacity here. For example, consider a situation where you need delete 1000 items from a DynamoDB table. Assume that each item is 1 KB in size, resulting in Around 1MB of data. This bulk-deleting task will require a total of 2000 write capacity units for query and delete. To perform this data load within 10 seconds (which is not even considered as fast in some applications), you would need to set the provisioned write throughput of the table to 200 write capacity units. As you can see its doable to use this way if its for less number of items or small size items.

2- We have a lot of items or very large items in the table and we can store them according to the time into different tables. Then as jonathan Said you can just delete the table. this is much better but I don't think it is matched with your case. As you want to delete all of users data no matter what is the time of creation of logs, so in this case you can't delete a particular table. if you wanna have a separate table for each user then I guess if number of users are high then its so expensive and it is not practical for your case.

3- If you have a lot of data and you can't divide your hot and cold data into different tables and you need to do large scale delete frequently then unfortunately DynamoDB is not a good option for you at all. It may become more expensive or very slow(depends on your budget). In these cases I recommend to find another database for your data.


According to the DynamoDB documentation you could just delete the full table.

See below:

"Deleting an entire table is significantly more efficient than removing items one-by-one, which essentially doubles the write throughput as you do as many delete operations as put operations"

If you wish to delete only a subset of your data, then you could make separate tables for each month, year or similar. This way you could remove "last month" and keep the rest of your data intact.

This is how you delete a table in Java using the AWS SDK:

DeleteTableRequest deleteTableRequest = new DeleteTableRequest()
  .withTableName(tableName);
DeleteTableResult result = client.deleteTable(deleteTableRequest);

If you want to delete items after some time, e.g. after a month, just use Time To Live option. It will not count write units.

In your case, I would add ttl when logs expire and leave those after a user is deleted. TTL would make sure logs are removed eventually.

When Time To Live is enabled on a table, a background job checks the TTL attribute of items to see if they are expired.

DynamoDB typically deletes expired items within 48 hours of expiration. The exact duration within which an item truly gets deleted after expiration is specific to the nature of the workload and the size of the table. Items that have expired and not been deleted will still show up in reads, queries, and scans. These items can still be updated and successful updates to change or remove the expiration attribute will be honored.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html


My approach to delete all rows from a table i DynamoDb is just to pull all rows out from the table, using DynamoDbs ScanAsync and then feed the result list to DynamoDbs AddDeleteItems. Below code in C# works fine for me.

        public async Task DeleteAllReadModelEntitiesInTable()
    {
        List<ReadModelEntity> readModels;

        var conditions = new List<ScanCondition>();
        readModels = await _context.ScanAsync<ReadModelEntity>(conditions).GetRemainingAsync();

        var batchWork = _context.CreateBatchWrite<ReadModelEntity>();
        batchWork.AddDeleteItems(readModels);
        await batchWork.ExecuteAsync();
    }

Note: Deleting the table and then recreating it again from the web console may cause problems if using YAML/CloudFormation to create the table.


We don't have option to truncate dynamo tables. we have to drop the table and create again . DynamoDB Charges are based on ReadCapacityUnits & WriteCapacityUnits . If we delete all items using BatchWriteItem function, it will use WriteCapacityUnits.So better to delete specific records or delete the table and start again .


Thought about using the test to pass in the vars? Something like:

Test input would be something like:

{
  "TABLE_NAME": "MyDevTable",
  "PARTITION_KEY": "REGION",
  "SORT_KEY": "COUNTRY"
}

Adjusted your code to accept the inputs:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({ apiVersion: '2012-08-10' });

exports.handler = async (event) => {
    const TABLE_NAME = event.TABLE_NAME;
    const PARTITION_KEY = event.PARTITION_KEY;
    const SORT_KEY = event.SORT_KEY;
    let params = {
        TableName: TABLE_NAME,
    };
    console.log(`keys: ${PARTITION_KEY} ${SORT_KEY}`);

    let items = [];
    let data = await docClient.scan(params).promise();
    items = [...items, ...data.Items];
    
    while (typeof data.LastEvaluatedKey != 'undefined') {
        params.ExclusiveStartKey = data.LastEvaluatedKey;

        data = await docClient.scan(params).promise();
        items = [...items, ...data.Items];
    }

    let leftItems = items.length;
    let group = [];
    let groupNumber = 0;

    console.log('Total items to be deleted', leftItems);

    for (const i of items) {
        // console.log(`item: ${i[PARTITION_KEY] } ${i[SORT_KEY]}`);
        const deleteReq = {DeleteRequest: {Key: {},},};
        deleteReq.DeleteRequest.Key[PARTITION_KEY] = i[PARTITION_KEY];
        deleteReq.DeleteRequest.Key[SORT_KEY] = i[SORT_KEY];

        // console.log(`DeleteRequest: ${JSON.stringify(deleteReq)}`);
        group.push(deleteReq);
        leftItems--;

        if (group.length === 25 || leftItems < 1) {
            groupNumber++;

            console.log(`Batch ${groupNumber} to be deleted.`);

            const params = {
                RequestItems: {
                    [TABLE_NAME]: group,
                },
            };

            await docClient.batchWrite(params).promise();

            console.log(
                `Batch ${groupNumber} processed. Left items: ${leftItems}`
            );

            // reset
            group = [];
        }
    }

    const response = {
        statusCode: 200,
        //  Uncomment below to enable CORS requests
        headers: {
            "Access-Control-Allow-Origin": "*"
        },
        body: JSON.stringify('Hello from Lambda!'),
    };
    return response;
};

Examples related to database

Implement specialization in ER diagram phpMyAdmin - Error > Incorrect format parameter? Authentication plugin 'caching_sha2_password' cannot be loaded Room - Schema export directory is not provided to the annotation processor so we cannot export the schema SQL Query Where Date = Today Minus 7 Days MySQL Error: : 'Access denied for user 'root'@'localhost' SQL Server date format yyyymmdd How to create a foreign key in phpmyadmin WooCommerce: Finding the products in database TypeError: tuple indices must be integers, not str

Examples related to nosql

Firestore Getting documents id from collection What is Hash and Range Primary Key? Mongodb: Failed to connect to 127.0.0.1:27017, reason: errno:10061 Explanation of JSONB introduced by PostgreSQL DynamoDB vs MongoDB NoSQL Querying DynamoDB by date Delete all nodes and relationships in neo4j 1.8 When to use CouchDB over MongoDB and vice versa Difference between scaling horizontally and vertically for databases NoSQL Use Case Scenarios or WHEN to use NoSQL

Examples related to amazon-web-services

How to specify credentials when connecting to boto3 S3? Is there a way to list all resources in AWS Access denied; you need (at least one of) the SUPER privilege(s) for this operation Job for mysqld.service failed See "systemctl status mysqld.service" What is difference between Lightsail and EC2? AWS S3 CLI - Could not connect to the endpoint URL boto3 client NoRegionError: You must specify a region error only sometimes How to write a file or data to an S3 object using boto3 Missing Authentication Token while accessing API Gateway? The AWS Access Key Id does not exist in our records

Examples related to cloud

Get Public URL for File - Google Cloud Storage - App Engine (Python) What does ECU units, CPU core and memory mean when I launch a instance What is SaaS, PaaS and IaaS? With examples how to use free cloud database with android app? What is the difference between Cloud, Grid and Cluster? What is the recommended way to delete a large number of items from DynamoDB? Which programming language for cloud computing? What is the difference between Cloud Computing and Grid Computing?

Examples related to amazon-dynamodb

What is Hash and Range Primary Key? How to get item count from DynamoDB? Hive ParseException - cannot recognize input near 'end' 'string' DynamoDB vs MongoDB NoSQL Querying DynamoDB by date How can I fetch all items from a DynamoDB table without specifying the primary key? Is it possible to ORDER results with query or scan in DynamoDB? What is the recommended way to delete a large number of items from DynamoDB?