Querying DynamoDB by date

Question

I'm coming from a relational database background and trying to work with Amazon's DynamoDB. I have a table with a hash key "DataID" and a range key "CreatedAt", and a bunch of items in it.

I'm trying to get all the items that were created after a specific date, sorted by date, which is pretty straightforward in a relational database.

In DynamoDB, the closest thing I could find is a query that uses a greater-than condition on the range key. The only issue is that to perform a query I need a hash key, which defeats the purpose.

So what am I doing wrong? Is my table schema wrong? Shouldn't the hash key be unique? Or is there another way to query?

User · Answer

You could make the hash key something along the lines of a "product category" ID, then make the range key a combination of a timestamp with a unique ID appended to the end. That way you know the hash key and can still query the date with greater than.
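A minimal sketch of that key layout, assuming a hypothetical table `Items` with a hash key `Category` and a string range key `CreatedAtId`. The helper names are made up for illustration; the returned object only mirrors the parameter shape taken by the low-level `query` call in the AWS SDK for JavaScript.

```javascript
// Hypothetical names: table "Items", hash key "Category", range key "CreatedAtId".

// Build a range key of the form "<zero-padded ms timestamp>#<unique id>".
// Padding to a fixed width makes string order match chronological order.
function makeRangeKey(createdAtMs, uniqueId) {
  return String(createdAtMs).padStart(16, '0') + '#' + uniqueId;
}

// Build query parameters for "items in one category created at or after sinceMs".
function buildCreatedSinceQuery(category, sinceMs) {
  return {
    TableName: 'Items',
    KeyConditionExpression: 'Category = :c AND CreatedAtId >= :from',
    ExpressionAttributeValues: {
      ':c': { S: category },
      // The bare padded timestamp sorts before every key stamped with that time.
      ':from': { S: String(sinceMs).padStart(16, '0') }
    }
  };
}
```

The unique suffix keeps two items created in the same millisecond from colliding, while the fixed-width timestamp prefix keeps the range condition meaningful.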

User · Answer

Updated Answer:

DynamoDB allows for specification of secondary indexes to aid in this sort of query. Secondary indexes can either be global, meaning that the index spans the whole table across hash keys, or local, meaning that the index exists within each hash key partition, thus requiring the hash key to also be specified when making the query.

For the use case in this question, you would want to use a global secondary index on the "CreatedAt" field.

For more on DynamoDB secondary indexes, see the secondary index documentation.

Original Answer:

DynamoDB does not allow indexed lookups on the range key only. The hash key is required so that the service knows which partition to look in to find the data.

You can of course perform a scan operation to filter by the date value; however, this would require a full table scan, so it is not ideal.

If you need to perform an indexed lookup of records by time across multiple primary keys, DynamoDB might not be the ideal service for you to use, or you might need to use a separate table (either in DynamoDB or a relational store) to store item metadata that you can perform an indexed lookup against.
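For reference, the scan fallback mentioned in the original answer could look roughly like the sketch below. The table name `Items` and the helper names are assumptions, and `CreatedAt` is assumed to be stored as a number (milliseconds since the epoch). The object has the shape of the parameters for the low-level `scan` call in the AWS SDK for JavaScript.

```javascript
// Hypothetical table name "Items"; CreatedAt is assumed to be a numeric attribute.

// Build parameters for one page of a filtered scan.
function buildCreatedAfterScan(afterMs, exclusiveStartKey) {
  const params = {
    TableName: 'Items',
    // The filter is applied after items are read, so the whole table is still read.
    FilterExpression: 'CreatedAt > :after',
    ExpressionAttributeValues: { ':after': { N: String(afterMs) } }
  };
  // Scan results are paginated; pass the previous LastEvaluatedKey to continue.
  if (exclusiveStartKey) {
    params.ExclusiveStartKey = exclusiveStartKey;
  }
  return params;
}

// A scan does not return items in CreatedAt order, so sort on the client.
function sortByCreatedAt(items) {
  return items.slice().sort((a, b) => Number(a.CreatedAt.N) - Number(b.CreatedAt.N));
}
```

Because the filter does not reduce what is read, the cost grows with table size rather than with the number of matching items.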

User · Answer

Updated Answer:

There is no convenient way to do this using DynamoDB queries with predictable throughput. One (sub-optimal) option is to use a GSI with an artificial HashKey and CreatedAt. Then query by HashKey alone and set ScanIndexForward to order the results. If you can come up with a natural HashKey (say, the category of the item), then this method is a winner. On the other hand, if you keep the same HashKey for all items, it will affect throughput, mostly once your data set grows beyond 10 GB (one partition).

Original Answer:

You can do this now in DynamoDB by using a GSI. Make the "CreatedAt" field a GSI and issue queries like (GT some_date). Store the date as a number (msecs since epoch) for this kind of query.

Details are available here: Global Secondary Indexes - Amazon DynamoDB: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Using

This is a very powerful feature. Be aware that the query is limited to (EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN). Condition - Amazon DynamoDB: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
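A rough sketch of the artificial-HashKey variant, assuming a hypothetical index `GsiBucket-CreatedAt-index` whose hash key attribute `GsiBucket` holds the same constant on every item and whose range key is the numeric `CreatedAt`. All names are illustrative; the object mirrors the parameters of the low-level `query` call in the AWS SDK for JavaScript.

```javascript
// Hypothetical names: table "Items", index "GsiBucket-CreatedAt-index",
// artificial hash key attribute "GsiBucket" with one constant value.
const BUCKET_VALUE = 'ALL';

// Build query parameters for "everything created after afterMs, ordered by date".
function buildGsiQuery(afterMs, newestFirst) {
  return {
    TableName: 'Items',
    IndexName: 'GsiBucket-CreatedAt-index',
    KeyConditionExpression: 'GsiBucket = :bucket AND CreatedAt > :after',
    ExpressionAttributeValues: {
      ':bucket': { S: BUCKET_VALUE },
      ':after': { N: String(afterMs) }
    },
    // true (the default) returns ascending range key order, false descending.
    ScanIndexForward: !newestFirst
  };
}
```

With a natural HashKey such as an item category, `BUCKET_VALUE` would simply be replaced by the category being queried, which avoids funnelling every item through one index partition.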

User · Answer

Your hash key (primary, of sorts) has to be unique (unless you have a range key, as stated by others).

In your case, to query your table you should have a secondary index.

    | ID   | DataID | Created | Data |
    |------|--------|---------|------|
    | hash | xxxxx  | 1234567 | blah |

Your hash key is ID. Your secondary index is defined as DataID-Created-index (that's the name that DynamoDB will use).

Then, you can make a query like this:

    var AWS = require('aws-sdk');
    var ddb = new AWS.DynamoDB();

    var params = {
        TableName: "Table",
        IndexName: "DataID-Created-index",
        KeyConditionExpression: "DataID = :v_ID AND Created > :v_created",
        ExpressionAttributeValues: {
            ":v_ID": {S: "some_id"},
            ":v_created": {N: "timestamp"}  // placeholder: a numeric timestamp as a string
        },
        // "Data" is a reserved word in DynamoDB expressions, so it is aliased here.
        ExpressionAttributeNames: {"#data": "Data"},
        ProjectionExpression: "ID, DataID, Created, #data"
    };

    ddb.query(params, function(err, data) {
        if (err) {
            console.log(err);
        } else {
            // Sort ascending by the numeric Created attribute.
            data.Items.sort(function(a, b) {
                return parseFloat(a.Created.N) - parseFloat(b.Created.N);
            });
            // More code here
        }
    });

Essentially your query looks like:

    SELECT * FROM TABLE WHERE DataID = 'some_id' AND Created > timestamp;

The secondary index will increase the read/write capacity units required, so you need to consider that. It is still a lot better than doing a scan, which will be costly in reads and in time (and each scan request reads at most 1 MB of data, so larger tables need several paginated requests).

This may not be the best way of doing it, but for someone used to relational databases (I'm also used to SQL) it's the fastest way to get productive. Since there are no constraints with regard to schema, you can whip up something that works, and once you have the bandwidth to work on the most efficient way, you can change things around.

User · Answer

The approach I followed to solve this problem was to create a Global Secondary Index as below. I'm not sure if this is the best approach, but hopefully it is useful to someone.

    Hash Key                   | Range Key
    ---------------------------|----------
    Date value of CreatedAt    | CreatedAt

A limitation is imposed on the HTTP API user to specify the number of days of data to retrieve, defaulting to 24 hours.

This way, I can always specify the hash key as the current date's day, and the range key can use the > and < operators while retrieving. This way the data is also spread across multiple shards.
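One way this per-day index could be queried, as a sketch under assumptions: the index name `CreatedDay-CreatedAt-index`, the attribute `CreatedDay` (a UTC `YYYY-MM-DD` string), and the helper names are all hypothetical, and `CreatedAt` is assumed to be milliseconds since the epoch. A key condition accepts only a single condition on the range key, so the sketch uses `BETWEEN` for the two-sided bound.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// UTC day string such as "2024-01-15", used as the index hash key value.
function dayKey(ms) {
  return new Date(ms).toISOString().slice(0, 10);
}

// Build one query per UTC day touched by the window [nowMs - days * DAY_MS, nowMs].
function buildDayQueries(nowMs, days) {
  const fromMs = nowMs - days * DAY_MS;
  const queries = [];
  // Start at UTC midnight of the first day and step one day at a time.
  for (let t = Math.floor(fromMs / DAY_MS) * DAY_MS; t <= nowMs; t += DAY_MS) {
    queries.push({
      TableName: 'Items',
      IndexName: 'CreatedDay-CreatedAt-index',
      KeyConditionExpression: 'CreatedDay = :day AND CreatedAt BETWEEN :from AND :to',
      ExpressionAttributeValues: {
        ':day': { S: dayKey(t) },
        ':from': { N: String(fromMs) },
        ':to': { N: String(nowMs) }
      }
    });
  }
  return queries;
}
```

A 24-hour window usually spans two calendar days, so the caller would run both queries and concatenate the results in day order.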

User · Answer

You can have multiple identical hash keys, but only if you have a range key that varies. Think of it like file formats: you can have two files with the same name in the same folder as long as their format is different. If their format is the same, their name must be different. The same concept applies to DynamoDB's hash/range keys; just think of the hash as the name and the range as the format.

Also, I don't recall if they had these at the time of the OP (I don't believe they did), but they now offer Local Secondary Indexes.

My understanding of these is that they should now allow you to perform the desired queries without having to do a full scan. The downside is that these indexes have to be specified at table creation, and also (I believe) cannot be blank when creating an item. In addition, they require additional throughput (though typically not as much as a scan) and storage, so it's not a perfect solution, but a viable alternative for some.

I do still recommend Mike Brant's answer as the preferred method of using DynamoDB, though, and use that method myself. In my case, I just have a central table with only a hash key as my ID, then secondary tables that have a hash and range that can be queried; the item then points the code to the central table's "item of interest" directly.

Additional data regarding the secondary indexes can be found in Amazon's DynamoDB documentation for those interested.

Anyway, hopefully this will help anyone else who happens upon this thread.

User · Answer

Given your current table structure, this is not currently possible in DynamoDB. The huge challenge is to understand that the hash key of the table (the partition key) should be treated as creating separate tables. In some ways this is really powerful: think of partition keys as creating a new table for each user or customer, etc.

Queries can only be done in a single partition. That's really the end of the story. This means that if you want to query by date (you'll want to use msec since epoch), then all the items you want to retrieve in a single query must have the same hash (partition) key.

I should qualify this. You absolutely can scan by the criterion you are looking for; that's no problem. But that means you will be looking at every single row in your table, and then checking whether that row has a date that matches your parameters. This is really expensive, especially if you are in the business of storing events by date in the first place (i.e. you have a lot of rows).

You may be tempted to put all the data in a single partition to solve the problem, and you absolutely can. However, your throughput will be painfully low, given that each partition only receives a fraction of the total set amount.

The best thing to do is determine more useful partitions in which to save the data:

Do you really need to look at all the rows, or is it only the rows for a specific user?

Would it be okay to first narrow down the list by month, and do multiple queries (one for each month)? Or by year?

If you are doing time series analysis there are a couple of options: change the partition key to something computed on PUT to make the query easier, or use another AWS product like Kinesis, which lends itself to append-only logging.
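To illustrate the "partition key computed on PUT" idea together with one query per month, here is a sketch under assumptions: a hypothetical table `Events` whose hash key `UserMonth` is `<userId>#<YYYY-MM>` (UTC) and whose range key `CreatedAt` is milliseconds since the epoch. The helper names are made up; the objects mirror the parameter shapes of the low-level `putItem` and `query` calls in the AWS SDK for JavaScript.

```javascript
// Hypothetical layout: hash key "UserMonth" = "<userId>#<YYYY-MM>", range key "CreatedAt".
function monthKey(userId, ms) {
  return userId + '#' + new Date(ms).toISOString().slice(0, 7);
}

// Item to write: the partition key is derived from the timestamp at PUT time.
function buildPutItem(userId, createdAtMs, payload) {
  return {
    TableName: 'Events',
    Item: {
      UserMonth: { S: monthKey(userId, createdAtMs) },
      CreatedAt: { N: String(createdAtMs) },
      Payload: { S: payload }
    }
  };
}

// Build one query per UTC calendar month between fromMs and toMs.
function buildMonthQueries(userId, fromMs, toMs) {
  const queries = [];
  const cursor = new Date(fromMs);
  cursor.setUTCDate(1);            // move to the first day of the starting month
  cursor.setUTCHours(0, 0, 0, 0);  // at midnight UTC
  while (cursor.getTime() <= toMs) {
    queries.push({
      TableName: 'Events',
      KeyConditionExpression: 'UserMonth = :pk AND CreatedAt BETWEEN :from AND :to',
      ExpressionAttributeValues: {
        ':pk': { S: monthKey(userId, cursor.getTime()) },
        ':from': { N: String(fromMs) },
        ':to': { N: String(toMs) }
      }
    });
    cursor.setUTCMonth(cursor.getUTCMonth() + 1);
  }
  return queries;
}
```

Each query stays inside a single partition, and running the queries in month order and concatenating the pages yields the items in date order for that user.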
