For the indexes in Amazon’s DynamoDB service, we can fit multiple indexes in the same attribute, as long as we split them by namespace. We can do this via use of the BEGIN_WITH operator on the sort key. We can further extends this to namespace ranges with the BETWEEN operator.
An index is comprised of a hash_key
/ partition key
, which can only do equals, and a range_key
which can do operators =, >, <, <=, >=, betweens or
begins_with. A key can be a string, a binary base 64 or a number.
Secondary Indexes are sparse meaning if the DynamoDB item doesn’t have the attributes associated with the index, then it won’t appear on the index.
We are restricted to a maximum of 5 Local Secondary Indexes (LSIs), that must be created when the table is. These share the same partition key, but have different attributes for the range key. They can only be created with the DynamoDB table, not while it exists. They allow retrieval of attributes on the item that are not are on the index.
Global Secondary Indexes on the other hand can be created at any time, but are copies of the table, increasing read/write/storage costs. AWS allows an initial limit of 20 global secondary indexes per table, which can increased on request. Projected attributes are restricted to those configured when creating the index.
Using begins with on a range_key
is very powerful as it can be used to query hierarchical data, with the most specific data at the end of the key.
For example - addresses
London:Westminster:Downing St:10
London:Westminster:Buckingham Palace
We can do a range key of BEGINS_WITH London:
to get all items with London addresses
London:Westminster:Downing St:10
London::Westminster:Buckingham Palace
Or in Westminister - BEGINS_WITH London:Westminister:
London:Westminster:Downing St:10
London:Westminster:Buckingham Palace
Or just in Downing Street BEGINS_WITH London:Westminister:Downing St:
London:Westminster:Buckingham Palace
We can use this BEGINS_WITH
property to store mutually exclusive data in the same index.
For example, say we wanted to store both dog and cat attributes in our database. Dogs can follow many commands, whereas cats ignore commands. On the other hand cats climb trees while dogs don’t. We can thus store these together in the same attribute, and call it something like index1
.
We could prefix the attribute with the appropriate names:
dog:command:roll over
dog:command:sit
dog:command:beg
cat:treeclimbed:spruce
cat:treeclimbed:elm
cat:treeclimbed:oak
Thus we can query on the same index with BEGINS_WITH dog:command:
to get:
dog:command:roll over
dog:command:sit
dog:command::beg
Or BEGINS_WITH cat:treeclimbed:
to get
cat:treeclimbed:spruce
cat:treeclimbed:elm
cat:treeclimbed:oak
However this all very useful for hierarchies, but what if we wanted numbers and ranges in our index? What if we wanted to also track parrots in the same index, counting how many different words, and query all those greater than 5?
The BETWEEN
operator allows selection of a higher and lower values in a range and returns exclusive values in that range. It works on numbers, and most importantly, strings. If our data has a maximum and minimum value we can store it in index1
.
If we take our parrots values, prefix them with parrot:words:
and pad the zeros up to say a 999999, we can do:
parrot:words:000003
parrot:words:000101
parrot:words:000201
Now if we want query the index and get all parrots whose words are between 2 and 3, we can do
BETWEEN parrot:words:000002 AND parrot:words:000005
to get
parrot:words:000003
parrot:words:000101
parrot:words:000201
dog:command:roll over
If we wanted to do greater than 5 we could do BETWEEN parrot:words:000005 AND parrot:words:999999
parrot:words:000003
parrot:words:000101
parrot:words:000201
dog:command:roll over
If we wanted to do less than 5 we could do BETWEEN parrot:words:000000 AND parrot:words:000004
parrot:words:000003
parrot:words:000101
parrot:words:000201
dog:command:roll over
This approach can also be extended to dates in UTC in the ISO8601 format.
There is no limit to how many indexes you can fit into the same table, as long as the below rules are followed:
Having to iterate between each namespace as a separate query – e.g. getting all
animals birthdates would need a query for each type of animal, assuming the index is namespaced.
If multiple things are stuffed in the same attribute e.g. both number of legs and number of whiskers: e.g. animal:legs:4:whiskers:8
To get all animals with whiskers 8 would need to run all number of legs
as well.