DynamoDB Indexing Examples
Here are detailed examples of DynamoDB indexing, including Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs), with explanations.
Example 1: E-commerce Product Catalog
Table: Products
Primary Key: ProductID
(Partition Key), SKU
(Sort Key)
Attributes: Name
, Category
, Price
, Brand
, Color
, Size
Scenario
We want to efficiently query products by category and brand. While we could scan the entire table and filter, this becomes very inefficient as the table grows. Instead, we create a Global Secondary Index (GSI).
GSI Definition
{
"IndexName": "CategoryBrandIndex",
"KeySchema": [
{ "AttributeName": "Category", "KeyType": "HASH" }, // Partition key of the GSI
{ "AttributeName": "Brand", "KeyType": "RANGE" } // Sort key of the GSI
],
"Projection": {
"ProjectionType": "INCLUDE",
"NonKeyAttributes": ["Name", "Price", "Color", "Size"] // Attributes projected into the GSI
},
"ProvisionedThroughput": { // Important for write-heavy applications
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 10
}
}
Explanation
CategoryBrandIndex
**: The name of the GSI.Category
(HASH): The partition key for the GSI. Items with the same category will be stored together.Brand
(RANGE): The sort key for the GSI. Within a category, items will be sorted by brand.Projection
**: Defines which attributes are copied from the base table into the GSI.INCLUDE
**: Specifies that only the listedNonKeyAttributes
in addition to the primary key of the base table and the key attributes of the GSI are projected.NonKeyAttributes
**: The attributes we want to retrieve when querying the GSI. We includeName
,Price
,Color
, andSize
because we often want to display these when listing products.
ProvisionedThroughput
**: GSIs have their own read/write capacity. You need to specify this, or use on-demand capacity mode.
Querying the GSI
Now, we can efficiently query for all “Electronics” products of the “Sony” brand:
// Example AWS SDK code (Conceptual)
dynamoDB.query({
TableName: "Products",
IndexName: "CategoryBrandIndex",
KeyConditionExpression: "Category = :category AND Brand = :brand",
ExpressionAttributeValues: {
":category": "Electronics",
":brand": "Sony"
},
ProjectionExpression: "Name, Price, Color, Size" // Redundant, but good practice to be explicit.
});
This query will be much faster than scanning the entire Products
table.
Code Example
import boto3
# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'Products'
index_name = 'CategoryBrandIndex'
# Query the GSI
response = dynamodb.query(
TableName=table_name,
IndexName=index_name,
KeyConditionExpression='Category = :category AND Brand = :brand',
ExpressionAttributeValues={
':category': {'S': 'Electronics'},
':brand': {'S': 'Sony'}
},
ProjectionExpression='Name, Price, Color, Size'
)
items = response['Items']
for item in items:
print(f"Name: {item['Name']['S']}, Price: {item['Price']['N']}, Color: {item['Color']['S']}, Size: {item['Size']['S']}")
Example 2: Game Scores with Local Secondary Index
Table: GameScores
Primary Key: UserID
(Partition Key), Timestamp
(Sort Key)
Attributes: GameID
, Score
, Level
, Character
Scenario
We want to retrieve a user’s game scores, sorted by level. Because we want to sort by level within each user, a Local Secondary Index (LSI) is appropriate.
Important: LSIs must be created when the table is created.
Table and LSI Definition
{
"TableName": "GameScores",
"KeySchema": [
{ "AttributeName": "UserID", "KeyType": "HASH" }, // Partition key of the table
{ "AttributeName": "Timestamp", "KeyType": "RANGE" } // Sort key of the table
],
"AttributeDefinitions": [
{ "AttributeName": "UserID", "AttributeType": "S" },
{ "AttributeName": "Timestamp", "AttributeType": "N" },
{ "AttributeName": "Level", "AttributeType": "N" } // Attribute for the LSI
],
"ProvisionedThroughput": {
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
},
"LocalSecondaryIndexes": [
{
"IndexName": "LevelIndex",
"KeySchema": [
{ "AttributeName": "UserID", "KeyType": "HASH" }, // Must be the same as the table's partition key
{ "AttributeName": "Level", "KeyType": "RANGE" } // Sort key for the LSI
],
"Projection": {
"ProjectionType": "ALL" // Project all attributes from the base table into the LSI
}
}
]
}
Explanation
LevelIndex
**: The name of the LSI.UserID
(HASH): The partition key for the LSI must be the same as the base table’s partition key.Level
(RANGE): The sort key for the LSI. Scores for a given user will be sorted by level.Projection
**:ALL
**: All attributes from the base table are projected into the LSI. This provides maximum flexibility but consumes more storage.
Querying the LSI
To get a user’s scores sorted by level:
// Example AWS SDK code (Conceptual)
dynamoDB.query({
TableName: "GameScores",
IndexName: "LevelIndex",
KeyConditionExpression: "UserID = :user_id",
ExpressionAttributeValues: {
":user_id": "user123"
},
ScanIndexForward: true // Sort ascending by level
});
Code Example
import boto3
# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'GameScores'
index_name = 'LevelIndex'
user_id = 'user123'
# Query the LSI
response = dynamodb.query(
TableName=table_name,
IndexName=index_name,
KeyConditionExpression='UserID = :user_id',
ExpressionAttributeValues={
':user_id': {'S': user_id}
},
ScanIndexForward=True # Sort ascending by level
)
items = response['Items']
for item in items:
print(f"UserID: {item['UserID']['S']}, Timestamp: {item['Timestamp']['N']}, Level: {item['Level']['N']}, Score: {item['Score']['N']}")
Example 3: Using a GSI for a Different Access Pattern
Table: Orders
Primary Key: OrderID
(Partition Key)
Attributes: CustomerID
, OrderDate
, OrderStatus
, TotalAmount
Scenario
We want to be able to query orders by CustomerID
to see a user’s order history. Since the primary key is OrderID
, we create a GSI.
GSI Definition
{
"IndexName": "CustomerOrderIndex",
"KeySchema": [
{ "AttributeName": "CustomerID", "KeyType": "HASH" },
{ "AttributeName": "OrderDate", "KeyType": "RANGE" } // Sort by order date
],
"Projection": {
"ProjectionType": "KEYS_ONLY" // Efficient, only project the key attributes.
}
}
Explanation
CustomerOrderIndex
**: Name of the GSI.CustomerID
(HASH): Partition key for the GSI.OrderDate
(RANGE): Sort key for the GSI.Projection
**:KEYS_ONLY
**: Only the key attributes of the table and the index are projected. In this case,OrderID
(from the table),CustomerID
, andOrderDate
are projected. This is the most efficient projection if you only need to identify the matching items, and can then fetch the full item from the table if needed.
Querying the GSI
To get all orders for a specific customer, sorted by order date:
// Example AWS SDK Code (Conceptual)
dynamoDB.query({
TableName: "Orders",
IndexName: "CustomerOrderIndex",
KeyConditionExpression: "CustomerID = :customer_id",
ExpressionAttributeValues: {
":customer_id": "cust123"
},
ScanIndexForward: false // Sort descending to get most recent orders first
});
Code Example
import boto3
from datetime import datetime
# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'Orders'
index_name = 'CustomerOrderIndex'
customer_id = 'cust123'
# Query the GSI
response = dynamodb.query(
TableName=table_name,
IndexName=index_name,
KeyConditionExpression='CustomerID = :customer_id',
ExpressionAttributeValues={
':customer_id': {'S': customer_id}
},
ScanIndexForward=False # Sort descending to get most recent orders first
)
items = response['Items']
for item in items:
order_id = item.get('OrderID', {'S': 'N/A'})['S']
order_date_str = item.get('OrderDate', {'S': 'N/A'})['S']
# Convert OrderDate string to datetime object (assuming YYYY-MM-DD format)
try:
order_date = datetime.strptime(order_date_str, '%Y-%m-%d').date()
except ValueError:
order_date = 'Invalid Date'
print(f"OrderID: {order_id}, OrderDate: {order_date}")
Choosing Between LSI and GSI
- Local Secondary Index (LSI)
- The partition key of the LSI is the same as the base table’s partition key.
- The sort key of the LSI is different from the base table’s sort key.
- LSIs *must* be created when the table is created.
- LSIs are stored in the same partition as the base table, which allows for strongly consistent reads.
- LSIs have a size limit of 10 GB per partition key value.
- LSIs are generally used when you need an alternate sort order for a table’s data within each partition.
- Global Secondary Index (GSI)
- The partition key and sort key of the GSI can be different from the base table’s primary key.
- GSIs can be created or deleted at any time.
- GSIs are stored in different partitions from the base table.
- Reads from GSIs are eventually consistent (strongly consistent reads are not supported).
- GSIs do not have the 10 GB size limit per partition key value.
- GSIs are the most flexible type of index and are used to support query patterns that differ from the table’s primary key.
More Real-Life Examples
Here are some additional real-world scenarios where DynamoDB indexing is crucial:
1. Social Media Connections
- Table:
Connections
- Primary Key:
UserID1
,UserID2
(Composite Key) - Attributes:
ConnectionDate
,Status
- Scenario: You want to efficiently find all connections for a given user.
- GSI: Create a GSI with
UserID1
as the partition key andConnectionDate
as the sort key. This allows you to quickly retrieve a user’s connections, ordered by the date they were established. - Query: You can then query the GSI using the
UserID1
to get all connections for that user.
Code Example
import boto3
from datetime import datetime
# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'Connections'
index_name = 'User1ConnectionDateIndex'
# Assuming you have a GSI named 'User1ConnectionDateIndex'
try:
response = dynamodb.query(
TableName=table_name,
IndexName=index_name,
KeyConditionExpression='UserID1 = :user_id',
ExpressionAttributeValues={
':user_id': {'S': 'user123'} # Replace with the user ID you want to query
},
ScanIndexForward=True # Sort ascending, False for descending
)
connections = response['Items']
if connections:
print(f"Connections for UserID1: user123")
for connection in connections:
user_id2 = connection['UserID2']['S']
connection_date_str = connection['ConnectionDate']['S']
status = connection['Status']['S']
try:
connection_date = datetime.strptime(connection_date_str, '%Y-%m-%d').date()
except ValueError:
connection_date = 'Invalid Date'
print(f" UserID2: {user_id2}, ConnectionDate: {connection_date}, Status: {status}")
else:
print("No connections found for the specified user.")
except dynamodb.exceptions.ResourceNotFoundException:
print(f"Error: Table '{table_name}' or index '{index_name}' not found.")
except Exception as e:
print(f"An error occurred: {e}")
2. E-commerce Order History
- Table:
Orders
- Primary Key:
OrderID
- Attributes:
CustomerID
,OrderDate
,OrderStatus
,TotalAmount
,ShippingAddress
- Scenario: You want to retrieve all orders for a specific customer, sorted by order date.
- GSI: Create a GSI with
CustomerID
as the partition key andOrderDate
as the sort key. - Query: You can then query the GSI using the
CustomerID
to get the order history for that customer, sorted by date.
Code Example
import boto3
from datetime import datetime
# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'Orders'
index_name = 'CustomerOrderDateIndex'
def get_order_history(customer_id):
"""
Retrieves the order history for a given customer, sorted by order date.
Args:
customer_id (str): The ID of the customer.
Returns:
list: A list of order dictionaries, sorted by order date (most recent first).
Returns an empty list if no orders are found or an error occurs.
"""
try:
response = dynamodb.query(
TableName=table_name,
IndexName=index_name,
KeyConditionExpression='CustomerID = :cust_id',
ExpressionAttributeValues={
':cust_id': {'S': customer_id}
},
ScanIndexForward=False # Sort descending (most recent first)
)
orders = response.get('Items', []) # Ensure we get an empty list if no items
# Convert date strings to datetime objects for easier handling if needed
for order in orders:
order_date_str = order.get('OrderDate', {'S': 'N/A'})['S'] #handle missing
try:
order['OrderDate'] = datetime.strptime(order_date_str, '%Y-%m-%d').date()
except ValueError:
order['OrderDate'] = 'Invalid Date' # Handle invalid date format
return orders
except dynamodb.exceptions.ResourceNotFoundException:
print(f"Error: Table '{table_name}' or index '{index_name}' not found.")
return [] # Return empty list on table/index error
except Exception as e:
print(f"An error occurred: {e}")
return [] # Return empty list on other errors
if __name__ == "__main__":
customer_id_to_query = 'cust123' # Replace with the desired customer ID
order_history = get_order_history(customer_id_to_query)
if order_history:
print(f"Order history for Customer ID: {customer_id_to_query}")
for order in order_history:
print(f" OrderID: {order.get('OrderID', {'S': 'N/A'})['S']}, OrderDate: {order['OrderDate']}, OrderStatus: {order.get('OrderStatus', {'S': 'N/A'})['S']}, TotalAmount: {order.get('TotalAmount', {'N': 'N/A'})['N'] }, ShippingAddress: {order.get('ShippingAddress', {'S': 'N/A'})['S']}")
else:
print(f"No orders found for Customer ID: {customer_id_to_query}")
3. IoT Device Data
- Table:
DeviceData
- Primary Key:
DeviceID
,Timestamp
- Attributes:
Temperature
,Humidity
,Pressure
,Location
- Scenario: You want to query device data for a specific time range for a given device.
- LSI: You could use an LSI with the same partition key
DeviceID
andTimestamp
as the sort key. - Query: You can then query the LSI using the
DeviceID
and a timestamp range to get the device data within that time frame.
Code Example
import boto3
from datetime import datetime
# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'DeviceData'
index_name = 'DeviceIDTimestampIndex' # LSI name
device_id = 'device001'
def query_device_data(device_id, start_time, end_time):
"""Queries device data for a specific device within a timestamp range using an LSI.
Args:
device_id (str): The ID of the device.
start_time (datetime): The start time for the query.
end_time (datetime): The end time for the query.
Returns:
list: A list of device data items within the specified time range.
"""
try:
response = dynamodb.query(
TableName=table_name,
IndexName=index_name,
KeyConditionExpression='DeviceID = :device_id AND #ts BETWEEN :start_ts AND :end_ts',
ExpressionAttributeValues={
':device_id': {'S': device_id},
':start_ts': {'N': str(start_time.timestamp())},
':end_ts': {'N': str(end_time.timestamp())}
},
ExpressionAttributeNames={ # need to define a placeholder for reserved keyword Timestamp
'#ts': 'Timestamp'
}
)
return response.get('Items', [])
except dynamodb.exceptions.ResourceNotFoundException:
print(f"Error: Table '{table_name}' or index '{index_name}' not found.")
return []
except Exception as e:
print(f"An error occurred: {e}")
return []
if __name__ == "__main__":
# Define the time range
start_time = datetime(2023, 1, 1, 0, 0, 0)
end_time = datetime(2023, 1, 1, 12, 0, 0)
# Query the device data
device_data = query_device_data(device_id, start_time, end_time)
if device_data:
print(f"Device Data for Device ID: {device_id} between {start_time} and {end_time}:")
for item in device_data:
timestamp = datetime.fromtimestamp(float(item['Timestamp']['N']))
temperature = item.get('Temperature', {'N': 'N/A'})['N']
humidity = item.get('Humidity', {'N': 'N/A'})['N']
pressure = item.get('Pressure', {'N': 'N/A'})['N']
location = item.get('Location', {'S': 'N/A'})['S']
print(f" Timestamp: {timestamp}, Temperature: {temperature}, Humidity: {humidity}, Pressure: {pressure}, Location: {location}")
else:
print(f"No data found for Device ID: {device_id} between {start_time} and {end_time}")
```
4. Content Management System
- Table:
Content
- Primary Key:
ContentID
- Attributes:
Title
,Type
,PublishDate
,Author
,Status
- Scenario: You want to find all content of a specific type (e.g., “article”, “blog post”) ordered by publication date.
- GSI: Create a GSI with
Type
as the partition key andPublishDate
as the sort key. - Query: A query on this GSI with
Type = "article"
will return all articles, sorted by publication date.
Code Example
import boto3
from datetime import datetime
# Create a DynamoDBclient
dynamodb = boto3.client('dynamodb')
table_name = 'Content'
index_name = 'TypePublishDateIndex'
def get_content_by_type(content_type):
"""
Retrieves content of a specific type, ordered by publication date.
Args:
content_type (str): The type of content to retrieve (e.g., "article", "blog post").
Returns:
list: A list of content items of the specified type, sorted by publication date.
"""
try:
response = dynamodb.query(
TableName=table_name,
IndexName=index_name,
KeyConditionExpression='ContentType = :content_type',
ExpressionAttributeValues={
':content_type': {'S': content_type}
},
ScanIndexForward=False # Sort descending (most recent first)
)
content_items = response.get('Items', [])
# Convert date strings to datetime objects
for item in content_items:
publish_date_str = item.get('PublishDate', {'S': 'N/A'})['S']
try:
item['PublishDate'] = datetime.strptime(publish_date_str, '%Y-%m-%d').date()
except ValueError:
item['PublishDate'] = 'Invalid Date'
return content_items
except dynamodb.exceptions.ResourceNotFoundException:
print(f"Error: Table '{table_name}' or index '{index_name}' not found.")
return []
except Exception as e:
print(f"An error occurred: {e}")
return []
if __name__ == "__main__":
content_type_to_query = 'article' # content type
articles = get_content_by_type(content_type_to_query)
if articles:
print(f"Articles of type '{content_type_to_query}', sorted by Publish Date:")
for article in articles:
print(f" ContentID: {article.get('ContentID', {'S': 'N/A'})['S']}, Title: {article.get('Title', {'S': 'N/A'})['S']}, PublishDate: {article['PublishDate']}, Author: {article.get('Author', {'S': 'N/A'})['S']}, Status: {article.get('Status', {'S': 'N/A'})['S']}")
else:
print(f"No articles found with type '{content_type_to_query}'.")
Leave a Reply