DynamoDB Indexing Examples

DynamoDB Indexing Examples

Examples

Here are detailed examples of DynamoDB indexing, including Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs), with explanations.

Example 1: E-commerce Product Catalog

Table: Products

Primary Key: ProductID (Partition Key), SKU (Sort Key)

Attributes: Name, Category, Price, Brand, Color, Size

Scenario

We want to efficiently query products by category and brand. While we could scan the entire table and filter, this becomes very inefficient as the table grows. Instead, we create a Global Secondary (GSI).

GSI Definition


{
    "IndexName": "CategoryBrandIndex",
    "KeySchema": [
        { "AttributeName": "Category", "KeyType": "HASH" },  // Partition key of the GSI
        { "AttributeName": "Brand", "KeyType": "RANGE" }    // Sort key of the GSI
    ],
    "Projection": {
        "ProjectionType": "INCLUDE",
        "NonKeyAttributes": ["Name", "Price", "Color", "Size"] // Attributes projected into the GSI
    },
    "ProvisionedThroughput": { // Important for write-heavy applications
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 10
    }
}
    

Explanation

  • CategoryBrandIndex**: The name of the GSI.
  • Category (HASH): The partition key for the GSI. Items with the same category will be stored together.
  • Brand (RANGE): The sort key for the GSI. Within a category, items will be sorted by brand.
  • Projection**: Defines which attributes are copied from the base table into the GSI.
    • INCLUDE**: Specifies that only the listed NonKeyAttributes in addition to the primary key of the base table and the key attributes of the GSI are projected.
    • NonKeyAttributes**: The attributes we want to retrieve when querying the GSI. We include Name, Price, Color, and Size because we often want to display these when listing products.
  • ProvisionedThroughput**: GSIs have their own read/write capacity. You need to specify this, or use on-demand capacity mode.

Querying the GSI

Now, we can efficiently query for all “Electronics” products of the “Sony” brand:


// Example  SDK code (Conceptual)
dynamoDB.query({
    TableName: "Products",
    IndexName: "CategoryBrandIndex",
    KeyConditionExpression: "Category = :category AND Brand = :brand",
    ExpressionAttributeValues: {
        ":category": "Electronics",
        ":brand": "Sony"
    },
    ProjectionExpression: "Name, Price, Color, Size" // Redundant, but good practice to be explicit.
});
    

This query will be much faster than scanning the entire Products table.

Code Example


import boto3

# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')

table_name = 'Products'
index_name = 'CategoryBrandIndex'

# Query the GSI
response = dynamodb.query(
    TableName=table_name,
    IndexName=index_name,
    KeyConditionExpression='Category = :category AND Brand = :brand',
    ExpressionAttributeValues={
        ':category': {'S': 'Electronics'},
        ':brand': {'S': 'Sony'}
    },
    ProjectionExpression='Name, Price, Color, Size'
)

items = response['Items']
for item in items:
    print(f"Name: {item['Name']['S']}, Price: {item['Price']['N']}, Color: {item['Color']['S']}, Size: {item['Size']['S']}")
    

Example 2: Game Scores with Local Secondary Index

Table: GameScores

Primary Key: UserID (Partition Key), Timestamp (Sort Key)

Attributes: GameID, Score, Level, Character

Scenario

We want to retrieve a user’s game scores, sorted by level. Because we want to sort by level within each user, a Local Secondary Index (LSI) is appropriate.

Important: LSIs must be created when the table is created.

Table and LSI Definition


{
    "TableName": "GameScores",
    "KeySchema": [
        { "AttributeName": "UserID", "KeyType": "HASH" },       // Partition key of the table
        { "AttributeName": "Timestamp", "KeyType": "RANGE" }    // Sort key of the table
    ],
    "AttributeDefinitions": [
        { "AttributeName": "UserID", "AttributeType": "S" },
        { "AttributeName": "Timestamp", "AttributeType": "N" },
        { "AttributeName": "Level", "AttributeType": "N" }     // Attribute for the LSI
    ],
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5
    },
    "LocalSecondaryIndexes": [
        {
            "IndexName": "LevelIndex",
            "KeySchema": [
                { "AttributeName": "UserID", "KeyType": "HASH" },   // Must be the same as the table's partition key
                { "AttributeName": "Level", "KeyType": "RANGE" }    // Sort key for the LSI
            ],
            "Projection": {
                "ProjectionType": "ALL" // Project all attributes from the base table into the LSI
            }
        }
    ]
}
    

Explanation

  • LevelIndex**: The name of the LSI.
  • UserID (HASH): The partition key for the LSI must be the same as the base table’s partition key.
  • Level (RANGE): The sort key for the LSI. Scores for a given user will be sorted by level.
  • Projection**:
    • ALL**: All attributes from the base table are projected into the LSI. This provides maximum flexibility but consumes more storage.

Querying the LSI

To get a user’s scores sorted by level:


// Example AWS SDK code (Conceptual)
dynamoDB.query({
    TableName: "GameScores",
    IndexName: "LevelIndex",
    KeyConditionExpression: "UserID = :user_id",
    ExpressionAttributeValues: {
        ":user_id": "user123"
    },
    ScanIndexForward: true // Sort ascending by level
});
    

Code Example


import boto3

# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')

table_name = 'GameScores'
index_name = 'LevelIndex'
user_id = 'user123'

# Query the LSI
response = dynamodb.query(
    TableName=table_name,
    IndexName=index_name,
    KeyConditionExpression='UserID = :user_id',
    ExpressionAttributeValues={
        ':user_id': {'S': user_id}
    },
    ScanIndexForward=True  # Sort ascending by level
)

items = response['Items']
for item in items:
    print(f"UserID: {item['UserID']['S']}, Timestamp: {item['Timestamp']['N']}, Level: {item['Level']['N']}, Score: {item['Score']['N']}")
    

Example 3: Using a GSI for a Different Access Pattern

Table: Orders

Primary Key: OrderID (Partition Key)

Attributes: CustomerID, OrderDate, OrderStatus, TotalAmount

Scenario

We want to be able to query orders by CustomerID to see a user’s order history. Since the primary key is OrderID, we create a GSI.

GSI Definition


{
    "IndexName": "CustomerOrderIndex",
    "KeySchema": [
      { "AttributeName": "CustomerID", "KeyType": "HASH" },
      { "AttributeName": "OrderDate", "KeyType": "RANGE" }  // Sort by order date
    ],
    "Projection": {
        "ProjectionType": "KEYS_ONLY" // Efficient, only project the key attributes.
    }
}
    

Explanation

  • CustomerOrderIndex**: Name of the GSI.
  • CustomerID (HASH): Partition key for the GSI.
  • OrderDate (RANGE): Sort key for the GSI.
  • Projection**:
    • KEYS_ONLY**: Only the key attributes of the table and the index are projected. In this case, OrderID (from the table), CustomerID, and OrderDate are projected. This is the most efficient projection if you only need to identify the matching items, and can then fetch the full item from the table if needed.

Querying the GSI

To get all orders for a specific customer, sorted by order date:


// Example AWS SDK Code (Conceptual)
dynamoDB.query({
    TableName: "Orders",
    IndexName: "CustomerOrderIndex",
    KeyConditionExpression: "CustomerID = :customer_id",
    ExpressionAttributeValues: {
        ":customer_id": "cust123"
    },
    ScanIndexForward: false // Sort descending to get most recent orders first
});
    

Code Example


import boto3
from datetime import datetime

# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')

table_name = 'Orders'
index_name = 'CustomerOrderIndex'
customer_id = 'cust123'

# Query the GSI
response = dynamodb.query(
    TableName=table_name,
    IndexName=index_name,
    KeyConditionExpression='CustomerID = :customer_id',
    ExpressionAttributeValues={
        ':customer_id': {'S': customer_id}
    },
    ScanIndexForward=False  # Sort descending to get most recent orders first
)

items = response['Items']
for item in items:
    order_id = item.get('OrderID', {'S': 'N/A'})['S']
    order_date_str = item.get('OrderDate', {'S': 'N/A'})['S']
    # Convert OrderDate string to datetime object (assuming YYYY-MM-DD format)
    try:
        order_date = datetime.strptime(order_date_str, '%Y-%m-%d').date()
    except ValueError:
        order_date = 'Invalid Date'

    print(f"OrderID: {order_id}, OrderDate: {order_date}")
    

Choosing Between LSI and GSI

  • Local Secondary Index (LSI)
    • The partition key of the LSI is the same as the base table’s partition key.
    • The sort key of the LSI is different from the base table’s sort key.
    • LSIs *must* be created when the table is created.
    • LSIs are stored in the same partition as the base table, which allows for strongly consistent reads.
    • LSIs have a size limit of 10 GB per partition key value.
    • LSIs are generally used when you need an alternate sort order for a table’s data within each partition.
  • Global Secondary Index (GSI)
    • The partition key and sort key of the GSI can be different from the base table’s primary key.
    • GSIs can be created or deleted at any time.
    • GSIs are stored in different partitions from the base table.
    • Reads from GSIs are eventually consistent (strongly consistent reads are not supported).
    • GSIs do not have the 10 GB size limit per partition key value.
    • GSIs are the most flexible type of index and are used to support query patterns that differ from the table’s primary key.

More Real- Examples

Here are some additional real-world scenarios where DynamoDB indexing is crucial:

1. Social Media Connections

  • Table: Connections
  • Primary Key: UserID1, UserID2 (Composite Key)
  • Attributes: ConnectionDate, Status
  • Scenario: You want to efficiently find all connections for a given user.
  • GSI: Create a GSI with UserID1 as the partition key and ConnectionDate as the sort key. This allows you to quickly retrieve a user’s connections, ordered by the date they were established.
  • Query: You can then query the GSI using the UserID1 to get all connections for that user.

Code Example


import boto3
from datetime import datetime

# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'Connections'
index_name = 'User1ConnectionDateIndex'

# Assuming you have a GSI named 'User1ConnectionDateIndex'
try:
    response = dynamodb.query(
        TableName=table_name,
        IndexName=index_name,
        KeyConditionExpression='UserID1 = :user_id',
        ExpressionAttributeValues={
            ':user_id': {'S': 'user123'}  # Replace with the user ID you want to query
        },
        ScanIndexForward=True # Sort ascending, False for descending
    )

    connections = response['Items']
    if connections:
        print(f"Connections for UserID1: user123")
        for connection in connections:
            user_id2 = connection['UserID2']['S']
            connection_date_str = connection['ConnectionDate']['S']
            status = connection['Status']['S']
            try:
                connection_date = datetime.strptime(connection_date_str, '%Y-%m-%d').date()
            except ValueError:
                connection_date = 'Invalid Date'
            print(f"  UserID2: {user_id2}, ConnectionDate: {connection_date}, Status: {status}")
    else:
        print("No connections found for the specified user.")

except dynamodb.exceptions.ResourceNotFoundException:
    print(f"Error: Table '{table_name}' or index '{index_name}' not found.")
except Exception as e:
    print(f"An error occurred: {e}")
    

2. E-commerce Order History

  • Table: Orders
  • Primary Key: OrderID
  • Attributes: CustomerID, OrderDate, OrderStatus, TotalAmount, ShippingAddress
  • Scenario: You want to retrieve all orders for a specific customer, sorted by order date.
  • GSI: Create a GSI with CustomerID as the partition key and OrderDate as the sort key.
  • Query: You can then query the GSI using the CustomerID to get the order history for that customer, sorted by date.

Code Example


import boto3
from datetime import datetime

# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'Orders'
index_name = 'CustomerOrderDateIndex'

def get_order_history(customer_id):
    """
    Retrieves the order history for a given customer, sorted by order date.

    Args:
        customer_id (str): The ID of the customer.

    Returns:
        list: A list of order dictionaries, sorted by order date (most recent first).
              Returns an empty list if no orders are found or an error occurs.
    """
    try:
        response = dynamodb.query(
            TableName=table_name,
            IndexName=index_name,
            KeyConditionExpression='CustomerID = :cust_id',
            ExpressionAttributeValues={
                ':cust_id': {'S': customer_id}
            },
            ScanIndexForward=False  # Sort descending (most recent first)
        )
        orders = response.get('Items', [])  # Ensure we get an empty list if no items

        # Convert date strings to datetime objects for easier handling if needed
        for order in orders:
            order_date_str = order.get('OrderDate', {'S': 'N/A'})['S']  #handle missing
            try:
                order['OrderDate'] = datetime.strptime(order_date_str, '%Y-%m-%d').date()
            except ValueError:
                order['OrderDate'] = 'Invalid Date'  # Handle invalid date format

        return orders

    except dynamodb.exceptions.ResourceNotFoundException:
        print(f"Error: Table '{table_name}' or index '{index_name}' not found.")
        return []  # Return empty list on table/index error
    except Exception as e:
        print(f"An error occurred: {e}")
        return []  # Return empty list on other errors

if __name__ == "__main__":
    customer_id_to_query = 'cust123'  # Replace with the desired customer ID
    order_history = get_order_history(customer_id_to_query)

    if order_history:
        print(f"Order history for Customer ID: {customer_id_to_query}")
        for order in order_history:
            print(f"  OrderID: {order.get('OrderID', {'S': 'N/A'})['S']}, OrderDate: {order['OrderDate']}, OrderStatus: {order.get('OrderStatus', {'S': 'N/A'})['S']}, TotalAmount: {order.get('TotalAmount', {'N': 'N/A'})['N'] }, ShippingAddress: {order.get('ShippingAddress', {'S': 'N/A'})['S']}")
    else:
        print(f"No orders found for Customer ID: {customer_id_to_query}")
    

3. IoT Device Data

  • Table: DeviceData
  • Primary Key: DeviceID, Timestamp
  • Attributes: Temperature, Humidity, Pressure, Location
  • Scenario: You want to query device data for a specific time range for a given device.
  • LSI: You could use an LSI with the same partition key DeviceID and Timestamp as the sort key.
  • Query: You can then query the LSI using the DeviceID and a timestamp range to get the device data within that time frame.

Code Example


import boto3
from datetime import datetime

# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')
table_name = 'DeviceData'
index_name = 'DeviceIDTimestampIndex'  #  LSI name
device_id = 'device001'

def query_device_data(device_id, start_time, end_time):
    """Queries device data for a specific device within a timestamp range using an LSI.

    Args:
        device_id (str): The ID of the device.
        start_time (datetime): The start time for the query.
        end_time (datetime): The end time for the query.

    Returns:
        list: A list of device data items within the specified time range.
    """
    try:
        response = dynamodb.query(
            TableName=table_name,
            IndexName=index_name,
            KeyConditionExpression='DeviceID = :device_id AND #ts BETWEEN :start_ts AND :end_ts',
            ExpressionAttributeValues={
                ':device_id': {'S': device_id},
                ':start_ts': {'N': str(start_time.timestamp())},
                ':end_ts': {'N': str(end_time.timestamp())}
            },
            ExpressionAttributeNames={  #  need to define a placeholder for reserved keyword Timestamp
                '#ts': 'Timestamp'
            }
        )
        return response.get('Items', [])
    except dynamodb.exceptions.ResourceNotFoundException:
        print(f"Error: Table '{table_name}' or index '{index_name}' not found.")
        return []
    except Exception as e:
        print(f"An error occurred: {e}")
        return []

if __name__ == "__main__":
    # Define the time range
    start_time = datetime(2023, 1, 1, 0, 0, 0)
    end_time = datetime(2023, 1, 1, 12, 0, 0)

    # Query the device data
    device_data = query_device_data(device_id, start_time, end_time)

    if device_data:
        print(f"Device Data for Device ID: {device_id} between {start_time} and {end_time}:")
        for item in device_data:
            timestamp = datetime.fromtimestamp(float(item['Timestamp']['N']))
            temperature = item.get('Temperature', {'N': 'N/A'})['N']
            humidity = item.get('Humidity', {'N': 'N/A'})['N']
            pressure = item.get('Pressure', {'N': 'N/A'})['N']
            location = item.get('Location', {'S': 'N/A'})['S']
            print(f"  Timestamp: {timestamp}, Temperature: {temperature}, Humidity: {humidity}, Pressure: {pressure}, Location: {location}")
    else:
        print(f"No data found for Device ID: {device_id} between {start_time} and {end_time}")
```

4. Content Management System

  • Table: Content
  • Primary Key: ContentID
  • Attributes: Title, Type, PublishDate, Author, Status
  • Scenario: You want to find all content of a specific type (e.g., “article”, “blog post”) ordered by publication date.
  • GSI: Create a GSI with Type as the partition key and PublishDate as the sort key.
  • Query: A query on this GSI with Type = "article" will return all articles, sorted by publication date.

Code Example


import boto3
from datetime import datetime

# Create a DynamoDBclient
dynamodb = boto3.client('dynamodb')
table_name = 'Content'
index_name = 'TypePublishDateIndex'

def get_content_by_type(content_type):
    """
    Retrieves content of a specific type, ordered by publication date.

    Args:
        content_type (str): The type of content to retrieve (e.g., "article", "blog post").

    Returns:
        list: A list of content items of the specified type, sorted by publication date.
    """
    try:
        response = dynamodb.query(
            TableName=table_name,
            IndexName=index_name,
            KeyConditionExpression='ContentType = :content_type',
            ExpressionAttributeValues={
                ':content_type': {'S': content_type}
            },
            ScanIndexForward=False  # Sort descending (most recent first)
        )
        content_items = response.get('Items', [])

        # Convert date strings to datetime objects
        for item in content_items:
            publish_date_str = item.get('PublishDate', {'S': 'N/A'})['S']
            try:
                item['PublishDate'] = datetime.strptime(publish_date_str, '%Y-%m-%d').date()
            except ValueError:
                item['PublishDate'] = 'Invalid Date'

        return content_items

    except dynamodb.exceptions.ResourceNotFoundException:
        print(f"Error: Table '{table_name}' or index '{index_name}' not found.")
        return []
    except Exception as e:
        print(f"An error occurred: {e}")
        return []

if __name__ == "__main__":
    content_type_to_query = 'article'  #  content type
    articles = get_content_by_type(content_type_to_query)

    if articles:
        print(f"Articles of type '{content_type_to_query}', sorted by Publish Date:")
        for article in articles:
            print(f"  ContentID: {article.get('ContentID', {'S': 'N/A'})['S']}, Title: {article.get('Title', {'S': 'N/A'})['S']}, PublishDate: {article['PublishDate']}, Author: {article.get('Author', {'S': 'N/A'})['S']}, Status: {article.get('Status', {'S': 'N/A'})['S']}")
    else:
        print(f"No articles found with type '{content_type_to_query}'.")
    

Agentic AI (9) AI (178) AI Agent (21) airflow (4) Algorithm (36) Algorithms (31) apache (41) API (108) Automation (11) Autonomous (26) auto scaling (3) AWS (30) Azure (22) BigQuery (18) bigtable (3) Career (7) Chatbot (21) cloud (87) cosmosdb (1) cpu (24) database (82) Databricks (13) Data structure (17) Design (76) dynamodb (4) ELK (1) embeddings (14) emr (4) flink (10) gcp (16) Generative AI (8) gpu (11) graphql (4) image (6) index (10) indexing (12) interview (6) java (39) json (54) Kafka (19) Life (43) LLM (25) LLMs (10) Mcp (2) monitoring (55) Monolith (6) N8n (12) Networking (14) NLU (2) node.js (9) Nodejs (6) nosql (14) Optimization (38) performance (54) Platform (87) Platforms (57) postgres (17) productivity (7) programming (17) pseudo code (1) python (55) RAG (132) rasa (3) rdbms (2) ReactJS (2) realtime (1) redis (6) Restful (6) rust (6) Spark (27) sql (43) time series (6) tips (1) tricks (13) Trie (62) vector (22) Vertex AI (11) Workflow (52)

Leave a Reply

Your email address will not be published. Required fields are marked *