Azure Cosmos DB Index Comparison: GSI vs. LSI

Cosmos DB offers two main types of indexes to optimize query performance: Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs). This article provides a detailed comparison.

Key Differences

Feature | Global Secondary Index (GSI) | Local Secondary Index (LSI)
Partition Key | Can be different from the base container’s partition key | Must be the same as the base container’s partition key
Sort Key | Can be different from the base container’s sort key | Can be different from the base container’s sort key
Provisioned Throughput | Has its own provisioned throughput | Shares the base container’s provisioned throughput
Storage | Stored separately from the base container | Stored within the same partition as the base container
Query Scope | Can query data across partitions | Can only query data within the same partition
Creation | Can be created at any time | Must be created when the container is created
Consistency | Supports eventual consistency | Supports strong consistency within the partition
Use Cases | Best for queries that span multiple partitions or use a different partition key | Best for queries within a single partition that use a different sort key
Number of Indexes | Limited to 500 per container | Limited to 5 per container partition key value
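
The partition-key and query-scope rows of the comparison can be illustrated with a toy in-memory model. This is only a sketch: the sample items, the field names, and the `build_index` helper are hypothetical and do not correspond to any Cosmos DB API. It shows how an index that keeps the base partition key stays inside one partition, while an index with a different partition key gathers items from several base partitions.

```python
from collections import defaultdict

# Hypothetical sample data; not taken from any real container.
ITEMS = [
    {"customerId": "c1", "orderId": "o1", "orderDate": "2024-03-01", "status": "shipped"},
    {"customerId": "c1", "orderId": "o2", "orderDate": "2024-01-15", "status": "open"},
    {"customerId": "c2", "orderId": "o3", "orderDate": "2024-02-10", "status": "shipped"},
]


def build_index(items, partition_key, sort_key):
    """Group items by partition_key and order each group by sort_key."""
    index = defaultdict(list)
    for item in items:
        index[item[partition_key]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[sort_key])
    return dict(index)


# Base container: partitioned by customerId, sorted by orderId.
base = build_index(ITEMS, "customerId", "orderId")
# LSI-style view: same partition key, alternate sort key.
lsi = build_index(ITEMS, "customerId", "orderDate")
# GSI-style view: different partition key, so a single lookup returns
# items that live in different base partitions.
gsi = build_index(ITEMS, "status", "orderDate")

print([it["orderId"] for it in lsi["c1"]])       # one customer, ordered by date
print([it["orderId"] for it in gsi["shipped"]])  # spans customers c1 and c2
```

The LSI-style view has exactly the same partitions as the base data, which is why it can only answer single-partition queries; the GSI-style view is re-partitioned and therefore has to be stored and maintained separately.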

Benefits of Global Secondary Indexes (GSIs)

  • Flexible querying: GSIs allow you to query data using attributes other than the primary key, enabling diverse query patterns.
  • Scalability: GSIs have their own provisioned throughput, allowing you to scale read/write operations on the index independently of the base container.
  • Performance: GSIs can significantly improve query performance for non-key attributes, as they are optimized for specific query patterns.
  • Schema flexibility: You can add GSIs to an existing container without having to recreate it.

Benefits of Local Secondary Indexes (LSIs)

  • Strong consistency: LSIs support strong consistency within the partition, ensuring you get the most up-to-date data for queries within the same partition.
  • Cost-effective for specific use cases: LSIs can be more cost-effective than GSIs if your query patterns align with their limitations (same partition key).
  • Performance: LSIs can offer very fast read performance for queries that use the container’s partition key and an alternate sort key.
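
The consistency difference described in the two lists above can be sketched with a small simulation. It assumes, purely for illustration, that a local index is updated in the same step as the base write while a global index is updated by a later propagation step; the `ToyContainer` class and its methods are hypothetical and not part of any SDK.

```python
class ToyContainer:
    """Toy model: the LSI is updated with the write, the GSI on a later step."""

    def __init__(self):
        self.base = {}      # (deviceId, timestamp) -> item
        self.lsi = {}       # (deviceId, eventType, timestamp) -> item
        self.gsi = {}       # (eventType, timestamp) -> item
        self._pending = []  # writes not yet applied to the GSI

    def write(self, item):
        self.base[(item["deviceId"], item["timestamp"])] = item
        # Local index: same partition, updated synchronously.
        self.lsi[(item["deviceId"], item["eventType"], item["timestamp"])] = item
        # Global index: update is queued, standing in for asynchronous replication.
        self._pending.append(item)

    def propagate(self):
        """Apply queued writes to the GSI."""
        for item in self._pending:
            self.gsi[(item["eventType"], item["timestamp"])] = item
        self._pending.clear()

    def query_lsi(self, device_id):
        return [self.lsi[k] for k in sorted(self.lsi) if k[0] == device_id]

    def query_gsi(self, event_type):
        return [self.gsi[k] for k in sorted(self.gsi) if k[0] == event_type]


container = ToyContainer()
container.write({"deviceId": "d1", "timestamp": 1, "eventType": "error"})
# Right after the write the LSI already returns the item, the GSI does not.
print(len(container.query_lsi("d1")), len(container.query_gsi("error")))
container.propagate()
print(len(container.query_gsi("error")))
```

A reader that must always see its own latest write would query the local index in this model; a reader that can tolerate a short delay can use the global one.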

Real-World Use Cases

Global Secondary Index (GSI)

  • E-commerce product catalog:

    • Container: Products (Partition Key: categoryId, Sort Key: productId)
    • GSI: name-price-index (Partition Key: name, Sort Key: price)
    • Use case: Querying products by name and price, allowing users to find all products with a given name, sorted by price.
    • Code example (Azure CLI):
    
    # Assumption: `az cosmosdb sql container create` documents no --gsi-name or --sort-key-path flags,
    # so the name/price ordering is sketched as a composite index passed through --idx.
    az cosmosdb sql container create --resource-group <resource_group> --account-name <account_name> --database-name <database_name> --name Products --partition-key-path "/categoryId" --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/name", "order": "ascending"}, {"path": "/price", "order": "ascending"}]]}'
                        
  • Social media connections:

    • Container: Connections (Partition Key: userId1, Sort Key: userId2)
    • GSI: user1-connectionDate-index (Partition Key: userId1, Sort Key: connectionDate)
    • Use case: Finding all connections for a given user, ordered by the date they were established.
    • Code example (Azure CLI):

    # Assumption: `az cosmosdb sql container create` documents no --gsi-name or --sort-key-path flags,
    # so the userId1/connectionDate ordering is sketched as a composite index passed through --idx.
    az cosmosdb sql container create --resource-group <resource_group> --account-name <account_name> --database-name <database_name> --name Connections --partition-key-path "/userId1" --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/userId1", "order": "ascending"}, {"path": "/connectionDate", "order": "ascending"}]]}'
                        
  • Order management:

    • Container: Orders (Partition Key: customerId, Sort Key: orderId)
    • GSI: customer-orderDate-index (Partition Key: customerId, Sort Key: orderDate)
    • Use case: Retrieving all orders for a specific customer, sorted by the order date.
    • Code example (Azure CLI):

    # Assumption: `az cosmosdb sql container create` documents no --gsi-name or --sort-key-path flags,
    # so the customerId/orderDate ordering is sketched as a composite index passed through --idx.
    az cosmosdb sql container create --resource-group <resource_group> --account-name <account_name> --database-name <database_name> --name Orders --partition-key-path "/customerId" --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/customerId", "order": "ascending"}, {"path": "/orderDate", "order": "ascending"}]]}'
                        
  • Game leaderboards:

    • Container: GameScores (Partition Key: gameId, Sort Key: score)
    • GSI: playerId-score-index (Partition Key: playerId, Sort Key: score)
    • Use case: Fetching all scores for a specific player across different games, ordered by score.
    • Code example (Azure CLI):

    # Assumption: `az cosmosdb sql container create` documents no --gsi-name or --sort-key-path flags,
    # so the playerId/score ordering is sketched as a composite index passed through --idx.
    az cosmosdb sql container create --resource-group <resource_group> --account-name <account_name> --database-name <database_name> --name GameScores --partition-key-path "/gameId" --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/playerId", "order": "ascending"}, {"path": "/score", "order": "ascending"}]]}'
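
The commands above embed the indexing policy as an inline JSON string, which is easy to mis-quote in a shell. A minimal sketch of generating that string in Python follows. It assumes that the policy is handed to the CLI through its `--idx` parameter and that a composite index over the two paths is an acceptable stand-in for the partition key/sort key pair of each index; the `composite_index_policy` helper is hypothetical.

```python
import json
import shlex


def composite_index_policy(*paths):
    """Return an indexing policy JSON string with one ascending composite index."""
    if len(paths) < 2:
        raise ValueError("a composite index needs at least two paths")
    policy = {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": "/\"_etag\"/?"}],
        "compositeIndexes": [
            [{"path": path, "order": "ascending"} for path in paths]
        ],
    }
    return json.dumps(policy)


policy_json = composite_index_policy("/playerId", "/score")
# shlex.quote produces a single shell-safe argument for the policy.
print("--idx " + shlex.quote(policy_json))
```

The printed fragment can be appended to a container-creation command; writing the JSON to a file instead avoids shell quoting entirely.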
                        
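Local Secondary Index (LSI)

  • Device events within a time range:

    • Container: DeviceEvents (Partition Key: deviceId, Sort Key: timestamp)
    • LSI: event-type-index (Sort Key: eventType)
    • Use case: Retrieving events for a specific device, sorted by event type.
    • Code example (Azure CLI):

    # Assumption: `az cosmosdb sql container create` documents no --sort-key-path or --lsi-* flags,
    # so the eventType/timestamp ordering is sketched as a composite index passed through --idx.
    az cosmosdb sql container create --resource-group <resource_group> --account-name <account_name> --database-name <database_name> --name DeviceEvents --partition-key-path "/deviceId" --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/eventType", "order": "ascending"}, {"path": "/timestamp", "order": "ascending"}]]}'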
                          
  • E-commerce order history within a date range:

    • Container: Orders (Partition Key: customerId, Sort Key: orderDate)
    • LSI: orderStatus-index (Sort Key: orderStatus)
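    • Use case: Retrieving a customer’s orders, filtered by order status, within a specific date range.
    • Code example (Azure CLI):

    # Assumption: `az cosmosdb sql container create` documents no --sort-key-path or --lsi-* flags,
    # so the orderStatus/orderDate ordering is sketched as a composite index passed through --idx.
    az cosmosdb sql container create --resource-group <resource_group> --account-name <account_name> --database-name <database_name> --name Orders --partition-key-path "/customerId" --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/orderStatus", "order": "ascending"}, {"path": "/orderDate", "order": "ascending"}]]}'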
                        
Choosing the Right Index

  • Use GSIs when your queries need to retrieve data from multiple partitions or use a different partition key than the base container.
  • Use LSIs when your queries are limited to a single partition but need to use a different sort key.
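
These rules of thumb can be written down as a small decision helper. This is a sketch only: the `choose_index` function and its parameters are hypothetical, and the third rule simply restates the "Creation" row of the comparison (LSIs must be created together with the container).

```python
def choose_index(same_partition_key: bool,
                 single_partition_queries: bool,
                 container_exists: bool = False) -> str:
    """Return "LSI" or "GSI" following the rules of thumb in this article."""
    if container_exists:
        # Per the comparison, an LSI cannot be added after container creation.
        return "GSI"
    if same_partition_key and single_partition_queries:
        # Same partition key, alternate sort key, one partition per query.
        return "LSI"
    # Cross-partition queries or a different partition key.
    return "GSI"


print(choose_index(same_partition_key=True, single_partition_queries=True))
print(choose_index(same_partition_key=False, single_partition_queries=True))
```

For example, an order-history query keyed by customerId with an alternate sort key maps to the first call, while a lookup by a different attribute maps to the second.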

By understanding the differences and use cases of GSIs and LSIs, you can effectively design your Azure Cosmos DB containers and optimize your queries for performance and scalability.
