Azure Cosmos DB Index Comparison: GSI vs. LSI

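This article compares two secondary-index designs for Azure Cosmos DB containers: Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs). The GSI/LSI terminology and the partition key plus sort key model originate in Amazon DynamoDB. A Cosmos DB container defines only a partition key and indexes every property by default through its indexing policy, so the sort keys described below are best read as the property a query orders or filters by within a partition.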

Key Differences

| Feature | Global Secondary Index (GSI) | Local Secondary Index (LSI) |
| --- | --- | --- |
| Partition Key | Can be different from the base container’s partition key | Must be the same as the base container’s partition key |
| Sort Key | Can be different from the base container’s sort key | Can be different from the base container’s sort key |
| Provisioned Throughput | Has its own provisioned throughput | Shares the base container’s provisioned throughput |
| Storage | Stored separately from the base container | Stored within the same partition as the base container |
| Query Scope | Can query data across partitions | Can only query data within the same partition |
| Creation | Can be created at any time | Must be created when the container is created |
| Consistency | Supports eventual consistency | Supports strong consistency within the partition |
| Use Cases | Best for queries that span multiple partitions or use a different partition key | Best for queries within a single partition that use a different sort key |
| Number of Indexes | Limited to 500 per container | Limited to 5 per container partition key value |
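
The partition key and sort key rows above can be illustrated with a small in-memory model. This is only a sketch of the concept, not how Cosmos DB stores data: the sample items and the build_index helper are hypothetical, and the field names are borrowed from the Products example later in this article.

```python
from collections import defaultdict

# Hypothetical sample items; field names follow the Products example in this article.
ITEMS = [
    {"categoryId": "electronics", "productId": "p1", "name": "cable", "price": 9},
    {"categoryId": "electronics", "productId": "p2", "name": "charger", "price": 19},
    {"categoryId": "office", "productId": "p3", "name": "cable", "price": 7},
]


def build_index(items, partition_key, sort_key):
    """Group items by partition_key and keep each group ordered by sort_key."""
    index = defaultdict(list)
    for item in items:
        index[item[partition_key]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[sort_key])
    return dict(index)


# Base container: partitioned by categoryId, ordered by productId.
base = build_index(ITEMS, "categoryId", "productId")
# LSI-style view: same partition key, alternate sort key.
lsi = build_index(ITEMS, "categoryId", "price")
# GSI-style view: different partition key, so one lookup spans base partitions.
gsi = build_index(ITEMS, "name", "price")

# Ordered by price inside a single base partition.
print([it["productId"] for it in lsi["electronics"]])
# Items that live in two different base partitions, ordered by price.
print([it["productId"] for it in gsi["cable"]])
```

The LSI-style view never leaves the "electronics" partition, while the GSI-style lookup for "cable" returns items that the base container keeps in both "electronics" and "office".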

Benefits of Global Secondary Indexes (GSIs)

  • Flexible querying: GSIs allow you to query data using attributes other than the primary key, enabling diverse query patterns.
  • Scalability: GSIs have their own provisioned throughput, allowing you to scale read/write operations on the index independently of the base container.
  • Performance: A GSI organizes data by the attributes a query filters and sorts on, so lookups on non-key attributes can be served from the index instead of scanning every partition of the base container.
  • Schema flexibility: You can add GSIs to an existing container without having to recreate it.

Benefits of Local Secondary Indexes (LSIs)

  • Strong consistency: LSIs support strong consistency within the partition, ensuring you get the most up-to-date data for queries within the same partition.
  • Cost-effective for specific use cases: LSIs can be more cost-effective than GSIs if your query patterns align with their limitations (same partition key).
  • Performance: LSIs can offer very fast read performance for queries that use the container’s partition key and an alternate sort key.
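
The consistency difference described in the comparison (eventual for GSIs, strong within the partition for LSIs) can be sketched with a toy model. Everything here is an assumption made for illustration: the Container class, its pending queue, and the propagate step are hypothetical and do not correspond to any Cosmos DB API.

```python
from collections import deque


class Container:
    """Toy model: the LSI is updated with the write, the GSI is updated later."""

    def __init__(self):
        self.base = {}          # item id -> item (base data)
        self.lsi = {}           # updated in the same step as the base write
        self.gsi = {}           # updated only when pending changes are applied
        self.pending = deque()  # changes not yet propagated to the GSI

    def write(self, item_id, item):
        self.base[item_id] = item
        self.lsi[item_id] = item
        self.pending.append((item_id, item))

    def propagate(self):
        """Apply every queued change to the GSI."""
        while self.pending:
            item_id, item = self.pending.popleft()
            self.gsi[item_id] = item


container = Container()
container.write("o1", {"customerId": "c1", "orderStatus": "shipped"})

visible_in_lsi = "o1" in container.lsi          # the write is visible immediately
visible_in_gsi_before = "o1" in container.gsi   # the GSI has not caught up yet
container.propagate()
visible_in_gsi_after = "o1" in container.gsi    # visible once propagation runs
print(visible_in_lsi, visible_in_gsi_before, visible_in_gsi_after)
```

A read against the GSI between the write and the propagation step would miss the new order, which is the window that eventual consistency allows.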

Real-Life Use Cases

Global Secondary Index (GSI)

  • E-commerce product catalog:

    • Container: Products (Partition Key: categoryId, Sort Key: productId)
    • GSI: name-price-index (Partition Key: name, Sort Key: price)
    • Use case: Querying products by name and price, allowing users to find every product with a given name, across all categories, sorted by price.
    • Code example (Azure CLI):

      # Assumption: a composite index declared in the indexing policy (--idx) stands in for name-price-index.
      # Queries that filter on name alone still fan out across categoryId partitions.
      # <resource_group> is a placeholder, like the other angle-bracket values.
      az cosmosdb sql container create \
        --resource-group <resource_group> --account-name <account_name> --database-name <database_name> \
        --name Products --partition-key-path "/categoryId" \
        --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/name", "order": "ascending"}, {"path": "/price", "order": "ascending"}]]}'
                          
  • Social media connections:

    • Container: Connections (Partition Key: userId1, Sort Key: userId2)
    • GSI: user1-connectionDate-index (Partition Key: userId1, Sort Key: connectionDate)
    • Use case: Finding all connections for a given user, ordered by the date they were established.
    • Code example (Azure CLI):

      # Assumption: a composite index declared in the indexing policy (--idx) stands in for user1-connectionDate-index.
      # <resource_group> is a placeholder, like the other angle-bracket values.
      az cosmosdb sql container create \
        --resource-group <resource_group> --account-name <account_name> --database-name <database_name> \
        --name Connections --partition-key-path "/userId1" \
        --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/userId1", "order": "ascending"}, {"path": "/connectionDate", "order": "ascending"}]]}'
                          
  • Order management:

    • Container: Orders (Partition Key: customerId, Sort Key: orderId)
    • GSI: customer-orderDate-index (Partition Key: customerId, Sort Key: orderDate)
    • Use case: Retrieving all orders for a specific customer, sorted by the order date.
    • Code example (Azure CLI):

      # Assumption: a composite index declared in the indexing policy (--idx) stands in for customer-orderDate-index.
      # <resource_group> is a placeholder, like the other angle-bracket values.
      az cosmosdb sql container create \
        --resource-group <resource_group> --account-name <account_name> --database-name <database_name> \
        --name Orders --partition-key-path "/customerId" \
        --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/customerId", "order": "ascending"}, {"path": "/orderDate", "order": "ascending"}]]}'
                          
  • Game leaderboards:

    • Container: GameScores (Partition Key: gameId, Sort Key: score)
    • GSI: playerId-score-index (Partition Key: playerId, Sort Key: score)
    • Use case: Fetching the leaderboard for a specific player, showing their scores across different games.
    • Code example (Azure CLI):

      # Assumption: a composite index declared in the indexing policy (--idx) stands in for playerId-score-index.
      # Queries that filter on playerId alone still fan out across gameId partitions.
      # <resource_group> is a placeholder, like the other angle-bracket values.
      az cosmosdb sql container create \
        --resource-group <resource_group> --account-name <account_name> --database-name <database_name> \
        --name GameScores --partition-key-path "/gameId" \
        --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/playerId", "order": "ascending"}, {"path": "/score", "order": "ascending"}]]}'
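
The indexing-policy JSON in these commands is easy to mistype by hand. One option is to generate it, as in the sketch below. The indexing_policy helper is hypothetical; the policy shape (includedPaths, excludedPaths, compositeIndexes) follows the Cosmos DB indexing policy format, and the resulting string can be passed to the CLI's --idx option directly or saved to a file and referenced as @file.

```python
import json


def indexing_policy(composite_paths):
    """Return an indexing policy dict with one composite index over composite_paths."""
    if len(composite_paths) < 2:
        raise ValueError("a composite index needs at least two paths")
    return {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        # The _etag system property is excluded, as in the policies used in this article.
        "excludedPaths": [{"path": "/\"_etag\"/?"}],
        "compositeIndexes": [
            [{"path": path, "order": "ascending"} for path in composite_paths]
        ],
    }


# Example: the ordering used for the Orders container (customerId, then orderDate).
policy = indexing_policy(["/customerId", "/orderDate"])
policy_json = json.dumps(policy)
print(policy_json)
```

Generating the policy keeps the quoting of the _etag path correct and makes it simple to reuse one helper for every container in the examples.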
                          

Local Secondary Index (LSI)

  • Device events within a time range:

    • Container: DeviceEvents (Partition Key: deviceId, Sort Key: timestamp)
    • LSI: event-type-index (Sort Key: eventType)
    • Use case: Retrieving events for a specific device, sorted by event type.
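    • Code example (Azure CLI):

      # Assumption: a composite index declared in the indexing policy (--idx) stands in for event-type-index;
      # it orders a device's events by eventType, then timestamp, within the deviceId partition.
      # <resource_group> is a placeholder, like the other angle-bracket values.
      az cosmosdb sql container create \
        --resource-group <resource_group> --account-name <account_name> --database-name <database_name> \
        --name DeviceEvents --partition-key-path "/deviceId" \
        --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/eventType", "order": "ascending"}, {"path": "/timestamp", "order": "ascending"}]]}'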
                          
  • E-commerce order history within a date range:

    • Container: Orders (Partition Key: customerId, Sort Key: orderDate)
    • LSI: orderStatus-index (Sort Key: orderStatus)
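    • Use case: Retrieving a customer’s orders with a given order status within a specific date range.
    • Code example (Azure CLI):

      # Assumption: a composite index declared in the indexing policy (--idx) stands in for orderStatus-index;
      # it orders a customer's orders by orderStatus, then orderDate, within the customerId partition.
      # <resource_group> is a placeholder, like the other angle-bracket values.
      az cosmosdb sql container create \
        --resource-group <resource_group> --account-name <account_name> --database-name <database_name> \
        --name Orders --partition-key-path "/customerId" \
        --idx '{"indexingMode": "consistent", "includedPaths": [{"path": "/*"}], "excludedPaths": [{"path": "/\"_etag\"/?"}], "compositeIndexes": [[{"path": "/orderStatus", "order": "ascending"}, {"path": "/orderDate", "order": "ascending"}]]}'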
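
The order-history use case stays inside one partition because the query fixes customerId. A sketch of the matching parameterized query is shown below. The order_history_query helper and the sample values are hypothetical; the property names come from the Orders example, and the returned dictionary uses the query-plus-parameters shape that Cosmos DB accepts for parameterized SQL queries.

```python
def order_history_query(customer_id, status, start_date, end_date):
    """Build a parameterized query for one customer's orders by status and date range."""
    return {
        "query": (
            "SELECT * FROM c "
            "WHERE c.customerId = @customerId "
            "AND c.orderStatus = @status "
            "AND c.orderDate >= @start AND c.orderDate <= @end"
        ),
        "parameters": [
            {"name": "@customerId", "value": customer_id},
            {"name": "@status", "value": status},
            {"name": "@start", "value": start_date},
            {"name": "@end", "value": end_date},
        ],
    }


# Hypothetical values; dates are ISO 8601 strings so they compare in date order.
spec = order_history_query("cust-42", "shipped", "2024-01-01", "2024-03-31")
print(spec["query"])
```

Passing values as parameters rather than concatenating them into the query text keeps the query reusable and avoids injection through user-supplied values.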
                          

Choosing the Right Index

  • Use GSIs when your queries need to retrieve data from multiple partitions or use a different partition key than the base container.
  • Use LSIs when your queries are limited to a single partition but need to use a different sort key.
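
The two rules above can be written as a small decision helper. This is a sketch of the article's guidance only; the choose_index function and its parameter names are hypothetical.

```python
def choose_index(spans_partitions, needs_different_partition_key, needs_different_sort_key):
    """Apply the article's rule of thumb for picking an index type."""
    # Queries that cross partitions or re-key the data call for a GSI.
    if spans_partitions or needs_different_partition_key:
        return "GSI"
    # Single-partition queries with an alternate sort key call for an LSI.
    if needs_different_sort_key:
        return "LSI"
    # Otherwise the base container's own keys already serve the query.
    return "base container"


# Leaderboard by player: the data is partitioned by gameId, so the query is re-keyed.
print(choose_index(True, True, False))
# A device's events ordered by event type: same partition, different sort key.
print(choose_index(False, False, True))
```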

By understanding the differences and use cases of GSIs and LSIs, you can effectively design your Azure Cosmos DB containers and optimize your queries for performance and scalability.
