AI Agent with Short-Term Memory on Google Cloud

Creating AI agents capable of handling complex tasks and maintaining context requires implementing short-term memory, often referred to as a “scratchpad” or working memory. This allows agents to temporarily store and process information relevant to their immediate goals. Google Cloud Platform (GCP) offers a range of services that can be used to build effective short-term memory for AI agents.

Why Short-Term Memory is Crucial for AI Agents:

  • Contextual Understanding: Enables agents to remember previous parts of a conversation or steps in a task.
  • Task Decomposition: Helps in breaking down complex tasks into smaller steps and tracking progress.
  • Tool Integration: Facilitates the management of inputs and outputs when using external tools or APIs.
  • Dynamic Planning: Allows agents to adjust their actions based on immediate feedback or new information.
  • Improved Reasoning: Provides a space for temporary storage of intermediate thoughts and calculations.

GCP Services for Implementing Short-Term Memory (with Conceptual Examples):

GCP provides various services suitable for building short-term memory for AI agents:

1. In-Memory Storage within Compute Services:

  • Compute Engine VMs: RAM within the virtual machine instances provides direct and fast short-term memory for running processes.
  • Cloud Functions: Variables within the function’s execution scope offer ephemeral short-term memory per invocation. Local filesystem access within the function can also be used for very short-term needs.
  • Cloud Run / Google Kubernetes Engine (GKE): Container memory within these managed compute platforms can serve as an in-memory scratchpad for the duration of a request or pod lifecycle.

Use Case: Simple, localized state management within a single session or task execution.


import time

class InMemoryAgent:
    def __init__(self):
        self.scratchpad = {}

    def process_data(self, session_id, data_point):
        if session_id not in self.scratchpad:
            self.scratchpad[session_id] = []
        self.scratchpad[session_id].append(data_point)
        print(f"Session {session_id}'s data: {self.scratchpad[session_id]}")

# Example usage (within a Compute Engine VM, Cloud Function, or container)
agent = InMemoryAgent()
agent.process_data("session123", {"step": 1, "value": "initial"})
agent.process_data("session123", {"step": 2, "value": "intermediate"})
    
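One caveat with purely in-process state: the scratchpad above grows without bound and never forgets finished sessions. A minimal sketch of capping entries per session and evicting idle sessions (the max_entries and idle_ttl values are illustrative assumptions, not part of any GCP API):

```python
import time

class BoundedScratchpad:
    """In-process scratchpad that caps entries per session and drops idle sessions."""

    def __init__(self, max_entries=50, idle_ttl=600):
        self.max_entries = max_entries
        self.idle_ttl = idle_ttl   # seconds of inactivity before a session is evicted
        self.sessions = {}         # session_id -> {"data": [...], "last_seen": timestamp}

    def append(self, session_id, data_point):
        self._evict_idle()
        entry = self.sessions.setdefault(session_id, {"data": [], "last_seen": 0})
        entry["data"].append(data_point)
        entry["data"] = entry["data"][-self.max_entries:]  # keep only the newest entries
        entry["last_seen"] = time.time()
        return entry["data"]

    def _evict_idle(self):
        now = time.time()
        stale = [sid for sid, e in self.sessions.items()
                 if now - e["last_seen"] > self.idle_ttl]
        for sid in stale:
            del self.sessions[sid]

pad = BoundedScratchpad(max_entries=3)
for step in range(5):
    history = pad.append("session123", {"step": step})
print(history)  # only the 3 most recent steps are retained
```

The same idea applies to Cloud Run or GKE containers, with the added caution that in-process memory is lost whenever the instance is recycled.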

2. Managed In-Memory Data Stores:

  • Cloud Memorystore for Redis: A fully managed, scalable, and highly available in-memory data store service based on the popular open-source Redis. It offers data structures well suited to session management, caching, and low-latency temporary storage.
  • Cloud Memorystore for Memcached: A fully managed, scalable, and highly available in-memory key-value store service compatible with the Memcached protocol, ideal for caching frequently accessed data for short durations.

Use Case: Maintaining user session data for conversational agents, caching intermediate responses, storing temporary task states.


import redis
import json
import time

# Configuration (replace with your actual Cloud Memorystore for Redis configuration)
REDIS_HOST = "your-redis-instance-ip"  # Or hostname
REDIS_PORT = 6379

try:
    r = redis.Redis(host=REDIS_HOST, port=REDIS_PORT)

    # Store session state with a one-hour expiry so stale sessions clean themselves up
    r.set('chat:user789',
          json.dumps({'last_message': 'Hello', 'timestamp': time.time()}),
          ex=3600)
    user_chat_data = json.loads(r.get('chat:user789'))
    print(f"User 789 Chat Data from Redis: {user_chat_data}")

    r.delete('chat:user789')

except redis.exceptions.ConnectionError as e:
    print(f"Error connecting to Cloud Memorystore for Redis: {e}")
    
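Redis lists are a natural fit for per-session conversation history: RPUSH appends a turn and EXPIRE refreshes the session's lifetime on each write. A sketch of that pattern, with a tiny in-memory stand-in so it runs without a live instance (the FakeRedis stub is purely illustrative; in production you would pass a real redis.Redis client):

```python
import json
import time

def append_turn(client, session_id, message, ttl_seconds=1800):
    """Append one chat turn to a Redis list and refresh the session's TTL.

    `client` can be any object exposing rpush/expire/lrange (e.g. redis.Redis).
    """
    key = f"chat:{session_id}"
    client.rpush(key, json.dumps({"message": message, "ts": time.time()}))
    client.expire(key, ttl_seconds)  # whole session expires after the inactivity window

def get_history(client, session_id):
    """Return the session's turns in chronological order."""
    return [json.loads(raw) for raw in client.lrange(f"chat:{session_id}", 0, -1)]

# Minimal in-memory stand-in so the sketch runs without a live Redis instance.
class FakeRedis:
    def __init__(self):
        self.store = {}
    def rpush(self, key, val):
        self.store.setdefault(key, []).append(val)
    def expire(self, key, ttl):
        pass  # no-op in the stub; real Redis schedules key expiry here
    def lrange(self, key, start, end):
        vals = self.store.get(key, [])
        return vals if end == -1 else vals[start:end + 1]

client = FakeRedis()
append_turn(client, "user789", "Hello")
append_turn(client, "user789", "What's my order status?")
history = get_history(client, "user789")
print([h["message"] for h in history])
```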

3. Databases with Time-to-Live (TTL):

  • Cloud Firestore with TTL: Offers Time-to-Live (TTL) policies that automatically delete documents once a designated timestamp field passes. Suitable for temporary data that needs persistence with automatic cleanup.
  • Cloud Bigtable with TTL: Supports column family-level TTL, allowing you to automatically expire data after a defined period. Useful for managing temporary or time-sensitive information.

Use Case: Storing temporary authentication tokens, managing the state of short-lived workflows, caching data with an expiration policy.


# Conceptual Python example using the google-cloud-firestore library with TTL
from google.cloud import firestore
from datetime import datetime, timedelta, timezone

db = firestore.Client()
temp_data_collection = db.collection('temp_data')

def store_temporary_data(item_id, data, expiry_seconds=3600):
    # 'expiry' must be the timestamp field named in the collection's TTL policy
    expiry_time = datetime.now(timezone.utc) + timedelta(seconds=expiry_seconds)
    item = {
        'data': data,
        'expiry': expiry_time
    }
    temp_data_collection.document(item_id).set(item)
    print(f"Temporary data '{item_id}' stored in Firestore with TTL.")

# Note: Firestore TTL policies are configured per collection group on a timestamp
# field, and deletion after expiry is not immediate - it can lag by hours.
    
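Because TTL deletion lags behind the expiry time, readers should also check the expiry field themselves rather than trusting that expired documents are gone. A minimal sketch of that check, using plain dicts in place of real Firestore DocumentSnapshot objects (the `exists`/`to_dict()` attributes are the standard snapshot interface; the helper names are illustrative):

```python
from datetime import datetime, timezone

def is_live(item, now=None):
    """Return True if a temp_data item has not passed its 'expiry' timestamp.

    Needed because Firestore's TTL garbage collection runs asynchronously,
    so expired documents may still be readable for a while."""
    now = now or datetime.now(timezone.utc)
    expiry = item.get("expiry")
    return expiry is not None and expiry > now

def read_temporary_data(doc_snapshot):
    """Treat an expired (but not yet deleted) document as absent."""
    if not doc_snapshot.exists:
        return None
    item = doc_snapshot.to_dict()
    return item if is_live(item) else None

# Quick check with plain dicts standing in for snapshot contents:
fresh = {"data": "token-abc", "expiry": datetime(2999, 1, 1, tzinfo=timezone.utc)}
stale = {"data": "token-old", "expiry": datetime(2000, 1, 1, tzinfo=timezone.utc)}
print(is_live(fresh), is_live(stale))
```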

4. Leveraging Language Model Context Windows (for Vertex AI PaLM API):

  • Vertex AI PaLM API: Provides access to Google’s powerful large language models. The context window of these models acts as a form of short-term memory during an interaction. Effective prompt engineering is crucial for managing this context.
  • Cloud Functions / Cloud Run / GKE: The compute environments for running applications that interact with the Vertex AI PaLM API. Context is managed within the input and output to the model.
  • Vertex AI Search (for RAG): Can be used for Retrieval-Augmented Generation (RAG), where relevant information is retrieved and injected into the prompt context, effectively extending the model’s short-term memory with external knowledge.

Use Case: Building conversational AI, question answering systems, and agents that require reasoning over specific documents within a single interaction.


# Conceptual Python example using the Vertex AI SDK (google-cloud-aiplatform)
import vertexai
from vertexai.language_models import ChatModel, InputOutputTextPair

vertexai.init(project="your-gcp-project-id", location="your-gcp-region")
chat_model = ChatModel.from_pretrained("chat-bison")
chat = chat_model.start_chat(
    context="You are a helpful AI assistant.",
    examples=[
        InputOutputTextPair(
            input_text="What is the capital of France?",
            output_text="The capital of France is Paris."
        )
    ]
)
# The chat session keeps prior turns in the model's context window,
# so the model can resolve "that city" to Paris.
response = chat.send_message("What is the population of that city?")
print(response.text)
    

Key Aspects: Careful prompt engineering and managing the token limit of the context window are essential. Vertex AI Vector Search can augment the context with relevant retrieved information.
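A common way to stay within the token limit is to keep only the most recent turns that fit a budget. A minimal sketch of that trimming step (the 4-characters-per-token estimate is a rough assumption; real code would use the model's token-counting API or tokenizer):

```python
def trim_history(messages, max_tokens=2000, est_tokens=lambda text: len(text) // 4):
    """Keep the most recent messages that fit within a token budget.

    est_tokens is a crude heuristic (~4 characters per token); swap in a real
    tokenizer for production use."""
    kept, used = [], 0
    for msg in reversed(messages):   # walk newest -> oldest
        cost = est_tokens(msg)
        if used + cost > max_tokens:
            break                    # oldest turns fall off first
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order

history = ["hello " * 100, "short question", "another turn", "latest message"]
trimmed = trim_history(history, max_tokens=20)
print(trimmed)
```

Dropping whole turns from the front is the simplest policy; summarizing older turns into a single message is a common refinement.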

5. Workflows (for Orchestrated Short-Term State):

  • Cloud Workflows: A fully managed orchestration service that allows you to combine and automate Google Cloud and third-party services. The state of a workflow execution, including variables and outputs of steps, can act as a form of short-term memory for the duration of the workflow instance.

Use Case: Implementing complex, multi-step processes where the agent needs to remember the state across different service invocations.


# Conceptual Python example interacting with Cloud Workflows (Illustrative - requires a defined workflow)
# import json
# import google.auth
# from googleapiclient.discovery import build

# credentials, project = google.auth.default()
# # Workflow executions are served by the 'workflowexecutions' API, not 'workflows'
# executions_client = build('workflowexecutions', 'v1', credentials=credentials)
# workflow_path = f"projects/{project}/locations/your-gcp-region/workflows/your_workflow_id"
# body = {"argument": json.dumps({"user_id": "user123", "task": "process_data"})}

# try:
#     response = executions_client.projects().locations().workflows().executions().create(
#         parent=workflow_path,
#         body=body
#     ).execute()
#     print(f"Started workflow execution: {response['name']}")
# except Exception as e:
#     print(f"Error starting workflow: {e}")
    

Key Aspects: Cloud Workflows manages the state of an automated process, allowing stateful logic to span multiple service invocations.

Live Use Cases in Summary:

  • Contextual Customer Support Bot: A customer support bot running on Cloud Run uses Cloud Memorystore for Redis to store the conversation history within a session, enabling it to understand follow-up questions.
  • Real-time Recommendation Engine: A recommendation service uses in-memory storage within Compute Engine VMs or Cloud Memorystore for Memcached to track a user’s recent interactions and provide immediate, personalized suggestions.
  • Workflow Automation Agent: An AI agent automating a business process utilizes Cloud Workflows to manage the state of a multi-step workflow, remembering the outputs of previous tasks to inform subsequent actions.
  • Dynamic Pricing Service: A pricing service uses Cloud Functions and Cloud Memorystore for Redis to maintain a short-term cache of recent demand fluctuations and adjust prices in real-time.

Choosing the Right Approach:

The selection of GCP services for implementing short-term memory depends on factors such as latency requirements, data volume, persistence needs across requests or sessions, complexity of the data structures, and the overall architecture of your AI agent on GCP.

Developers have access to a powerful and versatile set of GCP services to build AI agents with effective short-term memory capabilities, enabling more interactive and intelligent applications.
