Sample Project: Moving Data from Kafka into Tableau

Here we demonstrate the most practical approach for getting Kafka data into Tableau: using a relational database as a sink via Kafka Connect, and then connecting Tableau to that database.

Here’s a breakdown with conceptual configuration and code snippets:

Scenario: We’ll stream JSON data from a Kafka topic (user_activity) into a PostgreSQL database table (user_activity_table) using Kafka Connect. Then, we’ll connect Tableau to this PostgreSQL database.

Part 1: Kafka Data (Conceptual)

Assume your Kafka topic user_activity contains events whose JSON payload looks like this (Part 4 shows how each message is wrapped in the schema envelope the JDBC Sink Connector requires):

JSON

{
  "user_id": "user123",
  "event_type": "page_view",
  "page_url": "/products",
  "timestamp": "2025-04-23T14:30:00Z"
}
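
To confirm what is actually flowing through the topic, you can tail it with the console consumer that ships with Kafka (replace the broker address with your own):

Bash

./bin/kafka-console-consumer.sh --bootstrap-server your_kafka_broker:9092 --topic user_activity --from-beginning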

Part 2: PostgreSQL Database Setup

  1. Install PostgreSQL: If you don’t have it already, install PostgreSQL.
  2. Create a Database and Table: Create a database (e.g., kafka_data) and a table (user_activity_table) to store the Kafka data:
SQL

CREATE DATABASE kafka_data;

CREATE TABLE user_activity_table (
    user_id     VARCHAR(255),
    event_type  VARCHAR(255),
    page_url    TEXT,
    "timestamp" TIMESTAMP WITH TIME ZONE
);

    • The timestamp column is quoted because timestamp is also a SQL keyword.
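
If you prefer the command line, the same setup works through psql (assuming it is installed and the placeholders match your environment):

Bash

# Create the database, then the table inside it
psql -h your_postgres_host -U your_postgres_user -c "CREATE DATABASE kafka_data;"
psql -h your_postgres_host -U your_postgres_user -d kafka_data -c "CREATE TABLE user_activity_table (user_id VARCHAR(255), event_type VARCHAR(255), page_url TEXT, \"timestamp\" TIMESTAMP WITH TIME ZONE);"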

Part 3: Kafka Connect Setup and Configuration

  1. Install Kafka Connect: Kafka Connect is usually included with your Kafka distribution.
  2. Install the JDBC Connector and PostgreSQL Driver: The JDBC Sink Connector (Confluent's kafka-connect-jdbc plugin) is a separate download from plain Apache Kafka; place it, along with the PostgreSQL JDBC driver (postgresql-*.jar), on the Kafka Connect plugin path.
  3. Configure a JDBC Sink Connector: Create a configuration file (e.g., postgres_sink.properties) for the JDBC Sink Connector:
Properties

name=postgres-sink-connector
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=user_activity
connection.url=jdbc:postgresql://your_postgres_host:5432/kafka_data?stringtype=unspecified
connection.user=your_postgres_user
connection.password=your_postgres_password
table.name.format=user_activity_table
insert.mode=insert
pk.mode=none
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true

    • Replace your_postgres_host, your_postgres_user, and your_postgres_password with your PostgreSQL connection details.
    • topics: Specifies the Kafka topic to consume from.
    • connection.url: JDBC connection string for PostgreSQL. The stringtype=unspecified parameter is a pgjdbc option that lets PostgreSQL cast the ISO-8601 timestamp string into the timestamptz column rather than rejecting it as varchar.
    • table.name.format: The name of the table to write to.
    • value.converter: Specifies how to convert the Kafka message value. The JDBC Sink Connector requires records with a declared schema, so schemas.enable is set to true and every message must carry a schema/payload envelope (see the example after this list).
  4. Start Kafka Connect: Run the Kafka Connect worker, pointing it to your connector configuration:
Bash

./bin/connect-standalone.sh config/connect-standalone.properties config/postgres_sink.properties

    • config/connect-standalone.properties would contain the basic Kafka Connect worker configuration (broker list, plugin paths, etc.).
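
With schemas.enable=true, the JsonConverter expects every message on the topic to wrap the record in a schema/payload envelope. Here is a minimal sketch of what one user_activity message looks like on the wire, with field types chosen to match the table above:

JSON

{
  "schema": {
    "type": "struct",
    "name": "user_activity",
    "fields": [
      { "field": "user_id", "type": "string" },
      { "field": "event_type", "type": "string" },
      { "field": "page_url", "type": "string" },
      { "field": "timestamp", "type": "string" }
    ]
  },
  "payload": {
    "user_id": "user123",
    "event_type": "page_view",
    "page_url": "/products",
    "timestamp": "2025-04-23T14:30:00Z"
  }
}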

Part 4: Producing Sample Data to Kafka (Python)

Here’s a simple Python script using the kafka-python library (pip install kafka-python) to produce sample data, wrapped in the schema/payload envelope described above, to the user_activity topic:

Python

from kafka import KafkaProducer
import json
import datetime
import time

KAFKA_BROKER = 'your_kafka_broker:9092'  # Replace with your Kafka broker address
KAFKA_TOPIC = 'user_activity'

# Connect schema describing the record fields; the JDBC Sink Connector
# (with value.converter.schemas.enable=true) requires this envelope.
SCHEMA = {
    "type": "struct",
    "name": "user_activity",
    "fields": [
        {"field": "user_id", "type": "string"},
        {"field": "event_type", "type": "string"},
        {"field": "page_url", "type": "string"},
        {"field": "timestamp", "type": "string"},
    ],
}

producer = KafkaProducer(
    bootstrap_servers=[KAFKA_BROKER],
    value_serializer=lambda x: json.dumps(x).encode('utf-8')
)

try:
    for i in range(5):
        timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        payload = {
            "user_id": f"user{100 + i}",
            "event_type": "click",
            "page_url": f"/item/{i}",
            "timestamp": timestamp
        }
        producer.send(KAFKA_TOPIC, value={"schema": SCHEMA, "payload": payload})
        print(f"Sent: {payload}")
        time.sleep(1)

except Exception as e:
    print(f"Error sending data: {e}")
finally:
    producer.flush()  # Make sure buffered messages are delivered before closing
    producer.close()
  • Replace your_kafka_broker:9092 with the actual address of your Kafka broker.
  • This script sends a few sample JSON messages to the user_activity topic.
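
Once the connector and the producer are both running, two quick sanity checks confirm the pipeline: ask the Kafka Connect REST API (port 8083 by default) for the connector's status, then query the table for the newest rows.

Bash

curl http://localhost:8083/connectors/postgres-sink-connector/status

SQL

SELECT * FROM user_activity_table ORDER BY "timestamp" DESC LIMIT 10;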

Part 5: Connecting Tableau to PostgreSQL

  1. Open Tableau Desktop.
  2. Under “Connect,” select “PostgreSQL.”
  3. Enter the connection details:
    • Server: your_postgres_host
    • Database: kafka_data
    • User: your_postgres_user
    • Password: your_postgres_password
    • Port: 5432 (default)
  4. Click “Connect.”
  5. Select the public schema (or the schema where user_activity_table resides).
  6. Drag the user_activity_table to the canvas.
  7. You can now start building visualizations in Tableau using the data from the user_activity_table, which is being populated in near real-time by Kafka Connect.
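
As an optional refinement, Tableau’s “New Custom SQL” option lets you push filtering down to PostgreSQL. A sketch that limits the data source to the last hour of events:

SQL

SELECT user_id, event_type, page_url, "timestamp"
FROM user_activity_table
WHERE "timestamp" > NOW() - INTERVAL '1 hour';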

Limitations and Considerations:

  • Not True Real-time in Tableau: Tableau will query the PostgreSQL database based on its refresh settings (live connection or scheduled extract). It won’t have a direct, push-based real-time stream from Kafka.
  • Complexity: Setting up Kafka Connect and a database adds complexity compared to a direct connector.
  • Data Transformation: You might need to perform more complex transformations within PostgreSQL or Tableau (see the sketch after this list for a database-side example).
  • Error Handling: Robust error handling is crucial in a production Kafka Connect setup.
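
For the transformation point above, one lightweight option is a database-side view that Tableau connects to instead of the raw table. A minimal sketch (the view name and aggregation are illustrative):

SQL

-- Hourly event counts per event type, derived from the raw activity table
CREATE VIEW user_activity_hourly AS
SELECT
    date_trunc('hour', "timestamp") AS activity_hour,
    event_type,
    COUNT(*) AS event_count
FROM user_activity_table
GROUP BY 1, 2;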

Alternative (Conceptual – No Simple Code): Using a Real-time Data Platform (e.g., Rockset)

While providing a full, runnable code example for a platform like Rockset is beyond a simple snippet, the concept involves:

  1. Rockset Kafka Integration: Configuring Rockset to connect to your Kafka cluster and continuously ingest data from the user_activity topic. Rockset handles schema discovery and indexing.
  2. Tableau Rockset Connector: Using Rockset’s connector for Tableau (you’d need a Rockset account and API key) to directly query the real-time data in Rockset.

This approach offers lower latency for real-time analytics in Tableau compared to the database sink method but involves using a third-party service.

In conclusion, while direct Kafka connectivity in Tableau is limited, using Kafka Connect to pipe data into a Tableau-supported database (like PostgreSQL) provides a practical way to visualize near real-time data using only configuration and standard database connectivity. For true low-latency real-time visualization, exploring dedicated real-time data platforms with Tableau connectors is the more suitable direction.