Here we demonstrate the most practical approach to connecting Tableau to Kafka: using a database as a sink via Kafka Connect, then connecting Tableau to that database.
Here’s a breakdown with conceptual configuration and Python code snippets:
Scenario: We’ll stream JSON data from a Kafka topic (user_activity) into a PostgreSQL database table (user_activity_table) using Kafka Connect. Then, we’ll connect Tableau to this PostgreSQL database.
Part 1: Kafka Data (Conceptual)
Assume your Kafka topic user_activity contains JSON messages like this:
JSON
{
  "user_id": "user123",
  "event_type": "page_view",
  "page_url": "/products",
  "timestamp": "2025-04-23T14:30:00Z"
}
Part 2: PostgreSQL Database Setup
- Install PostgreSQL: If you don’t have it already, install PostgreSQL.
- Create a Database and Table: Create a database (e.g., kafka_data) and a table (user_activity_table) to store the Kafka data:
SQL
CREATE DATABASE kafka_data;

CREATE TABLE user_activity_table (
    user_id VARCHAR(255),
    event_type VARCHAR(255),
    page_url TEXT,
    timestamp TIMESTAMP WITH TIME ZONE
);
Part 3: Kafka Connect Setup and Configuration
- Install Kafka Connect: Kafka Connect is usually included with your Kafka distribution.
- Download the PostgreSQL JDBC Driver: Download the PostgreSQL JDBC driver (postgresql-*.jar) and place it in the Kafka Connect plugin path.
- Configure a JDBC Sink Connector: Create a configuration file (e.g., postgres_sink.properties) for the JDBC Sink Connector:
Properties
name=postgres-sink-connector
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=user_activity
connection.url=jdbc:postgresql://your_postgres_host:5432/kafka_data
connection.user=your_postgres_user
connection.password=your_postgres_password
table.name.format=user_activity_table
insert.mode=insert
pk.mode=none
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
- Replace your_postgres_host, your_postgres_user, and your_postgres_password with your PostgreSQL connection details.
- topics: Specifies the Kafka topic to consume from.
- connection.url: JDBC connection string for PostgreSQL.
- table.name.format: The name of the table to write to.
- value.converter: Specifies how to convert the Kafka message value (we assume JSON).
- Start Kafka Connect: Run the Kafka Connect worker, pointing it to your connector configuration:
Bash
./bin/connect-standalone.sh config/connect-standalone.properties config/postgres_sink.properties
config/connect-standalone.properties contains the basic Kafka Connect worker configuration (broker list, converters, plugin path, etc.).
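As a rough sketch (not a definitive configuration), a minimal connect-standalone.properties for this setup might look like the following; the broker address, offsets file location, and plugin path are placeholders you would adapt to your environment:
Properties
bootstrap.servers=your_kafka_broker:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
# Directory containing the JDBC sink connector and the PostgreSQL JDBC driver
plugin.path=/usr/local/share/kafka/plugins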
Part 4: Producing Sample Data to Kafka (Python)
Here’s a simple Python script using the kafka-python library to produce sample JSON data to the user_activity topic:
Python
from kafka import KafkaProducer
import json
import datetime
import time

KAFKA_BROKER = 'your_kafka_broker:9092'  # Replace with your Kafka broker address
KAFKA_TOPIC = 'user_activity'

producer = KafkaProducer(
    bootstrap_servers=[KAFKA_BROKER],
    value_serializer=lambda x: json.dumps(x).encode('utf-8')
)

try:
    for i in range(5):
        timestamp = datetime.datetime.utcnow().isoformat() + 'Z'
        user_activity_data = {
            "user_id": f"user{100 + i}",
            "event_type": "click",
            "page_url": f"/item/{i}",
            "timestamp": timestamp
        }
        producer.send(KAFKA_TOPIC, value=user_activity_data)
        print(f"Sent: {user_activity_data}")
        time.sleep(1)
except Exception as e:
    print(f"Error sending data: {e}")
finally:
    producer.close()
- Replace your_kafka_broker:9092 with the actual address of your Kafka broker.
- This script sends a few sample JSON messages to the user_activity topic; a quick way to verify they arrive is sketched below.
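Before wiring up the sink connector, you can sanity-check that messages are actually reaching the topic. Here is a minimal consumer sketch using the same kafka-python library; the broker address is the same placeholder as above:
Python
from kafka import KafkaConsumer
import json

KAFKA_BROKER = 'your_kafka_broker:9092'  # Replace with your Kafka broker address
KAFKA_TOPIC = 'user_activity'

# Read the topic from the beginning and stop after 5 seconds of inactivity
consumer = KafkaConsumer(
    KAFKA_TOPIC,
    bootstrap_servers=[KAFKA_BROKER],
    auto_offset_reset='earliest',
    value_deserializer=lambda x: json.loads(x.decode('utf-8')),
    consumer_timeout_ms=5000
)

for message in consumer:
    print(f"Offset {message.offset}: {message.value}")

consumer.close()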
Part 5: Connecting Tableau to PostgreSQL
- Open Tableau Desktop.
- Under “Connect,” select “PostgreSQL.”
- Enter the connection details:
  - Server: your_postgres_host
  - Database: kafka_data
  - User: your_postgres_user
  - Password: your_postgres_password
  - Port: 5432 (default)
- Click “Connect.”
- Select the public schema (or the schema where user_activity_table resides).
- Drag the user_activity_table to the canvas.
- You can now start building visualizations in Tableau using the data from the user_activity_table, which is being populated in near real-time by Kafka Connect.
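Optionally, before building dashboards you can confirm that Kafka Connect is actually writing rows into PostgreSQL. Below is a minimal sketch using the psycopg2 library, assuming the same placeholder connection details used above:
Python
import psycopg2

# Placeholder connection details -- replace with your own
conn = psycopg2.connect(
    host='your_postgres_host',
    port=5432,
    dbname='kafka_data',
    user='your_postgres_user',
    password='your_postgres_password'
)

with conn, conn.cursor() as cur:
    # Total rows written by the sink connector so far
    cur.execute('SELECT COUNT(*) FROM user_activity_table;')
    print(f"Rows in user_activity_table: {cur.fetchone()[0]}")

    # Most recent events ("timestamp" is quoted because it is also a SQL keyword)
    cur.execute(
        'SELECT user_id, event_type, page_url, "timestamp" '
        'FROM user_activity_table ORDER BY "timestamp" DESC LIMIT 5;'
    )
    for row in cur.fetchall():
        print(row)

conn.close()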
Limitations and Considerations:
- Not True Real-time in Tableau: Tableau will query the PostgreSQL database based on its refresh settings (live connection or scheduled extract). It won’t have a direct, push-based real-time stream from Kafka.
- Complexity: Setting up Kafka Connect and a database adds complexity compared to a direct connector.
- Data Transformation: You might need to perform more complex transformations within PostgreSQL or Tableau.
- Error Handling: Robust error handling is crucial in a production Kafka Connect setup.
Alternative (Conceptual – No Simple Code): Using a Real-time Data Platform (e.g., Rockset)
While providing a full, runnable code example for a platform like Rockset is beyond a simple snippet, the concept involves:
- Rockset Kafka Integration: Configuring Rockset to connect to your Kafka cluster and continuously ingest data from the user_activity topic. Rockset handles schema discovery and indexing.
- Tableau Rockset Connector: Using Tableau’s Rockset connector (you’d need a Rockset account and API key) to directly query the real-time data in Rockset.
This approach offers lower latency for real-time analytics in Tableau compared to the database sink method but involves using a third-party service.
In conclusion, while direct Kafka connectivity in Tableau is limited, using Kafka Connect to pipe data into a Tableau-supported database (like PostgreSQL) provides a practical way to visualize near real-time data using standard configuration and database connection methods. For true low-latency real-time visualization, a dedicated real-time data platform with a Tableau connector is the more suitable direction.