Retail Lakehouse with Debezium, Kafka, Iceberg, and Trino¶
A streaming data pipeline that simulates a global e-commerce inventory system. This project uses Debezium and Kafka to capture real-time changes (CDC) from a PostgreSQL database and writes them into a unified Iceberg lakehouse table. Trino enables fast, federated SQL analytics on the evolving dataset.
Architecture Overview¶
Use Case: Global Retail Inventory Sync¶
Imagine an international e-commerce company with separate inventory databases per region (e.g., US, EU, ASIA). This system continuously syncs inventory changes into a central Iceberg table for analytics and reporting — without batch jobs.
Features¶
- Real-time inventory synchronization via Debezium
- Query data instantly with Trino using ANSI SQL
- Pluggable catalog support (e.g., REST, Hive)
- Cloud-native deployment using Kubernetes