Treasure Data
Metrics
As of 2023, Treasure Data serves over 700 accounts and more than 6,000 users, and platform usage has doubled since 2022. Each day, the system handles more than 2.5 million Trino queries and over 100,000 Hive queries. It processes more than 200 trillion rows daily and issues over 10 billion S3 GET requests per day to read partition data from AWS S3.
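Daily totals of this size are easier to reason about as per-second rates. The sketch below converts the figures above into average rates, assuming the load is spread evenly over 24 hours (an illustrative simplification; real traffic is burstier). Because the source gives lower bounds ("over", "more than"), the results are minimum averages.

```python
# Convert the daily volumes quoted above into approximate per-second averages.
# Assumption: load is spread evenly across the day; actual traffic has peaks.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Lower-bound daily figures taken from the metrics paragraph.
daily_volumes = {
    "Trino queries": 2_500_000,
    "Hive queries": 100_000,
    "rows processed": 200_000_000_000_000,
    "S3 GET requests": 10_000_000_000,
}


def per_second(daily: int) -> float:
    """Return the average per-second rate for a daily total."""
    return daily / SECONDS_PER_DAY


rates = {name: per_second(volume) for name, volume in daily_volumes.items()}

for name, rate in rates.items():
    print(f"{name}: ~{rate:,.0f} per second")
```

On these assumptions, the platform averages roughly 29 Trino queries, about 2.3 billion rows, and about 116,000 S3 GET requests per second.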
Overview


Tech Stack
- Wvlet, a query language similar to BigQuery's pipe query syntax
- Trino
- Hive 4 (Hive 4.0.x bundles Iceberg 1.4.3)
- Plazma (Closed Source)
- PostgreSQL (metadata)
- S3
- Real-time Storage
- Archive Storage
- Fluentd
- Embulk: data connectors and result export

Data Architecture



Data Ingestion
Questions
- Data lakehouse architecture? Proprietary (MPC1)
- How to deal with dbt Core? (dbt Fusion)
- SQLMesh
- Amazon S3 Parquet Export Integration
- Amazon Elastic MapReduce Integration
https://www.linkedin.com/in/pramodmanjappa/