Compaction - AWS EMR vs. AWS S3 Tables
Useful Information
- It takes EMR ~25 seconds of compute time to compact 1 GB of data with an m5.xlarge instance → **$0.0017/GB** or **$0.17 for 100 GB** (see the quick check after this list).
- Running Iceberg compaction with AWS EMR is roughly 3x cheaper than letting AWS S3 Tables do it for you.
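As a quick sanity check on those numbers, the per-GB figure follows directly from the instance price and the observed 25 s/GB throughput. The sketch below assumes us-east-1 on-demand pricing of roughly $0.192/hr for m5.xlarge plus the ~$0.048/hr EMR surcharge; adjust for your region and pricing model.

```python
# Back-of-the-envelope check of the $0.0017/GB figure.
# Assumptions: us-east-1 on-demand m5.xlarge (~$0.192/hr) + EMR surcharge (~$0.048/hr).
EC2_HOURLY = 0.192       # m5.xlarge on-demand, USD per hour (assumed)
EMR_HOURLY = 0.048       # EMR fee for m5.xlarge, USD per hour (assumed)
SECONDS_PER_GB = 25      # observed compaction throughput from the experiment

cost_per_gb = (EC2_HOURLY + EMR_HOURLY) * SECONDS_PER_GB / 3600
print(f"${cost_per_gb:.4f} per GB")            # ~$0.0017
print(f"${cost_per_gb * 100:.2f} per 100 GB")  # ~$0.17
```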
Cost of Compaction: AWS EMR vs. AWS S3 Tables
Experiment Settings
The writer produced a nominal 1 GB / minute across 100 partitions, resulting in approximately 100 files / minute, each ranging between 7 MB and 15 MB.
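To make the setup concrete, here is a minimal sketch of a writer with that shape, assuming a pre-created Iceberg table partitioned by a 100-value key; the `glue_catalog.db.events` identifier, schema, and row count are placeholders, not the exact code used in the experiment.

```python
# Hypothetical writer: ~1 GB per minute spread across 100 partition values,
# so each commit leaves ~100 small files in the 7-15 MB range.
import time
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("small-file-writer").getOrCreate()

ROWS_PER_MINUTE = 10_000_000  # sized so one batch is roughly 1 GB (assumption)

while True:
    batch = (
        spark.range(ROWS_PER_MINUTE)
        .withColumn("partition_key", (F.col("id") % 100).cast("int"))  # 100 partitions
        .withColumn("payload", F.sha2(F.col("id").cast("string"), 256))
    )
    # Appending to a table partitioned by partition_key writes roughly one file per
    # partition per commit, i.e. ~100 files of ~10 MB each per minute.
    batch.writeTo("glue_catalog.db.events").append()
    time.sleep(60)
```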
Experiment Execution Details
Figure: average file size over a ~20 h period.
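For reference, compaction on the EMR side can be triggered with Iceberg's built-in `rewrite_data_files` Spark procedure. The snippet below is a sketch rather than the exact job used in the experiment; the catalog and table names are placeholders, and the target file size is only an illustrative value.

```python
# Sketch of an Iceberg compaction job submitted to EMR via Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-compaction").getOrCreate()

spark.sql("""
    CALL glue_catalog.system.rewrite_data_files(
        table => 'db.events',
        strategy => 'binpack',
        options => map(
            'target-file-size-bytes', '536870912',  -- aim for ~512 MB output files
            'min-input-files', '5'                  -- only rewrite groups of 5+ files
        )
    )
""").show(truncate=False)
```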
Cons
- Poor observability
- AWS only provides a CLI command (`get-table-maintenance-job-status`) to retrieve the status of the last compaction run (see the sketch after this list)
- Compaction runs with an arbitrary delay of up to ~3 hours that cannot be customized
- No built-in monitoring is available
- Flawed Approach to Compaction
- The only knob you can tune is the target file size
- It does not account for the fact that the ideal compaction configuration depends on a table's specific readers and writers
- Compaction is not effective for low-latency workloads
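For completeness, this is roughly what the status check looks like programmatically. The sketch assumes the boto3 `s3tables` client's `get_table_maintenance_job_status` call; the bucket ARN, namespace, and table name are placeholders.

```python
# Hedged sketch: fetch the status of the last maintenance (compaction) run for a table.
import boto3

s3tables = boto3.client("s3tables")

resp = s3tables.get_table_maintenance_job_status(
    tableBucketARN="arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket",
    namespace="db",
    name="events",
)

# Only the most recent run per maintenance job type is reported, which is why
# observability is limited to "what happened last".
for job_type, job_status in resp.get("status", {}).items():
    print(job_type, job_status)
```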