
Best Practices for Optimizing Apache Iceberg Workloads in AWS

General Best Practices

Tip

  • Use Iceberg format version 2.
  • Use the AWS Glue Data Catalog as your data catalog.
  • Use the AWS Glue Data Catalog as lock manager.
  • Use Zstandard (ZSTD) compression.
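These recommendations can be combined when you configure the Spark catalog and create tables. The following is a minimal sketch; the catalog name glue_catalog, the warehouse bucket, and the table schema are placeholder assumptions, not values from this guide. Recent Iceberg releases handle commit locking through the Glue Data Catalog itself (optimistic locking), so no separate lock-manager configuration is shown.

# Spark catalog backed by the AWS Glue Data Catalog
spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.glue_catalog.warehouse=s3://my-bucket/warehouse/

CREATE TABLE glue_catalog.db.my_table (
    id BIGINT,
    ts TIMESTAMP)
USING iceberg
TBLPROPERTIES (
    'format-version' = '2',                      -- Iceberg format version 2
    'write.parquet.compression-codec' = 'zstd'   -- ZSTD compression
);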

Optimizing Storage

Tip

Historical Tags

Use tags to mark specific snapshots and define a retention policy for them.

ALTER TABLE glue_catalog.db.table
CREATE TAG `EOM-01` AS OF VERSION 30 RETAIN 365 DAYS

[Figure: historical snapshot tag]
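Once created, a tag can be queried by name through time travel. A sketch reusing the EOM-01 tag from the example above:

SELECT * FROM glue_catalog.db.table VERSION AS OF 'EOM-01'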

S3 Tags

Iceberg can tag the S3 objects it writes and deletes. Tagging deleted objects instead of removing them (s3.delete-enabled=false) lets an S3 lifecycle rule archive or expire them later.

# Tag every object written through this catalog with my_key1=my_val1
spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1
# Don't delete objects from S3; tag them instead so a lifecycle rule can act on them
spark.sql.catalog.my_catalog.s3.delete-enabled=false
spark.sql.catalog.my_catalog.s3.delete.tags.my_key=to_archive
# Also tag written objects with their table name and namespace
spark.sql.catalog.my_catalog.s3.write.table-tag-enabled=true
spark.sql.catalog.my_catalog.s3.write.namespace-tag-enabled=true

VACUUM statement

In Athena, VACUUM performs snapshot expiration and orphan file removal according to the table properties below.

CREATE TABLE my_table (
    ...
)
TBLPROPERTIES (
    'vacuum_max_snapshot_age_seconds' = '432000', -- 5 days
    'vacuum_min_snapshots_to_keep' = '1',
    'vacuum_max_metadata_files_to_keep' = '100'
);
VACUUM glue_catalog.db.my_table
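The VACUUM statement runs in Athena. In a Spark session, roughly equivalent maintenance can be sketched with Iceberg's built-in procedures; the table name and cutoff timestamp here are placeholders:

CALL glue_catalog.system.expire_snapshots(
    table => 'db.my_table',
    older_than => TIMESTAMP '2025-01-01 00:00:00',  -- expire snapshots older than this
    retain_last => 1);

CALL glue_catalog.system.remove_orphan_files(
    table => 'db.my_table');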

Optimizing Read Performance

Tip

  • Partitioning
    • Partition your data
      • Identify columns that are frequently used in queries
      • Choose a low cardinality partition column to avoid creating an excessive number of partitions
    • Use Hidden Partitioning
    • Use Partition Evolution
  • Tuning File Size
    • Set the target file size, row group size, and distribution mode: write.target-file-size-bytes, write.parquet.row-group-size-bytes, write.distribution-mode (see the sketch after this list)
    • Run regular compaction
  • Optimize Column Statistics
    • write.metadata.metrics.max-inferred-column-defaults: 100
  • Choose the Right Update Strategy (CoW)
    • Prefer copy-on-write when read performance matters more than write latency.
    • write.update.mode, write.delete.mode, and write.merge.mode can be set at the table level or overridden independently on the application side.
  • Use ZSTD Compression
    • write.<file_type>.compression-codec
  • Set the Sort Order
  • As a rule of thumb, "too many partitions" can be defined as a scenario where the data size in the majority of partitions is less than 2-5 times the value set by write.target-file-size-bytes.
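Several of these read-side settings can be applied together at table creation. The sketch below is illustrative: the table glue_catalog.db.sales, its columns, and the chosen sizes are assumptions, not values from this guide. The ALTER TABLE statements require the Iceberg Spark SQL extensions.

CREATE TABLE glue_catalog.db.sales (
    id BIGINT,
    category STRING,
    amount DOUBLE,
    ts TIMESTAMP)
USING iceberg
PARTITIONED BY (month(ts))  -- hidden partitioning on a transform, not a physical column
TBLPROPERTIES (
    'write.target-file-size-bytes' = '536870912',        -- 512 MB target files
    'write.parquet.row-group-size-bytes' = '134217728',  -- 128 MB row groups
    'write.metadata.metrics.max-inferred-column-defaults' = '100',
    'write.update.mode' = 'copy-on-write',
    'write.delete.mode' = 'copy-on-write',
    'write.merge.mode' = 'copy-on-write',
    'write.parquet.compression-codec' = 'zstd'
);

-- Partition evolution: switch to daily partitions without rewriting existing data
ALTER TABLE glue_catalog.db.sales ADD PARTITION FIELD day(ts);
ALTER TABLE glue_catalog.db.sales DROP PARTITION FIELD month(ts);

-- Sort order applied by subsequent writes
ALTER TABLE glue_catalog.db.sales WRITE ORDERED BY category, id;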

Optimizing Write Performance

Tip

  • Set the Table Distribution Mode
    • write.distribution-mode: none, hash, range
    • Spark Structured Streaming applications → set write.distribution-mode to none.
  • Choose the Right Update Strategy (MoR)
    • Prefer merge-on-read when write speed matters more than read performance.
  • Choose the Right File Format
    • Set write.format.default to avro (a row-based format) if write speed is important for your workload.
  • By default, Iceberg's compaction doesn't merge delete files; lower the delete-file-threshold property so that data files with at least that many associated deletes become eligible for rewriting.
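For a write-heavy or streaming table, the same knobs point the other way. A sketch, again with a placeholder table name:

ALTER TABLE glue_catalog.db.events SET TBLPROPERTIES (
    'write.distribution-mode' = 'none',  -- skip the pre-write shuffle; suits streaming writers
    'write.update.mode' = 'merge-on-read',
    'write.delete.mode' = 'merge-on-read',
    'write.merge.mode' = 'merge-on-read',
    'write.format.default' = 'avro'      -- row-based format favors write speed
);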

Maintaining Tables by Using Compaction

Common Compaction Properties

Configuration | Default | Description
max-concurrent-file-group-rewrites | 5 | Maximum number of file groups to be rewritten simultaneously
partial-progress.enabled | false | Enables committing groups of files before the entire rewrite completes
partial-progress.max-commits | 10 | Maximum number of commits this rewrite is allowed to produce if partial progress is enabled
rewrite-job-order | none | Forces the rewrite job order based on this value; options are bytes-asc, bytes-desc, files-asc, and files-desc
max-file-group-size-bytes | 100 GB | Maximum amount of data that can be rewritten in a single file group
min-file-size-bytes | 75% of target file size | Files under this threshold are considered for rewriting regardless of any other criteria
max-file-size-bytes | 180% of target file size | Files above this threshold are considered for rewriting regardless of any other criteria
min-input-files | 5 | Any file group exceeding this number of files is rewritten regardless of other criteria
delete-file-threshold | 2147483647 | Minimum number of deletes that must be associated with a data file for it to be considered for rewriting
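These properties are passed as options to Iceberg's rewrite_data_files procedure. A sketch; the table name and option values are illustrative:

CALL glue_catalog.system.rewrite_data_files(
    table => 'db.my_table',
    options => map(
        'max-concurrent-file-group-rewrites', '10',
        'partial-progress.enabled', 'true',  -- commit finished file groups early
        'delete-file-threshold', '2',        -- merge delete files during compaction
        'min-input-files', '5'
    ));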

Example

Partitions & File Groups

  • The Iceberg table consists of four partitions (month=01 through month=04).
  • The Spark application creates a total of four file groups to process.
  • A file group is an Iceberg abstraction that represents a collection of files that will be processed by a single Spark job. That is, the Spark application that runs compaction will create four Spark jobs to process the data.
  • The partition labeled month=01 includes two file groups because it exceeds the maximum size constraint of 100 GB.
  • In contrast, the month=02 partition contains a single file group because it's under 100 GB.
  • The month=03 partition doesn't satisfy the default minimum input file requirement of five files. As a result, it won't be compacted.
  • Lastly, although the month=04 partition doesn't contain enough data to form a single file of the desired size, the files will be compacted because the partition includes more than five small files.