Join Optimization¶

Join Enumeration¶

Understanding JOIN optimization is crucial because JOIN order significantly impacts performance, particularly affecting intermediate result sizes and network transfer volumes. Different join orders can lead to vastly different intermediate result sizes. When you start with a JOIN that produces a massive intermediate table, each subsequent step requires moving and processing more data, resulting in slower performance and higher resource consumption. Conversely, starting with a JOIN that produces a small intermediate table leads to much better performance.

To address this challenge, Trino includes Cost-Based Join Enumeration by default. This algorithm uses table statistics (row counts, sizes, distributions) to estimate the cost of each possible JOIN order. It enumerates all possible JOIN execution sequences, compares their costs, and selects the most resource-efficient order. The system automatically chooses the lowest-cost (fastest) JOIN sequence for execution. This means the JOIN order you write in your SQL statements won't affect Trino's execution plan, Trino will automatically select the optimal JOIN order.

However, accurate table statistics are essential for Trino to correctly estimate costs. You can initialize these statistics using the ANALYZE command. This enables Trino to make optimal JOIN order decisions based on the latest data distribution and sizes.

Join Distribution¶

Beyond join order optimization, Trino must also determine how to distribute data across nodes during JOIN execution. Since Trino is a distributed query engine with multiple nodes processing queries collaboratively, JOIN operations require data from both tables to be located on the same node for comparison and hash table construction.

Trino uses a hash-based join algorithm where the system first designates one side as the build side, loading all its data into memory and creating a hash table based on the join key. The other side becomes the probe side, with the system reading its data row by row and querying the hash table using the join key to find matching rows and output results. This method is particularly effective when sufficient memory is available, and typically uses the smaller dataset as the build side to reduce memory usage and improve performance.

There are two main types of join distributions:

Partitioned joins: Each node participating in the query builds a hash table from only a fraction of the data. This approach requires redistributing both tables using a hash of the join key. While these joins can be slower than broadcast joins, they enable much larger joins overall.
Broadcast joins: Each node participating in the query builds a hash table from all the data, with data replicated to each node. Broadcast joins perform better when the build side is much smaller than the probe side. However, they require that build-side tables fit in memory on each node after filtering, whereas partitioned joins only need to fit in distributed memory across all nodes.

Info

Trino's cost-based optimization automatically handles these decisions. With cost-based join distribution selection, Trino automatically chooses between partitioned and broadcast joins. Similarly, cost-based join enumeration automatically determines which sides serve as probe and build.

Configuration

Join enumeration strategy is controlled by join_reordering_strategy session property (or optimizer.join-reordering-strategy config)

AUTOMATIC (default)
ELIMINATE_CROSS_JOINS
NONE.

Join distribution strategy uses join_distribution_type session property (or join-distribution-type config)

AUTOMATIC (default)
BROADCAST
PARTITIONED.

Replicated table size can be capped using join_max_broadcast_table_size session property (or join-max-broadcast-table-size config): defaulting to 100MB.

See Cost-based optimizations for more details.