Trino Cluster Deployment¶
Prerequisites
- Complete all the requirements in the Prerequisites section.
- OAuth 2.0 Client ID and Client Secret (OAUTH2_CLIENT_ID and OAUTH2_CLIENT_SECRET) for Google authentication. See here for instructions.
- Path on your local machine to the GCP service account key file (GCP_SA_INPUT_PATH) for accessing BigQuery datasets under your GCP project (GCP_BQ_PROJECT_ID). See here for instructions.
- AWS credentials (AWS_ACCESS_KEY and AWS_SECRET_KEY), region (AWS_REGION), and S3 bucket (ICEBERG_S3_URL) for accessing Iceberg tables and the Glue Data Catalog. See here for instructions.
- AWS S3 bucket for the Exchange Manager (EXCHANGE_S3_URLS). See here for instructions.
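Before running anything, it can help to confirm that the command-line tools used later in this guide are installed. The sketch below is not part of the original scripts: the tool list is inferred from the commands that appear in this guide, and missing_tools is a hypothetical helper name.

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool list assumed from the commands used in this guide.
missing="$(missing_tools kubectl helm openssl envsubst curl)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```

If nothing is printed, all five tools were found.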
Without further ado, let's get started with the deployment:
The installation script install.sh will perform the following steps:
- Generate the .env and values.yaml files.
- Create the trino namespace.
- Generate TLS certificates and create the Kubernetes secret.
- Generate the BigQuery service account secret and create the Kubernetes secret.
- Install Trino using Helm with the generated values.yaml.
install.sh
#!/bin/bash
set -euo pipefail
# Generate the .env and values.yaml files
bash generate-env.sh
# Load .env and export its contents as environment variables
set -a
# shellcheck disable=SC1090
source ".env"
set +a
# Create the namespace; ignore the error if it already exists
kubectl create namespace trino || true
# Generate the TLS certificate and apply the resulting secret
bash generate-tls-certs.sh
kubectl apply -f ./trino-tls-secret.yaml -n trino
echo "Generating Kubernetes secret for Accessing BigQuery..."
kubectl create secret generic trino-bigquery-secret \
  --from-file=trino-sa.json="$GCP_SA_INPUT_PATH" \
  --dry-run=client -o yaml > "./trino-bigquery-secret.yaml"
echo "trino-bigquery-secret.yaml generated successfully."
kubectl apply -f ./trino-bigquery-secret.yaml -n trino
# Install the Trino Helm chart with the generated values
helm repo add trino https://trinodb.github.io/charts/
helm repo update
helm install trino trino/trino \
  -f values.yaml \
  -n trino \
  --version 1.39.1
For the script to work correctly, the following environment variables must be set when it runs. You can either enter them when generate-env.sh prompts for them, or define them beforehand in a .env file in the same directory as the script, in which case the existing values are offered as defaults.
.env
- INTERNAL_SHARED_SECRET: A secret string for securing internal communication between Trino nodes. Generated automatically by generate-env.sh. Used as internal-communication.shared-secret.
- OAUTH2_CLIENT_ID: The client ID for OAuth 2.0 authentication, used to enable Google login for the Trino Web UI. Used as http-server.authentication.oauth2.client-id.
- OAUTH2_CLIENT_SECRET: The client secret for OAuth 2.0 authentication, paired with the client ID for secure login. Used as http-server.authentication.oauth2.client-secret.
- GCP_BQ_PROJECT_ID: The Google Cloud project ID, required for the BigQuery connector in Trino. Used as bigquery.project-id.
- GCP_SA_INPUT_PATH: Path to the service account JSON file for Google Cloud authentication. Used to create the BigQuery service account Kubernetes secret.
- AWS_ACCESS_KEY: AWS access key for authenticating to AWS services. Used for S3, Glue, and exchange manager S3 configuration.
- AWS_SECRET_KEY: AWS secret key, paired with the access key for AWS authentication. Used for S3, Glue, and exchange manager S3 configuration.
- AWS_REGION: The AWS region where your S3 buckets and Glue Data Catalog are located. Used for S3, Glue, and exchange manager S3 configuration.
- ICEBERG_S3_URL: The S3 URL (bucket path) for storing Iceberg table data. Used as hive.metastore.glue.default-warehouse-dir.
- EXCHANGE_S3_URLS: S3 URLs for Trino's exchange manager, which handles intermediate data during distributed query execution. Used as exchangeManager.baseDir.
generate-env.sh substitutes these variables into values-template.yaml with envsubst to produce values.yaml, and install.sh uses them when creating the Kubernetes secrets. Together they configure authentication, storage, and cloud integration for your Trino deployment.
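Because envsubst replaces an unset variable with an empty string without complaining, a quick check before rendering values.yaml can catch a missing value early. The following is a minimal sketch, not part of the original scripts; require_vars is a hypothetical helper name.

```shell
# Return non-zero and name each listed variable that is unset or empty.
require_vars() {
  local rc=0 name value
  for name in "$@"; do
    # Indirect lookup of the variable called $name (portable spelling).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, calling require_vars INTERNAL_SHARED_SECRET OAUTH2_CLIENT_ID OAUTH2_CLIENT_SECRET after loading .env returns a non-zero status and names any variable that is still empty.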
If you would rather not use the script and prefer to run each step manually, read on. The rest of this article walks through deploying a Trino cluster on Kubernetes step by step, explaining each part along the way.
Generating Environment and Values Files¶
generate-env.sh
#!/bin/bash
set -euo pipefail
ENV_FILE=".env"
# If an old .env exists, load it first so its values become the defaults
if [[ -f "$ENV_FILE" ]]; then
  set -a
  # shellcheck disable=SC1090
  source "$ENV_FILE"
  set +a
fi
# Prompt for a single variable, offering its current value as the default
ask() {
  local name="$1"
  local current="${!name-}" # Current value from the environment, if any
  local prompt
  if [[ -n "${current:-}" ]]; then
    prompt="Enter $name (default: $current): "
  else
    prompt="Enter $name: "
  fi
  read -r -p "$prompt" input
  # Empty input keeps the old value; otherwise update it
  if [[ -z "${input:-}" && -n "${current:-}" ]]; then
    export "$name=$current"
  else
    export "$name=$input"
  fi
}
echo "Generating .env file..."
echo "=== Fill in variables to generate ENV_FILE ==="
# Truncate and recreate the .env file
> "$ENV_FILE"
# Variables that do not require user input
# tr strips the line wrapping that GNU base64 adds, keeping the secret on one line
INTERNAL_SHARED_SECRET="$(openssl rand 512 | base64 | tr -d '\n')"
export INTERNAL_SHARED_SECRET
printf 'INTERNAL_SHARED_SECRET=%q\n' "$INTERNAL_SHARED_SECRET" >> "$ENV_FILE"
# Variables that require user input (array order determines prompt order)
VARS=(
  OAUTH2_CLIENT_ID
  OAUTH2_CLIENT_SECRET
  GCP_BQ_PROJECT_ID
  GCP_SA_INPUT_PATH
  AWS_ACCESS_KEY
  AWS_SECRET_KEY
  AWS_REGION
  ICEBERG_S3_URL
  EXCHANGE_S3_URLS
)
for v in "${VARS[@]}"; do
  ask "$v"
  val="${!v-}"
  printf '%s=%q\n' "$v" "$val" >> "$ENV_FILE"
done
echo "$ENV_FILE generated successfully."
# Check that the envsubst command is available
if ! command -v envsubst >/dev/null 2>&1; then
  echo "Error: envsubst not found. Please install gettext." >&2
  exit 1
fi
echo "Generating values.yaml from values-template.yaml..."
envsubst < "values-template.yaml" > "values.yaml"
echo "values.yaml generated successfully."
Loading environment variables from the .env file¶
Load the environment variables from the .env file so that they can be used in subsequent commands:
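A minimal sketch of this step, mirroring the set -a / source / set +a sequence that install.sh uses. The load_env_file wrapper is a hypothetical name, added here so that a missing file produces a clear error.

```shell
# Export every assignment in the given file into the current shell.
load_env_file() {
  local env_file="${1:-./.env}"
  if [ ! -f "$env_file" ]; then
    echo "Env file not found: $env_file" >&2
    return 1
  fi
  set -a          # auto-export every variable assigned while sourcing
  . "$env_file"   # "." is the portable spelling of "source"
  set +a
}
```

Calling load_env_file with no argument reads ./.env. Because generate-env.sh writes values with printf %q, run this in bash so that any quoting it produced is read back correctly.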
Creating the Namespace¶
Create the trino namespace, which is where we deploy our Trino cluster:
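install.sh creates the namespace with kubectl create namespace trino || true. As an alternative sketch, the function below (ensure_namespace is a hypothetical name) renders the namespace manifest client-side and applies it, so re-running it does not fail when the namespace already exists.

```shell
# Create the namespace if it is missing; do nothing harmful if it exists.
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}
```

For this guide the call would be ensure_namespace trino.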
Setting up TLS Certificates¶
Execute generate-tls-certs.sh to generate the TLS certificate and its Kubernetes secret manifest, then apply the secret to the trino namespace:
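These are the same two commands that install.sh runs, wrapped in a function (apply_tls_secret is a hypothetical name) so that the secret is applied only when certificate generation succeeds. The sketch assumes generate-tls-certs.sh is in the current directory and writes trino-tls-secret.yaml there.

```shell
# Generate the certificate and secret manifest, then apply the secret.
apply_tls_secret() {
  bash generate-tls-certs.sh &&
    kubectl apply -f ./trino-tls-secret.yaml -n trino
}
```

Run apply_tls_secret from the directory that contains the script.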
Result
Creating TLS certificates for Trino...
Step 1: Creating Private Key...
Step 1: Completed.
Step 2: Creating Certificate...
Step 2: Completed.
Step 3: Combining Private Key and Certificate...
Step 3: Completed.
Step 4: Creating Kubernetes secret...
Step 4: Completed.
Certificate generation completed successfully!
Generated files:
- .cert/trino-dev.pem (with private key and certificate)
- trino-tls-secret.yaml (Kubernetes secret manifest)
secret/trino-tls-secret created
Once executed, the trino-tls-secret.yaml file will have the following structure:
trino-tls-secret.yaml
Configuring BigQuery Service Account¶
Generate the BigQuery service account secret manifest file and apply it to the trino namespace:
echo "Generating Kubernetes secret for Accessing BigQuery..."
kubectl create secret generic trino-bigquery-secret \
--from-file=trino-sa.json="$GCP_SA_INPUT_PATH" \
--dry-run=client -o yaml > "./trino-bigquery-secret.yaml"
echo "trino-bigquery-secret.yaml generated successfully."
kubectl apply -f ./trino-bigquery-secret.yaml -n trino
Result
Once executed, the trino-bigquery-secret.yaml file will have the following structure:
trino-bigquery-secret.yaml
Installing Trino¶
First, add and update the Trino Helm repository. Then, deploy Trino in the trino namespace using the generated values.yaml file:
helm repo add trino https://trinodb.github.io/charts/
helm repo update
helm install trino trino/trino \
  -f values.yaml \
  -n trino \
  --version 1.39.1
Result
Verifying the Deployment¶
After the installation completes, verify that Trino has been deployed successfully. Show the deployed Trino release:
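The command for this step is not shown here, so the following is a sketch rather than the author's exact invocation. It assumes the release name trino and the namespace trino from the install step; show_trino_release is a hypothetical name.

```shell
# List the Helm releases in the trino namespace, then show the release details.
show_trino_release() {
  helm list -n trino &&
    helm status trino -n trino
}
```

helm list shows the chart version and release status at a glance; helm status adds the release notes and the last deployment time.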
Result
Then, check the status of all resources in the trino namespace:
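A sketch of this check, assuming the deployment names trino-coordinator and trino-worker that the chart creates in this setup. It waits for both rollouts to finish before listing the resources; the 300-second timeout and the name check_trino_resources are arbitrary choices for illustration.

```shell
# Wait for both deployments to become ready, then list everything in the namespace.
check_trino_resources() {
  kubectl rollout status deployment/trino-coordinator -n trino --timeout=300s &&
    kubectl rollout status deployment/trino-worker -n trino --timeout=300s &&
    kubectl get all -n trino
}
```

If you only want the listing, kubectl get all -n trino on its own is enough.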
Result
NAME                                     READY   STATUS    RESTARTS   AGE
pod/trino-coordinator-6fdfb7bf84-tjwfc   1/1     Running   0          5m38s
pod/trino-worker-777d595c66-dml67        1/1     Running   0          5m38s
pod/trino-worker-777d595c66-pv9h2        1/1     Running   0          5m38s

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/trino          ClusterIP   10.99.208.168   <none>        8080/TCP,8443/TCP   5m38s
service/trino-worker   ClusterIP   None            <none>        8080/TCP            5m38s

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trino-coordinator   1/1     1            1           5m38s
deployment.apps/trino-worker        2/2     2            2           5m38s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/trino-coordinator-6fdfb7bf84   1         1         1       5m38s
replicaset.apps/trino-worker-777d595c66        2         2         2       5m38s
Perfect! The deployment is successful. As you can see, the Trino coordinator and worker pods are up and running, and all services have been created correctly. With the cluster now operational, we can proceed to access the Trino Web UI and Trino CLI to interact with the cluster.
Web UI¶
To access the Trino Web UI, start by port-forwarding the Trino service to your local machine:
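A sketch of the port-forward, assuming the service is named trino and exposes HTTPS on port 8443 in the trino namespace. forward_trino_https is a hypothetical name; its optional argument picks a different local port.

```shell
# Forward a local port (default 8443) to the HTTPS port of the trino service.
forward_trino_https() {
  kubectl port-forward service/trino "${1:-8443}:8443" -n trino
}
```

The command stays in the foreground until you stop it with Ctrl+C, so run it in a separate terminal.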
Once the port forwarding is active, open your browser and navigate to https://127.0.0.1:8443. Since we're using a self-signed certificate for development purposes, your browser will display a security warning. This is expected behavior and safe to bypass in a development environment.
To proceed, click "Advanced" and then "Accept the Risk and Continue" (the exact wording may vary depending on your browser).
Since OAuth 2.0 authentication is enabled, you'll be automatically redirected to the Google login page for authentication.
After logging in with your Google account, you'll be redirected back to the Trino Web UI, where you can monitor queries, view cluster status, and manage your Trino environment.
Trino CLI¶
For command-line access to Trino, you'll need to download and install the Trino CLI tool.
First, download the trino-cli-476-executable.jar file from the Maven repository.
Next, rename the file to trino, make it executable, and move it to a directory in your PATH (such as /usr/local/bin):
cd ~/Projects/retail-lakehouse/trino
curl -L -o trino-cli-476-executable.jar https://repo1.maven.org/maven2/io/trino/trino-cli/476/trino-cli-476-executable.jar
chmod +x trino-cli-476-executable.jar
sudo mv trino-cli-476-executable.jar /usr/local/bin/trino
Verify the installation by checking the version:
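A small sketch of the version check. The wrapper, trino_cli_version (a hypothetical name), first confirms that the trino executable is on PATH so that a failed install gives a clear message. The executable jar also needs a Java runtime available on the machine.

```shell
# Print the CLI version, or explain why it cannot be run.
trino_cli_version() {
  if ! command -v trino >/dev/null 2>&1; then
    echo "trino not found on PATH" >&2
    return 1
  fi
  trino --version
}
```

Running trino --version directly works just as well once the file is in place.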
To connect to your Trino cluster, use the CLI with external authentication enabled:
trino --server https://127.0.0.1:8443 \
--external-authentication \
--insecure \
--user "user@example.com"
Authentication Flow
The CLI authentication process works as follows:
- Start the CLI with the --external-authentication option and execute a query.
- The CLI starts and connects to Trino.
- A message appears in the CLI directing you to open a browser with a specified URL when the first query is submitted.
- Open the URL in a browser and follow through the authentication process.
- The CLI automatically receives a token.
- When successfully authenticated in the browser, the CLI proceeds to execute the query.
- Further queries in the CLI session do not require additional logins while the authentication token remains valid. Token expiration depends on the external authentication type configuration.
- Expired tokens force you to log in again.
This authentication method ensures secure access to your Trino cluster while maintaining ease of use for interactive queries.
Cleanup¶
To remove the Trino cluster:
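The cleanup commands are not shown here, so the following is a sketch under the assumption that the release is named trino and everything lives in the trino namespace, as in the install step. remove_trino is a hypothetical name.

```shell
# Uninstall the Helm release, then delete the namespace and what is left in it.
remove_trino() {
  helm uninstall trino -n trino &&
    kubectl delete namespace trino
}
```

Deleting the namespace also removes the TLS and BigQuery secrets created earlier. Locally generated files such as .env, values.yaml, and the secret manifests stay on disk and can be deleted by hand.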