Interacting with Apache Polaris¶
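Polaris exposes two distinct APIs, both served on port 8181 under different URL prefixes; a separate management port (8182) serves health checks and metrics. Understanding which API to use, and when, is the key to working with Polaris effectively.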
| API | Port | Purpose |
|---|---|---|
| Management API | 8181 | Admin operations: create/delete catalogs, manage principals, assign roles |
| Iceberg REST API | 8181 | Catalog operations: create namespaces, create/read/write Iceberg tables |
| Management service | 8182 | Health checks (/q/health) and Prometheus metrics |
Two APIs, one port
The Management API and the Iceberg REST API both listen on port 8181 but use different URL prefixes:
- Management API: /api/management/v1/...
- Iceberg REST API: /api/catalog/v1/...
The distinction matters: creating a catalog is the Management API's job, while Trino and the Kafka connector talk to Polaris through the Iceberg REST API.
Port Forward¶
All examples in this article use a local port-forward. Run this before executing any curl commands:
Authentication¶
Both APIs require a Bearer token. Polaris uses the OAuth2 client_credentials flow.
curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials&client_id=root&client_secret=secret&scope=PRINCIPAL_ROLE:ALL"
Result
Save the token for reuse:
TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials&client_id=root&client_secret=secret&scope=PRINCIPAL_ROLE:ALL" \
| python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")
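The same token request can be sketched in Python using only the standard library. The helper names (build_token_request, extract_access_token, fetch_token) are illustrative and not part of any Polaris client; the URL and form fields simply mirror the curl command above.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url, client_id, client_secret, scope="PRINCIPAL_ROLE:ALL"):
    """Build (but do not send) the OAuth2 client_credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/catalog/v1/oauth/tokens",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_access_token(response_body):
    """Pull access_token out of the JSON token response (str or bytes)."""
    return json.loads(response_body)["access_token"]


def fetch_token(base_url, client_id, client_secret):
    """Send the request; needs the port-forward to be running."""
    request = build_token_request(base_url, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_access_token(response.read())
```

With the port-forward active, fetch_token("http://localhost:8181", "root", "secret") should return the same value as the TOKEN variable above.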
Where client_id and client_secret come from
These are the values set in polaris-bootstrap-secret at deployment time. The format is realm,username,password, where:
- username → client_id
- password → client_secret
If the account was auto-generated (no bootstrap secret was configured), find the credentials line in the pod log:
Here, b65181da2cd9dba7 is the client_id and 5bc240fcb49f808f94becdd811bc721f is the client_secret.
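As a small illustration of the realm,username,password mapping, the sketch below splits such a string into the values used for the token request. The function name is hypothetical, the realm POLARIS in the usage note is only an example value, and the strict three-field split assumes none of the fields contains a comma.

```python
def parse_bootstrap_credentials(value):
    """Split a 'realm,username,password' string into named parts.

    Assumption: fields never contain commas, so a plain split is enough.
    """
    parts = value.strip().split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated fields: realm,username,password")
    realm, username, password = parts
    # username becomes the OAuth2 client_id, password the client_secret
    return {"realm": realm, "client_id": username, "client_secret": password}
```

For example, parse_bootstrap_credentials("POLARIS,root,secret") yields client_id root and client_secret secret, matching the values used in the curl command above.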
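What scope=PRINCIPAL_ROLE:ALL means
Polaris token scopes take the form PRINCIPAL_ROLE:<role_name>, which requests authorization for a specific principal role.
PRINCIPAL_ROLE:ALL requests every principal role assigned to the authenticating principal; for the root principal that amounts to full privileges. An ordinary service account should request only the roles it has been granted.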
Management API¶
Create a Catalog¶
A catalog is the top-level unit in Polaris. It maps to a storage location on MinIO.
curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "iceberg",
    "type": "INTERNAL",
    "properties": {
      "default-base-location": "s3://retail-lakehouse-7dj2/warehouse"
    },
    "storageConfigInfo": {
      "storageType": "S3",
      "allowedLocations": ["s3://retail-lakehouse-7dj2/"],
      "pathStyleAccess": true
    }
  }'
Result
{
  "type": "INTERNAL",
  "name": "iceberg",
  "properties": {
    "default-base-location": "s3://retail-lakehouse-7dj2/warehouse"
  },
  "createTimestamp": 1774886190664,
  "storageConfigInfo": {
    "pathStyleAccess": true,
    "storageType": "S3",
    "allowedLocations": [
      "s3://retail-lakehouse-7dj2/warehouse",
      "s3://retail-lakehouse-7dj2/"
    ]
  }
}
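storageConfigInfo fields
| Field | Description |
|---|---|
| storageType | Required. Supported values: S3, GCS, AZURE, FILE. Use S3 for MinIO. |
| allowedLocations | List of storage paths Polaris is allowed to access. A table location must fall within one of them, otherwise Polaris rejects the operation. |
| default-base-location | Set under properties in the request above, not inside storageConfigInfo. When a namespace or table is created without an explicit location, Polaris defaults to this path plus the resource name. |
| pathStyleAccess | Boolean. Set to true to enable path-style URLs (required for MinIO). Note: this is a top-level boolean field, not an s3.pathStyleAccess string. |
| roleArn | AWS IAM Role ARN used for STS AssumeRole. MinIO does not use IAM, so omit this field; do not set it to an empty string (that causes an error). |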
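The allowedLocations rule can be pictured as a prefix check. The sketch below is only an approximation written for illustration; it is not Polaris's actual validation code, and the function name is hypothetical.

```python
def is_location_allowed(location, allowed_locations):
    """Return True if location equals or sits under one of the allowed prefixes.

    Trailing slashes are normalised so that 's3://bucket' does not
    accidentally match 's3://bucket-other/...'.
    """
    candidate = location.rstrip("/") + "/"
    for allowed in allowed_locations:
        prefix = allowed.rstrip("/") + "/"
        if candidate.startswith(prefix):
            return True
    return False
```

Under this approximation, s3://retail-lakehouse-7dj2/warehouse/retail/orders is accepted for the catalog above, while a path in any other bucket is rejected.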
pathStyleAccess field format
pathStyleAccess is a top-level boolean field of S3StorageConfigInfo. The string form "s3.pathStyleAccess": "true" is ignored by Polaris, which then accesses MinIO using virtual-hosted style and produces:
The correct form is "pathStyleAccess": true (a boolean, unquoted, with no s3. prefix).
roleArn: "" causes an error
If roleArn is set to an empty string, Polaris returns:
Fix: omit the roleArn field.
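Both pitfalls can be avoided by building the request body programmatically. The following is a minimal sketch with a hypothetical helper name; it produces the same JSON shape as the curl example above, insists on a real boolean for pathStyleAccess, and leaves roleArn out entirely when no value is supplied.

```python
import json


def build_catalog_payload(name, base_location, allowed_locations,
                          path_style_access=True, role_arn=None):
    """Build the create-catalog body for an S3-compatible store such as MinIO."""
    if not isinstance(path_style_access, bool):
        # Guards against passing the string "true" by mistake.
        raise TypeError("path_style_access must be a bool, not a string")
    storage = {
        "storageType": "S3",
        "allowedLocations": list(allowed_locations),
        "pathStyleAccess": path_style_access,
    }
    if role_arn:  # None and "" are both skipped, so roleArn is never sent empty
        storage["roleArn"] = role_arn
    return {
        "name": name,
        "type": "INTERNAL",
        "properties": {"default-base-location": base_location},
        "storageConfigInfo": storage,
    }
```

The result of json.dumps(build_catalog_payload(...)) can be passed as the -d body of the create-catalog request.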
List Catalogs¶
curl -s http://localhost:8181/api/management/v1/catalogs \
-H "Authorization: Bearer $TOKEN" | python3 -m json.tool
Delete a Catalog¶
curl -s -X DELETE http://localhost:8181/api/management/v1/catalogs/iceberg \
-H "Authorization: Bearer $TOKEN"
Iceberg REST API¶
The Iceberg REST API is what Trino and the Kafka Iceberg connector use. Polaris follows the Iceberg REST Catalog spec.
Discover the Catalog Prefix¶
Before calling any Iceberg REST endpoints, fetch the config to find the catalog prefix. Polaris uses the catalog name as the prefix, embedded in every subsequent URL:
curl -s "http://localhost:8181/api/catalog/v1/config?warehouse=iceberg" \
-H "Authorization: Bearer $TOKEN" | python3 -m json.tool
Result
{
  "defaults": {
    "default-base-location": "s3://retail-lakehouse-7dj2/warehouse"
  },
  "overrides": {
    "prefix": "iceberg"
  },
  "endpoints": [
    "GET /v1/{prefix}/namespaces",
    "POST /v1/{prefix}/namespaces",
    "GET /v1/{prefix}/namespaces/{namespace}/tables",
    "POST /v1/{prefix}/namespaces/{namespace}/tables",
    "..."
  ]
}
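What the prefix does
The value of overrides.prefix (iceberg here) is embedded in the URL path of every subsequent Iceberg REST API call, in the form:
/api/catalog/v1/{prefix}/...
This design lets a single Polaris instance serve multiple catalogs at once, with each catalog isolated behind its own prefix.
When Trino configures an Iceberg catalog, it first calls this config endpoint to obtain the prefix and then uses it to build the subsequent API paths.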
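A client-side version of this two-step lookup might look like the sketch below: read overrides.prefix from the config response, then use it to assemble later URLs. Both function names are hypothetical.

```python
import json
import urllib.parse


def catalog_prefix(config_body):
    """Read overrides.prefix from the /api/catalog/v1/config response."""
    return json.loads(config_body)["overrides"]["prefix"]


def catalog_url(base_url, prefix, *segments):
    """Join path segments under /api/catalog/v1/{prefix}/, URL-escaping each one."""
    path = "/".join(urllib.parse.quote(segment, safe="") for segment in segments)
    return f"{base_url}/api/catalog/v1/{prefix}/{path}"
```

With the config response shown above, catalog_url("http://localhost:8181", "iceberg", "namespaces") gives the URL used by the List Namespaces call.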
List Namespaces¶
curl -s "http://localhost:8181/api/catalog/v1/iceberg/namespaces" \
-H "Authorization: Bearer $TOKEN"
Create a Namespace¶
curl -s -X POST "http://localhost:8181/api/catalog/v1/iceberg/namespaces" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"namespace": ["retail"], "properties": {}}'
Result
{
  "namespace": ["retail"],
  "properties": {
    "location": "s3://retail-lakehouse-7dj2/warehouse/retail/"
  }
}
Polaris automatically derives the namespace's storage path from the catalog's default-base-location.
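The derivation seen in this response can be reproduced with a one-line join. This sketch only mirrors the observed result (base location plus namespace name plus a trailing slash); treating multi-level namespaces as nested directories is an assumption, and the function name is hypothetical.

```python
def default_namespace_location(base_location, namespace_levels):
    """Append each namespace level to the base location, ending with a slash."""
    parts = [base_location.rstrip("/")] + list(namespace_levels)
    return "/".join(parts) + "/"
```

Called with "s3://retail-lakehouse-7dj2/warehouse" and ["retail"], it returns the location shown in the result above.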
Create a Table¶
Creating a table is the first operation that triggers a real S3 write — Polaris writes the initial metadata file to MinIO.
curl -s -X POST "http://localhost:8181/api/catalog/v1/iceberg/namespaces/retail/tables" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "orders",
    "schema": {
      "type": "struct",
      "schema-id": 0,
      "fields": [
        {"id": 1, "name": "id", "required": true, "type": "long"},
        {"id": 2, "name": "amount", "required": false, "type": "double"}
      ]
    },
    "write-order": {"order-id": 0, "fields": []},
    "stage-create": false
  }'
What happens when a table is created
After POST /namespaces/{ns}/tables is called, Polaris does the following:
1. Records the table metadata (schema, partition spec, and so on) in the in-memory store
2. Determines the table's storage location (for example, s3://retail-lakehouse-7dj2/warehouse/retail/orders/)
3. Calls the AWS SDK to write the initial metadata.json file to the corresponding path on MinIO
4. Returns the table location and vended credentials to the caller
Step 3 is the first time Polaris actually connects to MinIO. If AWS_REGION and AWS_ENDPOINT_URL_S3 are not set correctly, this step fails:
{
  "error": {
    "message": "Failed to get subscoped credentials: Unable to load region from any of the providers",
    "code": 422
  }
}
This error means the AWS_REGION environment variable is missing from the Polaris pod.
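When scripting table creation, it can help to turn this error body into an actionable hint. The sketch below is illustrative only: the function name is hypothetical, and the single rule it knows about is the missing-region message shown above.

```python
import json


def describe_polaris_error(response_body):
    """Extract code and message from an error response and attach a hint if known."""
    error = json.loads(response_body).get("error", {})
    message = error.get("message", "")
    hint = None
    if "Unable to load region" in message:
        # Matches the failure described above: no AWS_REGION in the pod environment.
        hint = "set AWS_REGION in the Polaris pod environment"
    return {"code": error.get("code"), "message": message, "hint": hint}
```

Unrecognised errors come back with hint set to None, so the caller can still log the raw code and message.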
List Tables¶
curl -s "http://localhost:8181/api/catalog/v1/iceberg/namespaces/retail/tables" \
-H "Authorization: Bearer $TOKEN" | python3 -m json.tool
Get Table Metadata¶
curl -s "http://localhost:8181/api/catalog/v1/iceberg/namespaces/retail/tables/orders" \
-H "Authorization: Bearer $TOKEN" | python3 -m json.tool
The response includes the full table metadata: schema, partition spec, sort order, snapshot history, and the metadata-location pointing to the metadata file on MinIO.
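To pull the most useful parts out of that response, a short summary helper is enough. This sketch assumes the standard Iceberg REST load-table shape, with a top-level metadata-location and a metadata object holding schemas, current-schema-id, and snapshots; the function name is hypothetical.

```python
import json


def summarize_table(load_table_body):
    """Return the metadata location, current columns, and snapshot count."""
    result = json.loads(load_table_body)
    metadata = result["metadata"]
    current_id = metadata.get("current-schema-id", 0)
    # Pick the schema whose schema-id matches the table's current schema.
    schema = next(s for s in metadata["schemas"] if s.get("schema-id", 0) == current_id)
    return {
        "metadata_location": result.get("metadata-location"),
        "columns": [(field["name"], field["type"]) for field in schema["fields"]],
        "snapshot_count": len(metadata.get("snapshots", [])),
    }
```

For the orders table created earlier, the columns entry would list id as long and amount as double; a freshly created table has no snapshots yet.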
MinIO Connectivity: When Does Polaris Actually Connect?¶
Polaris does not maintain a persistent connection to MinIO. It connects to MinIO only at specific moments:
| Operation | Does Polaris touch MinIO? |
|---|---|
| Start up | No |
| Create catalog | No (only validates config) |
| Create namespace | No |
| Create table | Yes — writes initial metadata.json |
| Update table (commit) | Yes — writes new metadata snapshot |
| Read table metadata | No (reads from in-memory store) |
| Client requests vended credentials | Yes — calls STS to generate temporary credentials |
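What vended credentials are
Vended credentials are temporary access keys that Polaris obtains from STS on behalf of the client (Trino), typically valid for 1 hour.
Flow:
1. Trino wants to read the retail.orders table
2. Trino requests the table metadata from Polaris's Iceberg REST API
3. Polaris's response includes a credentials field:
4. Trino uses these temporary keys to connect directly to MinIO and read the Parquet files
5. Polaris takes no part in the actual data transfer
The benefit of this design is that the client never holds the MinIO root password. It only receives short-lived temporary keys and must obtain them from Polaris again for each operation.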
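On the client side, the flow comes down to asking for delegated access and reading the temporary keys from the load-table response. The sketch below assumes the keys arrive as Iceberg S3 FileIO properties (s3.access-key-id, s3.secret-access-key, s3.session-token) in the response's config map, requested via the X-Iceberg-Access-Delegation header from the Iceberg REST spec. Where exactly the credentials appear may differ between Polaris versions, so treat the field location as an assumption; the function name is hypothetical.

```python
import json

# Header a client sends with the load-table request to ask for vended credentials.
DELEGATION_HEADER = {"X-Iceberg-Access-Delegation": "vended-credentials"}

# Assumed location of the temporary keys: S3 FileIO properties in "config".
S3_CREDENTIAL_KEYS = ("s3.access-key-id", "s3.secret-access-key", "s3.session-token")


def extract_vended_s3_credentials(load_table_body):
    """Return the temporary S3 keys from a load-table response, or None if incomplete."""
    config = json.loads(load_table_body).get("config", {})
    credentials = {key: config[key] for key in S3_CREDENTIAL_KEYS if key in config}
    if len(credentials) != len(S3_CREDENTIAL_KEYS):
        return None
    return credentials
```

A None result means the response carried no complete key set, in which case the client has to fall back to its own storage credentials or fail.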
Polaris URL Summary¶
Port 8181 (main):
  POST /api/catalog/v1/oauth/tokens            → Get auth token
  GET  /api/catalog/v1/config?warehouse=X      → Discover catalog prefix

  Management API (/api/management/v1/):
    POST   /catalogs                           → Create catalog
    GET    /catalogs                           → List catalogs
    DELETE /catalogs/{name}                    → Delete catalog

  Iceberg REST API (/api/catalog/v1/{prefix}/):
    GET    /namespaces                         → List namespaces
    POST   /namespaces                         → Create namespace
    DELETE /namespaces/{ns}                    → Delete namespace
    GET    /namespaces/{ns}/tables             → List tables
    POST   /namespaces/{ns}/tables             → Create table
    GET    /namespaces/{ns}/tables/{table}     → Get table metadata
    DELETE /namespaces/{ns}/tables/{table}     → Drop table

Port 8182 (management, headless service):
  GET /q/health                                → Health check
  GET /q/metrics                               → Prometheus metrics
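For the health endpoint on port 8182, a small check might look like the sketch below. It assumes the response follows the usual Quarkus (SmallRye Health) JSON shape, a top-level status plus a checks array; the function name is hypothetical.

```python
import json


def check_health(health_body):
    """Return (healthy, failing_check_names) for a /q/health response body."""
    report = json.loads(health_body)
    failing = [
        check.get("name")
        for check in report.get("checks", [])
        if check.get("status") != "UP"
    ]
    healthy = report.get("status") == "UP" and not failing
    return healthy, failing
```

The list of failing check names makes it easier to see which dependency is down when the overall status is not UP.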