Amazon S3 BigLake External Table¶
Create Amazon S3 Bucket and Upload Data¶
Create an Amazon S3 bucket in the us-west-2 region to store the data that you want to query with BigQuery.
resource "random_string" "velano_collectives_suffix" {
length = 4
special = false
numeric = true
lower = true
upper = false
}
module "aws_s3_bucket_velano_collectives" {
source = "terraform-aws-modules/s3-bucket/aws"
version = "~> 4.10.1"
bucket = "velano-collectives-${random_string.velano_collectives_suffix.result}"
}
resource "aws_s3_object" "products" {
bucket = module.aws_s3_bucket_velano_collectives.s3_bucket_id
key = "products.csv"
source = "../data/products.csv"
}
products.csv is a sample data file that you can upload to the S3 bucket.
Connect to Amazon S3¶
To connect BigQuery to Amazon S3, you need a BigQuery connection that allows BigQuery to access data stored in S3. This involves creating an AWS IAM policy and role, creating the BigQuery connection, and configuring a trust relationship on the role.
Here are the steps to follow:
- Create an AWS IAM policy and role for BigQuery
- Create a BigQuery connection
- Add a trust relationship to the AWS role, either by adding a trust policy to the role or by configuring a custom AWS identity provider
See Connect to Amazon S3 for more details.
After completing these steps, the AWS IAM policy and role would look like this:
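A minimal Terraform sketch of that end state is shown below. The resource names, the BIGQUERY_CONNECTION_IDENTITY placeholder, and the 12-hour session duration are illustrative assumptions; the actual identity value comes from the BigQuery connection created in the next step.

# Assumption: policy/role names and the connection-identity placeholder are illustrative
resource "aws_iam_policy" "bigquery_omni" {
  name = "velano-collectives-bigquery-omni"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "BucketLevelAccess"
        Effect   = "Allow"
        Action   = ["s3:ListBucket"]
        Resource = [module.aws_s3_bucket_velano_collectives.s3_bucket_arn]
      },
      {
        Sid      = "ObjectLevelReadAccess"
        Effect   = "Allow"
        Action   = ["s3:GetObject"]
        Resource = ["${module.aws_s3_bucket_velano_collectives.s3_bucket_arn}/*"]
      }
    ]
  })
}

# Role that BigQuery assumes via web identity federation; the condition value must be
# the identity of the BigQuery connection, which is only known after the connection exists
resource "aws_iam_role" "bigquery_omni" {
  name                 = "velano-collectives-bigquery-omni"
  max_session_duration = 43200 # 12 hours

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Federated = "accounts.google.com" }
        Action    = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "accounts.google.com:sub" = "BIGQUERY_CONNECTION_IDENTITY"
          }
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "bigquery_omni" {
  role       = aws_iam_role.bigquery_omni.name
  policy_arn = aws_iam_policy.bigquery_omni.arn
}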
And the BigQuery connection would look like this:
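Again as a sketch rather than a definitive definition, the connection can be expressed with the google_bigquery_connection resource. The connection_id of aws and location aws-us-west-2 match the connection reference used later; the role ARN reference is an assumption tied to the sketch above.

resource "google_bigquery_connection" "aws" {
  connection_id = "aws"
  location      = "aws-us-west-2"
  friendly_name = "aws"

  aws {
    access_role {
      # ARN of the role sketched above
      iam_role_id = aws_iam_role.bigquery_omni.arn
    }
  }
}

The identity that the connection exposes on its access_role block is the value that should replace the BIGQUERY_CONNECTION_IDENTITY placeholder in the role's trust policy.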
Create Amazon S3 BigLake Tables¶
Create a Dataset¶
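The dataset has to be created in the same location as the connection, aws-us-west-2 here. A minimal SQL sketch, assuming the products_aws dataset name used in the next step:

create schema products_aws
options (
  location = "aws-us-west-2"
)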
Create a Table¶
create external table products_aws.products
with connection `aws-us-west-2.aws`
options (
  format = "csv",
  uris   = ["s3://velano-collectives-n1y3/products.csv"]
)
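In the with connection clause, aws-us-west-2.aws is the connection reference in LOCATION.CONNECTION_ID form, pointing at the aws connection created earlier, and the uris value uses the actual bucket name produced by the random suffix.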
Query Amazon S3 BigLake Tables¶
To query the Amazon S3 BigLake table, you need the following roles assigned to your user or service account:
- BigQuery Connection User (roles/bigquery.connectionUser)
- BigQuery Data Viewer (roles/bigquery.dataViewer)
- BigQuery User (roles/bigquery.user)
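As a sketch, assuming the grants are managed in Terraform and using a placeholder project ID and principal:

# Assumption: replace the project ID and member with your own values
locals {
  bigquery_project = "my-gcp-project"
  bigquery_reader  = "user:analyst@example.com"
}

resource "google_project_iam_member" "bigquery_connection_user" {
  project = local.bigquery_project
  role    = "roles/bigquery.connectionUser"
  member  = local.bigquery_reader
}

resource "google_project_iam_member" "bigquery_data_viewer" {
  project = local.bigquery_project
  role    = "roles/bigquery.dataViewer"
  member  = local.bigquery_reader
}

resource "google_project_iam_member" "bigquery_user" {
  project = local.bigquery_project
  role    = "roles/bigquery.user"
  member  = local.bigquery_reader
}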
After you have the necessary permissions, you can query the table to see the data:
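For example, a quick sanity check (the column list depends on what products.csv contains):

select *
from products_aws.products
limit 10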