Faker Connector¶
This guide covers how to use the faker connector in Trino to generate realistic fake data for testing and development purposes.
Overview¶
The faker connector uses the Datafaker library to generate realistic fake data. It's perfect for:
- Testing queries with realistic data
- Data pipeline development
- Performance testing
- Demo environments
Configuration¶
The faker connector is configured with the following settings:
trino/values-template.yaml
catalogs:
faker: |
connector.name=faker
faker.null-probability=0.1
faker.default-limit=1000
faker.locale=en
tpch: |
connector.name=tpch
tpch.splits-per-node=4
tpcds: |
connector.name=tpcds
tpcds.splits-per-node=4
Here's a breakdown of the configuration:
connector.name=faker
: Specifies the faker connectorfaker.null-probability=0.1
: 10% chance of null values in generated datafaker.default-limit=1000
: Default row limit for queriesfaker.locale=en
: English locale for generated data patterns
Creating Tables¶
The faker connector requires you to create tables with specific generator expressions:
1. Prices Table¶
test-faker.sql - Prices Table
-- Create a prices table with faker connector
CREATE TABLE faker.default.prices (
currency VARCHAR NOT NULL WITH (generator = '#{Currency.code}'),
price DECIMAL(8,2) NOT NULL WITH (min = '0')
);
2. Customer Table¶
test-faker.sql - Customer Table
-- Create a comprehensive customer table
CREATE TABLE faker.default.customer (
id UUID NOT NULL,
name VARCHAR NOT NULL WITH (generator = '#{Name.first_name} #{Name.last_name}'),
email VARCHAR NOT NULL WITH (generator = '#{Internet.emailAddress}'),
phone VARCHAR NOT NULL WITH (generator = '#{PhoneNumber.phoneNumber}'),
address VARCHAR NOT NULL WITH (generator = '#{Address.fullAddress}'),
born_at DATE WITH (min = '1950-01-01', max = '2005-01-01'),
age_years INTEGER WITH (min = '18', max = '75'),
group_id INTEGER WITH (allowed_values = ARRAY['10', '32', '81', '99'])
);
Example Queries¶
Basic Data Generation¶
test-faker.sql - Basic Queries
-- Query the prices table
SELECT * FROM faker.default.prices LIMIT 10;
-- Query the customer table
SELECT * FROM faker.default.customer LIMIT 10;
-- Test the random_string function
SELECT faker.default.random_string('#{Name.first_name}') as first_name;
Advanced Queries¶
test-faker.sql - Advanced Queries
-- Join customer and prices tables
SELECT
c.name as customer_name,
c.email,
p.currency,
p.price as order_amount
FROM faker.default.customer c
CROSS JOIN faker.default.prices p
LIMIT 10;
-- Create JSON objects from prices
SELECT JSON_OBJECT(KEY currency VALUE price) AS complex_price
FROM faker.default.prices
LIMIT 5;
-- Sample query to test various generators
SELECT
faker.default.random_string('#{Name.fullName}') as full_name,
faker.default.random_string('#{Internet.emailAddress}') as email,
faker.default.random_string('#{Address.city}') as city,
faker.default.random_string('#{Company.name}') as company,
faker.default.random_string('#{Lorem.sentence}') as description;
Testing Commands¶
Run individual queries from the command line:
kubectl exec -it deployment/trino-coordinator --namespace trino -- trino --execute "SHOW TABLES FROM faker.default;"
kubectl exec -it deployment/trino-coordinator --namespace trino -- trino --execute "SELECT * FROM faker.default.customer LIMIT 5;"
Available Faker Tables¶
The faker connector has been tested and verified with these tables:
-
prices
- Currency codes and decimal prices (created and tested) -
customer
- Customer profiles with realistic data (created and tested)
Faker Functions¶
-
random_string()
- Generate custom fake data using Datafaker expressions
Available Generators¶
The faker connector supports numerous generators from the Datafaker library:
Personal Information¶
#{Name.firstName}
,#{Name.lastName}
,#{Name.fullName}
#{Internet.emailAddress}
,#{PhoneNumber.phoneNumber}
#{Address.fullAddress}
,#{Address.city}
,#{Address.country}
Business Data¶
#{Currency.code}
,#{Company.name}
#{Commerce.productName}
,#{Commerce.price}
Text Content¶
#{Lorem.sentence}
,#{Lorem.paragraph}
#{Lorem.words}
,#{Lorem.characters}
Dates and Numbers¶
#{Date.past}
,#{Date.future}
#{Number.randomDouble}
,#{Number.randomLong}
Financial Data¶
#{Finance.creditCard}
,#{Finance.iban}
#{Finance.bic}
,#{Finance.stockTicker}
For a complete list of available generators, see the Datafaker Documentation.
Column Constraints¶
You can apply various constraints to faker columns:
Value Ranges¶
-- Numeric ranges
age INTEGER WITH (min = '18', max = '75')
price DECIMAL(8,2) WITH (min = '0', max = '1000')
-- Date ranges
birth_date DATE WITH (min = '1950-01-01', max = '2005-01-01')
Allowed Values¶
-- Specific allowed values
status VARCHAR WITH (allowed_values = ARRAY['active', 'inactive', 'pending'])
priority INTEGER WITH (allowed_values = ARRAY['1', '2', '3', '4', '5'])
Null Probability¶
-- Override default null probability for specific columns
optional_field VARCHAR WITH (generator = '#{Lorem.word}', null_probability = '0.3')
Best Practices¶
- Use Appropriate Data Types: Match your production schema data types
- Set Realistic Constraints: Use min/max values that make sense for your domain
- Consider Cardinality: Use allowed_values for categorical data
- Test Join Performance: Cross joins can generate large result sets quickly
- Limit Result Sets: Always use LIMIT in development to avoid overwhelming queries
Common Use Cases¶
E-commerce Data¶
CREATE TABLE faker.default.products (
id UUID NOT NULL,
name VARCHAR NOT NULL WITH (generator = '#{Commerce.productName}'),
category VARCHAR WITH (allowed_values = ARRAY['electronics', 'clothing', 'books', 'home']),
price DECIMAL(10,2) WITH (min = '1.00', max = '999.99'),
description VARCHAR WITH (generator = '#{Lorem.sentence}'),
in_stock BOOLEAN NOT NULL
);
User Analytics¶
CREATE TABLE faker.default.user_events (
user_id UUID NOT NULL,
event_type VARCHAR WITH (allowed_values = ARRAY['login', 'logout', 'purchase', 'view']),
timestamp TIMESTAMP NOT NULL,
session_id VARCHAR WITH (generator = '#{Internet.uuid}'),
ip_address VARCHAR WITH (generator = '#{Internet.ipV4Address}')
);
Financial Transactions¶
CREATE TABLE faker.default.transactions (
transaction_id UUID NOT NULL,
account_number VARCHAR WITH (generator = '#{Finance.iban}'),
amount DECIMAL(12,2) WITH (min = '-10000', max = '10000'),
currency VARCHAR WITH (generator = '#{Currency.code}'),
transaction_date DATE WITH (min = '2020-01-01', max = '2024-12-31'),
merchant VARCHAR WITH (generator = '#{Company.name}')
);