Modeling Data¶

Installation¶

python3.10 -m venv dbt

source ~/.venvs/dbt/bin/activate

requirements.txt

dbt-core==1.9.4
dbt-bigquery==1.9.1
sqlfluff==3.4.0

pip install -r requirements.txt

dbt --version
Core:
  - installed: 1.9.4
  - latest:    1.9.4 - Up to date!

Plugins:
  - bigquery: 1.9.1 - Up to date!

Sources¶

_sources.yml

version: 2

sources:
  - name: feast
    description: >
      This source contains the raw data for the fraud detection project. It includes 
      transaction records, user account features, and labels indicating whether a 
      transaction was fraudulent.
    project: feast-oss
    dataset: fraud_tutorial
    meta:
      domain: fraud
      owner: "@kclai"
    tags:
      - fraud
      - pii
    tables:
      - name: transactions
      - name: user_account_features
      - name: user_has_fraudulent_transactions

Staging¶

naming convention: stg__<src>__<tbl>

stgfeasttransactions¶

stg__feast__transactions.sql

select
  src_account as src_account_id,
  dest_account as dest_account_id,
  amount,
  case
    when is_fraud = 1 then true
    when is_fraud = 0 then false
    else null
  end as is_fraud,
  timestamp as created_at
from {{ source('feast', 'transactions') }}

stg__feast__transactions.yml

version: 2

models:
  - name: stg__feast__transactions
    description: >
      This staging model standardizes raw transaction data for downstream fraud analysis. 
      It includes source and destination accounts, transaction amount, timestamp, and 
      a binary fraud label. Used as a foundational layer for detecting fraudulent behavior.
    config:
      event_time: created_at
      contract:
        enforced: true
    columns:
      - name: src_account_id
        data_type: string
        description: >
          The unique identifier of the source account that initiated the transaction.

      - name: dest_account_id
        data_type: string
        description: >
          The unique identifier of the destination account that received the transaction.

      - name: amount
        data_type: float
        description: >
          The monetary amount of the transaction. Assumes a consistent currency.

      - name: is_fraud
        data_type: boolean
        description: >
          A boolean flag indicating whether the transaction was identified as fraudulent.
          `true` indicates a confirmed fraudulent transaction.

      - name: created_at
        data_type: timestamp
        description: >
          The timestamp when the transaction was created. Used for time-based analysis and 
          feature extraction (e.g., fraud trends over time).

stgfeastacct_fraud_7d¶

stg__feast__acct_fraud_7d.sql

select
  user_id as account_id,
  case
    when user_has_fraudulent_transactions_7d = 1 then true
    when user_has_fraudulent_transactions_7d = 0 then false
    else null
  end as has_fraud_7d,
  feature_timestamp
from {{ source('feast', 'user_has_fraudulent_transactions') }}

stg__feast__acct_fraud_7d.yml

version: 2

models:
  - name: stg__feast__acct_fraud_7d
    description: >
      This staging model identifies whether a user has been involved in any fraudulent transactions
      within a specified time window (7 days). It is typically used as a feature in fraud detection
      pipelines or real-time inference systems.
    config:
      contract:
        enforced: true
    columns:
      - name: account_id
        data_type: string
        description: >
          The unique identifier of the user. This is used as the primary key to track fraud status
          per user.

      - name: has_fraud_7d
        data_type: boolean
        description: >
          A binary flag (true or false) indicating whether the user has had any fraudulent transactions 
          in the past 7 days. Can be used as a feature in ML models.

      - name: feature_timestamp
        data_type: timestamp
        description: >
          The timestamp representing the point-in-time when the feature was calculated. Useful for
          point-in-time correctness in feature stores or backtesting.

stgfeastacct_profiles¶

stg__feast__acct_profiles.sql

select
  user_id as account_id,
  credit_score,
  account_age_days,
  case
    when user_has_2fa_installed = 1 then true
    when user_has_2fa_installed = 0 then false
    else null
  end as has_2fa_installed,
  feature_timestamp
from {{ source('feast', 'user_account_features') }}

stg__feast__acct_profiles.yml

version: 2

models:
  - name: stg__feast__acct_profiles
    description: ""
    config:
      contract:
        enforced: true
    columns:
      - name: account_id
        data_type: string
        description: ""

      - name: credit_score
        data_type: int
        description: ""

      - name: account_age_days
        data_type: int
        description: ""

      - name: has_2fa_installed
        data_type: boolean
        description: ""

      - name: feature_timestamp
        data_type: timestamp
        description: ""

Intermediate¶

intfeastacct_num_txns¶

int__feast__acct_num_txns.sql

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='transaction_date',
    begin='2025-04-20',
    batch_size='day',
    partition_by={
      "field": "transaction_date",
      "data_type": "date",
      "granularity": "day"
    },
    on_schema_change='append_new_columns'
) }}

with transactions as (

    -- this ref will be auto-filtered
    select
      {{ dbt_utils.star(
        ref('fraud', 'stg__feast__transactions')
      ) }}
    from {{ ref('fraud', 'stg__feast__transactions') }}

)

select
  src_account_id as account_id,
  date(created_at) as transaction_date,
  count(*) as num_transactions,
  sum(amount) as total_amount,
  sum(case when amount > 0 then amount else 0 end) as total_deposits,
from transactions
group by account_id, date(created_at)

intfeastacct_num_txns__extented¶

int__feast__acct_num_txns__extented.sql

select
  *,
  sum(num_transactions) over (
    partition by account_id
    order by unix_date(transaction_date)
    range between 6 preceding
    and current row
  ) as num_transactions_7d
from {{ ref('fraud', 'int__feast__acct_num_txns') }}

Marts¶

feat_acct_num_txns_7d¶

feat_acct_num_txns_7d.sql

select
  account_id as entity_id,
  num_transactions_7d,
  cast(transaction_date as timestamp) as feature_timestamp
from {{ ref("fraud", "int__feast__acct_num_txns__extented") }}

feat_acct_fraud_7d¶

feat_acct_fraud_7d.sql

select
  account_id as entity_id,
  has_fraud_7d,
  feature_timestamp
from {{ ref('fraud', 'stg__feast__acct_fraud_7d') }}

feat_acct_profiles¶

feat_acct_profiles.sql

select
  account_id as entity_id,
  credit_score,
  account_age_days,
  has_2fa_installed,
  feature_timestamp
from {{ ref('fraud', 'stg__feast__acct_profiles') }}

Exposures¶

exposures.yml

version: 2

exposures:
  - name: account_features
    description:
      This exposure contains features related to account activity and fraud detection.
    type: ml
    maturity: medium
    owner:
      name: KC Lai
      email: kuanchoulai10@gmail.com
    label: Account-related features
    url: https://example.com
    depends_on:
      - ref('feat_acct_fraud_7d')
      - ref('feat_acct_num_txns_7d')
      - ref('feat_acct_profiles')