Writing dbt models with proper CTE patterns, SQL structure, and layer-specific templates...
Transform AI agents into experts on writing production-quality dbt models, providing guidance on CTE patterns, SQL structure, and best practices for creating maintainable and performant data models across all medallion architecture layers.
Activate this skill when users ask about:
All dbt models should follow this consistent CTE pattern for readability and maintainability:
-- Import CTEs (staging and intermediate models)
with customers as (
select * from {{ ref('stg_customers') }}
),
orders as (
select * from {{ ref('stg_orders') }}
),
-- Logical CTEs (business logic)
customer_metrics as (
select
customer_id,
count(*) as order_count,
sum(order_amount) as lifetime_value
from orders
group by customer_id
),
-- Final CTE (column selection and standardization)
final as (
select
-- Primary key
customers.customer_id,
-- Attributes
customers.customer_name,
customers.customer_email,
-- Metrics
coalesce(customer_metrics.order_count, 0) as lifetime_orders,
coalesce(customer_metrics.lifetime_value, 0) as lifetime_value,
-- Metadata
current_timestamp() as dbt_updated_at
from customers
left join customer_metrics
on customers.customer_id = customer_metrics.customer_id
)
select * from final
Official dbt Documentation: How we structure our dbt projects
Purpose: One-to-one with source tables. Clean and standardize only.
Materialization: ephemeral (set at folder level in dbt_project.yml)
-- models/bronze/stg_salesforce__accounts.sql
select
-- Primary key
id as account_id,
-- Attributes
name as account_name,
type as account_type,
industry,
-- Standardized fields
upper(trim(email)) as email_clean,
cast(annual_revenue as number) as annual_revenue,
-- Metadata
current_timestamp() as dbt_loaded_at
from {{ source('salesforce', 'accounts') }}
-- models/bronze/stg_ecommerce__customers.sql
select
-- Primary key
customer_id,
-- Attributes (cleaned)
trim(first_name) as first_name,
trim(last_name) as last_name,
lower(trim(email)) as email,
-- Phone standardization
regexp_replace(phone, '[^0-9]', '') as phone_clean,
-- Boolean conversions
case
when status = 'active' then true
else false
end as is_active,
-- Date standardization
cast(created_at as timestamp) as created_at,
-- Metadata
current_timestamp() as dbt_loaded_at
from {{ source('ecommerce', 'customers') }}
✅ DO:
{{ source() }} for source references❌ DON'T:
Purpose: Reusable business logic, enrichment, and complex transformations.
Materialization: ephemeral or table (set at folder level, override if needed)
-- models/silver/int_customers__with_orders.sql
with customers as (
select * from {{ ref('stg_salesforce__customers') }}
),
orders as (
select * from {{ ref('stg_ecommerce__orders') }}
),
customer_order_metrics as (
select
customer_id,
count(distinct order_id) as total_orders,
sum(order_amount) as lifetime_value,
min(order_date) as first_order_date,
max(order_date) as last_order_date
from orders
group by customer_id
),
final as (
select
c.customer_id,
c.customer_name,
c.customer_email,
coalesce(m.total_orders, 0) as total_orders,
coalesce(m.lifetime_value, 0) as lifetime_value,
m.first_order_date,
m.last_order_date,
datediff(day, m.first_order_date, m.last_order_date) as customer_tenure_days
from customers c
left join customer_order_metrics m
on c.customer_id = m.customer_id
)
select * from final
-- models/silver/int_customers__segmented.sql
with customer_metrics as (
select * from {{ ref('int_customers__with_orders') }}
),
rfm_scores as (
select
customer_id,
-- Recency (days since last order)
datediff(day, last_order_date, current_date()) as recency_days,
-- Frequency
total_orders as frequency,
-- Monetary
lifetime_value as monetary,
-- Quartile scoring
ntile(4) over (order by datediff(day, last_order_date, current_date()) desc) as recency_score,
ntile(4) over (order by total_orders) as frequency_score,
ntile(4) over (order by lifetime_value) as monetary_score
from customer_metrics
where last_order_date is not null
),
final as (
select
customer_id,
recency_days,
frequency,
monetary,
recency_score,
frequency_score,
monetary_score,
-- RFM segment
case
when recency_score >= 3 and frequency_score >= 3 and monetary_score >= 3 then 'Champions'
when recency_score >= 3 and frequency_score >= 2 then 'Loyal Customers'
when recency_score >= 3 and monetary_score >= 3 then 'Big Spenders'
when recency_score <= 2 and frequency_score >= 3 then 'At Risk'
when recency_score <= 1 then 'Lost'
else 'Regular'
end as customer_segment
from rfm_scores
)
select * from final
✅ DO:
{{ ref() }}❌ DON'T:
{{ source() }} directlyPurpose: Business entities ready for BI tools.
Materialization: table (set at folder level)
-- models/gold/dim_customers.sql
with customers as (
select * from {{ ref('int_customers__segmented') }}
)
select
-- Primary key
customer_id,
-- Attributes
customer_name,
customer_email,
-- Metrics
total_orders,
lifetime_value,
first_order_date,
last_order_date,
customer_tenure_days,
-- Segmentation
customer_segment,
-- Business classification
case
when customer_segment = 'Champions' then 'High Value'
when customer_segment in ('Loyal Customers', 'Big Spenders') then 'Medium Value'
else 'Low Value'
end as customer_value_tier,
-- Flags
case when last_order_date >= dateadd(day, -90, current_date()) then true else false end as is_active_90d,
case when total_orders = 1 then true else false end as is_one_time_buyer,
-- Metadata
current_timestamp() as dbt_updated_at
from customers
-- models/gold/dim_products.sql
-- Includes ghost key for unknown/missing products
with products as (
select * from {{ ref('int_products__enriched') }}
),
ghost_key as (
select
-1 as product_id,
'Unknown' as product_name,
'Unknown' as product_category,
0.00 as product_price,
false as is_active,
current_timestamp() as dbt_updated_at
),
final as (
select * from products
union all
select * from ghost_key
)
select * from final
✅ DO:
dbt_constraints.primary_key❌ DON'T:
Purpose: Business processes and transactions.
Materialization: table or incremental (override at model level for incremental)
-- models/gold/fct_orders.sql
with orders as (
select * from {{ ref('stg_ecommerce__orders') }}
),
customers as (
select customer_id from {{ ref('dim_customers') }}
),
products as (
select product_id from {{ ref('dim_products') }}
)
select
-- Primary key
orders.order_id,
-- Foreign keys
coalesce(customers.customer_id, -1) as customer_id, -- Ghost key for unknown
coalesce(products.product_id, -1) as product_id,
-- Attributes
orders.order_date,
orders.order_status,
-- Metrics
orders.order_quantity,
orders.order_amount,
orders.discount_amount,
orders.tax_amount,
orders.total_amount,
-- Metadata
current_timestamp() as dbt_updated_at
from orders
left join customers on orders.customer_id = customers.customer_id
left join products on orders.product_id = products.product_id
-- models/gold/fct_order_lines.sql
{{ config(
materialized='incremental',
unique_key='order_line_id',
incremental_strategy='merge',
merge_exclude_columns=['dbt_inserted_at'],
cluster_by=['order_date']
) }}
with order_lines as (
select * from {{ ref('stg_ecommerce__order_lines') }}
)
select
-- Primary key
order_line_id,
-- Foreign keys
order_id,
product_id,
customer_id,
-- Attributes
order_date,
line_number,
-- Metrics
quantity,
unit_price,
discount_percent,
line_total,
-- Metadata
{% if is_incremental() %}
dbt_inserted_at, -- Preserve from merge_exclude_columns
{% else %}
current_timestamp() as dbt_inserted_at,
{% endif %}
current_timestamp() as dbt_updated_at
from order_lines
{% if is_incremental() %}
where order_date > (select max(order_date) from {{ this }})
{% endif %}
✅ DO:
dbt_constraints.foreign_key❌ DON'T:
Most configuration should be at the folder level:
models:
your_project:
bronze:
+materialized: ephemeral
+tags: ["bronze", "staging"]
silver:
+materialized: ephemeral
+tags: ["silver"]
gold:
+materialized: table
+tags: ["gold", "marts"]
Add {{ config() }} ONLY when overriding folder defaults:
-- Only for incremental-specific settings
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
cluster_by=['order_date']
) }}
-- Use COALESCE for safety
select
customer_id,
coalesce(total_orders, 0) as total_orders,
coalesce(lifetime_value, 0.00) as lifetime_value
from {{ ref('customer_metrics') }}
-- Use QUALIFY in Snowflake for cleaner code
select
customer_id,
order_date,
order_amount,
row_number() over (partition by customer_id order by order_date desc) as order_rank
from {{ ref('stg_orders') }}
qualify order_rank <= 5 -- Top 5 orders per customer
select
customer_id,
count(*) as total_orders,
sum(case when order_status = 'completed' then 1 else 0 end) as completed_orders,
sum(case when order_status = 'cancelled' then 1 else 0 end) as cancelled_orders
from {{ ref('stg_orders') }}
group by customer_id
When users ask for modeling help:
"How do I write this model?"
"How do I join these tables?"
ref() for all dbt model references"Should this be ephemeral or table?"
Goal: Transform AI agents into expert dbt modelers who write clean, maintainable, production-quality SQL that follows industry best practices and is easy for teams to understand and extend.