ALWAYS use this skill when users ask to create, generate, or write UNIT TESTS for BigQuery SQL queries. Invoke proactively whenever the request includes "test" or "tests" with a query/table name...
Composable: Works with bigquery-etl-core (for patterns), metadata-manager (for test updates), and query-writer (for query understanding) When to use: Creating/updating unit test fixtures for BigQuery SQL queries, preventing production queries in tests
NOT for data quality checks: Use bigconfig-generator skill for Bigeye monitoring. The checks.sql file approach is DEPRECATED.
Generate unit test fixtures for BigQuery SQL queries following bigquery-etl conventions. These are development-time tests that validate query logic on small, synthetic data.
Official Documentation: https://mozilla.github.io/bigquery-etl/cookbooks/testing/
CRITICAL DISTINCTION:
This skill creates UNIT TESTS:
For data quality monitoring (NOT this skill):
checks.sql files are DEPRECATED - do NOT create or update themBEFORE creating any test fixtures, you MUST read these files:
CRITICAL FOR SAFETY: Read references/preventing_production_queries.md
UNDERSTAND TEST PATTERNS: Read references/test_strategy_patterns.md
When creating test fixtures, READ and COPY the structure from these template files:
For Input Fixtures:
Glean events? → READ assets/glean_events_fixture.yaml and copy its structure
client_info, events, and extra formatting*_stable.events_v1 or Glean ping tablesLegacy telemetry with arrays? → READ assets/legacy_array_fixture.yaml
Simple table? → READ assets/simple_test_input.yaml
UNION ALL query? → READ both assets/union_all_fixture1.yaml AND assets/union_all_fixture2.yaml
For Query Parameters:
assets/query_params_example.yaml - Must be array format, not key-value pairsFor Expected Output:
assets/simple_test_expect.yaml - Shows array format and NULL handlingassets/timestamp_example.yaml - Shows correct TIMESTAMP format (ISO 8601 with timezone)references/common_test_failures.md - READ THIS when tests fail - Real-world failures and solutions (NULL handling, ordering, timestamps)references/yaml_format_guide.md - YAML syntax, type inference, nested structures (read if you encounter YAML errors)references/external_documentation.md - Links to official docs and related resourcesIMPORTANT: DataHub is for lineage discovery ONLY, NOT for schema lookups
For schema discovery, use this priority order:
schema.yaml files in sql/ directory_live and _stable tablesWhen DataHub IS useful:
See these guides:
../metadata-manager/references/schema_discovery_guide.md - Complete schema discovery workflow../metadata-manager/references/glean_dictionary_patterns.md - Token-efficient Glean Dictionary usage../metadata-manager/scripts/datahub_lineage.py - Efficient lineage queriesTests live in: tests/sql/<project>/<dataset>/<table>/<test_name>/
Required files:
<full_table_reference>.yaml (e.g., moz-fx-data-shared-prod.telemetry.events.yaml)expect.yamlquery_params.yamlThe #1 way tests accidentally query production data:
How to prevent this:
grep -E "FROM|JOIN" query.sql to identify ALL source tablesInput fixtures must match how the table is referenced in the query:
moz-fx-data-shared-prod.dataset.table → file must be moz-fx-data-shared-prod.dataset.table.yamldataset.table → file must be dataset.table.yamltable → file must be table.yamlWrong naming causes tests to query production BigQuery instead of your fixtures.
All fixtures must use array syntax (starts with -). See references/yaml_format_guide.md for details.
# Correct - array syntax with dashes
- field1: value1
field2: value2
loaded: nullfield: nullavg_time_seconds: null[]) - they cause "Schema has no fields" errorsgenerated_time columns from expect.yamlMust be an array (starts with -), not key-value pairs. See assets/query_params_example.yaml for template.
- name: submission_date
type: DATE
value: "2024-12-01"
CRITICAL: BigQuery returns TIMESTAMP and DATETIME fields in ISO 8601 format with timezone.
In expect.yaml, ALWAYS use ISO 8601 format:
# ✅ CORRECT - ISO 8601 with timezone
- campaign_created_at: "2025-06-01T10:00:00+00:00"
campaign_updated_at: "2025-07-01T12:00:00+00:00"
# ❌ WRONG - Missing 'T' and timezone
- campaign_created_at: "2025-06-01 10:00:00"
campaign_updated_at: "2025-07-01 12:00:00"
Format rules:
2025-06-01T10:00:00+00:00 (ISO 8601)2025-06-01 (YYYY-MM-DD)2025-06-01T10:00:00.123456+00:00Input fixtures can use either format, but expect.yaml MUST use ISO 8601.
See references/common_test_failures.md section 3 for details and examples.
See references/test_strategy_patterns.md for comprehensive patterns.
Quick analysis checklist:
Number of tests:
Test naming: Use descriptive snake_case: test_new_clients_only, test_union_all_sources
Before running tests, configure Google Cloud authentication:
export GOOGLE_PROJECT_ID=bigquery-etl-integration-test
gcloud config set project $GOOGLE_PROJECT_ID
gcloud auth application-default login
See https://mozilla.github.io/bigquery-etl/cookbooks/testing/ for more details.
Read the required reference files (from "REQUIRED READING" section above)
references/preventing_production_queries.mdreferences/test_strategy_patterns.mdRead the query.sql file
Gather schema information efficiently (use priority order):
schema.yaml files in sql/ directory for derived tables_live and _stable tables../metadata-manager/references/glean_dictionary_patterns.md for token-efficient extraction../metadata-manager/references/schema_discovery_guide.md for complete workflow../metadata-manager/scripts/datahub_lineage.py for efficient lineage queriesIdentify ALL source tables - use this command:
grep -E "FROM|JOIN" query.sql
Write down EVERY table you find. This is your checklist.
Determine test scenarios needed based on query complexity
Read the appropriate template files for your data sources:
assets/glean_events_fixture.yamlassets/legacy_array_fixture.yamlassets/simple_test_input.yamlassets/query_params_example.yamlassets/union_all_fixture1.yaml and assets/union_all_fixture2.yamlassets/timestamp_example.yaml (VERY IMPORTANT)Create test directory and fixtures:
tests/sql/<project>/<dataset>/<table>/<test_name>/Create expect.yaml and query_params.yaml (if needed)
Run the test:
pytest tests/sql/<project>/<dataset>/<table>/<test_name>/ -v
Common pytest debugging options:
# Stop on first failure (faster debugging)
pytest tests/sql/.../test_name/ -x
# Very verbose output (shows individual test details)
pytest tests/sql/.../test_name/ -vv
# Show full diff on assertion failures
pytest tests/sql/.../test_name/ -vv --tb=short
# Run specific test by pattern
pytest tests/sql/.../test_name/ -k "test_pattern"
# Show local variables in traceback
pytest tests/sql/.../test_name/ -l
# Combined: stop on first failure with very verbose output
pytest tests/sql/.../test_name/ -xvv
Fix common test failures (if test fails):
A. Timestamp format mismatches (VERY COMMON):
2025-06-01T10:00:00+00:00 != 2025-06-01 10:00:002025-06-01T10:00:00+00:00references/common_test_failures.md section 3 for detailsB. Ordering issues (VERY COMMON):
references/common_test_failures.md section 2 for detailed examplesVerify you're NOT querying production:
sql/<project>/<dataset>/<table>/bigconfig.ymlBefore finalizing tests, verify:
grep -E "FROM|JOIN" query.sql to find all source tablesNote: After running tests, if you need to use metadata-manager or query production tables, switch back to the main project:
export GOOGLE_PROJECT_ID=mozdata
gcloud config set project $GOOGLE_PROJECT_ID
For a query at sql/moz-fx-data-shared-prod/dataset/table_v1/query.sql that joins telemetry.clients_daily with the target table:
Test directory: tests/sql/moz-fx-data-shared-prod/dataset/table_v1/test_new_clients/
Input fixture: moz-fx-data-shared-prod.telemetry.clients_daily.yaml
- client_id: "abc123"
submission_date: "2025-01-01"
search_count: 5
Query params: query_params.yaml
- name: submission_date
type: DATE
value: "2025-01-01"
Expected output: expect.yaml
- client_id: "abc123"
total_searches: 5
first_seen_date: "2025-01-01"
For a query that unions three data sources (legacy, glean, and ads):
Test directory: tests/sql/moz-fx-data-shared-prod/dataset/table_v1/test_union_all_sources/
IMPORTANT: Create fixtures for ALL three sources in ONE test directory:
Fixture 1: moz-fx-data-shared-prod.legacy_source.table.yaml
- submission_timestamp: "2024-12-15 10:00:00"
document_id: "doc1"
version: "120.0.0"
event_count: 5
Fixture 2: moz-fx-data-shared-prod.glean_source.table.yaml
- submission_timestamp: "2024-12-15 14:30:00"
document_id: "doc2"
client_info:
app_display_version: "121.0.0"
events:
- category: "interaction"
name: "click"
Fixture 3: moz-fx-data-shared-prod.ads_source.table.yaml
- submission_hour: "2024-12-15 10:00:00"
ad_id: 12345
interaction_type: "impression"
interaction_count: 100
Query params: query_params.yaml
- name: submission_date
type: DATE
value: "2024-12-15"
Expected output: expect.yaml
- submission_date: "2024-12-15"
source: "legacy"
event_count: 5
- submission_date: "2024-12-15"
source: "glean"
event_count: 1
- submission_date: "2024-12-15"
source: "ads"
event_count: 100
See example assets for templates:
assets/glean_events_fixture.yaml - Glean events with extras structureassets/legacy_array_fixture.yaml - Legacy telemetry with array fieldsSee assets/glean_events_fixture.yaml for complete example. Events use nested structure:
events.category, events.namemozfun.map.get_key(events.extra, 'field_name')See assets/legacy_array_fixture.yaml. Arrays get unnested with CROSS JOIN UNNEST(tiles).
Always use three-part versions ("121.0.0") to avoid YAML float parsing. See references/yaml_format_guide.md.
See references/common_test_failures.md for real-world examples and solutions.
See references/yaml_format_guide.md and references/preventing_production_queries.md for detailed guides.
Including field: null in expect.yaml causes test failures because BigQuery omits NULL fields from results.
Solution: Completely omit NULL fields from expect.yaml - don't include them at all.
See references/common_test_failures.md section 1.
When ORDER BY fields have duplicate values, row order becomes non-deterministic and tests fail.
Solution: Either create test data with different ORDER BY values, or run test to see actual order and update expect.yaml.
See references/common_test_failures.md section 2.
Version numbers like "120.0" parsed as FLOAT64 instead of STRING. Solution: Use three-part versions: "120.0.0" or simple integers: "120"
Creating [] causes "Schema has no fields" errors.
Solution: Always include at least one row. To filter out data, use WHERE clause filtering.
If ANY source table lacks a fixture, query hits production BigQuery.
Symptoms: Thousands of rows, real production values, slow execution (>10 seconds)
Solution: Create fixtures for ALL source tables. See references/preventing_production_queries.md
Omit NULL fields from expect.yaml - BigQuery doesn't return them in results.
Full troubleshooting: https://mozilla.github.io/bigquery-etl/cookbooks/testing/
sql-test-generator is invoked by other skills when test fixtures need to be created or updated:
Direct invocation by user:
Invoked by metadata-manager:
After query-writer:
Prevents production queries:
Handles complex queries:
Provides best practices: