
    Few-Shot Learning for Finance

    Purpose

    Guide for implementing few-shot learning techniques in financial trading strategies, enabling models to quickly adapt to new market regimes or trade previously unseen assets with minimal data.

    When to Use

    Activate this skill when:

    • Implementing models that adapt to regime changes quickly
    • Trading new or low-liquidity assets with limited history
    • Building strategies that transfer knowledge across assets
    • Dealing with non-stationary markets or structural breaks
    • Implementing meta-learning for trading strategies
    • Creating context-based prediction systems

    Core Concepts

    1. Few-Shot vs Zero-Shot Learning

    Few-Shot Learning:

    • Model has seen the target asset during training
    • Can use historical data from same asset (in context set)
    • Train and test asset sets coincide: I_train = I_test = I (same assets, later time period)
    • Example: Adapting to new regime of S&P 500 after COVID-19

    Zero-Shot Learning:

    • Model has NEVER seen the target asset during training
    • Must transfer knowledge from different assets entirely
    • Training set and test set disjoint: I_train ∩ I_test = ∅
    • Example: Trading a new cryptocurrency using patterns learned from equities
    # Few-shot setting: same asset universe, different time period
    train_assets = ['SPY', 'GLD', 'TLT']  # 3 of the 30 training assets shown
    test_assets = ['SPY', 'GLD', 'TLT']   # same assets, later period
    
    # Zero-shot setting: disjoint asset universes
    train_assets = ['SPY', 'GLD', 'TLT']  # 3 of the 30 training assets shown
    test_assets = ['BTC', 'ETH', 'SOL']   # 3 of the 20 unseen test assets shown
    

    2. Episodic Learning

    Train models the same way they'll be used at test time:

    Traditional Training:

    # Standard mini-batch training - all assets mixed together
    for epoch in range(num_epochs):
        for batch in shuffle(all_data):
            loss = model(batch)  # model returns the batch loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    

    Episodic Training:

    # Episode-based training - mimics test-time usage
    for episode in range(num_episodes):
        # Sample target sequence (what we want to predict)
        target_asset, target_time = sample_target()
        target_seq = get_sequence(target_asset, target_time)
    
        # Sample context set C (what we condition on)
        context_set = sample_contexts(
            assets=train_assets,
            exclude=(target_asset, target_time),  # ensure causality
            size=C,  # number of context sequences
        )
    
        # Make prediction using context
        prediction = model(target=target_seq, context=context_set)
    
        loss = criterion(prediction, true_value)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    

    Key Principles:

    • Each episode = one prediction task
    • Context set must be causal (occurred before target)
    • Model learns to transfer patterns from context to target
    • Trains on k-shot tasks to perform well on k-shot evaluation

    3. Context Set Construction

    Context set C contains sequences from other assets/regimes that inform the prediction.

    Properties:

    • Size: Typically 10-30 sequences
    • Causality: All context must occur before target time
    • Diversity: Include different assets and market conditions
    • Quality: CPD segmentation improves performance by 11.3% vs random sampling

    Construction Methods:

    1. Random: Sample random sequences before target_time
    2. Time-equivalent: Same time window as target, different assets
    3. CPD-segmented: Use change-point detection for clean regime segments

    See IMPLEMENTATION.md for code examples.
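The sampling rules above can be sketched as follows. This is a hypothetical helper (`sample_causal_contexts`, with sequences stored as `(asset, start_time, data)` tuples), not the repo's actual API:

```python
import random

def sample_causal_contexts(sequences, target_asset, target_time,
                           size=20, method="random"):
    """Sample a context set that respects causality.

    `sequences` is a list of (asset, start_time, data) tuples; only
    sequences that start strictly before `target_time` are eligible.
    """
    # Causality: drop anything at or after the target time,
    # and never include the target (asset, time) pair itself
    eligible = [s for s in sequences
                if s[1] < target_time
                and (s[0], s[1]) != (target_asset, target_time)]
    if method == "time_equivalent":
        # Most recent eligible window, different assets only
        latest = max(s[1] for s in eligible)
        eligible = [s for s in eligible
                    if s[1] == latest and s[0] != target_asset]
    return random.sample(eligible, min(size, len(eligible)))
```

CPD-segmented construction would replace the raw `start_time` windows with segments produced by a change-point detector; the causality filter stays the same.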

    4. Meta-Learning Architecture

    How It Works:

    1. Encode context sequences → Learn patterns from similar situations
    2. Encode target sequence → Understand current market state
    3. Cross-attention → Target queries context for relevant patterns
    4. Combine representations → Integrate transferred knowledge
    5. Predict position → Generate trading signal

    Key Insight: Cross-attention automatically identifies which context sequences are most similar to the target, weighting them higher in the final prediction.

    See IMPLEMENTATION.md for implementation.
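The cross-attention step (3) can be illustrated with a minimal single-head sketch in plain NumPy; this is a toy illustration of scaled dot-product attention, not the X-Trend implementation:

```python
import numpy as np

def cross_attention(target_repr, context_reprs):
    """Target representation (d,) queries context representations (C, d).

    Returns the attention-weighted context summary and the weights,
    which show how much each context sequence contributed.
    """
    d = target_repr.shape[-1]
    scores = context_reprs @ target_repr / np.sqrt(d)  # (C,) similarity scores
    weights = np.exp(scores - scores.max())            # numerically stable softmax
    weights /= weights.sum()
    return weights @ context_reprs, weights
```

A context sequence whose representation resembles the target's receives a higher weight, which is exactly the "automatic similarity weighting" described above.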

    5. Transfer Learning Scenarios

    1. Same Asset, Different Regime (Few-Shot)

    • Target: SPY in 2020 (COVID crash)
    • Context: SPY in 2008 (financial crisis), SPY in 2018 (correction)
    • Transfer: Crisis response patterns

    2. Different Assets, Similar Dynamics (Zero-Shot)

    • Target: New cryptocurrency (BTC)
    • Context: Gold, Silver, Crude Oil (commodities)
    • Transfer: Trending behavior, volatility patterns

    3. Cross-Asset Momentum Spillover

    • Target: European equities (CAC40)
    • Context: US equities (SPY), Asian equities (Nikkei)
    • Transfer: Leading indicators, correlation structures

    6. Training Objectives

    Joint Loss Function:

    L_joint = α * L_MLE + L_Sharpe
    
    where:
    - L_MLE: Maximum likelihood (forecasting accuracy)
    - L_Sharpe: Negative Sharpe ratio (trading performance)
    - α: Balance parameter (1.0 for Gaussian, 5.0 for quantile)
    

    Why Joint Training?

    • Pure forecasting doesn't optimize for trading
    • Pure Sharpe can overfit to training period
    • Joint training balances both objectives

    See IMPLEMENTATION.md for implementation.
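As a self-contained sketch of this joint objective (assuming a Gaussian forecast head, daily returns, and an annualized Sharpe term; this is not the paper's exact code):

```python
import numpy as np

def joint_loss(mu, sigma, returns, positions, alpha=1.0):
    """L_joint = alpha * L_MLE + L_Sharpe, both to be minimized.

    mu, sigma: Gaussian forecast of next-day returns.
    positions: trading signals in [-1, 1]; returns: realized returns.
    """
    # L_MLE: Gaussian negative log-likelihood (forecasting term)
    nll = np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                  + (returns - mu)**2 / (2 * sigma**2))
    # L_Sharpe: negative annualized Sharpe of strategy returns (trading term)
    strat = positions * returns
    sharpe = np.sqrt(252) * strat.mean() / (strat.std() + 1e-8)
    return alpha * nll - sharpe
```

Accurate forecasts shrink the NLL term, while positions aligned with realized returns raise the Sharpe term; alpha trades the two off.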

    Evaluation Protocols

    Expanding Window Backtest

    Process:

    1. Train on 1990-1995 data
    2. Test on 1995-2000
    3. Expand training to 1990-2000
    4. Test on 2000-2005
    5. Continue expanding...

    Critical: Context sets must only use data from training period (no look-ahead).

    See IMPLEMENTATION.md for code.
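The schedule above can be sketched with a small hypothetical helper that yields year-granularity splits:

```python
def expanding_window_splits(start, end, step):
    """Yield (train_start, train_end, test_start, test_end) year tuples.

    The training window always begins at `start` and grows by `step`
    years each round; the test window immediately follows it.
    """
    splits = []
    t = start + step
    while t + step <= end:
        splits.append((start, t, t, t + step))
        t += step
    return splits
```

For example, `expanding_window_splits(1990, 2005, 5)` reproduces the first two rounds of the process above: train 1990-1995 / test 1995-2000, then train 1990-2000 / test 2000-2005.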

    Zero-Shot Evaluation

    Setup:

    • Train on 30 assets (traditional futures)
    • Test on 20 completely different assets (cryptocurrencies)
    • Context from training assets only
    • Validates true transfer learning capability

    See IMPLEMENTATION.md for implementation.
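A small guard (hypothetical helper) that fails fast when the asset split doesn't match the intended evaluation mode:

```python
def check_split(train_assets, test_assets, zero_shot):
    """Verify the asset split matches the evaluation mode.

    Zero-shot requires disjoint asset sets; few-shot requires overlap.
    Returns the overlap so it can be inspected or logged.
    """
    overlap = set(train_assets) & set(test_assets)
    if zero_shot:
        assert not overlap, f"zero-shot split leaks assets: {overlap}"
    else:
        assert overlap, "few-shot split has no shared assets"
    return overlap
```

Running this once at the start of every backtest is a cheap way to catch accidental asset leakage into a zero-shot evaluation.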

    Performance Insights from X-Trend Paper

    Few-Shot Results (2018-2023)

    • Baseline (no context): Sharpe = 2.27
    • X-Trend (with context): Sharpe = 2.70 (+18.9%)
    • X-Trend (CPD context): Sharpe = 2.70 (+18.9%)
    • vs TSMOM: Sharpe = 0.23 (10× improvement)

    Zero-Shot Results (2018-2023)

    • Baseline: Sharpe = -0.11 (loss-making!)
    • X-Trend-G (Gaussian): Sharpe = 0.47 (profitable)
    • TSMOM: Sharpe = -0.26
    • Turns a loss-making baseline into a profitable strategy (−0.11 → 0.47 Sharpe)

    COVID-19 Recovery

    • Baseline: 254 days to recover from drawdown
    • X-Trend: 162 days (≈1.6× faster recovery)

    Best Practices

    DO:

    ✅ Use episodic training - train how you test
    ✅ Ensure causality - context must precede target
    ✅ Sample diverse contexts - different assets, regimes, conditions
    ✅ Use change-point detection - improves Sharpe by 11%+
    ✅ Test zero-shot performance - validates true transfer learning
    ✅ Joint optimization - balance forecasting and trading objectives

    DON'T:

    ❌ Don't leak future information into the context set
    ❌ Don't use the same (asset, time) in both context and target
    ❌ Don't assume transferability without testing
    ❌ Don't skip few-shot evaluation even for zero-shot models
    ❌ Don't ignore context set size - 10-30 is typically optimal

    Common Pitfalls

    Pitfall 1: Data Leakage

    # WRONG - context from future!
    context = sample_sequences(all_time_periods)
    
    # CORRECT - context only from past
    context = sample_sequences(before=target_time)
    

    Pitfall 2: Overfitting to Context Construction

    # WRONG - optimization on test set
    best_cpd_threshold = optimize_on_test_set()
    
    # CORRECT - validate on held-out data
    best_cpd_threshold = cross_validate_on_train_set()
    

    Pitfall 3: Ignoring Asset Heterogeneity

    # WRONG - assume all assets behave identically
    encoding = lstm(features)
    
    # CORRECT - use entity embeddings
    encoding = lstm(features) + asset_embedding[asset_id]
    

    See IMPLEMENTATION.md for more examples.

    Implementation Checklist

    When implementing few-shot learning:

    • Define few-shot vs zero-shot split (asset overlap)
    • Implement episodic training loop
    • Create context sampling function (ensure causality)
    • Add cross-attention mechanism for context integration
    • Implement joint loss (forecasting + trading)
    • Set context size (10-30 sequences)
    • Add CPD-based context construction (optional, +11% Sharpe)
    • Implement expanding window backtest
    • Test zero-shot performance separately
    • Monitor attention weights for interpretability
    • Validate no future information leaks

    Key Takeaways

    1. Few-shot ≠ Small Model - Models can be large, but they adapt with minimal examples
    2. Context Quality Matters - CPD segmentation beats random sampling
    3. Zero-shot Tests Transfer - If it works on unseen assets, transfer is real
    4. Episodic Training Required - Don't mix all data; train in episodes
    5. Joint Objectives Help - Forecasting + trading better than either alone

    Related Skills

    • financial-time-series - Momentum factors, returns, portfolio construction
    • change-point-detection - GP-CPD for regime segmentation
    • x-trend-architecture - Cross-attention mechanisms

    Reference Files

    • IMPLEMENTATION.md - Context construction methods, meta-learning architecture, evaluation protocols, common pitfalls

    References

    • Matching Networks for One Shot Learning (Vinyals et al. 2016)
    • Model-Agnostic Meta-Learning (Finn et al. 2017)
    • Neural Processes (Garnelo et al. 2018)
    • X-Trend: Few-Shot Learning Patterns (Wood et al. 2024)

    Last Updated: Based on X-Trend paper (March 2024)
    Skill Type: Domain Knowledge
    Line Count: ~310 (under 500-line rule ✅)
