Use when implementing models that learn from minimal data or need to adapt to new market regimes rapidly.
Guide for implementing few-shot learning techniques in financial trading strategies, enabling models to quickly adapt to new market regimes or trade previously unseen assets with minimal data.
Activate this skill when:
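Few-Shot Learning: the test assets were seen during training; only the test time period is new.

$$\mathcal{I}_{\text{train}} \cap \mathcal{I}_{\text{test}} = \mathcal{I}$$

Zero-Shot Learning: no test asset was seen during training.

$$\mathcal{I}_{\text{train}} \cap \mathcal{I}_{\text{test}} = \emptyset$$

Here $\mathcal{I}$ denotes the full asset universe, so in the few-shot case the same assets appear in both sets.

```python
# Few-shot setting: same assets, later (out-of-sample) time period
train_assets = ['SPY', 'GLD', 'TLT']  # illustrative subset of a 30-asset universe
test_assets = ['SPY', 'GLD', 'TLT']   # same 30 assets, different time period

# Zero-shot setting: disjoint asset universes
train_assets = ['SPY', 'GLD', 'TLT']  # illustrative subset of 30 training assets
test_assets = ['BTC', 'ETH', 'SOL']   # illustrative subset of 20 unseen test assets
```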
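The two settings differ only in how the training and test asset universes overlap. The following minimal sketch labels a split using set intersection. `classify_split` is a hypothetical helper. It checks only the asset sets: the few-shot setting additionally requires a later test period, and this function does not check that.

```python
def classify_split(train_assets, test_assets):
    """Label an asset split as few-shot, zero-shot, or neither, by set overlap only."""
    train, test = set(train_assets), set(test_assets)
    if train == test:
        return "few-shot"         # I_train ∩ I_test = I: same universe, different period
    if not (train & test):
        return "zero-shot"        # I_train ∩ I_test = ∅: no test asset seen in training
    return "partial overlap"      # neither textbook setting


print(classify_split(['SPY', 'GLD', 'TLT'], ['BTC', 'ETH', 'SOL']))
```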
Train models the same way they'll be used at test time:
Traditional Training:
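```python
from torch.utils.data import DataLoader

# Standard mini-batch training: all assets mixed together
for epoch in range(num_epochs):
    for batch in DataLoader(all_data, batch_size=64, shuffle=True):
        optimizer.zero_grad()
        loss = model(batch)  # model assumed to return the training loss
        loss.backward()
        optimizer.step()
```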
Episodic Training:
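```python
# Episode-based training: mimics test-time usage
# (sample_target, get_sequence and sample_contexts are placeholder helpers)
for episode in range(num_episodes):
    # Sample target sequence (what we want to predict)
    target_asset, target_time = sample_target()
    target_seq, true_value = get_sequence(target_asset, target_time)

    # Sample context set C (what we condition on)
    context_set = sample_contexts(
        assets=train_assets,
        before=target_time,                   # causality: contexts must end before the target
        exclude=(target_asset, target_time),  # never reuse the target itself as context
        size=C,                               # number of context sequences
    )

    # Make prediction using context, then take one gradient step
    optimizer.zero_grad()
    prediction = model(target=target_seq, context=context_set)
    loss = criterion(prediction, true_value)
    loss.backward()
    optimizer.step()
```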
Key Principles:
Context set C contains sequences from other assets/regimes that inform the prediction.
Properties:
Construction Methods:
See IMPLEMENTATION.md for code examples.
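Below is a minimal sketch of causal context-set construction. It assumes that each candidate sequence is an (asset, start, end) window with integer time indices, and that contexts are drawn uniformly at random. `Window` and `sample_context_set` are hypothetical names, not taken from IMPLEMENTATION.md. Other construction methods, such as contexts segmented by change-point detection, would replace only the sampling step.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Window:
    """A candidate sequence: asset ticker plus inclusive [start, end] time indices."""
    asset: str
    start: int
    end: int


def sample_context_set(candidates: List[Window], target: Window, size: int,
                       rng: Optional[random.Random] = None) -> List[Window]:
    """Uniformly sample `size` context windows that end strictly before the target starts.

    The strict ordering also guarantees the target window never appears in its own
    context set, while earlier windows of the same asset remain eligible.
    """
    rng = rng if rng is not None else random.Random()
    eligible = [w for w in candidates if w.end < target.start]
    if len(eligible) < size:
        raise ValueError(f"only {len(eligible)} causal contexts available, need {size}")
    return rng.sample(eligible, size)  # sampling without replacement


# Usage: 21-step windows for three assets; the target starts at t = 210
candidates = [Window(a, s, s + 20) for a in ("SPY", "GLD", "TLT") for s in range(0, 231, 21)]
target = Window("SPY", 210, 230)
context = sample_context_set(candidates, target, size=10, rng=random.Random(0))
```

Earlier windows of the target's own asset stay in the pool. This supports the "same asset, different regime" few-shot case.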
How It Works:
Key Insight: Cross-attention scores every context sequence against the target (query-key similarity) and converts those scores into weights. The context sequences most similar to the target therefore contribute most to the final prediction.
See IMPLEMENTATION.md for implementation.
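The sketch below shows the weighting mechanism as generic single-head scaled dot-product cross-attention in NumPy. It is not the X-Trend architecture itself. It assumes the target and each context sequence have already been encoded to fixed-size vectors. The projections `W_q`, `W_k`, `W_v` are identity matrices here for illustration; in a real model they would be learned.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attention(target_enc, context_enc, W_q, W_k, W_v):
    """Attend from one target encoding (d,) to C context encodings (C, d).

    Returns the attended vector (d_v,) and the attention weights (C,).
    """
    q = target_enc @ W_q                     # query from the target, shape (d_k,)
    K = context_enc @ W_k                    # one key per context sequence, shape (C, d_k)
    V = context_enc @ W_v                    # one value per context sequence, shape (C, d_v)
    scores = K @ q / np.sqrt(q.shape[-1])    # scaled similarity of each context to the target
    weights = softmax(scores)                # non-negative, sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
d, C = 8, 5
context_enc = rng.normal(size=(C, d))
context_enc /= np.linalg.norm(context_enc, axis=1, keepdims=True)  # unit-norm rows
target_enc = context_enc[2].copy()                                 # target identical to context #2
identity = np.eye(d)
out, weights = cross_attention(target_enc, context_enc, identity, identity, identity)
```

The contexts are unit-norm and the target equals context #2. With identity projections, that context therefore has the largest dot product and receives the largest weight.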
1. Same Asset, Different Regime (Few-Shot)
2. Different Assets, Similar Dynamics (Zero-Shot)
3. Cross-Asset Momentum Spillover
Joint Loss Function:
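$$\mathcal{L}_{\text{joint}} = \alpha\,\mathcal{L}_{\text{MLE}} + \mathcal{L}_{\text{Sharpe}}$$

where:
- $\mathcal{L}_{\text{MLE}}$: negative log-likelihood of the forecast (forecasting accuracy)
- $\mathcal{L}_{\text{Sharpe}}$: negative Sharpe ratio of the resulting positions (trading performance)
- $\alpha$: balance parameter (1.0 for the Gaussian output head, 5.0 for the quantile output head)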
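The following NumPy sketch evaluates the joint loss for a Gaussian output head. It makes several assumptions: the model predicts a mean and standard deviation for the next return and also outputs a position per step; the Sharpe ratio is annualised with 252 trading days; and a small epsilon guards against zero volatility. A training loop would compute the same quantities in an autodiff framework. The quantile-head variant is not shown.

```python
import numpy as np


def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under N(mu, sigma^2): the L_MLE term."""
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2))


def neg_sharpe(positions, next_returns, periods_per_year=252):
    """Negative annualised Sharpe ratio of strategy returns z_t * r_{t+1}: the L_Sharpe term."""
    strat = positions * next_returns
    return -np.sqrt(periods_per_year) * strat.mean() / (strat.std() + 1e-9)


def joint_loss(y, mu, sigma, positions, next_returns, alpha=1.0):
    """L_joint = alpha * L_MLE + L_Sharpe (alpha = 1.0 for the Gaussian head)."""
    return alpha * gaussian_nll(y, mu, sigma) + neg_sharpe(positions, next_returns)


r = np.array([0.01, -0.02, 0.015, -0.005])   # realised next-step returns
loss = joint_loss(y=r, mu=np.zeros(4), sigma=np.full(4, 0.02),
                  positions=np.sign(r), next_returns=r)
```

Since both terms are minimised, a profitable position sequence reduces $\mathcal{L}_{\text{Sharpe}}$ below zero. Raising $\alpha$ shifts the balance toward forecasting accuracy.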
Why Joint Training?
See IMPLEMENTATION.md for implementation.
Process:
Critical: Context sets must only use data from training period (no look-ahead).
See IMPLEMENTATION.md for code.
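A lightweight guard can enforce the no-look-ahead rule at evaluation time by auditing every context window before it is passed to the model. This is a minimal sketch that assumes windows are represented as `(asset, start_date, end_date)` tuples. `assert_no_lookahead` is a hypothetical helper name.

```python
from datetime import date
from typing import Iterable, Tuple

# Hypothetical representation of a context window: (asset, start_date, end_date)
ContextWindow = Tuple[str, date, date]


def assert_no_lookahead(contexts: Iterable[ContextWindow], train_end: date,
                        target_start: date) -> None:
    """Raise ValueError if any context leaves the training period or reaches the target."""
    for asset, start, end in contexts:
        if start > end:
            raise ValueError(f"{asset}: malformed window {start}..{end}")
        if end > train_end:
            raise ValueError(f"{asset}: context ends {end}, after training period ends {train_end}")
        if end >= target_start:
            raise ValueError(f"{asset}: context ends {end}, not before target starts {target_start}")


train_end = date(2019, 12, 31)
ok = [("GLD", date(2015, 1, 2), date(2015, 3, 31)),
      ("TLT", date(2019, 10, 1), date(2019, 12, 31))]
assert_no_lookahead(ok, train_end, target_start=date(2021, 6, 1))  # passes silently
```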
Setup:
See IMPLEMENTATION.md for implementation.
✅ Use episodic training - train how you test
✅ Ensure causality - context must precede target
✅ Sample diverse contexts - different assets, regimes, conditions
✅ Use change-point detection - improves Sharpe by 11%+
✅ Test zero-shot performance - validates true transfer learning
✅ Joint optimization - balance forecasting and trading objectives

❌ Don't leak future information into the context set
❌ Don't use the same (asset, time) in context and target
❌ Don't assume transferability without testing
❌ Don't skip few-shot evaluation, even for zero-shot models
❌ Don't ignore context set size - typically 10-30 is optimal
```python
# WRONG - context from future!
context = sample_sequences(all_time_periods)
# CORRECT - context only from past
context = sample_sequences(before=target_time)

# WRONG - optimization on test set
best_cpd_threshold = optimize_on_test_set()
# CORRECT - validate on held-out data
best_cpd_threshold = cross_validate_on_train_set()

# WRONG - assume all assets behave identically
encoding = lstm(features)
# CORRECT - use entity embeddings
encoding = lstm(features) + asset_embedding[asset_id]
```
See IMPLEMENTATION.md for more examples.
When implementing few-shot learning:
- financial-time-series - Momentum factors, returns, portfolio construction
- change-point-detection - GP-CPD for regime segmentation
- x-trend-architecture - Cross-attention mechanisms

Last Updated: Based on X-Trend paper (March 2024)
Skill Type: Domain Knowledge
Line Count: ~310 (under 500-line rule ✅)