
    Few-Shot Learning for Finance

    Purpose

    Guide for implementing few-shot learning techniques in financial trading strategies, enabling models to quickly adapt to new market regimes or trade previously unseen assets with minimal data.

    When to Use

    Activate this skill when:

    • Implementing models that adapt to regime changes quickly
    • Trading new or low-liquidity assets with limited history
    • Building strategies that transfer knowledge across assets
    • Dealing with non-stationary markets or structural breaks
    • Implementing meta-learning for trading strategies
    • Creating context-based prediction systems

    Core Concepts

    1. Few-Shot vs Zero-Shot Learning

    Few-Shot Learning:

    • Model has seen the target asset during training
    • Can use historical data from same asset (in context set)
    • Train and test asset sets coincide: I_train = I_test = I (same assets, later time period)
    • Example: Adapting to new regime of S&P 500 after COVID-19

    Zero-Shot Learning:

    • Model has NEVER seen the target asset during training
    • Must transfer knowledge from different assets entirely
    • Training set and test set disjoint: I_train ∩ I_test = ∅
    • Example: Trading a new cryptocurrency using patterns learned from equities
    # Few-shot setting: same asset universe, different time period
    train_assets = ['SPY', 'GLD', 'TLT']  # 3 of the 30 training assets shown
    test_assets = ['SPY', 'GLD', 'TLT']   # same assets, later period
    
    # Zero-shot setting: disjoint asset universes
    train_assets = ['SPY', 'GLD', 'TLT']  # 3 of the 30 training assets shown
    test_assets = ['BTC', 'ETH', 'SOL']   # 3 of the 20 unseen test assets shown
    

    2. Episodic Learning

    Train models the same way they'll be used at test time:

    Traditional Training:

    # Standard mini-batch training - all assets mixed together
    for epoch in range(num_epochs):
        for batch in shuffle(all_data):
            loss = model(batch)  # model returns the batch loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    

    Episodic Training:

    # Episode-based training - mimics test-time usage
    for episode in range(num_episodes):
        # Sample target sequence (what we want to predict)
        target_asset, target_time = sample_target()
        target_seq = get_sequence(target_asset, target_time)
    
        # Sample context set C (what we condition on)
        context_set = sample_contexts(
            assets=train_assets,
            exclude=(target_asset, target_time),  # ensure causality
            size=C,  # number of context sequences
        )
    
        # Make prediction using context
        prediction = model(target=target_seq, context=context_set)
    
        loss = criterion(prediction, true_value)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    

    Key Principles:

    • Each episode = one prediction task
    • Context set must be causal (occurred before target)
    • Model learns to transfer patterns from context to target
    • Trains on k-shot tasks to perform well on k-shot evaluation

    3. Context Set Construction

    Context set C contains sequences from other assets/regimes that inform the prediction.

    Properties:

    • Size: Typically 10-30 sequences
    • Causality: All context must occur before target time
    • Diversity: Include different assets and market conditions
    • Quality: CPD segmentation improves performance by 11.3% vs random sampling

    Construction Methods:

    1. Random: Sample random sequences before target_time
    2. Time-equivalent: Same time window as target, different assets
    3. CPD-segmented: Use change-point detection for clean regime segments

    See IMPLEMENTATION.md for code examples.
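The sampling rules above can be sketched as follows. This is a hypothetical helper (`sample_causal_contexts`, with sequences stored as `(asset, start_time, data)` tuples), not the repo's actual API:

```python
import random

def sample_causal_contexts(sequences, target_asset, target_time,
                           size=20, method="random"):
    """Sample a context set that respects causality.

    `sequences` is a list of (asset, start_time, data) tuples; only
    sequences that start strictly before `target_time` are eligible.
    """
    # Causality: drop anything at or after the target time,
    # and never include the target (asset, time) pair itself
    eligible = [s for s in sequences
                if s[1] < target_time
                and (s[0], s[1]) != (target_asset, target_time)]
    if method == "time_equivalent":
        # Most recent eligible window, different assets only
        latest = max(s[1] for s in eligible)
        eligible = [s for s in eligible
                    if s[1] == latest and s[0] != target_asset]
    return random.sample(eligible, min(size, len(eligible)))
```

CPD-segmented construction would replace the raw `start_time` windows with segments produced by a change-point detector; the causality filter stays the same.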

    4. Meta-Learning Architecture

    How It Works:

    1. Encode context sequences → Learn patterns from similar situations
    2. Encode target sequence → Understand current market state
    3. Cross-attention → Target queries context for relevant patterns
    4. Combine representations → Integrate transferred knowledge
    5. Predict position → Generate trading signal

    Key Insight: Cross-attention automatically identifies which context sequences are most similar to the target, weighting them higher in the final prediction.

    See IMPLEMENTATION.md for implementation.
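The cross-attention step (3) can be illustrated with a minimal single-head sketch in plain NumPy; this is a toy illustration of scaled dot-product attention, not the X-Trend implementation:

```python
import numpy as np

def cross_attention(target_repr, context_reprs):
    """Target representation (d,) queries context representations (C, d).

    Returns the attention-weighted context summary and the weights,
    which show how much each context sequence contributed.
    """
    d = target_repr.shape[-1]
    scores = context_reprs @ target_repr / np.sqrt(d)  # (C,) similarity scores
    weights = np.exp(scores - scores.max())            # numerically stable softmax
    weights /= weights.sum()
    return weights @ context_reprs, weights
```

A context sequence whose representation resembles the target's receives a higher weight, which is exactly the "automatic similarity weighting" described above.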

    5. Transfer Learning Scenarios

    1. Same Asset, Different Regime (Few-Shot)

    • Target: SPY in 2020 (COVID crash)
    • Context: SPY in 2008 (financial crisis), SPY in 2018 (correction)
    • Transfer: Crisis response patterns

    2. Different Assets, Similar Dynamics (Zero-Shot)

    • Target: New cryptocurrency (BTC)
    • Context: Gold, Silver, Crude Oil (commodities)
    • Transfer: Trending behavior, volatility patterns

    3. Cross-Asset Momentum Spillover

    • Target: European equities (CAC40)
    • Context: US equities (SPY), Asian equities (Nikkei)
    • Transfer: Leading indicators, correlation structures

    6. Training Objectives

    Joint Loss Function:

    L_joint = α * L_MLE + L_Sharpe
    
    where:
    - L_MLE: Maximum likelihood (forecasting accuracy)
    - L_Sharpe: Negative Sharpe ratio (trading performance)
    - α: Balance parameter (1.0 for Gaussian, 5.0 for quantile)
    

    Why Joint Training?

    • Pure forecasting doesn't optimize for trading
    • Pure Sharpe can overfit to training period
    • Joint training balances both objectives

    See IMPLEMENTATION.md for implementation.
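As a self-contained sketch of this joint objective (assuming a Gaussian forecast head, daily returns, and an annualized Sharpe term; this is not the paper's exact code):

```python
import numpy as np

def joint_loss(mu, sigma, returns, positions, alpha=1.0):
    """L_joint = alpha * L_MLE + L_Sharpe, both to be minimized.

    mu, sigma: Gaussian forecast of next-day returns.
    positions: trading signals in [-1, 1]; returns: realized returns.
    """
    # L_MLE: Gaussian negative log-likelihood (forecasting term)
    nll = np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                  + (returns - mu)**2 / (2 * sigma**2))
    # L_Sharpe: negative annualized Sharpe of strategy returns (trading term)
    strat = positions * returns
    sharpe = np.sqrt(252) * strat.mean() / (strat.std() + 1e-8)
    return alpha * nll - sharpe
```

Accurate forecasts shrink the NLL term, while positions aligned with realized returns raise the Sharpe term; alpha trades the two off.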

    Evaluation Protocols

    Expanding Window Backtest

    Process:

    1. Train on 1990-1995 data
    2. Test on 1995-2000
    3. Expand training to 1990-2000
    4. Test on 2000-2005
    5. Continue expanding...

    Critical: Context sets must only use data from training period (no look-ahead).

    See IMPLEMENTATION.md for code.
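The schedule above can be sketched with a small hypothetical helper that yields year-granularity splits:

```python
def expanding_window_splits(start, end, step):
    """Yield (train_start, train_end, test_start, test_end) year tuples.

    The training window always begins at `start` and grows by `step`
    years each round; the test window immediately follows it.
    """
    splits = []
    t = start + step
    while t + step <= end:
        splits.append((start, t, t, t + step))
        t += step
    return splits
```

For example, `expanding_window_splits(1990, 2005, 5)` reproduces the first two rounds of the process above: train 1990-1995 / test 1995-2000, then train 1990-2000 / test 2000-2005.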

    Zero-Shot Evaluation

    Setup:

    • Train on 30 assets (traditional futures)
    • Test on 20 completely different assets (cryptocurrencies)
    • Context from training assets only
    • Validates true transfer learning capability

    See IMPLEMENTATION.md for implementation.
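A small guard (hypothetical helper) that fails fast when the asset split doesn't match the intended evaluation mode:

```python
def check_split(train_assets, test_assets, zero_shot):
    """Verify the asset split matches the evaluation mode.

    Zero-shot requires disjoint asset sets; few-shot requires overlap.
    Returns the overlap so it can be inspected or logged.
    """
    overlap = set(train_assets) & set(test_assets)
    if zero_shot:
        assert not overlap, f"zero-shot split leaks assets: {overlap}"
    else:
        assert overlap, "few-shot split has no shared assets"
    return overlap
```

Running this once at the start of every backtest is a cheap way to catch accidental asset leakage into a zero-shot evaluation.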

    Performance Insights from X-Trend Paper

    Few-Shot Results (2018-2023)

    • Baseline (no context): Sharpe = 2.27
    • X-Trend (with context): Sharpe = 2.70 (+18.9%)
    • X-Trend (CPD context): Sharpe = 2.70 (+18.9%)
    • vs TSMOM: Sharpe = 0.23 (10× improvement)

    Zero-Shot Results (2018-2023)

    • Baseline: Sharpe = -0.11 (loss-making!)
    • X-Trend-G (Gaussian): Sharpe = 0.47 (profitable)
    • TSMOM: Sharpe = -0.26
    • Turns a loss-making baseline into a profitable strategy (−0.11 → 0.47 Sharpe)

    COVID-19 Recovery

    • Baseline: 254 days to recover from drawdown
    • X-Trend: 162 days (≈1.6× faster recovery)

    Best Practices

    DO:

    ✅ Use episodic training - train how you test
    ✅ Ensure causality - context must precede target
    ✅ Sample diverse contexts - different assets, regimes, conditions
    ✅ Use change-point detection - improves Sharpe by 11%+
    ✅ Test zero-shot performance - validates true transfer learning
    ✅ Joint optimization - balance forecasting and trading objectives

    DON'T:

    ❌ Don't leak future information into the context set
    ❌ Don't use the same (asset, time) in both context and target
    ❌ Don't assume transferability without testing
    ❌ Don't skip few-shot evaluation even for zero-shot models
    ❌ Don't ignore context set size - 10-30 is typically optimal

    Common Pitfalls

    Pitfall 1: Data Leakage

    # WRONG - context from future!
    context = sample_sequences(all_time_periods)
    
    # CORRECT - context only from past
    context = sample_sequences(before=target_time)
    

    Pitfall 2: Overfitting to Context Construction

    # WRONG - optimization on test set
    best_cpd_threshold = optimize_on_test_set()
    
    # CORRECT - validate on held-out data
    best_cpd_threshold = cross_validate_on_train_set()
    

    Pitfall 3: Ignoring Asset Heterogeneity

    # WRONG - assume all assets behave identically
    encoding = lstm(features)
    
    # CORRECT - use entity embeddings
    encoding = lstm(features) + asset_embedding[asset_id]
    

    See IMPLEMENTATION.md for more examples.

    Implementation Checklist

    When implementing few-shot learning:

    • Define few-shot vs zero-shot split (asset overlap)
    • Implement episodic training loop
    • Create context sampling function (ensure causality)
    • Add cross-attention mechanism for context integration
    • Implement joint loss (forecasting + trading)
    • Set context size (10-30 sequences)
    • Add CPD-based context construction (optional, +11% Sharpe)
    • Implement expanding window backtest
    • Test zero-shot performance separately
    • Monitor attention weights for interpretability
    • Validate no future information leaks

    Key Takeaways

    1. Few-shot ≠ Small Model - Models can be large, but they adapt with minimal examples
    2. Context Quality Matters - CPD segmentation beats random sampling
    3. Zero-shot Tests Transfer - If it works on unseen assets, transfer is real
    4. Episodic Training Required - Don't mix all data; train in episodes
    5. Joint Objectives Help - Forecasting + trading better than either alone

    Related Skills

    • financial-time-series - Momentum factors, returns, portfolio construction
    • change-point-detection - GP-CPD for regime segmentation
    • x-trend-architecture - Cross-attention mechanisms

    Reference Files

    • IMPLEMENTATION.md - Context construction methods, meta-learning architecture, evaluation protocols, common pitfalls

    References

    • Matching Networks for One Shot Learning (Vinyals et al. 2016)
    • Model-Agnostic Meta-Learning (Finn et al. 2017)
    • Neural Processes (Garnelo et al. 2018)
    • X-Trend: Few-Shot Learning Patterns (Wood et al. 2024)

    Last Updated: Based on X-Trend paper (March 2024)
    Skill Type: Domain Knowledge
    Line Count: ~310 (under 500-line rule ✅)
