    davila7/dspy
    AI & ML

    About

    Build complex AI systems with declarative programming, optimize prompts automatically, and create modular RAG systems and agents with DSPy, Stanford NLP's framework for systematic LM programming.

    SKILL.md

    DSPy: Declarative Language Model Programming

    When to Use This Skill

    Use DSPy when you need to:

    • Build complex AI systems with multiple components and workflows
    • Program LMs declaratively instead of engineering prompts by hand
    • Optimize prompts automatically using data-driven methods
    • Create modular AI pipelines that are maintainable and portable
    • Improve model outputs systematically with optimizers
    • Build RAG systems, agents, or classifiers with better reliability

    GitHub Stars: 22,000+ | Created By: Stanford NLP

    Installation

    # Stable release
    pip install dspy
    
    # Latest development version
    pip install git+https://github.com/stanfordnlp/dspy.git
    
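    # With specific LM providers (quote the extras so shells such as zsh do not expand the brackets).
    # Available extras vary by release; check the package metadata if pip reports an unknown extra.
    pip install "dspy[openai]"        # OpenAI
    pip install "dspy[anthropic]"     # Anthropic Claude
    pip install "dspy[all]"           # All providers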
    

    Quick Start

    Basic Example: Question Answering

    import dspy
    
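    # Configure your language model (DSPy 2.5+ takes a "provider/model" string via dspy.LM;
    # older releases used provider classes such as dspy.Claude)
    lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
    dspy.configure(lm=lm)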
    
    # Define a signature (input → output)
    class QA(dspy.Signature):
        """Answer questions with short factual answers."""
        question = dspy.InputField()
        answer = dspy.OutputField(desc="often between 1 and 5 words")
    
    # Create a module
    qa = dspy.Predict(QA)
    
    # Use it
    response = qa(question="What is the capital of France?")
    print(response.answer)  # "Paris"
    

    Chain of Thought Reasoning

    import dspy
    
    lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
    dspy.configure(lm=lm)
    
    # Use ChainOfThought for better reasoning
    class MathProblem(dspy.Signature):
        """Solve math word problems."""
        problem = dspy.InputField()
        answer = dspy.OutputField(desc="numerical answer")
    
    # ChainOfThought generates reasoning steps automatically
    cot = dspy.ChainOfThought(MathProblem)
    
    response = cot(problem="If John has 5 apples and gives 2 to Mary, how many does he have?")
    print(response.reasoning)  # Shows reasoning steps (the field was named `rationale` before DSPy 2.5)
    print(response.answer)     # "3"
    

    Core Concepts

    1. Signatures

    Signatures define the structure of your AI task (inputs → outputs):

    # Inline signature (simple)
    qa = dspy.Predict("question -> answer")
    
    # Class signature (detailed)
    class Summarize(dspy.Signature):
        """Summarize text into key points."""
        text = dspy.InputField()
        summary = dspy.OutputField(desc="bullet points, 3-5 items")
    
    summarizer = dspy.ChainOfThought(Summarize)
    

    When to use each:

    • Inline: Quick prototyping, simple tasks
    • Class: Complex tasks, type hints, better documentation
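
    To make the inline notation concrete, here is a toy parser that splits a signature string into input and output field names. It only illustrates what the string denotes; `parse_signature` is a hypothetical helper, not DSPy's own parser, and it does not handle type annotations that themselves contain commas.

```python
def parse_signature(spec: str) -> tuple[list[str], list[str]]:
    """Split 'a, b -> c' into input and output field names (toy parser)."""
    inputs, sep, outputs = spec.partition("->")
    if not sep:
        raise ValueError(f"missing '->' in signature: {spec!r}")

    def names(part: str) -> list[str]:
        # Drop optional type annotations such as 'score: float'.
        return [f.split(":")[0].strip() for f in part.split(",") if f.strip()]

    return names(inputs), names(outputs)


inputs, outputs = parse_signature("context, question -> answer")
print(inputs, outputs)  # ['context', 'question'] ['answer']
```

    Everything left of the arrow becomes an input field and everything right of it an output field, which is the same split a class signature expresses with `InputField` and `OutputField`.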

    2. Modules

    Modules are reusable components that transform inputs to outputs:

    dspy.Predict

    Basic prediction module:

    predictor = dspy.Predict("context, question -> answer")
    result = predictor(context="Paris is the capital of France",
                       question="What is the capital?")
    

    dspy.ChainOfThought

    Generates reasoning steps before answering:

    cot = dspy.ChainOfThought("question -> answer")
    result = cot(question="Why is the sky blue?")
    print(result.reasoning)  # Reasoning steps (the field was named `rationale` before DSPy 2.5)
    print(result.answer)     # Final answer
    

    dspy.ReAct

    Agent-like reasoning with tools:

    import dspy
    
    class SearchQA(dspy.Signature):
        """Answer questions using search."""
        question = dspy.InputField()
        answer = dspy.OutputField()
    
    def search_tool(query: str) -> str:
        """Search Wikipedia."""
        # Placeholder: swap in your own search implementation
        results = f"No search backend configured for query: {query}"
        return results
    
    react = dspy.ReAct(SearchQA, tools=[search_tool])
    result = react(question="When was Python created?")
    

    dspy.ProgramOfThought

    Generates and executes code for reasoning:

    pot = dspy.ProgramOfThought("question -> answer")
    result = pot(question="What is 15% of 240?")
    # Generates: answer = 240 * 0.15
    

    3. Optimizers

    Optimizers improve your modules automatically using training data:

    BootstrapFewShot

    Learns from examples:

    from dspy.teleprompt import BootstrapFewShot
    
    # Training data
    trainset = [
        dspy.Example(question="What is 2+2?", answer="4").with_inputs("question"),
        dspy.Example(question="What is 3+5?", answer="8").with_inputs("question"),
    ]
    
    # Define metric
    def validate_answer(example, pred, trace=None):
        return example.answer == pred.answer
    
    # Optimize
    optimizer = BootstrapFewShot(metric=validate_answer, max_bootstrapped_demos=3)
    optimized_qa = optimizer.compile(qa, trainset=trainset)
    
    # optimized_qa now carries bootstrapped few-shot demos; confirm the gain on a held-out set
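
    The idea behind bootstrapping can be sketched in plain Python: run a program on the training inputs and keep only the outputs the metric accepts as demonstrations. This is a simplified illustration, not DSPy's implementation; `bootstrap_demos`, `toy_program`, and the dict-based examples are hypothetical stand-ins.

```python
from typing import Callable

Example = dict  # hypothetical stand-in for dspy.Example


def bootstrap_demos(
    program: Callable[[str], str],
    trainset: list[Example],
    metric: Callable[[Example, str], bool],
    max_demos: int = 3,
) -> list[Example]:
    """Run the program on training inputs and keep outputs the metric accepts."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


def toy_program(question: str) -> str:
    """Toy 'program': answers additions of the form 'What is a+b?'."""
    expr = question.removeprefix("What is ").rstrip("?")
    left, right = expr.split("+")
    return str(int(left) + int(right))


trainset = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "What is 3+5?", "answer": "8"},
    {"question": "What is 1+1?", "answer": "3"},  # deliberately mislabeled
]

demos = bootstrap_demos(toy_program, trainset, lambda ex, pred: ex["answer"] == pred)
print(len(demos))  # 2: the mislabeled example fails the metric and is dropped
```

    The metric acts as a filter here, which is why a metric that is too lenient lets poor demonstrations into the prompt.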
    

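    MIPRO (Multi-prompt Instruction Proposal Optimizer)

    Searches over candidate instructions and few-shot demonstrations for each predictor and keeps the combination that scores best on your metric. Recent DSPy releases expose its successor as MIPROv2; the example here uses the original MIPRO class: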

    from dspy.teleprompt import MIPRO
    
    optimizer = MIPRO(
        metric=validate_answer,
        num_candidates=10,
        init_temperature=1.0
    )
    
    optimized_cot = optimizer.compile(
        cot,
        trainset=trainset,
        num_trials=100
    )
    

    BootstrapFinetune

    Bootstraps traces from your program and uses them as training data for fine-tuning the underlying model:

    from dspy.teleprompt import BootstrapFinetune
    
    optimizer = BootstrapFinetune(metric=validate_answer)
    optimized_module = optimizer.compile(qa, trainset=trainset)
    
    # The bootstrapped traces that pass the metric become the fine-tuning data
    

    4. Building Complex Systems

    Multi-Stage Pipeline

    import dspy
    
    class MultiHopQA(dspy.Module):
        def __init__(self):
            super().__init__()
            self.retrieve = dspy.Retrieve(k=3)
            self.generate_query = dspy.ChainOfThought("question -> search_query")
            self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
        def forward(self, question):
            # Stage 1: Generate search query
            search_query = self.generate_query(question=question).search_query
    
            # Stage 2: Retrieve context
            passages = self.retrieve(search_query).passages
            context = "\n".join(passages)
    
            # Stage 3: Generate answer
            answer = self.generate_answer(context=context, question=question).answer
            return dspy.Prediction(answer=answer, context=context)
    
    # Use the pipeline
    qa_system = MultiHopQA()
    result = qa_system(question="Who wrote the book that inspired the movie Blade Runner?")
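
    The data flow through the three stages can be traced without any LM calls. The sketch below replaces query generation with a trivial string edit and retrieval with word-overlap ranking over a three-passage corpus; `CORPUS`, `retrieve`, and `pipeline` are hypothetical stand-ins, not DSPy components.

```python
CORPUS = [
    "Blade Runner is a 1982 film directed by Ridley Scott.",
    "Blade Runner is based on a novel by Philip K. Dick.",
    "Paris is the capital of France.",
]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank passages by the number of lowercase words shared with the query."""
    query_words = set(query.lower().split())

    def overlap(passage: str) -> int:
        return len(query_words & set(passage.lower().split()))

    return sorted(CORPUS, key=overlap, reverse=True)[:k]


def pipeline(question: str) -> dict:
    search_query = question.rstrip("?")   # stage 1 stand-in: query generation
    passages = retrieve(search_query)     # stage 2: retrieval
    context = "\n".join(passages)         # stage 3 input: joined context
    return {"search_query": search_query, "context": context}


result = pipeline("Which novel is Blade Runner based on?")
print(result["context"])
```

    Each stage consumes only the previous stage's output, which is what lets an optimizer tune the stages independently.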
    

    RAG System with Optimization

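    import dspy
    from dspy.retrieve.chromadb_rm import ChromadbRM
    
    # Configure retriever
    retriever = ChromadbRM(
        collection_name="documents",
        persist_directory="./chroma_db"
    )
    # Register it as the default retrieval model so dspy.Retrieve can find it
    dspy.settings.configure(rm=retriever)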
    
    class RAG(dspy.Module):
        def __init__(self, num_passages=3):
            super().__init__()
            self.retrieve = dspy.Retrieve(k=num_passages)
            self.generate = dspy.ChainOfThought("context, question -> answer")
    
        def forward(self, question):
            context = self.retrieve(question).passages
            return self.generate(context=context, question=question)
    
    # Create and optimize
    rag = RAG()
    
    # Optimize with training data (reuses validate_answer and trainset from the BootstrapFewShot example)
    from dspy.teleprompt import BootstrapFewShot
    
    optimizer = BootstrapFewShot(metric=validate_answer)
    optimized_rag = optimizer.compile(rag, trainset=trainset)
    

    LM Provider Configuration

    Anthropic Claude

    import dspy
    
    # DSPy 2.5+ configures every provider through dspy.LM with a "provider/model" string
    lm = dspy.LM(
        "anthropic/claude-sonnet-4-5-20250929",
        api_key="your-api-key",  # Or set ANTHROPIC_API_KEY env var
        max_tokens=1000,
        temperature=0.7
    )
    dspy.configure(lm=lm)
    

    OpenAI

    lm = dspy.LM(
        "openai/gpt-4",
        api_key="your-api-key",  # Or set OPENAI_API_KEY env var
        max_tokens=1000
    )
    dspy.configure(lm=lm)
    

    Local Models (Ollama)

    lm = dspy.LM(
        "ollama_chat/llama3.1",
        api_base="http://localhost:11434",
        api_key=""  # Ollama does not need a key
    )
    dspy.configure(lm=lm)
    

    Multiple Models

    # Different models for different tasks
    cheap_lm = dspy.LM("openai/gpt-3.5-turbo")
    strong_lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
    
    # `retriever`, `generator`, and `question` stand for objects defined elsewhere in your program.
    # Use the cheap model for the retrieval step, the strong model for reasoning
    with dspy.context(lm=cheap_lm):
        context = retriever(question)
    
    with dspy.context(lm=strong_lm):
        answer = generator(context=context, question=question)
    

    Common Patterns

    Pattern 1: Structured Output

    import dspy
    from pydantic import BaseModel, Field
    
    class PersonInfo(BaseModel):
        name: str = Field(description="Full name")
        age: int = Field(description="Age in years")
        occupation: str = Field(description="Current job")
    
    class ExtractPerson(dspy.Signature):
        """Extract person information from text."""
        text: str = dspy.InputField()
        person: PersonInfo = dspy.OutputField()
    
    # dspy.Predict handles typed output fields in DSPy 2.5+ (older releases used dspy.TypedPredictor)
    extractor = dspy.Predict(ExtractPerson)
    result = extractor(text="John Doe is a 35-year-old software engineer.")
    print(result.person.name)  # "John Doe"
    print(result.person.age)   # 35
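
    What typed output buys you can be shown with the standard library alone: the model's raw text is parsed and each field is type-checked before your code touches it. This is a rough sketch under the assumption that the model replies with a JSON object; `parse_person` is a hypothetical helper, and the `PersonInfo` here is a plain dataclass rather than the Pydantic model above.

```python
import json
from dataclasses import dataclass


@dataclass
class PersonInfo:
    name: str
    age: int
    occupation: str


def parse_person(raw: str) -> PersonInfo:
    """Parse a JSON object and check each field's type before building PersonInfo."""
    data = json.loads(raw)
    expected = {"name": str, "age": int, "occupation": str}
    for field, field_type in expected.items():
        if not isinstance(data.get(field), field_type):
            raise ValueError(f"field {field!r} missing or not {field_type.__name__}")
    return PersonInfo(**{field: data[field] for field in expected})


raw_output = '{"name": "John Doe", "age": 35, "occupation": "software engineer"}'
person = parse_person(raw_output)
print(person.name, person.age)  # John Doe 35
```

    A reply with `"age": "35"` (a string) raises `ValueError` instead of silently flowing into later stages.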
    

    Pattern 2: Assertion-Driven Optimization

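    import dspy
    from dspy.primitives.assertions import assert_transform_module, backtrack_handler
    
    def is_number(value) -> bool:
        """Return True if value can be parsed as a float."""
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    
    class MathQA(dspy.Module):
        def __init__(self):
            super().__init__()
            self.solve = dspy.ChainOfThought("problem -> solution: float")
    
        def forward(self, problem):
            solution = self.solve(problem=problem).solution
    
            # Assert solution is numeric; on failure the message is fed back and the step is retried
            dspy.Assert(is_number(solution), "Solution must be a number")
    
            return dspy.Prediction(solution=solution)
    
    # Wrap the module so failed assertions backtrack and retry.
    # dspy.Assert belongs to DSPy's legacy assertions API; newer releases replace it with dspy.Refine / dspy.BestOfN.
    math_qa = assert_transform_module(MathQA(), backtrack_handler)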
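
    The retry behavior behind an assertion can be sketched as a small loop: generate, check, and on failure call again with the failure message as feedback. This is an illustration of the pattern only, not DSPy's backtracking code; `retry_with_feedback` and `fake_solver` are hypothetical names.

```python
from typing import Callable, Optional


def retry_with_feedback(
    generate: Callable[[str, Optional[str]], str],
    problem: str,
    is_valid: Callable[[str], bool],
    message: str,
    max_retries: int = 2,
) -> str:
    """Call generate; on a failed check, call again with the failure message as feedback."""
    feedback = None
    for _ in range(max_retries + 1):
        output = generate(problem, feedback)
        if is_valid(output):
            return output
        feedback = message
    raise ValueError(f"{message} (last output: {output!r})")


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def fake_solver(problem: str, feedback: Optional[str]) -> str:
    """Hypothetical generator: answers in words first, numerically once given feedback."""
    return "36" if feedback else "thirty-six"


solution = retry_with_feedback(
    fake_solver, "What is 15% of 240?", is_number, "Solution must be a number"
)
print(solution)  # 36
```

    The first attempt fails the numeric check, so the second call receives the message and returns a parseable value.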
    

    Pattern 3: Self-Consistency

    import dspy
    from collections import Counter
    
    class ConsistentQA(dspy.Module):
        def __init__(self, num_samples=5):
            super().__init__()
            self.qa = dspy.ChainOfThought("question -> answer")
            self.num_samples = num_samples
    
        def forward(self, question):
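            # Generate multiple answers. DSPy caches identical LM calls, so vary the
            # temperature slightly per sample to obtain distinct completions.
            answers = []
            for i in range(self.num_samples):
                result = self.qa(question=question, config=dict(temperature=0.7 + 0.01 * i))
                answers.append(result.answer)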
    
            # Return most common answer
            most_common = Counter(answers).most_common(1)[0][0]
            return dspy.Prediction(answer=most_common)
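
    Majority voting is sensitive to surface differences such as case or a trailing period, so it may help to normalize answers before counting. A minimal sketch, assuming short free-text answers; `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer after light normalization."""
    normalized = [a.strip().lower().rstrip(".") for a in answers]
    return Counter(normalized).most_common(1)[0][0]


samples = ["Paris", "paris.", " Paris ", "Lyon", "Paris"]
print(majority_vote(samples))  # paris
```

    Without normalization the three spellings of "Paris" would be counted as separate answers and the vote would be split.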
    

    Pattern 4: Retrieval with Reranking

    class RerankedRAG(dspy.Module):
        def __init__(self):
            super().__init__()
            self.retrieve = dspy.Retrieve(k=10)
            self.rerank = dspy.Predict("question, passage -> relevance_score: float")
            self.answer = dspy.ChainOfThought("context, question -> answer")
    
        def forward(self, question):
            # Retrieve candidates
            passages = self.retrieve(question).passages
    
            # Rerank passages
            scored = []
            for passage in passages:
                score = float(self.rerank(question=question, passage=passage).relevance_score)
                scored.append((score, passage))
    
            # Take top 3
            top_passages = [p for _, p in sorted(scored, reverse=True)[:3]]
            context = "\n\n".join(top_passages)
    
            # Generate answer
            return self.answer(context=context, question=question)
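
    Model-produced scores arrive as text and may not always parse, so the ranking step benefits from a defensive conversion and an explicit sort key. The sketch below assumes scores come back as strings; `parse_score` and `top_k` are hypothetical helpers.

```python
def parse_score(raw: str, default: float = 0.0) -> float:
    """Convert a model-produced score to float, falling back to a default."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return default


def top_k(scored: list[tuple[str, str]], k: int = 3) -> list[str]:
    """scored holds (raw_score, passage) pairs; return the k best passages."""
    ranked = sorted(scored, key=lambda pair: parse_score(pair[0]), reverse=True)
    return [passage for _, passage in ranked[:k]]


candidates = [
    ("0.2", "passage A"),
    ("0.9", "passage B"),
    ("high", "passage C"),  # unparseable score falls back to 0.0
    ("0.9", "passage D"),
]
print(top_k(candidates, k=2))  # ['passage B', 'passage D']
```

    Sorting on the score alone also avoids comparing passage text when two scores tie, which is what happens when whole `(score, passage)` tuples are sorted.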
    

    Evaluation and Metrics

    Custom Metrics

    def exact_match(example, pred, trace=None):
        """Exact match metric."""
        return example.answer.lower() == pred.answer.lower()
    
    def f1_score(example, pred, trace=None):
        """F1 score for text overlap."""
        pred_tokens = set(pred.answer.lower().split())
        gold_tokens = set(example.answer.lower().split())
    
        if not pred_tokens or not gold_tokens:
            return 0.0
    
        precision = len(pred_tokens & gold_tokens) / len(pred_tokens)
        recall = len(pred_tokens & gold_tokens) / len(gold_tokens)
    
        if precision + recall == 0:
            return 0.0
    
        return 2 * (precision * recall) / (precision + recall)
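
    A worked example makes the F1 arithmetic easier to check. The function below mirrors the token-overlap computation on plain strings rather than on DSPy examples; `token_f1` is a hypothetical name.

```python
def token_f1(prediction: str, gold: str) -> float:
    """Token-set F1 between a predicted and a gold answer string."""
    pred_tokens = set(prediction.lower().split())
    gold_tokens = set(gold.lower().split())
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Prediction has 4 tokens, 1 of which matches the single gold token:
# precision = 1/4, recall = 1/1, F1 = 2 * 0.25 * 1.0 / 1.25 = 0.4
score = token_f1("the capital is Paris", "Paris")
print(round(score, 2))  # 0.4
```

    A verbose but correct answer scores 0.4 here where exact match would give 0, which is the main reason to prefer F1 for free-text answers.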
    

    Evaluation

    from dspy.evaluate import Evaluate
    
    # Create evaluator
    evaluator = Evaluate(
        devset=testset,
        metric=exact_match,
        num_threads=4,
        display_progress=True
    )
    
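    # Evaluate model
    score = evaluator(qa_system)
    print(f"Score: {score}")
    
    # Compare optimized vs unoptimized.
    # Evaluate reports a percentage (0-100), so format the difference as points rather than with :.2%.
    # Some newer releases return a result object instead; read its .score attribute in that case.
    score_before = evaluator(qa)
    score_after = evaluator(optimized_qa)
    print(f"Improvement: {score_after - score_before:.2f} points")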
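
    Conceptually, an evaluator averages a metric over a dev set. The sketch below shows that computation and a before/after comparison using toy programs; `evaluate`, `baseline`, and `improved` are hypothetical stand-ins, and the dict-based examples replace `dspy.Example`.

```python
def evaluate(program, devset, metric) -> float:
    """Average a metric over a dev set and report it as a percentage."""
    if not devset:
        raise ValueError("devset is empty")
    total = sum(float(metric(ex, program(ex["question"]))) for ex in devset)
    return 100.0 * total / len(devset)


def exact_match(example, prediction) -> bool:
    return example["answer"].lower() == prediction.lower()


devset = [
    {"question": "capital of France?", "answer": "Paris"},
    {"question": "capital of Italy?", "answer": "Rome"},
    {"question": "capital of Spain?", "answer": "Madrid"},
    {"question": "capital of Peru?", "answer": "Lima"},
]


def baseline(question: str) -> str:
    return "Paris"  # always gives the same answer


lookup = {ex["question"]: ex["answer"] for ex in devset[:3]}


def improved(question: str) -> str:
    return lookup.get(question, "unknown")  # knows three of the four answers


before = evaluate(baseline, devset, exact_match)
after = evaluate(improved, devset, exact_match)
print(f"{before:.1f} -> {after:.1f} (+{after - before:.1f} points)")  # 25.0 -> 75.0 (+50.0 points)
```

    Reporting the difference in percentage points keeps the comparison unambiguous when both scores are already percentages.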
    

    Best Practices

    1. Start Simple, Iterate

    # Start with Predict
    qa = dspy.Predict("question -> answer")
    
    # Add reasoning if needed
    qa = dspy.ChainOfThought("question -> answer")
    
    # Add optimization when you have data
    optimized_qa = optimizer.compile(qa, trainset=data)
    

    2. Use Descriptive Signatures

    # ❌ Bad: Vague
    class Task(dspy.Signature):
        input = dspy.InputField()
        output = dspy.OutputField()
    
    # ✅ Good: Descriptive
    class SummarizeArticle(dspy.Signature):
        """Summarize news articles into 3-5 key points."""
        article = dspy.InputField(desc="full article text")
        summary = dspy.OutputField(desc="bullet points, 3-5 items")
    

    3. Optimize with Representative Data

    # Create diverse training examples
    trainset = [
        dspy.Example(question="factual", answer="...").with_inputs("question"),
        dspy.Example(question="reasoning", answer="...").with_inputs("question"),
        dspy.Example(question="calculation", answer="...").with_inputs("question"),
    ]
    
    # Define the metric; score it on a held-out validation set, not on the training examples
    def metric(example, pred, trace=None):
        return example.answer in pred.answer
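
    A substring metric is lenient, and that leniency can produce false positives on short answers. The comparison below contrasts it with a whole-token check; both functions are hypothetical helpers operating on plain strings, and the token check as written only suits single-word gold answers.

```python
def substring_match(gold: str, prediction: str) -> bool:
    """True when the gold answer appears anywhere inside the prediction."""
    return gold in prediction


def token_match(gold: str, prediction: str) -> bool:
    """True when the gold answer appears as a whole token in the prediction."""
    tokens = [t.strip(".,!?") for t in prediction.lower().split()]
    return gold.lower() in tokens


print(substring_match("4", "The answer is 14."))  # True: "4" is a substring of "14"
print(token_match("4", "The answer is 14."))      # False
print(token_match("4", "The answer is 4."))       # True
```

    Which check to use depends on the answer format; for multi-word answers a token-overlap score is usually a better fit than either.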
    

    4. Save and Load Optimized Models

    # Save
    optimized_qa.save("models/qa_v1.json")
    
    # Load
    loaded_qa = dspy.ChainOfThought("question -> answer")
    loaded_qa.load("models/qa_v1.json")
    

    5. Monitor and Debug

    # Run prediction
    result = qa(question="...")
    
    # Print the most recent prompt and response exchanged with the configured LM
    lm.inspect_history(n=1)
    

    Comparison to Other Approaches

    Feature            | Manual Prompting | LangChain | DSPy
    Prompt Engineering | Manual           | Manual    | Automatic
    Optimization       | Trial & error    | None      | Data-driven
    Modularity         | Low              | Medium    | High
    Type Safety        | No               | Limited   | Yes (Signatures)
    Portability        | Low              | Medium    | High
    Learning Curve     | Low              | Medium    | Medium-High

    When to choose DSPy:

    • You have training data or can generate it
    • You need systematic prompt improvement
    • You're building complex multi-stage systems
    • You want to optimize across different LMs

    When to choose alternatives:

    • Quick prototypes (manual prompting)
    • Simple chains with existing tools (LangChain)
    • Custom optimization logic needed

    Resources

    • Documentation: https://dspy.ai
    • GitHub: https://github.com/stanfordnlp/dspy (22k+ stars)
    • Discord: https://discord.gg/XCGy2WDCQB
    • Twitter: @DSPyOSS
    • Paper: "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines"

    See Also

    • references/modules.md - Detailed module guide (Predict, ChainOfThought, ReAct, ProgramOfThought)
    • references/optimizers.md - Optimization algorithms (BootstrapFewShot, MIPRO, BootstrapFinetune)
    • references/examples.md - Real-world examples (RAG, agents, classifiers)

    Repository: davila7/claude-code-templates