    davila7/langsmith-observability
    About

    LLM observability platform for tracing, evaluation, and monitoring...

    SKILL.md

    LangSmith - LLM Observability Platform

    Development platform for debugging, evaluating, and monitoring language models and AI applications.

    When to use LangSmith

    Use LangSmith when:

    • Debugging LLM application issues (prompts, chains, agents)
    • Evaluating model outputs systematically against datasets
    • Monitoring production LLM systems
    • Building regression testing for AI features
    • Analyzing latency, token usage, and costs
    • Collaborating on prompt engineering

    Key features:

    • Tracing: Capture inputs, outputs, latency for all LLM calls
    • Evaluation: Systematic testing with built-in and custom evaluators
    • Datasets: Create test sets from production traces or manually
    • Monitoring: Track metrics, errors, and costs in production
    • Integrations: Works with OpenAI, Anthropic, LangChain, LlamaIndex

    When to use an alternative instead:

    • Weights & Biases: Deep learning experiment tracking, model training
    • MLflow: General ML lifecycle, model registry focus
    • Arize/WhyLabs: ML monitoring, data drift detection

    Quick start

    Installation

    pip install langsmith
    
    # Set environment variables
    export LANGSMITH_API_KEY="your-api-key"
    export LANGSMITH_TRACING=true
    

    Basic tracing with @traceable

    from langsmith import traceable
    from openai import OpenAI
    
    client = OpenAI()
    
    @traceable
    def generate_response(prompt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content
    
    # Automatically traced to LangSmith
    result = generate_response("What is machine learning?")
    

    OpenAI wrapper (automatic tracing)

    from langsmith.wrappers import wrap_openai
    from openai import OpenAI
    
    # Wrap client for automatic tracing
    client = wrap_openai(OpenAI())
    
    # All calls automatically traced
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    

    Core concepts

    Runs and traces

    A run is a single execution unit (LLM call, chain, tool). Runs form hierarchical traces showing the full execution flow.

    from langsmith import traceable
    
    @traceable(run_type="chain")
    def process_query(query: str) -> str:
        # Parent run
        context = retrieve_context(query)  # Child run
        response = generate_answer(query, context)  # Child run
        return response
    
    @traceable(run_type="retriever")
    def retrieve_context(query: str) -> list:
        return vector_store.search(query)
    
    @traceable(run_type="llm")
    def generate_answer(query: str, context: list) -> str:
        return llm.invoke(f"Context: {context}\n\nQuestion: {query}")
    

    Projects

    Projects organize related runs. Set via environment or code:

    import os
    os.environ["LANGSMITH_PROJECT"] = "my-project"
    
    # Or per-function
    @traceable(project_name="my-project")
    def my_function():
        pass
    

    Client API

    from langsmith import Client
    
    client = Client()
    
    # List runs
    runs = list(client.list_runs(
        project_name="my-project",
        filter='eq(status, "success")',
        limit=100
    ))
    
    # Get run details
    run = client.read_run(run_id="...")
    
    # Create feedback
    client.create_feedback(
        run_id="...",
        key="correctness",
        score=0.9,
        comment="Good answer"
    )
    

    Datasets and evaluation

    Create dataset

    from langsmith import Client
    
    client = Client()
    
    # Create dataset
    dataset = client.create_dataset("qa-test-set", description="QA evaluation")
    
    # Add examples
    client.create_examples(
        inputs=[
            {"question": "What is Python?"},
            {"question": "What is ML?"}
        ],
        outputs=[
            {"answer": "A programming language"},
            {"answer": "Machine learning"}
        ],
        dataset_id=dataset.id
    )
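
    Datasets can also be seeded from production traffic (see the feature list above and best practice 4 below). A minimal sketch, reusing the Client calls already shown; the dataset name and filter are illustrative:

    from langsmith import Client
    
    client = Client()
    dataset = client.create_dataset(
        "prod-samples",
        description="Examples mined from production traces"
    )
    
    # Copy inputs/outputs from recent successful production runs into the dataset.
    # Field names follow whatever your traced functions receive and return.
    runs = list(client.list_runs(
        project_name="my-project",
        filter='eq(status, "success")',
        limit=20
    ))
    client.create_examples(
        inputs=[run.inputs for run in runs],
        outputs=[run.outputs for run in runs],
        dataset_id=dataset.id
    )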
    

    Run evaluation

    from langsmith import evaluate
    
    def my_model(inputs: dict) -> dict:
        # Your model logic
        return {"answer": generate_answer(inputs["question"])}
    
    def correctness_evaluator(run, example):
        prediction = run.outputs["answer"]
        reference = example.outputs["answer"]
        score = 1.0 if reference.lower() in prediction.lower() else 0.0
        return {"key": "correctness", "score": score}
    
    results = evaluate(
        my_model,
        data="qa-test-set",
        evaluators=[correctness_evaluator],
        experiment_prefix="v1"
    )
    
    print(f"Average score: {results.aggregate_metrics['correctness']}")
    

    Built-in evaluators

    from langsmith.evaluation import LangChainStringEvaluator
    
    # Use LangChain evaluators
    results = evaluate(
        my_model,
        data="qa-test-set",
        evaluators=[
            LangChainStringEvaluator("qa"),
            LangChainStringEvaluator("cot_qa")
        ]
    )
    

    Advanced tracing

    Tracing context

    from langsmith import tracing_context
    
    with tracing_context(
        project_name="experiment-1",
        tags=["production", "v2"],
        metadata={"version": "2.0"}
    ):
        # All traceable calls inherit context
        result = my_function()
    

    Manual runs

    from langsmith import trace
    
    with trace(
        name="custom_operation",
        run_type="tool",
        inputs={"query": "test"}
    ) as run:
        result = do_something()
        run.end(outputs={"result": result})
    

    Process inputs/outputs

    def sanitize_inputs(inputs: dict) -> dict:
        if "password" in inputs:
            inputs["password"] = "***"
        return inputs
    
    @traceable(process_inputs=sanitize_inputs)
    def login(username: str, password: str):
        return authenticate(username, password)
    

    Sampling

    import os
    os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"  # 10% sampling
    

    LangChain integration

    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate
    
    # Tracing enabled automatically with LANGSMITH_TRACING=true
    llm = ChatOpenAI(model="gpt-4o")
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("user", "{input}")
    ])
    
    chain = prompt | llm
    
    # All chain runs traced automatically
    response = chain.invoke({"input": "Hello!"})
    

    Production monitoring

    Hub prompts

    from langsmith import Client
    
    client = Client()
    
    # Pull prompt from hub
    prompt = client.pull_prompt("my-org/qa-prompt")
    
    # Use in application
    result = prompt.invoke({"question": "What is AI?"})
    

    Async client

    from langsmith import AsyncClient
    
    async def main():
        client = AsyncClient()
    
        runs = []
        async for run in client.list_runs(project_name="my-project"):
            runs.append(run)
    
        return runs
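
    A short usage note: the coroutine needs an event loop, so drive it with asyncio from synchronous code:

    import asyncio
    
    # Run the async client from a synchronous entry point
    runs = asyncio.run(main())
    print(f"Fetched {len(runs)} runs")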
    

    Feedback collection

    from langsmith import Client
    
    client = Client()
    
    # Collect user feedback
    def record_feedback(run_id: str, user_rating: int, comment: str | None = None):
        client.create_feedback(
            run_id=run_id,
            key="user_rating",
            score=user_rating / 5.0,  # Normalize to 0-1
            comment=comment
        )
    
    # In your application
    record_feedback(run_id="...", user_rating=4, comment="Helpful response")
    

    Testing integration

    Pytest integration

    from langsmith import test
    
    @test
    def test_qa_accuracy():
        result = my_qa_function("What is Python?")
        assert "programming" in result.lower()
    

    Evaluation in CI/CD

    from langsmith import evaluate
    
    def run_evaluation():
        results = evaluate(
            my_model,
            data="regression-test-set",
            evaluators=[accuracy_evaluator]
        )
    
        # Fail CI if accuracy drops
        assert results.aggregate_metrics["accuracy"] >= 0.9, \
            f"Accuracy {results.aggregate_metrics['accuracy']} below threshold"
    

    Best practices

    1. Structured naming - Use consistent project/run naming conventions
    2. Add metadata - Include version, environment, and user info (see the sketch after this list)
    3. Sample in production - Use sampling rate to control volume
    4. Create datasets - Build test sets from interesting production cases
    5. Automate evaluation - Run evaluations in CI/CD pipelines
    6. Monitor costs - Track token usage and latency trends
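
    For example, practices 1-3 can be combined on a single decorated entry point. A minimal sketch; the name, tags, metadata values, and sampling rate are placeholders, not required values:

    import os
    from langsmith import traceable
    
    # Keep trace volume manageable in production (practice 3)
    os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.25"
    
    @traceable(
        run_type="chain",
        name="checkout-assistant",                    # consistent naming (practice 1)
        tags=["production", "v2"],                    # filterable tags
        metadata={"version": "2.0", "env": "prod"},   # version/environment info (practice 2)
    )
    def answer(question: str) -> str:
        return generate_response(question)  # generate_response defined in Quick start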

    Common issues

    Traces not appearing:

    import os
    # Ensure tracing is enabled
    os.environ["LANGSMITH_TRACING"] = "true"
    os.environ["LANGSMITH_API_KEY"] = "your-key"
    
    # Verify connection
    from langsmith import Client
    client = Client()
    print(client.list_projects())  # Should work
    

    High latency from tracing:

    # Enable background batching (default)
    from langsmith import Client
    client = Client(auto_batch_tracing=True)
    
    # Or reduce volume with sampling
    import os
    os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"
    

    Large payloads:

    # Hide sensitive/large fields
    @traceable(
        process_inputs=lambda x: {k: v for k, v in x.items() if k != "large_field"}
    )
    def my_function(data):
        pass
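
    The traceable decorator also accepts a process_outputs hook (verify against your installed SDK version), useful when the oversized payload is in the return value rather than the arguments. A sketch with a hypothetical retriever:

    # Truncate a bulky field before it is sent to LangSmith
    def trim_outputs(outputs: dict) -> dict:
        if "documents" in outputs:
            outputs["documents"] = f"<{len(outputs['documents'])} documents omitted>"
        return outputs
    
    @traceable(process_outputs=trim_outputs)
    def retrieve(query: str) -> dict:
        return {"documents": search(query)}  # search() is a placeholder retriever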
    

    References

    • Advanced Usage - Custom evaluators, distributed tracing, hub prompts
    • Troubleshooting - Common issues, debugging, performance

    Resources

    • Documentation: https://docs.smith.langchain.com
    • Python SDK: https://github.com/langchain-ai/langsmith-sdk
    • Web App: https://smith.langchain.com
    • Version: 0.2.0+
    • License: MIT