
    vasilyu1983/ai-agents
    AI & ML

    About

    Production-grade AI agent patterns with MCP integration, agentic RAG, handoff orchestration, multi-layer guardrails, observability, token economics, ROI frameworks, and build-vs-not decision guidance...

    SKILL.md

    AI Agents Development — Production Skill Hub

    Modern Best Practices (March 2026): deterministic control flow, bounded tools, auditable state, MCP-based tool integration, handoff-first orchestration, multi-layer guardrails, OpenTelemetry tracing, and human-in-the-loop controls (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).

    This skill provides production-ready operational patterns for designing, building, evaluating, and deploying AI agents. It centralizes procedures, checklists, decision rules, and templates used across RAG agents, tool-using agents, OS agents, and multi-agent systems.

    No theory. No narrative. Only operational steps and templates.


    When to Use This Skill

    Codex should activate this skill whenever the user asks for:

    • Designing an agent (LLM-based, tool-based, OS-based, or multi-agent).
    • Scoping capability maturity and rollout risk for new agent behaviors.
    • Creating action loops, plans, workflows, or delegation logic.
    • Writing tool definitions, MCP tools, schemas, or validation logic.
    • Generating RAG pipelines, retrieval modules, or context injection.
    • Building memory systems (session, long-term, episodic, task).
    • Creating evaluation harnesses, observability plans, or safety gates.
    • Preparing CI/CD, rollout, deployment, or production operational specs.
    • Producing any template in /references/ or /assets/.
    • Implementing MCP servers or integrating Model Context Protocol.
    • Setting up agent handoffs and orchestration patterns.
    • Configuring multi-layer guardrails and safety controls.
    • Evaluating whether to build an agent (build vs not decision).
    • Calculating agent ROI, token costs, or cost/benefit analysis.
    • Assessing hallucination risk and mitigation strategies.
    • Deciding when to kill an agent project (kill triggers).

    For depth on prompt scaffolds, retrieval tuning, or security, use the skills listed under Scope Boundaries below.

    Scope Boundaries (Use These Skills for Depth)

    • Prompt scaffolds & structured outputs → ai-prompt-engineering
    • RAG retrieval & chunking → ai-rag
    • Search tuning (BM25/HNSW/hybrid) → ai-rag
    • Security/guardrails → ai-mlops
    • Inference optimization → ai-llm-inference

    Default Workflow (Production)

    • Pick an architecture with the Decision Tree (below); default to workflow/FSM/DAG for production.
    • Draft an agent spec with assets/core/agent-template-standard.md (or assets/core/agent-template-quick.md).
    • Specify tools and handoffs with JSON Schema using assets/tools/tool-definition.md and references/api-contracts-for-agents.md.
    • Add retrieval only when needed; start with assets/rag/rag-basic.md and scale via assets/rag/rag-advanced.md + references/rag-patterns.md.
    • Add eval + telemetry early via references/evaluation-and-observability.md.
    • Run the go/no-go gate with assets/checklists/agent-safety-checklist.md.
    • Plan deploy/rollback and safety controls via references/deployment-ci-cd-and-safety.md.
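
The tool-specification step above can be sketched as a schema plus an argument check. This is a minimal illustration, not the contents of assets/tools/tool-definition.md: the `search_tickets` tool, its fields, and the `validate_args` helper are hypothetical, and the validator covers only a small subset of JSON Schema (`required`, `type`, `enum`, `additionalProperties: false`). A production setup would more likely use a full JSON Schema validator.

```python
from typing import Any

# Hypothetical tool definition (illustrative name and fields only).
SEARCH_TICKETS_TOOL: dict[str, Any] = {
    "name": "search_tickets",
    "description": "Search support tickets by status.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "status": {"type": "string", "enum": ["open", "closed"]},
            "limit": {"type": "integer"},
        },
        "required": ["status"],
        "additionalProperties": False,
    },
}

_PY_TYPES = {"string": str, "integer": int, "boolean": bool}


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return validation errors for tool arguments (empty list means valid)."""
    errors: list[str] = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required field: {key}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        expected = _PY_TYPES.get(spec.get("type", ""))
        # bool is a subclass of int in Python, so exclude it from "integer".
        type_ok = expected is None or (
            isinstance(value, expected)
            and not (expected is int and isinstance(value, bool))
        )
        if not type_ok:
            errors.append(f"wrong type for {key}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors
```

Rejecting the call before the tool runs keeps malformed model output from reaching systems with side effects.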

    Quick Reference

    | Agent Type | Core Control Flow | Interfaces | MCP/A2A | When to Use |
    |---|---|---|---|---|
    | Workflow Agent (FSM/DAG) | Explicit state transitions | State store, tool allowlist | MCP | Deterministic, auditable flows |
    | Tool-Using Agent | Route → call tool → observe | Tool schemas, retries/timeouts | MCP | External actions (APIs, DB, files) |
    | RAG Agent | Retrieve → answer → cite | Retriever, citations, ACLs | MCP | Knowledge-grounded responses |
    | Planner/Executor | Plan → execute steps with caps | Planner prompts, step budget | MCP (+A2A) | Multi-step problems with bounded autonomy |
    | Multi-Agent (Orchestrated) | Delegate → merge → validate | Handoff contracts, eval gates | A2A | Specialization with explicit handoffs |
    | OS Agent | Observe UI → act → verify | Sandbox, UI grounding | MCP | Desktop/browser control under strict guardrails |
    | Code/SWE Agent | Branch → edit → test → PR | Repo access, CI gates | MCP | Coding tasks with review/merge controls |

    Framework Selection (March 2026)

    Tier 1 — Production-Grade

    | Framework | Architecture | Best For | Languages | Ease |
    |---|---|---|---|---|
    | LangGraph | Graph-based, stateful | Enterprise, compliance, auditability | Python, JS | Medium |
    | Claude Agent SDK | Event-driven, tool-centric | Anthropic ecosystem, Computer Use, MCP-native | Python, TS | Easy |
    | OpenAI Agents SDK | Tool-centric, lightweight | Fast prototyping, OpenAI ecosystem | Python | Easy |
    | Google ADK | Code-first, multi-language | Gemini/Vertex AI, polyglot teams | Python, TS, Go, Java | Medium |
    | Pydantic AI | Type-safe, graph FSM | Production Python, type safety, MCP+A2A native | Python | Medium |
    | MS Agent Framework | Kernel + multi-agent | Enterprise Azure, .NET/Java teams | Python, .NET, Java | Medium |

    Tier 2 — Specialized

    | Framework | Architecture | Best For | Languages | Ease |
    |---|---|---|---|---|
    | LlamaIndex | Event-driven workflows | RAG-native agents, retrieval-heavy | Python, TS | Medium |
    | CrewAI | Role-based crews | Team workflows, content generation | Python | Easiest |
    | Mastra | Vercel AI SDK-based | TypeScript/Next.js teams | TypeScript | Easy |
    | SmolAgents | Code-first, minimalist | Lightweight, fewer LLM calls | Python | Easy |
    | Agno | FastAPI-native runtime | Production Python, 100+ integrations | Python | Easy |
    | AWS Bedrock Agents | Managed infrastructure | Enterprise AWS, knowledge bases | Python | Easy |

    Tier 3 — Niche

    | Framework | Niche |
    |---|---|
    | Haystack | Enterprise RAG+agents pipeline (Airbus, NVIDIA) |
    | DSPy | Declarative optimization: compiles programs into prompts/weights |

    See references/modern-best-practices.md for detailed comparison and selection guide.

    Framework Deep Dives

    • Claude Agent SDK - references/claude-agent-sdk-patterns.md: Agent definition, built-in tools (Bash, TextEditor, Computer), MCP servers, guardrails, multi-agent, streaming events
    • Pydantic AI - references/pydantic-ai-patterns.md: Type-safe agents, MCP toolsets, native A2A, pydantic-graph FSM, durable execution, HITL, TestModel testing

    Decision Tree: Choosing Agent Architecture

    What does the agent need to do?
        ├─ Answer questions from knowledge base?
        │   ├─ Simple lookup? → RAG Agent (LangChain/LlamaIndex + vector DB)
        │   └─ Complex multi-step? → Agentic RAG (iterative retrieval + reasoning)
        │
        ├─ Perform external actions (APIs, tools, functions)?
        │   ├─ 1-3 tools, linear flow? → Tool-Using Agent (LangGraph + MCP)
        │   └─ Complex workflows, branching? → Planning Agent (ReAct/Plan-Execute)
        │
        ├─ Write/modify code autonomously?
        │   ├─ Single file edits? → Tool-Using Agent with code tools
        │   └─ Multi-file, issue resolution? → Code/SWE Agent (HyperAgent pattern)
        │
        ├─ Delegate tasks to specialists?
        │   ├─ Fixed workflow? → Multi-Agent Sequential (A → B → C)
        │   ├─ Manager-Worker? → Multi-Agent Hierarchical (Manager + Workers)
        │   └─ Dynamic routing? → Multi-Agent Group Chat (collaborative)
        │
        ├─ Control desktop/browser?
        │   └─ OS Agent (Anthropic Computer Use + MCP for system access)
        │
        └─ Hybrid (combination of above)?
            └─ Planning Agent that coordinates:
                - Tool-using for actions (MCP)
                - RAG for knowledge (MCP)
                - Multi-agent for delegation (A2A)
                - Code agents for implementation
    

    Protocol Selection:

    • Use MCP for: Tool access, data retrieval, single-agent integration
    • Use A2A for: Agent-to-agent handoffs, multi-agent coordination, task delegation
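
For the agent-to-agent side, a handoff is easier to audit when it is an explicit, validated record. The sketch below is a generic envelope check and does not reproduce the A2A wire format: the `Handoff` fields, the `ALLOWED_ROUTES` table, and the agent names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical routing table: which agent may hand off to which.
ALLOWED_ROUTES = {
    "triage": {"billing", "support"},
    "billing": {"triage"},
}


@dataclass(frozen=True)
class Handoff:
    """Illustrative handoff envelope; field names are assumptions."""
    task_id: str
    from_agent: str
    to_agent: str
    reason: str
    payload: dict[str, Any] = field(default_factory=dict)


def validate_handoff(handoff: Handoff) -> list[str]:
    """Return contract violations for a handoff (empty list means valid)."""
    errors = []
    if not handoff.task_id:
        errors.append("task_id is required")
    if handoff.to_agent not in ALLOWED_ROUTES.get(handoff.from_agent, set()):
        errors.append(f"route not allowed: {handoff.from_agent} -> {handoff.to_agent}")
    if not handoff.reason:
        errors.append("reason is required for audit")
    return errors
```

An orchestrator would call `validate_handoff` before delivering the task and refuse or escalate when the list is non-empty.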

    Framework Selection (after choosing architecture):

    Which framework?
        ├─ MVP/Prototyping?
        │   ├─ Python → OpenAI Agents SDK or CrewAI
        │   └─ TypeScript → Mastra or Claude Agent SDK
        │
        ├─ Production →
        │   ├─ Auditability/compliance? → LangGraph
        │   ├─ Type safety + MCP/A2A native? → Pydantic AI
        │   ├─ Anthropic models + Computer Use? → Claude Agent SDK
        │   ├─ Google Cloud / Gemini? → Google ADK
        │   ├─ Azure / .NET / Java? → MS Agent Framework
        │   ├─ AWS managed? → Bedrock Agents
        │   └─ RAG-heavy? → LlamaIndex Workflows
        │
        ├─ Minimalist / Research →
        │   ├─ Fewest LLM calls? → SmolAgents
        │   └─ Optimize prompts automatically? → DSPy
        │
        └─ Enterprise pipeline → Haystack
    

    Core Concepts (Vendor-Agnostic)

    Control Flow Options

    • Reactive: direct tool routing per user request (fast, brittle if unbounded).
    • Workflow (FSM/DAG): explicit states and transitions (default for deterministic production).
    • Planner/Executor: plan with strict budgets, then execute step-by-step (use when branching is unavoidable).
    • Orchestrated multi-agent: separate roles with validated handoffs (use when specialization is required).
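
The workflow (FSM) option can be sketched as a transition table plus a serializable run state. The state names and the `RunState` class are hypothetical, chosen for illustration; a real agent would attach model and tool calls to each state.

```python
import json
from dataclasses import dataclass, field

# Allowed transitions: current state -> set of legal next states.
TRANSITIONS = {
    "received": {"classified"},
    "classified": {"answered", "escalated"},
    "answered": {"done"},
    "escalated": {"done"},
    "done": set(),
}


@dataclass
class RunState:
    """Explicit, serializable state for one agent run."""
    state: str = "received"
    history: list[str] = field(default_factory=lambda: ["received"])

    def advance(self, next_state: str) -> None:
        # Reject any transition the table does not list.
        if next_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {next_state}")
        self.state = next_state
        self.history.append(next_state)

    def to_json(self) -> str:
        return json.dumps({"state": self.state, "history": self.history})

    @classmethod
    def from_json(cls, raw: str) -> "RunState":
        data = json.loads(raw)
        return cls(state=data["state"], history=list(data["history"]))
```

Because every transition is checked and the history is stored with the state, a run can be persisted, replayed, and audited.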

    Memory Types (Tradeoffs)

    • Short-term (session): cheap, ephemeral; best for conversational continuity.
    • Episodic (task): scoped to a case/ticket; supports audit and replay.
    • Long-term (profile/knowledge): high risk; requires consent, retention limits, and provenance.
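
The tradeoffs above can be enforced at write time. This is a minimal in-memory sketch; `MemoryRecord`, `MemoryStore`, and the scope names are hypothetical, and the current time is passed in explicitly so expiry is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    key: str
    value: str
    scope: str                    # "session", "episodic", or "long_term"
    source: str                   # provenance: where the value came from
    created_at: float             # Unix seconds
    ttl_seconds: Optional[float]  # None means no expiry
    consent: bool = False


class MemoryStore:
    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def write(self, record: MemoryRecord) -> None:
        # Long-term memory is the high-risk tier: require consent,
        # a retention limit, and provenance before storing anything.
        if record.scope == "long_term":
            if not record.consent:
                raise PermissionError("long-term memory requires consent")
            if record.ttl_seconds is None:
                raise ValueError("long-term memory requires a retention limit")
            if not record.source:
                raise ValueError("long-term memory requires provenance")
        self._records.append(record)

    def read(self, scope: str, now: float) -> list[MemoryRecord]:
        # Return only unexpired records in the requested scope.
        return [
            r for r in self._records
            if r.scope == scope
            and (r.ttl_seconds is None or now - r.created_at < r.ttl_seconds)
        ]
```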

    Failure Handling (Production Defaults)

    • Classify errors: retriable vs fatal vs needs-human.
    • Bound retries: max attempts, backoff, jitter; avoid retry storms.
    • Fallbacks: degraded mode, smaller model, cached answers, or safe refusal.
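
These defaults combine into one small wrapper. The sketch assumes three hypothetical exception classes for the error taxonomy and uses exponential backoff with full jitter, which is one common choice and not necessarily the one in the referenced templates.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetriableError(Exception):
    """Transient failure, e.g. a timeout or rate limit."""


class FatalError(Exception):
    """Failure that retrying cannot fix."""


class NeedsHumanError(Exception):
    """Failure that must be escalated; never retried or masked here."""


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 8.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Full-jitter delays: the i-th delay is uniform in [0, min(cap, base * 2**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(max_attempts - 1)]


def call_with_retries(fn: Callable[[], T], fallback: Callable[[], T],
                      max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Run fn with bounded retries, then fall back to a degraded answer."""
    delays = backoff_delays(max_attempts, rng=rng)
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetriableError:
            if attempt < max_attempts - 1:
                sleep(delays[attempt])
        except FatalError:
            break  # retrying will not help; go straight to the fallback
    return fallback()
```

`NeedsHumanError` is deliberately not caught, so it propagates to whatever layer owns escalation.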

    Do / Avoid

    Do

    • Do keep state explicit and serializable (replayable runs).
    • Do enforce tool allowlists, scopes, and idempotency for side effects.
    • Do log traces/metrics for model calls and tool calls (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).

    Avoid

    • Avoid runaway autonomy (unbounded loops or step counts).
    • Avoid hidden state (implicit memory that cannot be audited).
    • Avoid untrusted tool outputs without validation/sanitization.
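
Several of these rules can live in a single choke point for tool calls. The `ToolGateway` class below is a hypothetical sketch: it enforces an allowlist, deduplicates side effects by idempotency key, and validates tool output before returning it. The in-memory cache is an assumption for illustration; a real deployment would need durable storage for the keys.

```python
from typing import Any, Callable


class ToolGateway:
    """Single entry point for tool calls (illustrative, in-memory only)."""

    def __init__(self, allowlist: dict[str, Callable[..., Any]]) -> None:
        self._tools = dict(allowlist)
        self._results: dict[str, Any] = {}  # idempotency key -> cached result

    def call(self, name: str, idempotency_key: str,
             validate: Callable[[Any], bool], **kwargs: Any) -> Any:
        if name not in self._tools:
            raise PermissionError(f"tool not allowlisted: {name}")
        # A repeated key returns the earlier result instead of
        # repeating the side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._tools[name](**kwargs)
        # Treat tool output as untrusted until it passes validation.
        if not validate(result):
            raise ValueError(f"tool output failed validation: {name}")
        self._results[idempotency_key] = result
        return result
```

Note that a validation failure is raised after the tool has already run, so the validator guards what the agent sees, not whether the side effect happened.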

    Navigation: Economics & Decision Framework

    Should You Build an Agent?

    • Build vs Not Decision Framework - references/build-vs-not-decision.md
      • 10-second test (volume, cost, error tolerance)
      • Red flags and immediate disqualifiers
      • Alternatives to agents (usually better)
      • Full decision tree with stage gates
      • Kill triggers during development and post-launch
      • Pre-build validation checklist

    Agent ROI & Token Economics

    • Agent Economics - references/agent-economics.md
      • Token pricing by model (January 2026)
      • Cost per task by agent type
      • ROI calculation formula and tiers
      • Hallucination cost framework and mitigation ROI
      • Investment decision matrix
      • Monthly tracking dashboard
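
As a rough illustration of per-task token cost and ROI, the sketch below uses a generic (benefit - cost) / cost formula. It is not the formula from references/agent-economics.md, and the function names and parameters are hypothetical. Prices are passed in by the caller because they change often and should be checked against current vendor pricing.

```python
def cost_per_task(input_tokens: int, output_tokens: int,
                  price_in_per_mtok: float, price_out_per_mtok: float,
                  llm_calls: int = 1) -> float:
    """Token cost in USD for one task, assuming every call has the same size."""
    per_call = (input_tokens / 1_000_000 * price_in_per_mtok
                + output_tokens / 1_000_000 * price_out_per_mtok)
    return per_call * llm_calls


def monthly_roi(tasks_per_month: int, human_cost_per_task: float,
                agent_cost_per_task: float, fixed_monthly_cost: float) -> float:
    """Generic ROI: (benefit - cost) / cost, where benefit is displaced human cost."""
    benefit = tasks_per_month * human_cost_per_task
    cost = tasks_per_month * agent_cost_per_task + fixed_monthly_cost
    return (benefit - cost) / cost
```

Multi-step agents multiply the per-call cost by the number of model calls, which is why step budgets matter for economics as well as safety.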

    Navigation: AI Engine Layers

    Five-layer architecture for production agent systems. Start with the overview, then drill into layer-specific patterns.

    • AI Engine Architecture - references/ai-engine-layers.md: 5-layer composition model, layer interaction matrix, implementation phases

    • Context Graph Patterns - references/context-graph-patterns.md: Node/edge schema, traversal patterns, graph-RAG, memory tiers, conflict detection

    • Inbox Engine Patterns - references/inbox-engine-patterns.md: Event-driven intake, signal classification, deduplication, priority routing, dead letter

    • Knowledge Base Architecture - assets/knowledge-base/kb-architecture.md: Unified KB schema (vector + graph + doc index), provenance, freshness, multi-tenant

    • Action Graph → covered by references/operational-patterns.md + references/agent-operations-best-practices.md

    • Data Agent → covered by ../ai-rag/SKILL.md + references/rag-patterns.md


    Navigation: Core Concepts & Patterns

    Governance & Maturity

    • Agent Maturity & Governance - references/agent-maturity-governance.md
      • Capability maturity levels (L0-L4)
      • Identity & policy enforcement
      • Fleet control and registry management
      • Deprecation rules and kill switches

    Modern Best Practices

    • Modern Best Practices - references/modern-best-practices.md
      • Model Context Protocol (MCP)
      • Agent-to-Agent Protocol (A2A)
      • Agentic RAG (Dynamic Retrieval)
      • Multi-layer guardrails
      • LangGraph over LangChain
      • OpenTelemetry for agents

    Context Management

    • Context Engineering - references/context-engineering.md
      • Progressive disclosure
      • Session management
      • Memory provenance
      • Retrieval timing
      • Multimodal context

    Core Operational Patterns

    • Operational Patterns - references/operational-patterns.md
      • Agent loop pattern (PLAN → ACT → OBSERVE → UPDATE)
      • OS agent action loop
      • RAG pipeline pattern
      • Tool specification
      • Memory system pattern
      • Multi-agent workflow
      • Safety & guardrails
      • Observability
      • Evaluation patterns
      • Deployment & CI/CD
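
The PLAN → ACT → OBSERVE → UPDATE loop listed above can be sketched with a hard step budget. This is not the pattern text from references/operational-patterns.md: `LoopState`, `run_loop`, and the `plan`/`act` callables are hypothetical stand-ins for a model call and a tool call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopState:
    goal: str
    observations: list[str] = field(default_factory=list)
    done: bool = False


def run_loop(state: LoopState,
             plan: Callable[[LoopState], Optional[str]],
             act: Callable[[str], str],
             max_steps: int = 5) -> LoopState:
    """Run the agent loop for at most max_steps iterations."""
    for _ in range(max_steps):
        action = plan(state)                      # PLAN: pick the next action
        if action is None:                        # planner reports completion
            state.done = True
            return state
        observation = act(action)                 # ACT: execute the action
        state.observations.append(observation)    # OBSERVE + UPDATE state
    # Budget exhausted: done stays False so the caller can escalate.
    return state
```

Returning with `done` still false makes budget exhaustion an explicit outcome that the caller must handle.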

    Navigation: Protocol Implementation

    • MCP Practical Guide - references/mcp-practical-guide.md: Building MCP servers, tool integration, and standardized data access

    • MCP Server Builder - references/mcp-server-builder.md: End-to-end checklist for workflow-focused MCP servers (design → build → test)

    • A2A Handoff Patterns - references/a2a-handoff-patterns.md: Agent-to-agent communication, task delegation, and coordination protocols

    • Protocol Decision Tree - references/protocol-decision-tree.md: When to use MCP vs A2A, decision framework, and selection criteria


    Navigation: Agent Capabilities

    • Agent Operations - references/agent-operations-best-practices.md: Action loops, planning, observation, and execution patterns

    • RAG Patterns - references/rag-patterns.md: Contextual retrieval, agentic RAG, and hybrid search strategies

    • Memory Systems - references/memory-systems.md: Session, long-term, episodic, and task memory architectures

    • Tool Design & Validation - references/tool-design-specs.md: Tool schemas, validation, error handling, and MCP integration

    Skill Packaging & Sharing

    • Skill Lifecycle - references/skill-lifecycle.md: Scaffold, validate, package, and share skills with teams (Slack-ready)

    • API Contracts for Agents - references/api-contracts-for-agents.md: Request/response envelopes, safety gates, streaming/async patterns, error taxonomy

    • Multi-Agent Patterns - references/multi-agent-patterns.md: Manager-worker, sequential, handoff, and group chat orchestration

    • OS Agent Capabilities - references/os-agent-capabilities.md: Desktop automation, UI grounding, and computer use patterns

    • Code/SWE Agents - references/code-swe-agents.md: SE 3.0 paradigm, autonomous coding patterns, SWE-Bench, HyperAgent architecture

    Framework-Specific Patterns

    • Pydantic AI Patterns - references/pydantic-ai-patterns.md: Type-safe agents, MCP toolsets (Stdio/SSE/StreamableHTTP), A2A via to_a2a(), pydantic-graph FSM, durable execution, TestModel testing

    Navigation: Production Operations

    • Evaluation & Observability - references/evaluation-and-observability.md: OpenTelemetry GenAI, metrics, LLM-as-judge, and monitoring

    • Deployment, CI/CD & Safety - references/deployment-ci-cd-and-safety.md: Multi-layer guardrails, HITL controls, NIST AI RMF, production checklists

    • Agent Debugging Patterns - references/agent-debugging-patterns.md: Systematic debugging for agentic systems (trace analysis, tool call failures, loop detection, state corruption)

    • Voice & Multimodal Agents - references/voice-multimodal-agents.md: Voice-first and multimodal agent patterns (speech pipelines, vision grounding, cross-modal orchestration)

    • Guardrails Implementation - references/guardrails-implementation.md: Multi-layer guardrail patterns (input/output validation, content filtering, PII detection, cost caps)
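
The guardrail layers named in the last entry can be sketched as three checks around a model call: input validation, a cost cap, and output redaction. Everything here is a deliberately naive placeholder (a one-phrase blocklist, an email-only regex, an estimated cost supplied by the caller), and the names `check_input`, `redact_pii`, `CostCap`, and `guarded_call` are hypothetical.

```python
import re
from typing import Callable

# Naive email pattern; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Placeholder blocklist; real input filters are much more involved.
BLOCKED_PHRASES = ("ignore previous instructions",)


class CostCap:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, usd: float) -> None:
        # Refuse the charge before spending, not after.
        if self.spent_usd + usd > self.limit_usd:
            raise RuntimeError("cost cap exceeded")
        self.spent_usd += usd


def check_input(text: str, max_chars: int = 2000) -> list[str]:
    reasons = []
    if len(text) > max_chars:
        reasons.append("input too long")
    lowered = text.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            reasons.append(f"blocked phrase: {phrase}")
    return reasons


def redact_pii(text: str) -> str:
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def guarded_call(user_text: str, model: Callable[[str], str],
                 cap: CostCap, est_cost_usd: float) -> str:
    reasons = check_input(user_text)        # layer 1: input validation
    if reasons:
        return "Request refused: " + "; ".join(reasons)
    cap.charge(est_cost_usd)                # layer 2: cost cap
    return redact_pii(model(user_text))     # layer 3: output filtering
```

Refused inputs never reach the model, so they are not charged against the cap.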


    Navigation: Templates (Copy-Paste Ready)

    Checklists

    • Agent Design & Safety Checklist - assets/checklists/agent-safety-checklist.md: Go/No-Go safety gate covering permissions, HITL triggers, eval gates, observability, rollback

    Core Agent Templates

    • Standard Agent Template - assets/core/agent-template-standard.md: Full production spec (memory, tools, RAG, evaluation, observability, safety)

    • Specialized Agent Template - assets/core/agent-template-specialized.md: Domain-specific agents with custom capabilities and constraints

    • Quick Agent Template - assets/core/agent-template-quick.md: Minimal viable agent for rapid prototyping

    RAG Templates

    • Basic RAG - assets/rag/rag-basic.md: Simple retrieval-augmented generation pipeline

    • Advanced RAG - assets/rag/rag-advanced.md: Contextual retrieval, reranking, and agentic RAG patterns

    • Hybrid Retrieval - assets/rag/hybrid-retrieval.md: Semantic + keyword search with BM25 fusion
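
One common way to fuse a keyword (BM25) ranking with a semantic ranking is reciprocal rank fusion (RRF), sketched below. The hybrid-retrieval template may use a different fusion method; RRF is shown only because it needs ranks, not comparable scores. The function name and the constant `k = 60` are conventional choices, not values taken from the template.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; each list is ordered best-first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for a document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Documents that appear near the top of both lists outrank documents that appear high in only one.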

    Tool Templates

    • Tool Definition - assets/tools/tool-definition.md: MCP-compatible tool schemas with validation and error handling

    • Tool Validation Checklist - assets/tools/tool-validation-checklist.md: Testing, security, and production readiness checks

    Multi-Agent Templates

    • Manager-Worker Template - assets/multi-agent/manager-worker-template.md: Orchestration pattern with task delegation and result aggregation

    • Evaluator-Router Template - assets/multi-agent/evaluator-router-template.md: Dynamic routing with quality assessment and domain classification

    Service Layer Templates

    • FastAPI Agent Service - ../dev-api-design/assets/fastapi/fastapi-complete-api.md: Auth, pagination, validation, error handling; extend with model lifespan loads, SSE, background tasks

    External Sources Metadata

    • Curated References - data/sources.json: Authoritative sources spanning standards, protocols, and production agent frameworks

    Shared Utilities (Centralized patterns — extract, don't duplicate)

    • ../software-clean-code-standard/utilities/llm-utilities.md — Token counting, streaming, cost estimation
    • ../software-clean-code-standard/utilities/error-handling.md — Effect Result types, correlation IDs
    • ../software-clean-code-standard/utilities/resilience-utilities.md — p-retry v6, circuit breaker for API calls
    • ../software-clean-code-standard/utilities/logging-utilities.md — pino v9 + OpenTelemetry integration
    • ../software-clean-code-standard/utilities/observability-utilities.md — OpenTelemetry SDK, tracing, metrics
    • ../software-clean-code-standard/utilities/testing-utilities.md — Test factories, fixtures, mocks
    • ../software-clean-code-standard/references/clean-code-standard.md — Canonical clean code rules (CC-*) for citation

    Trend Awareness Protocol

    IMPORTANT: When users ask for framework recommendations or pose "what's best for X" questions, use WebSearch to verify the current landscape before answering. If WebSearch is unavailable, use data/sources.json and state what was verified vs assumed.

    Trigger: framework comparisons, "best for [use case]", "is X still relevant?", "latest in AI agents", MCP server availability.

    Report: current landscape, emerging trends, deprecated patterns, recommendation with rationale.


    Related Skills

    This skill integrates with complementary skills:

    Core Dependencies

    • ../ai-llm/ - LLM patterns, prompt engineering, and model selection for agents
    • ../ai-rag/ - Deep RAG implementation: chunking, embedding, reranking
    • ../ai-prompt-engineering/ - System prompt design, few-shot patterns, reasoning strategies

    Production & Operations

    • ../qa-observability/ - OpenTelemetry, metrics, distributed tracing
    • ../software-security-appsec/ - OWASP Top 10, input validation, secure tool design
    • ../ops-devops-platform/ - CI/CD pipelines, deployment strategies, infrastructure

    Supporting Patterns

    • ../dev-api-design/ - REST/GraphQL design for agent APIs and tool interfaces
    • ../ai-mlops/ - Model deployment, monitoring, drift detection
    • ../qa-debugging/ - Agent debugging, error analysis, root cause investigation
    • ../dev-ai-coding-metrics/ - Team-level AI coding metrics: adoption, DORA/SPACE, ROI, DX surveys (this skill covers per-task agent economics)

    Usage pattern: Start here for agent architecture, then reference specialized skills for deep implementation details.


    Usage Notes

    • Modern Standards: Default to MCP for tools, agentic RAG for retrieval, handoff-first for multi-agent
    • Lightweight SKILL.md: Use this file for quick reference and navigation
    • Drill-down resources: Reference detailed resources for implementation guidance
    • Copy-paste templates: Use templates when the user asks for structured artifacts
    • External sources: Reference data/sources.json for authoritative documentation links
    • No theory: Never include theoretical explanations; only operational steps

    AI-Native SDLC Template

    • Use assets/agent-template-ainative-sdlc.md for the Delegate → Review → Own runbook (guardrails + outputs checklist).

    Fact-Checking

    • Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
    • Prefer primary sources; report source links and dates for volatile information.
    • If web access is unavailable, state the limitation and mark guidance as unverified.
    Repository
    vasilyu1983/ai-agents-public