Production-grade AI agent patterns with MCP integration, agentic RAG, handoff orchestration, multi-layer guardrails, observability, token economics, ROI frameworks, and build-vs-not decision guidance...
Modern Best Practices (March 2026): deterministic control flow, bounded tools, auditable state, MCP-based tool integration, handoff-first orchestration, multi-layer guardrails, OpenTelemetry tracing, and human-in-the-loop controls (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
This skill provides production-ready operational patterns for designing, building, evaluating, and deploying AI agents. It centralizes procedures, checklists, decision rules, and templates used across RAG agents, tool-using agents, OS agents, and multi-agent systems.
No theory. No narrative. Only operational steps and templates.
Codex should activate this skill whenever the user asks for:
- /references/ or /assets/.
- assets/core/agent-template-standard.md (or assets/core/agent-template-quick.md).
- assets/tools/tool-definition.md and references/api-contracts-for-agents.md.
- assets/rag/rag-basic.md and scale via assets/rag/rag-advanced.md + references/rag-patterns.md.
- references/evaluation-and-observability.md.
- assets/checklists/agent-safety-checklist.md.
- references/deployment-ci-cd-and-safety.md.

| Agent Type | Core Control Flow | Interfaces | MCP/A2A | When to Use |
|---|---|---|---|---|
| Workflow Agent (FSM/DAG) | Explicit state transitions | State store, tool allowlist | MCP | Deterministic, auditable flows |
| Tool-Using Agent | Route → call tool → observe | Tool schemas, retries/timeouts | MCP | External actions (APIs, DB, files) |
| RAG Agent | Retrieve → answer → cite | Retriever, citations, ACLs | MCP | Knowledge-grounded responses |
| Planner/Executor | Plan → execute steps with caps | Planner prompts, step budget | MCP (+A2A) | Multi-step problems with bounded autonomy |
| Multi-Agent (Orchestrated) | Delegate → merge → validate | Handoff contracts, eval gates | A2A | Specialization with explicit handoffs |
| OS Agent | Observe UI → act → verify | Sandbox, UI grounding | MCP | Desktop/browser control under strict guardrails |
| Code/SWE Agent | Branch → edit → test → PR | Repo access, CI gates | MCP | Coding tasks with review/merge controls |
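
As a rough illustration of the Workflow Agent row (explicit state transitions plus a tool allowlist), the following minimal sketch may help. The state names, the `lookup_order` tool, and the `run_workflow` helper are all hypothetical and not taken from any framework listed here; a real agent would call an LLM and real tools at the marked step.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical tool registry: only tools listed here may be called.
TOOL_ALLOWLIST: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
}

# Explicit transition table: each state maps to the states it may move to.
TRANSITIONS: dict[str, set[str]] = {
    "start": {"call_tool"},
    "call_tool": {"respond", "failed"},
    "respond": set(),
    "failed": set(),
}


@dataclass
class AgentRun:
    """Auditable run state: current state plus the full transition history."""

    state: str = "start"
    history: list[str] = field(default_factory=lambda: ["start"])
    result: Optional[str] = None

    def transition(self, new_state: str) -> None:
        # Reject any move that the transition table does not allow.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)


def run_workflow(tool_name: str, arg: str) -> AgentRun:
    """Run one deterministic pass: start -> call_tool -> respond or failed."""
    run = AgentRun()
    run.transition("call_tool")
    tool = TOOL_ALLOWLIST.get(tool_name)
    if tool is None:
        # Tool is not on the allowlist: fail closed instead of calling it.
        run.transition("failed")
        run.result = f"tool not allowed: {tool_name}"
        return run
    run.result = tool(arg)  # A real agent would invoke the actual tool here.
    run.transition("respond")
    return run
```

Calling `run_workflow("lookup_order", "42")` ends in the `respond` state with the history recorded step by step, while an unlisted tool name ends in `failed` without executing anything. The recorded history is what makes the flow auditable.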
Tier 1 — Production-Grade
| Framework | Architecture | Best For | Languages | Ease |
|---|---|---|---|---|
| LangGraph | Graph-based, stateful | Enterprise, compliance, auditability | Python, JS | Medium |
| Claude Agent SDK | Event-driven, tool-centric | Anthropic ecosystem, Computer Use, MCP-native | Python, TS | Easy |
| OpenAI Agents SDK | Tool-centric, lightweight | Fast prototyping, OpenAI ecosystem | Python | Easy |
| Google ADK | Code-first, multi-language | Gemini/Vertex AI, polyglot teams | Python, TS, Go, Java | Medium |
| Pydantic AI | Type-safe, graph FSM | Production Python, type safety, MCP+A2A native | Python | Medium |
| MS Agent Framework | Kernel + multi-agent | Enterprise Azure, .NET/Java teams | Python, .NET, Java | Medium |
Tier 2 — Specialized
| Framework | Architecture | Best For | Languages | Ease |
|---|---|---|---|---|
| LlamaIndex | Event-driven workflows | RAG-native agents, retrieval-heavy | Python, TS | Medium |
| CrewAI | Role-based crews | Team workflows, content generation | Python | Easiest |
| Mastra | Vercel AI SDK-based | TypeScript/Next.js teams | TypeScript | Easy |
| SmolAgents | Code-first, minimalist | Lightweight, fewer LLM calls | Python | Easy |
| Agno | FastAPI-native runtime | Production Python, 100+ integrations | Python | Easy |
| AWS Bedrock Agents | Managed infrastructure | Enterprise AWS, knowledge bases | Python | Easy |
Tier 3 — Niche
| Framework | Niche |
|---|---|
| Haystack | Enterprise RAG+agents pipeline (Airbus, NVIDIA) |
| DSPy | Declarative optimization — compiles programs into prompts/weights |
See references/modern-best-practices.md for detailed comparison and selection guide.
references/claude-agent-sdk-patterns.md
Agent definition, built-in tools (Bash, TextEditor, Computer), MCP servers, guardrails, multi-agent, streaming events
references/pydantic-ai-patterns.md
Type-safe agents, MCP toolsets, native A2A, pydantic-graph FSM, durable execution, HITL, TestModel testing

What does the agent need to do?
├─ Answer questions from knowledge base?
│ ├─ Simple lookup? → RAG Agent (LangChain/LlamaIndex + vector DB)
│ └─ Complex multi-step? → Agentic RAG (iterative retrieval + reasoning)
│
├─ Perform external actions (APIs, tools, functions)?
│ ├─ 1-3 tools, linear flow? → Tool-Using Agent (LangGraph + MCP)
│ └─ Complex workflows, branching? → Planning Agent (ReAct/Plan-Execute)
│
├─ Write/modify code autonomously?
│ ├─ Single file edits? → Tool-Using Agent with code tools
│ └─ Multi-file, issue resolution? → Code/SWE Agent (HyperAgent pattern)
│
├─ Delegate tasks to specialists?
│ ├─ Fixed workflow? → Multi-Agent Sequential (A → B → C)
│ ├─ Manager-Worker? → Multi-Agent Hierarchical (Manager + Workers)
│ └─ Dynamic routing? → Multi-Agent Group Chat (collaborative)
│
├─ Control desktop/browser?
│ └─ OS Agent (Anthropic Computer Use + MCP for system access)
│
└─ Hybrid (combination of above)?
   └─ Planning Agent that coordinates:
      - Tool-using for actions (MCP)
      - RAG for knowledge (MCP)
      - Multi-agent for delegation (A2A)
      - Code agents for implementation
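
The decision tree above can also be read as a lookup table, which is sometimes handy for intake forms or routing scripts. The sketch below is one possible encoding. The `(need, variant)` keys and the `choose_architecture` function are hypothetical names introduced for illustration; only the architecture labels come from the tree.

```python
from __future__ import annotations

# Hypothetical (need, variant) keys mapped to the architectures in the tree.
ARCHITECTURES: dict[tuple[str, str], str] = {
    ("knowledge", "simple"): "RAG Agent",
    ("knowledge", "complex"): "Agentic RAG",
    ("actions", "simple"): "Tool-Using Agent",
    ("actions", "complex"): "Planning Agent",
    ("code", "simple"): "Tool-Using Agent with code tools",
    ("code", "complex"): "Code/SWE Agent",
    ("delegation", "fixed"): "Multi-Agent Sequential",
    ("delegation", "manager_worker"): "Multi-Agent Hierarchical",
    ("delegation", "dynamic"): "Multi-Agent Group Chat",
    ("desktop", "any"): "OS Agent",
}


def choose_architecture(needs: list[tuple[str, str]]) -> str:
    """Return the architecture for one need, or a hybrid plan for several."""
    if not needs:
        raise ValueError("at least one need is required")
    unknown = [n for n in needs if n not in ARCHITECTURES]
    if unknown:
        raise KeyError(f"unknown needs: {unknown}")
    if len(needs) > 1:
        # Hybrid branch: a Planning Agent coordinates the individual patterns.
        parts = ", ".join(ARCHITECTURES[n] for n in needs)
        return f"Planning Agent coordinating: {parts}"
    return ARCHITECTURES[needs[0]]
```

For example, `choose_architecture([("knowledge", "simple")])` yields the RAG Agent branch, and passing two needs falls through to the hybrid branch. Treat the table as a starting point for discussion, not a substitute for the judgment the tree asks for.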
Protocol Selection:
Framework Selection (after choosing architecture):
Which framework?
├─ MVP/Prototyping?
│ ├─ Python → OpenAI Agents SDK or CrewAI
│ └─ TypeScript → Mastra or Claude Agent SDK
│
├─ Production →
│ ├─ Auditability/compliance? → LangGraph
│ ├─ Type safety + MCP/A2A native? → Pydantic AI
│ ├─ Anthropic models + Computer Use? → Claude Agent SDK
│ ├─ Google Cloud / Gemini? → Google ADK
│ ├─ Azure / .NET / Java? → MS Agent Framework
│ ├─ AWS managed? → Bedrock Agents
│ └─ RAG-heavy? → LlamaIndex Workflows
│
├─ Minimalist / Research →
│ ├─ Fewest LLM calls? → SmolAgents
│ └─ Optimize prompts automatically? → DSPy
│
└─ Enterprise pipeline → Haystack
Do
Avoid
- references/build-vs-not-decision.md
- references/agent-economics.md

Five-layer architecture for production agent systems. Start with the overview, then drill into layer-specific patterns.
AI Engine Architecture — references/ai-engine-layers.md
5-layer composition model, layer interaction matrix, implementation phases
Context Graph Patterns — references/context-graph-patterns.md
Node/edge schema, traversal patterns, graph-RAG, memory tiers, conflict detection
Inbox Engine Patterns — references/inbox-engine-patterns.md
Event-driven intake, signal classification, deduplication, priority routing, dead letter
Knowledge Base Architecture — assets/knowledge-base/kb-architecture.md
Unified KB schema (vector + graph + doc index), provenance, freshness, multi-tenant
Action Graph → covered by references/operational-patterns.md + references/agent-operations-best-practices.md
Data Agent → covered by ../ai-rag/SKILL.md + references/rag-patterns.md

- references/agent-maturity-governance.md
- references/modern-best-practices.md
- references/context-engineering.md
- references/operational-patterns.md

MCP Practical Guide - references/mcp-practical-guide.md
Building MCP servers, tool integration, and standardized data access
MCP Server Builder - references/mcp-server-builder.md
End-to-end checklist for workflow-focused MCP servers (design → build → test)
A2A Handoff Patterns - references/a2a-handoff-patterns.md
Agent-to-agent communication, task delegation, and coordination protocols
Protocol Decision Tree - references/protocol-decision-tree.md
When to use MCP vs A2A, decision framework, and selection criteria
Agent Operations - references/agent-operations-best-practices.md
Action loops, planning, observation, and execution patterns
RAG Patterns - references/rag-patterns.md
Contextual retrieval, agentic RAG, and hybrid search strategies
Memory Systems - references/memory-systems.md
Session, long-term, episodic, and task memory architectures
Tool Design & Validation - references/tool-design-specs.md
Tool schemas, validation, error handling, and MCP integration
Skill Lifecycle - references/skill-lifecycle.md
Scaffold, validate, package, and share skills with teams (Slack-ready)
API Contracts for Agents - references/api-contracts-for-agents.md
Request/response envelopes, safety gates, streaming/async patterns, error taxonomy
Multi-Agent Patterns - references/multi-agent-patterns.md
Manager-worker, sequential, handoff, and group chat orchestration
OS Agent Capabilities - references/os-agent-capabilities.md
Desktop automation, UI grounding, and computer use patterns
Code/SWE Agents - references/code-swe-agents.md
SE 3.0 paradigm, autonomous coding patterns, SWE-Bench, HyperAgent architecture
references/pydantic-ai-patterns.md
Type-safe agents, MCP toolsets (Stdio/SSE/StreamableHTTP), A2A via to_a2a(), pydantic-graph FSM, durable execution, TestModel testing
Evaluation & Observability - references/evaluation-and-observability.md
OpenTelemetry GenAI, metrics, LLM-as-judge, and monitoring
Deployment, CI/CD & Safety - references/deployment-ci-cd-and-safety.md
Multi-layer guardrails, HITL controls, NIST AI RMF, production checklists
Agent Debugging Patterns - references/agent-debugging-patterns.md
Systematic debugging for agentic systems: trace analysis, tool call failures, loop detection, state corruption
Voice & Multimodal Agents - references/voice-multimodal-agents.md
Voice-first and multimodal agent patterns: speech pipelines, vision grounding, cross-modal orchestration
Guardrails Implementation - references/guardrails-implementation.md
Multi-layer guardrail patterns: input/output validation, content filtering, PII detection, cost caps
assets/checklists/agent-safety-checklist.md
Go/No-Go safety gate: permissions, HITL triggers, eval gates, observability, rollback
Standard Agent Template - assets/core/agent-template-standard.md
Full production spec: memory, tools, RAG, evaluation, observability, safety
Specialized Agent Template - assets/core/agent-template-specialized.md
Domain-specific agents with custom capabilities and constraints
Quick Agent Template - assets/core/agent-template-quick.md
Minimal viable agent for rapid prototyping
Basic RAG - assets/rag/rag-basic.md
Simple retrieval-augmented generation pipeline
Advanced RAG - assets/rag/rag-advanced.md
Contextual retrieval, reranking, and agentic RAG patterns
Hybrid Retrieval - assets/rag/hybrid-retrieval.md
Semantic + keyword search with BM25 fusion
Tool Definition - assets/tools/tool-definition.md
MCP-compatible tool schemas with validation and error handling
Tool Validation Checklist - assets/tools/tool-validation-checklist.md
Testing, security, and production readiness checks
Manager-Worker Template - assets/multi-agent/manager-worker-template.md
Orchestration pattern with task delegation and result aggregation
Evaluator-Router Template - assets/multi-agent/evaluator-router-template.md
Dynamic routing with quality assessment and domain classification
../dev-api-design/assets/fastapi/fastapi-complete-api.md
Auth, pagination, validation, error handling; extend with model lifespan loads, SSE, background tasks
data/sources.json
Authoritative sources spanning standards, protocols, and production agent frameworks
CC-*) for citation

IMPORTANT: When users ask for framework recommendations or ask "what's best for X" questions, use WebSearch to verify the current landscape before answering. If WebSearch is unavailable, use data/sources.json and state what was verified vs assumed.
Trigger: framework comparisons, "best for [use case]", "is X still relevant?", "latest in AI agents", MCP server availability.
Report: current landscape, emerging trends, deprecated patterns, recommendation with rationale.
This skill integrates with complementary skills:
- ../ai-llm/ - LLM patterns, prompt engineering, and model selection for agents
- ../ai-rag/ - Deep RAG implementation: chunking, embedding, reranking
- ../ai-prompt-engineering/ - System prompt design, few-shot patterns, reasoning strategies
- ../qa-observability/ - OpenTelemetry, metrics, distributed tracing
- ../software-security-appsec/ - OWASP Top 10, input validation, secure tool design
- ../ops-devops-platform/ - CI/CD pipelines, deployment strategies, infrastructure
- ../dev-api-design/ - REST/GraphQL design for agent APIs and tool interfaces
- ../ai-mlops/ - Model deployment, monitoring, drift detection
- ../qa-debugging/ - Agent debugging, error analysis, root cause investigation
- ../dev-ai-coding-metrics/ - Team-level AI coding metrics: adoption, DORA/SPACE, ROI, DX surveys (this skill covers per-task agent economics)

Usage pattern: Start here for agent architecture, then reference specialized skills for deep implementation details.

- data/sources.json for authoritative documentation links
- assets/agent-template-ainative-sdlc.md for the Delegate → Review → Own runbook (guardrails + outputs checklist).