multi-agent-observability

melodic-software/multi-agent-observability

AI & ML

About

SKILL.md

multi-agent-observability

melodic-software/multi-agent-observability

AI & ML

About

Build observability interfaces for multi-agent systems. Use when monitoring multi-agent execution, tracking agent metrics, implementing logging for parallel agents, or debugging agent workflows.

SKILL.md

Multi-Agent Observability Skill

Build observability interfaces for monitoring and measuring multi-agent systems.

Purpose

Guide the design and implementation of observability layers that provide real-time visibility into multi-agent execution.

When to Use

Designing monitoring for agent fleets
Building metrics dashboards
Implementing logging architecture
Creating cost tracking systems

Prerequisites

Understanding of the Three Pillars (@three-pillars-orchestration.md)
Familiarity with results-oriented patterns (@results-oriented-engineering.md)
Access to Claude Agent SDK documentation

SDK Requirement

Implementation Note: Full observability requires Claude Agent SDK with custom MCP tools and UI components. This skill provides design patterns.

The Critical Principle

"If you can't measure it, you can't improve it. If you can't measure it, you can't scale it."

What to Observe

Per-Agent Metrics

Metric	Purpose	How to Track
Status	Know state	Agent state enum
Context usage	Token consumption	API response
Cost	Financial impact	API usage data
Tool calls	What it's doing	Hook logging
Results	Output verification	Result parsing
Duration	Execution time	Timestamps

Aggregate Metrics

Metric	Purpose	Calculation
Total agents	Scale	Count active
Total duration	End-to-end time	First to last
Total cost	Financial total	Sum per-agent
Success rate	Reliability	Success / total
Coverage	Scope	Files touched

Observability Components

1. Agent Cards

Real-time status for each agent:

┌─────────────────────────────────────┐
│ scout_1                 [EXECUTING] │
├─────────────────────────────────────┤
│ Template: scout-fast                │
│ Model: haiku                        │
│ Context: 12,500 / 100,000 tokens    │
│ Cost: $0.05                         │
│ Duration: 45s                       │
│ Tool calls: 15                      │
└─────────────────────────────────────┘

Required fields:

Agent ID and template
Status (idle, executing, complete, error)
Model being used
Context usage (current / max)
Running cost
Execution duration
Tool call count

2. Event Stream

Real-time log of all activities:

[10:30:00] scout_1 created (template: scout-fast)
[10:30:01] scout_1 commanded: "Analyze auth module"
[10:30:05] scout_1 Read: src/auth/login.ts
[10:30:08] scout_1 Grep: "password" in src/auth/
[10:30:15] scout_1 completed (duration: 14s)
[10:30:16] scout_1 deleted

Event types:

Agent lifecycle (create, delete)
Commands sent
Tool calls
Status changes
Errors

3. Cost Tracking

Track spend per agent and total:

Cost Summary
────────────────────────────────────
scout_1 (haiku)      $0.05
scout_2 (haiku)      $0.04
builder_1 (sonnet)   $0.35
reviewer_1 (sonnet)  $0.12
────────────────────────────────────
Total                $0.56
Budget remaining     $4.44 (89%)

Cost components:

Input tokens
Output tokens
Per-agent breakdown
Running total
Budget tracking

4. Result Inspector

View consumed and produced assets:

Agent: builder_1

Consumed Assets:
├── Scout report (summary)
├── src/auth/middleware.ts
└── package.json

Produced Assets:
├── src/auth/rate-limit.ts (created)
├── src/auth/middleware.ts (modified)
└── tests/rate-limit.test.ts (created)

Summary: "Implemented rate limiting middleware"
Status: completed

5. Log Viewer

Filterable activity history:

Filters: [agent: all] [level: all] [tool: all]

10:30:00 INFO  scout_1   Created from template
10:30:01 INFO  scout_1   Received command
10:30:05 DEBUG scout_1   Read: src/auth/login.ts (1,200 tokens)
10:30:08 DEBUG scout_1   Grep: found 5 matches
10:30:12 WARN  scout_1   Context at 80% capacity
10:30:15 INFO  scout_1   Completed successfully

Implementation Patterns

Logging Architecture

# Event types
class AgentEvent:
    timestamp: datetime
    agent_id: str
    event_type: str  # create, command, tool, status, error
    details: dict

# Log collector
def log_event(event: AgentEvent):
    # Store to database
    db.events.insert(event)
    # Emit to WebSocket
    ws.broadcast(event)
    # Update metrics
    metrics.update(event)

Real-Time Updates

# WebSocket for live updates
async def agent_status_stream(agent_id):
    while agent_active(agent_id):
        status = get_agent_status(agent_id)
        yield status
        await asyncio.sleep(1)

Cost Calculation

def calculate_cost(usage):
    input_cost = usage.input_tokens * MODEL_INPUT_PRICE
    output_cost = usage.output_tokens * MODEL_OUTPUT_PRICE
    return input_cost + output_cost

UI Components

Minimal CLI View

Orchestration: Add rate limiting
────────────────────────────────────
Agents: 3 active | 2 complete | 0 error
Cost: $0.56 / $5.00 budget
Progress: ████████░░ 80%

[scout_1] ✓ complete (14s)
[scout_2] ✓ complete (12s)
[builder] ⚡ executing (45s)

Rich Dashboard View

┌─────────────────────────────────────────────────────────────┐
│                    Orchestration Dashboard                    │
├─────────────────────────────────────────────────────────────┤
│ Task: Add rate limiting to authentication                    │
│ Started: 10:30:00 | Duration: 2m 15s | Cost: $0.56          │
├─────────────────────────────────────────────────────────────┤
│ Agent Fleet                          │ Event Stream          │
│ ┌─────────────────────────────────┐  │ [10:32:15] builder   │
│ │ scout_1        [✓ complete]    │  │   Write: rate-limit  │
│ │ scout_2        [✓ complete]    │  │ [10:32:10] builder   │
│ │ builder        [⚡ executing]   │  │   Read: middleware   │
│ │ reviewer       [○ pending]     │  │ [10:30:15] scout_2   │
│ └─────────────────────────────────┘  │   completed          │
├─────────────────────────────────────────────────────────────┤
│ Cost Breakdown     │ Results Summary                         │
│ haiku:  $0.09     │ Files read: 8                           │
│ sonnet: $0.47     │ Files written: 3                        │
│ Total:  $0.56     │ Tests: 5/5 passing                      │
└─────────────────────────────────────────────────────────────┘

Design Checklist

Per-agent metrics defined
Aggregate metrics calculated
Event logging implemented
Real-time updates via WebSocket
Cost tracking per agent
Result inspection available
Log filtering supported
UI components designed

Output Format

When designing observability, provide:

## Observability Design

### Metrics

**Per-Agent:**
[List with tracking method]

**Aggregate:**
[List with calculation]

### Components

**Agent Cards:** [fields and update frequency]
**Event Stream:** [event types and storage]
**Cost Tracking:** [breakdown and budgets]
**Result Inspector:** [consumed/produced format]
**Log Viewer:** [filters and retention]

### Implementation

**Logging:** [architecture]
**Real-Time:** [WebSocket design]
**Storage:** [database schema]
**UI:** [component specifications]

Anti-Patterns

Anti-Pattern	Problem	Solution
No metrics	Flying blind	Track everything
Delayed updates	Stale status	Real-time WebSocket
No cost tracking	Budget overruns	Per-agent costs
Missing logs	Can't debug	Log all events
No aggregation	Can't summarize	Calculate totals

Cross-References

@three-pillars-orchestration.md - Observability pillar
@results-oriented-engineering.md - Result patterns
@agent-lifecycle-crud.md - Agent state tracking
@orchestrator-design skill - System architecture

Version History

v1.0.0 (2025-12-26): Initial release

Last Updated

Date: 2025-12-26 Model: claude-opus-4-5-20251101

About

SKILL.md

About

Build observability interfaces for multi-agent systems. Use when monitoring multi-agent execution, tracking agent metrics, implementing logging for parallel agents, or debugging agent workflows.

SKILL.md

Multi-Agent Observability Skill

Build observability interfaces for monitoring and measuring multi-agent systems.

Purpose

Guide the design and implementation of observability layers that provide real-time visibility into multi-agent execution.

When to Use

Designing monitoring for agent fleets
Building metrics dashboards
Implementing logging architecture
Creating cost tracking systems

Prerequisites

Understanding of the Three Pillars (@three-pillars-orchestration.md)
Familiarity with results-oriented patterns (@results-oriented-engineering.md)
Access to Claude Agent SDK documentation

SDK Requirement

Implementation Note: Full observability requires Claude Agent SDK with custom MCP tools and UI components. This skill provides design patterns.

The Critical Principle

"If you can't measure it, you can't improve it. If you can't measure it, you can't scale it."

What to Observe

Per-Agent Metrics

Metric	Purpose	How to Track
Status	Know state	Agent state enum
Context usage	Token consumption	API response
Cost	Financial impact	API usage data
Tool calls	What it's doing	Hook logging
Results	Output verification	Result parsing
Duration	Execution time	Timestamps

Aggregate Metrics

Metric	Purpose	Calculation
Total agents	Scale	Count active
Total duration	End-to-end time	First to last
Total cost	Financial total	Sum per-agent
Success rate	Reliability	Success / total
Coverage	Scope	Files touched

Observability Components

1. Agent Cards

Real-time status for each agent:

┌─────────────────────────────────────┐
│ scout_1                 [EXECUTING] │
├─────────────────────────────────────┤
│ Template: scout-fast                │
│ Model: haiku                        │
│ Context: 12,500 / 100,000 tokens    │
│ Cost: $0.05                         │
│ Duration: 45s                       │
│ Tool calls: 15                      │
└─────────────────────────────────────┘

Required fields:

Agent ID and template
Status (idle, executing, complete, error)
Model being used
Context usage (current / max)
Running cost
Execution duration
Tool call count

2. Event Stream

Real-time log of all activities:

[10:30:00] scout_1 created (template: scout-fast)
[10:30:01] scout_1 commanded: "Analyze auth module"
[10:30:05] scout_1 Read: src/auth/login.ts
[10:30:08] scout_1 Grep: "password" in src/auth/
[10:30:15] scout_1 completed (duration: 14s)
[10:30:16] scout_1 deleted

Event types:

Agent lifecycle (create, delete)
Commands sent
Tool calls
Status changes
Errors

3. Cost Tracking

Track spend per agent and total:

Cost Summary
────────────────────────────────────
scout_1 (haiku)      $0.05
scout_2 (haiku)      $0.04
builder_1 (sonnet)   $0.35
reviewer_1 (sonnet)  $0.12
────────────────────────────────────
Total                $0.56
Budget remaining     $4.44 (89%)

Cost components:

Input tokens
Output tokens
Per-agent breakdown
Running total
Budget tracking

4. Result Inspector

View consumed and produced assets:

Agent: builder_1

Consumed Assets:
├── Scout report (summary)
├── src/auth/middleware.ts
└── package.json

Produced Assets:
├── src/auth/rate-limit.ts (created)
├── src/auth/middleware.ts (modified)
└── tests/rate-limit.test.ts (created)

Summary: "Implemented rate limiting middleware"
Status: completed

5. Log Viewer

Filterable activity history:

Filters: [agent: all] [level: all] [tool: all]

10:30:00 INFO  scout_1   Created from template
10:30:01 INFO  scout_1   Received command
10:30:05 DEBUG scout_1   Read: src/auth/login.ts (1,200 tokens)
10:30:08 DEBUG scout_1   Grep: found 5 matches
10:30:12 WARN  scout_1   Context at 80% capacity
10:30:15 INFO  scout_1   Completed successfully

Implementation Patterns

Logging Architecture

# Event types
class AgentEvent:
    timestamp: datetime
    agent_id: str
    event_type: str  # create, command, tool, status, error
    details: dict

# Log collector
def log_event(event: AgentEvent):
    # Store to database
    db.events.insert(event)
    # Emit to WebSocket
    ws.broadcast(event)
    # Update metrics
    metrics.update(event)

Real-Time Updates

# WebSocket for live updates
async def agent_status_stream(agent_id):
    while agent_active(agent_id):
        status = get_agent_status(agent_id)
        yield status
        await asyncio.sleep(1)

Cost Calculation

def calculate_cost(usage):
    input_cost = usage.input_tokens * MODEL_INPUT_PRICE
    output_cost = usage.output_tokens * MODEL_OUTPUT_PRICE
    return input_cost + output_cost

UI Components

Minimal CLI View

Orchestration: Add rate limiting
────────────────────────────────────
Agents: 3 active | 2 complete | 0 error
Cost: $0.56 / $5.00 budget
Progress: ████████░░ 80%

[scout_1] ✓ complete (14s)
[scout_2] ✓ complete (12s)
[builder] ⚡ executing (45s)

Rich Dashboard View

┌─────────────────────────────────────────────────────────────┐
│                    Orchestration Dashboard                    │
├─────────────────────────────────────────────────────────────┤
│ Task: Add rate limiting to authentication                    │
│ Started: 10:30:00 | Duration: 2m 15s | Cost: $0.56          │
├─────────────────────────────────────────────────────────────┤
│ Agent Fleet                          │ Event Stream          │
│ ┌─────────────────────────────────┐  │ [10:32:15] builder   │
│ │ scout_1        [✓ complete]    │  │   Write: rate-limit  │
│ │ scout_2        [✓ complete]    │  │ [10:32:10] builder   │
│ │ builder        [⚡ executing]   │  │   Read: middleware   │
│ │ reviewer       [○ pending]     │  │ [10:30:15] scout_2   │
│ └─────────────────────────────────┘  │   completed          │
├─────────────────────────────────────────────────────────────┤
│ Cost Breakdown     │ Results Summary                         │
│ haiku:  $0.09     │ Files read: 8                           │
│ sonnet: $0.47     │ Files written: 3                        │
│ Total:  $0.56     │ Tests: 5/5 passing                      │
└─────────────────────────────────────────────────────────────┘

Design Checklist

Per-agent metrics defined
Aggregate metrics calculated
Event logging implemented
Real-time updates via WebSocket
Cost tracking per agent
Result inspection available
Log filtering supported
UI components designed

Output Format

When designing observability, provide:

## Observability Design

### Metrics

**Per-Agent:**
[List with tracking method]

**Aggregate:**
[List with calculation]

### Components

**Agent Cards:** [fields and update frequency]
**Event Stream:** [event types and storage]
**Cost Tracking:** [breakdown and budgets]
**Result Inspector:** [consumed/produced format]
**Log Viewer:** [filters and retention]

### Implementation

**Logging:** [architecture]
**Real-Time:** [WebSocket design]
**Storage:** [database schema]
**UI:** [component specifications]

Anti-Patterns

Anti-Pattern	Problem	Solution
No metrics	Flying blind	Track everything
Delayed updates	Stale status	Real-time WebSocket
No cost tracking	Budget overruns	Per-agent costs
Missing logs	Can't debug	Log all events
No aggregation	Can't summarize	Calculate totals

Cross-References

@three-pillars-orchestration.md - Observability pillar
@results-oriented-engineering.md - Result patterns
@agent-lifecycle-crud.md - Agent state tracking
@orchestrator-design skill - System architecture

Version History

v1.0.0 (2025-12-26): Initial release

Last Updated

Date: 2025-12-26 Model: claude-opus-4-5-20251101