observability

akaszubski/observability

DevOps

2 installs

About

Structured logging, debugging techniques (pdb/ipdb), profiling (cProfile/line_profiler), stack traces, performance monitoring, and metrics...

SKILL.md

Observability Skill

Comprehensive guide to logging, debugging, profiling, and performance monitoring in Python applications.

When This Skill Activates

Adding logging to code
Debugging production issues
Profiling performance bottlenecks
Monitoring application metrics
Analyzing stack traces
Performance optimization
Keywords: "logging", "debug", "profiling", "performance", "monitoring"

Core Concepts

1. Structured Logging

Structured logging with JSON format for machine-readable logs and rich context.

Why Structured Logging?

Machine-parseable (easy to search, filter, aggregate)
Context-rich (attach metadata to log entries)
Consistent format across services

Key Features:

JSON-formatted logs
Log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)
Context logging with extra metadata
Best practices for meaningful logs

Example:

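import logging

logger = logging.getLogger(__name__)

# Keys passed via `extra` become attributes on the LogRecord. They appear in
# the output only if the handler's formatter emits them (e.g. a JSON formatter).
logger.info("User action", extra={
    "user_id": 123,
    "action": "login",
    "ip": "192.168.1.1"
})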
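
The standard library has no built-in JSON formatter, so the `extra` fields need a custom one to reach the output. The following is a minimal sketch, not the setup from docs/structured-logging.md: `JsonFormatter` and `build_json_logger` are hypothetical names, and the choice of output keys (`timestamp`, `level`, `logger`, `message`) is an assumption.

```python
import json
import logging

# Attribute names found on every LogRecord; anything else was supplied via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy the fields that were passed through `extra={...}`.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)


def build_json_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr if None)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # Avoid duplicate output through the root logger
    return logger
```

With this in place, `build_json_logger("app").info("User action", extra={"user_id": 123})` would emit a single JSON line that includes `user_id` next to the standard fields.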

See: docs/structured-logging.md for Python logging setup and patterns

2. Debugging Techniques

Interactive debugging with pdb/ipdb and effective debugging strategies.

Tools:

Print debugging - Quick and simple
pdb - Python's built-in debugger
ipdb - IPython-enhanced debugger
Post-mortem debugging - Debug after crash

pdb Commands:

n (next) - Execute the current line, stepping over function calls
s (step) - Step into function
c (continue) - Continue execution
p variable - Print variable value
l - List source code
q - Quit debugger

Example:

import pdb; pdb.set_trace()  # Debugger starts here
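
Post-mortem debugging can be wrapped in a decorator so the debugger opens only when explicitly requested and the exception still propagates. This is a sketch under assumptions: `debug_on_error` and the `DEBUG_ON_ERROR` environment variable are hypothetical names, not part of pdb or of this skill's docs.

```python
import functools
import os
import pdb
import sys


def debug_on_error(func):
    """Open a post-mortem pdb session on failure when DEBUG_ON_ERROR=1."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Hypothetical opt-in switch so production runs never block on a prompt.
            if os.environ.get("DEBUG_ON_ERROR") == "1":
                pdb.post_mortem(sys.exc_info()[2])
            raise  # Always re-raise so callers still see the failure

    return wrapper


@debug_on_error
def divide(a, b):
    return a / b
```

Running the program with `DEBUG_ON_ERROR=1` set would drop into pdb at the failing frame; without it, the function behaves as if undecorated.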

See: docs/debugging.md for interactive debugging patterns

3. Profiling

CPU and memory profiling to identify performance bottlenecks.

Tools:

cProfile - CPU profiling (built-in)
line_profiler - Line-by-line CPU profiling
memory_profiler - Memory usage analysis
py-spy - Sampling profiler (no code changes)

cProfile Example:

python -m cProfile -s cumulative script.py

Profile Decorator:

import cProfile
import pstats
from functools import wraps

def profile(func):
    @wraps(func)  # Preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        try:
            return func(*args, **kwargs)
        finally:
            # Stop profiling and report even if func raises
            profiler.disable()
            stats = pstats.Stats(profiler)
            stats.sort_stats('cumulative')
            stats.print_stats(10)  # Top 10 functions
    return wrapper

@profile
def slow_function():
    # Your code here
    pass
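
The tools list covers memory as well as CPU, but memory_profiler is a third-party package. As a stand-in that needs no install, the sketch below uses the standard library's `tracemalloc` instead; it is a different tool from memory_profiler, and `measure_peak_memory` and `build_squares` are hypothetical names.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes, top_stats) as traced by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        snapshot = tracemalloc.take_snapshot()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    # Group allocations by source line and keep the three largest.
    top_stats = snapshot.statistics("lineno")[:3]
    return result, peak, top_stats


def build_squares(n):
    return [i * i for i in range(n)]


if __name__ == "__main__":
    _, peak, top = measure_peak_memory(build_squares, 100_000)
    print(f"peak traced memory: {peak / 1024:.1f} KiB")
    for stat in top:
        print(stat)
```

tracemalloc only sees allocations made through Python's memory allocator, so the numbers are a lower bound on what the process actually uses.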

See: docs/profiling.md for comprehensive profiling techniques

4. Monitoring & Metrics

Performance monitoring, timing decorators, and simple metrics.

Timing Patterns:

Timing decorator - Measure function execution time
Context manager timer - Measure code block duration
Performance assertions - Fail if too slow

Simple Metrics:

Counters - Track event occurrences
Histograms - Track value distributions

Example:

import time
from functools import wraps

def timer(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # Monotonic clock; time.time() can jump if the system clock changes
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        print(f"{func.__name__} took {duration:.2f}s")
        return result
    return wrapper

@timer
def process_data():
    # Your code here
    pass
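
The other two timing patterns, a context manager timer and a performance assertion, can share one helper. This is a minimal sketch, assuming a hypothetical `timed` helper with an optional `max_seconds` budget; the detailed patterns in docs/monitoring-metrics.md may differ.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, max_seconds=None):
    """Time the enclosed block; raise AssertionError if it exceeds max_seconds."""
    timing = {"label": label, "seconds": None}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        # Record the duration even when the block raises.
        timing["seconds"] = time.perf_counter() - start
    # Reached only when the block finished without an exception.
    if max_seconds is not None and timing["seconds"] > max_seconds:
        raise AssertionError(
            f"{label} took {timing['seconds']:.3f}s, budget was {max_seconds:.3f}s"
        )


if __name__ == "__main__":
    with timed("sum", max_seconds=1.0) as t:
        total = sum(range(1_000_000))
    print(f"{t['label']} took {t['seconds']:.4f}s")
```

Yielding a dict lets the caller read the measured duration after the block exits, which is useful for logging it alongside other context.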

See: docs/monitoring-metrics.md for stack traces, timers, and metrics
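
Counters and histograms can be sketched with the standard library alone. The example below is an illustration under assumptions: the `Histogram` class, its bucket bounds, and the metric names are hypothetical, and a production service would more likely use a dedicated metrics client.

```python
import bisect
from collections import Counter


class Histogram:
    """Hypothetical fixed-bucket histogram with inclusive upper bounds."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One bucket per bound, plus a final overflow bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value):
        # bisect_left places a value equal to a bound into that bound's bucket.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1


# Counters: collections.Counter already tracks event occurrences.
events = Counter()
events["requests"] += 1
events["requests"] += 1
events["errors"] += 1

# Histogram: latency in milliseconds with assumed bucket bounds.
latency_ms = Histogram([10, 50, 100])
for sample in (5, 10, 11, 200):
    latency_ms.observe(sample)
```

Here `latency_ms.counts` reads as: values up to 10, up to 50, up to 100, and everything above 100.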

5. Best Practices & Anti-Patterns

Debugging strategies and logging anti-patterns to avoid.

Debugging Best Practices:

Binary Search Debugging - Narrow down the problem area
Rubber Duck Debugging - Explain the problem to someone (or something)
Add Assertions - Catch bugs early
Simplify and Isolate - Reproduce with minimal code

Logging Anti-Patterns to Avoid:

Logging sensitive data (passwords, tokens)
Logging in loops (use counters instead)
No context in error logs
Inconsistent log formats
Too verbose logging (noise)

See: docs/best-practices-antipatterns.md for detailed strategies
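
One way to guard against the first anti-pattern is a logging filter that masks sensitive fields before any handler sees them. This is a sketch under assumptions: `RedactSensitiveFilter` and the `SENSITIVE_KEYS` set are hypothetical, and the filter only covers fields passed via `extra`, not secrets interpolated into the message text.

```python
import logging

# Assumed list of field names to mask; adjust to the application's own schema.
SENSITIVE_KEYS = {"password", "token", "api_key"}


class RedactSensitiveFilter(logging.Filter):
    """Replace sensitive `extra` fields on a record with a placeholder."""

    def filter(self, record):
        for key in SENSITIVE_KEYS:
            if hasattr(record, key):
                setattr(record, key, "[REDACTED]")
        return True  # Never drop the record, only scrub it


logger = logging.getLogger("redaction_sketch")
logger.addFilter(RedactSensitiveFilter())
```

A filter attached to a logger applies only to records logged through that logger, so attaching it to the handlers instead would cover records propagated from child loggers as well.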

Quick Reference

Tool	Use Case	Details
Structured Logging	Production logs	`docs/structured-logging.md`
pdb/ipdb	Interactive debugging	`docs/debugging.md`
cProfile	CPU profiling	`docs/profiling.md`
line_profiler	Line-by-line profiling	`docs/profiling.md`
memory_profiler	Memory analysis	`docs/profiling.md`
Timer decorator	Function timing	`docs/monitoring-metrics.md`
Context timer	Code block timing	`docs/monitoring-metrics.md`

Logging Cheat Sheet

import logging

# Setup
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Usage
logger.debug("Debug message")       # Detailed diagnostic
logger.info("Info message")         # General information
logger.warning("Warning message")   # Warning (recoverable)
logger.error("Error message")       # Error (handled)
logger.critical("Critical message") # Critical (unrecoverable)

# With context
logger.info("User action", extra={"user_id": 123, "action": "login"})

Debugging Cheat Sheet

# pdb
import pdb; pdb.set_trace()

# ipdb (enhanced)
import ipdb; ipdb.set_trace()

# Post-mortem (debug after crash)
import pdb, sys
try:
    # Your code
    pass
except Exception:
    pdb.post_mortem(sys.exc_info()[2])

Profiling Cheat Sheet

# CPU profiling
python -m cProfile -s cumulative script.py

# Line profiling
kernprof -l -v script.py

# Memory profiling
python -m memory_profiler script.py

# Sampling profiler (no code changes)
py-spy top --pid 12345

Progressive Disclosure

This skill uses progressive disclosure to prevent context bloat:

Index (this file): High-level concepts and quick reference (<500 lines)
Detailed docs: docs/*.md files with implementation details (loaded on-demand)

Available Documentation:

docs/structured-logging.md - Logging setup, levels, JSON format, best practices
docs/debugging.md - Print debugging, pdb/ipdb, post-mortem debugging
docs/profiling.md - cProfile, line_profiler, memory_profiler, py-spy
docs/monitoring-metrics.md - Stack traces, timing patterns, simple metrics
docs/best-practices-antipatterns.md - Debugging strategies and logging anti-patterns

Cross-References

Related Skills:

error-handling-patterns - Error handling best practices
python-standards - Python coding conventions
testing-guide - Testing and debugging strategies
performance-optimization - Performance tuning techniques

Related Tools:

Python logging - Standard library logging module
pdb/ipdb - Interactive debuggers
cProfile - CPU profiling
memory_profiler - Memory analysis
py-spy - Sampling profiler

Key Takeaways

Use structured logging - JSON format for machine-readable logs
Log at appropriate levels - DEBUG < INFO < WARNING < ERROR < CRITICAL
Include context - Add metadata to logs (user_id, request_id, etc.)
Don't log sensitive data - Passwords, tokens, PII
Use pdb/ipdb for debugging - Interactive debugging is powerful
Profile before optimizing - Measure to find real bottlenecks
Use cProfile for CPU profiling - Identify slow functions
Use line_profiler for line-level profiling - Fine-grained analysis
Use memory_profiler for memory leaks - Track memory usage
Time critical sections - Decorator or context manager
Binary search debugging - Narrow down problem area
Simplify and isolate - Reproduce with minimal code

Hard Rules

FORBIDDEN:

Logging sensitive data (passwords, tokens, API keys) at any level
Using print() for production logging (MUST use structured logging)
Swallowing exceptions silently without logging
except Exception: (or except Exception as e:) without a subsequent raise or logging.exception()/logger.error(..., exc_info=True)
Bare except: pass - discards every exception, including KeyboardInterrupt and SystemExit, with zero handling
except Exception: pass - narrower than a bare except (KeyboardInterrupt and SystemExit pass through) but just as silent for ordinary errors
contextlib.suppress() wrapping error-critical operations without inline justification comment
finally blocks that contain return, break, or continue — these suppress any pending exception from the try body
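
The `finally` rule is the least obvious item in this list, so here is a small demonstration. The function names are hypothetical; the behavior shown is standard Python semantics.

```python
def cleanup_with_return():
    try:
        raise ValueError("lost")
    finally:
        # FORBIDDEN: returning here discards the pending ValueError.
        return "cleanup ran"


def cleanup_without_return(log):
    try:
        raise ValueError("kept")
    finally:
        # Cleanup runs, then the ValueError continues to propagate.
        log.append("cleanup ran")
```

Calling `cleanup_with_return()` returns normally and the caller never learns that anything failed, whereas `cleanup_without_return(log)` performs the cleanup and still raises.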

REQUIRED (compliant exception handling MUST use at least one of):

Re-raise: After logging, call raise (bare) or raise NewError(...) from original_exc to propagate the exception
Log with exc_info: logger.error("Operation failed", exc_info=True) or logger.exception("Operation failed"), called inside the except block, so the full stack trace is recorded even when the exception is intentionally not re-raised
contextlib.suppress() with justification: Acceptable ONLY for genuinely non-critical cleanup operations; MUST include an inline comment explaining why suppression is safe

# COMPLIANT: re-raise after logging
try:
    process(data)
except ValueError as exc:
    logger.error("Invalid data: %s", exc, exc_info=True)
    raise

# COMPLIANT: log with exc_info (caller gets full stack trace in logs)
try:
    send_metric(value)
except ExternalServiceError:
    logger.exception("Metric send failed — continuing without metric")

# COMPLIANT: contextlib.suppress with justification
with contextlib.suppress(FileNotFoundError):
    # Optional cache file; absence is expected on first run
    cache_path.unlink()

# NON-COMPLIANT: silent swallow
try:
    critical_operation()
except Exception:
    pass  # FORBIDDEN

# NON-COMPLIANT: log without exc_info and without re-raise
try:
    critical_operation()
except Exception as e:
    logger.error("Failed: %s", e)  # FORBIDDEN — no stack trace, exception swallowed
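
The re-raise rule also allows `raise NewError(...) from original_exc`, which the examples above do not show. A minimal sketch, assuming a hypothetical `ConfigError` exception and `parse_port` function:

```python
import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical domain-level error raised for invalid configuration."""


def parse_port(raw):
    try:
        return int(raw)
    except ValueError as exc:
        logger.error("Invalid port %r", raw, exc_info=True)
        # `from exc` stores the original error on __cause__, so the traceback
        # shows both the ConfigError and the ValueError that triggered it.
        raise ConfigError(f"port must be an integer, got {raw!r}") from exc
```

Callers can catch the domain-level `ConfigError` while the underlying `ValueError` remains available for debugging.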
