Comprehensive debugging specialist for errors, test failures, log analysis, and system problems...
This skill provides comprehensive debugging capabilities for identifying and fixing errors, test failures, unexpected behavior, and production issues. It combines general debugging workflows with specialized error analysis, log parsing, and pattern recognition.
This skill includes Python helper scripts in `scripts/`:
- `parse_logs.py`: Parses log files and extracts errors, exceptions, and stack traces. Outputs JSON with error analysis and pattern detection.
```bash
python scripts/parse_logs.py /var/log/app.log
```
- Debug this error: `TypeError: Cannot read property 'x' of undefined`
- Investigate why the test is failing in `test_user_service.js`
- Analyze the error logs in `/var/log/app.log` and identify the root cause
- Investigate why the API is returning 500 errors
- Find patterns in these error logs from the past 24 hours
- Correlate errors between the API service and database
Error Message:
Stack Trace:
Context:
Using Helper Script:
The skill includes a Python helper script for parsing logs:
```bash
# Parse log file and extract errors
python scripts/parse_logs.py /var/log/app.log
```
Manual Log Parsing Patterns:
```bash
# Extract errors from logs (-E enables portable alternation with |)
grep -iE "error|exception|fatal|critical" /var/log/app.log
# Extract stack traces (each match plus the 20 lines that follow it)
grep -E -A 20 "Exception|Error|Traceback" /var/log/app.log
# Extract specific error types
grep -E "TypeError|ReferenceError|SyntaxError" /var/log/app.log
```
Structured Log Parsing:
```javascript
// Parse JSON logs; `logs` is assumed to be an array of already-parsed log objects
const errors = logs
  .filter(log => log.level === 'error' || log.level === 'critical')
  .map(log => ({
    timestamp: log.timestamp,
    message: log.message,
    stack: log.stack,
    context: log.context
  }));
```
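The filter above takes an array of parsed log objects as given. A minimal sketch of building that array from newline-delimited JSON might look like the following. It assumes one JSON object per line and the same field names (`level`, `timestamp`, `message`) as the snippet above; `parseJsonLines` is a hypothetical helper for illustration and is not part of `parse_logs.py`.

```javascript
// Hypothetical helper: turn newline-delimited JSON text into an array of log objects.
// Lines that are not valid JSON (e.g. plain-text noise) are counted and skipped.
function parseJsonLines(text) {
  const logs = [];
  let skipped = 0;
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    try {
      logs.push(JSON.parse(trimmed));
    } catch {
      skipped += 1;
    }
  }
  return { logs, skipped };
}

// Sample input (made up for illustration)
const sample = [
  '{"level":"info","timestamp":"2024-01-01T10:00:00Z","message":"started"}',
  'not json',
  '{"level":"error","timestamp":"2024-01-01T10:00:05Z","message":"db timeout"}',
].join('\n');

const { logs, skipped } = parseJsonLines(sample);
const errors = logs.filter(log => log.level === 'error' || log.level === 'critical');
console.log(errors.length, skipped); // 1 1
```

Counting skipped lines rather than silently dropping them makes it easier to notice when a log file is only partly structured.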
Common Patterns:
JavaScript/Node.js:
```
Error: Cannot read property 'x' of undefined
    at FunctionName (file.js:123:45)
    at AnotherFunction (file.js:456:78)
```
Python:
```
Traceback (most recent call last):
  File "app.py", line 123, in function_name
    result = process(data)
  File "utils.py", line 45, in process
    return data['key']
KeyError: 'key'
```
Java:
```
java.lang.NullPointerException
    at com.example.Class.method(Class.java:123)
    at com.example.AnotherClass.call(AnotherClass.java:456)
```
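Once a trace format is recognized, its frames can be turned into structured data for grouping or lookup. The sketch below handles only the JavaScript/Node.js format shown above (`at name (file:line:column)` and the anonymous form `at file:line:column`). The `parseStack` function and its regular expression are illustrative assumptions and do not describe how `parse_logs.py` works.

```javascript
// Sketch: extract structured frames from a V8-style (Node.js) stack trace.
// Matches "at fn (file:line:col)" and the anonymous form "at file:line:col".
const FRAME = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?$/;

function parseStack(stack) {
  const frames = [];
  for (const line of stack.split('\n')) {
    const m = FRAME.exec(line);
    if (!m) continue; // the message line and unrecognized lines are skipped
    frames.push({
      fn: m[1] || '<anonymous>',
      file: m[2],
      line: Number(m[3]),
      column: Number(m[4]),
    });
  }
  return frames;
}

const trace = [
  "Error: Cannot read property 'x' of undefined",
  '    at FunctionName (file.js:123:45)',
  '    at AnotherFunction (file.js:456:78)',
].join('\n');

const frames = parseStack(trace);
console.log(frames[0]); // { fn: 'FunctionName', file: 'file.js', line: 123, column: 45 }
```

The top frame is usually the first place to look, but frames inside third-party code may need to be skipped to reach the first frame in application code.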
Timeline Analysis:
Service Correlation:
Common Error Patterns:
- **N+1 Query Problem**
  - Multiple database queries issued in a loop
  - Pattern: `SELECT * FROM users; SELECT * FROM posts WHERE user_id = ?`
- **Memory Leaks**
  - Gradually increasing memory usage
  - Pattern: Memory growth over time without release
- **Race Conditions**
  - Intermittent failures under load
  - Pattern: Errors only occur with concurrent requests
- **Timeout Issues**
  - Requests timing out
  - Pattern: Errors after a specific duration (e.g., 30s)
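The N+1 pattern listed above can often be spotted mechanically in a query log: the same query shape repeats many times within one request, differing only in its literal values. The following is a rough sketch under that assumption. The helper names (`normalizeQuery`, `findRepeatedQueries`) and the threshold of 5 are hypothetical choices for illustration.

```javascript
// Sketch: flag query shapes that repeat many times within one request,
// a common signature of the N+1 problem. The threshold is an assumption.
function normalizeQuery(sql) {
  return sql
    .replace(/'[^']*'/g, '?') // string literals
    .replace(/\b\d+\b/g, '?') // numeric literals
    .replace(/\s+/g, ' ')     // collapse whitespace
    .trim();
}

function findRepeatedQueries(queries, threshold = 5) {
  const counts = new Map();
  for (const sql of queries) {
    const shape = normalizeQuery(sql);
    counts.set(shape, (counts.get(shape) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count >= threshold)
    .map(([shape, count]) => ({ shape, count }));
}

// Simulated queries from a single request: one list query, then one query per user.
const requestQueries = ['SELECT * FROM users'];
for (let id = 1; id <= 10; id += 1) {
  requestQueries.push(`SELECT * FROM posts WHERE user_id = ${id}`);
}

const suspects = findRepeatedQueries(requestQueries);
console.log(suspects);
// [ { shape: 'SELECT * FROM posts WHERE user_id = ?', count: 10 } ]
```

A flagged shape is only a candidate: repeated queries can be legitimate, so confirm against the calling code before batching or joining them.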
Reproduction Steps:
Example:
```markdown
## Reproduction Steps
1. Navigate to `/users/123`
2. Click "Edit Profile"
3. Submit form without filling required fields
4. Error occurs: "Cannot read property 'validate' of undefined"
```
Code Analysis:
Debugging Techniques:
Hypothesis Formation:
Testing Hypotheses:
Investigation Steps:
Analysis Framework:
```markdown
## Error Analysis
**Error**: [Description]
**Frequency**: [How often]
**Timeline**: [When it started]
**Affected Services**: [Which services]
**User Impact**: [How many users affected]
**Root Cause Hypothesis**:
- [Primary hypothesis with evidence]
- [Alternative hypotheses]
**Evidence**:
- [Log entries supporting hypothesis]
- [Error patterns observed]
- [Correlation with other events]
**Recommended Actions**:
- [Immediate fix]
- [Long-term prevention]
```
Fix Principles:
Fix Verification:
Documentation Should Include:
**Input**: Error: `TypeError: Cannot read property 'name' of undefined`
**Output**:
````markdown
## Debugging Analysis
### Error Information
- **Type**: TypeError
- **Location**: `src/services/user.js:45`
- **Message**: Cannot read property 'name' of undefined
### Root Cause
The `getUser()` function returns `undefined` when a user is not found, but the code assumes it always returns a user object.
### Code Location
```javascript
// src/services/user.js:45
const user = getUser(userId);
console.log(user.name); // ❌ Error: user is undefined
```
### Fix
```javascript
// Add null check
const user = getUser(userId);
if (!user) {
  throw new Error(`User ${userId} not found`);
}
console.log(user.name); // ✅ Safe
```
Alternatively, use optional chaining (`user?.name`), which evaluates to `undefined` instead of throwing when `user` is missing.
````
### Example 2: Database Connection Errors (Log Analysis)
**Input**: Error logs showing database connection failures
**Output**:
````markdown
## Error Analysis: Database Connection Failures
### Pattern Detected
**Error**: `ECONNREFUSED` errors from PostgreSQL
**Frequency**: 15 errors/minute, started 2 hours ago
**Timeline**: Errors began after deployment at 14:30
### Root Cause Analysis
**Hypothesis**: Connection pool exhaustion
**Evidence**:
- Errors correlate with high traffic period (14:30-16:30)
- Connection pool size: 10, concurrent requests: 50+
- No connection cleanup in error handlers
- Errors spike during peak usage
**Code Location**: `src/db/connection.js:45`
**Fix**:
```javascript
// Add connection cleanup
try {
  const result = await query(sql);
  return result;
} catch (error) {
  // Ensure connection is released
  await releaseConnection();
  throw error;
}
```
**Monitoring Query**:
```sql
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
```
````
## Reference Files
For detailed debugging workflows, error patterns, and techniques, load reference files as needed:
- **`references/debugging_workflows.md`** - Common debugging workflows by issue type, language-specific debugging, debugging techniques, debugging checklists, and common error patterns (database errors, memory leaks, race conditions, timeouts, authentication errors, network errors, application errors, performance errors)
- **`references/INCIDENT_POSTMORTEM.template.md`** - Incident postmortem template with timeline, root cause analysis, and action items
When debugging specific types of issues or analyzing error patterns, load `references/debugging_workflows.md` and refer to the relevant section.
## Best Practices
### Debugging Approach
1. **Start with Symptoms**: Understand what's wrong before jumping to solutions
2. **Work Backward**: Trace from error to cause
3. **Test Hypotheses**: Don't assume, verify
4. **Minimal Changes**: Fix only what's necessary
5. **Verify Fixes**: Always test that the fix works
### Log Analysis Techniques
1. **Use Structured Logging**: JSON logs are easier to parse and analyze
2. **Include Context**: Add request IDs, user IDs, timestamps to all logs
3. **Log Levels**: Use appropriate levels (error, warn, info, debug)
4. **Correlation IDs**: Use request IDs to trace errors across services
5. **Error Grouping**: Group similar errors to identify patterns
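As an illustration of the correlation ID technique in the list above, the sketch below groups log entries from several services by a shared request ID and orders each group by time. The field names (`requestId`, `service`, `level`, `timestamp`) and the `traceByRequestId` helper are assumptions about the log schema, not something this skill prescribes.

```javascript
// Sketch: build a per-request timeline across services using a shared request ID.
// Field names (requestId, service, level, timestamp) are assumed, not prescribed.
function traceByRequestId(entries) {
  const byRequest = new Map();
  for (const entry of entries) {
    if (!entry.requestId) continue; // entries without an ID cannot be correlated
    if (!byRequest.has(entry.requestId)) byRequest.set(entry.requestId, []);
    byRequest.get(entry.requestId).push(entry);
  }
  for (const timeline of byRequest.values()) {
    timeline.sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  }
  return byRequest;
}

// Made-up entries from two services
const entries = [
  { requestId: 'req-1', service: 'api', level: 'error', timestamp: '2024-01-01T10:00:02Z', message: 'HTTP 500' },
  { requestId: 'req-1', service: 'db', level: 'error', timestamp: '2024-01-01T10:00:01Z', message: 'connection refused' },
  { requestId: 'req-2', service: 'api', level: 'info', timestamp: '2024-01-01T10:00:03Z', message: 'ok' },
];

const timelines = traceByRequestId(entries);
// The earliest error in a request's timeline is a candidate origin of the failure.
const firstError = timelines.get('req-1').find(e => e.level === 'error');
console.log(firstError.service); // db
```

Here the API's 500 response is preceded by a database error in the same request, which suggests looking at the database first. Clock skew between hosts can reorder entries that are close together, so treat small time differences with caution.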
### Error Pattern Recognition
**Time-Based Patterns:**
- Errors at specific times (deployment windows, peak hours)
- Errors after specific duration (timeouts, memory leaks)
- Errors during specific events (database migrations, cache clears)
**Frequency Patterns:**
- Sudden spikes (deployment issues, traffic spikes)
- Gradual increases (memory leaks, resource exhaustion)
- Intermittent (race conditions, timing issues)
**Correlation Patterns:**
- Errors in multiple services simultaneously (infrastructure issues)
- Errors after specific user actions (application bugs)
- Errors correlated with external services (dependency issues)
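The time-based and frequency patterns above can be made concrete by bucketing error timestamps. The sketch below counts errors per minute and flags minutes whose count exceeds a multiple of the median. Both helper names and the factor of 3 are illustrative assumptions. It detects only sudden spikes; gradual increases would need a trend over longer windows.

```javascript
// Sketch: count errors per minute and flag minutes far above the median count.
// Only minutes that contain at least one error appear in the buckets.
function errorsPerMinute(timestamps) {
  const buckets = new Map();
  for (const ts of timestamps) {
    const minute = new Date(ts).toISOString().slice(0, 16); // e.g. "2024-01-01T14:30"
    buckets.set(minute, (buckets.get(minute) || 0) + 1);
  }
  return buckets;
}

function findSpikes(buckets, factor = 3) {
  const counts = [...buckets.values()].sort((a, b) => a - b);
  if (counts.length === 0) return [];
  const median = counts[Math.floor(counts.length / 2)];
  return [...buckets.entries()]
    .filter(([, count]) => count > median * factor)
    .map(([minute, count]) => ({ minute, count }));
}

// Made-up data: background noise plus a burst of ten errors at 14:30 UTC.
const timestamps = [
  '2024-01-01T14:28:10Z',
  '2024-01-01T14:29:40Z',
  '2024-01-01T14:31:15Z',
];
for (let s = 0; s < 10; s += 1) {
  timestamps.push(`2024-01-01T14:30:${String(s).padStart(2, '0')}Z`);
}

const spikes = findSpikes(errorsPerMinute(timestamps));
console.log(spikes); // [ { minute: '2024-01-01T14:30', count: 10 } ]
```

Comparing the flagged minutes with deployment times or scheduled jobs is one way to test whether a spike matches a known event.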
### Common Debugging Patterns
**Null/Undefined Checks:**
```javascript
// Always check for null/undefined (== null matches both, but not 0, '' or false)
if (value == null) {
  // Handle missing value
}
```
**Error Handling:**
```javascript
try {
  // Risky operation
} catch (error) {
  // Log error with context
  console.error('Operation failed:', error);
  // Handle gracefully
}
```
**Logging:**
```javascript
// Strategic logging
console.log('Before operation:', { userId, data });
const result = await operation();
console.log('After operation:', { result });
```
**Type Checking:**
```javascript
// Verify types
if (typeof value !== 'string') {
  throw new TypeError('Expected string');
}
```
**Error Rate Monitoring:**
```javascript
// Track error rate over time (guard against division by zero when there is no traffic)
const errorRate = totalRequests > 0 ? errors.length / totalRequests : 0;
if (errorRate > 0.01) { // 1% error rate threshold
  alert('High error rate detected');
}
```
Error Alerting:
