bug-investigation

resper1965/bug-investigation

Coding

1 installs

About

Systematic bug investigation and root cause analysis

SKILL.md

Bug Investigation Skill

When to Use

Use this skill when:

Debugging extraction failures
Investigating classification errors
Analyzing search performance issues
Troubleshooting database problems

Debugging Workflow

1. Reproduce the Issue

Identify failing PDF or operation
Reproduce in isolation
Collect error messages

2. Check Extraction Metrics

Use metrics.py to check:

Extraction success rate
Methods used (pdfplumber vs OCR)
Error patterns
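
The contents of metrics.py are not shown here, so the following is only a sketch of the kind of summary this step asks for. It assumes a hypothetical per-file record with `method` and `error` fields; the real module may store these differently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionRecord:
    """Hypothetical per-file record; the real metrics.py may differ."""
    filename: str
    method: Optional[str]          # e.g. "pdfplumber" or "ocr"; None on failure
    error: Optional[str] = None    # error message, if extraction failed


def summarize(records):
    """Return success rate, method counts, and error counts for a batch."""
    total = len(records)
    succeeded = [r for r in records if r.error is None]
    return {
        "success_rate": len(succeeded) / total if total else 0.0,
        "methods": Counter(r.method for r in succeeded),
        "errors": Counter(r.error for r in records if r.error is not None),
    }


records = [
    ExtractionRecord("a.pdf", "pdfplumber"),
    ExtractionRecord("b.pdf", "ocr"),
    ExtractionRecord("c.pdf", None, error="Sem texto extraível"),
    ExtractionRecord("d.pdf", "pdfplumber"),
]
summary = summarize(records)
print(summary["success_rate"])       # 0.75
print(dict(summary["methods"]))      # {'pdfplumber': 2, 'ocr': 1}
print(dict(summary["errors"]))       # {'Sem texto extraível': 1}
```

Grouping errors by message makes repeated failures stand out, which is usually the quickest way to spot an error pattern.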

3. Review Logs

Error messages in console
Database error logs
Processing statistics

Common Bug Patterns

PDF Extraction Failures

Symptoms:

"Sem texto extraível" ("No extractable text")
Empty content in database
OCR not triggered when needed

Investigation:

Check whether the PDF is scanned (page images with no text layer)
Verify OCR is installed and working
Test extraction manually
Check file permissions
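
The first two checks above can be approximated with a small heuristic. This sketch assumes the per-page text has already been collected (with pdfplumber, for example, by calling `page.extract_text()` on each page in `pdf.pages`) and that OCR relies on a `tesseract` executable; the binary name and the `min_chars` threshold are assumptions, not taken from the project.

```python
import shutil


def needs_ocr(page_texts, min_chars=20):
    """Heuristic: treat the PDF as scanned if its pages yield almost no text.

    page_texts holds one string (or None) per page. min_chars is an
    arbitrary threshold chosen for illustration.
    """
    total = sum(len((text or "").strip()) for text in page_texts)
    return total < min_chars


def ocr_available(binary="tesseract"):
    """Return True if the OCR executable is on PATH (assumed Tesseract setup)."""
    return shutil.which(binary) is not None


print(needs_ocr(["", None, "  "]))                             # True
print(needs_ocr(["Service agreement no. 123/2024, page 1"]))   # False
print("OCR binary found:", ocr_available())
```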

Classification Errors

Symptoms:

Documents classified as "outros" ("others")
Incorrect contract number extraction
Missing document numbers

Investigation:

Check filename pattern
Test regex patterns
Verify classification logic
Review expected vs actual output
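
One way to test the patterns is to run them over a handful of real filenames and list the ones nothing matches. The patterns and category names below are hypothetical placeholders; substitute the ones the classifier actually uses.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = {
    "contrato": re.compile(r"contrato[\s_-]*(\d+)", re.IGNORECASE),
    "aditivo": re.compile(r"aditivo[\s_-]*(\d+)", re.IGNORECASE),
}


def explain_match(filename):
    """Return (category, captured number) for the first matching pattern, else None."""
    for category, pattern in PATTERNS.items():
        match = pattern.search(filename)
        if match:
            return category, match.group(1)
    return None


samples = ["Contrato_123.pdf", "aditivo-7.pdf", "CT 123.pdf"]
for name in samples:
    print(name, "->", explain_match(name))

# Filenames no pattern recognizes are the ones likely to end up as "outros".
unmatched = [name for name in samples if explain_match(name) is None]
print("unmatched:", unmatched)   # unmatched: ['CT 123.pdf']
```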

Database Issues

Symptoms:

Duplicate key errors
FTS5 index not updating
Missing data in results

Investigation:

Check filepath uniqueness
Verify triggers are working
Test queries directly
Check database schema
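
The uniqueness and trigger checks can be run directly with Python's built-in sqlite3 module. The table and column names (`documents`, `filepath`) are assumptions about the schema, and the demo trigger writes to a plain `audit` table only so that the example runs without an FTS5 index.

```python
import sqlite3


def duplicate_filepaths(conn, table="documents"):
    """Return (filepath, count) rows for filepaths stored more than once."""
    query = (
        f"SELECT filepath, COUNT(*) FROM {table} "
        "GROUP BY filepath HAVING COUNT(*) > 1 ORDER BY filepath"
    )
    return conn.execute(query).fetchall()


def triggers_on(conn, table="documents"):
    """Return the names of triggers attached to a table, read from sqlite_master."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'trigger' AND tbl_name = ? ORDER BY name",
        (table,),
    )
    return [name for (name,) in rows]


# Demo: in-memory database whose table deliberately lacks a UNIQUE constraint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, filepath TEXT, content TEXT)")
conn.execute("CREATE TABLE audit (filepath TEXT)")
conn.execute(
    "CREATE TRIGGER documents_ai AFTER INSERT ON documents "
    "BEGIN INSERT INTO audit (filepath) VALUES (new.filepath); END"
)
conn.executemany(
    "INSERT INTO documents (filepath, content) VALUES (?, ?)",
    [("a.pdf", "x"), ("a.pdf", "y"), ("b.pdf", "z")],
)

print(duplicate_filepaths(conn))   # [('a.pdf', 2)]
print(triggers_on(conn))           # ['documents_ai']
# The trigger fired once per inserted row.
audit_rows = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
print(audit_rows)                  # 3
```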

Search Problems

Symptoms:

No results found
Incorrect results
Performance issues

Investigation:

Verify FTS5 index exists
Test query syntax
Check content was indexed
Review filter logic
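
A rough sketch of the first three checks, again using sqlite3. The FTS5 table name `docs_fts` is hypothetical, and the demo part only works when the underlying SQLite build has FTS5 enabled, so it is wrapped in a try block.

```python
import sqlite3


def fts5_tables(conn):
    """List virtual tables whose CREATE statement uses the fts5 module."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND sql LIKE 'CREATE VIRTUAL TABLE%USING fts5%'"
    )
    return [name for (name,) in rows]


def count_matches(conn, fts_table, query):
    """Run a MATCH query against an FTS5 table and return the number of hits."""
    sql = f"SELECT COUNT(*) FROM {fts_table} WHERE {fts_table} MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
print(fts5_tables(conn))   # [] because this database has no FTS5 index yet

try:
    # Hypothetical index and content, created only for this demo.
    conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(content)")
    conn.execute("INSERT INTO docs_fts (content) VALUES ('contract 123 renewal')")
    print(fts5_tables(conn))
    print(count_matches(conn, "docs_fts", "contract"))
    print(count_matches(conn, "docs_fts", "missing"))
except sqlite3.OperationalError as exc:
    print("FTS5 not available in this SQLite build:", exc)
```

An empty list from `fts5_tables` means the index was never created; a zero count for a term known to be in a document suggests the content was not indexed.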

Logging and Error Handling

Error Logging Pattern

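try:
    text = extract_text_from_pdf(full_path)
except Exception as e:
    # Report the failure with context: which file failed and why
    # ("Erro ao processar" means "Error processing").
    print(f"   ❌ Erro ao processar {file}: {e}")
    # Count the failure and carry on with the remaining files.
    errors += 1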

Debugging Checklist

Error message clear and helpful
Context information logged
Error doesn't crash entire process
Metrics track failures
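
A minimal sketch of a processing loop that meets the checklist, using the standard logging module instead of print. The `process_files` helper and the stand-in extractor are illustrative names, not part of the project; in practice the `extract` argument would be the project's own function, such as `extract_text_from_pdf`.

```python
import logging

logger = logging.getLogger("extraction")


def process_files(paths, extract):
    """Run extract(path) on every path; log failures and keep going.

    Returns (texts, failures): extracted text per path, and an error
    summary per failed path so that metrics can count failures.
    """
    texts, failures = {}, {}
    for path in paths:
        try:
            texts[path] = extract(path)
        except Exception as exc:
            # logger.exception records the traceback along with the message.
            logger.exception("Failed to process %s", path)
            failures[path] = f"{type(exc).__name__}: {exc}"
    return texts, failures


def fake_extract(path):
    """Stand-in extractor used only for this demo."""
    if path.endswith("broken.pdf"):
        raise ValueError("no extractable text")
    return f"text of {path}"


logging.basicConfig(level=logging.INFO)
texts, failures = process_files(["a.pdf", "broken.pdf", "c.pdf"], fake_extract)
print(len(texts), "ok,", len(failures), "failed")   # 2 ok, 1 failed
print(failures)   # {'broken.pdf': 'ValueError: no extractable text'}
```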

Test Verification Steps

For Extraction Bugs

Test with sample PDF
Verify extraction method used
Check text length
Validate OCR if used
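
These steps can be turned into a repeatable check. The result shape below (`text` plus `method`) is an assumption about what the extractor returns, and the `min_chars` threshold is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class ExtractionResult:
    """Hypothetical result shape: the text plus the method that produced it."""
    text: str
    method: str   # "pdfplumber" or "ocr"


def check_extraction(result, expected_method, min_chars=100, must_contain=()):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    length = len(result.text.strip())
    if result.method != expected_method:
        problems.append(f"method was {result.method!r}, expected {expected_method!r}")
    if length < min_chars:
        problems.append(f"text has {length} chars, expected at least {min_chars}")
    for phrase in must_contain:
        if phrase.lower() not in result.text.lower():
            problems.append(f"missing expected phrase {phrase!r}")
    return problems


good = ExtractionResult(text="Contract no. 123/2024 " * 10, method="ocr")
bad = ExtractionResult(text="", method="pdfplumber")

print(check_extraction(good, "ocr", must_contain=["123/2024"]))   # []
for problem in check_extraction(bad, "ocr"):
    print(problem)
```

Checking for a phrase known to appear in the sample PDF is a cheap way to validate OCR output, since length alone does not show that the recognized text is correct.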

For Classification Bugs

Test classification function directly
Verify regex matches
Check fallback logic
Compare with expected result
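
A table of filenames and expected categories keeps the comparison repeatable. The `classify` function here is a stand-in written for the demo; pass the project's real classification function to `compare` instead.

```python
import re


def classify(filename):
    """Stand-in classifier for this demo only."""
    if re.search(r"contrato", filename, re.IGNORECASE):
        return "contrato"
    return "outros"   # fallback category


def compare(cases, classify_fn):
    """Return (filename, expected, actual) for every case that disagrees."""
    mismatches = []
    for name, expected in cases:
        actual = classify_fn(name)
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches


cases = [
    ("Contrato_123.pdf", "contrato"),
    ("notas.pdf", "outros"),        # exercises the fallback
    ("CT-456.pdf", "contrato"),     # abbreviation the stand-in does not handle
]

for name, expected, actual in compare(cases, classify):
    print(f"{name}: expected {expected!r}, got {actual!r}")
# CT-456.pdf: expected 'contrato', got 'outros'
```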

Bug Fix Examples

Fix: OCR Not Triggering

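Root Cause: The OCR check runs after text validation, so a PDF with no extractable text fails validation before OCR is ever attempted

Fix: Move the OCR check ahead of the point where validation reports failure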
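
A sketch of the corrected ordering, under the assumption that extraction and OCR are separate callables. The function name, the `min_chars` threshold, and the stand-in callables are all illustrative.

```python
def extract_with_fallback(path, extract_text, run_ocr, min_chars=20):
    """Try direct extraction, then OCR, and only then report failure.

    extract_text and run_ocr are callables supplied by the caller; each takes
    a path and returns a string (or None). Returns (text, method).
    """
    text = (extract_text(path) or "").strip()
    if len(text) >= min_chars:
        return text, "direct"

    # The OCR attempt happens before any validation error is raised.
    text = (run_ocr(path) or "").strip()
    if len(text) >= min_chars:
        return text, "ocr"

    raise ValueError(f"Sem texto extraível: {path}")


# Demo with stand-in callables that simulate a scanned PDF.
def no_text_layer(path):
    return ""


def fake_ocr(path):
    return "Text recovered from the page images by OCR."


print(extract_with_fallback("scan.pdf", no_text_layer, fake_ocr))
# ('Text recovered from the page images by OCR.', 'ocr')
```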

Fix: Classification Fails

Root Cause: The regex doesn't match every pattern that occurs in practice

Fix: Broaden the regex or add alternative patterns for the unmatched cases
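
Adding alternatives can be as simple as trying an ordered list of patterns and taking the first hit. The two patterns below are hypothetical examples of contract-number formats, not the project's actual ones.

```python
import re

# Hypothetical alternatives, tried in order; the first match wins.
CONTRACT_PATTERNS = [
    re.compile(r"contrato[\s_-]*(\d+)", re.IGNORECASE),
    re.compile(r"\bct[\s_-]*(\d+)", re.IGNORECASE),
]


def extract_contract_number(filename):
    """Return the first contract number found by any pattern, or None."""
    for pattern in CONTRACT_PATTERNS:
        match = pattern.search(filename)
        if match:
            return match.group(1)
    return None


for name in ["Contrato_123.pdf", "CT-456.pdf", "relatorio.pdf"]:
    print(name, "->", extract_contract_number(name))
# Contrato_123.pdf -> 123
# CT-456.pdf -> 456
# relatorio.pdf -> None
```

Keeping the alternatives in a list means a newly discovered filename format needs one extra entry, and the order makes precedence explicit when several patterns could match.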
