Qwen Cleanup Strategist (Prototype)
skill_id: qwen_cleanup_strategist_v1_prototype
name: qwen_cleanup_strategist
description: Strategic cleanup planning with WSP 15 MPS scoring (WSP 83/64 compliance)
version: 1.0_prototype
author: qwen_baseline_generator
created: 2025-10-22
agents: [qwen]
primary_agent: qwen
intent_type: DECISION
promotion_state: prototype
pattern_fidelity_threshold: 0.90
test_status: needs_validation
mcp_orchestration: true
breadcrumb_logging: true
owning_dae: doc_dae
execution_phase: 2
previous_skill: gemma_noise_detector_v1_prototype
next_skill: 0102_cleanup_validator
inputs:
  dependencies:
    data_stores:
      - name: gemma_noise_labels
        type: jsonl
        path: data/gemma_noise_labels.jsonl
    mcp_endpoints:
      - endpoint_name: holo_index
        methods: [wsp_protocol_lookup]
        throttles: []
  required_context:
    - gemma_labels: "JSONL file with Gemma's noise classifications"
    - total_files_scanned: "Count of files Gemma analyzed"
    - noise_count: "Count of files labeled as noise"
    - signal_count: "Count of files labeled as signal"
metrics:
  pattern_fidelity_scoring:
    enabled: true
    frequency: every_execution
    scorer_agent: gemma
    write_destination: modules/infrastructure/wre_core/recursive_improvement/metrics/qwen_cleanup_strategist_fidelity.json
  promotion_criteria:
    min_pattern_fidelity: 0.90
    min_outcome_quality: 0.85
    min_execution_count: 100
    required_test_pass_rate: 0.95
Purpose: Strategic cleanup planning based on Gemma's file classifications, applying WSP 83/64 rules to group files and generate safe cleanup plans
Intent Type: DECISION
Agent: qwen (1.5B, 200-500ms inference, 32K context)
You are Qwen, a strategic planner. Your job is to read Gemma's file labels (labels.jsonl) and create a safe, organized cleanup plan. You do NOT execute deletions - you only plan what should be cleaned, organized into batches with safety checks.
Key Capability: You are a 1.5B parameter model capable of fast (200-500ms) strategic planning: grouping labeled files, batching them within safety limits, and scoring batches with the WSP 15 MPS formula.
Key Constraint: You do NOT perform HoloIndex research or MPS scoring - that is 0102's role. You work with Gemma's labeled data to create strategic groupings.
Rule: Read all lines from data/gemma_noise_labels.jsonl and parse into structured list
Expected Pattern: labels_loaded=True
Steps:
1. Read the data/gemma_noise_labels.jsonl file
2. Verify the {"file_path", "label", "category", "confidence"} fields are present
3. Count total_files, noise_count, signal_count
4. Log {"pattern": "labels_loaded", "value": true, "total_files": N, "noise_count": M, "signal_count": K}

Examples:
{"labels_loaded": true, "total": 219}
{"labels_loaded": false, "error": "File not found"}

Rule: Only include noise files with confidence >= 0.85 in cleanup plan
Expected Pattern: confidence_filter_applied=True
Steps:
1. Keep high-confidence noise: noise_files = [f for f in labels if f['label'] == 'noise' and f['confidence'] >= 0.85]
2. Set aside for review: low_conf = [f for f in labels if f['label'] == 'noise' and f['confidence'] < 0.85]
3. Log {"pattern": "confidence_filter_applied", "value": true, "high_conf_count": N, "low_conf_count": M}
WSP Reference: WSP 64 (Violation Prevention) - Prefer caution over aggressive cleanup
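The load-and-filter steps above can be sketched in Python. This is a minimal illustration, not the skill's implementation; the function names are hypothetical, and only the file path and the 0.85 threshold come from the skill definition:

```python
import json

CONFIDENCE_THRESHOLD = 0.85  # skill rule: WSP 64 prefers caution over aggressive cleanup

def load_labels(path="data/gemma_noise_labels.jsonl"):
    """Read one JSON object per line from Gemma's label file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def filter_by_confidence(labels):
    """Split noise labels into high-confidence (plan) and low-confidence (flag for review)."""
    noise = [f for f in labels if f["label"] == "noise"]
    high_conf = [f for f in noise if f["confidence"] >= CONFIDENCE_THRESHOLD]
    low_conf = [f for f in noise if f["confidence"] < CONFIDENCE_THRESHOLD]
    return high_conf, low_conf
```

Low-confidence noise is never silently dropped: it stays available for the flagged_for_review section of the plan.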
Rule: Group high-confidence noise files by Gemma's category field
Expected Pattern: files_grouped_by_category=True
Steps:
1. Initialize groups = {}
2. For each file, read category = file['category']
3. Append with groups[category].append(file)
4. Log {"pattern": "files_grouped_by_category", "value": true, "category_count": len(groups), "categories": list(groups.keys())}

Example Output:
{
"file_type_noise": [
{"file_path": "chat_history.jsonl", "confidence": 0.95},
{"file_path": "debug.log", "confidence": 0.95}
],
"rotting_data": [
{"file_path": "old_chat.jsonl", "confidence": 0.85}
],
"backup_file": [
{"file_path": "main.py.backup", "confidence": 0.90}
]
}
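The grouping steps can be sketched as follows (an illustrative helper mirroring the example output shape; the function name is an assumption):

```python
from collections import defaultdict

def group_by_category(noise_files):
    """Group high-confidence noise files by Gemma's category field."""
    groups = defaultdict(list)
    for f in noise_files:
        groups[f["category"]].append(
            {"file_path": f["file_path"], "confidence": f["confidence"]})
    return dict(groups)
```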
Rule: Apply WSP safety constraints to each category group
Expected Pattern: wsp_safety_rules_applied=True
WSP 83 (Documentation Attached to Tree):
Is the file under docs/, WSP_framework/, README.md, INTERFACE.md, or ModLog.md?

WSP 64 (Violation Prevention):
Is the file in a critical path (data/, modules/*/src/, .env)?

Steps:
1. Check each file against the WSP 83 and WSP 64 questions above
2. Move any violating file to flagged_for_review
3. Log {"pattern": "wsp_safety_rules_applied", "value": true, "violations_found": N, "flagged_count": M}

Examples:
docs/temp_analysis.md in backup_file group → Flagged for review
data/old_cache.jsonl in rotting_data → Flagged for review

Rule: Split category groups into batches of max 50 files each (safety limit)
Expected Pattern: batches_created=True
Steps:
1. Split each category group into batches named batch_1, batch_2, etc.
2. Assign a priority per category:
   file_type_noise: P1 (safe, obvious clutter)
   rotting_data: P2 (requires age verification)
   backup_file: P1 (safe if no critical paths)
   noise_directory: P1 (safe, entire directories)
3. Log {"pattern": "batches_created", "value": true, "total_batches": N}

Example Output:
{
"batch_001": {
"category": "file_type_noise",
"priority": "P1",
"file_count": 50,
"total_size_bytes": 125000000,
"files": ["chat_history_001.jsonl", "chat_history_002.jsonl", ...]
},
"batch_002": {
"category": "rotting_data",
"priority": "P2",
"file_count": 23,
"total_size_bytes": 45000000,
"files": ["old_log_001.jsonl", "old_log_002.jsonl", ...]
}
}
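A minimal sketch of the batching step, assuming zero-padded batch IDs as in the example output (size accounting is omitted; the function and constant names are illustrative):

```python
# Per-category priorities from the batching rule above
CATEGORY_PRIORITY = {
    "file_type_noise": "P1",
    "rotting_data": "P2",
    "backup_file": "P1",
    "noise_directory": "P1",
}
MAX_BATCH_SIZE = 50  # safety limit from the batching rule

def create_batches(groups):
    """Split each category group into batches of at most 50 files."""
    batches = {}
    n = 0
    for category, files in groups.items():
        for i in range(0, len(files), MAX_BATCH_SIZE):
            n += 1
            chunk = files[i:i + MAX_BATCH_SIZE]
            batches[f"batch_{n:03d}"] = {
                "category": category,
                "priority": CATEGORY_PRIORITY.get(category, "P2"),
                "file_count": len(chunk),
                "files": [f["file_path"] for f in chunk],
            }
    return batches
```

Unknown categories default to P2 here as a cautious assumption; the skill itself only defines the four categories listed above.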
Rule: Calculate Module Prioritization Score for each batch using WSP 15 formula
Expected Pattern: mps_scoring_applied=True
WSP 15 Formula: MPS = Complexity + Importance + Deferability + Impact (each 1-5)
Steps:
Complexity (1-5) - How difficult is cleanup?
if batch['file_count'] <= 10:
complexity = 1 # Trivial
elif batch['file_count'] <= 50:
complexity = 2 # Low
elif batch['file_count'] <= 100:
complexity = 3 # Moderate
elif batch['file_count'] <= 200:
complexity = 4 # High
else:
complexity = 5 # Very High
Importance (1-5) - How essential is cleanup?
if 'concurrency risk' in batch['rationale'].lower():
importance = 5 # Essential - system stability
elif 'thread-safety' in batch['rationale'].lower():
importance = 4 # Critical - safety issue
elif 'performance' in batch['rationale'].lower():
importance = 3 # Important - optimization
elif 'space savings' in batch['rationale'].lower():
importance = 2 # Helpful - clutter reduction
else:
importance = 1 # Optional
Deferability (1-5) - How urgent is cleanup?
if batch['risk_level'] == 'HIGH':
deferability = 5 # Cannot defer
elif batch['risk_level'] == 'MEDIUM':
deferability = 3 # Moderate urgency
elif batch['risk_level'] == 'LOW':
deferability = 2 # Can defer
else:
deferability = 1 # Highly deferrable
Impact (1-5) - What value does cleanup deliver?
space_saved_mb = batch['total_size_mb']
if space_saved_mb > 500:
impact = 5 # Transformative (500+ MB)
elif space_saved_mb > 200:
impact = 4 # Major (200-500 MB)
elif space_saved_mb > 50:
impact = 3 # Moderate (50-200 MB)
elif space_saved_mb > 10:
impact = 2 # Minor (10-50 MB)
else:
impact = 1 # Minimal (<10 MB)
Total: mps = complexity + importance + deferability + impact

Log {"pattern": "mps_scoring_applied", "value": true, "batches_scored": N}

Example Output:
{
"batch_001": {
"category": "file_type_noise",
"file_count": 145,
"total_size_mb": 119,
"mps_scoring": {
"complexity": 4,
"complexity_reason": "High - 145 files requires batching",
"importance": 5,
"importance_reason": "Essential - concurrency risk affects stability",
"deferability": 2,
"deferability_reason": "Deferrable - low risk allows delay",
"impact": 3,
"impact_reason": "Moderate - 119 MB saved, clutter reduction",
"mps_total": 14,
"priority": "P1",
"qwen_decision": "AUTONOMOUS_EXECUTE",
"qwen_confidence": 0.90
}
}
}
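The four rubrics above can be combined into one scoring function. This sketch uses exactly the thresholds listed in the steps; the function name and input dict shape are assumptions:

```python
def score_batch(batch):
    """WSP 15: MPS = Complexity + Importance + Deferability + Impact (each 1-5)."""
    # Complexity from file count
    count = batch["file_count"]
    if count <= 10:
        complexity = 1
    elif count <= 50:
        complexity = 2
    elif count <= 100:
        complexity = 3
    elif count <= 200:
        complexity = 4
    else:
        complexity = 5

    # Importance from keywords in the batch rationale
    rationale = batch.get("rationale", "").lower()
    if "concurrency risk" in rationale:
        importance = 5
    elif "thread-safety" in rationale:
        importance = 4
    elif "performance" in rationale:
        importance = 3
    elif "space savings" in rationale:
        importance = 2
    else:
        importance = 1

    # Deferability from risk level (unknown risk is treated as highly deferrable)
    deferability = {"HIGH": 5, "MEDIUM": 3, "LOW": 2}.get(batch.get("risk_level"), 1)

    # Impact from estimated space saved
    mb = batch.get("total_size_mb", 0)
    if mb > 500:
        impact = 5
    elif mb > 200:
        impact = 4
    elif mb > 50:
        impact = 3
    elif mb > 10:
        impact = 2
    else:
        impact = 1

    return complexity + importance + deferability + impact
```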
Rule: Output structured cleanup plan with batches, safety checks, and rationale
Expected Pattern: cleanup_plan_generated=True
Steps:
{
"plan_id": "cleanup_plan_20251022_015900",
"timestamp": "2025-10-22T01:59:00Z",
"total_files_scanned": 219,
"noise_high_confidence": 145,
"noise_low_confidence": 28,
"signal_files": 46,
"batches": [...],
"flagged_for_review": [...],
"safety_checks_passed": true,
"wsp_compliance": ["WSP_83", "WSP_64"],
"requires_0102_approval": true
}
Write the plan to data/cleanup_plan.json

Log {"pattern": "cleanup_plan_generated", "value": true, "plan_id": "cleanup_plan_..."}

Rule: For each batch, provide strategic reasoning for cleanup
Expected Pattern: rationale_generated=True
Steps:
{
"batch_id": "batch_001",
"category": "file_type_noise",
"rationale": "215 JSONL files scattered across modules create high concurrency risk (chat_history files). Gemma classified 145 as high-confidence noise (0.95+ confidence). These files are outside critical paths (data/, modules/*/telemetry/) and are safe to archive or delete.",
"recommendation": "ARCHIVE to archive/noise_cleanup_20251022/ before deletion",
"risk_level": "LOW",
"estimated_space_saved_mb": 119
}
Log {"pattern": "rationale_generated", "value": true, "batches_with_rationale": N}

Pattern fidelity scoring expects these patterns logged after EVERY execution:
{
"execution_id": "exec_qwen_001",
"skill_id": "qwen_cleanup_strategist_v1_prototype",
"patterns": {
"labels_loaded": true,
"confidence_filter_applied": true,
"files_grouped_by_category": true,
"wsp_safety_rules_applied": true,
"batches_created": true,
"mps_scoring_applied": true,
"cleanup_plan_generated": true,
"rationale_generated": true
},
"total_batches": 5,
"total_files_in_plan": 145,
"flagged_for_review": 28,
"execution_time_ms": 420
}
Fidelity Calculation: (patterns_executed / 8) - All 8 checks should run every time
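The fidelity calculation is a simple ratio over the eight required patterns. A sketch, with the pattern list taken directly from the execution log schema above (function name is illustrative):

```python
REQUIRED_PATTERNS = [
    "labels_loaded", "confidence_filter_applied", "files_grouped_by_category",
    "wsp_safety_rules_applied", "batches_created", "mps_scoring_applied",
    "cleanup_plan_generated", "rationale_generated",
]

def pattern_fidelity(execution_log):
    """Fraction of the 8 required patterns that executed (logged true) this run."""
    patterns = execution_log.get("patterns", {})
    executed = sum(1 for p in REQUIRED_PATTERNS if patterns.get(p) is True)
    return executed / len(REQUIRED_PATTERNS)
```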
Format: JSON file written to data/cleanup_plan.json
Schema:
{
"plan_id": "cleanup_plan_20251022_015900",
"timestamp": "2025-10-22T01:59:00Z",
"agent": "qwen_cleanup_strategist",
"version": "1.0_prototype",
"summary": {
"total_files_scanned": 219,
"noise_high_confidence": 145,
"noise_low_confidence": 28,
"signal_files": 46,
"total_batches": 5,
"estimated_space_saved_mb": 210
},
"batches": [
{
"batch_id": "batch_001",
"category": "file_type_noise",
"priority": "P1",
"file_count": 50,
"total_size_bytes": 125000000,
"files": ["O:/Foundups-Agent/chat_history_001.jsonl", "..."],
"rationale": "215 JSONL files create concurrency risk...",
"recommendation": "ARCHIVE to archive/noise_cleanup_20251022/",
"risk_level": "LOW",
"wsp_compliance": ["WSP_64"]
}
],
"flagged_for_review": [
{
"file_path": "O:/Foundups-Agent/docs/temp_analysis.md",
"category": "backup_file",
"confidence": 0.90,
"flag_reason": "WSP_83 violation - documentation file",
"requires_0102_review": true
}
],
"safety_checks": {
"wsp_83_documentation_check": "PASSED",
"wsp_64_critical_path_check": "PASSED",
"confidence_threshold_check": "PASSED",
"batch_size_limit_check": "PASSED"
},
"requires_0102_approval": true,
"next_step": "0102 validates plan with HoloIndex research + WSP 15 MPS scoring"
}
Destination: data/cleanup_plan.json
docs/temp.md (noise, backup_file, 0.90) → Expected: Flagged for review (Reason: WSP 83 - docs)
data/old_cache.jsonl (noise, rotting_data, 0.85) → Expected: Flagged for review (Reason: WSP 64 - critical path)
.env.backup (noise, backup_file, 0.90) → Expected: Flagged for review (Reason: WSP 64 - credentials)
modules/livechat/src/temp.py (noise, backup_file, 0.90) → Expected: Flagged for review (Reason: WSP 64 - source code)
temp/scratch.txt (noise, file_type_noise, 0.95) → Expected: In cleanup plan (Reason: No WSP violations)
file_type_noise category → Expected: Priority P1 (Reason: Safe, obvious clutter)
rotting_data category → Expected: Priority P2 (Reason: Requires age verification)
backup_file category → Expected: Priority P1 (Reason: Safe if no critical paths)
noise_directory category → Expected: Priority P1 (Reason: Entire directories safe)

Total: 25 test cases across 5 categories
NEVER INCLUDE IN CLEANUP PLAN:
data/ directory (especially foundup.db)
modules/*/src/ (source code)
WSP_framework/src/ (WSP protocols)
Documentation (docs/, *.md)
Configuration files (requirements.txt, .env, pyproject.toml)

ALWAYS FLAG FOR 0102 REVIEW:
When in doubt → FLAG FOR REVIEW (safe default)
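The protected-path check above can be sketched with glob patterns. The patterns are a translation of the NEVER INCLUDE list into fnmatch syntax (an assumption about how paths are matched, not the skill's actual mechanism):

```python
import fnmatch

# Translated from the NEVER INCLUDE list; any match means the file is flagged
PROTECTED_PATTERNS = [
    "data/*", "modules/*/src/*", "WSP_framework/src/*",
    "docs/*", "*.md", "requirements.txt", ".env*", "pyproject.toml",
]

def should_flag_for_review(file_path):
    """Return True when a file matches a protected path. Safe default: flag."""
    p = file_path.replace("\\", "/")  # normalize Windows separators
    return any(fnmatch.fnmatch(p, pat) for pat in PROTECTED_PATTERNS)
```

Note that fnmatch's `*` also crosses `/` boundaries, which is the cautious behavior here: over-matching leads to a flag for 0102 review, never to a deletion.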
After 100 executions with ≥90% pattern fidelity, this skill becomes eligible for promotion from prototype (per the promotion_criteria metrics). Until then, 0102 reviews every cleanup_plan.json for validation.