Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Accelerating the Agent Economy

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    martinholovsky

    wake-word-detection

    martinholovsky/wake-word-detection
    AI & ML
    21

    About

    SKILL.md

    Install

    Install via Skills CLI

    or add to your agent
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    ├─
    ├─
    └─

    About

    Expert skill for implementing wake word detection with openWakeWord. Covers audio monitoring, keyword spotting, privacy protection, and efficient always-listening systems for JARVIS voice assistant.

    SKILL.md

    Wake Word Detection Skill

    1. Overview

    Risk Level: MEDIUM - Continuous audio monitoring, privacy implications, resource constraints

    You are an expert in wake word detection with deep expertise in openWakeWord, keyword spotting, and always-listening systems.

    Primary Use Cases:

    • JARVIS activation phrase detection ("Hey JARVIS")
    • Always-listening with minimal resource usage
    • Offline wake word detection (no cloud dependency)

    2. Core Principles

    • TDD First - Write tests before implementation code
    • Performance Aware - Optimize for CPU, memory, and latency
    • Privacy Preserving - Never store audio, minimize buffers
    • Accuracy Focused - Minimize false positives/negatives
    • Resource Efficient - Target <5% CPU, <100MB memory

    3. Core Responsibilities

    3.1 Privacy-First Monitoring

    • Process locally - Never send audio to external services
    • Buffer minimally - Only keep audio needed for detection
    • Discard non-wake - Immediately discard non-wake audio
    • User control - Easy disable/pause functionality

    3.2 Efficiency Requirements

    • Minimal CPU usage (<5% average)
    • Low memory footprint (<100MB)
    • Low latency detection (<500ms)
    • Low false positive rate (<1 per hour)

    4. Technical Foundation

    # requirements.txt
    openwakeword>=0.6.0
    numpy>=1.24.0
    sounddevice>=0.4.6
    onnxruntime>=1.16.0
    

    5. Implementation Workflow (TDD)

    Step 1: Write Failing Test First

    # tests/test_wake_word.py
    import pytest
    import numpy as np
    from unittest.mock import Mock, patch
    
    class TestWakeWordDetector:
        """TDD tests for wake word detection."""
    
        def test_detection_accuracy_threshold(self):
            """Test that detector respects confidence threshold."""
            from wake_word import SecureWakeWordDetector
    
            detector = SecureWakeWordDetector(threshold=0.7)
            callback = Mock()
            test_audio = np.random.randn(16000).astype(np.float32)
    
            with patch.object(detector.model, 'predict') as mock_predict:
                # Below threshold - should not trigger
                mock_predict.return_value = {"hey_jarvis": np.array([0.5])}
                detector._test_process(test_audio, callback)
                callback.assert_not_called()
    
                # Above threshold - should trigger
                mock_predict.return_value = {"hey_jarvis": np.array([0.8])}
                detector._test_process(test_audio, callback)
                callback.assert_called_once()
    
        def test_buffer_cleared_after_detection(self):
            """Test privacy: buffer cleared immediately after detection."""
            from wake_word import SecureWakeWordDetector
    
            detector = SecureWakeWordDetector()
            detector.audio_buffer.extend(np.zeros(16000))
    
            with patch.object(detector.model, 'predict') as mock_predict:
                mock_predict.return_value = {"hey_jarvis": np.array([0.9])}
                detector._process_audio()
    
            assert len(detector.audio_buffer) == 0, "Buffer must be cleared"
    
        def test_cpu_usage_under_threshold(self):
            """Test CPU usage stays under 5%."""
            import psutil
            import time
            from wake_word import SecureWakeWordDetector
    
            detector = SecureWakeWordDetector()
            process = psutil.Process()
            start_time = time.time()
    
            while time.time() - start_time < 10:
                audio = np.random.randn(1600).astype(np.float32)
                detector.audio_buffer.extend(audio)
                if len(detector.audio_buffer) >= 16000:
                    detector._process_audio()
    
            avg_cpu = process.cpu_percent() / psutil.cpu_count()
            assert avg_cpu < 5, f"CPU usage too high: {avg_cpu}%"
    
        def test_memory_footprint(self):
            """Test memory usage stays under 100MB."""
            import tracemalloc
            from wake_word import SecureWakeWordDetector
    
            tracemalloc.start()
            detector = SecureWakeWordDetector()
    
            for _ in range(600):
                audio = np.random.randn(1600).astype(np.float32)
                detector.audio_buffer.extend(audio)
    
            current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
    
            peak_mb = peak / 1024 / 1024
            assert peak_mb < 100, f"Memory too high: {peak_mb}MB"
    

    Step 2: Implement Minimum to Pass

    class SecureWakeWordDetector:
        def __init__(self, threshold=0.5):
            self.threshold = threshold
            self.model = Model(wakeword_models=["hey_jarvis"])
            self.audio_buffer = deque(maxlen=24000)
    
        def _test_process(self, audio, callback):
            predictions = self.model.predict(audio)
            for model_name, scores in predictions.items():
                if np.max(scores) > self.threshold:
                    self.audio_buffer.clear()
                    callback(model_name, np.max(scores))
                    break
    

    Step 3: Run Full Verification

    pytest tests/test_wake_word.py -v
    pytest --cov=wake_word --cov-report=term-missing
    

    6. Implementation Patterns

    Pattern 1: Secure Wake Word Detector

    from openwakeword.model import Model
    import numpy as np
    import sounddevice as sd
    from collections import deque
    import structlog
    
    logger = structlog.get_logger()
    
    class SecureWakeWordDetector:
        """Privacy-preserving wake word detection."""
    
        def __init__(self, model_path: str = None, threshold: float = 0.5, sample_rate: int = 16000):
            if model_path:
                self.model = Model(wakeword_models=[model_path])
            else:
                self.model = Model(wakeword_models=["hey_jarvis"])
    
            self.threshold = threshold
            self.sample_rate = sample_rate
            self.buffer_size = int(sample_rate * 1.5)
            self.audio_buffer = deque(maxlen=self.buffer_size)
            self.is_listening = False
            self.on_wake = None
    
        def start(self, callback):
            """Start listening for wake word."""
            self.on_wake = callback
            self.is_listening = True
    
            def audio_callback(indata, frames, time, status):
                if not self.is_listening:
                    return
                audio = indata[:, 0] if len(indata.shape) > 1 else indata
                self.audio_buffer.extend(audio)
                if len(self.audio_buffer) >= self.sample_rate:
                    self._process_audio()
    
            self.stream = sd.InputStream(
                samplerate=self.sample_rate, channels=1, dtype=np.float32,
                callback=audio_callback, blocksize=int(self.sample_rate * 0.1)
            )
            self.stream.start()
    
        def _process_audio(self):
            """Process audio buffer for wake word."""
            audio = np.array(list(self.audio_buffer))
            predictions = self.model.predict(audio)
    
            for model_name, scores in predictions.items():
                if np.max(scores) > self.threshold:
                    self.audio_buffer.clear()  # Privacy: clear immediately
                    if self.on_wake:
                        self.on_wake(model_name, np.max(scores))
                    break
    
        def stop(self):
            """Stop listening."""
            self.is_listening = False
            if hasattr(self, 'stream'):
                self.stream.stop()
                self.stream.close()
            self.audio_buffer.clear()
    

    Pattern 2: False Positive Reduction

    class RobustDetector:
        """Reduce false positives with confirmation."""
    
        def __init__(self, detector: SecureWakeWordDetector):
            self.detector = detector
            self.detection_history = []
            self.confirmation_window = 2.0
            self.min_confirmations = 2
    
        def on_potential_wake(self, model: str, confidence: float):
            now = time.time()
            self.detection_history.append({"time": now, "confidence": confidence})
            self.detection_history = [d for d in self.detection_history if now - d["time"] < self.confirmation_window]
    
            if len(self.detection_history) >= self.min_confirmations:
                avg_confidence = np.mean([d["confidence"] for d in self.detection_history])
                if avg_confidence > 0.6:
                    self.detection_history.clear()
                    return True
            return False
    

    7. Performance Patterns

    Pattern 1: Model Quantization

    # Good - Use quantized ONNX model
    import onnxruntime as ort
    
    class QuantizedDetector:
        def __init__(self, model_path: str):
            sess_options = ort.SessionOptions()
            sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
            self.session = ort.InferenceSession(model_path, sess_options, providers=['CPUExecutionProvider'])
    
    # Bad - Full precision model
    class SlowDetector:
        def __init__(self, model_path: str):
            self.session = ort.InferenceSession(model_path)  # No optimization
    

    Pattern 2: Efficient Audio Buffering

    # Good - Pre-allocated numpy buffer with circular indexing
    class EfficientBuffer:
        def __init__(self, size: int):
            self.buffer = np.zeros(size, dtype=np.float32)
            self.write_idx = 0
            self.size = size
    
        def append(self, audio: np.ndarray):
            n = len(audio)
            end_idx = (self.write_idx + n) % self.size
            if end_idx > self.write_idx:
                self.buffer[self.write_idx:end_idx] = audio
            else:
                self.buffer[self.write_idx:] = audio[:self.size - self.write_idx]
                self.buffer[:end_idx] = audio[self.size - self.write_idx:]
            self.write_idx = end_idx
    
    # Bad - Individual appends
    class SlowBuffer:
        def append(self, audio: np.ndarray):
            for sample in audio:  # Slow!
                self.buffer.append(sample)
    

    Pattern 3: VAD Preprocessing

    # Good - Skip inference on silence
    import webrtcvad
    
    class VADOptimizedDetector:
        def __init__(self):
            self.vad = webrtcvad.Vad(2)
            self.detector = SecureWakeWordDetector()
    
        def process(self, audio: np.ndarray):
            audio_int16 = (audio * 32767).astype(np.int16)
            if not self.vad.is_speech(audio_int16.tobytes(), 16000):
                return None  # Skip expensive inference
            return self.detector._process_audio()
    
    # Bad - Always run inference
    class WastefulDetector:
        def process(self, audio: np.ndarray):
            return self.detector._process_audio()  # Even on silence
    

    Pattern 4: Batch Inference

    # Good - Process multiple windows in single inference
    class BatchDetector:
        def __init__(self, batch_size: int = 4):
            self.batch_size = batch_size
            self.pending_windows = []
    
        def add_window(self, audio: np.ndarray):
            self.pending_windows.append(audio)
            if len(self.pending_windows) >= self.batch_size:
                batch = np.stack(self.pending_windows)
                results = self.model.predict_batch(batch)
                self.pending_windows.clear()
                return results
            return None
    

    Pattern 5: Memory-Mapped Models

    # Good - Memory-map large model files
    import mmap
    
    class MmapModelLoader:
        def __init__(self, model_path: str):
            self.file = open(model_path, 'rb')
            self.mmap = mmap.mmap(self.file.fileno(), 0, access=mmap.ACCESS_READ)
    
    # Bad - Load entire model into memory
    class EagerModelLoader:
        def __init__(self, model_path: str):
            with open(model_path, 'rb') as f:
                self.model_data = f.read()  # Entire model in RAM
    

    8. Security Standards

    class PrivacyController:
        """Ensure privacy in always-listening system."""
    
        def __init__(self):
            self.is_enabled = True
            self.last_activity = time.time()
    
        def check_privacy_mode(self) -> bool:
            if self._is_dnd_enabled():
                return False
            if time.time() - self.last_activity > 3600:
                return False
            return self.is_enabled
    
    # Data minimization
    MAX_BUFFER_SECONDS = 2.0
    def on_wake_detected():
        audio_buffer.clear()  # Delete immediately
    

    9. Common Mistakes

    # BAD - Stores all audio
    def on_audio(chunk):
        with open("audio.raw", "ab") as f:
            f.write(chunk)
    
    # GOOD - Discard after processing
    def on_audio(chunk):
        buffer.extend(chunk)
        process_buffer()
    
    # BAD - Large buffer
    buffer = deque(maxlen=sample_rate * 60)  # 1 minute!
    
    # GOOD - Minimal buffer
    buffer = deque(maxlen=sample_rate * 1.5)  # 1.5 seconds
    

    10. Pre-Implementation Checklist

    Phase 1: Before Writing Code

    • Read TDD workflow section completely
    • Set up test file with detection accuracy tests
    • Define threshold and performance targets
    • Identify which performance patterns apply
    • Review privacy requirements

    Phase 2: During Implementation

    • Write failing test for each feature first
    • Implement minimal code to pass test
    • Apply performance patterns (VAD, quantization)
    • Buffer size minimal (<2 seconds)
    • Audio cleared after detection

    Phase 3: Before Committing

    • All tests pass: pytest tests/test_wake_word.py -v
    • Coverage >80%: pytest --cov=wake_word
    • False positive rate <1/hour tested
    • CPU usage <5% measured
    • Memory usage <100MB verified
    • Audio never stored to disk

    11. Summary

    Your goal is to create wake word detection that is:

    • Private: Audio processed locally, minimal retention
    • Efficient: Low CPU (<5%), low memory (<100MB)
    • Accurate: Low false positive rate (<1/hour)
    • Test-Driven: All features have tests first

    Critical Reminders:

    1. Write tests before implementation
    2. Never store audio to disk
    3. Keep buffer minimal (<2 seconds)
    4. Apply performance patterns (VAD, quantization)
    Repository
    martinholovsky/claude-skills-generator
    Files