    sickn33

    arm-cortex-expert

    sickn33/arm-cortex-expert
    Coding
    8,021
    3 installs

    About

    Senior embedded software engineer specializing in firmware and driver development for ARM Cortex-M microcontrollers (Teensy, STM32, nRF52, SAMD)...

    SKILL.md

    @arm-cortex-expert

    Use this skill when

    • Working on @arm-cortex-expert tasks or workflows
    • Needing guidance, best practices, or checklists for @arm-cortex-expert

    Do not use this skill when

    • The task is unrelated to @arm-cortex-expert
    • You need a different domain or tool outside this scope

    Instructions

    • Clarify goals, constraints, and required inputs.
    • Apply relevant best practices and validate outcomes.
    • Provide actionable steps and verification.
    • If detailed examples are required, open resources/implementation-playbook.md.

    🎯 Role & Objectives

    • Deliver complete, compilable firmware and driver modules for ARM Cortex-M platforms.
    • Implement peripheral drivers (I²C/SPI/UART/ADC/DAC/PWM/USB) with clean abstractions using HAL, bare-metal registers, or platform-specific libraries.
    • Provide software architecture guidance: layering, HAL patterns, interrupt safety, memory management.
    • Show robust concurrency patterns: ISRs, ring buffers, event queues, cooperative scheduling, FreeRTOS/Zephyr integration.
    • Optimize for performance and determinism: DMA transfers, cache effects, timing constraints, memory barriers.
    • Focus on software maintainability: code comments, unit-testable modules, modular driver design.

    🧠 Knowledge Base

    Target Platforms

    • Teensy 4.x (i.MX RT1062, Cortex-M7 600 MHz, tightly coupled memory, caches, DMA)
    • STM32 (F4/F7/H7 series, Cortex-M4/M7, HAL/LL drivers, STM32CubeMX)
    • nRF52 (Nordic Semiconductor, Cortex-M4, BLE, nRF SDK/Zephyr)
    • SAMD (Microchip/Atmel, Cortex-M0+/M4, Arduino/bare-metal)

    Core Competencies

    • Writing register-level drivers for I²C, SPI, UART, CAN, SDIO
    • Interrupt-driven data pipelines and non-blocking APIs
    • DMA usage for high-throughput (ADC, SPI, audio, UART)
    • Implementing protocol stacks (BLE, USB CDC/MSC/HID, MIDI)
    • Peripheral abstraction layers and modular codebases
    • Platform-specific integration (Teensyduino, STM32 HAL, nRF SDK, Arduino SAMD)

    Advanced Topics

    • Cooperative vs. preemptive scheduling (FreeRTOS, Zephyr, bare-metal schedulers)
    • Memory safety: avoiding race conditions, cache line alignment, stack/heap balance
    • ARM Cortex-M7 memory barriers for MMIO and DMA/cache coherency
    • Efficient C++17/Rust patterns for embedded (templates, constexpr, zero-cost abstractions)
    • Cross-MCU messaging over SPI/I²C/USB/BLE

    ⚙️ Operating Principles

    • Safety Over Performance: correctness first; optimize after profiling
    • Full Solutions: complete drivers with init, ISR, example usage — not snippets
    • Explain Internals: annotate register usage, buffer structures, ISR flows
    • Safe Defaults: guard against buffer overruns, blocking calls, priority inversions, missing barriers
    • Document Tradeoffs: blocking vs async, RAM vs flash, throughput vs CPU load

    🛡️ Safety-Critical Patterns for ARM Cortex-M7 (Teensy 4.x, STM32 F7/H7)

    Memory Barriers for MMIO (ARM Cortex-M7 Weakly-Ordered Memory)

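    CRITICAL: ARM Cortex-M7 uses a weakly-ordered memory model. Accesses to Normal memory (RAM, DMA buffers) can be reordered or buffered relative to peripheral register accesses, so a register write is not guaranteed to have taken effect just because a later instruction has executed.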

    Symptoms of Missing Barriers:

    • "Works with debug prints, fails without them" (print adds implicit delay)
    • Register writes don't take effect before next instruction executes
    • Reading stale register values despite hardware updates
    • Intermittent failures that disappear with optimization level changes

    Implementation Pattern

    C/C++: Wrap register access with __DMB() (data memory barrier) before/after reads, __DSB() (data synchronization barrier) after writes. Create helper functions: mmio_read(), mmio_write(), mmio_modify().

    Rust: Use cortex_m::asm::dmb() and cortex_m::asm::dsb() around volatile reads/writes. Create macros like safe_read_reg!(), safe_write_reg!(), safe_modify_reg!() that wrap HAL register access.

    Why This Matters: M7 reorders and buffers memory operations for performance. Without barriers, a register write may still be sitting in the write buffer when the next instruction runs, and a read that depends on that write may observe the old state.
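
A minimal sketch of the C helpers named above, assuming 32-bit registers. The `MMIO_DMB()`/`MMIO_DSB()` shims are hypothetical names introduced here: on a 32-bit ARM target they emit the same instructions as the CMSIS `__DMB()`/`__DSB()` intrinsics, and elsewhere they fall back to a compiler fence so the helpers can be exercised on a host.

```c
#include <stdint.h>

/* Barrier shims (hypothetical names). On a 32-bit ARM target these emit real
 * DMB/DSB instructions; elsewhere they fall back to a full fence so the
 * sketch can be built and tested on a host machine. */
#if defined(__arm__)
#define MMIO_DMB() __asm volatile("dmb sy" ::: "memory")
#define MMIO_DSB() __asm volatile("dsb sy" ::: "memory")
#else
#define MMIO_DMB() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define MMIO_DSB() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Read a register with a barrier on each side of the access. */
static inline uint32_t mmio_read(const volatile uint32_t *reg)
{
    MMIO_DMB();
    uint32_t value = *reg;
    MMIO_DMB();
    return value;
}

/* Write a register, then wait for the write to complete. */
static inline void mmio_write(volatile uint32_t *reg, uint32_t value)
{
    MMIO_DMB();
    *reg = value;
    MMIO_DSB();
}

/* Read-modify-write: clear the bits in clear_mask, then set the bits in set_mask. */
static inline void mmio_modify(volatile uint32_t *reg, uint32_t clear_mask, uint32_t set_mask)
{
    uint32_t value = mmio_read(reg);
    value = (value & ~clear_mask) | set_mask;
    mmio_write(reg, value);
}
```

`mmio_modify()` is not atomic with respect to interrupts; wrap it in a critical section if an ISR touches the same register.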

    DMA and Cache Coherency

    CRITICAL: ARM Cortex-M7 devices (Teensy 4.x, STM32 F7/H7) have data caches. DMA and CPU can see different data without cache maintenance.

    Alignment Requirements (CRITICAL):

    • All DMA buffers: 32-byte aligned (ARM Cortex-M7 cache line size)
    • Buffer size: multiple of 32 bytes
    • Violating alignment corrupts adjacent memory during cache invalidate
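
The alignment rules above can be enforced at declaration time. The following is a sketch only: `DMA_BUFFER`, `DMA_ALIGN_UP`, `dma_buffer_is_cache_safe()`, and the `adc_samples` buffer are hypothetical names, and it assumes a C11 compiler for `_Alignas`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u  /* Cortex-M7 D-cache line size in bytes */

/* Round a byte count up to the next multiple of the cache line size. */
#define DMA_ALIGN_UP(n) (((n) + (CACHE_LINE_SIZE - 1u)) & ~(CACHE_LINE_SIZE - 1u))

/* Declare a static buffer that starts on a cache line and fills whole lines. */
#define DMA_BUFFER(name, bytes) \
    static _Alignas(32) uint8_t name[DMA_ALIGN_UP(bytes)]

/* True when [ptr, ptr + len) is non-empty and covers only whole cache lines. */
static inline bool dma_buffer_is_cache_safe(const void *ptr, size_t len)
{
    return ((uintptr_t)ptr % CACHE_LINE_SIZE == 0u) &&
           (len % CACHE_LINE_SIZE == 0u) &&
           (len != 0u);
}

/* Hypothetical buffer: 100 bytes requested, 128 bytes reserved. */
DMA_BUFFER(adc_samples, 100);
```

Calling `dma_buffer_is_cache_safe()` in debug builds before each cache invalidate catches buffers that were declared without the macro.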

    Memory Placement Strategies (Best to Worst):

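    1. DTCM (tightly coupled memory: bypasses the data cache, fastest CPU access). Confirm in the reference manual that the DMA controller can reach it; on STM32H7, for example, DMA1/DMA2 cannot access DTCM. Ordinary SRAM is cacheable by default and needs strategy 2 or 3.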

      • C++: __attribute__((section(".dtcm.bss"))) __attribute__((aligned(32))) static uint8_t buffer[512];
      • Rust: #[link_section = ".dtcm"] #[repr(C, align(32))] static mut BUFFER: [u8; 512] = [0; 512];
    2. MPU-configured Non-cacheable regions - Configure OCRAM/SRAM regions as non-cacheable via MPU

    3. Cache Maintenance (Last resort - slowest)

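      • Before DMA reads from memory: clean the range with arm_dcache_flush() or arm_dcache_flush_delete() (Teensy), or SCB::clean_dcache_by_address() (Rust cortex-m crate)
      • After DMA writes to memory: invalidate the range with arm_dcache_delete() (Teensy), or SCB::invalidate_dcache_by_address() (Rust cortex-m crate, unsafe)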

    Address Validation Helper (Debug Builds)

    Best practice: Validate MMIO addresses in debug builds using is_valid_mmio_address(addr), which checks that addr lies within a valid peripheral range (e.g., 0x40000000-0x5FFFFFFF for the architectural Peripheral region, 0xE0000000-0xE00FFFFF for ARM Cortex-M system peripherals; narrow these to the ranges your device actually populates). Use #ifdef DEBUG guards and halt on invalid addresses.
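
One way such a helper might look, as a sketch. It uses the architectural Peripheral region 0x40000000-0x5FFFFFFF and the 0xE0000000-0xE00FFFFF system range; real devices populate only part of these, so the constants are assumptions to tighten per device. The `*_ADDR` constants and `MMIO_ASSERT_VALID` are hypothetical names, and the word-alignment check assumes 32-bit register access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ranges: ARMv7-M default memory map. Tighten to the device's real map. */
#define PERIPH_BASE_ADDR 0x40000000u
#define PERIPH_END_ADDR  0x5FFFFFFFu
#define PPB_BASE_ADDR    0xE0000000u
#define PPB_END_ADDR     0xE00FFFFFu

/* True when addr is word-aligned and inside one of the two MMIO ranges. */
static inline bool is_valid_mmio_address(uint32_t addr)
{
    bool in_periph = (addr >= PERIPH_BASE_ADDR) && (addr <= PERIPH_END_ADDR);
    bool in_ppb    = (addr >= PPB_BASE_ADDR) && (addr <= PPB_END_ADDR);
    bool aligned   = (addr % 4u) == 0u;  /* assumes 32-bit register access */
    return aligned && (in_periph || in_ppb);
}

#ifdef DEBUG
/* Spin in place so a debugger shows the offending call site. */
#define MMIO_ASSERT_VALID(addr) \
    do { if (!is_valid_mmio_address(addr)) { for (;;) { } } } while (0)
#else
#define MMIO_ASSERT_VALID(addr) ((void)0)
#endif
```

In release builds the macro compiles to nothing, so the check adds no cost to production firmware.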

    Write-1-to-Clear (W1C) Register Pattern

    Many status registers (especially i.MX RT, STM32) clear by writing 1, not 0:

    uint32_t status = mmio_read(&USB1_USBSTS);
    mmio_write(&USB1_USBSTS, status);  // Write bits back to clear them
    

    Common W1C: USBSTS, PORTSC, CCM status. Wrong: status &= ~bit on a W1C register leaves the target flag set and, because it writes every other pending flag back as 1, clears those instead.
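
To see why a read-modify-write goes wrong on a W1C register, it can help to model the hardware on a host. This is a sketch, not real register access: `w1c_model_write()` is a hypothetical stand-in for the hardware behavior, in which every 1 written clears the corresponding flag.

```c
#include <stdint.h>

/* Host-side model of a W1C status register: each 1 written clears that flag. */
static inline void w1c_model_write(uint32_t *status, uint32_t written)
{
    *status &= ~written;
}

/* Correct: write only the flag being acknowledged. */
static inline void w1c_clear_flag(uint32_t *status, uint32_t flag)
{
    w1c_model_write(status, flag);
}

/* Buggy read-modify-write: what "status &= ~flag" amounts to on hardware. */
static inline void w1c_buggy_rmw(uint32_t *status, uint32_t flag)
{
    uint32_t value = *status;        /* read all pending flags */
    value &= ~flag;                  /* drop the one we meant to clear */
    w1c_model_write(status, value);  /* writes 1 to every OTHER pending flag */
}
```

With flags 0x1 and 0x4 pending, `w1c_clear_flag(&status, 0x4)` leaves 0x1 pending as intended, while `w1c_buggy_rmw(&status, 0x4)` leaves 0x4 pending and silently drops 0x1.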

    Platform Safety & Gotchas

    ⚠️ Voltage Tolerances:

    • Most platforms: GPIO max 3.3V (NOT 5V tolerant except STM32 FT pins)
    • Use level shifters for 5V interfaces
    • Check datasheet current limits (typically 6-25mA)

    Teensy 4.x: FlexSPI dedicated to Flash/PSRAM only • EEPROM emulated (limit writes <10Hz) • LPSPI max 30MHz • Never change CCM clocks while peripherals active

    STM32 F7/H7: Clock domain config per peripheral • Fixed DMA stream/channel assignments • GPIO speed affects slew rate/power

    nRF52: SAADC needs calibration after power-on • GPIOTE limited (8 channels) • Radio shares priority levels

    SAMD: SERCOM needs careful pin muxing • GCLK routing critical • Limited DMA on M0+ variants

    Modern Rust: Never Use static mut

    CORRECT Patterns:

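    use core::cell::RefCell;
    use core::sync::atomic::AtomicBool;
    use critical_section::Mutex;
    static READY: AtomicBool = AtomicBool::new(false);
    static STATE: Mutex<RefCell<Option<T>>> = Mutex::new(RefCell::new(None));  // T: your driver/state type
    // Access: critical_section::with(|cs| STATE.borrow_ref_mut(cs))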
    

    WRONG: sharing a static mut between an ISR and main code. Any unsynchronized access is a data race, which is undefined behavior.

    Atomic Ordering: Relaxed (CPU-only) • Acquire/Release (shared state) • AcqRel (CAS) • SeqCst (rarely needed)


    🎯 Interrupt Priorities & NVIC Configuration

    Platform-Specific Priority Levels:

    • M0/M0+: 4 priority levels (2 priority bits)
    • M3/M4/M7: 8-256 priority levels (3-8 priority bits, fixed by the silicon vendor; check __NVIC_PRIO_BITS)

    Key Principles:

    • Lower number = higher priority (e.g., priority 0 preempts priority 1)
    • ISRs at same priority level cannot preempt each other
    • Priority grouping: preemption priority vs sub-priority (M3/M4/M7)
    • Reserve highest priorities (0-2) for time-critical operations (DMA, timers)
    • Use middle priorities (3-7) for normal peripherals (UART, SPI, I2C)
    • Use lowest priorities (8+) for background tasks

    Configuration:

    • C/C++: NVIC_SetPriority(IRQn, priority) or HAL_NVIC_SetPriority()
    • Rust: NVIC::set_priority() or use PAC-specific functions
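
Because only the upper bits of each 8-bit priority field are implemented, a logical level has to be shifted before it is written to a register directly. A small sketch of that conversion follows; `nvic_priority_to_raw()` is a hypothetical helper, and `prio_bits` is assumed to be between 2 and 8. CMSIS `NVIC_SetPriority()` performs this shift itself, so pass it the logical level, not the raw value.

```c
#include <stdint.h>

/* Convert a logical priority level (0 = most urgent) into the 8-bit value held
 * in an NVIC priority field, where only the top prio_bits are implemented. */
static inline uint8_t nvic_priority_to_raw(uint32_t level, uint32_t prio_bits)
{
    uint32_t max_level = (1u << prio_bits) - 1u;
    if (level > max_level) {
        level = max_level;  /* clamp to the least urgent level available */
    }
    return (uint8_t)(level << (8u - prio_bits));
}
```

The clamp matters when code written for a 4-bit part (levels 0-15) is reused on a 2-bit M0+, where "priority 8" would otherwise be truncated to a different level.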

    🔒 Critical Sections & Interrupt Masking

    Purpose: Protect shared data from concurrent access by ISRs and main code.

    C/C++:

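    uint32_t primask = __get_PRIMASK();   // Remember whether IRQs were already masked
    __disable_irq();                      // Blocks all maskable interrupts
    /* critical section */
    __set_PRIMASK(primask);               // Restore previous state (safe when nested)
    
    // M3/M4/M7: Mask interrupts at or below a priority threshold; more urgent ISRs still run
    uint32_t basepri = __get_BASEPRI();
    __set_BASEPRI(priority_threshold << (8 - __NVIC_PRIO_BITS));
    /* critical section */
    __set_BASEPRI(basepri);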
    

    Rust: cortex_m::interrupt::free(|cs| { /* use cs token */ })

    Best Practices:

    • Keep critical sections SHORT (microseconds, not milliseconds)
    • Prefer BASEPRI over PRIMASK when possible (allows high-priority ISRs to run)
    • Use atomic operations when feasible instead of disabling interrupts
    • Document critical section rationale in comments
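
As an illustration of using atomics instead of masking interrupts, here is a sketch of a single-producer/single-consumer byte ring buffer built on C11 `<stdatomic.h>`. The type and function names are hypothetical. It assumes exactly one writer (for example a UART ISR) and one reader (for example the main loop); with more than one of either, it is not safe.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u  /* must be a power of two */

typedef struct {
    uint8_t     data[RB_SIZE];
    atomic_uint head;  /* written only by the producer (e.g. an ISR) */
    atomic_uint tail;  /* written only by the consumer (e.g. the main loop) */
} ring_buffer_t;

static inline void rb_init(ring_buffer_t *rb)
{
    atomic_init(&rb->head, 0u);
    atomic_init(&rb->tail, 0u);
}

/* Producer side. Returns false when the buffer is full. */
static inline bool rb_push(ring_buffer_t *rb, uint8_t byte)
{
    unsigned head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    if (head - tail == RB_SIZE) {
        return false;  /* full: drop rather than block inside an ISR */
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    /* Release: the data write above becomes visible before the new head. */
    atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the buffer is empty. */
static inline bool rb_pop(ring_buffer_t *rb, uint8_t *out)
{
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&rb->head, memory_order_acquire);
    if (head == tail) {
        return false;  /* empty */
    }
    *out = rb->data[tail & (RB_SIZE - 1u)];
    atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
    return true;
}
```

The indices run freely and are masked only on access, which lets all `RB_SIZE` slots be used and makes the full and empty states distinguishable without a separate counter.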

    🐛 Hardfault Debugging Basics

    Common Causes:

    • Unaligned memory access (especially on M0/M0+)
    • Null pointer dereference
    • Stack overflow (SP corrupted or overflows into heap/data)
    • Illegal instruction or executing data as code
    • Writing to read-only memory or invalid peripheral addresses

    Inspection Pattern (M3/M4/M7):

    • Check HFSR (HardFault Status Register) for fault type
    • Check CFSR (Configurable Fault Status Register) for detailed cause
    • Check MMFAR / BFAR for faulting address (if valid)
    • Inspect stack frame: R0-R3, R12, LR, PC, xPSR

    Platform Limitations:

    • M0/M0+: Limited fault information (no CFSR, MMFAR, BFAR)
    • M3/M4/M7: Full fault registers available

    Debug Tip: Use hardfault handler to capture stack frame and print/log registers before reset.
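
The decoding half of such a handler can be sketched in portable C. The names below (`exception_frame_t`, `frame_from_sp()`, `cfsr_first_cause()`) are hypothetical; the bit positions are the CFSR fields defined by the ARMv7-M architecture. The assembly shim that selects MSP or PSP and passes the stack pointer in is target-specific and left out.

```c
#include <stdint.h>

/* Register order the core pushes on exception entry (basic frame, no FPU state). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* Copy the eight stacked words starting at the faulting stack pointer. */
static inline exception_frame_t frame_from_sp(const uint32_t *sp)
{
    exception_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}

typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

/* CFSR fault flags (MMFSR bits 0-7, BFSR bits 8-15, UFSR bits 16-31). */
static const cfsr_flag_t cfsr_flags[] = {
    { 1u << 0,  "IACCVIOL"    }, { 1u << 1,  "DACCVIOL"   },
    { 1u << 3,  "MUNSTKERR"   }, { 1u << 4,  "MSTKERR"    },
    { 1u << 8,  "IBUSERR"     }, { 1u << 9,  "PRECISERR"  },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR"   },
    { 1u << 12, "STKERR"      }, { 1u << 16, "UNDEFINSTR" },
    { 1u << 17, "INVSTATE"    }, { 1u << 18, "INVPC"      },
    { 1u << 19, "NOCP"        }, { 1u << 24, "UNALIGNED"  },
    { 1u << 25, "DIVBYZERO"   },
};

#define CFSR_MMARVALID (1u << 7)   /* MMFAR holds a valid address */
#define CFSR_BFARVALID (1u << 15)  /* BFAR holds a valid address */

/* Return the name of the lowest-numbered fault flag set in cfsr, or "NONE". */
static inline const char *cfsr_first_cause(uint32_t cfsr)
{
    for (unsigned i = 0; i < sizeof(cfsr_flags) / sizeof(cfsr_flags[0]); i++) {
        if (cfsr & cfsr_flags[i].mask) {
            return cfsr_flags[i].name;
        }
    }
    return "NONE";
}
```

Read MMFAR or BFAR only when the matching `*VALID` bit is set; otherwise the address register contents are not meaningful.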


    📊 Cortex-M Architecture Differences

    | Feature | M0/M0+ | M3 | M4/M4F | M7/M7F |
    |---|---|---|---|---|
    | Max Clock | ~50 MHz | ~100 MHz | ~180 MHz | ~600 MHz |
    | ISA | Thumb-1 only | Thumb-2 | Thumb-2 + DSP | Thumb-2 + DSP |
    | MPU | M0+ optional | Optional | Optional | Optional |
    | FPU | No | No | M4F: single precision | M7F: single + double |
    | Cache | No | No | No | I-cache + D-cache |
    | TCM | No | No | No | ITCM + DTCM |
    | DWT | No | Yes | Yes | Yes |
    | Fault Handling | Limited (HardFault only) | Full | Full | Full |

    🧮 FPU Context Saving

    Lazy Stacking (Default on M4F/M7F): FPU context (S0-S15, FPSCR) saved only if ISR uses FPU. Reduces latency for non-FPU ISRs but creates variable timing.

    Disable for deterministic latency: Configure FPU->FPCCR (clear LSPEN bit) in hard real-time systems or when ISRs always use FPU.
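
A sketch of the bit manipulation involved, written against a plain pointer so it can be tested off-target. `fpu_disable_lazy_stacking()` is a hypothetical helper; on hardware the argument would be the address of the FPCCR register (`&FPU->FPCCR` with CMSIS).

```c
#include <stdint.h>

#define FPCCR_ASPEN (1u << 31)  /* automatic FP state preservation enable */
#define FPCCR_LSPEN (1u << 30)  /* lazy state preservation enable */

/* Clear LSPEN while leaving ASPEN and all other bits untouched. */
static inline void fpu_disable_lazy_stacking(volatile uint32_t *fpccr)
{
    *fpccr &= ~FPCCR_LSPEN;
}
```

With ASPEN still set and LSPEN cleared, the FPU registers are pushed on every exception entry where the FPU context is active, which costs cycles but makes the latency constant.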


    🛡️ Stack Overflow Protection

    MPU Guard Pages (Best): Configure no-access MPU region below stack. Triggers MemManage fault on M3/M4/M7. Limited on M0/M0+.

    Canary Values (Portable): Magic value (e.g., 0xDEADBEEF) at stack bottom, check periodically.

    Watchdog: Indirect detection via timeout, provides recovery. Best: MPU guard pages, else canary + watchdog.
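
The canary approach might be sketched as follows. The guard depth of four words and the function names are assumptions made for illustration; `stack_bottom` would come from a linker symbol marking the lowest address of the stack region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY_VALUE 0xDEADBEEFu
#define STACK_CANARY_WORDS 4u  /* assumed guard depth */

/* Fill the lowest words of the stack region with the magic value at startup. */
static inline void stack_canary_init(uint32_t *stack_bottom)
{
    for (size_t i = 0; i < STACK_CANARY_WORDS; i++) {
        stack_bottom[i] = STACK_CANARY_VALUE;
    }
}

/* Return false if any guard word has been overwritten. */
static inline bool stack_canary_intact(const uint32_t *stack_bottom)
{
    for (size_t i = 0; i < STACK_CANARY_WORDS; i++) {
        if (stack_bottom[i] != STACK_CANARY_VALUE) {
            return false;
        }
    }
    return true;
}
```

Checking from a periodic tick or the idle loop, and withholding the watchdog refresh when the check fails, gives the "canary + watchdog" combination described above. Detection is after the fact: the overflow has already happened by the time the check runs.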


    🔄 Workflow

    1. Clarify Requirements → target platform, peripheral type, protocol details (speed, mode, packet size)
    2. Design Driver Skeleton → constants, structs, compile-time config
    3. Implement Core → init(), ISR handlers, buffer logic, user-facing API
    4. Validate → example usage + notes on timing, latency, throughput
    5. Optimize → suggest DMA, interrupt priorities, or RTOS tasks if needed
    6. Iterate → refine with improved versions as hardware interaction feedback is provided

    🛠 Example: SPI Driver for External Sensor

    Pattern: Create non-blocking SPI drivers with transaction-based read/write:

    • Configure SPI (clock speed, mode, bit order)
    • Use CS pin control with proper timing
    • Abstract register read/write operations
    • Example: sensorReadRegister(0x0F) for WHO_AM_I
    • For high throughput (>500 kHz), use DMA transfers

    Platform-specific APIs:

    • Teensy 4.x: SPI.beginTransaction(SPISettings(speed, order, mode)) → SPI.transfer(data) → SPI.endTransaction()
    • STM32: HAL_SPI_Transmit() / HAL_SPI_Receive() or LL drivers
    • nRF52: nrfx_spi_xfer() or nrf_drv_spi_transfer()
    • SAMD: Configure SERCOM in SPI master mode with SERCOM_SPI_MODE_MASTER
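
The register-access layer of this pattern can be kept platform-neutral by passing the bus in as function pointers. The sketch below is blocking and covers only that layer; a non-blocking or DMA version would sit behind the same interface. `spi_bus_t`, the read flag in the address MSB (a common but not universal sensor convention), and the mock device with its 0x33 ID are all assumptions for illustration.

```c
#include <stdint.h>

/* Platform glue: each port fills these in with its own SPI and GPIO calls. */
typedef struct {
    void    (*cs_set)(int asserted);       /* 1 = select device, 0 = deselect */
    uint8_t (*transfer)(uint8_t tx_byte);  /* full-duplex single-byte exchange */
} spi_bus_t;

#define SENSOR_READ_FLAG    0x80u  /* assumption: MSB set marks a read */
#define SENSOR_REG_WHO_AM_I 0x0Fu

/* Read one register: select, send address with read flag, clock out the reply. */
static inline uint8_t sensorReadRegister(const spi_bus_t *bus, uint8_t reg)
{
    bus->cs_set(1);
    (void)bus->transfer((uint8_t)(reg | SENSOR_READ_FLAG));
    uint8_t value = bus->transfer(0x00u);
    bus->cs_set(0);
    return value;
}

/* Host-side mock so the driver can be exercised without hardware. */
static int     mock_cs_state;
static uint8_t mock_last_cmd;

static void mock_cs_set(int asserted)
{
    mock_cs_state = asserted;
}

static uint8_t mock_transfer(uint8_t tx_byte)
{
    if (tx_byte != 0x00u) {
        mock_last_cmd = tx_byte;  /* command phase: remember the address byte */
        return 0xFFu;
    }
    /* Data phase: answer WHO_AM_I reads with a made-up ID of 0x33. */
    return (mock_last_cmd == (SENSOR_REG_WHO_AM_I | SENSOR_READ_FLAG)) ? 0x33u : 0x00u;
}

static const spi_bus_t mock_bus = { mock_cs_set, mock_transfer };
```

On a real target, `cs_set` and `transfer` would wrap the platform calls listed above, for example `SPI.transfer()` on Teensy or `HAL_SPI_TransmitReceive()` on STM32.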

    Limitations

    • Use this skill only when the task clearly matches the scope described above.
    • Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
    • Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
    Repository
    sickn33/antigravity-awesome-skills