playwright-screenshot-inspector

erichowens/playwright-screenshot-inspector

AI & ML

4 installs

About

SKILL.md

playwright-screenshot-inspector

erichowens/playwright-screenshot-inspector

AI & ML

4 installs

About

LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.

SKILL.md

Playwright Screenshot Inspector

LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.

Activation Triggers

Activate on:

"screenshot test", "visual test", "screenshot inspection"
"playwright headless", "playwright screenshot"
"UI verification", "visual regression"
"theme compliance test", "dark mode test", "light mode test"
"automated screenshot", "capture and analyze"
"compare screenshots", "visual diff"

NOT for:

Simple one-off screenshots (use browser DevTools)
Pixel-perfect comparison without AI (use native Playwright toHaveScreenshot)
Non-web UI testing (use platform-specific tools)
Performance testing (use Lighthouse/WebPageTest)

Core Philosophy

Traditional visual testing compares pixels. LLM-powered visual testing understands semantics.

Instead of "these 50 pixels changed", LLM inspection answers:

"Is the content actually rendered?"
"Does the theme switch correctly?"
"Are interactive elements visible and properly styled?"
"What's broken vs. what's just different?"

The Screenshot Inspection Loop

┌─────────────────────────────────────────────────────────────┐
│                    LLM SCREENSHOT INSPECTION                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. CAPTURE (Playwright)                                    │
│     └─► Wait for React hydration, not just network          │
│                                                             │
│  2. READ (Claude vision)                                    │
│     └─► Pass screenshot to LLM with specific questions      │
│                                                             │
│  3. ANALYZE (Structured response)                           │
│     └─► Extract: content present? theme correct? errors?    │
│                                                             │
│  4. ACT (Conditional logic)                                 │
│     └─► Pass/fail based on semantic understanding           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Critical: Waiting for React Content

The #1 failure mode: Taking screenshots before React hydrates.

Anti-Pattern: Network Idle Alone

# ❌ WRONG - React may not have rendered yet
page.goto(url)
page.wait_for_load_state('networkidle')
page.screenshot(path='broken.png')  # Often blank!

Correct Pattern: Wait for Actual Content

# ✅ CORRECT - Wait for React to mount
page.goto(url, wait_until='domcontentloaded')
page.wait_for_load_state('networkidle')

# Give React time to hydrate
import time
time.sleep(0.5)

# Wait for actual content selector
page.wait_for_selector('.main-content, h1, [data-testid="app"]',
                       state='visible',
                       timeout=10000)

# Verify content exists
body_text = page.locator('body').inner_text()
if len(body_text) < 50:
    time.sleep(2)  # Extra wait for slow hydration

page.screenshot(path='good.png', full_page=True)

Content Verification Function

def wait_for_react_content(page, selectors, timeout=10000):
    """Wait for React to hydrate by checking for actual content."""
    page.wait_for_load_state('domcontentloaded')
    page.wait_for_load_state('networkidle')
    time.sleep(0.5)  # React hydration buffer

    for selector in selectors.split(','):
        try:
            locator = page.locator(selector.strip())
            if locator.count() > 0:
                locator.first.wait_for(state='visible', timeout=timeout)
                return True
        except:
            continue

    # Fallback: wait for substantial body content
    try:
        page.wait_for_function(
            'document.body.innerText.length > 100',
            timeout=timeout
        )
        return True
    except:
        return False

Headless Mode: Preventing Window Spam

Always use headless=True to prevent browser windows from spawning:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # CRITICAL: headless=True prevents visible browser windows
    browser = p.chromium.launch(headless=True)

    context = browser.new_context(
        viewport={'width': 1280, 'height': 800},
        color_scheme='dark'  # Initial theme
    )
    page = context.new_page()

    # ... your test logic ...

    browser.close()  # Always clean up

Theme Testing Pattern

# Dark mode screenshot
page.emulate_media(color_scheme='dark')  # Note: on PAGE, not context
page.goto(url)
wait_for_react_content(page, '.app-container, main, h1')
page.screenshot(path='dark.png', full_page=True)

# Light mode screenshot
page.emulate_media(color_scheme='light')
page.reload()
wait_for_react_content(page, '.app-container, main, h1')
page.screenshot(path='light.png', full_page=True)

LLM Screenshot Analysis Patterns

Pattern 1: Content Verification

Prompt: "Analyze this screenshot. Answer:
1. Is the main content rendered (not blank/loading)?
2. What major UI elements are visible?
3. Are there any error states or broken layouts?
4. Rate content completeness: FULL / PARTIAL / EMPTY"

Pattern 2: Theme Compliance

Prompt: "This is a {dark/light} mode screenshot. Verify:
1. Background color matches expected theme (dark bg for dark mode)
2. Text has sufficient contrast against background
3. Interactive elements are visible and styled correctly
4. No theme leakage (dark elements on light bg or vice versa)"

Pattern 3: Comparison Analysis

Prompt: "Compare these two screenshots (before/after). Identify:
1. What changed between them?
2. Are changes intentional (theme switch) or bugs?
3. Is any content missing in the 'after' version?
4. Rate similarity: IDENTICAL / MINOR_DIFF / MAJOR_DIFF / BROKEN"

Pattern 4: Accessibility Check

Prompt: "Evaluate this screenshot for visual accessibility:
1. Is text readable (sufficient size and contrast)?
2. Are interactive elements clearly identifiable?
3. Is there visual hierarchy (headings, sections)?
4. Any elements that would fail WCAG contrast requirements?"

Complete Test Script Template

#!/usr/bin/env python3
"""
LLM-Powered Screenshot Test Suite
Captures screenshots and uses Claude vision for semantic analysis.
"""

from playwright.sync_api import sync_playwright
import os
import time

PAGES_TO_TEST = [
    # (path, name, content_selectors)
    ('/', 'Home', '.hero, main, h1'),
    ('/about', 'About', '.about-content, main, h1'),
    ('/dashboard', 'Dashboard', '.dashboard, .stats, h1'),
]

BASE_URL = 'http://localhost:5173'
SCREENSHOT_DIR = '/tmp/visual-tests'


def wait_for_content(page, selectors, timeout=10000):
    """Wait for React/Vue/Svelte to hydrate."""
    page.wait_for_load_state('domcontentloaded')
    page.wait_for_load_state('networkidle')
    time.sleep(0.5)

    for selector in selectors.split(','):
        try:
            loc = page.locator(selector.strip())
            if loc.count() > 0:
                loc.first.wait_for(state='visible', timeout=timeout)
                return True
        except:
            continue

    try:
        page.wait_for_function('document.body.innerText.length > 100', timeout=timeout)
        return True
    except:
        return False


def capture_themed_screenshots(page, url, name, selectors):
    """Capture both dark and light mode screenshots."""
    safe_name = name.lower().replace(' ', '-')
    results = {'name': name, 'url': url}

    for theme in ['dark', 'light']:
        page.emulate_media(color_scheme=theme)

        if theme == 'dark':
            page.goto(url, wait_until='domcontentloaded')
        else:
            page.reload(wait_until='domcontentloaded')

        content_loaded = wait_for_content(page, selectors)

        if not content_loaded:
            print(f"  ⚠️  {theme} mode: Content slow to load, waiting...")
            time.sleep(2)

        screenshot_path = f'{SCREENSHOT_DIR}/{safe_name}-{theme}.png'
        page.screenshot(path=screenshot_path, full_page=True)

        # Check content length
        body_text = page.locator('body').inner_text().strip()
        results[f'{theme}_screenshot'] = screenshot_path
        results[f'{theme}_content_length'] = len(body_text)
        results[f'{theme}_has_content'] = len(body_text) > 50

        print(f"  {theme}: {'✅' if results[f'{theme}_has_content'] else '❌'} ({len(body_text)} chars)")

    return results


def run_tests():
    """Run visual tests on all pages."""
    os.makedirs(SCREENSHOT_DIR, exist_ok=True)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={'width': 1280, 'height': 800},
            color_scheme='dark'
        )
        page = context.new_page()

        # Capture console errors
        errors = []
        page.on('console', lambda m: errors.append(m.text) if m.type == 'error' else None)

        results = []

        for path, name, selectors in PAGES_TO_TEST:
            print(f"Testing {name}...")
            url = f'{BASE_URL}{path}'
            result = capture_themed_screenshots(page, url, name, selectors)
            result['errors'] = list(errors)
            errors.clear()
            results.append(result)

        browser.close()

        # Summary
        print("\n" + "=" * 50)
        print("VISUAL TEST SUMMARY")
        print("=" * 50)

        passed = sum(1 for r in results
                     if r.get('dark_has_content') and r.get('light_has_content'))
        print(f"\nPassed: {passed}/{len(results)}")
        print(f"Screenshots: {SCREENSHOT_DIR}")

        return results


if __name__ == '__main__':
    run_tests()

MCP vs Native Playwright Decision Tree

What are you doing?
│
├─ Interactive debugging / exploring
│  └─► Playwright MCP (see live browser)
│
├─ Automated test suite
│  └─► Native Python Playwright (headless)
│
├─ CI/CD pipeline
│  └─► Native Python Playwright (headless)
│
├─ Screenshot capture for LLM analysis
│  └─► Native Python Playwright (headless)
│
└─ One-off inspection
   └─► Either works, MCP is convenient

Common Failures and Fixes

Failure: Blank Screenshots

Cause: Screenshot taken before React hydrates Fix: Wait for content selectors, add hydration buffer

Failure: "Reconnecting..." Badge Visible

Cause: HMR/WebSocket not connected (cosmetic in tests) Fix: This is often fine - focus on actual content

Failure: Theme Not Applied

Cause: emulate_media called on context instead of page Fix: Use page.emulate_media(color_scheme='dark')

Failure: Browser Windows Spawning

Cause: headless=False or using MCP instead of native Fix: Use p.chromium.launch(headless=True)

Failure: Timeout on Content

Cause: Wrong selectors or page actually broken Fix: Verify selectors exist, check console errors

Integration with Claude Code

When Claude reads screenshots captured by this pattern:

Request specific analysis: Don't just show screenshot - ask targeted questions
Provide context: "This should be dark mode" or "This is the login page"
Compare systematically: Before/after, dark/light, desktop/mobile
Trust semantic analysis: LLM can tell "blank page" from "content loaded"

References

Research Papers

Using Vision LLMs For UI Testing - University of Washington
Vision-driven Automated Mobile GUI Testing - Multimodal LLM approach
ScreenLLM: Stateful Screen Schema - UI understanding framework

Tools & Integrations

Official Documentation

Playwright Visual Comparisons

Version History

2026-01-23: Initial skill creation
- Researched multimodal LLM screenshot analysis best practices
- Documented React hydration waiting patterns
- Added headless mode requirements
- Created complete test script template

Core Insight: The difference between useless and useful screenshot tests is waiting for content, not just network. LLMs can analyze semantics, but only if there's actually content to analyze.

About

SKILL.md

About

LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.

SKILL.md

Playwright Screenshot Inspector

LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.

Activation Triggers

Activate on:

"screenshot test", "visual test", "screenshot inspection"
"playwright headless", "playwright screenshot"
"UI verification", "visual regression"
"theme compliance test", "dark mode test", "light mode test"
"automated screenshot", "capture and analyze"
"compare screenshots", "visual diff"

NOT for:

Simple one-off screenshots (use browser DevTools)
Pixel-perfect comparison without AI (use native Playwright toHaveScreenshot)
Non-web UI testing (use platform-specific tools)
Performance testing (use Lighthouse/WebPageTest)

Core Philosophy

Traditional visual testing compares pixels. LLM-powered visual testing understands semantics.

Instead of "these 50 pixels changed", LLM inspection answers:

"Is the content actually rendered?"
"Does the theme switch correctly?"
"Are interactive elements visible and properly styled?"
"What's broken vs. what's just different?"

The Screenshot Inspection Loop

┌─────────────────────────────────────────────────────────────┐
│                    LLM SCREENSHOT INSPECTION                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. CAPTURE (Playwright)                                    │
│     └─► Wait for React hydration, not just network          │
│                                                             │
│  2. READ (Claude vision)                                    │
│     └─► Pass screenshot to LLM with specific questions      │
│                                                             │
│  3. ANALYZE (Structured response)                           │
│     └─► Extract: content present? theme correct? errors?    │
│                                                             │
│  4. ACT (Conditional logic)                                 │
│     └─► Pass/fail based on semantic understanding           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Critical: Waiting for React Content

The #1 failure mode: Taking screenshots before React hydrates.

Anti-Pattern: Network Idle Alone

# ❌ WRONG - React may not have rendered yet
page.goto(url)
page.wait_for_load_state('networkidle')
page.screenshot(path='broken.png')  # Often blank!

Correct Pattern: Wait for Actual Content

# ✅ CORRECT - Wait for React to mount
page.goto(url, wait_until='domcontentloaded')
page.wait_for_load_state('networkidle')

# Give React time to hydrate
import time
time.sleep(0.5)

# Wait for actual content selector
page.wait_for_selector('.main-content, h1, [data-testid="app"]',
                       state='visible',
                       timeout=10000)

# Verify content exists
body_text = page.locator('body').inner_text()
if len(body_text) < 50:
    time.sleep(2)  # Extra wait for slow hydration

page.screenshot(path='good.png', full_page=True)

Content Verification Function

def wait_for_react_content(page, selectors, timeout=10000):
    """Wait for React to hydrate by checking for actual content."""
    page.wait_for_load_state('domcontentloaded')
    page.wait_for_load_state('networkidle')
    time.sleep(0.5)  # React hydration buffer

    for selector in selectors.split(','):
        try:
            locator = page.locator(selector.strip())
            if locator.count() > 0:
                locator.first.wait_for(state='visible', timeout=timeout)
                return True
        except:
            continue

    # Fallback: wait for substantial body content
    try:
        page.wait_for_function(
            'document.body.innerText.length > 100',
            timeout=timeout
        )
        return True
    except:
        return False

Headless Mode: Preventing Window Spam

Always use headless=True to prevent browser windows from spawning:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # CRITICAL: headless=True prevents visible browser windows
    browser = p.chromium.launch(headless=True)

    context = browser.new_context(
        viewport={'width': 1280, 'height': 800},
        color_scheme='dark'  # Initial theme
    )
    page = context.new_page()

    # ... your test logic ...

    browser.close()  # Always clean up

Theme Testing Pattern

# Dark mode screenshot
page.emulate_media(color_scheme='dark')  # Note: on PAGE, not context
page.goto(url)
wait_for_react_content(page, '.app-container, main, h1')
page.screenshot(path='dark.png', full_page=True)

# Light mode screenshot
page.emulate_media(color_scheme='light')
page.reload()
wait_for_react_content(page, '.app-container, main, h1')
page.screenshot(path='light.png', full_page=True)

LLM Screenshot Analysis Patterns

Pattern 1: Content Verification

Prompt: "Analyze this screenshot. Answer:
1. Is the main content rendered (not blank/loading)?
2. What major UI elements are visible?
3. Are there any error states or broken layouts?
4. Rate content completeness: FULL / PARTIAL / EMPTY"

Pattern 2: Theme Compliance

Prompt: "This is a {dark/light} mode screenshot. Verify:
1. Background color matches expected theme (dark bg for dark mode)
2. Text has sufficient contrast against background
3. Interactive elements are visible and styled correctly
4. No theme leakage (dark elements on light bg or vice versa)"

Pattern 3: Comparison Analysis

Prompt: "Compare these two screenshots (before/after). Identify:
1. What changed between them?
2. Are changes intentional (theme switch) or bugs?
3. Is any content missing in the 'after' version?
4. Rate similarity: IDENTICAL / MINOR_DIFF / MAJOR_DIFF / BROKEN"

Pattern 4: Accessibility Check

Prompt: "Evaluate this screenshot for visual accessibility:
1. Is text readable (sufficient size and contrast)?
2. Are interactive elements clearly identifiable?
3. Is there visual hierarchy (headings, sections)?
4. Any elements that would fail WCAG contrast requirements?"

Complete Test Script Template

#!/usr/bin/env python3
"""
LLM-Powered Screenshot Test Suite
Captures screenshots and uses Claude vision for semantic analysis.
"""

from playwright.sync_api import sync_playwright
import os
import time

PAGES_TO_TEST = [
    # (path, name, content_selectors)
    ('/', 'Home', '.hero, main, h1'),
    ('/about', 'About', '.about-content, main, h1'),
    ('/dashboard', 'Dashboard', '.dashboard, .stats, h1'),
]

BASE_URL = 'http://localhost:5173'
SCREENSHOT_DIR = '/tmp/visual-tests'


def wait_for_content(page, selectors, timeout=10000):
    """Wait for React/Vue/Svelte to hydrate."""
    page.wait_for_load_state('domcontentloaded')
    page.wait_for_load_state('networkidle')
    time.sleep(0.5)

    for selector in selectors.split(','):
        try:
            loc = page.locator(selector.strip())
            if loc.count() > 0:
                loc.first.wait_for(state='visible', timeout=timeout)
                return True
        except:
            continue

    try:
        page.wait_for_function('document.body.innerText.length > 100', timeout=timeout)
        return True
    except:
        return False


def capture_themed_screenshots(page, url, name, selectors):
    """Capture both dark and light mode screenshots."""
    safe_name = name.lower().replace(' ', '-')
    results = {'name': name, 'url': url}

    for theme in ['dark', 'light']:
        page.emulate_media(color_scheme=theme)

        if theme == 'dark':
            page.goto(url, wait_until='domcontentloaded')
        else:
            page.reload(wait_until='domcontentloaded')

        content_loaded = wait_for_content(page, selectors)

        if not content_loaded:
            print(f"  ⚠️  {theme} mode: Content slow to load, waiting...")
            time.sleep(2)

        screenshot_path = f'{SCREENSHOT_DIR}/{safe_name}-{theme}.png'
        page.screenshot(path=screenshot_path, full_page=True)

        # Check content length
        body_text = page.locator('body').inner_text().strip()
        results[f'{theme}_screenshot'] = screenshot_path
        results[f'{theme}_content_length'] = len(body_text)
        results[f'{theme}_has_content'] = len(body_text) > 50

        print(f"  {theme}: {'✅' if results[f'{theme}_has_content'] else '❌'} ({len(body_text)} chars)")

    return results


def run_tests():
    """Run visual tests on all pages."""
    os.makedirs(SCREENSHOT_DIR, exist_ok=True)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={'width': 1280, 'height': 800},
            color_scheme='dark'
        )
        page = context.new_page()

        # Capture console errors
        errors = []
        page.on('console', lambda m: errors.append(m.text) if m.type == 'error' else None)

        results = []

        for path, name, selectors in PAGES_TO_TEST:
            print(f"Testing {name}...")
            url = f'{BASE_URL}{path}'
            result = capture_themed_screenshots(page, url, name, selectors)
            result['errors'] = list(errors)
            errors.clear()
            results.append(result)

        browser.close()

        # Summary
        print("\n" + "=" * 50)
        print("VISUAL TEST SUMMARY")
        print("=" * 50)

        passed = sum(1 for r in results
                     if r.get('dark_has_content') and r.get('light_has_content'))
        print(f"\nPassed: {passed}/{len(results)}")
        print(f"Screenshots: {SCREENSHOT_DIR}")

        return results


if __name__ == '__main__':
    run_tests()

MCP vs Native Playwright Decision Tree

What are you doing?
│
├─ Interactive debugging / exploring
│  └─► Playwright MCP (see live browser)
│
├─ Automated test suite
│  └─► Native Python Playwright (headless)
│
├─ CI/CD pipeline
│  └─► Native Python Playwright (headless)
│
├─ Screenshot capture for LLM analysis
│  └─► Native Python Playwright (headless)
│
└─ One-off inspection
   └─► Either works, MCP is convenient

Common Failures and Fixes

Failure: Blank Screenshots

Cause: Screenshot taken before React hydrates Fix: Wait for content selectors, add hydration buffer

Failure: "Reconnecting..." Badge Visible

Cause: HMR/WebSocket not connected (cosmetic in tests) Fix: This is often fine - focus on actual content

Failure: Theme Not Applied

Cause: emulate_media called on context instead of page Fix: Use page.emulate_media(color_scheme='dark')

Failure: Browser Windows Spawning

Cause: headless=False or using MCP instead of native Fix: Use p.chromium.launch(headless=True)

Failure: Timeout on Content

Cause: Wrong selectors or page actually broken Fix: Verify selectors exist, check console errors

Integration with Claude Code

When Claude reads screenshots captured by this pattern:

Request specific analysis: Don't just show screenshot - ask targeted questions
Provide context: "This should be dark mode" or "This is the login page"
Compare systematically: Before/after, dark/light, desktop/mobile
Trust semantic analysis: LLM can tell "blank page" from "content loaded"

References

Research Papers

Using Vision LLMs For UI Testing - University of Washington
Vision-driven Automated Mobile GUI Testing - Multimodal LLM approach
ScreenLLM: Stateful Screen Schema - UI understanding framework

Tools & Integrations

Official Documentation

Playwright Visual Comparisons

Version History

2026-01-23: Initial skill creation
- Researched multimodal LLM screenshot analysis best practices
- Documented React hydration waiting patterns
- Added headless mode requirements
- Created complete test script template

Core Insight: The difference between useless and useful screenshot tests is waiting for content, not just network. LLMs can analyze semantics, but only if there's actually content to analyze.