LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.
LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.
Activate on:
NOT for:
toHaveScreenshot)Traditional visual testing compares pixels. LLM-powered visual testing understands semantics.
Instead of "these 50 pixels changed", LLM inspection answers:
┌─────────────────────────────────────────────────────────────┐
│ LLM SCREENSHOT INSPECTION │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. CAPTURE (Playwright) │
│ └─► Wait for React hydration, not just network │
│ │
│ 2. READ (Claude vision) │
│ └─► Pass screenshot to LLM with specific questions │
│ │
│ 3. ANALYZE (Structured response) │
│ └─► Extract: content present? theme correct? errors? │
│ │
│ 4. ACT (Conditional logic) │
│ └─► Pass/fail based on semantic understanding │
│ │
└─────────────────────────────────────────────────────────────┘
The #1 failure mode: Taking screenshots before React hydrates.
# ❌ WRONG - React may not have rendered yet
page.goto(url)
page.wait_for_load_state('networkidle')
page.screenshot(path='broken.png') # Often blank!
# ✅ CORRECT - Wait for React to mount
page.goto(url, wait_until='domcontentloaded')
page.wait_for_load_state('networkidle')
# Give React time to hydrate
import time
time.sleep(0.5)
# Wait for actual content selector
page.wait_for_selector('.main-content, h1, [data-testid="app"]',
state='visible',
timeout=10000)
# Verify content exists
body_text = page.locator('body').inner_text()
if len(body_text) < 50:
time.sleep(2) # Extra wait for slow hydration
page.screenshot(path='good.png', full_page=True)
def wait_for_react_content(page, selectors, timeout=10000):
"""Wait for React to hydrate by checking for actual content."""
page.wait_for_load_state('domcontentloaded')
page.wait_for_load_state('networkidle')
time.sleep(0.5) # React hydration buffer
for selector in selectors.split(','):
try:
locator = page.locator(selector.strip())
if locator.count() > 0:
locator.first.wait_for(state='visible', timeout=timeout)
return True
except:
continue
# Fallback: wait for substantial body content
try:
page.wait_for_function(
'document.body.innerText.length > 100',
timeout=timeout
)
return True
except:
return False
Always use headless=True to prevent browser windows from spawning:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
# CRITICAL: headless=True prevents visible browser windows
browser = p.chromium.launch(headless=True)
context = browser.new_context(
viewport={'width': 1280, 'height': 800},
color_scheme='dark' # Initial theme
)
page = context.new_page()
# ... your test logic ...
browser.close() # Always clean up
# Dark mode screenshot
page.emulate_media(color_scheme='dark') # Note: on PAGE, not context
page.goto(url)
wait_for_react_content(page, '.app-container, main, h1')
page.screenshot(path='dark.png', full_page=True)
# Light mode screenshot
page.emulate_media(color_scheme='light')
page.reload()
wait_for_react_content(page, '.app-container, main, h1')
page.screenshot(path='light.png', full_page=True)
Prompt: "Analyze this screenshot. Answer:
1. Is the main content rendered (not blank/loading)?
2. What major UI elements are visible?
3. Are there any error states or broken layouts?
4. Rate content completeness: FULL / PARTIAL / EMPTY"
Prompt: "This is a {dark/light} mode screenshot. Verify:
1. Background color matches expected theme (dark bg for dark mode)
2. Text has sufficient contrast against background
3. Interactive elements are visible and styled correctly
4. No theme leakage (dark elements on light bg or vice versa)"
Prompt: "Compare these two screenshots (before/after). Identify:
1. What changed between them?
2. Are changes intentional (theme switch) or bugs?
3. Is any content missing in the 'after' version?
4. Rate similarity: IDENTICAL / MINOR_DIFF / MAJOR_DIFF / BROKEN"
Prompt: "Evaluate this screenshot for visual accessibility:
1. Is text readable (sufficient size and contrast)?
2. Are interactive elements clearly identifiable?
3. Is there visual hierarchy (headings, sections)?
4. Any elements that would fail WCAG contrast requirements?"
#!/usr/bin/env python3
"""
LLM-Powered Screenshot Test Suite
Captures screenshots and uses Claude vision for semantic analysis.
"""
from playwright.sync_api import sync_playwright
import os
import time
PAGES_TO_TEST = [
# (path, name, content_selectors)
('/', 'Home', '.hero, main, h1'),
('/about', 'About', '.about-content, main, h1'),
('/dashboard', 'Dashboard', '.dashboard, .stats, h1'),
]
BASE_URL = 'http://localhost:5173'
SCREENSHOT_DIR = '/tmp/visual-tests'
def wait_for_content(page, selectors, timeout=10000):
"""Wait for React/Vue/Svelte to hydrate."""
page.wait_for_load_state('domcontentloaded')
page.wait_for_load_state('networkidle')
time.sleep(0.5)
for selector in selectors.split(','):
try:
loc = page.locator(selector.strip())
if loc.count() > 0:
loc.first.wait_for(state='visible', timeout=timeout)
return True
except:
continue
try:
page.wait_for_function('document.body.innerText.length > 100', timeout=timeout)
return True
except:
return False
def capture_themed_screenshots(page, url, name, selectors):
"""Capture both dark and light mode screenshots."""
safe_name = name.lower().replace(' ', '-')
results = {'name': name, 'url': url}
for theme in ['dark', 'light']:
page.emulate_media(color_scheme=theme)
if theme == 'dark':
page.goto(url, wait_until='domcontentloaded')
else:
page.reload(wait_until='domcontentloaded')
content_loaded = wait_for_content(page, selectors)
if not content_loaded:
print(f" ⚠️ {theme} mode: Content slow to load, waiting...")
time.sleep(2)
screenshot_path = f'{SCREENSHOT_DIR}/{safe_name}-{theme}.png'
page.screenshot(path=screenshot_path, full_page=True)
# Check content length
body_text = page.locator('body').inner_text().strip()
results[f'{theme}_screenshot'] = screenshot_path
results[f'{theme}_content_length'] = len(body_text)
results[f'{theme}_has_content'] = len(body_text) > 50
print(f" {theme}: {'✅' if results[f'{theme}_has_content'] else '❌'} ({len(body_text)} chars)")
return results
def run_tests():
"""Run visual tests on all pages."""
os.makedirs(SCREENSHOT_DIR, exist_ok=True)
with sync_playwright() as p:
browser = p.chromium.launch(headless=True)
context = browser.new_context(
viewport={'width': 1280, 'height': 800},
color_scheme='dark'
)
page = context.new_page()
# Capture console errors
errors = []
page.on('console', lambda m: errors.append(m.text) if m.type == 'error' else None)
results = []
for path, name, selectors in PAGES_TO_TEST:
print(f"Testing {name}...")
url = f'{BASE_URL}{path}'
result = capture_themed_screenshots(page, url, name, selectors)
result['errors'] = list(errors)
errors.clear()
results.append(result)
browser.close()
# Summary
print("\n" + "=" * 50)
print("VISUAL TEST SUMMARY")
print("=" * 50)
passed = sum(1 for r in results
if r.get('dark_has_content') and r.get('light_has_content'))
print(f"\nPassed: {passed}/{len(results)}")
print(f"Screenshots: {SCREENSHOT_DIR}")
return results
if __name__ == '__main__':
run_tests()
What are you doing?
│
├─ Interactive debugging / exploring
│ └─► Playwright MCP (see live browser)
│
├─ Automated test suite
│ └─► Native Python Playwright (headless)
│
├─ CI/CD pipeline
│ └─► Native Python Playwright (headless)
│
├─ Screenshot capture for LLM analysis
│ └─► Native Python Playwright (headless)
│
└─ One-off inspection
└─► Either works, MCP is convenient
Cause: Screenshot taken before React hydrates Fix: Wait for content selectors, add hydration buffer
Cause: HMR/WebSocket not connected (cosmetic in tests) Fix: This is often fine - focus on actual content
Cause: emulate_media called on context instead of page
Fix: Use page.emulate_media(color_scheme='dark')
Cause: headless=False or using MCP instead of native
Fix: Use p.chromium.launch(headless=True)
Cause: Wrong selectors or page actually broken Fix: Verify selectors exist, check console errors
When Claude reads screenshots captured by this pattern:
Core Insight: The difference between useless and useful screenshot tests is waiting for content, not just network. LLMs can analyze semantics, but only if there's actually content to analyze.