LLM and AI application security testing skill for prompt injection, jailbreaking, and AI system vulnerabilities...
Thin router skill for security testing of LLM applications and AI agents. Covers the OWASP LLM Top 10 (2025) with a 2026-grade threat model for frontier-model agentic systems: indirect injection, multimodal injection, MCP supply chain, memory poisoning, skill-file injection, computer-use UI injection, and agentic tool misuse.
Defensive / educational framing. Every workflow here assumes written authorization to test the target. Canary strings, throwaway accounts, and controlled endpoints are preferred over real-data exploitation at every step.
"test this LLM for prompt injection", "jailbreak this model" (authorized), "test AI guardrails", "assess RAG security", "poison this RAG corpus", "test MCP server injection", "red-team this agent", "extract system prompt", "test agent tool misuse", "test computer use UI injection", "audit LLM application security", "test multimodal injection", "test memory poisoning", "audit CLAUDE.md for injection".
Related skills: api-security, sast-orchestration, cloud-security / iac-security, web-security. Many engagements need multiple skills; call them in parallel when scopes don't overlap.
```
Is the target an agent with tools? ─ yes ─▶ excessive_agency_testing.md
│                                       └─▶ agentic_tool_misuse.md
no
▼
Does it ingest external content (RAG/web/email)? ─ yes ─▶ indirect_injection_testing.md
│                                                     └─▶ rag_poisoning.md (if RAG)
no
▼
Multimodal input accepted? ─ yes ─▶ payloads/multimodal_injection.md
│                               └─▶ computer_use_abuse.md (if screen-controller)
no
▼
MCP servers attached? ─ yes ─▶ mcp_server_injection.md
no
▼
Persistent memory / cross-session state? ─ yes ─▶ memory_poisoning.md
no
▼
Project loads CLAUDE.md / skills / rules? ─ yes ─▶ skill_file_injection.md
no
▼
Always run last: direct_injection_testing.md + system_prompt_extraction.md
```
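The decision tree can also be read as a planning function. The sketch below is one possible reading, not part of this skill. It assumes each question is answered independently, so a "yes" adds its workflows and you still continue down the tree. It also assumes the bare filenames in the tree live under workflows/, as they do in the workflow table, with the exception of the multimodal payload file. TargetProfile and route are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TargetProfile:
    """Hypothetical description of the system under test; field names are illustrative."""
    has_tools: bool = False
    ingests_external_content: bool = False
    uses_rag: bool = False
    accepts_multimodal: bool = False
    controls_screen: bool = False
    has_mcp_servers: bool = False
    has_persistent_memory: bool = False
    loads_project_rules: bool = False


def route(profile: TargetProfile) -> list[str]:
    """Walk the decision tree top to bottom and return the files to run, in order."""
    plan: list[str] = []
    if profile.has_tools:
        plan += ["workflows/excessive_agency_testing.md", "workflows/agentic_tool_misuse.md"]
    if profile.ingests_external_content:
        plan.append("workflows/indirect_injection_testing.md")
        if profile.uses_rag:
            plan.append("workflows/rag_poisoning.md")
    if profile.accepts_multimodal:
        plan.append("payloads/multimodal_injection.md")
        if profile.controls_screen:  # only relevant for screen-controlling agents
            plan.append("workflows/computer_use_abuse.md")
    if profile.has_mcp_servers:
        plan.append("workflows/mcp_server_injection.md")
    if profile.has_persistent_memory:
        plan.append("workflows/memory_poisoning.md")
    if profile.loads_project_rules:
        plan.append("workflows/skill_file_injection.md")
    # Baseline checks always run last, whatever branches fired above.
    plan += ["workflows/direct_injection_testing.md", "workflows/system_prompt_extraction.md"]
    return plan
```

For example, route(TargetProfile(has_tools=True, has_mcp_servers=True)) yields the two agency workflows, then mcp_server_injection.md, then the two baseline workflows.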
Parallelizable (fire concurrently, per rate limits):
- agentic_tool_misuse.md

Sequential (must observe one at a time):
Two clean partitions — pick whichever matches the engagement:
By OWASP category (one sub-agent each) for comprehensive coverage:
By attack surface for deep-dive on one class:
Parent agent aggregates findings (schemas/finding.json), de-dupes, and
cross-references overlapping findings (e.g. an MCP injection that enables
tool misuse).
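Here is a minimal sketch of that parent-side aggregation step. It assumes findings arrive as dicts shaped like schemas/finding.json. Several details are illustrative assumptions, not taken from the schema: the fingerprint fields (attack class, target, payload modality, delivery vector, whitespace-normalized content), the payload sub-key names, the severity scale, and the ENABLES pairs. Adjust them to the real schema and to the attack chains you actually observe.

```python
import hashlib
import json
from collections import defaultdict

# Assumed severity scale; align with whatever schemas/finding.json allows.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

# Illustrative "class A can enable class B" pairs, used only for cross-referencing.
ENABLES = {
    ("mcp_server_injection", "agentic_tool_misuse"),
    ("indirect_injection", "agentic_tool_misuse"),
}


def target_of(f: dict):
    return f.get("target_agent") or f.get("target_model")


def fingerprint(f: dict) -> str:
    """Stable hash over the fields that make two findings 'the same issue'."""
    payload = f.get("payload", {})
    key = {
        "attack_class": f["attack_class"],
        "target": target_of(f),
        "modality": payload.get("modality"),
        "delivery_vector": payload.get("delivery_vector"),
        # Collapse whitespace and case so trivially different payloads dedupe.
        "content": " ".join(str(payload.get("content", "")).split()).lower(),
    }
    return hashlib.sha256(json.dumps(key, sort_keys=True).encode()).hexdigest()


def aggregate(batches: list[list[dict]]) -> list[dict]:
    """Merge sub-agent batches: de-dupe by fingerprint, keep max severity, link related findings."""
    merged: dict[str, dict] = {}
    for batch in batches:
        for f in batch:
            fp = fingerprint(f)
            kept = merged.get(fp)
            if kept is None:
                merged[fp] = {**f, "duplicates": []}
            else:
                kept["duplicates"].append(f["id"])
                if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[kept["severity"]]:
                    kept["severity"] = f["severity"]
    findings = list(merged.values())

    # Cross-reference findings on the same target where one class can enable the other.
    by_target = defaultdict(list)
    for f in findings:
        by_target[target_of(f)].append(f)
    for group in by_target.values():
        for a in group:
            a.setdefault("related", [])
            for b in group:
                if (a["attack_class"], b["attack_class"]) in ENABLES:
                    a["related"].append(b["id"])
                    b.setdefault("related", []).append(a["id"])
    return findings
```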
Use extended thinking for:
Minimal thinking for:
For multimodal targets, see payloads/multimodal_injection.md and workflows/computer_use_abuse.md. Attach a screenshot (evidence.screenshot) to every multimodal finding; visual proof is essential.

Frontier models are also useful as testing tools: use a separate vision-capable model to generate candidate adversarial images and to judge whether OCR extraction succeeded.
All findings use schemas/finding.json. Required fields:
id, title, severity, attack_class, evidence, reproduction,
remediation. Skill-specific fields include attack_class,
target_model, target_agent, payload (with modality and delivery
vector), success_indicator, owasp_llm_id, defense_bypassed.
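Below is a stdlib-only sketch of a pre-submission check for the required fields above. It makes a few assumptions. payload is treated as an object with modality and delivery_vector keys. evidence is treated as an object whenever a screenshot is attached. Any modality other than "text" counts as multimodal. validate_finding is a hypothetical helper. Validating against schemas/finding.json with a full JSON Schema validator would be stricter.

```python
REQUIRED_FIELDS = ("id", "title", "severity", "attack_class", "evidence", "reproduction", "remediation")
# Assumed sub-keys for the payload object ("with modality and delivery vector").
PAYLOAD_FIELDS = ("modality", "delivery_vector")


def validate_finding(finding: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes these checks."""
    problems = [f"missing required field: {k}" for k in REQUIRED_FIELDS if not finding.get(k)]
    payload = finding.get("payload") or {}
    # Only check payload sub-keys when a payload object is present at all.
    problems += [f"payload missing: {k}" for k in PAYLOAD_FIELDS if payload and not payload.get(k)]
    evidence = finding.get("evidence")
    if payload.get("modality", "text") != "text":
        # Multimodal findings need visual proof attached as evidence.screenshot.
        if not (isinstance(evidence, dict) and evidence.get("screenshot")):
            problems.append("multimodal finding without evidence.screenshot")
    return problems
```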
| Workflow | When |
|---|---|
| workflows/direct_injection_testing.md | Text prompts directly in user channel |
| workflows/indirect_injection_testing.md | Content arrives via retrieval / tools / email |
| workflows/system_prompt_extraction.md | Recover system prompt / tool schemas |
| workflows/rag_poisoning.md | RAG corpus + retrieval-layer attacks |
| workflows/agentic_tool_misuse.md | Coerce agent to misuse file/http/shell tools |
| workflows/memory_poisoning.md | Persistent cross-session memory attacks |
| workflows/mcp_server_injection.md | Malicious MCP server → host agent |
| workflows/skill_file_injection.md | CLAUDE.md / .cursor/rules / SKILL.md as vector |
| workflows/computer_use_abuse.md | Screenshot/UI-based injection for computer-use agents |
| workflows/excessive_agency_testing.md | Blast-radius assessment (OWASP LLM06) |
| File | Contents |
|---|---|
| payloads/injection_2026.txt | Modern direct/indirect injection patterns (trust-boundary, authority spoof, tool-result spoof, CoT injection) |
| payloads/system_prompt_extraction.txt | Full-dump + partial-leak + tool-schema extraction |
| payloads/encoding_obfuscation.txt | Base64, ROT, hex, unicode homoglyph, zero-width, emoji smuggle, tag-char |
| payloads/multimodal_injection.md | Image / audio / video / screenshot payload descriptions |
| payloads/legacy_jailbreaks.txt | DAN / STAN / DUDE / roleplay (regression only) |
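One way to check whether input filters normalize the encodings listed in payloads/encoding_obfuscation.txt is to wrap a harmless canary instruction in each encoding. You then check whether the canary surfaces in the output. The sketch below covers only the base64, ROT13, hex, zero-width and tag-char variants. It does not cover homoglyphs or emoji smuggling. encoding_variants and the canary text are hypothetical.

```python
import base64
import codecs

ZWSP = "\u200b"  # zero-width space


def encoding_variants(text: str) -> dict[str, str]:
    """Return the same canary instruction under several obfuscations a filter should still catch."""
    return {
        "plain": text,
        "base64": base64.b64encode(text.encode()).decode(),
        "rot13": codecs.encode(text, "rot_13"),
        "hex": text.encode().hex(),
        # Zero-width space between every character; renders identically to the plain text.
        "zero_width": ZWSP.join(text),
        # Printable ASCII shifted into the Unicode Tags block (U+E0020..U+E007E); invisible in most UIs.
        "tag_chars": "".join(chr(0xE0000 + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E),
    }
```

Suppose a guardrail blocks the plain variant but lets an encoded variant through, and the canary then appears in the output. That result fits naturally in the defense_bypassed field of the finding.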
| File | Contents |
|---|---|
| references/owasp_llm_top10_2025.md | OWASP LLM Top 10 table + 2026 coverage checklist |
| references/defense_patterns_2026.md | Constitutional, classifiers, spotlighting, HITL, allowlisting, with known bypass hints |
| references/threat_model_agents.md | Actors, assets, surfaces, T1-T10 scenarios for agentic systems |
| references/bounty_patterns_2024_2026.md | Post-2023 public bug-bounty TTPs (RAG poisoning, CVE-2025-53773 tool-chain RCE, multimodal injection, adaptive defense evasion) |
| File | Contents |
|---|---|
| examples/indirect_injection_doc.md | Ready-to-deploy injection doc for RAG / shared drive |
| examples/malicious_mcp_response.json | Malicious MCP tool-response body |
| examples/poisoned_rag_chunk.md | Retrieval-optimized poisoning chunk |
| Tool | Purpose | Install |
|---|---|---|
| promptfoo | Automated prompt-injection sweeps and eval | npm i -g promptfoo |
| garak | LLM vulnerability scanner (NVIDIA) | pip install garak |
| giskard | LLM testing & evaluation | pip install giskard |
| pyrit | Microsoft's AI red-team toolkit | pip install pyrit |
| @modelcontextprotocol/sdk | Build controlled test MCP servers | npm i @modelcontextprotocol/sdk |
| custom HTTP server | Attacker-endpoint for exfil signal | any language |
| anthropic, openai, google-genai SDKs | Drive target APIs | per-SDK |
Use your own logging endpoint for exfil-signal tests so you can unambiguously confirm tool invocation.
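The following is a minimal logging endpoint of that kind, built only on Python's http.server. It assumes the canary token travels in the URL path or query string. It keeps every hit in memory and binds to 127.0.0.1 by default. In a real engagement it must listen on an address the target agent can actually reach. CanaryHandler, serve and canary_fired are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

HITS: list[dict] = []  # every request the target made to this endpoint
HITS_LOCK = threading.Lock()


class CanaryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        record = {
            "path": parsed.path,
            "query": parse_qs(parsed.query),
            "client": self.client_address[0],
            "user_agent": self.headers.get("User-Agent"),
        }
        with HITS_LOCK:
            HITS.append(record)
        self.send_response(204)  # empty response: nothing for the agent to act on
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence default stderr logging; HITS is the audit trail


def canary_fired(token: str) -> bool:
    """True if any logged request carried the token in its path or query values."""
    with HITS_LOCK:
        return any(
            token in h["path"] or any(token in v for vs in h["query"].values() for v in vs)
            for h in HITS
        )


def serve(port: int = 8000, host: str = "127.0.0.1") -> HTTPServer:
    """Start the endpoint in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), CanaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To use it, plant a URL such as http://HOST:PORT/cb?t=CANARY-42 in the injected content. If canary_fired("CANARY-42") later returns True, the agent made the request. Because each engagement uses a unique token, a hit cannot be confused with unrelated traffic.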
Every engagement MUST have:
Populate authorization.scope_document and authorization.contact on
every finding record.
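One way to make this requirement hard to forget is to stamp the authorization block onto findings at write time. The helper refuses to proceed when either value is empty. Authorization and stamp_authorization are hypothetical names. The nested authorization key mirrors the dotted field names above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    scope_document: str  # path or URL of the signed scope / rules-of-engagement document
    contact: str         # client-side point of contact for this engagement


def stamp_authorization(findings: list[dict], auth: Authorization) -> list[dict]:
    """Return copies of the findings with authorization.scope_document and .contact set."""
    if not auth.scope_document.strip() or not auth.contact.strip():
        raise ValueError("refusing to record findings without scope_document and contact")
    block = {"scope_document": auth.scope_document, "contact": auth.contact}
    return [{**f, "authorization": dict(block)} for f in findings]
```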
2026-04. Minimum tool versions tested: