    davila7/stable-diffusion-image-generation


    About

    State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers...

    SKILL.md

    Stable Diffusion Image Generation

    Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.

    When to use Stable Diffusion

    Use Stable Diffusion when:

    • Generating images from text descriptions
    • Performing image-to-image translation (style transfer, enhancement)
    • Inpainting (filling in masked regions)
    • Outpainting (extending images beyond boundaries)
    • Creating variations of existing images
    • Building custom image generation workflows

    Key features:

    • Text-to-Image: Generate images from natural language prompts
    • Image-to-Image: Transform existing images with text guidance
    • Inpainting: Fill masked regions with context-aware content
    • ControlNet: Add spatial conditioning (edges, poses, depth)
    • LoRA Support: Efficient fine-tuning and style adaptation
    • Multiple Models: SD 1.5, SDXL, SD 3.0, Flux support

    Consider an alternative instead:

    • DALL-E 3: For API-based generation without GPU
    • Midjourney: For artistic, stylized outputs
    • Imagen: For Google Cloud integration
    • Leonardo.ai: For web-based creative workflows

    Quick start

    Installation

    pip install diffusers transformers accelerate torch
    pip install xformers  # Optional: memory-efficient attention
    

    Basic text-to-image

    from diffusers import DiffusionPipeline
    import torch
    
    # Load pipeline (auto-detects model type)
    pipe = DiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16
    )
    pipe.to("cuda")
    
    # Generate image
    image = pipe(
        "A serene mountain landscape at sunset, highly detailed",
        num_inference_steps=50,
        guidance_scale=7.5
    ).images[0]
    
    image.save("output.png")
    

    Using SDXL (higher quality)

    from diffusers import AutoPipelineForText2Image
    import torch
    
    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16"
    )
    pipe.to("cuda")
    
    # Enable memory optimization
    pipe.enable_model_cpu_offload()
    
    image = pipe(
        prompt="A futuristic city with flying cars, cinematic lighting",
        height=1024,
        width=1024,
        num_inference_steps=30
    ).images[0]
    

    Architecture overview

    Three-pillar design

    Diffusers is built around three core components:

    Pipeline (orchestration)
    ├── Model (neural networks)
    │   ├── UNet / Transformer (noise prediction)
    │   ├── VAE (latent encoding/decoding)
    │   └── Text Encoder (CLIP/T5)
    └── Scheduler (denoising algorithm)
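
    These three components are exposed as attributes on every pipeline, so each can be inspected or swapped individually. A minimal sketch, assuming the SD 1.5 pipeline from the quick start is loaded as pipe:

    # Each pillar is directly accessible on the pipeline object
    print(type(pipe.unet).__name__)          # UNet2DConditionModel (noise prediction)
    print(type(pipe.vae).__name__)           # AutoencoderKL (latent encoding/decoding)
    print(type(pipe.text_encoder).__name__)  # CLIPTextModel (prompt embeddings)
    print(type(pipe.scheduler).__name__)     # PNDMScheduler, the SD 1.5 default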
    

    Pipeline inference flow

    Text Prompt → Text Encoder → Text Embeddings
                                        ↓
    Random Noise → [Denoising Loop] ← Scheduler
                          ↓
                   Predicted Noise
                          ↓
                  VAE Decoder → Final Image
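
    The same flow can be reproduced by driving the components by hand, which is useful for understanding what a pipeline call does internally. A sketch, assuming the quick-start SD 1.5 pipeline is loaded as pipe on CUDA in fp16 (the 30 steps and guidance scale of 7.5 are illustrative):

    import torch
    
    # Text Prompt -> Text Encoder -> Text Embeddings (plus unconditional embeddings for CFG)
    prompt_embeds, negative_embeds = pipe.encode_prompt(
        "A serene mountain landscape", device="cuda",
        num_images_per_prompt=1, do_classifier_free_guidance=True
    )
    embeddings = torch.cat([negative_embeds, prompt_embeds])
    
    # Random noise in latent space, scaled to the scheduler's expected magnitude
    pipe.scheduler.set_timesteps(30, device="cuda")
    latents = torch.randn(
        (1, pipe.unet.config.in_channels, 64, 64),
        device="cuda", dtype=torch.float16
    ) * pipe.scheduler.init_noise_sigma
    
    # Denoising loop: UNet predicts noise, scheduler removes it
    for t in pipe.scheduler.timesteps:
        model_input = pipe.scheduler.scale_model_input(torch.cat([latents] * 2), t)
        with torch.no_grad():
            noise_pred = pipe.unet(model_input, t, encoder_hidden_states=embeddings).sample
        uncond, cond = noise_pred.chunk(2)
        noise_pred = uncond + 7.5 * (cond - uncond)  # classifier-free guidance
        latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample
    
    # VAE Decoder -> Final Image
    with torch.no_grad():
        decoded = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    image = pipe.image_processor.postprocess(decoded)[0]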
    

    Core concepts

    Pipelines

    Pipelines orchestrate complete workflows:

    Pipeline                        Purpose
    StableDiffusionPipeline         Text-to-image (SD 1.x/2.x)
    StableDiffusionXLPipeline       Text-to-image (SDXL)
    StableDiffusion3Pipeline        Text-to-image (SD 3.0)
    FluxPipeline                    Text-to-image (Flux models)
    StableDiffusionImg2ImgPipeline  Image-to-image
    StableDiffusionInpaintPipeline  Inpainting
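
    If you would rather not pick the class yourself, the AutoPipeline family resolves the right pipeline from the model's configuration. A small sketch:

    from diffusers import AutoPipelineForText2Image
    import torch
    
    # Resolves to StableDiffusionXLPipeline from the model config
    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16
    )
    print(type(pipe).__name__)  # StableDiffusionXLPipeline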

    Schedulers

    Schedulers control the denoising process:

    Scheduler                        Steps   Quality    Use Case
    EulerDiscreteScheduler           20-50   Good       Default choice
    EulerAncestralDiscreteScheduler  20-50   Good       More variation
    DPMSolverMultistepScheduler      15-25   Excellent  Fast, high quality
    DDIMScheduler                    50-100  Good       Deterministic
    LCMScheduler                     4-8     Good       Very fast
    UniPCMultistepScheduler          15-25   Excellent  Fast convergence

    Swapping schedulers

    from diffusers import DPMSolverMultistepScheduler
    
    # Swap for faster generation
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config
    )
    
    # Now generate with fewer steps
    image = pipe(prompt, num_inference_steps=20).images[0]
    

    Generation parameters

    Key parameters

    Parameter              Default     Description
    prompt                 (required)  Text description of the desired image
    negative_prompt        None        What to avoid in the image
    num_inference_steps    50          Denoising steps; quality improves with more, with diminishing returns
    guidance_scale         7.5         Prompt adherence (7-12 is typical)
    height, width          512/1024    Output dimensions in pixels; multiples of 8 (512 for SD 1.5, 1024 for SDXL)
    generator              None        torch.Generator for reproducibility
    num_images_per_prompt  1           Batch size
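
    For reference, a single call that exercises all of these together (the values are illustrative, not recommendations):

    import torch
    
    images = pipe(
        prompt="An astronaut riding a horse on Mars",
        negative_prompt="blurry, low quality",
        num_inference_steps=30,
        guidance_scale=8.0,
        height=512,
        width=512,
        generator=torch.Generator(device="cuda").manual_seed(0),
        num_images_per_prompt=2
    ).images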

    Reproducible generation

    import torch
    
    generator = torch.Generator(device="cuda").manual_seed(42)
    
    image = pipe(
        prompt="A cat wearing a top hat",
        generator=generator,
        num_inference_steps=50
    ).images[0]
    

    Negative prompts

    image = pipe(
        prompt="Professional photo of a dog in a garden",
        negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
        guidance_scale=7.5
    ).images[0]
    

    Image-to-image

    Transform existing images with text guidance:

    from diffusers import AutoPipelineForImage2Image
    from PIL import Image
    
    pipe = AutoPipelineForImage2Image.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16
    ).to("cuda")
    
    init_image = Image.open("input.jpg").resize((512, 512))
    
    image = pipe(
        prompt="A watercolor painting of the scene",
        image=init_image,
        strength=0.75,  # How much to transform (0-1)
        num_inference_steps=50
    ).images[0]
    

    Inpainting

    Fill masked regions:

    from diffusers import AutoPipelineForInpainting
    from PIL import Image
    
    pipe = AutoPipelineForInpainting.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16
    ).to("cuda")
    
    image = Image.open("photo.jpg")
    mask = Image.open("mask.png")  # White = inpaint region
    
    result = pipe(
        prompt="A red car parked on the street",
        image=image,
        mask_image=mask,
        num_inference_steps=50
    ).images[0]
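
    If you don't already have a mask image, one can be drawn programmatically with PIL. A minimal sketch that marks a rectangular region for regeneration (coordinates are illustrative):

    from PIL import Image, ImageDraw
    
    mask = Image.new("L", image.size, 0)            # black = keep as-is
    draw = ImageDraw.Draw(mask)
    draw.rectangle([100, 100, 300, 300], fill=255)  # white = inpaint region
    mask.save("mask.png")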
    

    ControlNet

    Add spatial conditioning for precise control:

    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    import torch
    
    # Load ControlNet for edge conditioning
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_canny",
        torch_dtype=torch.float16
    )
    
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16
    ).to("cuda")
    
    # Use a Canny edge map as the control image (cv2 and numpy are extra dependencies)
    import cv2
    import numpy as np
    from PIL import Image
    
    input_image = np.array(Image.open("input.jpg"))
    edges = cv2.Canny(input_image, 100, 200)
    control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))
    
    image = pipe(
        prompt="A beautiful house in the style of Van Gogh",
        image=control_image,
        num_inference_steps=30
    ).images[0]
    

    Available ControlNets

    ControlNet  Input Type      Use Case
    canny       Edge maps       Preserve structure
    openpose    Pose skeletons  Human poses
    depth       Depth maps      3D-aware generation
    normal      Normal maps     Surface details
    mlsd        Line segments   Architectural lines
    scribble    Rough sketches  Sketch-to-image
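
    Control images for these conditioning types are usually produced with a preprocessor. A sketch using the controlnet_aux helper package (a separate install) to extract a pose skeleton for the openpose ControlNet:

    # pip install controlnet_aux
    from controlnet_aux import OpenposeDetector
    from PIL import Image
    
    detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
    pose_image = detector(Image.open("person.jpg"))
    
    # Pair with the matching checkpoint, e.g. lllyasviel/control_v11p_sd15_openpose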

    LoRA adapters

    Load fine-tuned style adapters:

    from diffusers import DiffusionPipeline
    
    pipe = DiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16
    ).to("cuda")
    
    # Load LoRA weights
    pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")
    
    # Generate with LoRA style
    image = pipe("A portrait in the trained style").images[0]
    
    # Adjust LoRA strength
    pipe.fuse_lora(lora_scale=0.8)
    
    # Unload LoRA
    pipe.unload_lora_weights()
    

    Multiple LoRAs

    # Load multiple LoRAs
    pipe.load_lora_weights("lora1", adapter_name="style")
    pipe.load_lora_weights("lora2", adapter_name="character")
    
    # Set weights for each
    pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])
    
    image = pipe("A portrait").images[0]
    

    Memory optimization

    Enable CPU offloading

    # Model CPU offload - moves models to CPU when not in use
    pipe.enable_model_cpu_offload()
    
    # Sequential CPU offload - more aggressive, slower
    pipe.enable_sequential_cpu_offload()
    

    Attention slicing

    # Reduce memory by computing attention in chunks
    pipe.enable_attention_slicing()
    
    # Or "max" for the most aggressive slicing (one slice at a time)
    pipe.enable_attention_slicing("max")
    

    xFormers memory-efficient attention

    # Requires xformers package
    pipe.enable_xformers_memory_efficient_attention()
    

    VAE slicing and tiling

    # Decode batched latents one image at a time
    pipe.enable_vae_slicing()
    
    # Decode in overlapping tiles so very large images fit in memory
    pipe.enable_vae_tiling()
    

    Model variants

    Loading different precisions

    # FP16 (recommended for GPU)
    pipe = DiffusionPipeline.from_pretrained(
        "model-id",
        torch_dtype=torch.float16,
        variant="fp16"
    )
    
    # BF16 (better precision, requires Ampere+ GPU)
    pipe = DiffusionPipeline.from_pretrained(
        "model-id",
        torch_dtype=torch.bfloat16
    )
    

    Loading specific components

    from diffusers import UNet2DConditionModel, AutoencoderKL
    
    # Load custom VAE
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
    
    # Use with pipeline
    pipe = DiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        vae=vae,
        torch_dtype=torch.float16
    )
    

    Batch generation

    Generate multiple images efficiently:

    # Multiple prompts
    prompts = [
        "A cat playing piano",
        "A dog reading a book",
        "A bird painting a picture"
    ]
    
    images = pipe(prompts, num_inference_steps=30).images
    
    # Multiple images per prompt
    images = pipe(
        "A beautiful sunset",
        num_images_per_prompt=4,
        num_inference_steps=30
    ).images
    

    Common workflows

    Workflow 1: High-quality generation

    from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
    import torch
    
    # 1. Load SDXL with optimizations
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16"
    )
    pipe.to("cuda")
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.enable_model_cpu_offload()
    
    # 2. Generate with quality settings
    image = pipe(
        prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",
        negative_prompt="blurry, low quality, cartoon, anime, sketch",
        num_inference_steps=30,
        guidance_scale=7.5,
        height=1024,
        width=1024
    ).images[0]
    

    Workflow 2: Fast prototyping

    from diffusers import AutoPipelineForText2Image, LCMScheduler
    import torch
    
    # Use LCM for 4-8 step generation
    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16
    ).to("cuda")
    
    # Load LCM LoRA for fast generation
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.fuse_lora()
    
    # Generates in roughly a second on a fast GPU
    image = pipe(
        "A beautiful landscape",
        num_inference_steps=4,
        guidance_scale=1.0
    ).images[0]
    

    Common issues

    CUDA out of memory:

    # Enable memory optimizations
    pipe.enable_model_cpu_offload()
    pipe.enable_attention_slicing()
    pipe.enable_vae_slicing()
    
    # Or use lower precision
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    

    Black/noise images:

    # Black images with fp16 SDXL are often VAE numeric overflow;
    # swapping in a fp16-safe VAE (e.g. madebyollin/sdxl-vae-fp16-fix) fixes it
    
    # Disable the safety checker if it is blanking out valid outputs
    pipe.safety_checker = None
    
    # Ensure all components share the same dtype
    pipe = pipe.to(dtype=torch.float16)
    

    Slow generation:

    # Use faster scheduler
    from diffusers import DPMSolverMultistepScheduler
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    
    # Reduce steps
    image = pipe(prompt, num_inference_steps=20).images[0]
    

    References

    • Advanced Usage - Custom pipelines, fine-tuning, deployment
    • Troubleshooting - Common issues and solutions

    Resources

    • Documentation: https://huggingface.co/docs/diffusers
    • Repository: https://github.com/huggingface/diffusers
    • Model Hub: https://huggingface.co/models?library=diffusers
    • Discord: https://discord.gg/diffusers