
    About

    Use when "Modal", "serverless GPU", "cloud GPU", "deploy ML model", or asking about "serverless containers", "GPU compute", "batch processing", "scheduled jobs", "autoscaling ML"

    SKILL.md

    Modal Serverless Cloud Platform

    Serverless Python execution with GPUs, autoscaling, and pay-per-use compute.

    When to Use

    • Deploy and serve ML models (LLMs, image generation)
    • Run GPU-accelerated computation
    • Batch process large datasets in parallel
    • Schedule compute-intensive jobs
    • Build serverless APIs with autoscaling

    Quick Start

    # Install and authenticate (shell)
    pip install modal
    modal token new
    
    # script.py
    import modal
    
    app = modal.App("my-app")
    
    @app.function()
    def hello():
        return "Hello from Modal!"
    
    # Run with: modal run script.py
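
    To call the function from local code, a local entrypoint can invoke it
    remotely; a minimal sketch using the same app:

    @app.local_entrypoint()
    def main():
        print(hello.remote())  # runs hello() in a Modal container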
    

    Container Images

    # Build image with dependencies
    image = (
        modal.Image.debian_slim(python_version="3.12")
        .pip_install("torch", "transformers", "numpy")
    )
    
    app = modal.App("ml-app", image=image)
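
    Images can also layer in system packages and arbitrary build commands;
    a sketch (the package names are illustrative):

    image = (
        modal.Image.debian_slim(python_version="3.12")
        .apt_install("ffmpeg", "git")               # system packages
        .pip_install("torch", "transformers")
        .run_commands("python -c 'import torch'")   # arbitrary build-time shell
    )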
    

    GPU Functions

    @app.function(gpu="H100")
    def train_model():
        import torch
        assert torch.cuda.is_available()
        # GPU code here
    
    # Available GPUs: T4, L4, A10, A100, L40S, H100, H200, B200
    # Multi-GPU: gpu="H100:8"
    

    Web Endpoints

    @app.function()
    @modal.web_endpoint(method="POST")
    def predict(data: dict):
        result = model.predict(data["input"])  # model is assumed loaded elsewhere
        return {"prediction": result}
    
    # Deploy: modal deploy script.py
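
    After deploying, Modal prints a stable URL for the endpoint, which can
    be called like any HTTP API. A sketch using only the standard library
    (the URL is illustrative; use the one modal deploy prints):

    import json, urllib.request

    req = urllib.request.Request(
        "https://my-workspace--my-app-predict.modal.run",  # illustrative URL
        data=json.dumps({"input": "some text"}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    print(json.load(urllib.request.urlopen(req)))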
    

    Scheduled Jobs

    @app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
    def daily_backup():
        pass
    
    @app.function(schedule=modal.Period(hours=4))  # Every 4 hours
    def refresh_cache():
        pass
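
    Note that schedules take effect only for deployed apps (modal deploy);
    an ephemeral modal run does not register them.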
    

    Autoscaling

    @app.function()
    def process_item(item_id: int):
        return analyze(item_id)  # analyze() stands in for your per-item work
    
    @app.local_entrypoint()
    def main():
        items = range(1000)
        # Automatically parallelized across containers
        results = list(process_item.map(items))
    

    Persistent Storage

    volume = modal.Volume.from_name("my-data", create_if_missing=True)
    
    @app.function(volumes={"/data": volume})
    def save_results(data):
        with open("/data/results.txt", "w") as f:
            f.write(data)
        volume.commit()  # Persist changes
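
    To read from another function, volume.reload() picks up the latest
    committed state; a minimal sketch:

    @app.function(volumes={"/data": volume})
    def read_results():
        volume.reload()  # fetch changes committed elsewhere
        with open("/data/results.txt") as f:
            return f.read()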
    

    Secrets Management

    @app.function(secrets=[modal.Secret.from_name("huggingface")])
    def download_model():
        import os
        token = os.environ["HF_TOKEN"]
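
    The secret itself is created once, in the Modal dashboard or via the
    CLI; the name and key here mirror the example above:

    modal secret create huggingface HF_TOKEN=<your-token>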
    

    ML Model Serving

    @app.cls(gpu="L40S")
    class Model:
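        # @modal.enter runs once per container start, so the model loads
        # once and is reused across calls handled by that container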
        @modal.enter()
        def load_model(self):
            from transformers import pipeline
            self.pipe = pipeline("text-classification", device="cuda")
    
        @modal.method()
        def predict(self, text: str):
            return self.pipe(text)
    
    @app.local_entrypoint()
    def main():
        model = Model()
        result = model.predict.remote("Modal is great!")
    

    Resource Configuration

    @app.function(
        cpu=8.0,              # 8 CPU cores
        memory=32768,         # 32 GiB RAM
        ephemeral_disk=10240, # 10 GiB disk
        timeout=3600          # 1 hour timeout
    )
    def memory_intensive_task():
        pass
    

    Best Practices

    1. Pin dependencies for reproducible builds
    2. Use appropriate GPU types - L40S for inference, H100 for training
    3. Leverage caching via Volumes for model weights (see the sketch below)
    4. Use .map() for parallel processing
    5. Import packages inside functions if not available locally
    6. Store secrets securely - never hardcode API keys
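
    A minimal sketch combining practices 1, 3, and 5 - pinned versions,
    a Volume used as a weights cache, and imports kept inside the function
    (the volume name, versions, and cache path are illustrative):

    import modal

    image = modal.Image.debian_slim(python_version="3.12").pip_install(
        "torch==2.4.0",           # pinned for reproducible builds
        "transformers==4.44.0",
    )
    cache = modal.Volume.from_name("model-cache", create_if_missing=True)

    app = modal.App("best-practices-demo", image=image)

    @app.function(volumes={"/cache": cache})
    def classify(text: str):
        import os
        os.environ["HF_HOME"] = "/cache"  # download weights into the Volume
        # Imported inside the function, so it need not be installed locally
        from transformers import pipeline
        result = pipeline("text-classification")(text)
        cache.commit()  # persist newly downloaded weights for later containers
        return result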

    vs Alternatives

    Platform     Best For
    ---------    -----------------------------------------------
    Modal        Serverless GPUs, autoscaling, Python-native
    RunPod       GPU rental, long-running jobs
    AWS Lambda   CPU workloads, AWS ecosystem
    Replicate    Model hosting, simple deployments

    Resources

    • Docs: https://modal.com/docs
    • Examples: https://github.com/modal-labs/modal-examples