Cost estimation scripts and tools for calculating GPU hours, training costs, and inference pricing across Modal, Lambda Labs, and RunPod platforms...
Purpose: Provide production-ready cost estimation tools for ML training and inference across cloud GPU platforms (Modal, Lambda Labs, RunPod).
Activation Triggers:
Key Resources:
- scripts/estimate-training-cost.sh - Calculate training costs based on model size, data, GPU type
- scripts/estimate-inference-cost.sh - Estimate inference costs for production workloads
- scripts/calculate-gpu-hours.sh - Convert training parameters to GPU hours
- scripts/compare-platforms.sh - Compare costs across Modal, Lambda, RunPod
- templates/cost-breakdown.json - Structured cost breakdown template
- templates/platform-pricing.yaml - Up-to-date platform pricing data
- examples/training-cost-estimate.md - Example training cost calculation
- examples/inference-cost-estimate.md - Example inference cost analysis

GPU Options:
Free Credits:
Single GPU:
8x GPU Clusters:
Key Features:
Script: scripts/estimate-training-cost.sh
Usage:
bash scripts/estimate-training-cost.sh \
--model-size 7B \
--dataset-size 10000 \
--epochs 3 \
--gpu t4 \
--platform modal
Parameters:
- --model-size: Model size (125M, 350M, 1B, 3B, 7B, 13B, 70B)
- --dataset-size: Number of training samples
- --epochs: Number of training epochs
- --batch-size: Training batch size (default: auto-calculated)
- --gpu: GPU type (t4, a10, a100-40gb, a100-80gb, h100)
- --platform: Cloud platform (modal, lambda, runpod)
- --peft: Use PEFT/LoRA (yes/no, default: no)
- --mixed-precision: Use FP16/BF16 (yes/no, default: yes)

Output:
{
  "model": "7B",
  "dataset_size": 10000,
  "epochs": 3,
  "gpu": "T4",
  "platform": "Modal",
  "estimated_hours": 4.2,
  "cost_breakdown": {
    "compute_cost": 2.48,
    "storage_cost": 0.05,
    "total_cost": 2.53
  },
  "cost_optimizations": {
    "with_peft": 1.26,
    "savings_percentage": 50
  },
  "alternative_platforms": {
    "lambda_a10": 1.30,
    "runpod_t4": 2.40
  }
}
Calculation Methodology:
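The exact model lives inside scripts/estimate-training-cost.sh, but the core arithmetic reduces to tokens → hours → dollars. A minimal sketch, assuming ~300 tokens per training sample (an assumption, not a documented figure) and using the T4+PEFT throughput and Modal T4 hourly rate quoted elsewhere in this document:

```shell
# Hypothetical sketch, not the script's actual source. tokens_per_sample
# is an assumed average; throughput and rate come from this document.
dataset_size=10000
epochs=3
tokens_per_sample=300       # assumption
throughput_tps=600          # T4 with PEFT
rate_per_hour=0.59          # Modal T4

tokens_total=$(( dataset_size * epochs * tokens_per_sample ))
hours=$(awk -v t="$tokens_total" -v tps="$throughput_tps" \
  'BEGIN { printf "%.1f", t / tps / 3600 }')
cost=$(awk -v h="$hours" -v r="$rate_per_hour" \
  'BEGIN { printf "%.2f", h * r }')
echo "tokens=$tokens_total hours=$hours cost=\$$cost"
```

With these assumptions the numbers land on the 4.2 hours / $2.48 shown in the example output above.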
Script: scripts/estimate-inference-cost.sh
Usage:
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--gpu t4 \
--platform modal \
--deployment serverless
Parameters:
- --requests-per-day: Expected daily requests
- --avg-latency: Average inference time (seconds)
- --gpu: GPU type
- --platform: Cloud platform
- --deployment: Deployment type (serverless, dedicated)
- --batch-inference: Batch requests (yes/no, default: no)

Output:
{
  "requests_per_day": 1000,
  "requests_per_month": 30000,
  "avg_latency_sec": 2,
  "gpu": "T4",
  "platform": "Modal Serverless",
  "cost_breakdown": {
    "daily_compute_seconds": 2000,
    "daily_cost": 0.33,
    "monthly_cost": 9.90,
    "cost_per_request": 0.00033
  },
  "scaling_analysis": {
    "requests_10k_day": 99.00,
    "requests_100k_day": 990.00
  },
  "dedicated_alternative": {
    "monthly_cost": 442.50,
    "break_even_requests_day": 44700
  }
}
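The serverless figures above follow directly from requests × latency × per-second price. A sketch of that arithmetic (Modal T4 per-second rate taken from templates/platform-pricing.yaml; a 30-day month is assumed):

```shell
# Sketch of the serverless inference estimate; 30-day month assumed.
requests_per_day=1000
avg_latency_sec=2
price_per_sec=0.000164      # Modal T4

daily_seconds=$(( requests_per_day * avg_latency_sec ))
daily_cost=$(awk -v s="$daily_seconds" -v p="$price_per_sec" \
  'BEGIN { printf "%.2f", s * p }')
monthly_cost=$(awk -v d="$daily_cost" 'BEGIN { printf "%.2f", d * 30 }')
echo "daily=\$$daily_cost monthly=\$$monthly_cost"
```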
Script: scripts/calculate-gpu-hours.sh
Usage:
bash scripts/calculate-gpu-hours.sh \
--model-params 7B \
--tokens-total 30M \
--gpu a100-40gb
Parameters:
- --model-params: Model parameters (125M, 350M, 1B, 3B, 7B, 13B, 70B)
- --tokens-total: Total training tokens
- --gpu: GPU type
- --peft: Use PEFT (yes/no)
- --multi-gpu: Number of GPUs (default: 1)

GPU Throughput Benchmarks:
T4 (16GB):
- 7B full fine-tune: 150 tokens/sec
- 7B with PEFT: 600 tokens/sec
A100 40GB:
- 7B full fine-tune: 800 tokens/sec
- 7B with PEFT: 3200 tokens/sec
- 13B with PEFT: 1600 tokens/sec
A100 80GB:
- 13B full fine-tune: 600 tokens/sec
- 70B with PEFT: 400 tokens/sec
H100:
- 70B with PEFT: 1200 tokens/sec
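Given a throughput figure from the table above, GPU hours fall out as tokens / (tokens-per-second × 3600). For example, the 30M-token job from the usage example on an A100 40GB with PEFT:

```shell
# Sketch: convert a token budget into GPU hours via benchmark throughput.
tokens_total=30000000       # 30M tokens
throughput_tps=3200         # A100 40GB with PEFT (table above)
gpu_hours=$(awk -v t="$tokens_total" -v tps="$throughput_tps" \
  'BEGIN { printf "%.1f", t / tps / 3600 }')
echo "gpu_hours=$gpu_hours"
```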
Script: scripts/compare-platforms.sh
Usage:
bash scripts/compare-platforms.sh \
--training-hours 4 \
--gpu-type a100-40gb
Output:
# Platform Cost Comparison
## Training Job: 4 hours on A100 40GB
| Platform | GPU Cost | Egress Fees | Total | Notes |
|----------|----------|-------------|-------|-------|
| Modal | $8.40 | $0.00 | $8.40 | Serverless, pay-per-second |
| Lambda | $5.16 | $0.00 | $5.16 | Cheapest for dedicated |
| RunPod | $8.00 | $0.00 | $8.00 | Pay-per-minute |
## Winner: Lambda Labs ($5.16)
Savings: $3.24 (38.6% vs Modal)
Recommendation: Use Lambda for long-running dedicated training, Modal for
serverless/bursty workloads.
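The comparison itself boils down to rate × hours per platform. A minimal sketch using the Bash 4 associative arrays the scripts require (A100 40GB hourly rates taken from this document; the RunPod rate is inferred from the $8.00/4h row in the table above):

```shell
# Sketch of the comparison loop, not the script's actual source.
declare -A rate=( [modal]=2.10 [lambda]=1.29 [runpod]=2.00 )
hours=4
winner="" ; best=""
for p in modal lambda runpod; do
  cost=$(awk -v r="${rate[$p]}" -v h="$hours" 'BEGIN { printf "%.2f", r * h }')
  echo "$p: \$$cost"
  # Track the cheapest platform seen so far (numeric compare via awk).
  if [ -z "$best" ] || awk -v c="$cost" -v b="$best" 'BEGIN { exit !(c+0 < b+0) }'; then
    best=$cost ; winner=$p
  fi
done
echo "winner: $winner (\$$best)"
```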
Template: templates/cost-breakdown.json
{
  "project_name": "ML Training Project",
  "cost_estimate": {
    "training": {
      "model_size": "7B",
      "training_runs": 4,
      "hours_per_run": 4.2,
      "gpu_type": "T4",
      "platform": "Modal",
      "cost_per_run": 2.48,
      "total_training_cost": 9.92
    },
    "inference": {
      "deployment_type": "serverless",
      "expected_requests_month": 30000,
      "gpu_type": "T4",
      "platform": "Modal",
      "monthly_cost": 9.90
    },
    "storage": {
      "model_artifacts_gb": 14,
      "dataset_storage_gb": 5,
      "monthly_storage_cost": 0.50
    },
    "total_monthly_cost": 20.32,
    "breakdown_percentage": {
      "training": 48.8,
      "inference": 48.7,
      "storage": 2.5
    }
  },
  "cost_optimizations_applied": {
    "peft_lora": "50% training cost reduction",
    "mixed_precision": "2x faster training",
    "serverless_inference": "Pay only for actual usage",
    "batch_inference": "Up to 10x reduction in inference cost"
  },
  "potential_savings": {
    "without_optimizations": 45.00,
    "with_optimizations": 20.32,
    "total_savings": 24.68,
    "savings_percentage": 54.8
  }
}
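A quick way to sanity-check a filled-in template is to recompute the total and the percentage split by hand, e.g. with awk (values copied from the template above):

```shell
# Sketch: verify total_monthly_cost and breakdown_percentage.
training=9.92 ; inference=9.90 ; storage=0.50
total=$(awk -v a="$training" -v b="$inference" -v c="$storage" \
  'BEGIN { printf "%.2f", a + b + c }')
pct_training=$(awk -v x="$training" -v t="$total" \
  'BEGIN { printf "%.1f", 100 * x / t }')
echo "total=\$$total training_share=${pct_training}%"
```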
Template: templates/platform-pricing.yaml
platforms:
  modal:
    billing: per-second
    free_credits: 30          # USD per month
    startup_credits: 50000    # USD for eligible startups
    gpus:
      t4:
        price_per_sec: 0.000164
        price_per_hour: 0.59
        vram_gb: 16
      a100_40gb:
        price_per_sec: 0.000583
        price_per_hour: 2.10
        vram_gb: 40
      h100:
        price_per_sec: 0.001097
        price_per_hour: 3.95
        vram_gb: 80
  lambda:
    billing: per-hour
    free_credits: 0
    minimum_billing: 1-hour
    gpus:
      a10_1x:
        price_per_hour: 0.31
        vram_gb: 24
      a100_40gb_1x:
        price_per_hour: 1.29
        vram_gb: 40
      a100_40gb_8x:
        price_per_hour: 10.32
        total_vram_gb: 320
  runpod:
    billing: per-minute
    free_credits: 0
    features:
      - zero_egress_fees
      - flashboot_200ms
    gpus:
      t4:
        price_per_hour: 0.60  # Approximate
        vram_gb: 16
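When editing this file, the per-second and per-hour fields should stay consistent (per-second ≈ per-hour / 3600). A quick check against Modal's T4 entry:

```shell
# Sketch: derive the per-second price from the hourly price.
price_per_hour=0.59         # modal t4 entry above
derived=$(awk -v p="$price_per_hour" 'BEGIN { printf "%.6f", p / 3600 }')
echo "derived price_per_sec: $derived"
```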
File: examples/training-cost-estimate.md
Scenario:
Cost Calculation:
bash scripts/estimate-training-cost.sh \
--model-size 7B \
--dataset-size 10000 \
--epochs 3 \
--gpu t4 \
--platform modal \
--peft yes
Results:
Training Time: 4.2 hours
Modal T4 Cost: $2.48
Alternative (Lambda A10): $1.30 (47% cheaper)
Optimization Impact:
- Without PEFT: $12.40 (5x more expensive)
- With PEFT: $2.48
- Savings: $9.92 (80%)
Recommendation: Use Lambda A10 for cheapest option, or Modal T4 for serverless convenience.
File: examples/inference-cost-estimate.md
Scenario:
Cost Calculation:
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--gpu t4 \
--platform modal \
--deployment serverless
Current (1K requests/day):
Serverless Modal T4:
- Daily cost: $0.33
- Monthly cost: $9.90
- Cost per request: $0.00033
Dedicated Lambda A10:
- Monthly cost: $223 (24/7 instance)
- Break-even: ~22,500 requests/day (at $0.00033/request serverless)
- Not recommended for current traffic
After Growth (10K requests/day):
Serverless Modal T4:
- Monthly cost: $99.00
- Still cost-effective
Dedicated Lambda A10:
- Monthly cost: $223
- Break-even: ~22,500 requests/day
- Recommendation: Stay serverless until roughly 20K+ daily requests
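The break-even point is where the flat dedicated bill equals the serverless bill; one way to compute it from the per-request figure above (30-day month assumed):

```shell
# Sketch: requests/day at which dedicated becomes cheaper than serverless.
dedicated_monthly=223.00    # Lambda A10, 24/7
per_request=0.00033         # Modal T4 serverless
break_even=$(awk -v d="$dedicated_monthly" -v c="$per_request" \
  'BEGIN { printf "%d", d / (c * 30) }')
echo "break_even=${break_even} requests/day"
```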
PEFT/LoRA Savings: 50-90% training cost reduction
# Calculate savings
bash scripts/estimate-training-cost.sh --model-size 7B --peft no
# Cost: $12.40
bash scripts/estimate-training-cost.sh --model-size 7B --peft yes
# Cost: $2.48
# Savings: $9.92 (80%)
Mixed Precision Savings: 2x faster training, 50% cost reduction
Automatically enabled in cost estimations with --mixed-precision yes
Use Case Guidelines:
# Short jobs (<1 hour): Modal serverless
bash scripts/compare-platforms.sh --training-hours 0.5 --gpu-type t4
# Winner: Modal ($0.30 vs Lambda $0.31 minimum)
# Long jobs (4+ hours): Lambda dedicated
bash scripts/compare-platforms.sh --training-hours 4 --gpu-type a100-40gb
# Winner: Lambda ($5.16 vs Modal $8.40)
# Variable workloads: Modal serverless
# Pay only for actual usage, no idle cost
Batch Inference Savings: Up to 10x reduction in inference cost
# Single inference
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--batch-inference no
# Cost: $9.90/month
# Batch inference (10 requests per batch)
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 0.3 \
--batch-inference yes
# Cost: $1.49/month
# Savings: $8.41 (85%)
Required for scripts:
# Bash 4.0+ (for associative arrays)
bash --version
# jq (for JSON processing)
sudo apt-get install jq
# bc (for floating-point calculations)
sudo apt-get install bc
# yq (for YAML processing)
pip install yq
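A small pre-flight check (a hypothetical helper, not shipped with the scripts) can catch missing tools before a run:

```shell
# Hypothetical pre-flight helper: reports any dependency missing from PATH.
check_deps() {
  local missing=0
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; missing=1; }
  done
  return "$missing"
}

check_deps jq bc yq || echo "install the missing tools before running the scripts"
```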
Supported Platforms: Modal, Lambda Labs, RunPod
GPU Types: T4, L4, A10, A100 (40GB/80GB), H100, H200, B200
Output Format: JSON cost breakdowns and markdown reports
Version: 1.0.0