Cost estimation scripts and tools for calculating GPU hours, training costs, and inference pricing across Modal, Lambda Labs, and RunPod platforms...
Purpose: Provide production-ready cost estimation tools for ML training and inference across cloud GPU platforms (Modal, Lambda Labs, RunPod).
Activation Triggers:
Key Resources:
- scripts/estimate-training-cost.sh - Calculate training costs based on model size, data, GPU type
- scripts/estimate-inference-cost.sh - Estimate inference costs for production workloads
- scripts/calculate-gpu-hours.sh - Convert training parameters to GPU hours
- scripts/compare-platforms.sh - Compare costs across Modal, Lambda, RunPod
- templates/cost-breakdown.json - Structured cost breakdown template
- templates/platform-pricing.yaml - Up-to-date platform pricing data
- examples/training-cost-estimate.md - Example training cost calculation
- examples/inference-cost-estimate.md - Example inference cost analysis

GPU Options:
Free Credits:
Single GPU:
8x GPU Clusters:
Key Features:
Script: scripts/estimate-training-cost.sh
Usage:
bash scripts/estimate-training-cost.sh \
--model-size 7B \
--dataset-size 10000 \
--epochs 3 \
--gpu t4 \
--platform modal
Parameters:
- --model-size: Model size (125M, 350M, 1B, 3B, 7B, 13B, 70B)
- --dataset-size: Number of training samples
- --epochs: Number of training epochs
- --batch-size: Training batch size (default: auto-calculated)
- --gpu: GPU type (t4, a10, a100-40gb, a100-80gb, h100)
- --platform: Cloud platform (modal, lambda, runpod)
- --peft: Use PEFT/LoRA (yes/no, default: no)
- --mixed-precision: Use FP16/BF16 (yes/no, default: yes)

Output:
{
  "model": "7B",
  "dataset_size": 10000,
  "epochs": 3,
  "gpu": "T4",
  "platform": "Modal",
  "estimated_hours": 4.2,
  "cost_breakdown": {
    "compute_cost": 2.48,
    "storage_cost": 0.05,
    "total_cost": 2.53
  },
  "cost_optimizations": {
    "with_peft": 1.26,
    "savings_percentage": 50
  },
  "alternative_platforms": {
    "lambda_a10": 1.30,
    "runpod_t4": 2.40
  }
}
Calculation Methodology:
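The exact model lives inside scripts/estimate-training-cost.sh, but the core arithmetic reduces to tokens → hours → dollars. A minimal sketch, assuming ~300 tokens per training sample (an assumption, not a documented figure) and using the T4+PEFT throughput and Modal T4 hourly rate quoted elsewhere in this document:

```shell
# Hypothetical sketch, not the script's actual source. tokens_per_sample
# is an assumed average; throughput and rate come from this document.
dataset_size=10000
epochs=3
tokens_per_sample=300       # assumption
throughput_tps=600          # T4 with PEFT
rate_per_hour=0.59          # Modal T4

tokens_total=$(( dataset_size * epochs * tokens_per_sample ))
hours=$(awk -v t="$tokens_total" -v tps="$throughput_tps" \
  'BEGIN { printf "%.1f", t / tps / 3600 }')
cost=$(awk -v h="$hours" -v r="$rate_per_hour" \
  'BEGIN { printf "%.2f", h * r }')
echo "tokens=$tokens_total hours=$hours cost=\$$cost"
```

With these assumptions the numbers land on the 4.2 hours / $2.48 shown in the example output above.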
Script: scripts/estimate-inference-cost.sh
Usage:
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--gpu t4 \
--platform modal \
--deployment serverless
Parameters:
- --requests-per-day: Expected daily requests
- --avg-latency: Average inference time (seconds)
- --gpu: GPU type
- --platform: Cloud platform
- --deployment: Deployment type (serverless, dedicated)
- --batch-inference: Batch requests (yes/no, default: no)

Output:
{
  "requests_per_day": 1000,
  "requests_per_month": 30000,
  "avg_latency_sec": 2,
  "gpu": "T4",
  "platform": "Modal Serverless",
  "cost_breakdown": {
    "daily_compute_seconds": 2000,
    "daily_cost": 0.33,
    "monthly_cost": 9.90,
    "cost_per_request": 0.00033
  },
  "scaling_analysis": {
    "requests_10k_day": 99.00,
    "requests_100k_day": 990.00
  },
  "dedicated_alternative": {
    "monthly_cost": 442.50,
    "break_even_requests_day": 44700
  }
}
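The serverless figures above follow directly from requests × latency × per-second price. A sketch of that arithmetic (Modal T4 per-second rate taken from templates/platform-pricing.yaml; a 30-day month is assumed):

```shell
# Sketch of the serverless inference estimate; 30-day month assumed.
requests_per_day=1000
avg_latency_sec=2
price_per_sec=0.000164      # Modal T4

daily_seconds=$(( requests_per_day * avg_latency_sec ))
daily_cost=$(awk -v s="$daily_seconds" -v p="$price_per_sec" \
  'BEGIN { printf "%.2f", s * p }')
monthly_cost=$(awk -v d="$daily_cost" 'BEGIN { printf "%.2f", d * 30 }')
echo "daily=\$$daily_cost monthly=\$$monthly_cost"
```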
Script: scripts/calculate-gpu-hours.sh
Usage:
bash scripts/calculate-gpu-hours.sh \
--model-params 7B \
--tokens-total 30M \
--gpu a100-40gb
Parameters:
- --model-params: Model parameters (125M, 350M, 1B, 3B, 7B, 13B, 70B)
- --tokens-total: Total training tokens
- --gpu: GPU type
- --peft: Use PEFT (yes/no)
- --multi-gpu: Number of GPUs (default: 1)

GPU Throughput Benchmarks:
T4 (16GB):
- 7B full fine-tune: 150 tokens/sec
- 7B with PEFT: 600 tokens/sec
A100 40GB:
- 7B full fine-tune: 800 tokens/sec
- 7B with PEFT: 3200 tokens/sec
- 13B with PEFT: 1600 tokens/sec
A100 80GB:
- 13B full fine-tune: 600 tokens/sec
- 70B with PEFT: 400 tokens/sec
H100:
- 70B with PEFT: 1200 tokens/sec
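Given a throughput figure from the table above, GPU hours fall out as tokens / (tokens-per-second × 3600). For example, the 30M-token job from the usage example on an A100 40GB with PEFT:

```shell
# Sketch: convert a token budget into GPU hours via benchmark throughput.
tokens_total=30000000       # 30M tokens
throughput_tps=3200         # A100 40GB with PEFT (table above)
gpu_hours=$(awk -v t="$tokens_total" -v tps="$throughput_tps" \
  'BEGIN { printf "%.1f", t / tps / 3600 }')
echo "gpu_hours=$gpu_hours"
```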
Script: scripts/compare-platforms.sh
Usage:
bash scripts/compare-platforms.sh \
--training-hours 4 \
--gpu-type a100-40gb
Output:
# Platform Cost Comparison
## Training Job: 4 hours on A100 40GB
| Platform | GPU Cost | Egress Fees | Total | Notes |
|----------|----------|-------------|-------|-------|
| Modal | $8.40 | $0.00 | $8.40 | Serverless, pay-per-second |
| Lambda | $5.16 | $0.00 | $5.16 | Cheapest for dedicated |
| RunPod | $8.00 | $0.00 | $8.00 | Pay-per-minute |
## Winner: Lambda Labs ($5.16)
Savings: $3.24 (38.6% vs Modal)
Recommendation: Use Lambda for long-running dedicated training, Modal for
serverless/bursty workloads.
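The comparison itself boils down to rate × hours per platform. A minimal sketch using the Bash 4 associative arrays the scripts require (A100 40GB hourly rates taken from this document; the RunPod rate is inferred from the $8.00/4h row in the table above):

```shell
# Sketch of the comparison loop, not the script's actual source.
declare -A rate=( [modal]=2.10 [lambda]=1.29 [runpod]=2.00 )
hours=4
winner="" ; best=""
for p in modal lambda runpod; do
  cost=$(awk -v r="${rate[$p]}" -v h="$hours" 'BEGIN { printf "%.2f", r * h }')
  echo "$p: \$$cost"
  # Track the cheapest platform seen so far (numeric compare via awk).
  if [ -z "$best" ] || awk -v c="$cost" -v b="$best" 'BEGIN { exit !(c+0 < b+0) }'; then
    best=$cost ; winner=$p
  fi
done
echo "winner: $winner (\$$best)"
```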
Template: templates/cost-breakdown.json
{
  "project_name": "ML Training Project",
  "cost_estimate": {
    "training": {
      "model_size": "7B",
      "training_runs": 4,
      "hours_per_run": 4.2,
      "gpu_type": "T4",
      "platform": "Modal",
      "cost_per_run": 2.48,
      "total_training_cost": 9.92
    },
    "inference": {
      "deployment_type": "serverless",
      "expected_requests_month": 30000,
      "gpu_type": "T4",
      "platform": "Modal",
      "monthly_cost": 9.90
    },
    "storage": {
      "model_artifacts_gb": 14,
      "dataset_storage_gb": 5,
      "monthly_storage_cost": 0.50
    },
    "total_monthly_cost": 20.32,
    "breakdown_percentage": {
      "training": 48.8,
      "inference": 48.7,
      "storage": 2.5
    }
  },
  "cost_optimizations_applied": {
    "peft_lora": "50% training cost reduction",
    "mixed_precision": "2x faster training",
    "serverless_inference": "Pay only for actual usage",
    "batch_inference": "Up to 10x reduction in inference cost"
  },
  "potential_savings": {
    "without_optimizations": 45.00,
    "with_optimizations": 20.32,
    "total_savings": 24.68,
    "savings_percentage": 54.8
  }
}
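A quick way to sanity-check a filled-in template is to recompute the total and the percentage split by hand, e.g. with awk (values copied from the template above):

```shell
# Sketch: verify total_monthly_cost and breakdown_percentage.
training=9.92 ; inference=9.90 ; storage=0.50
total=$(awk -v a="$training" -v b="$inference" -v c="$storage" \
  'BEGIN { printf "%.2f", a + b + c }')
pct_training=$(awk -v x="$training" -v t="$total" \
  'BEGIN { printf "%.1f", 100 * x / t }')
echo "total=\$$total training_share=${pct_training}%"
```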
Template: templates/platform-pricing.yaml
platforms:
  modal:
    billing: per-second
    free_credits: 30          # USD per month
    startup_credits: 50000    # USD for eligible startups
    gpus:
      t4:
        price_per_sec: 0.000164
        price_per_hour: 0.59
        vram_gb: 16
      a100_40gb:
        price_per_sec: 0.000583
        price_per_hour: 2.10
        vram_gb: 40
      h100:
        price_per_sec: 0.001097
        price_per_hour: 3.95
        vram_gb: 80
  lambda:
    billing: per-hour
    free_credits: 0
    minimum_billing: 1-hour
    gpus:
      a10_1x:
        price_per_hour: 0.31
        vram_gb: 24
      a100_40gb_1x:
        price_per_hour: 1.29
        vram_gb: 40
      a100_40gb_8x:
        price_per_hour: 10.32
        total_vram_gb: 320
  runpod:
    billing: per-minute
    free_credits: 0
    features:
      - zero_egress_fees
      - flashboot_200ms
    gpus:
      t4:
        price_per_hour: 0.60  # Approximate
        vram_gb: 16
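When editing this file, the per-second and per-hour fields should stay consistent (per-second ≈ per-hour / 3600). A quick check against Modal's T4 entry:

```shell
# Sketch: derive the per-second price from the hourly price.
price_per_hour=0.59         # modal t4 entry above
derived=$(awk -v p="$price_per_hour" 'BEGIN { printf "%.6f", p / 3600 }')
echo "derived price_per_sec: $derived"
```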
File: examples/training-cost-estimate.md
Scenario:
Cost Calculation:
bash scripts/estimate-training-cost.sh \
--model-size 7B \
--dataset-size 10000 \
--epochs 3 \
--gpu t4 \
--platform modal \
--peft yes
Results:
Training Time: 4.2 hours
Modal T4 Cost: $2.48
Alternative (Lambda A10): $1.30 (47% cheaper)
Optimization Impact:
- Without PEFT: $12.40 (5x more expensive)
- With PEFT: $2.48
- Savings: $9.92 (80%)
Recommendation: Use Lambda A10 for cheapest option, or Modal T4 for serverless convenience.
File: examples/inference-cost-estimate.md
Scenario:
Cost Calculation:
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--gpu t4 \
--platform modal \
--deployment serverless
Current (1K requests/day):
Serverless Modal T4:
- Daily cost: $0.33
- Monthly cost: $9.90
- Cost per request: $0.00033
Dedicated Lambda A10:
- Monthly cost: $223 (24/7 instance)
- Break-even: ~22,500 requests/day (at $0.00033/request serverless)
- Not recommended for current traffic
After Growth (10K requests/day):
Serverless Modal T4:
- Monthly cost: $99.00
- Still cost-effective
Dedicated Lambda A10:
- Monthly cost: $223
- Break-even: ~22,500 requests/day
- Recommendation: Stay serverless until roughly 20K+ daily requests
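The break-even point is where the flat dedicated bill equals the serverless bill; one way to compute it from the per-request figure above (30-day month assumed):

```shell
# Sketch: requests/day at which dedicated becomes cheaper than serverless.
dedicated_monthly=223.00    # Lambda A10, 24/7
per_request=0.00033         # Modal T4 serverless
break_even=$(awk -v d="$dedicated_monthly" -v c="$per_request" \
  'BEGIN { printf "%d", d / (c * 30) }')
echo "break_even=${break_even} requests/day"
```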
PEFT/LoRA Savings: 50-90% training cost reduction
# Calculate savings
bash scripts/estimate-training-cost.sh --model-size 7B --peft no
# Cost: $12.40
bash scripts/estimate-training-cost.sh --model-size 7B --peft yes
# Cost: $2.48
# Savings: $9.92 (80%)
Mixed Precision Savings: 2x faster training, 50% cost reduction
Automatically enabled in cost estimations with --mixed-precision yes
Use Case Guidelines:
# Short jobs (<1 hour): Modal serverless
bash scripts/compare-platforms.sh --training-hours 0.5 --gpu-type t4
# Winner: Modal ($0.30 vs Lambda $0.31 minimum)
# Long jobs (4+ hours): Lambda dedicated
bash scripts/compare-platforms.sh --training-hours 4 --gpu-type a100-40gb
# Winner: Lambda ($5.16 vs Modal $8.40)
# Variable workloads: Modal serverless
# Pay only for actual usage, no idle cost
Batch Inference Savings: Up to 10x reduction in inference cost
# Single inference
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--batch-inference no
# Cost: $9.90/month
# Batch inference (10 requests per batch)
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 0.3 \
--batch-inference yes
# Cost: $1.49/month
# Savings: $8.41 (85%)
Required for scripts:
# Bash 4.0+ (for associative arrays)
bash --version
# jq (for JSON processing)
sudo apt-get install jq
# bc (for floating-point calculations)
sudo apt-get install bc
# yq (for YAML processing)
pip install yq
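A small pre-flight check (a hypothetical helper, not shipped with the scripts) can catch missing tools before a run:

```shell
# Hypothetical pre-flight helper: reports any dependency missing from PATH.
check_deps() {
  local missing=0
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; missing=1; }
  done
  return "$missing"
}

check_deps jq bc yq || echo "install the missing tools before running the scripts"
```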
Supported Platforms: Modal, Lambda Labs, RunPod
GPU Types: T4, L4, A10, A100 (40GB/80GB), H100, H200, B200
Output Format: JSON cost breakdowns and markdown reports
Version: 1.0.0