Roll back failed deployments, restore previous versions, and handle deployment emergencies. Use when deployments fail, bugs are discovered in production, or emergency recovery is needed.
This skill helps you safely roll back failed deployments and restore previous working versions.
Development → Staging → Production
↓ ↓ ↓
Rollback Rollback Rollback
Rollback Points:
# View deployment history
sst version list
# Output:
# Version Stage Deployed
# v1.2.0 production 2024-01-15 10:00:00
# v1.1.0 production 2024-01-10 09:30:00
# v1.0.0 production 2024-01-05 08:00:00
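# Roll back to the previous version
# (confirm that your SST version supports `--to`; if it does not, use the git checkout approach below)
sst deploy --stage production --to v1.1.0
# Or check out a specific git tag or commit and redeploy
git checkout v1.1.0
sst deploy --stage production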
# Rollback API only
sst deploy api --stage production --to v1.1.0
# Rollback Web only
sst deploy web --stage production --to v1.1.0
# Rollback infrastructure only
sst deploy --stage production --only infra --to v1.1.0
# Check current migration status
pnpm -F @sgcarstrends/database db:status
# Rollback last migration
pnpm -F @sgcarstrends/database db:rollback
# Rollback to specific migration
pnpm -F @sgcarstrends/database db:rollback --to 20240115_initial
# Rollback multiple migrations
pnpm -F @sgcarstrends/database db:rollback --step 3
# Create backup before deployment
pg_dump "$DATABASE_URL" > "backup-$(date +%Y%m%d-%H%M%S).sql"
# Restore from backup
psql "$DATABASE_URL" < backup-20240115-100000.sql
# Or use automated backup
# Restore from RDS snapshot (AWS)
aws rds restore-db-instance-from-snapshot \
--db-instance-identifier sgcarstrends-restored \
--db-snapshot-identifier sgcarstrends-snapshot-20240115
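The restore command above hard-codes a dump file name, which is easy to get wrong during an incident. The following is a minimal sketch, assuming backups follow the `backup-YYYYMMDD-HHMMSS.sql` naming shown above and sit in a single directory. `latest_backup` and the `./backups` path are hypothetical names used for illustration, not part of any tool.

```shell
# latest_backup is a hypothetical helper: it prints the newest timestamped dump in a directory.
latest_backup() {
  local dir="${1:-.}" f newest=""
  # Timestamped names sort lexicographically and globs expand in sorted order,
  # so the last match is the most recent backup.
  for f in "$dir"/backup-[0-9]*.sql; do
    [ -e "$f" ] || continue   # the pattern stays literal when nothing matches
    newest="$f"
  done
  # Return non-zero when no backup was found so callers can stop early.
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage (assumes DATABASE_URL is set and ./backups holds the dumps):
#   psql "$DATABASE_URL" < "$(latest_backup ./backups)"
```

The pattern `backup-[0-9]*.sql` deliberately skips files such as `backup-pre-deploy.sql`, whose names do not start with a timestamp.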
# List Lambda versions
aws lambda list-versions-by-function \
--function-name sgcarstrends-api-prod
# Update alias to previous version
aws lambda update-alias \
--function-name sgcarstrends-api-prod \
--name production \
--function-version 42 # Previous working version
# Verify rollback
aws lambda get-alias \
--function-name sgcarstrends-api-prod \
--name production
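The alias update above assumes you already know that version 42 is the one to return to. As a rough sketch, the helper below picks the highest published version number below the one the alias currently points at. `previous_version` is a hypothetical name, and "numerically previous" is only an assumption about which version last worked, so confirm the result before updating the alias.

```shell
# previous_version is a hypothetical helper: it reads version identifiers from stdin
# (one per line) and prints the highest numeric version below the one given as $1.
previous_version() {
  local current="$1" v best=""
  while read -r v; do
    # Skip blanks and non-numeric entries such as $LATEST.
    case "$v" in ''|*[!0-9]*) continue ;; esac
    if [ "$v" -lt "$current" ] && { [ -z "$best" ] || [ "$v" -gt "$best" ]; }; then
      best="$v"
    fi
  done
  # Return non-zero when there is no earlier version to fall back to.
  [ -n "$best" ] && echo "$best"
}

# Usage (the function name and alias are the ones from the commands above):
#   current=$(aws lambda get-alias --function-name sgcarstrends-api-prod \
#     --name production --query FunctionVersion --output text)
#   aws lambda list-versions-by-function --function-name sgcarstrends-api-prod \
#     --query 'Versions[].Version' --output text | tr '\t' '\n' | previous_version "$current"
```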
# Get previous configuration
aws lambda get-function-configuration \
--function-name sgcarstrends-api-prod \
--qualifier 42 # Previous version
# Update environment variables (this replaces the entire variable set, so include every key)
aws lambda update-function-configuration \
--function-name sgcarstrends-api-prod \
--environment Variables="{KEY1=value1,KEY2=value2}"
# If deployed with SST
sst deploy web --stage production --to v1.1.0
# If using custom deployment
# Redeploy previous version
git checkout v1.1.0
pnpm -F @sgcarstrends/web build
pnpm -F @sgcarstrends/web deploy:prod
# Then invalidate the CloudFront cache so the redeployed assets are served (this does not change the distribution's origin)
aws cloudfront create-invalidation \
--distribution-id E1234567890ABC \
--paths "/*"
# Find deployment commit
git log --oneline
# Revert specific commit
git revert <commit-hash>
# Or revert a range of commits (everything after <commit1>, up to and including <commit2>)
git revert <commit1>..<commit2>
# Push revert
git push origin main
# CI automatically deploys reverted version
# Create rollback branch
git checkout -b rollback/v1.1.0
# Reset to previous version
git reset --hard v1.1.0
# Push the rollback branch (a new branch needs no force push; if it already exists remotely, prefer --force-with-lease)
git push -u origin rollback/v1.1.0
# Create PR to merge rollback
gh pr create --title "Rollback to v1.1.0" --body "Emergency rollback"
# .github/workflows/deploy-with-rollback.yml
name: Deploy with Rollback
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Get previous version
        id: prev
        run: |
          PREV_TAG=$(git describe --tags --abbrev=0 HEAD^)
          echo "tag=$PREV_TAG" >> "$GITHUB_OUTPUT"
      - name: Deploy
        id: deploy
        run: pnpm deploy:prod
      - name: Health check
        id: health
        run: |
          sleep 30 # Wait for deployment
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://api.sgcarstrends.com/health)
          if [ "$STATUS" -ne 200 ]; then
            echo "Health check failed: $STATUS"
            exit 1
          fi
      - name: Smoke tests
        if: success()
        run: pnpm test:e2e:prod
      - name: Rollback on failure
        if: failure()
        run: |
          echo "Deployment failed, rolling back to ${{ steps.prev.outputs.tag }}"
          git checkout ${{ steps.prev.outputs.tag }}
          pnpm deploy:prod
      - name: Notify on rollback
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "🚨 Deployment failed and was rolled back to ${{ steps.prev.outputs.tag }}"
            }
        env:
          # v1 of this action reads the webhook from the environment
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
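The health-check step in the workflow waits a fixed 30 seconds and probes once, so a slow deployment can trigger an unnecessary rollback. One possible alternative is to poll the endpoint until it answers with HTTP 200. The sketch below assumes `curl` is available; `probe_status` and `wait_for_healthy` are hypothetical helper names, and the attempt count and delay are illustrative defaults.

```shell
# probe_status prints the HTTP status code for a URL ("000" when the request fails).
probe_status() {
  curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$1"
}

# wait_for_healthy URL [ATTEMPTS] [DELAY_SECONDS]
# Polls the URL until it returns 200 or the attempts run out.
wait_for_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" i=1 status
  while [ "$i" -le "$attempts" ]; do
    status="$(probe_status "$url")" || true   # a failed request must not abort the loop
    if [ "$status" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts: status ${status:-unknown}" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage in a workflow step or an emergency shell session:
#   wait_for_healthy https://api.sgcarstrends.com/health 12 5 || exit 1
```

Keeping the probe in its own function makes the retry loop easy to exercise without network access.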
# 1. Assess impact
# Check error rates, user reports
# 2. Quick rollback via SST
sst deploy --stage production --to v1.1.0
# 3. Verify rollback
curl https://api.sgcarstrends.com/health
# 4. Clear CDN cache
aws cloudfront create-invalidation \
--distribution-id E1234567890ABC \
--paths "/*"
# 5. Monitor
# Check logs, metrics, error rates
# 6. Communicate
# Update status page, notify users
# 1. Stop application (prevent data corruption)
# Scale down or put in maintenance mode
# 2. Rollback migration
pnpm -F @sgcarstrends/database db:rollback
# 3. Restore from backup if needed
psql $DATABASE_URL < backup-latest.sql
# 4. Verify database state
pnpm -F @sgcarstrends/database db:status
# 5. Rollback application code
git checkout v1.1.0
pnpm deploy:prod
# 6. Resume application
# Remove maintenance mode
# 1. Check metrics
# Response times, CPU, memory usage
# 2. Quick rollback
sst deploy --stage production --to v1.1.0
# 3. Clear caches (FLUSHALL deletes every key in every Redis database)
redis-cli -h $REDIS_HOST -p $REDIS_PORT FLUSHALL
aws cloudfront create-invalidation --distribution-id E1234567890ABC --paths "/*"
# 4. Monitor performance
# Check if performance restored
# 5. Investigate
# Profile code, check database queries
# Keep web app, rollback API only
# 1. Rollback API
sst deploy api --stage production --to v1.1.0
# 2. Verify API health
curl https://api.sgcarstrends.com/health
# 3. Test web app still works
# Check web app functionality
# 4. Monitor for errors
# Watch for API compatibility issues
# If using load balancer with multiple instances
# 1. Deploy old version to a subset of instances
# Update one instance at a time, up to about 50% of the fleet
# 2. Monitor metrics
# Check error rates on rolled-back instances
# 3. Gradually increase rollback
# Update more instances if stable
# 4. Complete rollback
# Once verified, update all instances
# Switch traffic back to blue environment
# 1. Update load balancer
aws elbv2 modify-listener \
--listener-arn arn:aws:... \
--default-actions Type=forward,TargetGroupArn=arn:aws:...-blue
# 2. Wait for traffic to shift
sleep 60
# 3. Verify metrics
# Check error rates on blue environment
# 4. Keep green for investigation
# Don't destroy immediately
# Redis cache
redis-cli -h $REDIS_HOST -p $REDIS_PORT FLUSHALL
# Or selective flush (the DEL command must target the same host and port as the scan)
redis-cli -h $REDIS_HOST -p $REDIS_PORT --scan --pattern "cache:*" | xargs redis-cli -h $REDIS_HOST -p $REDIS_PORT DEL
# Upstash Redis (via API)
curl -X POST https://your-redis.upstash.io/flushall \
-H "Authorization: Bearer $UPSTASH_TOKEN"
# CloudFront invalidation
aws cloudfront create-invalidation \
--distribution-id E1234567890ABC \
--paths "/*"
# Wait for invalidation
aws cloudfront wait invalidation-completed \
--distribution-id E1234567890ABC \
--id I2J3K4L5M6N7O8P9
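The wait command above needs the ID of the invalidation that was just created, which is not known in advance. A minimal sketch of capturing it from the create call with the AWS CLI's `--query` and `--output text` options follows. `invalidate_and_wait` is a hypothetical wrapper name, and the distribution ID is the placeholder used throughout this document.

```shell
# invalidate_and_wait DISTRIBUTION_ID [PATHS]
# Creates a CloudFront invalidation, waits for it to complete, and prints its ID.
invalidate_and_wait() {
  local dist_id="$1" paths="${2:-/*}" inv_id
  # --query extracts only the invalidation ID from the JSON response.
  inv_id="$(aws cloudfront create-invalidation \
    --distribution-id "$dist_id" \
    --paths "$paths" \
    --query 'Invalidation.Id' --output text)" || return 1
  aws cloudfront wait invalidation-completed \
    --distribution-id "$dist_id" \
    --id "$inv_id" || return 1
  echo "$inv_id"
}

# Usage:
#   invalidate_and_wait E1234567890ABC
```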
# API health
curl -f https://api.sgcarstrends.com/health || echo "API unhealthy"
# Web app health
curl -f https://sgcarstrends.com || echo "Web unhealthy"
# Database connectivity
psql $DATABASE_URL -c "SELECT 1" || echo "Database unreachable"
# Redis connectivity
redis-cli -h $REDIS_HOST ping || echo "Redis unreachable"
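Each check above ends in `|| echo ...`, so a script that runs them always exits successfully even when a service is down. As a sketch of one way to aggregate the results into a single exit status, the helpers below run each probe and remember any failure. `check` and `verify_rollback` are hypothetical names; the URLs and environment variables are the ones already used above.

```shell
# check LABEL COMMAND...
# Runs the command quietly, prints a result line, and returns non-zero on failure.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   $label"
  else
    echo "FAIL $label"
    return 1
  fi
}

# verify_rollback runs every check and fails if any single one failed.
# Assumes DATABASE_URL and REDIS_HOST are set in the environment.
verify_rollback() {
  local failed=0
  check "API"      curl -fsS https://api.sgcarstrends.com/health || failed=1
  check "Web"      curl -fsS https://sgcarstrends.com || failed=1
  check "Database" psql "$DATABASE_URL" -c "SELECT 1" || failed=1
  check "Redis"    redis-cli -h "$REDIS_HOST" ping || failed=1
  return "$failed"
}

# Usage:
#   verify_rollback || echo "Rollback verification failed"
```

Running all checks before returning, rather than stopping at the first failure, gives a complete picture of which services are affected.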
# Check CloudWatch metrics (the `-d '5 minutes ago'` form requires GNU date; on macOS use `date -u -v-5M`)
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Errors \
--dimensions Name=FunctionName,Value=sgcarstrends-api-prod \
--start-time $(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 60 \
--statistics Sum
# Check logs for errors
aws logs filter-log-events \
--log-group-name /aws/lambda/sgcarstrends-api-prod \
--start-time $(($(date +%s) - 300))000 \
--filter-pattern "ERROR"
# Status Page Template
## Incident: Deployment Rollback in Progress
**Status:** Investigating
**Started:** 2024-01-15 10:00 UTC
**Services Affected:** API, Web Application
### Timeline
**10:00 UTC** - Deployment completed
**10:05 UTC** - Increased error rates detected
**10:10 UTC** - Rollback initiated
**10:15 UTC** - Rollback completed
**10:20 UTC** - Services restored
### Impact
Some users may have experienced errors during the rollback.
### Next Steps
We're investigating the root cause and will provide updates.
# Slack notification
curl -X POST $SLACK_WEBHOOK_URL \
-H 'Content-Type: application/json' \
-d '{
"text": "🚨 Rollback in progress",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Deployment Rollback*\nRolling back from v1.2.0 to v1.1.0\nReason: Critical bug affecting user login"
}
}
]
}'
# ✅ Create backups before deployment
BACKUP_FILE="backup-pre-deploy-$(date +%Y%m%d-%H%M%S).sql"
pg_dump "$DATABASE_URL" > "$BACKUP_FILE"
# Store in S3
aws s3 cp "$BACKUP_FILE" "s3://sgcarstrends-backups/$(date +%Y%m%d)/"
# ✅ Practice rollback in staging
sst deploy --stage staging --to v1.0.0
# Verify functionality
pnpm test:e2e:staging
// ✅ Enable gradual rollout and quick disable
const ENABLE_NEW_FEATURE = process.env.ENABLE_NEW_FEATURE === "true";
if (ENABLE_NEW_FEATURE) {
// New feature code
} else {
// Old feature code
}
// Disable feature without rollback
// Set ENABLE_NEW_FEATURE=false
# ✅ Set up alerts for key metrics
# - Error rate
# - Response time
# - CPU/Memory usage
# - Database connections
# Issue: Rollback command fails
# Solution: Manual intervention
# 1. Check current state
sst version list
# 2. Force redeploy previous version
git checkout v1.1.0
pnpm install
pnpm build
pnpm deploy:prod --force
# 3. Verify deployment
curl https://api.sgcarstrends.com/health
# Issue: Code rolled back but database not
# Solution: Rollback database
# 1. Rollback migrations
pnpm -F @sgcarstrends/database db:rollback
# 2. Or restore backup
psql $DATABASE_URL < backup-pre-deploy.sql
# 3. Verify schema version
pnpm -F @sgcarstrends/database db:status
.github/workflows/ - Deployment workflows