Production Checklist
Essential steps for deploying Cerberus IAM to production.
Pre-Deployment Checklist
Security
- [ ] Set NODE_ENV=production
- [ ] Generate strong SECRET_ENCRYPTION_KEY (32 random bytes, base64-encoded)
- [ ] Use SESSION_COOKIE_SECURE=true (HTTPS only)
- [ ] Configure CORS with specific origins (no wildcards)
- [ ] Set appropriate rate limits for your traffic
- [ ] Enable MFA for admin accounts
- [ ] Review and configure password policies
- [ ] Set up SSL/TLS certificates
- [ ] Configure security headers (Helmet already included)
- [ ] Disable SMTP auth for unauthorized servers
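The key requirement above (32 random bytes, base64-encoded) can be met with standard tools. The sketch below is one way to do it; the function names are hypothetical helpers, not part of Cerberus IAM. Running openssl rand -base64 32 should produce an equivalent value.

```shell
# Generate 32 random bytes and base64-encode them (44 characters).
generate_encryption_key() {
  head -c 32 /dev/urandom | base64
}

# Print how many bytes a base64 string decodes to, so a key can be
# checked before it is stored in a secrets manager.
decoded_key_length() {
  printf '%s' "$1" | base64 --decode | wc -c | tr -d ' '
}
```

A key is suitable when decoded_key_length reports 32 for it. Store the value in your secrets manager rather than in a file under version control.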
Database
- [ ] Use managed PostgreSQL service (RDS, Cloud SQL, etc.)
- [ ] Enable automated backups
- [ ] Configure connection pooling
- [ ] Set up read replicas (if needed)
- [ ] Enable SSL/TLS for database connections
- [ ] Run migrations before deployment
- [ ] Test rollback procedures
- [ ] Monitor connection pool usage
Application
- [ ] Build production bundle (npm run build)
- [ ] Remove development dependencies
- [ ] Set appropriate log level (info or warn)
- [ ] Configure remote log export (SIEM)
- [ ] Set up health check monitoring
- [ ] Configure graceful shutdown
- [ ] Test error handling
- [ ] Verify email delivery
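Several of the settings in these checklists can be checked mechanically before a release. The following is a minimal preflight sketch, assuming the variable names used in this guide. The function name is hypothetical, and treating ADMIN_WEB_ORIGIN as the CORS origin is an assumption.

```shell
# Print one line per problem and return non-zero if any check fails.
check_production_env() {
  problems=0
  report_problem() { echo "$1"; problems=$((problems + 1)); }

  [ "${NODE_ENV:-}" = "production" ] || report_problem "NODE_ENV must be production"
  [ "${SESSION_COOKIE_SECURE:-}" = "true" ] || report_problem "SESSION_COOKIE_SECURE must be true"
  [ -n "${SECRET_ENCRYPTION_KEY:-}" ] || report_problem "SECRET_ENCRYPTION_KEY is not set"

  case "${LOG_LEVEL:-}" in
    info | warn) ;;
    *) report_problem "LOG_LEVEL should be info or warn" ;;
  esac

  case "${ISSUER_URL:-}" in
    https://*) ;;
    *) report_problem "ISSUER_URL must use https" ;;
  esac

  # Assumption: ADMIN_WEB_ORIGIN is the origin allowed by CORS.
  case "${ADMIN_WEB_ORIGIN:-}" in
    "" | *"*"*) report_problem "ADMIN_WEB_ORIGIN must be a specific origin (no wildcard)" ;;
  esac

  [ "$problems" -eq 0 ]
}
```

Calling check_production_env in a deploy pipeline after the environment is loaded stops the release when it returns non-zero.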
Infrastructure
- [ ] Set up load balancer
- [ ] Configure auto-scaling (if applicable)
- [ ] Set up CDN for static assets
- [ ] Configure DNS records
- [ ] Set up monitoring and alerting
- [ ] Create runbooks for common issues
- [ ] Document disaster recovery procedures
- [ ] Test backup restoration
Compliance
- [ ] Review audit logging configuration
- [ ] Set up log retention policies
- [ ] Configure data retention schedules
- [ ] Document GDPR compliance measures
- [ ] Review and sign BAA (if handling PHI)
- [ ] Conduct security audit
- [ ] Perform penetration testing
Environment Configuration
Required Variables
env
NODE_ENV=production
PORT=4000
ISSUER_URL=https://auth.yourdomain.com
DATABASE_URL=postgresql://user:<password>@<db-host>:5432/cerberus_iam
SECRET_ENCRYPTION_KEY=<base64-encoded-32-bytes>
JWT_ALG=EdDSA
JWKS_ROTATE_DAYS=30
SESSION_COOKIE_SECURE=true
SESSION_COOKIE_DOMAIN=.yourdomain.com
ADMIN_WEB_ORIGIN=https://admin.yourdomain.com
[email protected]
SMTP_HOST=smtp.sendgrid.net
SMTP_PORT=587
SMTP_USER=apikey
SMTP_PASS=<sendgrid-api-key>
LOG_LEVEL=info
LOG_REMOTE_URL=https://logs.yourdomain.com/ingest
LOG_REMOTE_API_KEY=<your-api-key>
RATE_MAX=100
AUTH_RATE_MAX=20
TOKEN_RATE_MAX=20
Secrets Management
Use a secrets manager:
AWS Secrets Manager:
bash
aws secretsmanager create-secret \
  --name cerberus-iam/production \
  --secret-string file://.env.production
Kubernetes Secrets:
bash
kubectl create secret generic cerberus-env \
  --from-env-file=.env.production
HashiCorp Vault:
bash
vault kv put secret/cerberus-iam @.env.production
Deployment Strategies
Blue-Green Deployment
- Deploy new version (green) alongside current (blue)
- Run health checks on green
- Switch traffic to green
- Keep blue running for quick rollback
- After validation, decommission blue
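The "run health checks on green" step can be gated with a small polling loop before traffic is switched. This is a sketch only: the function name and the green.internal hostname are hypothetical, and it assumes the /health endpoint returns a JSON body containing "status": "ok".

```shell
# Poll a health URL until it reports "status":"ok" or attempts run out.
# Usage: wait_for_healthy URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_healthy() {
  health_url=$1
  attempts=${2:-10}
  delay=${3:-3}
  attempt=1
  while [ "$attempt" -le "$attempts" ]; do
    # -f fails on HTTP errors, -sS is quiet but keeps error messages,
    # --max-time bounds each request.
    if body=$(curl -fsS --max-time 5 "$health_url" 2>/dev/null) &&
      printf '%s' "$body" | grep -q '"status"[[:space:]]*:[[:space:]]*"ok"'; then
      return 0
    fi
    if [ "$attempt" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, wait_for_healthy http://green.internal:4000/health 20 3 could run before the traffic switch, which is platform-specific and not shown here. A non-zero return means green never became healthy and blue should keep serving.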
Rolling Deployment
- Update instances one at a time
- Health check each instance before proceeding
- Automatic rollback on failure
Canary Deployment
- Deploy to small percentage of traffic (5-10%)
- Monitor metrics and errors
- Gradually increase traffic
- Full rollout or rollback based on metrics
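The canary steps amount to a repeated decision: raise the traffic share or roll back. The sketch below shows one possible policy. The doubling schedule and the 1% default error threshold are assumptions for illustration, and the function name is hypothetical.

```shell
# Decide the next canary traffic percentage.
# Usage: next_canary_weight CURRENT_PERCENT ERROR_RATE_PERCENT [MAX_ERROR_RATE_PERCENT]
# Prints 0 to signal rollback, otherwise the next weight (doubled, capped at 100).
next_canary_weight() {
  awk -v cur="$1" -v err="$2" -v max="${3:-1}" 'BEGIN {
    if (err > max) { print 0; exit }
    next_weight = cur * 2
    if (next_weight > 100) next_weight = 100
    print next_weight
  }'
}
```

Starting at 5%, a healthy canary would progress through 10, 20, 40 and 80 before reaching 100. An error rate above the threshold at any step yields 0.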
Infrastructure Patterns
Load Balancer Configuration
AWS ALB:
yaml
TargetGroup:
  HealthCheckPath: /health
  HealthCheckInterval: 30
  HealthCheckTimeout: 5
  HealthyThresholdCount: 2
  UnhealthyThresholdCount: 3
  Matcher: 200
LoadBalancer:
  Scheme: internet-facing
  SecurityGroups: [sg-xxx]
  Subnets: [subnet-xxx, subnet-yyy]
NGINX:
nginx
upstream cerberus_iam {
    least_conn;
    server app1:4000 max_fails=3 fail_timeout=30s;
    server app2:4000 max_fails=3 fail_timeout=30s;
    server app3:4000 max_fails=3 fail_timeout=30s;
}
server {
    listen 443 ssl http2;
    server_name auth.yourdomain.com;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    location / {
        proxy_pass http://cerberus_iam;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    location /health {
        proxy_pass http://cerberus_iam/health;
        access_log off;
    }
}
Kubernetes Deployment
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cerberus-iam
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: cerberus-iam
  template:
    metadata:
      labels:
        app: cerberus-iam
    spec:
      containers:
        - name: api
          image: cerberus-iam:v1.0.0
          ports:
            - containerPort: 4000
          envFrom:
            - secretRef:
                name: cerberus-env
          resources:
            requests:
              memory: '512Mi'
              cpu: '500m'
            limits:
              memory: '2Gi'
              cpu: '2000m'
          livenessProbe:
            httpGet:
              path: /health
              port: 4000
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 4000
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: cerberus-iam
  namespace: production
spec:
  selector:
    app: cerberus-iam
  ports:
    - port: 80
      targetPort: 4000
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cerberus-iam
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # ingress-nginx rate limiting is per client IP; tune limit-rps (or limit-rpm) for your traffic
    nginx.ingress.kubernetes.io/limit-rps: '100'
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - auth.yourdomain.com
      secretName: cerberus-tls
  rules:
    - host: auth.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cerberus-iam
                port:
                  number: 80
Database Setup
PostgreSQL Configuration
Recommended settings:
ini
# Connections and memory
max_connections = 200
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 10MB
# Performance
random_page_cost = 1.1  # For SSD
effective_io_concurrency = 200
# Logging
log_min_duration_statement = 1000  # Log slow queries (>1s)
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
# Replication (if using read replicas)
wal_level = replica
max_wal_senders = 3
Migrations
Run migrations before deployment:
bash
# Test migration (dry run)
npm run db:migrate -- --create-only
# Apply migration
npm run db:migrate
# Verify
npm run db:studio
Backups
Automated backups:
bash
# PostgreSQL dump
pg_dump -U cerberus -h db.internal cerberus_iam > backup.sql
# Compressed
pg_dump -U cerberus -h db.internal cerberus_iam | gzip > backup.sql.gz
# Point-in-time recovery (PITR)
# Enable in PostgreSQL configuration
Monitoring
Health Checks
Endpoint: GET /health
Response:
json
{
  "status": "ok",
  "timestamp": "2024-01-15T10:00:00.000Z"
}
Load balancer configuration:
- Path: /health
- Interval: 30 seconds
- Timeout: 5 seconds
- Healthy threshold: 2
- Unhealthy threshold: 3
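The two thresholds count consecutive probe results: a target becomes healthy after 2 successes in a row and unhealthy after 3 failures in a row. The sketch below replays a sequence of results to show that behaviour. It is an illustrative model with a hypothetical function name, not any load balancer's actual implementation.

```shell
# Replay probe results ("ok" or "fail") and print the resulting state.
# Usage: target_state INITIAL_STATE RESULT...
target_state() {
  state=$1
  shift
  ok_streak=0
  fail_streak=0
  for result in "$@"; do
    if [ "$result" = "ok" ]; then
      ok_streak=$((ok_streak + 1))
      fail_streak=0
      # Healthy threshold: 2 consecutive successes
      if [ "$ok_streak" -ge 2 ]; then state=healthy; fi
    else
      fail_streak=$((fail_streak + 1))
      ok_streak=0
      # Unhealthy threshold: 3 consecutive failures
      if [ "$fail_streak" -ge 3 ]; then state=unhealthy; fi
    fi
  done
  echo "$state"
}
```

With a 30-second interval, a failing instance is removed from rotation roughly 90 seconds after its first failed probe. A single success in between resets the failure count.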
Metrics
Track key metrics:
- Request metrics: Rate, latency (p50, p95, p99), error rate
- Database metrics: Connection pool usage, query duration, deadlocks
- Session metrics: Active sessions, creation rate, expiration rate
- Token metrics: Token issuance rate, revocation rate, refresh rate
- Auth metrics: Login success/failure rate, MFA usage
- System metrics: CPU, memory, disk I/O, network
Alerting
Configure alerts for:
- Health check failures
- High error rate (>1%)
- High latency (p95 >1s)
- Database connection pool exhaustion
- Disk space <20%
- Memory usage >90%
- Failed login rate spike
- Token revocation spike (potential breach)
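The numeric thresholds in this list translate directly into a rule check. The sketch below evaluates one metrics sample against them. The function name and alert names are hypothetical, and in practice these rules would normally live in your monitoring system rather than in a script.

```shell
# Print the names of alerts that should fire for one metrics sample.
# Usage: evaluate_alerts ERROR_RATE_PCT P95_LATENCY_MS DISK_FREE_PCT MEMORY_USED_PCT
evaluate_alerts() {
  awk -v err="$1" -v p95="$2" -v disk="$3" -v mem="$4" 'BEGIN {
    if (err > 1)    print "high_error_rate"   # error rate above 1%
    if (p95 > 1000) print "high_latency"      # p95 above 1 second
    if (disk < 20)  print "low_disk_space"    # less than 20% disk free
    if (mem > 90)   print "high_memory"       # memory usage above 90%
  }'
}
```

For instance, evaluate_alerts 2.5 1500 55 60 prints high_error_rate and high_latency, one per line. A sample within all thresholds prints nothing.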
Security Hardening
Application Security
- Secrets: Never commit to version control
- Encryption: Use SECRET_ENCRYPTION_KEY for all secrets
- HTTPS: Enforce SSL/TLS everywhere
- CORS: Whitelist specific origins only
- Rate Limiting: Adjust for your traffic patterns
- CSRF: Already enabled for session-based auth
- Headers: Helmet middleware already configured
Network Security
- Firewall: Restrict ingress to load balancer only
- VPC: Use private subnets for database
- Security Groups: Minimum required permissions
- WAF: Consider AWS WAF, Cloudflare, etc.
- DDoS Protection: Use cloud provider DDoS protection
Database Security
- Encryption at rest: Enable on database
- Encryption in transit: Use SSL/TLS connections
- Least privilege: Dedicated database user with minimal permissions
- No public access: Database in private subnet
- Audit logging: Enable PostgreSQL audit logs
Performance Optimization
Database Optimization
sql
-- Add indexes for common queries
CREATE INDEX idx_users_email ON users(email) WHERE deleted_at IS NULL;
CREATE INDEX idx_users_org ON users(organisation_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_sessions_token ON sessions(session_token);
CREATE INDEX idx_sessions_expires ON sessions(expires_at);
CREATE INDEX idx_tokens_jti ON tokens(jti);
CREATE INDEX idx_refresh_tokens_token ON refresh_tokens(token_hash);
-- Analyze tables
ANALYZE users;
ANALYZE sessions;
ANALYZE tokens;
Connection Pooling
env
# Prisma connection pool
DATABASE_URL="postgresql://user:pass@host:5432/db?connection_limit=20&pool_timeout=10"
Caching
Consider adding Redis for:
- Session storage (faster than database)
- Rate limiting (distributed)
- Permission caching
Disaster Recovery
Backup Strategy
- Daily automated backups
- Retention: 30 days
- Cross-region replication
- Test restoration monthly
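If you run your own dumps rather than relying on a managed service's backups, the daily backup and 30-day retention can be sketched as below. The function names and file naming scheme are hypothetical. The script assumes pg_dump credentials come from ~/.pgpass or PGPASSWORD, and cross-region replication is left to your storage layer.

```shell
# Dump the database to a dated file in BACKUP_DIR, then gzip it.
# gzip only runs if pg_dump succeeded, so a failed dump is not archived.
# Usage: backup_database BACKUP_DIR
backup_database() {
  dump_file="$1/cerberus_iam-$(date +%Y-%m-%d).sql"
  pg_dump -U cerberus -h db.internal -f "$dump_file" cerberus_iam &&
    gzip -f "$dump_file"
}

# Delete compressed dumps older than RETENTION_DAYS (default 30).
# Usage: prune_backups BACKUP_DIR [RETENTION_DAYS]
prune_backups() {
  find "$1" -type f -name '*.sql.gz' -mtime +"${2:-30}" -exec rm -f {} +
}
```

A daily cron job could call backup_database /var/backups/cerberus followed by prune_backups /var/backups/cerberus, where the directory is an example path.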
Recovery Procedures
Database corruption:
- Stop application
- Restore from latest backup
- Replay WAL up to the target point in time (point-in-time recovery)
- Verify data integrity
- Restart application
Data breach:
- Revoke all refresh tokens
- Force password reset for all users
- Rotate JWT signing keys
- Rotate encryption keys
- Notify affected users
- Conduct security audit
Troubleshooting
High CPU Usage
- Check for slow database queries
- Review request patterns
- Scale horizontally
- Optimize business logic
Memory Leaks
- Monitor memory over time
- Check for unclosed connections
- Review event listeners
- Use heap snapshots for analysis
Database Connection Exhaustion
- Increase connection pool size
- Reduce connection timeout
- Check for connection leaks
- Scale database
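Before increasing the pool size, it helps to check the total the application can open against the database limit. Each instance opens up to connection_limit connections, so the total is instances times the limit. The sketch below does that arithmetic; the function name and the reserved-connections parameter are hypothetical.

```shell
# Print how many connections remain after every app instance fills its pool.
# A negative result means the pools can exceed max_connections.
# Usage: pool_headroom INSTANCES CONNECTION_LIMIT MAX_CONNECTIONS [RESERVED]
pool_headroom() {
  echo $(($3 - ${4:-0} - $1 * $2))
}
```

With the values used in this guide (3 replicas, connection_limit=20, max_connections=200) and 10 connections reserved for maintenance, pool_headroom 3 20 200 10 prints 130. Remember that a rolling update with maxSurge 1 briefly runs one extra instance.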
High Latency
- Enable database query logging
- Add database indexes
- Implement caching
- Use CDN for static assets
Rollback Procedures
Application Rollback
bash
# Kubernetes
kubectl rollout undo deployment/cerberus-iam
# Docker Swarm
docker service update --rollback cerberus-iam
# Manual
# Deploy previous version
Database Rollback
bash
# If migration is reversible
npm run db:migrate -- --rollback
# Otherwise, restore from backup
psql -U cerberus -h db.internal cerberus_iam < backup.sql
Post-Deployment
- [ ] Verify health endpoint
- [ ] Test user login flow
- [ ] Verify email delivery
- [ ] Test OAuth2 flow
- [ ] Check logs for errors
- [ ] Monitor metrics dashboard
- [ ] Verify backups running
- [ ] Test rollback procedure
- [ ] Document deployment
Support
For production issues:
- Check health endpoint and logs
- Review metrics and alerts
- Check GitHub Issues
- Consult troubleshooting guide
Next Steps
- Monitoring Guide - Observability setup
- Docker Deployment - Container deployment
- Architecture - System architecture