
# Backup and Restore Runbook

Procedures for backing up and restoring SatGate Gateway data.


## Data Inventory

| Component | Data | Criticality | Loss Impact |
| --- | --- | --- | --- |
| PostgreSQL | Tokens, audit log, tenants, config | Critical | Service disruption, compliance failure |
| Redis | Ban list, rate limits, sessions | High | Temporary governance gap |
| Gateway config | `gateway.yaml` | Medium | Manual reconfiguration needed |
| Secrets | Admin token, JWT secret, audit key | Critical | Complete lockout |

## RPO/RTO Targets

| Metric | Target | Method |
| --- | --- | --- |
| RPO (Recovery Point Objective) | ≤ 1 hour | WAL archiving + hourly snapshots |
| RTO (Recovery Time Objective) | ≤ 4 hours | Restore + verify + DNS failover |
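The RPO target can be checked mechanically. A self-contained sketch (the backup timestamp is simulated here so the snippet runs anywhere; in practice read it with `stat -c %Y` on the newest dump file):

```bash
# Sketch: flag an RPO violation if the newest backup is older than the target.
RPO_SECONDS=3600                    # 1 hour target from the table above
LAST_BACKUP_EPOCH=$(date +%s)       # simulated; really: stat -c %Y newest_dump.sql.gz
AGE=$(( $(date +%s) - LAST_BACKUP_EPOCH ))
if [ "$AGE" -le "$RPO_SECONDS" ]; then
  echo "within RPO (age ${AGE}s)"
else
  echo "RPO VIOLATED (age ${AGE}s)" >&2
fi
```

Wiring this into a cron or CI job gives an early warning well before the Prometheus alerts at the end of this runbook fire.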

## PostgreSQL Backup

### Automated Daily Backup

Kubernetes CronJob:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: satgate-pg-backup
  namespace: satgate
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM UTC
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: postgres:15-alpine
              command:
                - /bin/sh
                - -c
                - |
                  TIMESTAMP=$(date +%Y%m%d_%H%M%S)
                  pg_dump "$DATABASE_URL" | gzip > /backups/satgate_$TIMESTAMP.sql.gz
                  # Upload to S3 (if AWS CLI available)
                  # aws s3 cp /backups/satgate_$TIMESTAMP.sql.gz s3://backups/satgate/
                  # Cleanup old local backups (keep 7 days)
                  find /backups -name "satgate_*.sql.gz" -mtime +7 -delete
              env:
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: satgate-database
                      key: DATABASE_URL
              volumeMounts:
                - name: backups
                  mountPath: /backups
          restartPolicy: OnFailure
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: satgate-backups
```

### Manual Backup

```bash
# Get database URL
export DATABASE_URL=$(kubectl get secret satgate-database -n satgate \
  -o jsonpath='{.data.DATABASE_URL}' | base64 -d)

# Create backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
pg_dump "$DATABASE_URL" | gzip > satgate_backup_$TIMESTAMP.sql.gz

# Upload to S3 (optional)
aws s3 cp satgate_backup_$TIMESTAMP.sql.gz s3://your-bucket/satgate/backups/
```
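Before relying on a manual backup, verify the archive is intact. A self-contained sketch (the sample archive is generated inline so the snippet runs anywhere; point `BACKUP` at the real dump file):

```bash
# Sketch: verify a dump archive before trusting it.
BACKUP=satgate_backup_sample.sql.gz
printf 'SELECT 1;\n' | gzip > "$BACKUP"   # stand-in for the real pg_dump output

# gzip integrity check - catches truncated uploads/downloads
gunzip -t "$BACKUP" && echo "archive OK"

# Spot-check the first lines without extracting the whole file
gunzip -c "$BACKUP" | head -n 3

rm -f "$BACKUP"
```

A truncated archive that fails `gunzip -t` is worth catching on backup day, not on restore day.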

### WAL Archiving (Point-in-Time Recovery)

For production, enable WAL archiving in PostgreSQL:

```conf
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'aws s3 cp %p s3://your-bucket/satgate/wal/%f'
```

## PostgreSQL Restore

### From Backup File

```bash
# 1. Stop the gateway to prevent writes
kubectl scale deployment satgate-gateway -n satgate --replicas=0

# 2. Get database credentials
export DATABASE_URL=$(kubectl get secret satgate-database -n satgate \
  -o jsonpath='{.data.DATABASE_URL}' | base64 -d)

# 3. Drop and recreate the public schema (DESTRUCTIVE!)
psql "$DATABASE_URL" -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"

# 4. Restore from backup
gunzip -c satgate_backup_YYYYMMDD_HHMMSS.sql.gz | psql "$DATABASE_URL"

# 5. Verify restoration
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM audit_log;"
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM tenants;"

# 6. Restart gateway
kubectl scale deployment satgate-gateway -n satgate --replicas=3

# 7. Verify health
kubectl port-forward svc/satgate-gateway 8080:8080 -n satgate &
curl http://localhost:8080/healthz
```
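Step 5's spot checks can be tightened into a pass/fail comparison against counts recorded at backup time. A sketch with placeholder values (capture the real ones with `psql -tA` before and after the restore):

```bash
# Sketch: fail loudly if the restored audit_log is smaller than expected.
# In practice both values come from psql, e.g.:
#   EXPECTED=$(psql "$DATABASE_URL" -tA -c "SELECT COUNT(*) FROM audit_log;")
EXPECTED=12345   # placeholder: count recorded at backup time
RESTORED=12345   # placeholder: count after the restore
if [ "$RESTORED" -ge "$EXPECTED" ]; then
  echo "audit_log count OK ($RESTORED rows)"
else
  echo "audit_log MISSING ROWS: expected >= $EXPECTED, got $RESTORED" >&2
  exit 1
fi
```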

### Point-in-Time Recovery

```bash
# 1. Restore the base backup (from pg_basebackup -Ft) into a clean data directory
tar -xf base_backup.tar -C "$PGDATA"

# 2. Replay archived WAL up to the target time
#    (recovery.conf in PG <= 11; recovery.signal plus restore_command in PG 12+)
```
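A hedged sketch of the PG 12+ recovery settings (these are the standard PostgreSQL parameter names; the S3 path mirrors the `archive_command` above, and the target time is a placeholder):

```conf
# postgresql.conf - point-in-time recovery target (PG 12+)
restore_command = 'aws s3 cp s3://your-bucket/satgate/wal/%f %p'
recovery_target_time = '2024-01-01 12:00:00+00'   # placeholder target
recovery_target_action = 'promote'
```

Then create an empty `recovery.signal` file in the data directory and start PostgreSQL; it replays WAL until the target time and promotes.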

## Redis Backup

### Manual Backup

```bash
# Trigger a background save (BGSAVE is asynchronous; check LASTSAVE to confirm
# the snapshot finished before copying)
kubectl exec -it $(kubectl get pod -l app=redis -n satgate -o jsonpath='{.items[0].metadata.name}') \
  -n satgate -- redis-cli BGSAVE

# Copy RDB file
kubectl cp satgate/$(kubectl get pod -l app=redis -n satgate -o jsonpath='{.items[0].metadata.name}'):/data/dump.rdb \
  ./redis_backup_$(date +%Y%m%d).rdb
```

### Automated Backup (CronJob)

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: satgate-redis-backup
  namespace: satgate
spec:
  schedule: "30 2 * * *"  # Daily at 2:30 AM UTC
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: redis:7-alpine
              command:
                - /bin/sh
                - -c
                - |
                  redis-cli -h redis -a "$REDIS_PASSWORD" BGSAVE
                  sleep 5
                  cp /data/dump.rdb /backups/redis_$(date +%Y%m%d).rdb
              env:
                - name: REDIS_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: satgate-redis
                      key: password
              volumeMounts:
                - name: redis-data
                  mountPath: /data
                - name: backups
                  mountPath: /backups
          restartPolicy: OnFailure
          volumes:
            - name: redis-data
              persistentVolumeClaim:
                claimName: redis-data  # must match the PVC backing the Redis pod
            - name: backups
              persistentVolumeClaim:
                claimName: satgate-backups
```

## Redis Restore

```bash
# 1. Stop gateway
kubectl scale deployment satgate-gateway -n satgate --replicas=0

# 2. Copy backup to Redis pod
#    Note: disable RDB saves first (redis-cli CONFIG SET save "") - otherwise a
#    graceful shutdown in step 3 may overwrite the copied file with an
#    in-memory snapshot
kubectl cp redis_backup_YYYYMMDD.rdb \
  satgate/$(kubectl get pod -l app=redis -n satgate -o jsonpath='{.items[0].metadata.name}'):/data/dump.rdb

# 3. Restart Redis
kubectl delete pod -l app=redis -n satgate

# 4. Wait for Redis to load data
kubectl exec -it $(kubectl get pod -l app=redis -n satgate -o jsonpath='{.items[0].metadata.name}') \
  -n satgate -- redis-cli PING

# 5. Restart gateway
kubectl scale deployment satgate-gateway -n satgate --replicas=3
```
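Step 4's wait can loop instead of issuing a one-shot `PING`. A small retry helper (a sketch, not part of the runbook's tooling) makes that explicit:

```bash
# Retry a command up to N times with a 1s pause; returns the command's final
# status. In the runbook you would wrap the PING, e.g.:
#   retry 30 kubectl exec <redis-pod> -n satgate -- redis-cli PING
retry() {
  local attempts=$1; shift
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

retry 3 true && echo "ready"
```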

## Secrets Backup

**Critical:** Losing secrets means complete lockout.

### Export to Encrypted File

```bash
# Export secrets (store securely!)
kubectl get secret satgate-auth -n satgate -o yaml > satgate-secrets.yaml

# Encrypt with GPG (recommended)
gpg --encrypt --recipient security@yourcompany.com satgate-secrets.yaml
rm satgate-secrets.yaml

# Store encrypted file in a secure location (Vault, encrypted S3, etc.)
```

### Restore Secrets

```bash
# Decrypt
gpg --decrypt satgate-secrets.yaml.gpg > satgate-secrets.yaml

# Apply (careful - this overwrites existing secrets)
kubectl apply -f satgate-secrets.yaml

# Clean up
rm satgate-secrets.yaml
```

## Gateway Config Backup

Configuration should live in version control (GitOps), but as a fallback:

```bash
# Export current config
kubectl get configmap satgate-gateway-config -n satgate -o yaml > satgate-config-backup.yaml

# Export Helm values
helm get values satgate-gateway -n satgate -o yaml > satgate-values-backup.yaml
```

## Disaster Recovery Procedure

### Complete Cluster Loss

1. **Provision new cluster**

   ```bash
   # Using Terraform
   cd deploy/terraform/aws  # or gcp
   terraform apply
   ```

2. **Restore secrets**

   ```bash
   gpg --decrypt satgate-secrets.yaml.gpg | kubectl apply -f -
   ```

3. **Install Helm chart**

   ```bash
   helm install satgate-gateway oci://ghcr.io/satgate-io/charts/satgate-gateway \
     --namespace satgate \
     --values satgate-values-backup.yaml
   ```

4. **Restore PostgreSQL**

   ```bash
   gunzip -c satgate_backup_latest.sql.gz | psql "$DATABASE_URL"
   ```

5. **Restore Redis (optional)**

   - Redis data can be rebuilt from PostgreSQL
   - Only needed if you want to preserve in-flight sessions

6. **Verify**

   ```bash
   curl https://api.example.com/healthz
   curl https://api.example.com/api/v1/system/preflight \
     -H "X-Admin-Token: $ADMIN_TOKEN"
   ```

7. **Update DNS**

   - Point the domain to the new load balancer
   - Wait for propagation (TTL)
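The verify step can be scripted so that a bad response fails loudly. A sketch that parses a canned response body (swap in the real `curl` call, shown commented):

```bash
# Sketch: check the /healthz body for the expected status.
RESPONSE='{"status":"ok"}'
# In practice: RESPONSE=$(curl -fsS https://api.example.com/healthz)

if printf '%s' "$RESPONSE" | grep -q '"status":"ok"'; then
  echo "health check passed"
else
  echo "health check FAILED: $RESPONSE" >&2
  exit 1
fi
```

A non-zero exit makes the check usable directly in CI or a recovery script.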

## Verification Checklist

After any restore:

- [ ] Health endpoint returns `{"status":"ok"}`
- [ ] Preflight checks all green
- [ ] Can mint new tokens
- [ ] Existing tokens still work (if restored)
- [ ] Ban list is intact
- [ ] Audit log is complete
- [ ] Tenants are configured
- [ ] No data gaps in the audit timeline

## Backup Retention Policy

| Data | Hot (Immediate) | Warm (Archive) | Cold (Long-term) |
| --- | --- | --- | --- |
| PostgreSQL | 7 days | 90 days | 7 years (compliance) |
| Redis | 3 days | 30 days | N/A |
| Secrets | Current only | N/A | Rotate, don't archive |
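The hot tier in the table is what the Postgres CronJob's `find -mtime +7` line implements. The same pattern, demonstrated self-contained against a temp directory with one fresh and one 10-day-old file:

```bash
# Sketch: prune hot-tier dumps older than 7 days.
DIR=$(mktemp -d)
touch "$DIR/satgate_new.sql.gz"
touch -d "10 days ago" "$DIR/satgate_old.sql.gz"   # simulate a stale dump

# Same pattern the pg-backup CronJob uses
find "$DIR" -name "satgate_*.sql.gz" -mtime +7 -delete

ls "$DIR"   # only satgate_new.sql.gz remains
rm -rf "$DIR"
```

Warm and cold tiers are better handled by object-store lifecycle rules (e.g. S3 lifecycle transitions) than by shell cleanup.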

## Monitoring Backup Health

Add Prometheus alerts:

```yaml
groups:
  - name: backup-alerts
    rules:
      - alert: BackupFailed
        expr: kube_job_status_failed{job_name=~"satgate-.*-backup.*"} > 0
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "SatGate backup job failed"

      - alert: BackupMissing
        expr: time() - max(kube_job_status_completion_time{job_name=~"satgate-pg-backup.*"}) > 86400 * 2
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "No successful backup in 2 days"
```