Description
Lagoon has created backup schedules for two different environments of the same project. Here is the diff showing an identical schedule in two different namespaces:
 apiVersion: backup.appuio.ch/v1alpha1
 kind: Schedule
 metadata:
   name: k8up-lagoon-backup-schedule
-  namespace: foo-staging
+  namespace: foo-pr-1037
 spec:
   backend:
     repoPasswordSecretRef:
       key: repo-pw
       name: baas-repo-pw
     s3:
       bucket: baas-cluster-id0/baas-foo
   backup:
     resources: {}
     schedule: 23 1 * * *
   check:
     resources: {}
     schedule: 23 7 * * 1
   prune:
     resources: {}
     retention:
       keepDaily: 7
       keepMonthly: 1
       keepWeekly: 6
     schedule: 23 4 * * 0

Both Schedules point at the same bucket, and therefore at the same restic repository, so they start two checks at the same time (one in each namespace). This sometimes works when the repository is small and the check finishes quickly, but it often fails because restic takes an exclusive lock on the repository during a check. If the check that wins the race for the lock takes longer than the retry window of the other check jobs (which seems to be 5 attempts over roughly 2 minutes), the other checks always fail.
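The timing argument can be sketched as a small model. This is a minimal Python sketch under hypothetical assumptions: the losing check retries 5 times spaced evenly over 120 seconds and succeeds on the first attempt made after the winner releases its exclusive lock. The real k8up/restic retry behaviour may differ, and `retry_times` and `losing_check_succeeds` are illustrative names only.

```python
# Hypothetical model of two check jobs started at the same time against one
# restic repository. The winner holds an exclusive lock for winner_duration_s;
# the loser retries a fixed number of times and fails if the lock is never free.


def retry_times(attempts: int = 5, window_s: float = 120.0) -> list[float]:
    """Attempt times in seconds, ASSUMED to be evenly spaced across the window."""
    if attempts == 1:
        return [0.0]
    step = window_s / (attempts - 1)
    return [i * step for i in range(attempts)]


def losing_check_succeeds(winner_duration_s: float,
                          attempts: int = 5,
                          window_s: float = 120.0) -> bool:
    """True if any retry happens at or after the moment the winner unlocks."""
    return any(t >= winner_duration_s for t in retry_times(attempts, window_s))


for duration in (30, 110, 600):  # seconds the winning check holds the lock
    outcome = "succeeds" if losing_check_succeeds(duration) else "fails"
    print(f"winner holds lock {duration:>4}s -> second check {outcome}")
```

Under these assumptions, a winner holding the lock for 30 s or 110 s still lets the second check through, while one holding it for 600 s makes the second check fail on every attempt.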
The same problem exists with prune schedules because that command also takes an exclusive lock.
Backups are not affected, because that command only takes a non-exclusive (append) lock.
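The lock rules above can be turned into a quick collision check. This is a minimal Python sketch, not k8up code: `Job`, `conflicts` and `EXCLUSIVE` are hypothetical names, the bucket name stands in for the repository, and two jobs only count as simultaneous when their cron strings are identical (a fuller version would expand the cron expressions and compare run windows). Fed the two Schedules from the diff, it flags the `check` pair and the `prune` pair but not the `backup` pair.

```python
from dataclasses import dataclass
from itertools import combinations

# Lock requirements as described in this issue: restic check and prune take an
# exclusive lock; backup only needs a non-exclusive one.
EXCLUSIVE = {"check", "prune"}


@dataclass(frozen=True)
class Job:
    namespace: str
    bucket: str     # stands in for the restic repository (spec.backend.s3.bucket)
    operation: str  # "backup", "check" or "prune"
    schedule: str   # cron expression from the Schedule spec


def conflicts(a: Job, b: Job) -> bool:
    """Same repository, same cron string, and at least one exclusive lock.

    Simplification: only identical cron strings count as simultaneous.
    """
    return (
        a.bucket == b.bucket
        and a.schedule == b.schedule
        and (a.operation in EXCLUSIVE or b.operation in EXCLUSIVE)
    )


# Cron expressions copied from the Schedule shown in the diff.
SPEC = {"backup": "23 1 * * *", "check": "23 7 * * 1", "prune": "23 4 * * 0"}
BUCKET = "baas-cluster-id0/baas-foo"


def jobs_for(namespace: str) -> list[Job]:
    return [Job(namespace, BUCKET, op, cron) for op, cron in SPEC.items()]


jobs = jobs_for("foo-staging") + jobs_for("foo-pr-1037")
clashes = [(a, b) for a, b in combinations(jobs, 2) if conflicts(a, b)]
for a, b in clashes:
    print(f"{a.operation} in {a.namespace} clashes with {b.operation} in {b.namespace}")
```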
Ideas for solving the issue:
- Maybe Lagoon should only add a `check` schedule to a single environment (the first production environment in the project?), and then a `prune` schedule to the first environment of each type, `development` and `production`, since they can have differing policies.
- Somehow ensure that `check` and `prune` for each environment run at different times. That seems difficult, and pointless, since each command only needs to run once per repository?
- Lagoon maintains a special namespace per project or per cluster, which holds a `Schedule` for each repository containing the production and development `check` and `prune` schedules?
- Something else?
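The first idea could look roughly like this. It is a minimal Python sketch with several assumptions, none of which come from Lagoon itself: `Environment` and `assign_maintenance` are hypothetical helpers, not Lagoon APIs; the input list is assumed to be ordered by creation time, so "first" means oldest; if a repository has no production environment, the check falls back to the first environment; `foo-main` is a made-up production namespace, and `foo-staging` is assumed to be a development environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    namespace: str
    env_type: str  # Lagoon environment type: "production" or "development"
    bucket: str    # restic repository shared by the project's environments


def assign_maintenance(envs: list[Environment]) -> dict[str, set[str]]:
    """Map each namespace to the maintenance operations its Schedule gets.

    Assumptions (hypothetical, not Lagoon behaviour):
    - envs is ordered by creation time, so "first" means oldest;
    - check: first production env per repository, else the first env;
    - prune: first env of each environment type per repository.
    """
    plan: dict[str, set[str]] = {e.namespace: set() for e in envs}
    by_bucket: dict[str, list[Environment]] = {}
    for e in envs:
        by_bucket.setdefault(e.bucket, []).append(e)

    for repo_envs in by_bucket.values():
        production = [e for e in repo_envs if e.env_type == "production"]
        plan[(production or repo_envs)[0].namespace].add("check")

        seen_types: set[str] = set()
        for e in repo_envs:
            if e.env_type not in seen_types:
                seen_types.add(e.env_type)
                plan[e.namespace].add("prune")
    return plan


# Hypothetical project: "foo-main" is made up; foo-staging is ASSUMED to be a
# development environment.
BUCKET = "baas-cluster-id0/baas-foo"
envs = [
    Environment("foo-main", "production", BUCKET),
    Environment("foo-staging", "development", BUCKET),
    Environment("foo-pr-1037", "development", BUCKET),
]
plan = assign_maintenance(envs)
for namespace, ops in plan.items():
    print(namespace, sorted(ops))
```

With this input, `foo-main` gets `check` and `prune`, `foo-staging` gets `prune`, and `foo-pr-1037` gets neither. The two `prune` owners still share one repository, so their prune schedules would need different start times to avoid the same exclusive-lock contention.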