K8SPG-619: restart backup jobs on failure #969

pooknull · 2024-12-03T00:03:24Z

https://perconadev.atlassian.net/browse/K8SPG-619

DESCRIPTION

Problem:
When the backup pod fails on its first attempt, a new pod is created instead of the existing one being retried. This behavior may be unreliable in some Kubernetes environments, where establishing communication with the Kubernetes API can be delayed.

Cause:
The backup job’s restartPolicy is set to Never, preventing the existing pod from retrying after a failure.

Solution:
Add new .spec.backups.pgbackrest.jobs.restartPolicy and .spec.backups.pgbackrest.jobs.backoffLimit fields to the cr.yaml file so that users can adjust them to suit their needs.
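
For illustration, the two fields might be set like this in a custom resource. This is a minimal sketch rather than the final cr.yaml: the field paths are the ones named above, OnFailure is one of the standard Kubernetes restart policies, and the backoffLimit value of 2 is an arbitrary example, not a recommended or default setting.

```yaml
# Sketch only: fragment of a cluster custom resource (cr.yaml).
spec:
  backups:
    pgbackrest:
      jobs:
        # Restart the failed container in the same pod instead of creating a new pod.
        restartPolicy: OnFailure
        # Hypothetical value: number of retries before the job is marked as failed.
        backoffLimit: 2
```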

CHECKLIST

Jira

Is the Jira ticket created and referenced properly?
Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

Is an E2E test/test case added for the new feature/change?
Are unit tests added where appropriate?

Config/Logging/Testability

Are all needed new/changed options added to default YAML files?
Are all needed new/changed options added to the Helm Chart?
Did we add proper logging messages for operator actions?
Did we ensure compatibility with the previous version or cluster upgrade process?
Does the change support the oldest and newest supported PG versions?
Does the change support the oldest and newest supported Kubernetes versions?

hors · 2024-12-20T10:20:10Z

deploy/cr.yaml

@@ -329,6 +329,8 @@ spec:
 #        - secret:
 #            name: cluster1-pgbackrest-secrets
 #      jobs:
+#        restartPolicy: OnFailure


Please do not change the defaults. We need to make this configurable, but the old behavior must remain the default.

JNKPercona · 2024-12-23T06:53:11Z

Test name	Status
custom-extensions	passed
custom-tls	passed
demand-backup	passed
finalizers	passed
init-deploy	passed
monitoring	passed
one-pod	passed
operator-self-healing	passed
pitr	passed
scaling	passed
scheduled-backup	passed
self-healing	passed
sidecars	passed
start-from-backup	passed
tablespaces	passed
telemetry-transfer	passed
upgrade-consistency	passed
upgrade-minor	passed
users	passed
We run 19 out of 19

commit: d8b4d5d
image: perconalab/percona-postgresql-operator:PR-969-d8b4d5de5

pooknull added 2 commits December 3, 2024 02:02

K8SPG-619: restart backup jobs on failure

09a5bff

https://perconadev.atlassian.net/browse/K8SPG-619

fix unit-test

adb83ce

pooknull marked this pull request as ready for review December 3, 2024 12:26

pooknull requested review from hors, egegunes and inelpandzic as code owners December 3, 2024 12:26

pooknull and others added 5 commits December 3, 2024 14:26

Merge branch 'main' into K8SPG-619

d109270

Merge branch 'main' into K8SPG-619

a50d1b0

Add backoffLimit and restartPolicy to .spec.backups.pgbackrest.jobs

f01c9bb

fix unit-tests

5080f96

fix upgrade-minor test

aaa2c86

pooknull requested review from tplavcic, nmarukovich, ptankov, jvpasinatto and eleo007 as code owners December 16, 2024 19:14

pooknull added 2 commits December 18, 2024 13:11

Merge remote-tracking branch 'origin/main' into K8SPG-619

57b343e

fix unit-tests

450fbd4

hors requested changes Dec 20, 2024

pooknull added 2 commits December 23, 2024 07:50

keep defaults

39c7804

Merge branch 'main' into K8SPG-619

d8b4d5d

pooknull requested a review from hors December 23, 2024 05:52

inelpandzic approved these changes Dec 23, 2024
