K8SPG-619: restart backup jobs on failure #969

Status: Open (11 commits to be merged into base: main)


pooknull (Contributor) commented Dec 3, 2024


https://perconadev.atlassian.net/browse/K8SPG-619

DESCRIPTION

Problem:
Currently, when the backup pod fails on its first attempt, a new pod is created instead of the existing pod being retried. This behavior may not be reliable in all Kubernetes environments because of potential delays in establishing communication with the Kubernetes API.

Cause:
The backup job’s restartPolicy is set to Never, preventing the existing pod from retrying after a failure.

Solution:
Add new .spec.backups.pgbackrest.jobs.restartPolicy and .spec.backups.pgbackrest.jobs.backoffLimit fields to the cr.yaml file so that users can change these settings to suit their needs.
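
For illustration, the new fields might be set in a custom resource roughly as sketched below. The two field paths come from this description. The `apiVersion`, `kind`, cluster name, and the `backoffLimit` value are assumptions chosen for the example, not values taken from this PR.

```yaml
# Sketch only: apiVersion, kind, metadata.name, and the backoffLimit value
# are assumptions for illustration.
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: cluster1
spec:
  backups:
    pgbackrest:
      jobs:
        # Restart the failed container in the existing pod instead of
        # creating a new pod. Kubernetes Jobs accept only Never or OnFailure.
        restartPolicy: OnFailure
        # Number of retries before the Job is marked as failed
        # (illustrative value).
        backoffLimit: 2
```

Leaving both fields unset would presumably keep the current behavior described under "Cause" (restartPolicy: Never).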

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

@pooknull marked this pull request as ready for review on December 3, 2024 12:26
@@ -329,6 +329,8 @@ spec:
# - secret:
# name: cluster1-pgbackrest-secrets
# jobs:
# restartPolicy: OnFailure
Review comment from a Collaborator:

Please do not change the defaults. We need to add the possibility of configuring it, but we need to have the old behavior by default.


@pooknull requested a review from hors on December 23, 2024 05:52
JNKPercona (Collaborator) commented:

Test name               Status
custom-extensions       passed
custom-tls              passed
demand-backup           passed
finalizers              passed
init-deploy             passed
monitoring              passed
one-pod                 passed
operator-self-healing   passed
pitr                    passed
scaling                 passed
scheduled-backup        passed
self-healing            passed
sidecars                passed
start-from-backup       passed
tablespaces             passed
telemetry-transfer      passed
upgrade-consistency     passed
upgrade-minor           passed
users                   passed
We run 19 out of 19

commit: d8b4d5d
image: perconalab/percona-postgresql-operator:PR-969-d8b4d5de5
