
restic snapshots command tries to get S3 bucket credentials from the wrong path #477

Closed
ZoroXV opened this issue Jul 12, 2023 · 7 comments

@ZoroXV

ZoroXV commented Jul 12, 2023

What steps did you take and what happened:

  • I have an on-premises Kubernetes cluster with Velero running daily backups on it.

  • We installed Velero using the Helm chart and were previously running chart version v3.1.4.

  • Recently I found out that the backups were partially failing, so I first upgraded the chart to version 4.1.3.

  • Unfortunately, this did not solve the issue.

  • To explain our setup a bit more: we want to store backups in two S3 buckets, one on a MinIO server in our office and the other at OVH.

  • For that, we have Velero installed in two namespaces in the cluster, "velero-syno" and "velero-ovh".

  • I don't know why, but for backups in "velero-syno", restic tries to find the bucket credentials in the wrong path.

Here is the error log from a backup in the "velero-syno" namespace:

{
  "backup": "velero-syno/test-12",
  "error.file": "/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:250",
  "error.function": "github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes",
  "error.message": "pod volume backup failed: error creating uploader: failed to connect repository: error running command=restic snapshots --repo=s3:https://minio.example.com:10000/velero-bucket/restic/test-ceph-rbd --password-file=/tmp/credentials/velero-ovh/velero-repo-credentials-repository-password --cache-dir=/scratch/.cache/restic --latest=1 --insecure-tls=true, stdout=, stderr=Fatal: unable to open config file: Stat: The Access Key Id you provided does not exist in our records.\nIs there a repository at the following location?\ns3:https://minio.example.com:10000/velero-bucket/restic/test-ceph-rbd\n: exit status 1",
  "level": "error",
  "logSource": "pkg/backup/backup.go:435",
  "msg": "Error backing up item",
  "name": "test-585dbb4d95-zh4gb",
  "time": "2023-07-12T08:27:18Z"
}

The error is identical in the velero-ovh namespace, except that there restic tries to find the credentials under velero-syno.

Otherwise, Velero accesses my S3 buckets normally when backing up Kubernetes objects; only pod volumes are not backed up.

What did you expect to happen:
Restic to use the correct path to retrieve the credentials for each S3 bucket.

Environment:

  • helm version (use helm version): v3.2.4
  • helm chart version and app version (use helm list -n <YOUR NAMESPACE>): v4.1.3
  • Kubernetes version (use kubectl version): v1.24.13
  • Kubernetes installer & version: RKE1
  • Cloud provider or hardware configuration: VM
  • OS (e.g. from /etc/os-release): Ubuntu 20.04
@jenting jenting added the velero label Jul 12, 2023
@ZoroXV
Author

ZoroXV commented Jul 17, 2023

Actually, it seems that having Velero set up in multiple namespaces does not work. I think the Velero pod cannot tell the node-agent pods of each namespace apart.

In my case, when I create a backup with restic volume backup in namespace A, Velero tries to use the node-agent pods of namespace B.

The solution is simply to merge all Velero backups into a single namespace. The reason for using 2 namespaces was that, previously, Velero could not handle multiple BackupStorageLocations if you had multiple providers.

@ZoroXV ZoroXV closed this as completed Jul 17, 2023
@sseago
Contributor

sseago commented Jul 17, 2023

@ZoroXV Hmm. If that's happening to you, that's a bug. Velero should be installable in multiple namespaces, and each velero instance (including node-agent pods) should ignore velero CRs in the other namespace.

@reasonerjt @blackpiglet maybe it's good to test this? We may have a regression here with recent kopia/node-agent refactoring.

@blackpiglet
Collaborator

@sseago
For this case, this seems expected, so I don't understand what kind of test should be considered.
Do you suggest adding some tests for the Velero helm chart?

@sseago
Contributor

sseago commented Jul 18, 2023

@blackpiglet what seems expected? If velero is installed in 2 namespaces, pod volume backups should work just fine in each namespace. It looks like the pod volume backup controller is grabbing PVBs in the wrong namespaces sometimes. This is a regression -- this all worked fine in Velero 1.9, but I suspect we introduced a bug in this area with the kopia refactor. I don't think this is a helm chart issue but a controller issue. I'll create a velero issue referencing this.
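
To make the symptom concrete, here is a minimal sketch (not Velero's actual controller code) of the difference between listing PodVolumeBackups cluster-wide and scoping the list to the install's own namespace with controller-runtime's client; the Velero API import path and helper name are assumptions for illustration only.

package podvolume

import (
	"context"

	// Assumed import path for the PodVolumeBackup API types; illustrative only.
	velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listPVBs is a hypothetical helper, not Velero's real code.
func listPVBs(ctx context.Context, c client.Client, veleroNamespace string) (*velerov1api.PodVolumeBackupList, error) {
	pvbs := &velerov1api.PodVolumeBackupList{}

	// Listing without client.InNamespace returns PVBs from every namespace,
	// which is how an install in "velero-syno" could end up processing a
	// PodVolumeBackup that belongs to the "velero-ovh" install.
	// Scoping the list keeps the two installs isolated from each other.
	if err := c.List(ctx, pvbs, client.InNamespace(veleroNamespace)); err != nil {
		return nil, err
	}
	return pvbs, nil
}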

@sseago
Contributor

sseago commented Jul 18, 2023

By "test this" I meant "lets make sure that kopia/restic backup/restore still works fine in both velero installs if velero is installed in multiple namespaces."

@blackpiglet
Collaborator

blackpiglet commented Jul 18, 2023

@sseago
OK.

If velero is installed in 2 namespaces, pod volume backups should work just fine in each namespace. It looks like the pod volume backup controller is grabbing PVBs in the wrong namespaces sometimes.

@Lyndon-Li
We should consider adding some verification for this scenario.

@Lyndon-Li

Lyndon-Li commented Jul 19, 2023

@sseago @blackpiglet
The problem is indeed reproducible, though I couldn't trace back when it was introduced.
I guess the problem has been there for PVB/PVR since the integration with kubebuilder. As the code here shows, the server's namespace is not set on the controller manager's cache.

Anyway, I will fix the problem by adding the namespace parameter.
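
For readers following along, a minimal sketch of what restricting the controller manager's cache to the server's namespace could look like. It assumes an older controller-runtime where manager.Options still exposes a Namespace field (later versions moved this into cache options); it is not the actual Velero patch, and the env var wiring is hypothetical.

package main

import (
	"log"
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// The namespace this Velero/node-agent instance is deployed in,
	// e.g. "velero-syno". Reading it from VELERO_NAMESPACE is an assumption here.
	ns := os.Getenv("VELERO_NAMESPACE")

	// Restrict the manager's cache, and therefore every controller built on
	// it, to that namespace. Without this the cache lists and watches CRs
	// cluster-wide, so a PodVolumeBackup created in "velero-ovh" is also
	// picked up by the instance running in "velero-syno".
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Namespace: ns,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Controllers registered with mgr now only see objects in ns.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		log.Fatal(err)
	}
}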
