Created attachment 1653095 [details]
logs

Description of problem:

When a NooBaa bucket is used as the replication repository, migrations fail because the backups cannot be restored.

Version-Release number of selected component (if applicable):

CAM 1.1 stage

Controller:
  image: registry.stage.redhat.io/rhcam-1-1/openshift-migration-controller-rhel8@sha256:b55c0c36333656a46e1d0500cf11cc8aa06e093d312e7c54f8e1075d4ab4c6c1

Velero:
  image: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-rhel8@sha256:612f56e7ea8ec7e1a591fa315c474eb979578f7c18743b060b87d0ac72aeb4a9
  image: registry.stage.redhat.io/rhcam-1-1/openshift-migration-plugin-rhel8@sha256:d87438517fb7c332650bb18c579d91b07e4260bd7aa2eb48f2d938c2d09e123a
  image: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-aws-rhel8@sha256:5235eeeee330165eef77ac8d823eed384c9108884f6be49c9ab47944051af91e
  image: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-gcp-rhel8@sha256:789b12ff351d3edde735b9f5eebe494a8ac5a94604b419dfd84e87d073b04e9e
  image: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-microsoft-azure-rhel8@sha256:b98f1c61ba347aaa0c8dac5c34b6be4b8cce20c8ff462f476a3347d767ad0a93

Operator:
  imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-rhel7-operator@sha256:df9a2c75bccbc1a78c5dea6125730cc7ddbb11dc7d58331a3a937f10e3baaff8

How reproducible:

Always

Steps to Reproduce:
1. Add a NooBaa bucket replication repository using an https endpoint.
2. Create a migration plan for a namespace with any PVC in it.
3. Execute the migration.

Actual results:

The migration fails with a "StageRestoreFailed" message. The Velero logs contain this error:

time="2020-01-17T12:17:28Z" level=error msg="Error restoring volume" controller=pod-volume-restore error="error running restic restore, cmd=restic restore --repo=s3:https://s3-noobaa.apps.cluster-xxxxx-ocp4-tgt.xxxxx-ocp4-tgt.qe.devcluster.openshift.com/noobaabucket/velero/restic/ng-cloud2ocso --password-file=/tmp/velero-restic-credentials-ng-cloud2ocso964688865 --insecure-skip-tls-verify --cache-dir=/scratch/.cache/restic c00940ac --target=. --skip-unchanged --delete, stdout=, stderr=: error getting snapshot size: error running command, stderr=Fatal: unable to open config file: Stat: Get https://s3-noobaa.apps.cluster-xxxxx-ocp4-tgt.xxxxx-ocp4-tgt.qe.devcluster.openshift.com/noobaabucket/?location=: x509: certificate signed by unknown authority\nIs there a repository at the following location?\ns3:https://s3-noobaa.apps.cluster-xxxxx-ocp4-tgt.xxxxx-ocp4-tgt.qe.devcluster.openshift.com/noobaabucket/velero/restic/ng-cloud2ocso\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:235" error.function=github.com/vmware-tanzu/velero/pkg/restic.getSnapshotSize logSource="pkg/controller/pod_volume_restore_controller.go:299" name=ff25a570-3922-11ea-a562-496b9c6262e5-jqxhs-cmnnw namespace=openshift-migration restore=openshift-migration/ff25a570-3922-11ea-a562-496b9c6262e5-jqxhs

time="2020-01-17T12:17:28Z" level=info msg="Restore starting" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:263" name=ff25a570-3922-11ea-a562-496b9c6262e5-jqxhs-4mj27 namespace=openshift-migration restore=openshift-migration/ff25a570-3922-11ea-a562-496b9c6262e5-jqxhs

Expected results:

The migration should complete without failures.

Additional info:

All logs are attached to this issue.
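For context on the error itself: restic is written in Go, and "x509: certificate signed by unknown authority" is Go's standard TLS client rejecting a server certificate that does not chain to a trusted CA, which is what happens against a NooBaa route serving a self-signed certificate. A minimal Go sketch of the same check (not Velero code; it uses a public test host with a self-signed certificate as a stand-in for the NooBaa S3 endpoint):

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	// Stand-in for the NooBaa S3 route; this public host deliberately
	// serves a self-signed certificate.
	url := "https://self-signed.badssl.com/"

	// Default client: verification fails with the same error class seen
	// in the Velero/restic log.
	if _, err := http.Get(url); err != nil {
		fmt.Println("default client:", err) // ... x509: certificate signed by unknown authority
	}

	// Client with verification disabled: the rough equivalent of restic's
	// --insecure-skip-tls-verify flag.
	insecure := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	if resp, err := insecure.Get(url); err == nil {
		resp.Body.Close()
		fmt.Println("insecure client: status", resp.Status)
	}
}

restic's --insecure-skip-tls-verify flag corresponds to the InsecureSkipVerify setting above; the safer long-term option is restic's --cacert flag, which trusts the route's CA instead of disabling verification.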
I was able to reproduce this on my cluster, and I believe I found the culprit. Migration completes successfully with the following patch: https://github.com/fusor/velero/pull/51
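Reading the log above, the restore invocation already passes --insecure-skip-tls-verify; the failure is reported from getSnapshotSize, i.e. from a second restic invocation ("restic stats") that is built separately and evidently without the flag. Below is a hedged sketch of the kind of change that addresses this; it is illustrative only, not the code from the PR, and the resticArgs helper, endpoint, and paths are made up:

package main

import (
	"fmt"
	"os/exec"
)

// resticArgs is a hypothetical shared builder: every restic call goes
// through it, so TLS options are appended uniformly and cannot be
// dropped from any single invocation.
func resticArgs(repo, passwordFile string, skipTLSVerify bool, sub ...string) []string {
	args := append([]string{}, sub...)
	args = append(args, "--repo="+repo, "--password-file="+passwordFile)
	if skipTLSVerify {
		args = append(args, "--insecure-skip-tls-verify")
	}
	return args
}

func main() {
	repo := "s3:https://s3-noobaa.example.com/noobaabucket/velero/restic/ns" // placeholder endpoint
	pw := "/tmp/velero-restic-credentials"                                   // placeholder credentials file

	restore := exec.Command("restic", resticArgs(repo, pw, true, "restore", "c00940ac", "--target=.")...)
	stats := exec.Command("restic", resticArgs(repo, pw, true, "stats", "c00940ac", "--json")...)

	// Printing the argument lists is enough to see that both invocations
	// carry the same TLS option; neither command is actually executed here.
	fmt.Println(restore.Args)
	fmt.Println(stats.Args)
}

The design point: TLS options belong in one shared argument builder, so no individual restic call (restore, stats, and so on) can silently omit them the way the snapshot-size call did here.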
Verified in CAM 1.1 stage.

Controller:
  imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-controller-rhel8@sha256:44e0b889db53f97abea549d9dcd4ad9b2491a4ff31e6d1afc251596d60c104b5

Velero:
  imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-plugin-rhel8@sha256:9c6eceba0c422b9f375c3ab785ff392093493ce33def7c761d7cedc51cde775d
  imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-aws-rhel8@sha256:5235eeeee330165eef77ac8d823eed384c9108884f6be49c9ab47944051af91e
  imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-gcp-rhel8@sha256:789b12ff351d3edde735b9f5eebe494a8ac5a94604b419dfd84e87d073b04e9e
  imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-microsoft-azure-rhel8@sha256:b98f1c61ba347aaa0c8dac5c34b6be4b8cce20c8ff462f476a3347d767ad0a93
  imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-rhel8@sha256:29ab439545c0dc765af23b287721a766879647a750443e39658e1894d38555fc
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0440