Description of problem:

When a migration involving internal images and PVCs is executed using Indirect Image Migration and Direct Volume Migration, the migration fails in the StageBackup phase.

Version-Release number of selected component (if applicable):
MTC 1.4.0

SOURCE CLUSTER: AWS OCP 3.11 (storage class gp2)
TARGET CLUSTER: AWS OCP 4.5 (storage class gp2)
REPLICATION REPOSITORY: AWS S3

How reproducible:
Always

Steps to Reproduce:
1. In the source cluster, deploy an application that uses internal images and PVCs:

   oc new-project bztest
   oc -n bztest new-app --template django-psql-persistent

2. Create a migration plan for this namespace and select (see the MigPlan sketch at the end of this report):
   - Indirect Image Migration
   - Direct Volume Migration

3. Run the migration plan. Do not quiesce the pods.

Actual results:
The StageBackup phase fails and an error is reported in the backup. The backup logs show this error:

time="2021-01-26T14:07:18Z" level=error msg="Error backing up item" backup=openshift-migration/8bbf2e00-5fdf-11eb-938b-f5eff88f2b85-5qp9d error="error getting volume info: rpc error: code = Unknown desc = InvalidVolume.NotFound: The volume 'vol-0d3f70c4c7ee100c9' does not exist.\n\tstatus code: 400, request id: 6a072926-0c82-4718-ad4a-dfcfd4bc56d2" logSource="pkg/backup/backup.go:455" name=postgresql

Expected results:
The migration should complete without errors.

Additional info:
If we use MCG (NooBaa) instead of AWS S3 as the replication repository, we get this error instead:

time="2021-01-26T14:00:45Z" level=error msg="Error backing up item" backup=openshift-migration/a39dfa70-5fde-11eb-938b-f5eff88f2b85-q7gz5 error="error getting volume info: rpc error: code = Unknown desc = AuthFailure: AWS was not able to validate the provided access credentials\n\tstatus code: 401, request id: 1b434c64-0846-4003-956b-3a8c832b0cff" logSource="pkg/backup/backup.go:455" name=postgresql
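For reference, the plan selections in step 2 correspond roughly to a MigPlan spec like the sketch below. This is only an illustrative sketch: the resource names (bztest-plan, ocp3-cluster, aws-s3-repo, pv-example) are made up, and the field names are taken from the MigPlan CRD as I understand it for MTC 1.4, so they may differ slightly.

apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  name: bztest-plan                 # hypothetical name
  namespace: openshift-migration
spec:
  srcMigClusterRef:
    name: ocp3-cluster              # hypothetical source cluster
    namespace: openshift-migration
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  migStorageRef:
    name: aws-s3-repo               # hypothetical replication repository
    namespace: openshift-migration
  namespaces:
    - bztest
  indirectImageMigration: true      # "Indirect Image Migration" selected
  indirectVolumeMigration: false    # "Direct Volume Migration" selected
  persistentVolumes:
    - name: pv-example              # hypothetical PV backing the postgresql PVC
      pvc:
        name: postgresql
        namespace: bztest
      selection:
        action: copy
        copyMethod: filesystem      # no snapshot requested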
In 4.x -> 4.x migrations it does not seem to happen, though. @xjiang cannot reproduce it, at least in 4.4 -> 4.7.
This issue occurs when a user has any volumes backed by a cloud provider that has a registered Velero snapshot plugin, regardless of whether `snapshot` was actually selected. I am updating the code so that the stage backup only includes PVCs that actually requested a snapshot (sketched below).
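To illustrate what "actually requested a snapshot" means in MigPlan terms, here is a hedged sketch of two PV selections (the PV and PVC names are hypothetical, and the field names follow the MigPlan CRD as I understand it). After this change, only an entry like the first should be handled as a snapshot in the stage backup; a filesystem copy like the second (the DVM case in this bug) should not.

persistentVolumes:
  - name: pv-snap-example           # hypothetical: snapshot copy requested
    pvc:
      name: data-snap
      namespace: bztest
    selection:
      action: copy
      copyMethod: snapshot          # included in the stage backup snapshot handling
  - name: pv-fs-example             # hypothetical: filesystem copy via DVM
    pvc:
      name: postgresql
      namespace: bztest
    selection:
      action: copy
      copyMethod: filesystem        # should no longer trigger cloud snapshot lookups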
Verified using MTC 1.4.0, 3.11 -> 4.3 AWS, AWS S3

openshift-migration-rhel7-operator@sha256:622e42cef37e3e445d04c0c7f28455b322ed5ddb11b0063c2af9950de09121ab

- name: MIG_CONTROLLER_REPO
  value: openshift-migration-controller-rhel8@sha256
- name: MIG_CONTROLLER_TAG
  value: 5590dc251338f1d909bb6c76722d251c5de114c272d6425455549623a5472c4d
- name: VELERO_TAG
  value: 8f2737eb2a9245b945f08007459c3fb7cd304901cadaaff3a673d88e5980c6b5
- name: VELERO_PLUGIN_REPO
  value: openshift-velero-plugin-rhel8@sha256
- name: VELERO_PLUGIN_TAG
  value: 2398f40ec877039f3216702c31ea2881f5618f0580df0adcdee2b79e0d99ee57
- name: VELERO_AWS_PLUGIN_REPO
  value: openshift-migration-velero-plugin-for-aws-rhel8@sha256
- name: VELERO_AWS_PLUGIN_TAG
  value: df442c91afdda47807f61a5749c4e83b7bdafba107b831c86f28c21ae74f281f
- name: VELERO_GCP_PLUGIN_REPO
  value: openshift-migration-velero-plugin-for-gcp-rhel8@sha256
- name: VELERO_GCP_PLUGIN_TAG
  value: 2ec9701726854f62c7acea2059492f1343ee8579aa5721e751593ea953b91dc5
- name: VELERO_AZURE_PLUGIN_REPO
  value: openshift-migration-velero-plugin-for-microsoft-azure-rhel8@sha256
- name: VELERO_AZURE_PLUGIN_TAG
  value: b8db59eb4b9a2d4748142e6e435dcfbf3187032b64302b88affbff98cb728e3c

Moved to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) tool image release advisory 1.4.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5329