Bug 1920911 - Migration fails when Indirect Image Migration and Direct Volume Migration are configured at the same time
Summary: Migration fails when Indirect Image Migration and Direct Volume Migration are configured at the same time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.4.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: 1.4.0
Assignee: Dylan Murray
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-27 09:16 UTC by Sergio
Modified: 2021-02-11 12:55 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-11 12:55:27 UTC
Target Upstream Version:
Embargoed:




Links
Github konveyor/mig-controller pull 936 (closed): Bug 1920911: Do not include DVM volumes in stage backups (last updated 2021-02-02 02:08:49 UTC)
Github konveyor/mig-controller pull 937 (open): Bug 1920911: Do not include DVM volumes in stage backups (#936) (last updated 2021-01-29 17:31:51 UTC)
Red Hat Product Errata RHBA-2020:5329 (last updated 2021-02-11 12:55:45 UTC)

Description Sergio 2021-01-27 09:16:00 UTC
Description of problem:
When a migration involving internal images and PVCs is executed using Indirect Image Migration and Direct Volume Migration, the migration fails in the StageBackup phase.

Version-Release number of selected component (if applicable):
MTC 1.4.0
SOURCE CLUSTER: AWS OCP3.11 (storage class gp2)
TARGET CLUSTER: AWS OCP4.5 (storage class gp2)
REPLICATION REPOSITORY: AWS S3

How reproducible:
Always

Steps to Reproduce:
1. In the source cluster, deploy an application that uses internal images and PVCs:

oc new-project bztest
oc -n bztest new-app --template django-psql-persistent
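
Optionally, confirm that the application is using an internal image stream and has a bound PVC before migrating (standard oc commands; the exact resource names in the output will vary):

oc -n bztest get imagestreams
oc -n bztest get pvc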

2. Create a migration plan for this namespace.

Select:
- Indirect Image Migration
- Direct Volume Migration
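
These two UI selections roughly correspond to two booleans on the MigPlan custom resource. A minimal sketch, assuming the MigPlan API from konveyor/mig-controller; the plan, cluster, and storage names here are hypothetical and would normally be created by the UI:

cat <<'EOF' | oc apply -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  name: bztest-plan                 # hypothetical plan name
  namespace: openshift-migration
spec:
  namespaces:
    - bztest
  indirectImageMigration: true      # Indirect Image Migration selected
  indirectVolumeMigration: false    # Direct Volume Migration selected
  srcMigClusterRef:
    name: source-cluster            # hypothetical MigCluster name
    namespace: openshift-migration
  destMigClusterRef:
    name: host                      # hypothetical MigCluster name
    namespace: openshift-migration
  migStorageRef:
    name: aws-s3                    # hypothetical MigStorage name
    namespace: openshift-migration
EOF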

3. Run the migration for this plan. Do not quiesce the pods.

Actual results:
A failure occurs in the StageBackup phase, and an error is reported in the backup.

The backup logs show this error:

time="2021-01-26T14:07:18Z" level=error msg="Error backing up item" backup=openshift-migration/8bbf2e00-5fdf-11eb-938b-f5eff88f2b85-5qp9d error="error getting volume info: rpc error: code = Unknown desc = InvalidVolume.NotFound: The volume 'vol-0d3f70c4c7ee100c9' does not exist.\n\tstatus code: 400, request id: 6a072926-0c82-4718-ad4a-dfcfd4bc56d2" logSource="pkg/backup/backup.go:455" name=postgresql


Expected results:
There should be no errors.

Additional info:
If we use MCG (NooBaa) instead of AWS S3 as the replication repository, we get this error instead:

time="2021-01-26T14:00:45Z" level=error msg="Error backing up item" backup=openshift-migration/a39dfa70-5fde-11eb-938b-f5eff88f2b85-q7gz5 error="error getting volume info: rpc error: code = Unknown desc = AuthFailure: AWS was not able to validate the provided access credentials\n\tstatus code: 401, request id: 1b434c64-0846-4003-956b-3a8c832b0cff" logSource="pkg/backup/backup.go:455" name=postgresql

Comment 1 Erik Nelson 2021-01-27 14:27:13 UTC
This does not seem to happen in 4.x -> 4.x migrations, though. @xjiang cannot reproduce it, at least not in 4.4 -> 4.7.

Comment 2 Dylan Murray 2021-01-27 18:29:53 UTC
This issue occurs when a user has any volumes backed by a cloud provider that has a registered Velero snapshot plugin, regardless of whether `snapshot` was actually selected as the copy method.

I am updating the code so that the stage backup only includes PVCs that actually requested a snapshot.
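
For context, the copy method chosen for each PV is recorded on the MigPlan, so it is possible to see which PVCs actually requested a snapshot. A sketch, assuming the spec.persistentVolumes[].selection.copyMethod field from the mig-controller MigPlan API; the plan name is hypothetical:

oc -n openshift-migration get migplan bztest-plan \
  -o jsonpath='{range .spec.persistentVolumes[*]}{.name}{"\t"}{.selection.copyMethod}{"\n"}{end}'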

Comment 6 Sergio 2021-02-02 13:02:47 UTC
Verified using MTC 1.4.0, migrating OCP 3.11 -> 4.3 on AWS, with AWS S3 as the replication repository.

openshift-migration-rhel7-operator@sha256:622e42cef37e3e445d04c0c7f28455b322ed5ddb11b0063c2af9950de09121ab
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 5590dc251338f1d909bb6c76722d251c5de114c272d6425455549623a5472c4d
    - name: VELERO_TAG
      value: 8f2737eb2a9245b945f08007459c3fb7cd304901cadaaff3a673d88e5980c6b5
    - name: VELERO_PLUGIN_REPO
      value: openshift-velero-plugin-rhel8@sha256
    - name: VELERO_PLUGIN_TAG
      value: 2398f40ec877039f3216702c31ea2881f5618f0580df0adcdee2b79e0d99ee57
    - name: VELERO_AWS_PLUGIN_REPO
      value: openshift-migration-velero-plugin-for-aws-rhel8@sha256
    - name: VELERO_AWS_PLUGIN_TAG
      value: df442c91afdda47807f61a5749c4e83b7bdafba107b831c86f28c21ae74f281f
    - name: VELERO_GCP_PLUGIN_REPO
      value: openshift-migration-velero-plugin-for-gcp-rhel8@sha256
    - name: VELERO_GCP_PLUGIN_TAG
      value: 2ec9701726854f62c7acea2059492f1343ee8579aa5721e751593ea953b91dc5
    - name: VELERO_AZURE_PLUGIN_REPO
      value: openshift-migration-velero-plugin-for-microsoft-azure-rhel8@sha256
    - name: VELERO_AZURE_PLUGIN_TAG
      value: b8db59eb4b9a2d4748142e6e435dcfbf3187032b64302b88affbff98cb728e3c
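
For reference, image digests like the ones above can be read back from the operator deployment's environment. A hedged one-liner that avoids assuming the deployment name by grepping all deployments in the namespace:

oc -n openshift-migration get deployments -o yaml | grep -E 'MIG_CONTROLLER|VELERO_' -A 1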


Moved to VERIFIED.

Comment 8 errata-xmlrpc 2021-02-11 12:55:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) tool image release advisory 1.4.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5329

