Bug 1844638

Summary: Automatic rollback of migrated workloads is not configurable
Product: OpenShift Container Platform
Reporter: Derek Whatley <dwhatley>
Component: Migration Tooling
Assignee: Derek Whatley <dwhatley>
Status: CLOSED ERRATA
QA Contact: Xin jiang <xjiang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.4
CC: chezhang, jmatthew, mberube, rjohnson, sregidor, whu
Target Milestone: ---   
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1845092
Environment:
Last Closed: 2020-06-17 00:04:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1845092    
Bug Blocks:    

Description Derek Whatley 2020-06-05 20:58:09 UTC
Description of problem:
When a migration fails, the user running a migration has no option to leave partially migrated workloads in place on the destination so that they can finish the migration manually.

Version-Release number of selected component (if applicable):
1.2.0

How reproducible:
Always

Steps to Reproduce:
1. Run a migration
2. Encounter an error

Actual results:
The migration enters a failed state and then runs the "FailedItinerary", which deletes the migrated resources from the target cluster and scales the workloads back up on the source cluster.


Expected results:
The user is given a configuration option to enable or disable automatic rollback.


Additional info:

Comment 1 Derek Whatley 2020-06-05 20:58:51 UTC
PRs adding a config switch for this are up; working on getting them tested.

mig-controller: https://github.com/konveyor/mig-controller/pull/560
mig-operator: https://github.com/konveyor/mig-operator/pull/370

Comment 2 Derek Whatley 2020-06-08 20:28:41 UTC
PRs merged and cherry-picked to release-1.2.2 branches, waiting for next build.


https://github.com/konveyor/mig-controller/pull/560
https://github.com/konveyor/mig-operator/pull/370

Comment 6 Sergio 2020-06-10 15:25:06 UTC
Verified using CAM 1.2.2 stage

To verify this, we configured a very short restic timeout and removed the restic pods to force a failure in the 'StageRestoreCreated' stage, then ran a quiesced migration of a Django application.

By default (with the mig_failure_rollback attribute not configured in the MigrationController resource), the migration failed and the pods in the source application were left quiesced.

When we configured "mig_failure_rollback: true" and migrated the application again, the PVC was deleted in the target cluster, and the pods in the source cluster were scaled up again and were working fine.
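
For reference, the setting might look roughly like this in the MigrationController resource. This is a minimal sketch, not the exact resource used in the test: only the MigrationController kind and the mig_failure_rollback attribute come from this report. The apiVersion, resource name, namespace, and placement under spec are assumptions based on a typical CAM installation and should be checked against the installed operator.

    # Sketch only: apiVersion, metadata, and placement under spec are assumptions.
    apiVersion: migration.openshift.io/v1alpha1
    kind: MigrationController
    metadata:
      name: migration-controller
      namespace: openshift-migration
    spec:
      # Attribute named in this report. With "true", a failed migration is
      # rolled back: target resources are deleted and source pods scaled up.
      # When the attribute is not set, the failed migration is left in place.
      mig_failure_rollback: true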


    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 3923f6000eaff8c5f02d778e1d7b93515a8bc23990d54f917c30a108f7a37b3a
    - name: MIG_UI_REPO
      value: openshift-migration-ui-rhel8@sha256
    - name: MIG_UI_TAG
      value: 6abfaea8ac04e3b5bbf9648a3479b420b4baec35201033471020c9cae1fe1e11
    - name: MIGRATION_REGISTRY_REPO
      value: openshift-migration-registry-rhel8@sha256
    - name: MIGRATION_REGISTRY_TAG
      value: ea6301a15277d448c8756881c7e2e712893ca8041c913476640f52da9e76cad9
    - name: VELERO_REPO
      value: openshift-migration-velero-rhel8@sha256
    - name: VELERO_TAG
      value: 1a33e327dd610f0eebaaeae5b3c9b4170ab5db572b01a170be35b9ce946c0281
    - name: VELERO_PLUGIN_REPO
      value: openshift-migration-plugin-rhel8@sha256
    - name: VELERO_PLUGIN_TAG
      value: 7eba00127497c4ca6452f9be0c167c2276bed462b648edf51d8bbe7265392879

Comment 8 errata-xmlrpc 2020-06-17 00:04:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2571