Description of problem:

Raised by user: if a DeploymentConfig (DC) is paused on the source side, a migration will hang waiting for quiesce unless the DC is resumed or manually scaled to 0.

Version-Release number of selected component (if applicable): 1.4.2
https://github.com/konveyor/mig-controller/pull/1178 will allow quiescing the application to work and the migration to complete successfully, but there is a catch:
- We take a backup, then quiesce the application. While quiescing, we annotate the original number of replicas before setting it to zero, and with the PR above we also annotate the original pause state.
- When we restore the application, it is restored in its original state (say 1 replica and paused), but because it is paused it will not roll out.
- I don't currently have a good solution to this problem, other than a post-restore hook along the lines of the example I implemented: https://github.com/konveyor/mig-operator/pull/741
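For reference, the quiesce/unquiesce behavior described above can be sketched roughly as follows. This is a minimal simulation using plain structs, not the real mig-controller or client-go types, and the annotation keys are illustrative assumptions rather than the keys the PR actually uses:

```go
package main

import (
	"fmt"
	"strconv"
)

// DeploymentConfig is a minimal stand-in for the real OpenShift type.
type DeploymentConfig struct {
	Replicas    int
	Paused      bool
	Annotations map[string]string
}

// Illustrative annotation keys; the keys used by mig-controller may differ.
const (
	replicasAnnotation = "migration.openshift.io/preQuiesceReplicas"
	pausedAnnotation   = "migration.openshift.io/preQuiescePausedState"
)

// Quiesce records the original replica count and pause state in annotations,
// then scales the DC to zero (the behavior added by the PR above).
func Quiesce(dc *DeploymentConfig) {
	if dc.Annotations == nil {
		dc.Annotations = map[string]string{}
	}
	dc.Annotations[replicasAnnotation] = strconv.Itoa(dc.Replicas)
	dc.Annotations[pausedAnnotation] = strconv.FormatBool(dc.Paused)
	dc.Replicas = 0
}

// Unquiesce restores the recorded state. Note the catch described above:
// if the DC was originally paused, restoring Paused=true means the restored
// replicas will never roll out without further intervention.
func Unquiesce(dc *DeploymentConfig) {
	if v, ok := dc.Annotations[replicasAnnotation]; ok {
		if n, err := strconv.Atoi(v); err == nil {
			dc.Replicas = n
		}
	}
	if v, ok := dc.Annotations[pausedAnnotation]; ok {
		dc.Paused = v == "true"
	}
}

func main() {
	dc := &DeploymentConfig{Replicas: 1, Paused: true}
	Quiesce(dc)
	fmt.Println(dc.Replicas, dc.Annotations[pausedAnnotation]) // 0 true
	Unquiesce(dc)
	fmt.Println(dc.Replicas, dc.Paused) // 1 true (still paused, so it will not roll out)
}
```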
To further clarify: the annotations are primarily used during a rollback, and in that case the unquiesce works as expected. There is a less common scenario where we also have the annotations available: the original migration failed, the applications became quiesced, and a subsequent migration is performed. In that case the annotations should be present and, in theory, the unquiesce code will run on them on the destination. But in the (hopefully) normal case of a first-time success, nothing is there to handle rolling out the paused deployment(config).
We can probably handle this with backup and restore plugins for Velero in openshift-velero-plugin, possibly with just replicaset and replicationcontroller adjustments so that we restore them if the owning deployment/deploymentconfig is paused. We might also have to restore pods, but that will take some testing to determine. The basic idea: while backing up, we get the ownerRef and look at the Deployment or DeploymentConfig to see whether it is paused. If it is, we annotate the replicaset or replicationcontroller during backup. When restoring, we look for the annotation; if it is there we restore the object, otherwise we continue doing what we normally do. If necessary we'll do the same for pods. This should alleviate the need to use a hook and will also work for OADP.
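A rough sketch of that plugin decision logic, simplified to plain structs rather than the actual openshift-velero-plugin API; the annotation key here is a hypothetical placeholder, not the one the real plugin uses:

```go
package main

import "fmt"

// Owner is a minimal stand-in for the owning Deployment/DeploymentConfig.
type Owner struct {
	Kind   string // "Deployment" or "DeploymentConfig"
	Paused bool
}

// ReplicaSet stands in for a replicaset or replicationcontroller being backed up.
type ReplicaSet struct {
	Owner       *Owner
	Annotations map[string]string
}

// Hypothetical annotation key; the real plugin may use a different one.
const pausedOwnerAnnotation = "migration.openshift.io/paused-owner"

// BackupItem mirrors the backup-side idea: follow the ownerRef, and if the
// owning Deployment/DeploymentConfig is paused, annotate the replicaset
// so the restore side can recognize it.
func BackupItem(rs *ReplicaSet) {
	if rs.Owner != nil && rs.Owner.Paused {
		if rs.Annotations == nil {
			rs.Annotations = map[string]string{}
		}
		rs.Annotations[pausedOwnerAnnotation] = "true"
	}
}

// ShouldRestore mirrors the restore-side idea: restore the replicaset only
// when the annotation is present; otherwise fall back to normal behavior
// (letting the owner recreate it once unpaused).
func ShouldRestore(rs *ReplicaSet) bool {
	return rs.Annotations[pausedOwnerAnnotation] == "true"
}

func main() {
	rs := &ReplicaSet{Owner: &Owner{Kind: "DeploymentConfig", Paused: true}}
	BackupItem(rs)
	fmt.Println(ShouldRestore(rs)) // true: restore it, since the paused owner never will
}
```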
In addition to the controller changes, this adjusts the plugin to ensure the replicaset or replicationcontroller is restored and the pod started when the deployment or deploymentconfig is paused. https://github.com/openshift/openshift-velero-plugin/pull/100
Verified with MTC 1.6.0:
registry.redhat.io/rhmtc/openshift-migration-controller-rhel8@sha256:3b5efa9c8197fe0313a2ab7eb184d135ba9749c9a4f0d15a6abb11c0d18b9194
registry.redhat.io/rhmtc/openshift-velero-plugin-rhel8@sha256:ea8d7eeae177b6400b82dd528a1205763a7d76f511fcaa29ffc8818facf84cb1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3694