Description of problem:

Raised by user: if a DeploymentConfig (DC) is paused on the source side, a migration will hang waiting for quiesce unless the DC is resumed or manually scaled to 0.

Version-Release number of selected component (if applicable): 1.4.2
https://github.com/konveyor/mig-controller/pull/1178 will allow quiescing the application to work and the migration to complete successfully, but there is a catch:
- We take a backup, then quiesce the application. While quiescing, we annotate the original number of replicas before setting it to zero, and with the PR above we also annotate the original pause state.
- When we restore the application, it is restored in its original state (say 1 replica and paused), but because it is paused it will not roll out.
- I don't currently have a good solution to this problem, other than a post-restore hook along the lines of the example I implemented: https://github.com/konveyor/mig-operator/pull/741
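For reference, the quiesce/unquiesce behavior described above can be sketched roughly as follows. This is a minimal simulation using plain structs, not the real mig-controller or client-go types, and the annotation keys are illustrative assumptions rather than the keys the PR actually uses:

```go
package main

import (
	"fmt"
	"strconv"
)

// DeploymentConfig is a minimal stand-in for the real OpenShift type.
type DeploymentConfig struct {
	Replicas    int
	Paused      bool
	Annotations map[string]string
}

// Illustrative annotation keys; the keys used by mig-controller may differ.
const (
	replicasAnnotation = "migration.openshift.io/preQuiesceReplicas"
	pausedAnnotation   = "migration.openshift.io/preQuiescePausedState"
)

// Quiesce records the original replica count and pause state in annotations,
// then scales the DC to zero (the behavior added by the PR above).
func Quiesce(dc *DeploymentConfig) {
	if dc.Annotations == nil {
		dc.Annotations = map[string]string{}
	}
	dc.Annotations[replicasAnnotation] = strconv.Itoa(dc.Replicas)
	dc.Annotations[pausedAnnotation] = strconv.FormatBool(dc.Paused)
	dc.Replicas = 0
}

// Unquiesce restores the recorded state. Note the catch described above:
// if the DC was originally paused, restoring Paused=true means the restored
// replicas will never roll out without further intervention.
func Unquiesce(dc *DeploymentConfig) {
	if v, ok := dc.Annotations[replicasAnnotation]; ok {
		if n, err := strconv.Atoi(v); err == nil {
			dc.Replicas = n
		}
	}
	if v, ok := dc.Annotations[pausedAnnotation]; ok {
		dc.Paused = v == "true"
	}
}

func main() {
	dc := &DeploymentConfig{Replicas: 1, Paused: true}
	Quiesce(dc)
	fmt.Println(dc.Replicas, dc.Annotations[pausedAnnotation]) // 0 true
	Unquiesce(dc)
	fmt.Println(dc.Replicas, dc.Paused) // 1 true (still paused, so it will not roll out)
}
```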
To further clarify: the annotations are primarily used during a rollback, and in that case the unquiesce works as expected. There is a less common scenario where we also have the annotations available: the original migration failed, the applications became quiesced, and a subsequent migration is performed. In that case the annotations should be present and, in theory, the unquiesce code will run on them on the destination. But in the (hopefully) normal case of a first-time success, nothing is there to handle rolling out the paused deployment(config).
We can probably handle this with backup and restore plugins for Velero in openshift-velero-plugin, possibly with just replicaset and replicationcontroller adjustments so that we restore them if the owning deployment/deploymentconfig is paused. We might also have to restore pods, but that will take some testing to determine. The basic idea: while backing up, we get the ownerRef and look at the Deployment or DeploymentConfig to see whether it is paused. If it is, we annotate the replicaset or replicationcontroller during backup. When restoring, we look for the annotation; if it is there we restore the object, otherwise we continue doing what we normally do. If necessary we'll do the same for pods. This should alleviate the need to use a hook and will also work for OADP.
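A rough sketch of that plugin decision logic, simplified to plain structs rather than the actual openshift-velero-plugin API; the annotation key here is a hypothetical placeholder, not the one the real plugin uses:

```go
package main

import "fmt"

// Owner is a minimal stand-in for the owning Deployment/DeploymentConfig.
type Owner struct {
	Kind   string // "Deployment" or "DeploymentConfig"
	Paused bool
}

// ReplicaSet stands in for a replicaset or replicationcontroller being backed up.
type ReplicaSet struct {
	Owner       *Owner
	Annotations map[string]string
}

// Hypothetical annotation key; the real plugin may use a different one.
const pausedOwnerAnnotation = "migration.openshift.io/paused-owner"

// BackupItem mirrors the backup-side idea: follow the ownerRef, and if the
// owning Deployment/DeploymentConfig is paused, annotate the replicaset
// so the restore side can recognize it.
func BackupItem(rs *ReplicaSet) {
	if rs.Owner != nil && rs.Owner.Paused {
		if rs.Annotations == nil {
			rs.Annotations = map[string]string{}
		}
		rs.Annotations[pausedOwnerAnnotation] = "true"
	}
}

// ShouldRestore mirrors the restore-side idea: restore the replicaset only
// when the annotation is present; otherwise fall back to normal behavior
// (letting the owner recreate it once unpaused).
func ShouldRestore(rs *ReplicaSet) bool {
	return rs.Annotations[pausedOwnerAnnotation] == "true"
}

func main() {
	rs := &ReplicaSet{Owner: &Owner{Kind: "DeploymentConfig", Paused: true}}
	BackupItem(rs)
	fmt.Println(ShouldRestore(rs)) // true: restore it, since the paused owner never will
}
```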
In addition to the controller changes, this adjusts the plugin to ensure the replicaset or replicationcontroller is restored and the pod started when the deployment or deploymentconfig is paused. https://github.com/openshift/openshift-velero-plugin/pull/100
Verified with MTC 1.6.0:
registry.redhat.io/rhmtc/openshift-migration-controller-rhel8@sha256:3b5efa9c8197fe0313a2ab7eb184d135ba9749c9a4f0d15a6abb11c0d18b9194
registry.redhat.io/rhmtc/openshift-velero-plugin-rhel8@sha256:ea8d7eeae177b6400b82dd528a1205763a7d76f511fcaa29ffc8818facf84cb1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3694