Bug 1887526 - "Stage" pods fail when migrating from classic OpenShift source cluster on IBM Cloud with block storage
Summary: "Stage" pods fail when migrating from classic OpenShift source cluster on IBM...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.3.0
Hardware: Unspecified
OS: Unspecified
urgent
low
Target Milestone: ---
Target Release: 1.6.0
Assignee: Jaydip Gabani
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-12 17:51 UTC by John Matthews
Modified: 2021-09-29 14:34 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 14:34:47 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata | ID: RHSA-2021:3694 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2021-09-29 14:34:55 UTC

Description John Matthews 2020-10-12 17:51:32 UTC
Description of problem:

See upstream issue for more background:
https://github.com/konveyor/mig-controller/issues/694#issuecomment-706242906


MTC creates 'stage pods' to ensure that a PV is mounted to a pod. The approach is to launch a stage pod on the same node as the existing pod that is consuming the PV.
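A rough sketch of that placement, assuming the stage pod is pinned by setting spec.nodeName (one way to do it; MTC may use a different mechanism). The helper build_stage_pod and the STAGE_IMAGE value are hypothetical, and manifests are plain dicts rather than client-library objects:

```python
# Hypothetical placeholder; not the image MTC actually uses for stage pods.
STAGE_IMAGE = "example.invalid/stage:latest"


def build_stage_pod(app_pod):
    """Return a minimal stage-pod manifest pinned to app_pod's node.

    Hypothetical helper: keeps only the PVC-backed volumes of the
    application pod and sets spec.nodeName, which bypasses the scheduler
    so the pod lands on the node where those PVs are already attached.
    """
    spec = app_pod["spec"]
    pvc_volumes = [v for v in spec.get("volumes", []) if "persistentVolumeClaim" in v]
    mounts = [{"name": v["name"], "mountPath": "/var/data/" + v["name"]} for v in pvc_volumes]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "generateName": "stage-" + app_pod["metadata"]["name"] + "-",
            "namespace": app_pod["metadata"]["namespace"],
        },
        "spec": {
            "nodeName": spec["nodeName"],  # same node as the consuming pod
            "containers": [{"name": "stage", "image": STAGE_IMAGE, "volumeMounts": mounts}],
            "volumes": pvc_volumes,
        },
    }


# Example: an application pod with one PVC volume and one emptyDir volume.
app_pod = {
    "metadata": {"name": "web", "namespace": "demo"},
    "spec": {
        "nodeName": "node-1",
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "data-pvc"}},
            {"name": "tmp", "emptyDir": {}},
        ],
    },
}
stage_pod = build_stage_pod(app_pod)
```

Only the PVC-backed "data" volume is carried over, and the stage pod targets node-1 directly.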

Block storage PVs are typically RWO (ReadWriteOnce), so only one pod may mount them at a time. Our understanding was that this restriction is enforced at the node level, so two pods could mount the same RWO PV if they were scheduled on the same node.
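The node-level reading of RWO can be written as a simple check. This is a hypothetical model of that assumption, not MTC code: pods are plain dicts with assumed "node" and "pvcs" keys, and an RWO PVC is only flagged when pods on more than one node mount it, so an application pod and a stage pod on the same node pass.

```python
def rwo_violations(pods, access_modes):
    """Return RWO PVCs mounted by pods on more than one node.

    Models the node-level interpretation of ReadWriteOnce: any number of
    pods may share an RWO volume as long as they all run on one node.
    """
    nodes_by_pvc = {}
    for pod in pods:
        for pvc in pod["pvcs"]:
            nodes_by_pvc.setdefault(pvc, set()).add(pod["node"])
    return sorted(
        pvc
        for pvc, nodes in nodes_by_pvc.items()
        if "ReadWriteOnce" in access_modes.get(pvc, ()) and len(nodes) > 1
    )


modes = {"block-pvc": ["ReadWriteOnce"]}
same_node = [
    {"name": "app", "node": "node-1", "pvcs": ["block-pvc"]},
    {"name": "stage", "node": "node-1", "pvcs": ["block-pvc"]},
]
cross_node = [
    {"name": "app", "node": "node-1", "pvcs": ["block-pvc"]},
    {"name": "stage", "node": "node-2", "pvcs": ["block-pvc"]},
]
```

Under this model, rwo_violations(same_node, modes) is empty and rwo_violations(cross_node, modes) flags "block-pvc". Per this report, IBM Block storage on ROKS is where the assumption appears not to hold: a co-located stage pod can still fail there, so passing a check like this is not sufficient on that storage.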

Given the behavior we are seeing with IBM Block storage on IBM ROKS, we are questioning our approach and need to reexamine how we create stage pods.

Comment 3 Scott Seago 2020-10-12 21:15:18 UTC
Here's what we'll need to do to resolve this:
1) Swap the order of the quiesce and "create stage pods" phases, since we will need to create stage pods for PVCs if we're quiescing.
2) For PVCs that are mounted by more than one pod, make sure only one of those pods gets the restic annotation. This keeps restore from failing when we have ROX PVCs that must be mounted RWO for restore, and it keeps restic from attempting to back up/restore a volume more than once.
3) Only create new stage pods for the disconnected and quiesced PVCs. For PVCs that will stay live through the stage backup, use the live application pods instead, and add restic annotations for the volumes to back up to those pods.
4) On stage restore, convert live application pods to stage pods, mounting only the PVCs that have corresponding restic annotations.
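Steps 1 through 3 can be sketched as a small planner. Assumptions: pods are modeled with a hypothetical Pod dataclass, the planner runs after quiesce so quiesced application pods are simply absent from running_pods, and the restic annotation is Velero's opt-in backup.velero.io/backup-volumes, whose value is a comma-separated list of pod volume names. This is not the logic from the mig-controller PRs, only an illustration of the "one annotating pod per PVC" rule.

```python
from dataclasses import dataclass, field

# Velero's opt-in annotation for restic pod volume backups.
# Its value lists the pod's *volume* names, not PVC names.
RESTIC_ANNOTATION = "backup.velero.io/backup-volumes"


@dataclass
class Pod:
    name: str
    node: str
    volumes: dict  # pod volume name -> PVC claim name
    annotations: dict = field(default_factory=dict)


def plan_stage_backup(running_pods, pvcs_to_migrate):
    """Hypothetical planner for steps 2 and 3.

    Annotates live pods so each migrating PVC is claimed by exactly one
    pod, and returns the PVCs that still need a new stage pod. Must run
    after quiesce (step 1): quiesced pods are no longer running, so
    their PVCs fall through to the stage-pod set.
    """
    annotated = set()  # PVCs already assigned to one annotating pod
    for pod in sorted(running_pods, key=lambda p: p.name):
        chosen = []
        for vol_name, pvc in sorted(pod.volumes.items()):
            if pvc in pvcs_to_migrate and pvc not in annotated:
                annotated.add(pvc)
                chosen.append(vol_name)
        if chosen:
            pod.annotations[RESTIC_ANNOTATION] = ",".join(chosen)
    # Disconnected or quiesced PVCs: no running pod mounts them.
    return set(pvcs_to_migrate) - annotated


# Two live pods share "shared-pvc"; "db-pvc" belongs to a quiesced app.
web_a = Pod("web-a", "node-1", {"data": "shared-pvc"})
web_b = Pod("web-b", "node-1", {"data": "shared-pvc", "cache": "cache-pvc"})
needs_stage_pod = plan_stage_backup([web_a, web_b], {"shared-pvc", "cache-pvc", "db-pvc"})
```

Here web-a claims "shared-pvc", web-b only gets "cache", and "db-pvc" is the one PVC left for a new stage pod. Sorting by pod name just makes the choice deterministic; any stable rule works as long as each PVC is claimed once.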


Comment 5 Jaydip Gabani 2021-08-11 17:17:57 UTC
These two PRs implement the solution and resolve the stage pod failures:

https://github.com/konveyor/mig-controller/pull/1164

https://github.com/openshift/openshift-velero-plugin/pull/97

Comment 10 Sergio 2021-09-15 13:32:10 UTC
Verified using:
REMOTE CLUSTER: AWS OCP 3.11 GP2
LOCAL CLUSTER: ROKS OCP 4.7 ibmc-block-gold (controller + UI)
REPLICATION REPOSITORY: AWS S3

openshift-migration-rhel8-operator@sha256:1b93e062d7f1d6242634f3f6f8306c1a13ae47212c7aa805e248b6fda8b43409
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 3b5efa9c8197fe0313a2ab7eb184d135ba9749c9a4f0d15a6abb11c0d18b9194
    - name: MIG_UI_REPO
      value: openshift-migration-ui-rhel8@sha256
    - name: MIG_UI_TAG
      value: ac56919a13dd6bbf36ce7a5dfc7696d3dfebe6c9438da64b3bcd1d70b33c549c



We were able to migrate PVCs using "indirect" migrations both from the local cluster to the remote cluster and from the remote cluster to the local cluster.

Moved to VERIFIED.

Comment 12 errata-xmlrpc 2021-09-29 14:34:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694

