Bug 1887526

Summary: "Stage" pods fail when migrating from classic OpenShift source cluster on IBM Cloud with block storage
Product: Migration Toolkit for Containers Reporter: John Matthews <jmatthew>
Component: General Assignee: Jaydip Gabani <jgabani>
Status: CLOSED ERRATA QA Contact: Xin jiang <xjiang>
Severity: low Docs Contact: Avital Pinnick <apinnick>
Priority: urgent    
Version: 1.3.0 CC: ernelson, rjohnson, sregidor, sseago
Target Milestone: ---   
Target Release: 1.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-09-29 14:34:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description John Matthews 2020-10-12 17:51:32 UTC
Description of problem:

See upstream issue for more background:
https://github.com/konveyor/mig-controller/issues/694#issuecomment-706242906


MTC creates 'stage pods' to ensure that a PV is mounted to a Pod. The approach is to launch a stage pod on the same node as the existing Pod that is consuming the PV.

'Block' storage PVs are typically RWO, so only one pod may mount them at a time. Our understanding was that this restriction was enforced at the Node level, so it would be possible for 2 pods to mount the same RWO PV if they were scheduled on the same Node.
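
For illustration only, the same-node approach described above can be sketched as a function that builds a stage pod manifest pinned to the node of the pod already consuming the PV. This is not the mig-controller implementation: the function name `build_stage_pod`, the `stage-` name prefix, the container name, and the image are placeholders assumed for the example. Only `spec.nodeName` and the `persistentVolumeClaim` volume source are standard Pod fields.

```python
def build_stage_pod(app_pod, pvc_name):
    """Return a minimal Pod manifest that mounts pvc_name on app_pod's node.

    Hypothetical helper for illustration; names and image are placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "stage-" + app_pod["metadata"]["name"],
            "namespace": app_pod["metadata"]["namespace"],
        },
        "spec": {
            # Pin the stage pod to the node of the pod that already uses the PV.
            "nodeName": app_pod["spec"]["nodeName"],
            "containers": [
                {
                    "name": "sleep",
                    "image": "registry.example.com/sleep:latest",  # placeholder
                    "command": ["sleep", "infinity"],
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }
            ],
            "volumes": [
                {
                    "name": "data",
                    "persistentVolumeClaim": {"claimName": pvc_name},
                }
            ],
        },
    }
```

Whether the second mount succeeds depends on the storage driver, which is the assumption this bug calls into question.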

Having seen the behavior with IBM Block storage in IBM ROKS, we are questioning this approach and need to reexamine how we handle stage pods.

Comment 3 Scott Seago 2020-10-12 21:15:18 UTC
Here's what we'll need to do to resolve this:
1) Swap the order of the quiesce and "create stage pods" phases, since we will need to create stage pods for PVCs if we're quiescing
2) Make sure that for PVCs that are mounted by more than one pod, only one of these pods gets the restic annotation -- this will keep us from failing restore when we have ROX PVCs that must be mounted RWO for restore, and it keeps restic from attempting to backup/restore a volume more than once.
3) Only create new stage pods for the disconnected and quiesced PVCs. Use live application pods for those that are going to be live through stage backup, and add restic annotations to these live application pods for the volumes to back up
4) On stage restore, convert live application pods to stage pods, including only mounting PVCs that have corresponding restic annotations
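
A rough sketch of steps 2 through 4, under stated assumptions, might look like the following. It is not the code from the eventual fix. The `Pod` class and the functions `plan_stage` and `volumes_for_restore` are hypothetical, and the annotation key is assumed to be Velero's `backup.velero.io/backup-volumes` (a comma-separated list of volume names); the comment above only says "restic annotation".

```python
from dataclasses import dataclass, field

# Assumed annotation key; the plan above only refers to "the restic annotation".
RESTIC_ANNOTATION = "backup.velero.io/backup-volumes"


@dataclass
class Pod:
    name: str
    volumes: dict  # volume name -> PVC name
    annotations: dict = field(default_factory=dict)


def plan_stage(pods, all_pvcs, quiesce):
    """Annotate live pods and return the PVCs that need a new stage pod.

    Each PVC is claimed by at most one live pod (step 2). PVCs with no live
    pod, or all PVCs when quiescing, get new stage pods (step 3).
    """
    claimed = set()
    live_pods = [] if quiesce else pods
    for pod in live_pods:
        to_backup = []
        for volume, pvc in pod.volumes.items():
            if pvc not in claimed:
                claimed.add(pvc)
                to_backup.append(volume)
        if to_backup:
            pod.annotations[RESTIC_ANNOTATION] = ",".join(to_backup)
    return [pvc for pvc in all_pvcs if pvc not in claimed]


def volumes_for_restore(pod):
    """Step 4: keep only the volumes named in the pod's restic annotation."""
    annotated = pod.annotations.get(RESTIC_ANNOTATION, "")
    names = [name for name in annotated.split(",") if name]
    return {name: pod.volumes[name] for name in names}
```

With two pods sharing one PVC, only the first pod encountered is annotated for the shared volume, so the second pod mounts nothing for it on restore.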


Comment 5 Jaydip Gabani 2021-08-11 17:17:57 UTC
These two PRs implement the solution and resolve the stage pod failure.

https://github.com/konveyor/mig-controller/pull/1164

https://github.com/openshift/openshift-velero-plugin/pull/97

Comment 10 Sergio 2021-09-15 13:32:10 UTC
Verified using:
REMOTE CLUSTER: AWS OCP 3.11 GP2
LOCAL CLUSTER: ROKS OCP 4.7 ibmc-block-gold  (controller + UI)
REPLICATION REPOSITORY: AWS S3

openshift-migration-rhel8-operator@sha256:1b93e062d7f1d6242634f3f6f8306c1a13ae47212c7aa805e248b6fda8b43409
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 3b5efa9c8197fe0313a2ab7eb184d135ba9749c9a4f0d15a6abb11c0d18b9194
    - name: MIG_UI_REPO
      value: openshift-migration-ui-rhel8@sha256
    - name: MIG_UI_TAG
      value: ac56919a13dd6bbf36ce7a5dfc7696d3dfebe6c9438da64b3bcd1d70b33c549c



We could migrate PVCs with "indirect" migrations from the local to the remote cluster and from the remote to the local cluster.

Moved to VERIFIED.

Comment 12 errata-xmlrpc 2021-09-29 14:34:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694