Bug 1861259

Summary: migration stuck at StageRestoreCreated status due to PV discovery and resource reconciliation suspended
Product: OpenShift Container Platform
Reporter: Xin jiang <xjiang>
Component: Migration Tooling
Assignee: John Matthews <jmatthew>
Status: CLOSED CURRENTRELEASE
QA Contact: Xin jiang <xjiang>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4
CC: chezhang, sregidor, vlaad, whu
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-26 13:53:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1861267

Description Xin jiang 2020-07-28 07:51:12 UTC
Description of problem:

Before "disable_pv_migration=true" was enabled, the user created a migplan with PVs. The user then applied "disable_pv_migration=true" on the MigrationController, the migration-controller pod was restarted, and the user tried to migrate the migplan (I am not sure how long I waited, maybe 1 or 2 minutes). The "disable_pv_migration=true" setting does not take effect on all current plans: I found that the migplan still has PVs. The migration gets stuck at the StageRestoreCreated status, and the migplan reports "Limited validation; PV discovery and resource reconciliation suspended".

Version-Release number of selected component (if applicable):
CAM 1.2.4

How reproducible:
Often

Steps to Reproduce:
1. Apply "disable_pv_migration": true on the MigrationController
$ oc patch migrationcontroller migration-controller -p '{"spec":{"disable_pv_migration": true } }' --type='merge' -n openshift-migration

2. Check if the migration-controller is restarted
$ oc get pod -n openshift-migration --watch
NAME                                                           READY   STATUS      RESTARTS   AGE
migration-controller-586c9688b8-6wftl                          2/2     Running     0          3h59m
migration-operator-68bdbf56f7-p67zv                            2/2     Running     0          4h1m
migration-ui-65f66946c4-tlt62                                  1/1     Running     0          3h59m
registry-13a03a04-a1a2-4ec8-ae26-61670668c1b8-bxxnt-1-deploy   0/1     Completed   0          37m
registry-13a03a04-a1a2-4ec8-ae26-61670668c1b8-bxxnt-2-d5g94    1/1     Running     0          35m
registry-13a03a04-a1a2-4ec8-ae26-61670668c1b8-bxxnt-2-deploy   0/1     Completed   0          35m
registry-8e89ec15-a682-4e8e-a522-c97c66c413bd-zjf4g-1-deploy   0/1     Completed   0          6m17s
registry-8e89ec15-a682-4e8e-a522-c97c66c413bd-zjf4g-1-djt55    1/1     Running     0          6m14s
restic-fhq4n                                                   1/1     Running     0          4h
restic-hqcgt                                                   1/1     Running     0          4h
restic-pgb7l                                                   1/1     Running     0          4h
velero-5dfcd8d7c9-pcdc6                                        1/1     Running     0          4h
migration-controller-5546545568-ptmh2                          0/2     Pending     0          0s
migration-controller-5546545568-ptmh2                          0/2     Pending     0          0s
migration-controller-5546545568-ptmh2                          0/2     ContainerCreating   0          0s
migration-controller-5546545568-ptmh2                          0/2     ContainerCreating   0          2s
migration-controller-5546545568-ptmh2                          2/2     Running             0          6s
migration-controller-586c9688b8-6wftl                          2/2     Terminating         0          4h
migration-controller-586c9688b8-6wftl                          0/2     Terminating         0          4h
migration-controller-586c9688b8-6wftl                          0/2     Terminating         0          4h
migration-controller-586c9688b8-6wftl                          0/2     Terminating         0          4h

3. Check that "disable_pv_migration=true" is applied on the MigrationController
$ oc get migrationcontrollers -n openshift-migration -o yaml | grep disable
          f:disable_pv_migration: {}
    disable_pv_migration: true

$ oc get pod -n openshift-migration migration-controller-5546545568-ptmh2 -o yaml | grep EXCLUDED_RESOURCES -A1
              k:{"name":"EXCLUDED_RESOURCES"}:
                .: {}
--
    - name: EXCLUDED_RESOURCES
      value: imagetags,templateinstances,clusterserviceversions,packagemanifests,subscriptions,servicebrokers,servicebindings,serviceclasses,serviceinstances,serviceplans,persistentvolumes,persistentvolumeclaims

4. Execute the migplan before the migplan is updated to remove PVs

5. The migration is stuck at StageRestoreCreated status
$ oc get migplan -n openshift-migration mysql -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  annotations:
    openshift.io/touch: 0047956e-d0a3-11ea-8b60-0a580a830033
  creationTimestamp: "2020-07-28T07:00:19Z"
  generation: 6
  managedFields:
  - apiVersion: migration.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:destMigClusterRef:
          .: {}
          f:name: {}
          f:namespace: {}
        f:migStorageRef:
          .: {}
          f:name: {}
          f:namespace: {}
        f:namespaces: {}
        f:srcMigClusterRef:
          .: {}
          f:name: {}
          f:namespace: {}
    manager: Mozilla
    operation: Update
    time: "2020-07-28T07:01:06Z"
  - apiVersion: migration.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:openshift.io/touch: {}
      f:spec:
        f:persistentVolumes: {}
      f:status:
        .: {}
        f:conditions: {}
        f:excludedResources: {}
        f:observedDigest: {}
    manager: manager
    operation: Update
    time: "2020-07-28T07:21:49Z"
  name: mysql
  namespace: openshift-migration
  resourceVersion: "159947"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/mysql
  uid: 8e89ec15-a682-4e8e-a522-c97c66c413bd
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  migStorageRef:
    name: automatic
    namespace: openshift-migration
  namespaces:
  - mysql
  persistentVolumes:
  - capacity: 1Gi
    name: pvc-247ac434-d09a-11ea-bd2a-fa163e9c1ec4
    pvc:
      accessModes:
      - ReadWriteOnce
      hasReference: true
      name: mysql
      namespace: mysql
    selection:
      action: copy
      copyMethod: filesystem
      storageClass: standard
    storageClass: standard
    supported:
      actions:
      - copy
      copyMethods:
      - filesystem
      - snapshot
  srcMigClusterRef:
    name: source-cluster
    namespace: openshift-migration
status:
  conditions:
  - category: Required
    lastTransitionTime: "2020-07-28T07:00:24Z"
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: "2020-07-28T07:00:26Z"
    message: The storage resources have been created.
    status: "True"
    type: StorageEnsured
  - category: Required
    lastTransitionTime: "2020-07-28T07:00:28Z"
    message: The migration registry resources have been created.
    status: "True"
    type: RegistriesEnsured
  - category: Required
    lastTransitionTime: "2020-07-28T07:00:28Z"
    message: The migration plan is ready.
    status: "True"
    type: Ready
  - category: Advisory
    lastTransitionTime: "2020-07-28T07:01:49Z"
    message: Limited validation; PV discovery and resource reconciliation suspended.
    status: "True"
    type: Suspended
  excludedResources:
  - imagetags
  - templateinstances
  - clusterserviceversions
  - packagemanifests
  - subscriptions
  - servicebrokers
  - servicebindings
  - serviceclasses
  - serviceinstances
  - serviceplans
  - persistentvolumes
  - persistentvolumeclaims
  observedDigest: 605f525329aa790bd0099ed481f27526ddd0f3aae7a906e823766663fafae209

6. During the migration (almost 13 minutes), the migplan is still not updated. The output in step 5 shows this as well.
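
The verification in step 3 can also be scripted instead of read off the grep output. The following is a minimal sketch only: excludes_pv_resources is a hypothetical helper name, not part of CAM or oc, and it assumes the EXCLUDED_RESOURCES value is a plain comma-separated list as shown in step 3.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of CAM or oc).
# Succeeds only if the comma-separated list passed as $1 contains both
# "persistentvolumes" and "persistentvolumeclaims".
excludes_pv_resources() {
    list=",$1,"   # wrap in commas so every entry can be matched as ",name,"
    for name in persistentvolumes persistentvolumeclaims; do
        case "$list" in
            *",$name,"*) ;;      # entry found, keep checking
            *) return 1 ;;       # entry missing, PVs are not fully excluded
        esac
    done
    return 0
}
```

Called with the value printed in step 3, the function returns 0. A list that lacks either of the two entries makes it return 1.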

Actual results:
The migplan is stuck at StageRestoreCreated

Expected results:
I am not sure what the expected result is. Maybe the migplan should be checked before the migration is executed.
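
One possible shape for such a check, run by hand before starting the migration, is sketched below. This is an assumption about what a check could look like, not the fix that was implemented. plan_has_stale_pvs is a hypothetical helper name, and the awk script assumes the layout printed by "oc get migplan <name> -o yaml" for a single plan, as in step 5 (top-level keys unindented, second-level keys and list items indented by two spaces).

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of CAM or oc).
# Reads one MigPlan YAML document on stdin and succeeds (exit 0) when the plan
# is in the state from this report: spec.persistentVolumes still has entries
# while status.excludedResources already lists persistentvolumes.
plan_has_stale_pvs() {
    awk '
        # Unindented line: a top-level key such as "spec:" or "status:".
        /^[^ ]/       { top = $1; section = "" }
        # Two-space indent followed by a letter: a second-level key.
        /^  [A-Za-z]/ { section = $1 }
        # Count list items that sit directly under spec.persistentVolumes.
        top == "spec:" && section == "persistentVolumes:" && /^  - / { pvs++ }
        # Note whether persistentvolumes is listed under status.excludedResources.
        top == "status:" && section == "excludedResources:" && /^  - persistentvolumes$/ { excluded = 1 }
        END { exit !(pvs > 0 && excluded) }
    '
}
```

For the plan in this report, "oc get migplan -n openshift-migration mysql -o yaml | plan_has_stale_pvs" would succeed, which signals that the plan has not yet been reconciled against the new setting and should not be executed.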

Additional info:

Comment 2 Sergio 2020-08-21 09:43:56 UTC
Verified using CAM 1.2.5 stage
SOURCE CLUSTER: CAM 3.9 AWS
TARGET CLUSTER: CAM 4.5 AWS
REPLICATION REPOSITORY: AWS S3

openshift-migration-rhel7-operator@sha256:493e56a8b609eadad79d2df8e3c6402b2014ebcb645a939fa34004b056ea1a2e

    - name: MIG_CONTROLLER_TAG
      value: 35c9a554de83d7dc9c560936c297085bbf05b08202885337addb1e4151b40d40
    - name: MIG_UI_REPO
      value: openshift-migration-ui-rhel8@sha256
    - name: MIG_UI_TAG
      value: 6abfaea8ac04e3b5bbf9648a3479b420b4baec35201033471020c9cae1fe1e11
    - name: MIGRATION_REGISTRY_REPO
      value: openshift-migration-registry-rhel8@sha256
    - name: MIGRATION_REGISTRY_TAG
      value: 37536b4487d3668a7105737695a0651e6be64720bc72a69da74153a8443ac9e1
    - name: VELERO_REPO
      value: openshift-migration-velero-rhel8@sha256
    - name: VELERO_TAG
      value: 461ea0c165ed525d4276056f6aab879dcf011facb00e94acc88ae6e9f33f1637
    - name: VELERO_PLUGIN_REPO
      value: openshift-migration-plugin-rhel8@sha256
    - name: VELERO_PLUGIN_TAG
      value: 7b6aa42f4428ab744e354c4095afae460b6e5e4e868969a14b4d1aec541a946a


The result is that the migration is executed without problems and the PVCs are ignored and not migrated.