Description of problem:

When a migration fails, the data used to create the migration plan is erased and lost. Hence, the next time we run this migration (once the problem that made it fail is fixed), the execution is always made using the default data, not the information the user provided when creating the migration plan.

Version-Release number of selected component (if applicable):

TARGET:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-12-14-230621   True        False         22h     Error while reconciling 4.2.0-0.nightly-2019-12-14-230621: an unknown error has occurre

SOURCE:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-12-14-230621   True        False         3h40m   Cluster version is 4.2.0-0.nightly-2019-12-14-23062

Controller version 1.0.1 in osbs registry:
image: image-registry.openshift-image-registry.svc:5000/rhcam-1-0/openshift-migration-controller-rhel8@sha256:e64551a1dd77021ce9c6bf7c01cd184edd7f163c5c9489bb615845eadb867dc7

Migration UI version 1.0.1 in osbs registry:
image: image-registry.openshift-image-registry.svc:5000/rhcam-1-0/openshift-migration-ui-rhel8@sha256:3b84f2053fb58d42771e1a8aece8037ed39f863c5349b89d313000ba7d905641

How reproducible:
Always

Steps to Reproduce:
1. Create a migration plan with PVCs, changing the default information in the "create migration plan" screens (for instance, select "snapshot" or select another destination storage class; just change the default information).
2. Check the migration plan; the "persistentVolumes" section will show the information you selected:
   oc get migplan -o yaml -n openshift-migration
3. Run the migration and force it to fail.
4. Check the migration plan after the failure: the "persistentVolumes" section has disappeared.
5. Run the migration plan again.
6. Check the migration plan again: the values in the "persistentVolumes" section used in this second migration are the default ones, not the ones you selected when creating the migration plan. (A jsonpath sketch for comparing the selections follows the Expected results below.)

Actual results:
The migration executed after the first migration failed uses the default data, not the data selected when the migration plan was created.

Expected results:
When we run a failed migration plan again, the second execution should use the same data used when the plan was created.
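For steps 2 and 6, a quick way to inspect just the user's selections (a sketch; "tobefailed-noui" is the plan name used in the Additional info below, substitute your own):

$ oc get migplan tobefailed-noui -n openshift-migration \
    -o jsonpath='{range .spec.persistentVolumes[*]}{.name}{": "}{.selection.action}{" / "}{.selection.copyMethod}{" / "}{.selection.storageClass}{"\n"}{end}'

Running this before the migration and again after the failed run should show the selections disappearing, and after the second run it should show the default selections instead of the original ones.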
Additional info:

With the UI disabled (in order to discard a UI problem):

$ oc get deployments
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
migration-controller   1/1     1            1           3h7m
migration-operator     1/1     1            1           3h21m
migration-ui           0/0     0            0           3h7m
velero                 1/1     1            1           3h7m

This is the original data of the migration plan before the failure:

apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  annotations:
    touch: 806e8b50-40eb-4ebe-9ef6-72fd855d2578
  creationTimestamp: "2019-12-18T13:41:10Z"
  generation: 4
  name: tobefailed-noui
  namespace: openshift-migration
  resourceVersion: "581523"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/tobefailed-noui
  uid: 0ccb7eae-219c-11ea-8ff9-42010a000004
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  migStorageRef:
    name: gcp
    namespace: openshift-migration
  namespaces:
  - tobefailed-noui
  persistentVolumes:
  - capacity: 1Gi
    name: pvc-df9a78ae-219b-11ea-9836-42010a000005
    pvc:
      accessModes:
      - ReadWriteOnce
      name: nginx-logs
      namespace: tobefailed-noui
    selection:
      action: copy
      copyMethod: snapshot
      storageClass: standard
    storageClass: standard
    supported:
      actions:
      - copy
      copyMethods:
      - filesystem
      - snapshot
  - capacity: 1Gi
    name: pvc-dfab0dc3-219b-11ea-9836-42010a000005
    pvc:
      accessModes:
      - ReadWriteOnce
      name: nginx-html
      namespace: tobefailed-noui
    selection:
      action: copy
      copyMethod: snapshot
      storageClass: standard
    storageClass: standard
    supported:
      actions:
      - copy
      copyMethods:
      - filesystem
      - snapshot
  srcMigClusterRef:
    name: gcp42
    namespace: openshift-migration
status:
  conditions:
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:14Z"
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:15Z"
    message: The storage resources have been created.
    status: "True"
    type: StorageEnsured
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:16Z"
    message: The migration registry resources have been created.
    status: "True"
    type: RegistriesEnsured
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:16Z"
    message: The migration plan is ready.
    status: "True"
    type: Ready
  - category: Warn
    lastTransitionTime: "2019-12-18T13:41:39Z"
    message: CopyMethod for PV in `persistentVolumes` [pvc-df9a78ae-219b-11ea-9836-42010a000005,pvc-dfab0dc3-219b-11ea-9836-42010a000005] is set to `snapshot`. Make sure that the chosen storage class is compatible with the source volume's storage type for Snapshot support.
    status: "True"
    type: PvWarnCopyMethodSnapshot

This is the state of the plan after the migration failure (the persistentVolumes section has been erased):

apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  annotations:
    touch: b2459512-dc4b-42ee-ad10-79b889d657d5
  creationTimestamp: "2019-12-18T13:41:10Z"
  generation: 6
  name: tobefailed-noui
  namespace: openshift-migration
  resourceVersion: "582937"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/tobefailed-noui
  uid: 0ccb7eae-219c-11ea-8ff9-42010a000004
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  migStorageRef:
    name: gcp
    namespace: openshift-migration
  namespaces:
  - tobefailed-noui
  srcMigClusterRef:
    name: gcp42
    namespace: openshift-migration
status:
  conditions:
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:14Z"
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:15Z"
    message: The storage resources have been created.
    status: "True"
    type: StorageEnsured
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:16Z"
    message: The migration registry resources have been created.
    status: "True"
    type: RegistriesEnsured
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:16Z"
    message: The migration plan is ready.
    status: "True"
    type: Ready
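For anyone reproducing this, the before/after difference can also be captured directly (a sketch using standard tooling; the file names are illustrative):

$ oc get migplan tobefailed-noui -n openshift-migration -o jsonpath='{.spec.persistentVolumes}' > pvs-before.json
  ... run the migration and let it fail ...
$ oc get migplan tobefailed-noui -n openshift-migration -o jsonpath='{.spec.persistentVolumes}' > pvs-after.json
$ diff pvs-before.json pvs-after.json

The second capture should come back empty (or error out, depending on the client version) because spec.persistentVolumes has been removed from the plan.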
This is the MigMigration that failed, so that we can track the phase where it failed (StageBackupFailed):

apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  annotations:
    touch: a2b7832b-c2c0-4fe6-a09e-0292c6cfae0c
  creationTimestamp: "2019-12-18T13:44:26Z"
  generation: 14
  name: trying-without-using-ui
  namespace: openshift-migration
  ownerReferences:
  - apiVersion: migration.openshift.io/v1alpha1
    kind: MigPlan
    name: tobefailed-noui
    uid: 0ccb7eae-219c-11ea-8ff9-42010a000004
  resourceVersion: "582790"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/trying-without-using-ui
  uid: 81a80b72-219c-11ea-9c4c-42010a000006
spec:
  migPlanRef:
    name: tobefailed-noui
    namespace: openshift-migration
  quiescePods: true
  stage: false
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: "2019-12-18T13:45:24Z"
    message: 'The migration has failed. See: Errors.'
    reason: StageBackupFailed
    status: "True"
    type: Failed
  errors:
  - 'Backup: openshift-migration/trying-without-using-ui-wwb9h partially failed.'
  phase: Completed
  startTimestamp: "2019-12-18T13:44:26Z"
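To pull just the failure reason and inspect the underlying Velero backup named in the errors list (a sketch; it assumes the Velero Backup resources are queryable in the openshift-migration namespace, as the error message above suggests):

$ oc get migmigration trying-without-using-ui -n openshift-migration \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}'
$ oc describe backup trying-without-using-ui-wwb9h -n openshift-migration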
The failed migration has quiescePods: true. When the migration fails, the plan is un-suspended, which resumes PV discovery. Since the pod has been scaled down by quiescing, the PV is no longer found during PV discovery and is removed from the list. This is working as designed.
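One way to observe the quiesced state that triggers this (a sketch; it assumes the source application runs as a Deployment in the migrated namespace, and must be run against the source cluster):

$ oc get deployments -n tobefailed-noui

A quiesced deployment reports 0/0 ready replicas, so its PVCs have no running consumer and drop out of the controller's PV discovery.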
To remedy this, the user will need to scale the pod back up. Once the application pod is up and running, the PV list will be repopulated by the controller's discovery. Unfortunately, the user's choices will be gone and must be re-entered. For now, let's document this as a known issue (a remediation sketch follows below).
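A sketch of that remediation, assuming the quiesced application is a Deployment named nginx (a hypothetical name; substitute the real workload) in the migrated namespace on the source cluster:

# Scale the quiesced application back up so PV discovery can find its PVCs again
$ oc scale deployment nginx -n tobefailed-noui --replicas=1

# Once spec.persistentVolumes is repopulated with defaults, re-apply the desired
# selections (action, copyMethod, storageClass) by hand before re-running:
$ oc edit migplan tobefailed-noui -n openshift-migration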
The behavior noted will remain for CAM 1.2.x, i.e. if a migration fails and is restarted, the previous selections for PV migration are lost. We do not intend to fix this in the 1.2 z-stream. We have a known issue documented here: https://github.com/openshift/openshift-docs/pull/19021/files We will keep a clone of this BZ open against a future release to consider modifying the behavior. Future work is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1831616