Bug 1784899
| Summary: | MigPlan data lost after failed migration | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Sergio <sregidor> |
| Component: | Migration Tooling | Assignee: | Jeff Ortel <jortel> |
| Status: | CLOSED WONTFIX | QA Contact: | Sergio <sregidor> |
| Severity: | medium | Docs Contact: | Avital Pinnick <apinnick> |
| Priority: | medium | | |
| Version: | 4.2.0 | CC: | apinnick, chezhang, jmatthew, jortel, rpattath, xjiang |
| Target Milestone: | --- | | |
| Target Release: | 4.4.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1831616 (view as bug list) | Environment: | |
| Last Closed: | 2020-07-27 16:03:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1831616 | | |
| Bug Blocks: | | | |
This is the MigMigration that failed, included so that we can track the phase where it failed (StageBackupFailed):
```yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  annotations:
    touch: a2b7832b-c2c0-4fe6-a09e-0292c6cfae0c
  creationTimestamp: "2019-12-18T13:44:26Z"
  generation: 14
  name: trying-without-using-ui
  namespace: openshift-migration
  ownerReferences:
  - apiVersion: migration.openshift.io/v1alpha1
    kind: MigPlan
    name: tobefailed-noui
    uid: 0ccb7eae-219c-11ea-8ff9-42010a000004
  resourceVersion: "582790"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/trying-without-using-ui
  uid: 81a80b72-219c-11ea-9c4c-42010a000006
spec:
  migPlanRef:
    name: tobefailed-noui
    namespace: openshift-migration
  quiescePods: true
  stage: false
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: "2019-12-18T13:45:24Z"
    message: 'The migration has failed. See: Errors.'
    reason: StageBackupFailed
    status: "True"
    type: Failed
  errors:
  - 'Backup: openshift-migration/trying-without-using-ui-wwb9h partially failed.'
  phase: Completed
  startTimestamp: "2019-12-18T13:44:26Z"
```
The failed migration has `quiescePods: true`. When the migration fails, the plan is un-suspended, which resumes PV discovery. Since the pod has been scaled down, the PV is no longer found during PV discovery and is removed from the list. This is working as designed.

To remedy this, the user needs to scale the pod back up. Once the application pod is up and running, the PV list is repopulated by controller discovery. Unfortunately, the user's choices will be gone. For now, let's document this as a known issue.

The behavior noted will remain for CAM 1.2.x, i.e. if a migration fails and is restarted, the previous selections for PV migration are lost. We do not intend to fix this in the 1.2 z-stream. We have a known issue documented here: https://github.com/openshift/openshift-docs/pull/19021/files

We will keep a clone of this BZ open against a future release to consider modifying the behavior. Future work is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1831616
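The remedy above can be sketched as a few `oc` commands against the source cluster. The deployment name `nginx` is a hypothetical stand-in for the quiesced application workload (this bug's reproducer uses nginx PVCs); adjust names for the real application:

```shell
# The quiesce step scaled the application down to 0 replicas in the
# source namespace; scale it back up (deployment name is hypothetical).
oc scale deployment/nginx -n tobefailed-noui --replicas=1

# Wait until the pod is running so that the controller's PV discovery
# can find the volumes again.
oc rollout status deployment/nginx -n tobefailed-noui

# The plan's persistentVolumes list should now be repopulated,
# although with default selections, per this bug.
oc get migplan tobefailed-noui -n openshift-migration -o yaml
```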
Description of problem:
When a migration fails, the data used to create the migration plan is erased and lost. As a result, the next time we run this migration (once the problem that made it fail is fixed), the execution will always use the default data, not the information that the user provided when creating the migration plan.

Version-Release number of selected component (if applicable):

TARGET:
```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-12-14-230621   True        False         22h     Error while reconciling 4.2.0-0.nightly-2019-12-14-230621: an unknown error has occurre
```

SOURCE:
```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-12-14-230621   True        False         3h40m   Cluster version is 4.2.0-0.nightly-2019-12-14-23062
```

Controller version 1.0.1 in osbs registry:
```
image: image-registry.openshift-image-registry.svc:5000/rhcam-1-0/openshift-migration-controller-rhel8@sha256:e64551a1dd77021ce9c6bf7c01cd184edd7f163c5c9489bb615845eadb867dc7
```

Migration UI version 1.0.1 in osbs registry:
```
image: image-registry.openshift-image-registry.svc:5000/rhcam-1-0/openshift-migration-ui-rhel8@sha256:3b84f2053fb58d42771e1a8aece8037ed39f863c5349b89d313000ba7d905641
```

How reproducible:
Always

Steps to Reproduce:
1. Create a migration plan with PVCs, changing the default information in the "create migration plan" screens (for instance, select "snapshot" or another destination storage class; just change the default information).
2. Check the migration plan; the "persistentVolumes" section will show the information you selected: `oc get migplan -o yaml -n openshift-migration`
3. Migrate the plan, and force it to fail.
4. Check the migration plan after the failure: the "persistentVolumes" section has disappeared.
5. Run the migration plan again.
6. Check the migration plan again: the values in the "persistentVolumes" section used in this second migration are the default ones, not the ones you selected when creating the migration plan.

Actual results:
The migration executed after the first migration failed will use the default data, not the data selected when the migration plan was created.

Expected results:
When we run a failed migration plan again, the second execution should use the same data used when the plan was created.

Additional info:
With the UI disabled (in order to rule out a UI problem):
```
$ oc get deployments
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
migration-controller   1/1     1            1           3h7m
migration-operator     1/1     1            1           3h21m
migration-ui           0/0     0            0           3h7m
velero                 1/1     1            1           3h7m
```

This is the original data of the migration plan before the failure:
```yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  annotations:
    touch: 806e8b50-40eb-4ebe-9ef6-72fd855d2578
  creationTimestamp: "2019-12-18T13:41:10Z"
  generation: 4
  name: tobefailed-noui
  namespace: openshift-migration
  resourceVersion: "581523"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/tobefailed-noui
  uid: 0ccb7eae-219c-11ea-8ff9-42010a000004
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  migStorageRef:
    name: gcp
    namespace: openshift-migration
  namespaces:
  - tobefailed-noui
  persistentVolumes:
  - capacity: 1Gi
    name: pvc-df9a78ae-219b-11ea-9836-42010a000005
    pvc:
      accessModes:
      - ReadWriteOnce
      name: nginx-logs
      namespace: tobefailed-noui
    selection:
      action: copy
      copyMethod: snapshot
      storageClass: standard
    storageClass: standard
    supported:
      actions:
      - copy
      copyMethods:
      - filesystem
      - snapshot
  - capacity: 1Gi
    name: pvc-dfab0dc3-219b-11ea-9836-42010a000005
    pvc:
      accessModes:
      - ReadWriteOnce
      name: nginx-html
      namespace: tobefailed-noui
    selection:
      action: copy
      copyMethod: snapshot
      storageClass: standard
    storageClass: standard
    supported:
      actions:
      - copy
      copyMethods:
      - filesystem
      - snapshot
  srcMigClusterRef:
    name: gcp42
    namespace: openshift-migration
status:
  conditions:
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:14Z"
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:15Z"
    message: The storage resources have been created.
    status: "True"
    type: StorageEnsured
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:16Z"
    message: The migration registry resources have been created.
    status: "True"
    type: RegistriesEnsured
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:16Z"
    message: The migration plan is ready.
    status: "True"
    type: Ready
  - category: Warn
    lastTransitionTime: "2019-12-18T13:41:39Z"
    message: CopyMethod for PV in `persistentVolumes` [pvc-df9a78ae-219b-11ea-9836-42010a000005,pvc-dfab0dc3-219b-11ea-9836-42010a000005]
      is set to `snapshot`. Make sure that the chosen storage class is compatible
      with the source volume's storage type for Snapshot support.
    status: "True"
    type: PvWarnCopyMethodSnapshot
```

This is the state of the plan after the migration failure (the persistentVolumes section has been erased):
```yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  annotations:
    touch: b2459512-dc4b-42ee-ad10-79b889d657d5
  creationTimestamp: "2019-12-18T13:41:10Z"
  generation: 6
  name: tobefailed-noui
  namespace: openshift-migration
  resourceVersion: "582937"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/tobefailed-noui
  uid: 0ccb7eae-219c-11ea-8ff9-42010a000004
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  migStorageRef:
    name: gcp
    namespace: openshift-migration
  namespaces:
  - tobefailed-noui
  srcMigClusterRef:
    name: gcp42
    namespace: openshift-migration
status:
  conditions:
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:14Z"
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:15Z"
    message: The storage resources have been created.
    status: "True"
    type: StorageEnsured
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:16Z"
    message: The migration registry resources have been created.
    status: "True"
    type: RegistriesEnsured
  - category: Required
    lastTransitionTime: "2019-12-18T13:41:16Z"
    message: The migration plan is ready.
    status: "True"
    type: Ready
```
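A possible manual workaround, given the behavior described above, is to save the user-selected `persistentVolumes` list before migrating and patch it back into the plan after a failure. This is only a sketch, not a supported procedure: it assumes `jq` is installed, and the controller's PV discovery may overwrite the patched list again unless the application pods have first been scaled back up:

```shell
# Before running the migration: save the PV selections from the plan.
oc get migplan tobefailed-noui -n openshift-migration -o json \
  | jq '.spec.persistentVolumes' > saved-pvs.json

# After a failed migration (and after scaling the application back up):
# merge the saved selections back into the plan.
oc patch migplan tobefailed-noui -n openshift-migration --type=merge \
  -p "{\"spec\":{\"persistentVolumes\":$(cat saved-pvs.json)}}"
```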