Bug 1861259
| Summary: | migration stuck at StageRestoreCreated status due to PV discovery and resource reconciliation suspended | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xin jiang <xjiang> |
| Component: | Migration Tooling | Assignee: | John Matthews <jmatthew> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Xin jiang <xjiang> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.4 | CC: | chezhang, sregidor, vlaad, whu |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-26 13:53:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1861267 | | |
Verified using CAM 1.2.5 stage
SOURCE CLUSTER: CAM 3.9 AWS
TARGET CLUSTER: CAM 4.5 AWS
REPLICATION REPOSITORY: AWS S3
openshift-migration-rhel7-operator@sha256:493e56a8b609eadad79d2df8e3c6402b2014ebcb645a939fa34004b056ea1a2e
- name: MIG_CONTROLLER_TAG
  value: 35c9a554de83d7dc9c560936c297085bbf05b08202885337addb1e4151b40d40
- name: MIG_UI_REPO
  value: openshift-migration-ui-rhel8@sha256
- name: MIG_UI_TAG
  value: 6abfaea8ac04e3b5bbf9648a3479b420b4baec35201033471020c9cae1fe1e11
- name: MIGRATION_REGISTRY_REPO
  value: openshift-migration-registry-rhel8@sha256
- name: MIGRATION_REGISTRY_TAG
  value: 37536b4487d3668a7105737695a0651e6be64720bc72a69da74153a8443ac9e1
- name: VELERO_REPO
  value: openshift-migration-velero-rhel8@sha256
- name: VELERO_TAG
  value: 461ea0c165ed525d4276056f6aab879dcf011facb00e94acc88ae6e9f33f1637
- name: VELERO_PLUGIN_REPO
  value: openshift-migration-plugin-rhel8@sha256
- name: VELERO_PLUGIN_TAG
  value: 7b6aa42f4428ab744e354c4095afae460b6e5e4e868969a14b4d1aec541a946a
Result: the migration is executed without problems, and the PVCs are ignored and not migrated.
Description of problem:
Before "disable_pv_migration=true" was enabled, the user created a migplan with PVs. The user then applied "disable_pv_migration=true" on the MigrationController, the migration-controller pod was restarted, and the user tried to migrate the migplan (I am not sure how long I waited, maybe 1 or 2 minutes). The "disable_pv_migration=true" setting does not take effect on plans that already exist: I found that the migplan still has PVs. The migration gets stuck at StageRestoreCreated status, and it reported "Limited validation; PV discovery and resource reconciliation suspended".

Version-Release number of selected component (if applicable):
CAM 1.2.4

How reproducible:
Often

Steps to Reproduce:
1. Apply "disable_pv_migration": true on the MigrationController

$ oc patch migrationcontroller migration-controller -p '{"spec":{"disable_pv_migration": true } }' --type='merge' -n openshift-migration

2. Check if the migration-controller is restarted

$ oc get pod -n openshift-migration --watch
NAME                                                           READY   STATUS              RESTARTS   AGE
migration-controller-586c9688b8-6wftl                          2/2     Running             0          3h59m
migration-operator-68bdbf56f7-p67zv                            2/2     Running             0          4h1m
migration-ui-65f66946c4-tlt62                                  1/1     Running             0          3h59m
registry-13a03a04-a1a2-4ec8-ae26-61670668c1b8-bxxnt-1-deploy   0/1     Completed           0          37m
registry-13a03a04-a1a2-4ec8-ae26-61670668c1b8-bxxnt-2-d5g94    1/1     Running             0          35m
registry-13a03a04-a1a2-4ec8-ae26-61670668c1b8-bxxnt-2-deploy   0/1     Completed           0          35m
registry-8e89ec15-a682-4e8e-a522-c97c66c413bd-zjf4g-1-deploy   0/1     Completed           0          6m17s
registry-8e89ec15-a682-4e8e-a522-c97c66c413bd-zjf4g-1-djt55    1/1     Running             0          6m14s
restic-fhq4n                                                   1/1     Running             0          4h
restic-hqcgt                                                   1/1     Running             0          4h
restic-pgb7l                                                   1/1     Running             0          4h
velero-5dfcd8d7c9-pcdc6                                        1/1     Running             0          4h
migration-controller-5546545568-ptmh2                          0/2     Pending             0          0s
migration-controller-5546545568-ptmh2                          0/2     Pending             0          0s
migration-controller-5546545568-ptmh2                          0/2     ContainerCreating   0          0s
migration-controller-5546545568-ptmh2                          0/2     ContainerCreating   0          2s
migration-controller-5546545568-ptmh2                          2/2     Running             0          6s
migration-controller-586c9688b8-6wftl                          2/2     Terminating         0          4h
migration-controller-586c9688b8-6wftl                          0/2     Terminating         0          4h
migration-controller-586c9688b8-6wftl                          0/2     Terminating         0          4h
migration-controller-586c9688b8-6wftl                          0/2     Terminating         0          4h

3. Check that "disable_pv_migration=true" is applied on the MigrationController

$ oc get migrationcontrollers -n openshift-migration -o yaml | grep disable
        f:disable_pv_migration: {}
    disable_pv_migration: true

$ oc get pod -n openshift-migration migration-controller-5546545568-ptmh2 -o yaml | grep EXCLUDED_RESOURCES -A1
          k:{"name":"EXCLUDED_RESOURCES"}:
            .: {}
--
    - name: EXCLUDED_RESOURCES
      value: imagetags,templateinstances,clusterserviceversions,packagemanifests,subscriptions,servicebrokers,servicebindings,serviceclasses,serviceinstances,serviceplans,persistentvolumes,persistentvolumeclaims

4. Execute the migplan before the migplan is updated to remove PVs

5. The migration is stuck at StageRestoreCreated status

$ oc get migplan -n openshift-migration mysql -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  annotations:
    openshift.io/touch: 0047956e-d0a3-11ea-8b60-0a580a830033
  creationTimestamp: "2020-07-28T07:00:19Z"
  generation: 6
  managedFields:
  - apiVersion: migration.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:destMigClusterRef:
          .: {}
          f:name: {}
          f:namespace: {}
        f:migStorageRef:
          .: {}
          f:name: {}
          f:namespace: {}
        f:namespaces: {}
        f:srcMigClusterRef:
          .: {}
          f:name: {}
          f:namespace: {}
    manager: Mozilla
    operation: Update
    time: "2020-07-28T07:01:06Z"
  - apiVersion: migration.openshift.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:openshift.io/touch: {}
      f:spec:
        f:persistentVolumes: {}
      f:status:
        .: {}
        f:conditions: {}
        f:excludedResources: {}
        f:observedDigest: {}
    manager: manager
    operation: Update
    time: "2020-07-28T07:21:49Z"
  name: mysql
  namespace: openshift-migration
  resourceVersion: "159947"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/mysql
  uid: 8e89ec15-a682-4e8e-a522-c97c66c413bd
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  migStorageRef:
    name: automatic
    namespace: openshift-migration
  namespaces:
  - mysql
  persistentVolumes:
  - capacity: 1Gi
    name: pvc-247ac434-d09a-11ea-bd2a-fa163e9c1ec4
    pvc:
      accessModes:
      - ReadWriteOnce
      hasReference: true
      name: mysql
      namespace: mysql
    selection:
      action: copy
      copyMethod: filesystem
      storageClass: standard
    storageClass: standard
    supported:
      actions:
      - copy
      copyMethods:
      - filesystem
      - snapshot
  srcMigClusterRef:
    name: source-cluster
    namespace: openshift-migration
status:
  conditions:
  - category: Required
    lastTransitionTime: "2020-07-28T07:00:24Z"
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: "2020-07-28T07:00:26Z"
    message: The storage resources have been created.
    status: "True"
    type: StorageEnsured
  - category: Required
    lastTransitionTime: "2020-07-28T07:00:28Z"
    message: The migration registry resources have been created.
    status: "True"
    type: RegistriesEnsured
  - category: Required
    lastTransitionTime: "2020-07-28T07:00:28Z"
    message: The migration plan is ready.
    status: "True"
    type: Ready
  - category: Advisory
    lastTransitionTime: "2020-07-28T07:01:49Z"
    message: Limited validation; PV discovery and resource reconciliation suspended.
    status: "True"
    type: Suspended
  excludedResources:
  - imagetags
  - templateinstances
  - clusterserviceversions
  - packagemanifests
  - subscriptions
  - servicebrokers
  - servicebindings
  - serviceclasses
  - serviceinstances
  - serviceplans
  - persistentvolumes
  - persistentvolumeclaims
  observedDigest: 605f525329aa790bd0099ed481f27526ddd0f3aae7a906e823766663fafae209

6. During the migration (almost 13 minutes), the migplan is still not updated. This is also visible in the output of step 5.

Actual results:
The migplan is stuck at StageRestoreCreated

Expected results:
I am not sure what the expected result is. Maybe the migplan should be checked before the migration is executed.

Additional info:
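The check suggested under "Expected results" could look roughly like the sketch below. It is only an illustration, not the fix that shipped in CAM: it assumes the MigPlan has already been loaded into a Python dict (for example from `oc get migplan mysql -n openshift-migration -o json`), and the function name `stale_pv_entries` is hypothetical. It flags a plan whose `spec.persistentVolumes` still lists PVs although `status.excludedResources` already contains the PV/PVC resource kinds, which is the state shown in step 5.

```python
from typing import Any, Dict, List

# Resource kinds that appear in EXCLUDED_RESOURCES once disable_pv_migration is set
# (taken from the output in step 3 of the report).
PV_KINDS = {"persistentvolumes", "persistentvolumeclaims"}


def stale_pv_entries(migplan: Dict[str, Any]) -> List[str]:
    """Return the names of PVs still listed in the plan although PV kinds are excluded.

    Hypothetical helper for illustration; not part of CAM.
    """
    status = migplan.get("status") or {}
    spec = migplan.get("spec") or {}
    excluded = set(status.get("excludedResources") or [])
    if not (PV_KINDS & excluded):
        # PV migration is not disabled, so listed PVs are expected.
        return []
    return [pv.get("name", "<unnamed>") for pv in spec.get("persistentVolumes") or []]


# Trimmed copy of the MigPlan from step 5 of the report.
plan = {
    "spec": {
        "namespaces": ["mysql"],
        "persistentVolumes": [
            {"name": "pvc-247ac434-d09a-11ea-bd2a-fa163e9c1ec4", "capacity": "1Gi"},
        ],
    },
    "status": {
        "excludedResources": [
            "imagetags",
            "persistentvolumes",
            "persistentvolumeclaims",
        ],
    },
}

if __name__ == "__main__":
    for name in stale_pv_entries(plan):
        print(f"MigPlan still references excluded PV: {name}")
```

A non-empty result means the plan was created before the exclusion was applied and has not been refreshed, so running it is likely to reproduce the stuck migration described above.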