Description of problem:
A direct migration may get stuck for a long time: the DVM pods are created but stay in Pending because they are unable to mount a PVC that is stuck in a Terminating state.

Version-Release number of selected component (if applicable):
MTC 1.4.0
registry.redhat.io/rhmtc/openshift-migration-controller-rhel8@sha256:4a29345d11d4d7b8cc8a6a5395a398a5c5f92bff6e2ad396caf6dd73731a8f4d
registry.redhat.io/rhmtc/openshift-migration-rhel7-operator@sha256:51c38fd418c923992375c9ad18e5db1c14e6d77d3d7a02803df33c64c9bece2f

How reproducible:
Always

Steps to Reproduce:
1) Create a Pod that mounts a PVC.
2) Create a MigPlan that references the Pod and PVC with DVM=true.
3) Delete the PVC while it is still mounted by the Pod; the PVC will remain in Terminating.
4) Run a migration with the MigPlan.

Alternatively, you can use the automation below to reproduce it:
$ ansible-playbook -i inventory.cam.yml ocp-32834-pvc-terminating.yml -e @config/direct_copy_defaults.yml

Actual results:
DVM pods are stuck at ContainerCreating.
$ oc get pod -n ocp-32834-pvc-terminating
NAME                                              READY   STATUS              RESTARTS   AGE
directvolumemigration-rsync-transfer-nginx-html   0/1     ContainerCreating   0          29m
directvolumemigration-rsync-transfer-nginx-logs   0/1     ContainerCreating   0          29m
directvolumemigration-stunnel-transfer            1/1     Running             0          29m
nginx-deployment-6fd5f9ddf8-r6p9j                 1/1     Running             0          32m

Expected results:
The direct migration should fail after waiting for a period of time, or it should check the PVC status before starting the direct migration.

Additional info:
$ oc describe pod directvolumemigration-rsync-transfer-nginx-html -n ocp-32834-pvc-terminating
Events:
  Type     Reason       Age               From                                  Message
  ----     ------       ----              ----                                  -------
  Warning  FailedMount  20s (x3 over 4m)  kubelet, ip-172-18-4-43.ec2.internal  Unable to mount volumes for pod "directvolumemigration-rsync-transfer-nginx-html_ocp-32834-pvc-terminating(1c1ad6d5-56db-11eb-9e3c-0eb55368412b)": timeout expired waiting for volumes to attach or mount for pod "ocp-32834-pvc-terminating"/"directvolumemigration-rsync-transfer-nginx-html". list of unmounted volumes=[nginx-html]. list of unattached volumes=[nginx-html default-token-pm4vm]
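One way to implement the PVC status check suggested under Expected results is sketched below in Python. It is a hypothetical illustration, not mig-controller code: it assumes the PVC list is available as the JSON printed by `oc get pvc -n <namespace> -o json`, and `find_terminating_pvcs` and the sample PVC data are made up. It relies on standard Kubernetes behavior: a PVC that `oc get pvc` shows as Terminating has `metadata.deletionTimestamp` set, and it stays that way while the `kubernetes.io/pvc-protection` finalizer is held because a pod still uses the claim.

```python
import json
from typing import Iterable, List


def find_terminating_pvcs(pvc_list_json: str, wanted: Iterable[str]) -> List[str]:
    """Return the sorted names of PVCs in `wanted` that are being deleted.

    A PVC that `oc get pvc` reports as Terminating has
    metadata.deletionTimestamp set; it keeps that state while the
    kubernetes.io/pvc-protection finalizer is held by a pod using it.
    """
    wanted_set = set(wanted)
    terminating = []
    for pvc in json.loads(pvc_list_json).get("items", []):
        meta = pvc.get("metadata", {})
        if meta.get("name") in wanted_set and meta.get("deletionTimestamp"):
            terminating.append(meta["name"])
    return sorted(terminating)


# Hypothetical sample shaped like `oc get pvc -o json`; values are made up.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "nginx-html",
                      "deletionTimestamp": "2021-01-15T10:00:00Z",
                      "finalizers": ["kubernetes.io/pvc-protection"]},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "nginx-logs"},
         "status": {"phase": "Bound"}},
    ],
})

blocked = find_terminating_pvcs(sample, ["nginx-html", "nginx-logs"])
if blocked:
    # A real pre-check would fail the migration here instead of creating DVM pods.
    print(f"not starting direct volume migration, terminating PVCs: {blocked}")
```

Failing fast like this would avoid creating rsync pods that can never mount their volume, which is the situation shown in the FailedMount event above.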
This is something we'd like to fix but not something I'd consider to be critically severe.
With MTC migrating from 3.11 (controller cluster) to 4.7, the behavior is different. I'm not sure whether it has the same cause ("Unable to mount volumes for pod").
1. I did not see the DVM pods being created at all.
$ oc get migmigration 3db53140-6143-11eb-8e38-431dc1d3e8a0 -o yaml
....
status:
  conditions:
  - category: Advisory
    lastTransitionTime: "2021-01-28T08:32:41Z"
    message: 'Step: 36/47'
    reason: WaitForDirectVolumeMigrationToComplete
    status: "True"
    type: Running
  - category: Required
    lastTransitionTime: "2021-01-28T08:31:39Z"
    message: The migration is ready.
    status: "True"
    type: Ready
  itinerary: Final
  observedDigest: f19ac39779c0d0ee1443c3580dad86e38eafa02a1cb4bdff18cac9c14b520005
  phase: WaitForDirectVolumeMigrationToComplete
  pipeline:
  - completed: "2021-01-28T08:32:08Z"
    message: Completed
    name: Prepare
    started: "2021-01-28T08:31:39Z"
  - completed: "2021-01-28T08:32:33Z"
    message: Completed
    name: Backup
    progress:
    - 'Backup openshift-migration/3db53140-6143-11eb-8e38-431dc1d3e8a0-ddxlj: 76 out of estimated total of 76 objects backed up (15s)'
    started: "2021-01-28T08:32:08Z"
  - completed: "2021-01-28T08:32:39Z"
    message: Completed
    name: StageBackup
    started: "2021-01-28T08:32:33Z"
  - message: Skipped
    name: StageRestore
    skipped: true
  - completed: "2021-01-28T08:32:41Z"
    message: Waiting for Direct Image Migration to complete.
    name: DirectImage
    phase: WaitForDirectImageMigrationToComplete
    progress:
    - 1 total ImageStreams; 0 running; 1 successful; 0 failed
    - 'ImageStream ocp-django/django-psql-persistent (dism openshift-migration/3db53140-6143-11eb-8e38-431dc1d3e8a0-6v5nt-k8ngv): Completed '
    started: "2021-01-28T08:32:39Z"
  - name: DirectVolume
    phase: WaitForDirectVolumeMigrationToComplete
    progress:
    - 1 total volumes; 0 successful; 0 running; 0 failed
    started: "2021-01-28T08:32:41Z"
  - message: Not started
    name: Restore
  - message: Not started
    name: Cleanup
  startTimestamp: "2021-01-28T08:31:39Z"

$ oc get event -n ocp-django
......
1h   1h   1  django-psql-persistent-1-deploy.165e562824ce6e67  Pod  spec.containers{deployment}  Normal  Killing  kubelet, ip-172-18-7-104.ec2.internal  Killing container with id docker://deployment:Need to kill Pod
45m  45m  1  postgresql.165e584151ad2755  DeploymentConfig  Normal  ReplicationControllerScaled  deploymentconfig-controller  Scaled replication controller "postgresql-1" from 1 to 0
45m  45m  1  django-psql-persistent-1-vbdtm.165e58417b74be00  Pod  spec.containers{django-psql-persistent}  Normal  Killing  kubelet, ip-172-18-13-30.ec2.internal  Killing container with id docker://django-psql-persistent:Need to kill Pod
45m  45m  1  postgresql-1.165e58415a03fdc9  ReplicationController  Normal  SuccessfulDelete  replication-controller  Deleted pod: postgresql-1-j5kwc
45m  45m  1  django-psql-persistent-1.165e5841538da3c3  ReplicationController  Normal  SuccessfulDelete  replication-controller  Deleted pod: django-psql-persistent-1-vbdtm
45m  45m  1  django-psql-persistent.165e58414fdba442  DeploymentConfig  Normal  ReplicationControllerScaled  deploymentconfig-controller  Scaled replication controller "django-psql-persistent-1" from 1 to 0
45m  45m  1  postgresql-1-j5kwc.165e58416275175a  Pod  spec.containers{postgresql}  Normal  Killing  kubelet, ip-172-18-9-34.ec2.internal  Killing container with id docker://postgresql:Need to kill Pod
Upstream PR: https://github.com/konveyor/mig-controller/pull/958
The cherry-pick PR that brings the change into the release branch is: https://github.com/konveyor/mig-controller/pull/972
Verified. After talking with Jaydip: the fix only shows a warning in the UI, as below. The migration as a whole still stays stuck on purpose, so that once the problem is fixed, the migration can continue with the remaining phases instead of wasting time.

Warning alert: Paused - waiting for route to be admitted
Pods directvolumemigration-rsync-transfer-mysql/ocp-24769-cakephp are stuck in Pending state for more than 10 mins
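The verified behavior (warn once DVM pods have been Pending for more than 10 minutes, but keep the migration waiting) can be sketched as follows. This is an assumption-based Python illustration, not the code from the PRs above: it assumes pods are given as the `items` of `oc get pod -o json`, compares `status.phase` and `metadata.creationTimestamp` against a 10-minute threshold taken from the UI message, and the function names and sample pods/timestamps are hypothetical. The real UI warning may format the pod list differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Assumed threshold, taken from the UI text "more than 10 mins".
PENDING_THRESHOLD = timedelta(minutes=10)


def parse_k8s_time(ts: str) -> datetime:
    """Parse a Kubernetes timestamp such as "2021-01-28T08:32:41Z" (UTC)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_pending_pods(pods: List[Dict], now: datetime,
                       threshold: timedelta = PENDING_THRESHOLD) -> List[str]:
    """Return "namespace/name" for every pod Pending for longer than threshold."""
    stuck = []
    for pod in pods:
        if pod.get("status", {}).get("phase") != "Pending":
            continue
        meta = pod["metadata"]
        if now - parse_k8s_time(meta["creationTimestamp"]) > threshold:
            stuck.append(f'{meta.get("namespace", "")}/{meta["name"]}')
    return stuck


def pending_warning(pods: List[Dict], now: datetime,
                    threshold: timedelta = PENDING_THRESHOLD) -> Optional[str]:
    """Build a warning message; the migration itself is NOT failed here."""
    stuck = stuck_pending_pods(pods, now, threshold)
    if not stuck:
        return None
    minutes = int(threshold.total_seconds() // 60)
    return (f"Pods {', '.join(stuck)} are stuck in Pending state "
            f"for more than {minutes} mins")


# Hypothetical sample data; timestamps are made up for illustration.
pods = [
    {"metadata": {"name": "directvolumemigration-rsync-transfer-mysql",
                  "namespace": "ocp-24769-cakephp",
                  "creationTimestamp": "2021-01-28T08:40:00Z"},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "directvolumemigration-stunnel-transfer",
                  "namespace": "ocp-24769-cakephp",
                  "creationTimestamp": "2021-01-28T08:40:00Z"},
     "status": {"phase": "Running"}},
]
now = parse_k8s_time("2021-01-28T09:00:00Z")
print(pending_warning(pods, now))
# Pods ocp-24769-cakephp/directvolumemigration-rsync-transfer-mysql are stuck in Pending state for more than 10 mins
```

Returning a message instead of raising keeps the migration in its current phase, matching the behavior described in the verification comment: if the PVC problem is resolved, the migration can resume without being restarted.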
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Migration Toolkit for Containers (MTC) image release advisory 1.4.2), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0814