Created attachment 1804636 [details]
More details about the Pod in NotReady state

Description of problem:
1) Used a Fedora 32 containerDisk image.
2) Created a 2.5GB /home/fedora/disksump.img inside the Fedora VMI via the cloud-init "runcmd".
3) Created 100 VMI and 100 VMIM objects to live migrate each of them in a loop.

A frequently recurring issue is that live migration gets stuck:
a) A VMIM is applied for the VMI.
b) A new VMI pod is created and enters the Running state on the target node.
c) The old VMI pod enters the NotReady state:

[kbidarka@localhost vmim]$ oc get pods virt-launcher-vm8-fedora-gmgjt -n kbidarka
NAME                             READY   STATUS     RESTARTS   AGE
virt-launcher-vm8-fedora-gmgjt   1/2     NotReady   0          7h1m

d) When the VMI live migrates, the old VMI pod fails to get deleted (in this case the compute container in the pod) and goes from Running to NotReady, even though a new VMI pod is created successfully.
e) Overall, however, the status of this NotReady pod is "Running" when looking at pod.status.phase.
f) When a new VMIM object is applied for a second live migration, the migration never happens: the VMIM object stays in the "Pending" phase and fails to create a new VMI pod.

Assumption here: my guess is that, because we do not allow a live migration (or new VMI pod creation) to happen when there are already 2 VMI pods in the Running phase, we get into this state.

----

As seen below, there is a qemu zombie process:

[root@virt-launcher-vm8-fedora-gmgjt /]# ps -ef | grep -i qemu
root          1      0  0 09:37 ?        00:00:05 /usr/bin/virt-launcher --qemu-timeout 5m --name vm8-fedora --uid 43e56583-4bd3-4348-804a-7a6a6ab743cd --namespace kbidarka --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF
root         20      1  0 09:37 ?        00:00:25 /usr/bin/virt-launcher --qemu-timeout 5m --name vm8-fedora --uid 43e56583-4bd3-4348-804a-7a6a6ab743cd --namespace kbidarka --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF --no-fork true
qemu         96      1  0 09:37 ?        00:00:11 [qemu-kvm] <defunct>

Version-Release number of selected component (if applicable):
CNV-2.6.6

How reproducible:
Setup: all 3 worker nodes had 100% CPU utilization and 125% CPU saturation, using a stress-ng image.

Steps to Reproduce:
1) Used a Fedora 32 containerDisk image.
2) Created a 2.5GB /home/fedora/disksump.img inside the Fedora VMI via the cloud-init "runcmd".
3) Created 100 VMI and 100 VMIM objects to live migrate each of them in a loop (see the sketch after "Additional info" below).

Actual results:
1) The VMI pod does not terminate successfully and leaves a qemu zombie process, as seen below.
2) Live migration is stuck.

[root@virt-launcher-vm8-fedora-gmgjt /]# ps -ef | grep -i qemu
root          1      0  0 09:37 ?        00:00:05 /usr/bin/virt-launcher --qemu-timeout 5m --name vm8-fedora --uid 43e56583-4bd3-4348-804a-7a6a6ab743cd --namespace kbidarka --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF
root         20      1  0 09:37 ?        00:00:25 /usr/bin/virt-launcher --qemu-timeout 5m --name vm8-fedora --uid 43e56583-4bd3-4348-804a-7a6a6ab743cd --namespace kbidarka --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF --no-fork true
qemu         96      1  0 09:37 ?        00:00:11 [qemu-kvm] <defunct>

Expected results:
1) The VMI pod terminates successfully and there is no qemu zombie process.
2) Live migration is not stuck.

Additional info:
Ran many loops of VMIM object creation and deletion.
1) The issue happens in almost all the runs.
2) But only in 1-3% (1-3 out of 100) of the VMI pods.
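For reference, a minimal sketch of the kind of loop used to create the VMIM objects; the apiVersion, object names, and namespace here are illustrative assumptions, not the exact manifests from this run:

# Create one VirtualMachineInstanceMigration (VMIM) per VMI; adjust the
# apiVersion if the cluster serves a different KubeVirt API version.
for i in $(seq 1 100); do
  cat <<EOF | oc create -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  generateName: migration-vm${i}-fedora-
  namespace: kbidarka
spec:
  vmiName: vm${i}-fedora
EOF
done

The migrations can then be watched with "oc get vmim -n kbidarka" until they reach a terminal phase.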
Very interesting. It appears that the live migration completes successfully, but the termination of the source qemu process fails, which in turn causes the source virt-launcher to never exit.

On the source pod we see this log line telling us libvirt was unable to terminate the qemu process:

{"component":"virt-launcher","level":"error","msg":"Failed to terminate process 96 with SIGKILL: Device or resource busy","pos":"virProcessKillPainfullyDelay:431","subcomponent":"libvirt","thread":"35","timestamp":"2021-07-22T10:11:02.109000Z"}

Then a few lines later we see that the migration did succeed:

{"component":"virt-launcher","kind":"","level":"info","msg":"Live migration succeeded.","name":"vm8-fedora","namespace":"kbidarka","pos":"manager.go:589","timestamp":"2021-07-22T10:11:02.130573Z","uid":"43e56583-4bd3-4348-804a-7a6a6ab743cd"}

Since the old source pod never terminates, that prevents new migrations from occurring, due to a validation check that ensures a new migration can't be started if more than one VMI pod is active.

There are quite a few reasons a process can become defunct, especially under load. Maybe we should consider having virt-launcher exit if it detects the qemu process is in a defunct state. That would leave it up to the kubelet to handle cleanup.
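For illustration, the defunct state can be confirmed from the process state in /proc; a minimal sketch, assuming the source pod, compute container, and qemu PID (96) from the output in this report:

# A defunct process reports "State: Z (zombie)" in /proc/<pid>/status.
oc exec -n kbidarka virt-launcher-vm8-fedora-gmgjt -c compute -- \
  grep '^State:' /proc/96/status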
(In reply to David Vossel from comment #1)
> There are quite a few reasons a process can become defunct, especially
> under load. Maybe we should consider having virt-launcher exit if it
> detects the qemu process is in a defunct state. That would leave it up to
> the kubelet to handle cleanup.

+1, or send an alert. This looked like a temporary hang due to CPU overutilization. In this case, running kill -s SIGCHLD 1 in the virt-launcher fixed it, but that may not always be the case.
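For reference, the SIGCHLD workaround above can also be sent from outside the pod; a sketch assuming the pod and container names from this report. It forces PID 1 (virt-launcher) to reap the defunct qemu child, but as noted it may not help in every case:

# Use bash's builtin kill inside the compute container to signal PID 1.
oc exec -n kbidarka virt-launcher-vm8-fedora-gmgjt -c compute -- \
  bash -c 'kill -s SIGCHLD 1'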
Setting a severity of High as this can be disruptive to cluster operations, e.g. node drain for maintenance/upgrade.
upstream PR is posted: https://github.com/kubevirt/kubevirt/pull/6163
To verify, follow the steps to reproduce. Keep in mind this bug is somewhat rare, so multiple reproduction attempts should be conducted.
Tried reproducing this:
a) Created 100 VMs.
b) Live migrated them twice using VMIM objects; no issues seen.

Verified with virt-operator-container-v4.9.0-35.
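For reference, a quick check to confirm that no source pods were left behind after such a run; a sketch assuming the default kubevirt.io=virt-launcher pod label and the namespace from this report:

# Empty output means no launcher pod is stuck in NotReady (the symptom from
# this bug); any match would indicate a leftover source pod.
oc get pods -n kbidarka -l kubevirt.io=virt-launcher --no-headers | awk '$3 == "NotReady"'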
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4104