Created attachment 1804636 [details]
More details about the Pod in NotReady state

Description of problem:
1) Used a Fedora 32 containerDisk image.
2) Created a 2.5GB /home/fedora/disksump.img inside the Fedora VMI via the cloud-init "runcmd".
3) Created 100 VMI and 100 VMIM objects to live migrate each of them in a loop.

A frequently recurring issue is that live migration gets stuck:
a) A VMIM is applied for the VMI.
b) A new VMI pod is created and enters the Running state on the target node.
c) The old VMI pod enters the NotReady state:

[kbidarka@localhost vmim]$ oc get pods virt-launcher-vm8-fedora-gmgjt -n kbidarka
NAME                             READY   STATUS     RESTARTS   AGE
virt-launcher-vm8-fedora-gmgjt   1/2     NotReady   0          7h1m

d) When the VMI live migrates, the old VMI pod fails to get deleted (in this case the compute container in the pod) and goes from Running to NotReady, even though a new VMI pod is created successfully.
e) Overall, however, the status of this NotReady pod is "Running" when looking at pod.status.phase.
f) When a new VMIM object is applied for a second live migration, the migration never happens: the VMIM object stays in the "Pending" phase and fails to create a new VMI pod.

Assumption here: my guess is that, because we do not allow a live migration (or new VMI pod creation) to happen when there are already 2 VMI pods in the Running phase, we get into this state.

----

As seen below, there is a qemu zombie process:

[root@virt-launcher-vm8-fedora-gmgjt /]# ps -ef | grep -i qemu
root          1      0  0 09:37 ?        00:00:05 /usr/bin/virt-launcher --qemu-timeout 5m --name vm8-fedora --uid 43e56583-4bd3-4348-804a-7a6a6ab743cd --namespace kbidarka --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF
root         20      1  0 09:37 ?        00:00:25 /usr/bin/virt-launcher --qemu-timeout 5m --name vm8-fedora --uid 43e56583-4bd3-4348-804a-7a6a6ab743cd --namespace kbidarka --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF --no-fork true
qemu         96      1  0 09:37 ?        00:00:11 [qemu-kvm] <defunct>

Version-Release number of selected component (if applicable):
CNV-2.6.6

How reproducible:
Setup: all 3 worker nodes had 100% CPU utilization and 125% CPU saturation, using a stress-ng image.

Steps to Reproduce:
1) Used a Fedora 32 containerDisk image.
2) Created a 2.5GB /home/fedora/disksump.img inside the Fedora VMI via the cloud-init "runcmd".
3) Created 100 VMI and 100 VMIM objects to live migrate each of them in a loop (see the sketch after "Additional info" below).

Actual results:
1) The VMI pod does not terminate successfully and leaves a qemu zombie process, as seen below.
2) Live migration is stuck.

[root@virt-launcher-vm8-fedora-gmgjt /]# ps -ef | grep -i qemu
root          1      0  0 09:37 ?        00:00:05 /usr/bin/virt-launcher --qemu-timeout 5m --name vm8-fedora --uid 43e56583-4bd3-4348-804a-7a6a6ab743cd --namespace kbidarka --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF
root         20      1  0 09:37 ?        00:00:25 /usr/bin/virt-launcher --qemu-timeout 5m --name vm8-fedora --uid 43e56583-4bd3-4348-804a-7a6a6ab743cd --namespace kbidarka --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF --no-fork true
qemu         96      1  0 09:37 ?        00:00:11 [qemu-kvm] <defunct>

Expected results:
1) The VMI pod terminates successfully and there is no qemu zombie process.
2) Live migration is not stuck.

Additional info:
Ran many loops of VMIM object creation and deletion.
1) The issue happens in almost all the runs.
2) But only in 1-3% (1-3 out of 100) of the VMI pods.
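For reference, a minimal sketch of the kind of loop used to create the VMIM objects; the apiVersion, object names, and namespace here are illustrative assumptions, not the exact manifests from this run:

# Create one VirtualMachineInstanceMigration (VMIM) per VMI; adjust the
# apiVersion if the cluster serves a different KubeVirt API version.
for i in $(seq 1 100); do
  cat <<EOF | oc create -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  generateName: migration-vm${i}-fedora-
  namespace: kbidarka
spec:
  vmiName: vm${i}-fedora
EOF
done

The migrations can then be watched with "oc get vmim -n kbidarka" until they reach a terminal phase.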
Very interesting. It appears that the live migration completes successfully, but the termination of the source qemu process fails, which in turn causes the source virt-launcher to never exit.

On the source pod we see this log line telling us libvirt was unable to terminate the qemu process:

{"component":"virt-launcher","level":"error","msg":"Failed to terminate process 96 with SIGKILL: Device or resource busy","pos":"virProcessKillPainfullyDelay:431","subcomponent":"libvirt","thread":"35","timestamp":"2021-07-22T10:11:02.109000Z"}

Then a few lines later we see that the migration did succeed:

{"component":"virt-launcher","kind":"","level":"info","msg":"Live migration succeeded.","name":"vm8-fedora","namespace":"kbidarka","pos":"manager.go:589","timestamp":"2021-07-22T10:11:02.130573Z","uid":"43e56583-4bd3-4348-804a-7a6a6ab743cd"}

Since the old source pod never terminates, that prevents new migrations from occurring, due to a validation check that ensures a new migration can't be started if more than one VMI pod is active.

There are quite a few reasons a process can become defunct, especially under load. Maybe we should consider having virt-launcher exit if it detects the qemu process is in a defunct state. That would leave it up to the kubelet to handle cleanup.
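For illustration, the defunct state can be confirmed from the process state in /proc; a minimal sketch, assuming the source pod, compute container, and qemu PID (96) from the output in this report:

# A defunct process reports "State: Z (zombie)" in /proc/<pid>/status.
oc exec -n kbidarka virt-launcher-vm8-fedora-gmgjt -c compute -- \
  grep '^State:' /proc/96/status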
(In reply to David Vossel from comment #1)
> There are quite a few reasons a process can become defunct, especially
> under load. Maybe we should consider having virt-launcher exit if it
> detects the qemu process is in a defunct state. That would leave it up to
> the kubelet to handle cleanup.

+1, or send an alert. This looked like a temporary hang due to CPU overutilization. In this case, running kill -s SIGCHLD 1 in the virt-launcher fixed it, but that may not always be the case.
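For reference, the SIGCHLD workaround above can also be sent from outside the pod; a sketch assuming the pod and container names from this report. It forces PID 1 (virt-launcher) to reap the defunct qemu child, but as noted it may not help in every case:

# Use bash's builtin kill inside the compute container to signal PID 1.
oc exec -n kbidarka virt-launcher-vm8-fedora-gmgjt -c compute -- \
  bash -c 'kill -s SIGCHLD 1'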
Setting a severity of High as this can be disruptive to cluster operations, e.g. node drain for maintenance/upgrade.
upstream PR is posted: https://github.com/kubevirt/kubevirt/pull/6163
To verify, follow the steps to reproduce. Keep in mind this bug is somewhat rare, so multiple reproduction attempts should be conducted.
Tried reproducing this:
a) Created 100 VMs.
b) Live migrated them twice using VMIM objects; no issues seen.

Verified with virt-operator-container-v4.9.0-35.
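For reference, a quick check to confirm that no source pods were left behind after such a run; a sketch assuming the default kubevirt.io=virt-launcher pod label and the namespace from this report:

# Empty output means no launcher pod is stuck in NotReady (the symptom from
# this bug); any match would indicate a leftover source pod.
oc get pods -n kbidarka -l kubevirt.io=virt-launcher --no-headers | awk '$3 == "NotReady"'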
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4104