Bug 1663859 - qemu monitor of the domain is getting blocked when the VM is getting destroyed
Summary: qemu monitor of the domain is getting blocked when the VM is getting destroyed
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.6
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Markus Armbruster
QA Contact: FuXiangChun
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-07 08:44 UTC by nijin ashok
Modified: 2020-06-11 02:20 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-22 20:30:04 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
libvirt log (1.35 MB, application/gzip)
2019-01-16 20:19 UTC, nijin ashok

Description nijin ashok 2019-01-07 08:44:27 UTC
Description of problem:

If libvirt sends VIR_DOMAIN_DESTROY_GRACEFUL to the domain, it may take some time for the VM to go down if there is a lot of pending I/O to flush for the VM (typically VMs with a large amount of RAM). During this time, qemu monitor commands are blocked, so any call from a management application that sends qemu monitor commands through the libvirt API is blocked as well. The issue was observed in a RHV environment, where the majority of calls from vdsm got stuck, filling up the relevant task queue and causing problems across the RHV environment.

Bug 1660451 is already open against RHV and contains detailed log analysis on the RHV side. However, we would like to know whether there is a straightforward way to avoid this situation in libvirt/qemu.
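The serialization can be pictured with a toy model: a single per-domain job lock is taken both by the destroy job and by monitor queries, so a slow I/O flush stalls every query issued in that window. This is a minimal stdlib sketch; the lock and the timings are illustrative assumptions, not libvirt code:

```python
import threading
import time

# Toy model of a per-domain job lock (illustrative, not real libvirt code):
# the graceful destroy holds the lock while dirty I/O is flushed, so any
# monitor query issued in that window must wait for the lock.
job_lock = threading.Lock()

def graceful_destroy(flush_seconds):
    with job_lock:                  # destroy owns the lock for the whole flush
        time.sleep(flush_seconds)   # stands in for flushing pending guest I/O

def monitor_query():
    start = time.monotonic()
    with job_lock:                  # e.g. "info qtree" waits here
        pass
    return time.monotonic() - start

destroyer = threading.Thread(target=graceful_destroy, args=(0.5,))
destroyer.start()
time.sleep(0.1)                     # let the destroy grab the lock first
waited = monitor_query()
destroyer.join()
print(f"query was blocked for ~{waited:.1f}s")
```

In the real environment the flush took tens of seconds, so every queued monitor call waited that long.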

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-2.12.0-18.el7_6.1.x86_64

How reproducible:

100%. Observed in RHV environment.

Steps to Reproduce:

Run qemu monitor commands while a VIR_DOMAIN_DESTROY_GRACEFUL is being processed for the domain.

Actual results:

The qemu monitor of the domain is blocked while the VM is being destroyed.

Expected results:

Calls should not be blocked.

Additional info:

Comment 2 Markus Armbruster 2019-01-09 08:59:40 UTC
The reproducer is not detailed enough for me to reproduce anything (mind, I'm a VDSM ignoramus).  Even if it had more detail, a reproducer that works several layers up the stack requires me to peel off the layers myself to get to the layer I'm supposed to debug.  I commonly need the reporter's help to do that when it's more than one layer.

Your libvirt logs *might* be enough for me to make sense of the bug.  Please provide a reproducer in terms of qemu-kvm, or at least your libvirt logs.

Comment 3 FuXiangChun 2019-01-15 10:23:59 UTC
QE can't reproduce this bug. Detailed steps are below; please let me know if there is any problem with them.

1) Enable hugepages on the host:
#echo 10240 >  /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

2) Boot a RHEL 7.6 guest with hugepages. This is the relevant part of the domain XML:

<memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
    <locked/>
  </memoryBacking>
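For reference, the reservation in step 1 comfortably covers this guest; a quick arithmetic check (plain Python, not part of the reproducer):

```python
# How many 2 MiB huge pages the guest above needs (values from the XML and step 1).
guest_kib = 4194304        # <memory unit='KiB'>4194304</memory>
page_kib = 2048            # <page size='2048' unit='KiB'/>
reserved = 10240           # nr_hugepages set in step 1
pages_needed = guest_kib // page_kib
print(pages_needed, reserved >= pages_needed)  # → 2048 True
```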

3) Run an I/O stress script inside the guest:

#cat io-stress.sh
dd if=/dev/random of=file1 iflag=fullblock bs=1M count=2000 &
dd if=/dev/random of=file2 iflag=fullblock bs=1M count=2000 &
dd if=/dev/random of=file3 iflag=fullblock bs=1M count=2000 &
dd if=/dev/random of=file4 iflag=fullblock bs=1M count=2000 &
dd if=/dev/random of=file5 iflag=fullblock bs=1M count=2000 &

4) Run the qemu monitor command continuously:
while true;do virsh qemu-monitor-command rhel7 --hmp "info qtree";done

5) Shut down the guest from the host:
#virsh destroy rhel7

Result:

Steps 4 and 5 work well; no blocking was observed.

Comment 4 nijin ashok 2019-01-16 20:16:50 UTC
(In reply to Markus Armbruster from comment #2)
> The reproducer is not detailed enough for me to reproduce anything (mind,
> I'm a VDSM ignoramus).  Even if it had more detail, a reproducer that works
> several layers up the stack requires me to peel off the layers myself to get
> to the layer I'm supposed to debug.  I commonly need the reporter's help to
> do that when it's more than one layer.
> 
> Your libvirt logs *might* be enough for me to make sense of the bug.  Please
> provide a reproducer in terms qemu-kvm, or at least your libvirt logs.

Sorry Markus, I thought my original RHV bug would help in understanding the issue.

Here are reproducer steps directly on KVM (no RHV involved).

Steps to Reproduce:

[1] I used slowfs (https://github.com/nirs/slowfs) on the directory holding the disk image to add delay to write operations; I set "write" to "0.001". This makes VIR_DOMAIN_DESTROY_GRACEFUL take a while to complete, since it flushes the pending I/O and that is slow against the throttled storage.

[2] Started a while loop to execute qemu-monitor command every second.

while true;do virsh qemu-monitor-command rhel7.5 --hmp "info qtree";date;sleep 1;done

[3] In the VM, ran the stress script mentioned in comment #3.

[4] Once the VM had accumulated a large number of dirty pages, I ran "virsh destroy rhel7.5 --graceful".

[5] The qemu-monitor command got stuck immediately after I ran the destroy, and it only returned once the domain had been destroyed.

===
The query that was issued before the "destroy" completed successfully.

2019-01-16 13:58:25.548+0000: 32959: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7f49e0017930 buf={"execute":"human-monitor-command","arguments":{"command-line":"info qtree"},"id":"libvirt-254"}^M
2019-01-16 13:58:25.552+0000: 32963: debug : qemuDomainObjExitMonitorInternal:7074 : Exited monitor (mon=0x7f49e0017930 vm=0x7f4994193c40 name=rhel7.5)
2019-01-16 13:58:25.552+0000: 32963: debug : qemuDomainObjEndJob:6935 : Stopping job: query (async=none vm=0x7f4994193c40 name=rhel7.5)

However, the next one got blocked.

===
2019-01-16 13:58:26.570+0000: 32964: info : qemuMonitorSend:1083 : QEMU_MONITOR_SEND_MSG: mon=0x7f49e0017930 msg={"execute":"human-monitor-command","arguments":{"command-line":"info qtree"},"id":"libvirt-255"}^M
2019-01-16 13:58:26.570+0000: 32959: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7f49e0017930 buf={"execute":"human-monitor-command","arguments":{"command-line":"info qtree"},"id":"libvirt-255"}^M

destroy called...

2019-01-16 13:58:26.751+0000: 32960: debug : virDomainDestroyFlags:524 : dom=0x7f49f0001360, (VM: name=rhel7.5, uuid=85ea4436-0295-474c-83d8-9c85ca7dafae), flags=0x1


2019-01-16 13:58:47.080+0000: 32962: debug : qemuDomainObjExitMonitorInternal:7074 : Exited monitor (mon=0x7f49e0017930 vm=0x7f4994193c40 name=rhel7.5)
2019-01-16 13:58:47.080+0000: 32962: debug : qemuDomainObjEndJob:6935 : Stopping job: query (async=none vm=0x7f4994193c40 name=rhel7.5)
===

So during the "destroy", the monitor does not respond to commands. This is the issue we are facing, and we would like to know whether it can be improved.

In RHV, many periodic tasks run these monitor commands. The issue was observed in a customer environment where the "destroy" of one VM took 59 seconds, which caused the RHV monitoring queue to fill up and marked all VMs "not responding"; this was reported in bug 1660451.
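The blocked window can be measured directly from the log excerpt above, assuming the 13:58:26.570 SEND_MSG line marks when query libvirt-255 was issued and the 13:58:47.080 "Exited monitor" line marks when it completed:

```python
from datetime import datetime

# Timestamps copied from the libvirtd debug log quoted above.
sent = datetime.fromisoformat("2019-01-16 13:58:26.570")
done = datetime.fromisoformat("2019-01-16 13:58:47.080")
print((done - sent).total_seconds())  # → 20.51 (seconds the query was blocked)
```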


Attaching the libvirtd debug log.

Comment 5 nijin ashok 2019-01-16 20:19:32 UTC
Created attachment 1521112 [details]
libvirt log

Comment 6 FuXiangChun 2019-01-18 10:50:13 UTC
QE can't reproduce this bug yet. Detailed steps are below; please correct me if any step is wrong. Thanks.

1. Boot RHEL7.6 guest with virsh.

2. Run slowfs inside the guest, e.g.:

#slowfs -c slowfs.cfg.example /realfs /slowfs

3. Run the I/O stress script from comment 3:

#./io-stress.sh

4. Start a while loop executing a qemu-monitor command every second:

#while true;do virsh qemu-monitor-command rhel7 --hmp "info qtree";date;sleep 1;done

5. Run "virsh destroy rhel7 --graceful".

Result:

Step 5 is successful; no blocking was observed.

Comment 7 FuXiangChun 2019-01-21 05:37:54 UTC
Highlight: the "destroy" command does not have any delay (the response is very fast) in my testing, so I cannot reproduce the long window described in comment 4 (e.g. the 59 seconds).

Comment 8 nijin ashok 2019-01-28 01:22:12 UTC
(In reply to FuXiangChun from comment #6)
> QE cann't reproduce this bug yet. This is detailed steps. please correct me
> if my steps is wrong, Thanks.
> 
> 1. Boot RHEL7.6 guest with virsh.
> 
> 2. run slowfs inside guest e.g
> 
> #slowfs -c slowfs.cfg.example /realfs /slowfs
> 

You have to run slowfs on the host, on the directory where the image file is located. I also tuned the "vm.dirty_background_ratio", "vm.dirty_ratio", and "vm.dirty_expire_centisecs" parameters within the VM to build up a large dirty page cache before shutdown.
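For anyone retrying the reproducer, the guest-side tuning mentioned above might look like this; the specific values are illustrative assumptions, not taken from this report:

```shell
# Let the guest accumulate a large dirty page cache before shutdown.
# Values are illustrative only; tune them to the guest's RAM size.
sysctl -w vm.dirty_background_ratio=60     # postpone background writeback
sysctl -w vm.dirty_ratio=80                # allow most of RAM to hold dirty pages
sysctl -w vm.dirty_expire_centisecs=30000  # keep dirty pages for up to 300 s
```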

