Bug 1929144

Summary: fix qemu-ga-win resource leaks
Product: Red Hat Enterprise Linux 8 Reporter: Basil Salman <bsalman>
Component: virtio-win Assignee: Virtualization Maintenance <virt-maint>
virtio-win sub component: qemu-ga-win QA Contact: dehanmeng <demeng>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: ailan, coli, demeng, gveitmic, jortialc, lijin, lmiksik, mdean, sbonazzo, vrozenfe, yvugenfi, zhguo
Version: 8.4 Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Windows   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1935246 (view as bug list) Environment:
Last Closed: 2021-05-18 16:25:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1915198    
Bug Blocks: 1935246, 1985906    

Description Basil Salman 2021-02-16 10:01:01 UTC
Description of problem:
As part of BZ#1920342, qemu-ga-win was reported to leak resources (memory and handles).
The leaks were found in "guest-get-osinfo" and "guest-get-devices".

Comment 3 dehanmeng 2021-02-22 14:43:12 UTC
I recently hit an uncommon BSOD while testing the guest agent on a win10-20H20_x86 guest. The case is 42-Host_RHEL.m8.u4.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_os_info.q35. It made me think of this BZ, so I'm trying to collect a dump file, but the problem is not easy to reproduce: I have run the test 200 times so far and will keep going. I'm just recording the situation here for tracking. Feel free to correct me if I have misunderstood something.

complete log: http://fileshare.englab.nay.redhat.com/pub/logs/win10_20H2_32_guest-agent/test-results/42-Host_RHEL.m8.u4.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_os_info.q35/

packages version:
    kernel-4.18.0-278.el8.dt3.x86_64
    qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64
    virtio-win-prewhql-193
    seabios-bin-1.14.0-1.module+el8.4.0+8855+a9e237a9.noarch
    RHEL-8.4.0-20210128.d.2

Comment 4 dehanmeng 2021-02-23 03:42:53 UTC
(In reply to dehanmeng from comment #3)
> recently hit a uncommon problem 'bsod' for testing guest-agent on
> win10-20H20_x86 guest, case is
> 42-Host_RHEL.m8.u4.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.
> x86_64.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_os_info.
> q35, so I thought about this bz and I'm trying to collect dump file, but
> it's not easy to get, because I've been testing it 200 times, keep doing it.
> just update the situation here as tracker. feel free to correct me if I
> misunderstand something. 

Correct dump log link: http://fileshare.englab.nay.redhat.com/pub/logs/win10_20H2_32_guest_agent/test-results/42-Host_RHEL.m8.u4.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.qemu_guest_agent.virtio_serial.check_os_info.q35/

Comment 10 Basil Salman 2021-02-24 22:38:25 UTC
Hi Dehan,

Scratch build with changes:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35146591
Can this build be verified for this bug too?

Thanks in advance,
Basil

Comment 11 dehanmeng 2021-02-25 00:59:04 UTC
(In reply to Basil Salman from comment #10)
> Hi Dehan,
> 
> Scratch build with changes:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35146591
> can this build be verified for this bug too?
> 
> Thanks in advance,
> Basil

Hi Basil,
Sure, I will check with this new build and update the result as soon as possible. Thanks for your effort, Basil.

BR
Dehan

Comment 12 dehanmeng 2021-02-25 13:34:36 UTC
Hi Basil, the qga command {"guest-get-disks"} works well with the new build.

Reproduced with the previous build, mingw-qemu-ga-win-101.2.0-1.el7ev.
Steps to reproduce:
1. Boot a win2019 guest with serial and qga.
2. Connect to the guest and execute qga commands by running the following command in a terminal on the host:
cat <(for i in {1..10000}; do echo '{"execute":"guest-get-osinfo"}'; sleep 2; echo '{"execute":"guest-get-devices"}'; sleep 2; done)  | nc -U /tmp/qga.sock > log
3. Check the handles of the QEMU-ga process (by PID) with the following command in cmd on the guest:
C:> handle.exe -s -p $qga_PID

Actual result:
The handle count rises without limit (resource leak).
Expected result:
The handle count stays at a stable value.

Verified with this build: qemu-ga-win-102.0.0-1.el8.noarch.rpm
Steps to verify:
As above.

Actual result:
The handle count stays at a stable value.
Expected result:
As above.
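
The host-side loop in step 2 can also be sketched in Python, for cases where the request stream needs to be built or inspected programmatically. This is only an illustration, not part of the test suite: the helper names `qga_requests` and `send_requests` are hypothetical, while the socket path `/tmp/qga.sock`, the two commands, and the 2-second delay are taken from the shell one-liner above.

```python
import json
import socket
import time

# The two commands reported as leaking in this bug.
COMMANDS = ("guest-get-osinfo", "guest-get-devices")


def qga_requests(iterations, commands=COMMANDS):
    """Yield newline-terminated QGA requests, alternating the commands."""
    for _ in range(iterations):
        for name in commands:
            yield json.dumps({"execute": name}) + "\n"


def send_requests(sock_path, iterations, delay=2.0):
    """Send the requests over the guest agent's UNIX socket (hypothetical helper).

    Reply data is read and discarded; only the request traffic matters here.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        for request in qga_requests(iterations):
            sock.sendall(request.encode("utf-8"))
            sock.recv(65536)  # read whatever reply data has arrived
            time.sleep(delay)
```

Calling `send_requests("/tmp/qga.sock", 10000)` on the host would roughly mirror the one-liner, except that replies are discarded instead of being written to `log`.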

Comment 13 dehanmeng 2021-02-25 13:37:11 UTC
(In reply to dehanmeng from comment #12) 
> reproduce with previous mingw-qemu-ga-win-101.2.0-1.el7ev
> step to reproduce:
> 1. boot up win2019 guest with serial and qga.
> 2. connect with guest and execute qga command, run following command in
> terminal (host).
> cat <(for i in {1..10000}; do echo '{"execute":"guest-get-osinfo"}'; sleep
> 2; echo '{"execute":"guest-get-devices"}'; sleep 2; done)  | nc -U
> /tmp/qga.sock > log
> 3.check the PID of QEMU-ga in cmd with following command (guest):
> C:> handle.exe -s -p $qga_PID
> 
> Actual result:
> handles raise up without limit, resource leak.
> Expected result:
> handles maintain a stable value.
> 
> Verify with this build qemu-ga-win-102.0.0-1.el8.noarch.rpm
> step to verify:
> as above
> 
> Actual result:
> handles maintain a stable value.
> Expected result:
> as above.

Hi Basil, the qga commands {"guest-get-osinfo"} and {"guest-get-devices"} work well with the new build. I did not observe any resource leak.

Comment 14 Germano Veit Michel 2021-03-01 21:14:20 UTC
The test build given to the customer in BZ1920342 was tested by the customer and it fixed the handle leak.

Comment 18 Basil Salman 2021-03-06 19:58:23 UTC
Build that resolves this bug:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35297151

Comment 19 dehanmeng 2021-03-07 12:55:52 UTC
Reproduced with the previous build, mingw-qemu-ga-win-101.2.0-1.el7ev.
Steps to reproduce:
1. Boot a win10-64 (q35) guest with serial and qga.
2. Check the PID and the starting handle count of qga in the guest.
# win+R --> 'taskmgr' --> More details --> right click 'Name' and choose 'PID', 
# C:> handle.exe -s -p $PID
3. Connect to the guest and execute qga commands by running the following command in a terminal on the host:
cat <(for i in {1..10000}; do echo '{"execute":"guest-get-osinfo"}'; sleep 2; echo '{"execute":"guest-get-devices"}'; sleep 2; done)  | nc -U /tmp/qga.sock > log
4. After several minutes, check the ending handle count of QEMU-ga with the following command in cmd:
# C:> handle.exe -s -p $qga_PID

Actual result:
The handle count rises without limit. It should stay below 300, but it keeps growing, which indicates a resource leak.

Nthandle v4.22 - Handle Viewer
Copyright (C) 1997-2019 Mark Russinovich
Sysinternals - www.sysinternals.com
Handle type summary:
  ALPC Port                                   :5
  Desktop                                      :1
  Directory                                     :2
  ...
  ...
Total Handles: 437

Expected result:
The handle count stays at a stable value.

Verified with this build, qemu-ga-win-102.0.0-1.el8.noarch.rpm, on win10-64 (q35).
Steps to verify:
As above.

Actual result:
The handle count rises by about 5% and then stays at a stable value.
Nthandle v4.22 - Handle Viewer
Copyright (C) 1997-2019 Mark Russinovich
Sysinternals - www.sysinternals.com
Handle type summary:
  ALPC Port                                   :5
  Desktop                                      :1
  Directory                                     :2
  ...
  ...
Total Handles: 177
Expected result:
as above.
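
The pass/fail judgment in this comment (a handle count that grows without bound versus one that rises by about 5% and then stays flat) can be expressed as a small check over two `handle.exe -s` outputs. This is a minimal sketch, assuming the summary contains a `Total Handles: N` line as in the output quoted above; the function names and the 10% default tolerance are illustrative assumptions, not part of any existing test case.

```python
import re

# Matches the summary line printed by `handle.exe -s`, e.g. "Total Handles: 437".
TOTAL_RE = re.compile(r"^Total Handles:\s*(\d+)\s*$", re.MULTILINE)


def total_handles(summary_text):
    """Return the handle count from a `handle.exe -s` summary."""
    match = TOTAL_RE.search(summary_text)
    if match is None:
        raise ValueError("no 'Total Handles' line found")
    return int(match.group(1))


def is_stable(start_count, end_count, tolerance=0.10):
    """True if the count grew by at most `tolerance` (a fraction) of the start.

    The 10% default is an assumed margin above the ~5% rise reported here.
    """
    return end_count <= start_count * (1 + tolerance)


# Abbreviated sample in the shape of the output quoted in this comment.
sample = """Handle type summary:
  ALPC Port                                   :5
  Desktop                                      :1
Total Handles: 437
"""
leaky_end = total_handles(sample)
```

Given the starting count recorded in step 2 and the ending count from step 4, `is_stable(start, end)` returning `False` would correspond to the leak seen with the old build.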

Comment 20 dehanmeng 2021-03-11 09:56:51 UTC
Hi all, 
The newest qemu-ga-win build is out and has passed testing on my side. The whole test loop and the new cases passed as well. No further errors or regressions were found. Thanks, everyone, for your time and effort.

Cheers
Dehan

Comment 21 dehanmeng 2021-03-11 09:57:57 UTC
(In reply to dehanmeng from comment #20)
> Hi all, 
> The newest qemu-ga-win came out and it has been passed from my side.  the
> whole test loop and new cases got passed as well.  No further errors and
> regression issues were found now. Thanks everyone for the time and effort.
> 
> Cheers
> Dehan

The qemu-ga-win version is mingw-qemu-ga-win-102.0.0-2.el8.

Comment 24 Vadim Rozenfeld 2021-05-11 09:32:57 UTC
It seems we have two options to solve this problem: use the latest RHEL 8.4.0 virtio-win rpm, or
update/repackage the 8.3.0(z) virtio-win rpm with the latest binaries that were used for building the 8.4.0
virtio-win package. Please let us know which one looks better.

Comment 25 Sandro Bonazzola 2021-05-12 05:44:47 UTC
(In reply to Vadim Rozenfeld from comment #24)
> Seems as we have two options to solve this problem , use the latest rhel
> 8.4.0 virtio win rpm or
> update/repackage 8.3.0(z) virtio-win rpm with the latest binaries that were
> used for building 8.4.0 
> virtio-win package. Please let us know which one looks better.

Using the latest virtio-win rpm from 8.4 is OK from the RHV point of view, if that's the question for me.

Comment 28 dehanmeng 2021-05-18 01:30:38 UTC
*** Bug 1958825 has been marked as a duplicate of this bug. ***

Comment 29 errata-xmlrpc 2021-05-18 16:25:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virtio-win bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1959