Bug 1737702

Summary: High Host CPU load for Windows 10 Guests (Update 1903) when idle
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: qemu-kvm (sub component: General)
Reporter: liunana <nanliu>
Assignee: Vadim Rozenfeld <vrozenfe>
QA Contact: Yu Wang <wyu>
Status: CLOSED WORKSFORME
Severity: unspecified
Priority: high
CC: ailan, chayang, dholler, jinzhao, lijin, michal.skrivanek, virt-maint, vrozenfe, wyu
Version: 8.1
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Type: Bug
Clone Of: 1610461
Regression: ---
Last Closed: 2020-04-09 09:56:23 UTC
Attachments:
    trace-cmd on Win10 1809 guest
    trace-cmd on Win10 1903 guest

Description liunana 2019-08-06 06:04:48 UTC
Description of problem:
High Host CPU load for Windows 10 Guests (Update 1903) when idle



Version-Release number of selected component (if applicable):
Host:
    #/usr/libexec/qemu-kvm --version
      QEMU emulator version 4.0.93 (qemu-kvm-4.1.0-1)
    kernel-4.18.0-127.el8.x86_64
    seabios-bin-1.11.1-4.module+el8.1.0+3531+2918145b.noarch
Guest:
    en_windows_10_business_editions_version_1903_x64_dvd_37200948.iso


How reproducible:
5/5


Steps to Reproduce:
1. Boot the latest Windows 10 (1903) guest with command [1], using all of the flags
   "-cpu Skylake-Client-IBRS,kvm_pv_unhalt,hv_stimer,hv_relaxed,hv_vpindex,hv_ipi,hv_spinlocks=0xfff,hv_vapic,hv_tlbflush,hv_reset,hv_crash,hv_synic,hv_time".
2. Wait until the guest is idle inside Windows (guest CPU load 1~10%).
3. Use top to monitor host CPU usage of the qemu-kvm process and save the output to a file (about 30 minutes):
   #top -p 1318 -n 1800 -d 2 -b >fixed-top-pc-result
4. Calculate the average value from this file:
   # cat fixed-top-pc-result |grep 1318 |awk -F ' ' '{print $9;}'|awk '{sum+=$1} END {print "Average = ", sum/NR}'

Actual results:
    HOST CPU average utilization rate:
    Average =  27.2757%

Expected results:
   CPU load 0-5%


Additional info:
[1]
/usr/libexec/qemu-kvm -name win10 -M q35 -enable-kvm \
-cpu Skylake-Client-IBRS,kvm_pv_unhalt,hv_stimer,hv_relaxed,hv_vpindex,hv_ipi,hv_spinlocks=0xfff,hv_vapic,hv_tlbflush,hv_reset,hv_crash,hv_synic,hv_time \
-monitor stdio \
-nodefaults -rtc base=utc \
-m 4G \
-smp 2,sockets=1,cores=2,threads=2,maxcpus=4 \
-object secret,id=sec0,data=redhat \
-blockdev node-name=back_image,driver=file,cache.direct=on,cache.no-flush=off,filename=/home/3-win10/win10.luks,aio=threads \
-blockdev node-name=drive-virtio-disk0,driver=luks,cache.direct=on,cache.no-flush=off,file=back_image,key-secret=sec0 \
-device pcie-root-port,id=root0,slot=0 \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=disk0,bus=root0 \
-device pcie-root-port,id=root1,slot=1 \
-device virtio-net-pci,mac=70:5a:0f:38:cd:a4,id=idhRa7sf,vectors=4,netdev=idNIlYmb,bus=root1 -netdev tap,id=idNIlYmb,vhost=on \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/iso/windows/virtio-win-prewhql-0.1-172.iso \
-device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
-device ich9-usb-uhci6 \
-device usb-tablet,id=mouse \
-device qxl-vga,id=video1 \
-spice port=5903,disable-ticketing \
-device virtio-serial-pci,id=virtio-serial1 \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial1.0,nr=3,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \

Comment 5 Vadim Rozenfeld 2019-08-16 07:04:23 UTC
Created attachment 1604300 [details]
trace-cmd on Win10 1809 guest

trace-cmd record -e kvm on Win10 1809-32b guest

Comment 6 Vadim Rozenfeld 2019-08-16 07:07:22 UTC
Created attachment 1604302 [details]
trace-cmd on Win10 1903 guest

trace-cmd record -e kvm on Win10 1903-32b guest

Comment 12 Ademar Reis 2020-02-05 23:02:05 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 15 Vadim Rozenfeld 2020-03-08 23:48:40 UTC
I spent some time trying to reproduce the problem on 1909.
I am not sure the problem is still reproducible with the latest Win10
version. Can QE give it a try?
Thanks,
Vadim.

Comment 16 Yu Wang 2020-03-09 03:09:35 UTC
I can still reproduce it on Win10 (1909) 64-bit.

If the guest CPU load is 0%-1%, the host CPU load for qemu-kvm is 5%-10%.
If the guest CPU load is >1%, the host CPU load for qemu-kvm is >10%.

qemu-kvm-4.2.0-11.module+el8.2.0+5837+4c1442ec.x86_64
kernel-4.18.0-175.el8.x86_64
seabios-1.13.0-1.module+el8.2.0+5520+4e5817f3.x86_64

Boot command:
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -machine q35  \
    -nodefaults \
    -sandbox on  \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 6144  \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2  \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,hv_evmcs \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_Ea97kw1,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_Ea97kw1,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id2Gz4fb \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pcie.0-root-port-5,addr=0x0 \
    -drive id=drive_image2,if=none,snapshot=off,aio=threads,format=raw,file=/mnt/tmpfs/data.raw \
    -device scsi-hd,id=image2,drive=drive_image2 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:57:da:4f:cb:b2,id=idotmiWW,mq=on,vectors=14,netdev=idT6Wphl,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idT6Wphl,vhost=on,queues=6 \
    -vnc :9  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=on,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:4444,server,nowait

Comment 19 Yu Wang 2020-03-18 09:36:50 UTC
Tried on a clean Win10-64 (1909).

If the guest CPU utilization stays close to 0% for a while, the host CPU is 4%-6%.
But the Win10 services are more active than in other guests, so the result is not as
good as with the others. In other words, I cannot hold the idle state for a long
time (the SysMain service is always using the CPU).

Tried with other guests (older than Win10): they can get 4%-5%, since the guest state is stable.

Configuration for the above result:
I turned off Windows Defender and Windows Update, disabled Automatic Windows Update,
and disabled the NIC, unused USB ports, and the CD-ROM.


And a question about my automation scripts:

I tried with my automation scripts. The results are:

Win8-32/Win8.1-64, Win2016, and Win2019 get 7%-9% CPU utilization on the host
(the automation scripts add about 3% CPU utilization).
Win10-64 gets 9% (if guest services are stable) or 15+% utilization (guest not idle) on the host.

Can we accept the result (host utilization <10%) when running with automation?

Thanks
Yu Wang

Comment 20 Vadim Rozenfeld 2020-03-18 10:34:50 UTC
(In reply to Yu Wang from comment #19)
> Tried on a clean Win10-64 (1909).
> 
> If the guest CPU utilization stays close to 0% for a while, the host CPU is
> 4%-6%. But the Win10 services are more active than in other guests, so the
> result is not as good as with the others. In other words, I cannot hold the
> idle state for a long time (the SysMain service is always using the CPU).

SysMain is just a new name for Superfetch. MSFT used to recommend disabling it on
Hyper-V guests:
https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/processor-performance
I believe that is what we should do as well.
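
For example, one way to stop the service and keep it from starting at boot, from an
elevated prompt inside the guest, is:

    rem disable the SysMain (Superfetch) service
    sc.exe stop SysMain
    sc.exe config SysMain start= disabled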

> 
> Tried with other guests (older than Win10): they can get 4%-5%, since the
> guest state is stable.
> 
> Configuration for the above result:
> I turned off Windows Defender and Windows Update, disabled Automatic Windows
> Update, and disabled the NIC, unused USB ports, and the CD-ROM.


> 
> 
> And a question about my automation scripts:
> 
> I tried with my automation scripts. The results are:
> 
> Win8-32/Win8.1-64, Win2016, and Win2019 get 7%-9% CPU utilization on the host
> (the automation scripts add about 3% CPU utilization).
> Win10-64 gets 9% (if guest services are stable) or 15+% utilization (guest
> not idle) on the host.
> 
> Can we accept the result (host utilization <10%) when running with automation?

I think it is acceptable, but I would definitely prefer to see it around 5%.

Cheers,
Vadim.

> 
> Thanks
> Yu Wang

Comment 21 Yu Wang 2020-03-24 08:16:44 UTC
(In reply to Vadim Rozenfeld from comment #20)
> (In reply to Yu Wang from comment #19)
> > Tried on a clean Win10-64 (1909).
> > 
> > If the guest CPU utilization stays close to 0% for a while, the host CPU is
> > 4%-6%. But the Win10 services are more active than in other guests, so the
> > result is not as good as with the others. In other words, I cannot hold the
> > idle state for a long time (the SysMain service is always using the CPU).
> 
> SysMain is just a new name for Superfetch. MSFT used to recommend disabling
> it on Hyper-V guests:
> https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/processor-performance
> I believe that is what we should do as well.
> 

Got it. I disabled this service and the utilization in the guest is more stable than before.

And I have another question about CPU utilization with different vCPU counts
(there are 48 vCPUs on the host; guests are Win10-64/WS2012/WS2019):

With 4 vCPUs in the guest, the host utilization is 7%.
With 6 vCPUs in the guest, the host utilization is 9%.
With 24 vCPUs in the guest, the host utilization is 30%.

So the number of vCPUs influences the result a lot.
When we test this case, will we restrict the number of guest vCPUs?

(Note: the above results are from runs with the automation scripts, so they are a
bit higher than manual runs.)

Thanks
Yu Wang

Comment 22 Vadim Rozenfeld 2020-03-24 08:41:24 UTC
(In reply to Yu Wang from comment #21)
> (In reply to Vadim Rozenfeld from comment #20)
> > (In reply to Yu Wang from comment #19)
> > > Tried on a clean Win10-64 (1909).
> > > 
> > > If the guest CPU utilization stays close to 0% for a while, the host CPU
> > > is 4%-6%. But the Win10 services are more active than in other guests, so
> > > the result is not as good as with the others. In other words, I cannot
> > > hold the idle state for a long time (the SysMain service is always using
> > > the CPU).
> > 
> > SysMain is just a new name for Superfetch. MSFT used to recommend disabling
> > it on Hyper-V guests:
> > https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/processor-performance
> > I believe that is what we should do as well.
> > 
> 
> Got it. I disabled this service and the utilization in the guest is more
> stable than before.
> 
> And I have another question about CPU utilization with different vCPU counts
> (there are 48 vCPUs on the host; guests are Win10-64/WS2012/WS2019):
> 
> With 4 vCPUs in the guest, the host utilization is 7%.
> With 6 vCPUs in the guest, the host utilization is 9%.
> With 24 vCPUs in the guest, the host utilization is 30%.
> 
> So the number of vCPUs influences the result a lot.
> When we test this case, will we restrict the number of guest vCPUs?

Do you use vCPU pinning? If not, a vCPU may be scheduled across any of
the cores in the system, which in turn can lead to suboptimal cache
performance.
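
As an illustration (output abbreviated, thread IDs hypothetical), the vCPU thread IDs
can be read from the QEMU monitor with "info cpus", and each vCPU thread can then be
pinned to its own host core with taskset:

    (qemu) info cpus
    * CPU #0: thread_id=1319
      CPU #1: thread_id=1320
    # taskset -pc 2 1319   # pin vCPU 0's thread to host core 2
    # taskset -pc 3 1320   # pin vCPU 1's thread to host core 3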

Cheers,
Vadim.

> 
> (Note: the above results are from runs with the automation scripts, so they
> are a bit higher than manual runs.)
> 
> Thanks
> Yu Wang

Comment 23 Yu Wang 2020-03-24 09:38:56 UTC
(In reply to Vadim Rozenfeld from comment #22)
> (In reply to Yu Wang from comment #21)

> > 
> > And I have another question about CPU utilization with different vCPU counts
> > (there are 48 vCPUs on the host; guests are Win10-64/WS2012/WS2019):
> > 
> > With 4 vCPUs in the guest, the host utilization is 7%.
> > With 6 vCPUs in the guest, the host utilization is 9%.
> > With 24 vCPUs in the guest, the host utilization is 30%.
> > 
> > So the number of vCPUs influences the result a lot.
> > When we test this case, will we restrict the number of guest vCPUs?
> 
> Do you use vCPU pinning? If not, a vCPU may be scheduled across any of
> the cores in the system, which in turn can lead to suboptimal cache
> performance.

Do you mean adding "numactl --physcpubind=1,2,3,4,5,6" to the qemu command line?
Is that enough, or is there another CPU pinning method we should use?

Thanks
Yu Wang

> 
> Cheers,
> Vadim.

Comment 31 Yu Wang 2020-04-09 02:32:02 UTC
Hi Vadim,

According to Comment#26 and Comment#28, we got less than 5% CPU utilization for each thread on Win10 guests.

So I cannot reproduce the high host CPU utilization; maybe this bug can be closed.


Thanks
Yu Wang

Comment 32 Vadim Rozenfeld 2020-04-09 05:24:29 UTC
(In reply to Yu Wang from comment #31)
> Hi Vadim,
> 
> According to Comment#26 and Comment#28, we got less than 5% CPU utilization
> for each thread on Win10 guests.
> 
> So I cannot reproduce the high host CPU utilization; maybe this bug can be
> closed.
> 
> 
> Thanks
> Yu Wang

No objections. Let's close it,
but keep an eye on new Windows releases.

Best,
Vadim.

Comment 33 Yu Wang 2020-04-09 09:56:23 UTC
(In reply to Vadim Rozenfeld from comment #32)
> (In reply to Yu Wang from comment #31)
> > Hi Vadim,
> > 
> > According to Comment#26 and Comment#28, we got less than 5% CPU utilization
> > for each thread on Win10 guests.
> > 
> > So I cannot reproduce the high host CPU utilization; maybe this bug can be
> > closed.
> > 
> > 
> > Thanks
> > Yu Wang
> 
> No objections. Let's close it,
> but keep an eye on new Windows releases.

OK, got it.

Thanks
Yu Wang 
> 
> Best,
> Vadim.