Bug 1277353 - Virtio-blk and virtio-scsi performance of Windows guests is quite slow
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: virtio-win
Version: 7.2
Hardware: x86_64 Windows
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Vadim Rozenfeld
QA Contact: Virtualization Bugs
Blocks: 1288337
Reported: 2015-11-03 01:34 EST by Yanhui Ma
Modified: 2016-07-11 06:31 EDT
Last Closed: 2016-06-30 05:41:44 EDT
Type: Bug
Doc Type: Bug Fix

Attachments
IOMeter results with rhel7.2 guest (72.76 KB, text/plain), 2016-06-27 06:02 EDT, Yumei Huang
IOMeter results with win2012 guest (65.92 KB, text/plain), 2016-06-27 06:02 EDT, Yumei Huang
probe_timer tool source code (1.16 KB, application/octet-stream), 2016-06-30 05:26 EDT, Ladi Prosek
probe_timer tool x64 binary (84.00 KB, application/x-ms-dos-executable), 2016-06-30 05:27 EDT, Ladi Prosek

Description Yanhui Ma 2015-11-03 01:34:36 EST
Description of problem:

Here are results for windows guest:
https://mojo.redhat.com/docs/DOC-1052284
Here are results for RHEL guest:
https://mojo.redhat.com/docs/DOC-1053455

Based on the above results, we get only ~11K IOPS with a Windows guest for 4k sequential reads on the ramdisk backend, vs. ~280K IOPS with a RHEL guest.

Bandwidth-wise, with the 256K block size on the ramdisk backend, we get ~2000 MB/s with RHEL vs. ~580 MB/s with Windows.

Windows guest performance is quite bad compared with the RHEL guest.

Version-Release number of selected component (if applicable):

virtio-scsi: virtio-win-1.7.3-1.el7 vs virtio-win-1.7.4-1.el7
virtio-blk: virtio-win-1.7.3-1.el7 vs virtio-win-prewhql-106

How reproducible:
100%

Steps to Reproduce:
1. Generate a ramdisk on the host:
#modprobe brd rd_size=1048576 rd_nr=1
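
A quick way to confirm the ramdisk came up with the expected size (rd_size is in KiB, so 1048576 KiB = 1 GiB; the command below should report 1073741824 bytes):
# blockdev --getsize64 /dev/ram0
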
2. Boot the guest with the ramdisk as a virtio-scsi data disk:
numactl \
    -m 3 /usr/libexec/qemu-kvm  \
    -S \
    -name 'virt-tests-vm1' \
    -nodefaults \
    -chardev socket,id=hmp_id_humanmonitor1,path=/tmp/monitor-humanmonitor1-20151014-055006-QfAzU0R5,server,nowait \
    -mon chardev=hmp_id_humanmonitor1,mode=readline \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20151014-055006-QfAzU0R5,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -chardev socket,id=seabioslog_id_20151014-055006-QfAzU0R5,path=/tmp/seabios-20151014-055006-QfAzU0R5,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20151014-055006-QfAzU0R5,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x3 \
    -drive file='/usr/local/autotest/tests/virt/shared/data/images/win2008r2-64.raw',if=none,id=drive-ide0-0-0,media=disk,cache=none,snapshot=off,format=raw,aio=native \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,addr=0x4 \
    -drive file='/dev/ram0',if=none,id=virtio-scsi-id0,media=disk,cache=none,snapshot=off,format=raw,aio=native \
    -device scsi-hd,drive=virtio-scsi-id0 \
    -device virtio-net-pci,netdev=id3MZQWH,mac='9a:37:37:37:37:5e',bus=pci.0,addr=0x5,id='idQEvJlC' \
    -netdev tap,id=id3MZQWH,fd=25 \
    -m 4096 \
    -smp 2,cores=1,threads=1,sockets=2 \
    -cpu 'SandyBridge' \
    -M pc \
    -drive file='/usr/local/autotest/tests/virt/shared/data/isos/windows/winutils.iso',if=none,id=drive-ide0-0-1,media=cdrom,format=raw \
    -device ide-drive,bus=ide.0,unit=1,drive=drive-ide0-0-1 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -vga cirrus \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off   \
    -device sga \
    -enable-kvm

3. On the host, check the NUMA topology:
# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60
node 0 size: 8146 MB
node 0 free: 7603 MB
node 1 cpus: 1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61
node 1 size: 8192 MB
node 1 free: 7823 MB
node 2 cpus: 2 6 10 14 18 22 26 30 34 38 42 46 50 54 58 62
node 2 size: 8192 MB
node 2 free: 7802 MB
node 3 cpus: 3 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63
node 3 size: 8192 MB
node 3 free: 7413 MB
node distances:
node   0   1   2   3 
  0:  10  20  30  20 
  1:  20  10  20  30 
  2:  30  20  10  20 
  3:  20  30  20  10

Pin the two vCPU threads to host CPUs 3 and 7 respectively.
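
For example (illustrative; the vCPU thread IDs differ on every run), the thread IDs can be read from the QEMU monitor with "info cpus" and the threads then pinned with taskset:

(qemu) info cpus
# taskset -pc 3 <vcpu0-thread-id>
# taskset -pc 7 <vcpu1-thread-id>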

4. In the guest, run fio against the data disk:

C:\fio-2.2.10-x64\fio.exe --rw=read --bs=4k --iodepth=1 --runtime=1m --direct=1 --filename=\\.\PHYSICALDRIVE1 --name=job1 --ioengine=windowsaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output="C:\fio_result"


Actual results:

Windows guest performance is quite bad compared with the RHEL guest.

Expected results:

Windows guest performance is comparable to that of the Linux guest.

Additional info:
Comment 3 Vadim Rozenfeld 2016-06-01 06:50:26 EDT
Can we please give it another try with the latest drivers from build 118
( http://download.eng.bos.redhat.com/brewroot/packages/virtio-win-prewhql/0.1/118/win/virtio-win-prewhql-0.1.zip )

Also, please turn the "hv_time" CPU flag on.

Thanks,
Vadim.
Comment 4 Ladi Prosek 2016-06-02 10:26:05 EDT
I just tried this on Win7 and Fedora 23 on a laptop. No NUMA, no vcpu pinning, far from a true perf lab environment. The results came out in favor of Windows (take with a grain of salt). Definitely not seeing order of magnitude differences as suggested by the description.

4k blocks, vioscsi ramdisk
Windows: bw=84125KB/s, iops=21031
Fedora:  bw=70754KB/s, iops=17688

256k blocks, vioscsi ramdisk
Windows: bw=2849.6MB/s, iops=11398
Fedora:  bw=1822.7MB/s, iops=7290

Yanhui Ma, can you please provide the full fio command line you used on RHEL? The result links on the MOJO page you linked seem to be dead. Thanks!
Comment 5 Yanhui Ma 2016-06-03 03:47:28 EDT
(In reply to Ladi Prosek from comment #4)
> I just tried this on Win7 and Fedora 23 on a laptop. No NUMA, no vcpu
> pinning, far from a true perf lab environment. The results came out in favor
> of Windows (take with a grain of salt). Definitely not seeing order of
> magnitude differences as suggested by the description.
> 
> 4k blocks, vioscsi ramdisk
> Windows: bw=84125KB/s, iops=21031
> Fedora:  bw=70754KB/s, iops=17688
> 
> 256k blocks, vioscsi ramdisk
> Windows: bw=2849.6MB/s, iops=11398
> Fedora:  bw=1822.7MB/s, iops=7290
> 
> Yanhui Ma, can you please provide the full fio command line you used on
> RHEL? The result links on the MOJO page you linked seem to be dead. Thanks!

The full fio command line on RHEL looks like this:
fio --rw=read --bs=4k --iodepth=64 --runtime=1m --direct=1 --filename=/dev/sdb --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --time_based --output=/tmp/fio_result

You can find the fio command line at the link below. I have also updated the result links, so you can access them now.
http://kvm-perf.englab.nay.redhat.com/results/8798-autotest/hp-z800-06.qe.lab.eng.nay.redhat.com/virt.qemu.rhel72-brew.local.Host_RHEL.7.2.raw.virtio_scsi.smp2.virtio_net.RHEL.7.2.x86_64.fio_linux.single_disk.rawdisk.repeat1/debug/virt.qemu.rhel72-brew.local.Host_RHEL.7.2.raw.virtio_scsi.smp2.virtio_net.RHEL.7.2.x86_64.fio_linux.single_disk.rawdisk.repeat1.DEBUG
Comment 6 Yanhui Ma 2016-06-03 04:00:00 EDT
(In reply to Vadim Rozenfeld from comment #3)
> Can we please give it another try with the latest drivers from build 118
> (
> http://download.eng.bos.redhat.com/brewroot/packages/virtio-win-prewhql/0.1/
> 118/win/virtio-win-prewhql-0.1.zip )
> 
> Also, please turn CPU flag "hv_time" on.
> 

Sure, will have a try.

> Thanks,
> Vadim.
Comment 7 Yumei Huang 2016-06-16 05:21:38 EDT
Here are the test results comparing the rhel7.2 guest and the win2012r2 guest (with build 118 and the "hv_time" CPU flag on):

https://mojo.redhat.com/docs/DOC-1083825

Compared to the RHEL7.2 guest, there is a degradation of 30%-100% for the Win2012r2_x86_64 guest with both the virtio-blk and virtio-scsi drivers.
Comment 8 Vadim Rozenfeld 2016-06-16 06:42:33 EDT
(In reply to Yumei Huang from comment #7)
> Here are the test results comparing rhel7.2 guest and win2012r2 guest (with
> build 118 and CPU flag "hv_time" on):
> 
> https://mojo.redhat.com/docs/DOC-1083825
> 
> Compared to the RHEL7.2 guest, there is a degradation of 30%-100% for the
> Win2012r2_x86_64 guest with both the virtio-blk and virtio-scsi drivers.

Could you please tell me about the Windows VM that was used in these tests:
is it a fresh one installed from an .iso image, or did you create it from a template? Can you try an IDE-based disk and see whether you get the same or similar results? Could IO throttling be turning on for some reason?
Can you please try IOMeter and see if you get the same result? Any chance you can share the VM image, so I can run a clone of your Windows VM system on my setup?

Thanks and best regards,
Vadim.
Comment 12 Yumei Huang 2016-06-27 05:58:36 EDT
Here are the test results with an IDE-based disk; the performance of win2012 is still much worse than the RHEL7.2 guest:

-localfs
http://kvm-perf.englab.nay.redhat.com/results/request/Bug1277353/localfs/raw.ide.smp2.virtio_net.html

-ramdisk
http://kvm-perf.englab.nay.redhat.com/results/request/Bug1277353/1ramdisk/raw.ide.smp2.virtio_net.html


QE has also tried IOMeter, and the performance of win2012 is worse there too. The results will be attached.
Comment 13 Yumei Huang 2016-06-27 06:02 EDT
Created attachment 1172824 [details]
IOMeter results with rhel7.2 guest
Comment 14 Yumei Huang 2016-06-27 06:02 EDT
Created attachment 1172825 [details]
IOMeter results with win2012 guest
Comment 22 Ladi Prosek 2016-06-30 05:26 EDT
Created attachment 1174425 [details]
probe_timer tool source code
Comment 23 Ladi Prosek 2016-06-30 05:27 EDT
Created attachment 1174426 [details]
probe_timer tool x64 binary
Comment 24 Ladi Prosek 2016-06-30 05:35:27 EDT
For optimal I/O performance it is critical to make sure that Windows guests use the Hyper-V reference counter feature. The QEMU command line should include

-cpu ...,hv_time

and

-no-hpet

and the useplatformclock Windows boot entry option should be disabled:

bcdedit /set useplatformclock false
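
Applied to the command line from the bug description, the first two changes would look something like this (illustrative):

-cpu 'SandyBridge',hv_time \
-no-hpet \

The current state of the boot entry option can be inspected in the guest with:

bcdedit /enum {current}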

The attached probe_timer tool can be used to analyze guests. It prints the result of QueryPerformanceFrequency, measures the cost of a QueryPerformanceCounter call, and checks for the Hyper-V reference counter feature.
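
For reference, here is a minimal sketch of how such a probe can report the counter frequency and time the per-call cost using the standard Win32 APIs. This is an illustration of the technique only, not the attached tool's actual source; the Hyper-V reference counter check is omitted:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, end, tmp;
    const int iters = 1000000;
    int i;

    /* Counter frequency in counts per second. */
    QueryPerformanceFrequency(&freq);
    printf("QueryPerformanceFrequency: %lld Hz\n", (long long)freq.QuadPart);

    /* Average the cost of QueryPerformanceCounter over many calls. */
    QueryPerformanceCounter(&start);
    for (i = 0; i < iters; i++)
        QueryPerformanceCounter(&tmp);
    QueryPerformanceCounter(&end);

    printf("QueryPerformanceCounter: ~%.1f ns per call\n",
           (double)(end.QuadPart - start.QuadPart) * 1e9 /
           ((double)freq.QuadPart * iters));
    return 0;
}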
