Bug 1277353 - Virtio-blk and Virtio-scsi performance of windows guest are quite slow
Summary: Virtio-blk and Virtio-scsi performance of windows guest are quite slow
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: virtio-win
Version: 7.2
Hardware: x86_64
OS: Windows
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Vadim Rozenfeld
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1288337
 
Reported: 2015-11-03 06:34 UTC by Yanhui Ma
Modified: 2016-07-11 10:31 UTC (History)
14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-30 09:41:44 UTC
Target Upstream Version:


Attachments
IOMeter results with rhel7.2 guest (72.76 KB, text/plain)
2016-06-27 10:02 UTC, Yumei Huang
IOMeter results with win2012 guest (65.92 KB, text/plain)
2016-06-27 10:02 UTC, Yumei Huang
probe_timer tool source code (1.16 KB, application/octet-stream)
2016-06-30 09:26 UTC, Ladi Prosek
probe_timer tool x64 binary (84.00 KB, application/x-ms-dos-executable)
2016-06-30 09:27 UTC, Ladi Prosek

Description Yanhui Ma 2015-11-03 06:34:36 UTC
Description of problem:

Here are results for a Windows guest:
https://mojo.redhat.com/docs/DOC-1052284
Here are results for a RHEL guest:
https://mojo.redhat.com/docs/DOC-1053455

Based on the above results, we get only ~11K IOPS with a Windows guest for 4k sequential reads on the ramdisk backend vs. ~280K IOPS with a RHEL guest.

Bandwidth-wise, for the 256K ramdisk backend case, we get ~2000 MB/s with RHEL vs. ~580 MB/s with Windows.

Windows guest performance is quite bad, compared with RHEL guest.
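
A quick arithmetic cross-check of the 4k numbers above (bandwidth = IOPS x block size; integer shell math, so the figures are approximate):

```shell
# 4k sequential read: IOPS * 4 KiB, shown in MB/s
echo "Windows: ~$((11000 * 4 / 1024)) MB/s"    # ~42 MB/s
echo "RHEL:    ~$((280000 * 4 / 1024)) MB/s"   # ~1093 MB/s
```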

Version-Release number of selected component (if applicable):

virtio-scsi: virtio-win-1.7.3-1.el7 vs virtio-win-1.7.4-1.el7
virtio-blk: virtio-win-1.7.3-1.el7 vs virtio-win-prewhql-106

How reproducible:
100%

Steps to Reproduce:
1. Generate a ramdisk on the host:
# modprobe brd rd_size=1048576 rd_nr=1
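
As a side note, brd's rd_size parameter is in KiB, so rd_size=1048576 creates a 1 GiB /dev/ram0. A quick sanity-check sketch:

```shell
# rd_size is given in KiB: 1048576 KiB = 1 GiB
echo $((1048576 * 1024))    # expected device size in bytes: 1073741824
# After modprobe, the device size can be confirmed with:
#   blockdev --getsize64 /dev/ram0
```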
2. Boot the guest with the ramdisk:
numactl \
    -m 3 /usr/libexec/qemu-kvm  \
    -S \
    -name 'virt-tests-vm1' \
    -nodefaults \
    -chardev socket,id=hmp_id_humanmonitor1,path=/tmp/monitor-humanmonitor1-20151014-055006-QfAzU0R5,server,nowait \
    -mon chardev=hmp_id_humanmonitor1,mode=readline \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20151014-055006-QfAzU0R5,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -chardev socket,id=seabioslog_id_20151014-055006-QfAzU0R5,path=/tmp/seabios-20151014-055006-QfAzU0R5,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20151014-055006-QfAzU0R5,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x3 \
    -drive file='/usr/local/autotest/tests/virt/shared/data/images/win2008r2-64.raw',if=none,id=drive-ide0-0-0,media=disk,cache=none,snapshot=off,format=raw,aio=native \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,addr=0x4 \
    -drive file='/dev/ram0',if=none,id=virtio-scsi-id0,media=disk,cache=none,snapshot=off,format=raw,aio=native \
    -device scsi-hd,drive=virtio-scsi-id0 \
    -device virtio-net-pci,netdev=id3MZQWH,mac='9a:37:37:37:37:5e',bus=pci.0,addr=0x5,id='idQEvJlC' \
    -netdev tap,id=id3MZQWH,fd=25 \
    -m 4096 \
    -smp 2,cores=1,threads=1,sockets=2 \
    -cpu 'SandyBridge' \
    -M pc \
    -drive file='/usr/local/autotest/tests/virt/shared/data/isos/windows/winutils.iso',if=none,id=drive-ide0-0-1,media=cdrom,format=raw \
    -device ide-drive,bus=ide.0,unit=1,drive=drive-ide0-0-1 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -vga cirrus \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off   \
    -device sga \
    -enable-kvm

3. On the host:
# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60
node 0 size: 8146 MB
node 0 free: 7603 MB
node 1 cpus: 1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61
node 1 size: 8192 MB
node 1 free: 7823 MB
node 2 cpus: 2 6 10 14 18 22 26 30 34 38 42 46 50 54 58 62
node 2 size: 8192 MB
node 2 free: 7802 MB
node 3 cpus: 3 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63
node 3 size: 8192 MB
node 3 free: 7413 MB
node distances:
node   0   1   2   3 
  0:  10  20  30  20 
  1:  20  10  20  30 
  2:  30  20  10  20 
  3:  20  30  20  10

Pin the two vCPU threads to CPU 3 and CPU 7, respectively.
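
For reference, the pinning can be done with taskset (util-linux); the vCPU thread IDs come from "info cpus" in the HMP monitor or from /proc/&lt;qemu-pid&gt;/task. The snippet below pins the current shell to CPU 0 just so it is self-contained; the commented lines show the real invocation with placeholder TIDs:

```shell
# Self-contained demo: pin the current shell to host CPU 0
taskset -pc 0 $$
# Real run (placeholders; TIDs from "info cpus" or /proc/<qemu-pid>/task):
#   taskset -pc 3 <vcpu0_tid>
#   taskset -pc 7 <vcpu1_tid>
```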

4. In the guest:

C:\fio-2.2.10-x64\fio.exe --rw=read --bs=4k --iodepth=1 --runtime=1m --direct=1 --filename=\\.\PHYSICALDRIVE1 --name=job1 --ioengine=windowsaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output="C:\fio_result"


Actual results:

Windows guest performance is quite bad, compared with RHEL guest.

Expected results:

Windows guest performance is comparable to Linux guest.

Additional info:

Comment 3 Vadim Rozenfeld 2016-06-01 10:50:26 UTC
Can we please give it another try with the latest drivers from build 118
( http://download.eng.bos.redhat.com/brewroot/packages/virtio-win-prewhql/0.1/118/win/virtio-win-prewhql-0.1.zip )

Also, please turn CPU flag "hv_time" on.

Thanks,
Vadim.

Comment 4 Ladi Prosek 2016-06-02 14:26:05 UTC
I just tried this on Win7 and Fedora 23 on a laptop. No NUMA, no vcpu pinning, far from a true perf lab environment. The results came out in favor of Windows (take with a grain of salt). Definitely not seeing order of magnitude differences as suggested by the description.

4k blocks, vioscsi ramdisk
Windows: bw=84125KB/s, iops=21031
Fedora:  bw=70754KB/s, iops=17688

256k blocks, vioscsi ramdisk
Windows: bw=2849.6MB/s, iops=11398
Fedora:  bw=1822.7MB/s, iops=7290

Yanhui Ma, can you please provide the full fio command line you used on RHEL? The result links on the MOJO page you linked seem to be dead. Thanks!

Comment 5 Yanhui Ma 2016-06-03 07:47:28 UTC
(In reply to Ladi Prosek from comment #4)
> I just tried this on Win7 and Fedora 23 on a laptop. No NUMA, no vcpu
> pinning, far from a true perf lab environment. The results came out in favor
> of Windows (take with a grain of salt). Definitely not seeing order of
> magnitude differences as suggested by the description.
> 
> 4k blocks, vioscsi ramdisk
> Windows: bw=84125KB/s, iops=21031
> Fedora:  bw=70754KB/s, iops=17688
> 
> 256k blocks, vioscsi ramdisk
> Windows: bw=2849.6MB/s, iops=11398
> Fedora:  bw=1822.7MB/s, iops=7290
> 
> Yanhui Ma, can you please provide the full fio command line you used on
> RHEL? The result links on the MOJO page you linked seem to be dead. Thanks!

The full fio command line on RHEL is:
fio --rw=read --bs=4k --iodepth=64 --runtime=1m --direct=1 --filename=/dev/sdb --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --time_based --output=/tmp/fio_result

You can find the fio command line at the link below. I have also updated the result links, so you can access them now.
http://kvm-perf.englab.nay.redhat.com/results/8798-autotest/hp-z800-06.qe.lab.eng.nay.redhat.com/virt.qemu.rhel72-brew.local.Host_RHEL.7.2.raw.virtio_scsi.smp2.virtio_net.RHEL.7.2.x86_64.fio_linux.single_disk.rawdisk.repeat1/debug/virt.qemu.rhel72-brew.local.Host_RHEL.7.2.raw.virtio_scsi.smp2.virtio_net.RHEL.7.2.x86_64.fio_linux.single_disk.rawdisk.repeat1.DEBUG

Comment 6 Yanhui Ma 2016-06-03 08:00:00 UTC
(In reply to Vadim Rozenfeld from comment #3)
> Can we please give it another try with the latest drivers from build 118
> (
> http://download.eng.bos.redhat.com/brewroot/packages/virtio-win-prewhql/0.1/
> 118/win/virtio-win-prewhql-0.1.zip )
> 
> Also, please turn CPU flag "hv_time" on.
> 

Sure, will have a try.

> Thanks,
> Vadim.

Comment 7 Yumei Huang 2016-06-16 09:21:38 UTC
Here are the test results comparing rhel7.2 guest and win2012r2 guest (with build 118 and CPU flag "hv_time" on):

https://mojo.redhat.com/docs/DOC-1083825

Compared to the RHEL7.2 guest, there is a degradation of 30%-100% for the Win2012r2 x86_64 guest with both the virtio-blk and virtio-scsi drivers.

Comment 8 Vadim Rozenfeld 2016-06-16 10:42:33 UTC
(In reply to Yumei Huang from comment #7)
> Here are the test results comparing rhel7.2 guest and win2012r2 guest (with
> build 118 and CPU flag "hv_time" on):
> 
> https://mojo.redhat.com/docs/DOC-1083825
> 
> Compared to RHEL7.2 guest, there is a degradation of 30%~100%  for
> Win2012r2_x86_64 guest with both virtio-blk and virtio-scsi driver.

Could you please tell me about the Windows VM that was used in these tests -
is it a fresh one, installed from an .iso image, or did you create it from a template? Can you try an IDE-based disk and see whether you get the same or similar results? Could it be that IO throttling is turned on for some reason?
Can you also give IOMeter a try and see if you get the same result? Any chance you can share the VM image, so I can try running a clone of your Windows VM system on my setup?

Thanks and best regards,
Vadim.

Comment 12 Yumei Huang 2016-06-27 09:58:36 UTC
Here are the test results with an IDE-based disk; the performance of win2012 is still much worse than the RHEL7.2 guest:

-localfs
http://kvm-perf.englab.nay.redhat.com/results/request/Bug1277353/localfs/raw.ide.smp2.virtio_net.html

-ramdisk
http://kvm-perf.englab.nay.redhat.com/results/request/Bug1277353/1ramdisk/raw.ide.smp2.virtio_net.html


QE also tried IOMeter, and the performance of win2012 is worse there as well. The results will be attached.

Comment 13 Yumei Huang 2016-06-27 10:02:12 UTC
Created attachment 1172824 [details]
IOMeter results with rhel7.2 guest

Comment 14 Yumei Huang 2016-06-27 10:02:56 UTC
Created attachment 1172825 [details]
IOMeter results with win2012 guest

Comment 22 Ladi Prosek 2016-06-30 09:26:26 UTC
Created attachment 1174425 [details]
probe_timer tool source code

Comment 23 Ladi Prosek 2016-06-30 09:27:24 UTC
Created attachment 1174426 [details]
probe_timer tool x64 binary

Comment 24 Ladi Prosek 2016-06-30 09:35:27 UTC
For optimum I/O performance it is critical to make sure that Windows guests use the Hyper-V reference counter feature. The QEMU command line should include

-cpu ...,hv_time

and

-no-hpet

and the useplatformclock Windows boot entry option should be disabled:

bcdedit /set useplatformclock false

The attached probe_timer can be used to analyze guests. It prints the result of QueryPerformanceFrequency, measures the cost of a QueryPerformanceCounter call, and checks the Hyper-V reference counter feature.
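
For reference, the three settings combined into one sketch (all other QEMU options elided; the bcdedit change is made inside the guest and takes effect after a reboot):

```shell
/usr/libexec/qemu-kvm \
    -cpu SandyBridge,hv_time \
    -no-hpet \
    ...    # remaining options unchanged from the original command line
# Inside the Windows guest, from an elevated prompt, then reboot:
#   bcdedit /set useplatformclock false
```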

