Bug 1500334
| Summary: | LUKS driver has poor performance compared to in-kernel driver | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Berrangé <berrange> |
| Component: | qemu-kvm-rhev | Assignee: | Daniel Berrangé <berrange> |
| Status: | CLOSED ERRATA | QA Contact: | Yanhui Ma <yama> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.3 | CC: | berrange, chayang, coli, juzhang, kchamart, knoel, michen, mtessun, ngu, pingl, virt-maint, wquan, yama |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-rhev-2.10.0-11.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-11 00:38:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Fix included in qemu-kvm-rhev-2.10.0-11.el7

Hi Daniel,

I tried to reproduce and verify the bz following the steps below, but saw no performance difference between qemu-kvm-rhev-2.10.0-10.el7.x86_64 and qemu-kvm-rhev-2.10.0-11.el7.x86_64.
Could you please help check whether these steps are correct?
Host:
qemu-kvm-rhev-2.10.0-10.el7.x86_64 vs. qemu-kvm-rhev-2.10.0-11.el7.x86_64
kernel-3.10.0-799.el7.x86_64
Guest:
kernel-3.10.0-799.el7.x86_64
1. Create the image on the host (an optional check follows the command):
qemu-img create --object secret,id=sec0,data=123456 -f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 base.qcow2 20G
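(Optional check, not in the original steps: qemu-img can report the encryption settings of the image just created, so a quick look before booting should confirm the LUKS header is in place; the format-specific information is expected to list the encrypt format as luks.)

# qemu-img info base.qcow2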
2. Boot a guest with the following command (an optional in-guest check follows it):
/usr/libexec/qemu-kvm \
...
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=raw,file=/home/kvm_autotest_root/images/rhel75-64-virtio.raw \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=03 \
--object secret,id=sec0,data=123456 \
-drive id=drive_disk1,driver=qcow2,file.filename=/home/kvm_autotest_root/images/base.qcow2,encrypt.key-secret=sec0,if=none,snapshot=off,aio=native,cache=none \
-device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,bus=pci.0,addr=0x4 \
...
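(Optional check, not in the original steps: before formatting, it may help to confirm inside the guest that the encrypted data disk is attached as the second virtio-blk device; /dev/vdb below assumes the device ordering set up by the command above.)

# lsblk /dev/vdb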
3. Format the data disk and run fio four times in the guest (a way to pull the numbers from the output is sketched after the commands):
# mkfs.xfs /dev/vdb
# mount /dev/vdb /mnt
# fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/write_4k_8 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result
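(Not part of the original steps: one hypothetical way to pull the aggregate bandwidth/IOPS line out of each run's output file, assuming the classic fio output format used in the results below.)

# grep -E 'bw=|iops=' /tmp/fio_result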
4. Results:
For qemu-kvm-rhev-2.10.0-10.el7.x86_64:
bw=13615KB/s, iops=3403
bw=15373KB/s, iops=3843
bw=11157KB/s, iops=2789
bw=16004KB/s, iops=4000
For qemu-kvm-rhev-2.10.0-11.el7.x86_64:
bw=15513KB/s, iops=3878
bw=10735KB/s, iops=2683
bw=14818KB/s, iops=3704
bw=12306KB/s, iops=3076
Thanks,
Yanhui
I see similar results to you when I used the fio tool, and I'm not really sure why; perhaps it's due to the use of libaio allowing asynchronous I/O operations.

If I instead use 'dd' in the guest, with O_DIRECT the difference before & after the patch is very visible:

Before:

# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 277.494 s, 1.9 MB/s

After:

# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 15.7482 s, 33.3 MB/s

I also get similar results if I use the 'qemu-io' tool from the host, against a luks image:

$ qemu-img create -f luks demo.luks --object secret,id=sec0,data=123456 -o key-secret=sec0 1G

Before patch:

$ qemu-io --cache none --object secret,id=sec0,data=123456 --image-opts driver=luks,file.filename=demo.luks,key-secret=sec0
qemu-io> writev -P 0xfe 0 500M
wrote 524288000/524288000 bytes at offset 0
500 MiB, 1 ops; 0:04:33.13 (1.831 MiB/sec and 0.0037 ops/sec)

After patch:

$ qemu-io --cache none --object secret,id=sec0,data=123456 --image-opts driver=luks,file.filename=demo.luks,key-secret=sec0
qemu-io> writev -P 0xfe 0 500M
wrote 524288000/524288000 bytes at offset 0
500 MiB, 1 ops; 0:00:14.68 (34.041 MiB/sec and 0.0681 ops/sec)

(In reply to Daniel Berrange from comment #6)
> I see similar results to you when I used the fio tool, and I'm not really
> sure why; perhaps it's due to the use of libaio allowing asynchronous I/O
> operations.
>
> If I instead use 'dd' in the guest, with O_DIRECT the difference before &
> after the patch is very visible:
>
> Before:
>
> # dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
> 500+0 records in
> 500+0 records out
> 524288000 bytes (524 MB, 500 MiB) copied, 277.494 s, 1.9 MB/s
>
> After:
>
> # dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
> 500+0 records in
> 500+0 records out
> 524288000 bytes (524 MB, 500 MiB) copied, 15.7482 s, 33.3 MB/s

I had a try with dd, but got similar results.

qemu-kvm-rhev-2.10.0-10.el7.x86_64:
# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 13.693 s, 38.3 MB/s

qemu-kvm-rhev-2.10.0-11.el7.x86_64:
# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 13.6657 s, 38.4 MB/s

> I also get similar results if I use the 'qemu-io' tool from the host,
> against a luks image:
>
> $ qemu-img create -f luks demo.luks --object secret,id=sec0,data=123456 -o
> key-secret=sec0 1G
>
> Before patch:
>
> $ qemu-io --cache none --object secret,id=sec0,data=123456 --image-opts
> driver=luks,file.filename=demo.luks,key-secret=sec0
> qemu-io> writev -P 0xfe 0 500M
> wrote 524288000/524288000 bytes at offset 0
> 500 MiB, 1 ops; 0:04:33.13 (1.831 MiB/sec and 0.0037 ops/sec)
>
> After patch:
>
> $ qemu-io --cache none --object secret,id=sec0,data=123456 --image-opts
> driver=luks,file.filename=demo.luks,key-secret=sec0
> qemu-io> writev -P 0xfe 0 500M
> wrote 524288000/524288000 bytes at offset 0
> 500 MiB, 1 ops; 0:00:14.68 (34.041 MiB/sec and 0.0681 ops/sec)

You have given "aio=native", which is different to my tests, which did not set 'aio' and so would be equivalent to aio=threads.

Also note that if your host has SSDs, it'll probably mask any performance difference - my tests used HDDs in the host.
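(For anyone re-running the comparison with aio=threads as suggested above: a variant of the data-disk line from the reproduction steps with aio=native simply dropped, since QEMU defaults to the thread-pool AIO backend when 'aio' is not set; the path and IDs just mirror the earlier command and are illustrative only.)

-drive id=drive_disk1,driver=qcow2,file.filename=/home/kvm_autotest_root/images/base.qcow2,encrypt.key-secret=sec0,if=none,snapshot=off,cache=none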
(In reply to Daniel Berrange from comment #8)
> You have given "aio=native", which is different to my tests, which did not
> set 'aio' and so would be equivalent to aio=threads.
>
> Also note that if your host has SSDs, it'll probably mask any performance
> difference - my tests used HDDs in the host.

Hi Daniel,

I used HDDs in the host, not SSDs, and tried again with aio=threads.

qemu-kvm-rhev-2.10.0-10.el7.x86_64:
# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 13.6441 s, 38.4 MB/s

qemu-kvm-rhev-2.10.0-11.el7.x86_64:
# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 13.6661 s, 38.4 MB/s

Yanhui,

Thanks! I used qemu-io to test it again:

# qemu-img create -f luks demo.luks --object secret,id=sec0,data=123456 -o key-secret=sec0 1G

qemu-img-rhev-2.10.0-10.el7.x86_64:
# qemu-io --cache none --object secret,id=sec0,data=123456 --image-opts driver=luks,file.filename=demo.luks,key-secret=sec0
qemu-io> writev -P 0xfe 0 500M
wrote 524288000/524288000 bytes at offset 0
500 MiB, 1 ops; 0:00:18.12 (27.592 MiB/sec and 0.0552 ops/sec)
qemu-io> quit

qemu-img-rhev-2.10.0-11.el7.x86_64:
# qemu-io --cache none --object secret,id=sec0,data=123456 --image-opts driver=luks,file.filename=demo.luks,key-secret=sec0
qemu-io> writev -P 0xfe 0 500M
wrote 524288000/524288000 bytes at offset 0
500 MiB, 1 ops; 0:00:10.95 (45.636 MiB/sec and 0.0913 ops/sec)
qemu-io> quit

According to the above data, setting the bz to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104
Description of problem:

In tests the LUKS driver write performance is as little as 2MiB/s, compared to 35-40MiB/s for the host kernel driver. This is primarily due to an inadequately sized bounce buffer, which is fixed in 2.11 with:

commit 161253e2d0a83a1b33bca019c6e926013e1a03db
Author: Daniel P. Berrange <berrange>
Date: Wed Sep 27 13:53:35 2017 +0100

    block: use 1 MB bounce buffers for crypto instead of 16KB

    Using 16KB bounce buffers creates a significant performance penalty for I/O to encrypted volumes on storage which has high I/O latency (rotating rust & network drives), because it triggers lots of fairly small I/O operations.

    On tests with rotating rust, and cache=none|directsync, write speed increased from 2MiB/s to 32MiB/s, on a par with that achieved by the in-kernel luks driver.

    With other cache modes the in-kernel driver is still notably faster because it is able to report completion of the I/O request before any encryption is done, while the in-QEMU driver must encrypt the data before completion.

    Signed-off-by: Daniel P. Berrange <berrange>
    Message-id: 20170927125340.12360-2-berrange
    Reviewed-by: Eric Blake <eblake>
    Reviewed-by: Max Reitz <mreitz>
    Signed-off-by: Max Reitz <mreitz>

This patch needs applying to the QEMU 2.10.0 tree.

A further performance hit is due to QEMU linking against 'nettle' for crypto, which has poorly optimized ASM routines for AES; these are 30% slower than the equivalents in libgcrypt. We should consider whether to revert the previous change that switched from gcrypt to nettle.

Version-Release number of selected component (if applicable):
2.10.0
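(A back-of-the-envelope illustration of the buffer-size effect described above, using only figures already quoted: splitting a 500 MiB sequential write into bounce-buffer-sized requests gives

500 MiB / 16 KiB = 32,000 write requests
500 MiB /  1 MiB =    500 write requests

so on a rotating disk with several milliseconds of latency per request, the 16 KiB case spends most of its time waiting on those extra round trips, which is consistent with the roughly 2 MiB/s vs 32 MiB/s numbers in the commit message.)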