Bug 1500334

Summary: LUKS driver has poor performance compared to in-kernel driver
Product: Red Hat Enterprise Linux 7 Reporter: Daniel Berrangé <berrange>
Component: qemu-kvm-rhev    Assignee: Daniel Berrangé <berrange>
Status: CLOSED ERRATA QA Contact: Yanhui Ma <yama>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.3    CC: berrange, chayang, coli, juzhang, kchamart, knoel, michen, mtessun, ngu, pingl, virt-maint, wquan, yama
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-11.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-11 00:38:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Daniel Berrangé 2017-10-10 12:01:00 UTC
Description of problem:
In tests, the LUKS driver's write performance is as low as 2 MiB/s, compared to 35-40 MiB/s for the host kernel driver. This is primarily due to an inadequately sized bounce buffer, which is fixed in 2.11 with:

commit 161253e2d0a83a1b33bca019c6e926013e1a03db
Author: Daniel P. Berrange <berrange>
Date:   Wed Sep 27 13:53:35 2017 +0100

    block: use 1 MB bounce buffers for crypto instead of 16KB
    
    Using 16KB bounce buffers creates a significant performance
    penalty for I/O to encrypted volumes on storage with high
    I/O latency (rotating rust & network drives), because it
    triggers lots of fairly small I/O operations.
    
    On tests with rotating rust, and cache=none|directsync,
    write speed increased from 2MiB/s to 32MiB/s, on a par
    with that achieved by the in-kernel luks driver. With
    other cache modes the in-kernel driver is still notably
    faster because it is able to report completion of the
    I/O request before any encryption is done, while the
    in-QEMU driver must encrypt the data before completion.
    
    Signed-off-by: Daniel P. Berrange <berrange>
    Message-id: 20170927125340.12360-2-berrange
    Reviewed-by: Eric Blake <eblake>
    Reviewed-by: Max Reitz <mreitz>
    Signed-off-by: Max Reitz <mreitz>


This patch needs to be applied to the QEMU 2.10.0 tree.
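
As a rough illustration of why the buffer size matters (back-of-the-envelope numbers, not measurements from this bug): a 500 MiB O_DIRECT write chopped into 16 KiB bounce-buffer chunks turns into 32000 host I/O operations, versus 500 with a 1 MiB buffer, and on high-latency storage each operation pays the full per-request latency.

$ echo $((500 * 1024 * 1024 / (16 * 1024)))     # host I/Os with a 16 KiB bounce buffer
32000
$ echo $((500 * 1024 * 1024 / (1024 * 1024)))   # host I/Os with a 1 MiB bounce buffer
500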

A further performance hit comes from QEMU linking against 'nettle' for crypto, whose assembly routines for AES are poorly optimized and around 30% slower than the equivalents in libgcrypt. We should consider whether to revert the previous change that switched from gcrypt to nettle.
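
To check which crypto library a given qemu-kvm binary is actually linked against, and to rebuild against gcrypt for a comparison, something along these lines should work (the configure switches are the usual QEMU build options of this era; verify the exact spelling against './configure --help' before relying on them):

$ ldd /usr/libexec/qemu-kvm | grep -E 'nettle|gcrypt'
$ ./configure --enable-gcrypt --disable-nettle   # when rebuilding from source
$ make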

Version-Release number of selected component (if applicable):
2.10.0

Comment 3 Miroslav Rezanina 2017-12-05 12:59:15 UTC
Fix included in qemu-kvm-rhev-2.10.0-11.el7

Comment 5 Yanhui Ma 2017-12-06 06:52:12 UTC
Hi Daniel,

I tried to reproduce and verify the bz using the following steps, but saw no performance difference between qemu-kvm-rhev-2.10.0-10.el7.x86_64 and qemu-kvm-rhev-2.10.0-11.el7.x86_64.
Could you please help check whether these steps are correct?
 
Host:
qemu-kvm-rhev-2.10.0-10.el7.x86_64 vs. qemu-kvm-rhev-2.10.0-11.el7.x86_64
kernel-3.10.0-799.el7.x86_64
Guest:
kernel-3.10.0-799.el7.x86_64

1. Create the image on the host
qemu-img create --object secret,id=sec0,data=123456 -f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 base.qcow2 20G
2. Boot a guest with the following cmd:
/usr/libexec/qemu-kvm \
...
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=raw,file=/home/kvm_autotest_root/images/rhel75-64-virtio.raw \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=03 \
    --object secret,id=sec0,data=123456 \
    -drive id=drive_disk1,driver=qcow2,file.filename=/home/kvm_autotest_root/images/base.qcow2,encrypt.key-secret=sec0,if=none,snapshot=off,aio=native,cache=none \
    -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,bus=pci.0,addr=0x4 \
...

3. Format the data disk and run fio four times in the guest
#mkfs.xfs /dev/vdb
#mount /dev/vdb /mnt
#fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/write_4k_8 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result

4. Results:
For qemu-kvm-rhev-2.10.0-10.el7.x86_64:
bw=13615KB/s, iops=3403
bw=15373KB/s, iops=3843
bw=11157KB/s, iops=2789
bw=16004KB/s, iops=4000

For qemu-kvm-rhev-2.10.0-11.el7.x86_64:
bw=15513KB/s, iops=3878
bw=10735KB/s, iops=2683
bw=14818KB/s, iops=3704
bw=12306KB/s, iops=3076

Thanks,
Yanhui

Comment 6 Daniel Berrangé 2017-12-06 10:59:11 UTC
I see results similar to yours when using the fio tool, and I'm not really sure why; perhaps it's due to libaio allowing asynchronous I/O operations.

If I instead use 'dd' in the guest with O_DIRECT, the difference before and after the patch is very visible:

Before:

# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 277.494 s, 1.9 MB/s

After:

# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 15.7482 s, 33.3 MB/s



I also get similar results if I use the 'qemu-io' tool from the host, against a luks image

$ qemu-img create -f luks demo.luks --object secret,id=sec0,data=123456 -o key-secret=sec0 1G

Before patch:

$ qemu-io --cache none --object secret,id=sec0,data=123456  --image-opts driver=luks,file.filename=demo.luks,key-secret=sec0
qemu-io> writev -P 0xfe 0 500M
wrote 524288000/524288000 bytes at offset 0
500 MiB, 1 ops; 0:04:33.13 (1.831 MiB/sec and 0.0037 ops/sec)

After patch

$ qemu-io --cache none --object secret,id=sec0,data=123456  --image-opts driver=luks,file.filename=demo.luks,key-secret=sec0
qemu-io> writev -P 0xfe 0 500M
wrote 524288000/524288000 bytes at offset 0
500 MiB, 1 ops; 0:00:14.68 (34.041 MiB/sec and 0.0681 ops/sec)
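
For completeness, a host-side baseline against the in-kernel LUKS driver can be gathered roughly as follows (a sketch, with /dev/sdX as a placeholder for a scratch HDD; luksFormat destroys its contents):

# cryptsetup luksFormat /dev/sdX
# cryptsetup open /dev/sdX encdisk
# dd if=/dev/zero of=/dev/mapper/encdisk bs=1M count=500 oflag=direct
# cryptsetup close encdisk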

Comment 7 Yanhui Ma 2017-12-13 06:31:46 UTC
(In reply to Daniel Berrange from comment #6)
> I see results similar to yours when using the fio tool, and I'm not really
> sure why; perhaps it's due to libaio allowing asynchronous I/O operations.
> 
> If I instead use 'dd' in the guest with O_DIRECT, the difference before and
> after the patch is very visible:
> 
> Before:
> 
> # dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
> 500+0 records in
> 500+0 records out
> 524288000 bytes (524 MB, 500 MiB) copied, 277.494 s, 1.9 MB/s
> 
> After:
> 
> # dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
> 500+0 records in
> 500+0 records out
> 524288000 bytes (524 MB, 500 MiB) copied, 15.7482 s, 33.3 MB/s
> 
> 

I tried it with dd, but got similar results.

qemu-kvm-rhev-2.10.0-10.el7.x86_64:
#dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 13.693 s, 38.3 MB/s

qemu-kvm-rhev-2.10.0-11.el7.x86_64:
#dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 13.6657 s, 38.4 MB/s


> 
> I also get similar results if I use the 'qemu-io' tool from the host,
> against a luks image
> 
> $ qemu-img create -f luks demo.luks --object secret,id=sec0,data=123456 -o
> key-secret=sec0 1G
> 
> Before patch:
> 
> $ qemu-io --cache none --object secret,id=sec0,data=123456  --image-opts
> driver=luks,file.filename=demo.luks,key-secret=sec0
> qemu-io> writev -P 0xfe 0 500M
> wrote 524288000/524288000 bytes at offset 0
> 500 MiB, 1 ops; 0:04:33.13 (1.831 MiB/sec and 0.0037 ops/sec)
> 
> After patch
> 
> $ qemu-io --cache none --object secret,id=sec0,data=123456  --image-opts
> driver=luks,file.filename=demo.luks,key-secret=sec0
> qemu-io> writev -P 0xfe 0 500M
> wrote 524288000/524288000 bytes at offset 0
> 500 MiB, 1 ops; 0:00:14.68 (34.041 MiB/sec and 0.0681 ops/sec)

Comment 8 Daniel Berrangé 2017-12-13 09:06:41 UTC
You have given "aio=native", which is different from my tests, which did not set 'aio' and so would be equivalent to aio=threads.

Also note that if your host has SSDs, that will probably mask any performance difference; my tests used HDDs in the host.
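
For reference, the drive_disk1 line from comment 5 with aio=native removed (so it falls back to the aio=threads default) would be:

    -drive id=drive_disk1,driver=qcow2,file.filename=/home/kvm_autotest_root/images/base.qcow2,encrypt.key-secret=sec0,if=none,snapshot=off,cache=none \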

Comment 9 Yanhui Ma 2017-12-14 04:15:33 UTC
(In reply to Daniel Berrange from comment #8)
> You have given "aio=native", which is different from my tests, which did not
> set 'aio' and so would be equivalent to aio=threads.
> 
> Also note that if your host has SSDs, that will probably mask any performance
> difference; my tests used HDDs in the host.

Hi Daniel,

I used an HDD in the host, not an SSD, and tried it again with aio=threads.

qemu-kvm-rhev-2.10.0-10.el7.x86_64:
dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 13.6441 s, 38.4 MB/s

qemu-kvm-rhev-2.10.0-11.el7.x86_64:
# dd if=/dev/zero of=/dev/vdb bs=1M count=500 oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 13.6661 s, 38.4 MB/s
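
For reference, one way to double-check that the target host disk is rotational rather than an SSD (the device name is a placeholder):

# lsblk -d -o NAME,ROTA /dev/sdX    # ROTA=1 means a rotating disk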

Thanks,
Yanhui

Comment 11 Yanhui Ma 2018-02-27 06:07:09 UTC
I used qemu-io to test it again:

#qemu-img create -f luks demo.luks --object secret,id=sec0,data=123456 -o key-secret=sec0 1G

qemu-img-rhev-2.10.0-10.el7.x86_64
# qemu-io --cache none --object secret,id=sec0,data=123456  --image-opts driver=luks,file.filename=demo.luks,key-secret=sec0
qemu-io> writev -P 0xfe 0 500M
wrote 524288000/524288000 bytes at offset 0
500 MiB, 1 ops; 0:00:18.12 (27.592 MiB/sec and 0.0552 ops/sec)
qemu-io> quit


qemu-img-rhev-2.10.0-11.el7.x86_64
# qemu-io --cache none --object secret,id=sec0,data=123456  --image-opts driver=luks,file.filename=demo.luks,key-secret=sec0
qemu-io> writev -P 0xfe 0 500M
wrote 524288000/524288000 bytes at offset 0
500 MiB, 1 ops; 0:00:10.95 (45.636 MiB/sec and 0.0913 ops/sec)
qemu-io> quit


According to the above data, I am setting the bz to VERIFIED.

Comment 13 errata-xmlrpc 2018-04-11 00:38:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104