Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1762775

Summary: RFE: support gcrypt's native XTS cipher mode code instead of QEMU's local copy
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: Daniel Berrangé <berrange>
Component: qemu-kvm    Assignee: Daniel Berrangé <berrange>
qemu-kvm sub component: General    QA Contact: Xueqiang Wei <xuwei>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: high CC: areis, coli, ddepaula, jinzhao, juzhang, knoel, mmethot, virt-maint
Version: 8.2    Keywords: FutureFeature
Target Milestone: rc    Flags: knoel: mirror+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-05 09:50:34 UTC Type: Feature Request
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1701948    

Description Daniel Berrangé 2019-10-17 13:04:03 UTC
Description of problem:
QEMU currently has its own XTS cipher mode implementation, since at the time it was introduced XTS was not available in either gcrypt or nettle.

This is undesirable in the long term, since QEMU's XTS implementation will not be FIPS certified.

It also misses out on ongoing performance improvements; for example, gcrypt git master has accelerated XTS so that it is 5x faster than QEMU's (bug 1762765).

Finally, QEMU's implementation makes many encryption calls into gcrypt with 16-byte buffers, each of which hits a mutex lock in gcrypt, crippling performance (bug 1762741).

For all three of these reasons, we must switch to gcrypt's native XTS mode.

This will give an immediate performance improvement to QEMU, approximately doubling cipher speed for AES-XTS with the current libgcrypt-1.8.3-4.el8.

Version-Release number of selected component (if applicable):
qemu-kvm-4.1.0-13.el8

Comment 1 Daniel Berrangé 2019-10-18 08:32:20 UTC
Patches proposed upstream

https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg04295.html

Comment 2 Daniel Berrangé 2019-11-15 14:23:04 UTC
Patches merged upstream

https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07688.html

We'll get this as part of the rebase to QEMU 4.2

Comment 6 Tingting Mao 2019-12-13 02:33:19 UTC
Tried to verify this bug as follows.


Scenario 1 (regression test with LUKS)
No new regression bugs were found. See the link below for more details.
https://projects.engineering.redhat.com/browse/XKVMEIGHT-1498 


Scenario 2 (performance test)

Steps:
1. Create a data disk over tmpfs.
# mkdir tmp
# mount tmpfs tmp/ -t tmpfs -o size=50G
# qemu-img create -f luks --object secret,id=sec0,data=test -o key-secret=sec0 data.luks 5G
# df -T tmp/data.luks
Filesystem     Type  1K-blocks    Used Available Use% Mounted on
tmpfs          tmpfs  52428800 2620432  49808368   5% /home/test/tmp

2. Boot a guest with this data disk.
# /usr/libexec/qemu-kvm \
    -name 'guest-rhel7.7' \
    -machine q35 \
    -nodefaults \
    -vga qxl \
    -object secret,id=sec0,data=test \
    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
    -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
    -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
    -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
    -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
    -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
    -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
    -blockdev driver=file,cache.direct=on,cache.no-flush=off,node-name=my_file,filename=rhel820-64-virtio.qcow2 \
    -blockdev driver=qcow2,node-name=my,file=my_file \
    -device virtio-blk-pci,id=virtio_blk_pci0,drive=my,bus=pci.2 \
    -blockdev driver=file,cache.direct=off,cache.no-flush=off,node-name=my_file1,filename=tmp/data.luks \
    -blockdev driver=luks,key-secret=sec0,node-name=my1,file=my_file1 \
    -device virtio-blk-pci,id=virtio_blk_pci1,drive=my1,bus=pci.3 \
    -vnc :0 \
    -m 8192 \
    -smp 4 \
    -netdev tap,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:56:00:00:00:07,bus=pci.4,addr=0x0 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/home/qmp-sock2,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -boot order=cdn,once=c,menu=off,strict=off  \
    -enable-kvm \
    -monitor stdio


3. Test the I/O performance via `fio`.

In ‘qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489’.

# fio --filename=/dev/vdb --direct=1 --rw=randrw --bs=4K --name=my_test --iodepth=1 --ioengine=libaio
my_test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=27.6MiB/s,w=27.0MiB/s][r=7063,w=6917 IOPS][eta 00m:00s]
my_test: (groupid=0, jobs=1): err= 0: pid=2519: Thu Dec 12 19:15:28 2019
   read: IOPS=7067, BW=27.6MiB/s (28.9MB/s)(2561MiB/92771msec)
    slat (nsec): min=7158, max=72640, avg=8355.29, stdev=881.43
    clat (usec): min=28, max=5768, avg=55.42, stdev=11.76
     lat (usec): min=46, max=5776, avg=64.98, stdev=11.83
    clat percentiles (usec):
     |  1.00th=[   45],  5.00th=[   48], 10.00th=[   52], 20.00th=[   55],
     | 30.00th=[   55], 40.00th=[   56], 50.00th=[   56], 60.00th=[   57],
     | 70.00th=[   57], 80.00th=[   58], 90.00th=[   59], 95.00th=[   60],
     | 99.00th=[   66], 99.50th=[   68], 99.90th=[  102], 99.95th=[  114],
     | 99.99th=[  141]
   bw (  KiB/s): min=25912, max=30848, per=100.00%, avg=28268.81, stdev=803.40, samples=185
   iops        : min= 6478, max= 7712, avg=7067.20, stdev=200.84, samples=185
  write: IOPS=7060, BW=27.6MiB/s (28.9MB/s)(2559MiB/92771msec)
    slat (nsec): min=7419, max=60383, avg=8597.30, stdev=903.69
    clat (usec): min=34, max=16401, avg=58.16, stdev=21.79
     lat (usec): min=51, max=16415, avg=67.97, stdev=21.84
    clat percentiles (usec):
     |  1.00th=[   47],  5.00th=[   50], 10.00th=[   53], 20.00th=[   57],
     | 30.00th=[   58], 40.00th=[   58], 50.00th=[   59], 60.00th=[   60],
     | 70.00th=[   60], 80.00th=[   61], 90.00th=[   63], 95.00th=[   64],
     | 99.00th=[   69], 99.50th=[   72], 99.90th=[  111], 99.95th=[  121],
     | 99.99th=[  139]
   bw (  KiB/s): min=26272, max=30744, per=100.00%, avg=28244.72, stdev=817.28, samples=185
   iops        : min= 6568, max= 7686, avg=7061.17, stdev=204.29, samples=185
  lat (usec)   : 50=7.61%, 100=92.28%, 250=0.11%, 500=0.01%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=10.43%, sys=16.48%, ctx=1310729, majf=0, minf=14
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=655676,655044,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=27.6MiB/s (28.9MB/s), 27.6MiB/s-27.6MiB/s (28.9MB/s-28.9MB/s), io=2561MiB (2686MB), run=92771-92771msec
  WRITE: bw=27.6MiB/s (28.9MB/s), 27.6MiB/s-27.6MiB/s (28.9MB/s-28.9MB/s), io=2559MiB (2683MB), run=92771-92771msec

Disk stats (read/write):
  vdb: ios=654841/654182, merge=0/0, ticks=37662/39406, in_queue=43, util=99.91%


In ‘qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4’.
# fio --filename=/dev/vdb --direct=1 --rw=randrw --bs=4K --name=my_test --iodepth=1 --ioengine=libaio
my_test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=23.8MiB/s,w=23.0MiB/s][r=6098,w=5898 IOPS][eta 00m:00s]
my_test: (groupid=0, jobs=1): err= 0: pid=2432: Thu Dec 12 19:27:47 2019
   read: IOPS=6143, BW=23.0MiB/s (25.2MB/s)(2561MiB/106732msec)
    slat (usec): min=7, max=1465, avg= 8.51, stdev= 2.52
    clat (usec): min=29, max=8477, avg=66.84, stdev=12.85
     lat (usec): min=61, max=8487, avg=76.57, stdev=13.13
    clat percentiles (usec):
     |  1.00th=[   57],  5.00th=[   61], 10.00th=[   62], 20.00th=[   65],
     | 30.00th=[   66], 40.00th=[   67], 50.00th=[   68], 60.00th=[   69],
     | 70.00th=[   69], 80.00th=[   70], 90.00th=[   71], 95.00th=[   72],
     | 99.00th=[   77], 99.50th=[   80], 99.90th=[  143], 99.95th=[  151],
     | 99.99th=[  176]
   bw (  KiB/s): min=23169, max=26040, per=100.00%, avg=24571.66, stdev=514.37, samples=213
   iops        : min= 5792, max= 6510, avg=6142.89, stdev=128.60, samples=213
  write: IOPS=6137, BW=23.0MiB/s (25.1MB/s)(2559MiB/106732msec)
    slat (usec): min=7, max=946, avg= 8.71, stdev= 1.46
    clat (usec): min=3, max=4664, avg=67.79, stdev= 9.44
     lat (usec): min=62, max=4674, avg=77.71, stdev= 9.57
    clat percentiles (usec):
     |  1.00th=[   58],  5.00th=[   62], 10.00th=[   63], 20.00th=[   66],
     | 30.00th=[   67], 40.00th=[   68], 50.00th=[   69], 60.00th=[   70],
     | 70.00th=[   70], 80.00th=[   71], 90.00th=[   72], 95.00th=[   74],
     | 99.00th=[   78], 99.50th=[   81], 99.90th=[  143], 99.95th=[  153],
     | 99.99th=[  178]
   bw (  KiB/s): min=22912, max=26168, per=100.00%, avg=24549.62, stdev=540.81, samples=213
   iops        : min= 5728, max= 6542, avg=6137.40, stdev=135.21, samples=213
  lat (usec)   : 4=0.01%, 50=0.01%, 100=99.81%, 250=0.18%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=9.04%, sys=14.55%, ctx=1310724, majf=0, minf=14
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=655676,655044,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=23.0MiB/s (25.2MB/s), 23.0MiB/s-23.0MiB/s (25.2MB/s-25.2MB/s), io=2561MiB (2686MB), run=106732-106732msec
  WRITE: bw=23.0MiB/s (25.1MB/s), 23.0MiB/s-23.0MiB/s (25.1MB/s-25.1MB/s), io=2559MiB (2683MB), run=106732-106732msec

Disk stats (read/write):
  vdb: ios=655169/654565, merge=0/0, ticks=45097/45662, in_queue=22, util=99.94%

Comment 10 Ademar Reis 2020-02-05 23:07:19 UTC
QEMU has recently been split into sub-components. As a one-time operation, to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Comment 13 errata-xmlrpc 2020-05-05 09:50:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017