Bug 1680231
Summary:          severe performance impact using luks format
Product:          Red Hat Enterprise Linux 8
Component:        qemu-kvm
Sub component:    General
Version:          8.0
Hardware:         All
OS:               Linux
Status:           CLOSED ERRATA
Severity:         high
Priority:         high
Reporter:         Yihuang Yu <yihyu>
Assignee:         Daniel Berrangé <berrange>
QA Contact:       Tingting Mao <timao>
CC:               berrange, coli, hannsj_uhl, knoel, mdeng, micai, ngu, qzhang, rbalakri, timao, virt-maint, wquan, xianwang, xuma, yama, zhenyzha
Target Milestone: rc
Target Release:   8.1
Fixed In Version: qemu-kvm-2.12.0-69.module+el8.1.0+3143+457f984c
Clone Of:         1680226
Bug Blocks:       1680226
Type:             Bug
Last Closed:      2019-11-05 20:48:05 UTC
Description
Yihuang Yu 2019-02-23 08:24:19 UTC

See also bug 1666336.

There is one set of upstream patches that significantly improves performance when using AES in XTS mode, which is the default. These patches approximately double encryption/decryption throughput:

https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg05389.html

These are quite straightforward to backport to QEMU in RHEL-7. There will still be a delta vs the in-kernel performance, even with these patches, but it will be reduced.

Note that the problem on ppc64 is different from x86_64. On x86_64, QEMU will use the hardware AES instructions. AFAIK there is no hardware-accelerated implementation for ppc64, so it will be significantly slower than x86_64.

Fix included in qemu-kvm-2.12.0-69.module+el8.1.0+3143+457f984c

Hi Daniel,

According to my latest test, the performance did not improve that much for luks (write: 138.288 MiB/sec -> 174.160 MiB/sec; read: 161.318 MiB/sec -> 198.676 MiB/sec), and writing data still drives almost 100% CPU usage on the hypervisor. Could you please help check where I am wrong, or whether the fix needs updating? Thanks in advance.
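For reference (my arithmetic, not part of the original report): the before/after figures above work out to roughly a 26% write and 23% read improvement. A quick sketch:

```shell
# Relative improvement of the fixed build over the -68 build,
# using the qemu-io MiB/sec figures quoted in the comment above.
awk 'BEGIN {
    w_old = 138.288; w_new = 174.160   # write, MiB/sec
    r_old = 161.318; r_new = 198.676   # read,  MiB/sec
    printf "write: +%.0f%%\n", (w_new / w_old - 1) * 100
    printf "read:  +%.0f%%\n", (r_new / r_old - 1) * 100
}'
# prints:
# write: +26%
# read:  +23%
```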
Tested with:

# qemu-img create -f luks --object secret,id=secret0,data="redhat" -o key-secret=secret0 data.luks 10G
Formatting 'data.luks', fmt=luks size=10737418240 key-secret=secret0

# df -Th data.luks
Filesystem                               Type Size Used Avail Use% Mounted on
/dev/mapper/rhel_dell--per740xd--01-home xfs  290G 100G 190G  35% /home

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
Stepping:            4
CPU MHz:             1200.826
BogoMIPS:            6800.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            19712K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke flush_l1d

# free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi       704Mi        51Gi        18Mi        10Gi        61Gi
Swap:          31Gi          0B        31Gi

Steps:

In 'qemu-kvm-2.12.0-68.module+el8.1.0+3138+c9bec3af':

Write:
# qemu-io --object secret,id=secret0,data="redhat" -c 'write 0 1.9G' --image-opts
    driver=luks,file.filename=data.luks,key-secret=secret0
wrote 2040109465/2040109465 bytes at offset 0
1.900 GiB, 1 ops; 0:00:14.06 (138.288 MiB/sec and 0.0711 ops/sec)

Read:
# qemu-io --object secret,id=secret0,data="redhat" -c 'read 0 1.9G' --image-opts
    driver=luks,file.filename=data.luks,key-secret=secret0
read 2040109465/2040109465 bytes at offset 0
1.900 GiB, 1 ops; 0:00:12.06 (161.318 MiB/sec and 0.0829 ops/sec)

In 'qemu-kvm-2.12.0-71.module+el8.1.0+3170+c76f9235':

Write:
# qemu-io --object secret,id=secret0,data="redhat" -c 'write 0 1.9G' --image-opts
    driver=luks,file.filename=data.luks,key-secret=secret0
wrote 2040109465/2040109465 bytes at offset 0
1.900 GiB, 1 ops; 0:00:11.17 (174.160 MiB/sec and 0.0895 ops/sec)

Read:
# qemu-io --object secret,id=secret0,data="redhat" -c 'read 0 1.9G' --image-opts
    driver=luks,file.filename=data.luks,key-secret=secret0
read 2040109465/2040109465 bytes at offset 0
1.900 GiB, 1 ops; 0:00:09.79 (198.676 MiB/sec and 0.1021 ops/sec)

Note that we're comparing different things. When I wrote

  "On my dev machine this increases perf from 327.41 MB/s to 635.68 MB/s with AES-128 XTS"

this was referring to a microbenchmark of the crypto code run by tests/benchmark-crypto-cipher, i.e. the maximum possible AES-128 + XTS performance in a tight loop with nothing else running.

The numbers you're reporting from qemu-io (174.160 MiB/sec write, 198.676 MiB/sec read) combine encryption and disk I/O, so they will always be lower than what I reported, as some portion of CPU time is spent on the I/O itself.

You still saw a performance improvement, which is good as it shows the patch is working. Obviously we still need further work in future to improve perf more.

(In reply to Daniel Berrange from comment #15)
> Note that we're comparing different things.
> ...
> You still saw a performance improvement, which is good as it shows the
> patch is working. Obviously we still need further work in future to
> improve perf more.

Thanks for your info. I would like to confirm the next status of this bug: should we set it as VERIFIED and report a new bug to track the future improvement, or re-assign this one and track the future improvement here?

You've demonstrated that the patches applied for this bug result in an improvement, so this is a success from the QE point of view IMHO. Future performance improvements are tracked via this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1701948

Based on comment 14 and comment 17, setting this bug as VERIFIED. Thanks, Daniel.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3345
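A rough way to see Daniel's point numerically (an illustrative model I am adding, not from the bug report): if crypto time and disk I/O time simply add in series, then 1/observed = 1/crypto + 1/io, and one can back out the implied I/O-only rate. Note the crypto figure is in MB/s while qemu-io reports MiB/sec, so treat this as a coarse estimate only:

```shell
# Hypothetical serial-pipeline model: total_time = crypto_time + io_time,
# so 1/observed = 1/crypto + 1/io. Solve for the implied I/O-only rate.
awk 'BEGIN {
    crypto   = 635.68    # AES-128 XTS microbenchmark figure quoted above (MB/s)
    observed = 174.160   # qemu-io end-to-end write rate (MiB/sec)
    io = 1 / (1 / observed - 1 / crypto)
    printf "implied I/O-only rate: ~%.0f MiB/sec\n", io
}'
# prints: implied I/O-only rate: ~240 MiB/sec
```

Under this (simplified) model, encryption costs the end-to-end path roughly a quarter of its throughput, consistent with Daniel's remark that part of the CPU time goes to the I/O itself.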