Bug 1996854
Summary: | dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc() | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Mike Snitzer <msnitzer>
Component: | kernel | Assignee: | Mike Snitzer <msnitzer>
Kernel sub component: | Crypt | QA Contact: | guazhang <guazhang>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | unspecified | |
Priority: | unspecified | CC: | agk, brdeoliv, guazhang, msnitzer, okozina
Version: | 8.5 | Keywords: | Triaged
Target Milestone: | rc | |
Target Release: | 8.5 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | kernel-4.18.0-339.el8 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-11-09 19:27:01 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: | 5.15
Deadline: | 2021-08-30 | |
Description (Mike Snitzer, 2021-08-23 20:05:36 UTC)
Comment from guazhang:

Hi,

Could you provide more detailed test steps to reproduce the bug?

Comment from Mike Snitzer:

(In reply to guazhang from comment #4)

This is a scalability performance bug. The more IO you issue to a dm-crypt device, particularly on a very large SMP/NUMA machine, the more spinlock contention will hurt performance.

So to test: throw as much IO at a fast dm-crypt device as you can on a large system with a lot of cpus. You'd do well to have the dm-crypt device layered on fast storage that handles seeks easily (e.g. ssd, nvme, ramdisk, etc). Then just increase the thread count on your synthetic IO load generator (e.g. fio). The kernel with the fix should consume less CPU time and yield better IO performance.

Comment from guazhang:

Hi,

I tested stress-ng in a VM and found that %sys utilization was higher on the test kernel than on the default kernel.

```
[root@test1 test8]# stress-ng --cpu 4 --io 32 --vm 4 --vm-bytes 128M --fork 4 --timeout 120s --hdd 8
stress-ng: info: [250380] dispatching hogs: 4 cpu, 32 io, 4 vm, 4 fork, 8 hdd
stress-ng: info: [250380] successful run completed in 120.39s (2 mins, 0.39 secs)
[root@test1 test8]# pwd
/home/test8
[root@test1 test8]# lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
loop0       7:0    0  500M  0 loop
`-test1   253:0    0  484M  0 crypt /home/test1
loop1       7:1    0  500M  0 loop
`-test2   253:1    0  484M  0 crypt /home/test2
loop2       7:2    0  500M  0 loop
`-test3   253:2    0  484M  0 crypt /home/test3
loop3       7:3    0  500M  0 loop
`-test4   253:3    0  484M  0 crypt /home/test4
loop4       7:4    0  500M  0 loop
`-test5   253:4    0  484M  0 crypt /home/test5
loop5       7:5    0  500M  0 loop
`-test6   253:5    0  484M  0 crypt /home/test6
loop6       7:6    0  500M  0 loop
`-test7   253:6    0  484M  0 crypt /home/test7
loop7       7:7    0  500M  0 loop
`-test8   253:7    0  484M  0 crypt /home/test8
sr0        11:0    1 1024M  0 rom
vda       252:0    0   27G  0 disk
|-vda1    252:1    0    1M  0 part
|-vda2    252:2    0  100M  0 part  /boot/efi
`-vda3    252:3    0 26.9G  0 part  /
```

Default kernel (kernel-4.18.0-335.el8.x86_64):

```
Average: CPU  %usr   %nice  %sys   %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
Average: all  60.15  0.00   24.52  0.28     0.00  1.94   0.01    0.00    0.00    13.11
Average:   0  58.14  0.00   26.62  0.21     0.00  1.84   0.01    0.00    0.00    13.18
Average:   1  55.77  0.00   28.67  0.36     0.00  2.30   0.01    0.00    0.00    12.89
Average:   2  70.47  0.00   15.02  0.21     0.00  1.15   0.01    0.00    0.00    13.14
Average:   3  56.20  0.00   27.75  0.33     0.00  2.47   0.01    0.00    0.00    13.23
```

Test kernel (kernel-4.18.0-334.el8.mr1218_210823_2059.x86_64):

```
Average: CPU  %usr   %nice  %sys   %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
Average: all  66.32  0.00   26.40  0.86     0.00  2.35   0.01    0.00    0.00    4.06
Average:   0  67.51  0.00   26.48  0.73     0.00  1.29   0.00    0.00    0.00    3.99
Average:   1  67.69  0.00   26.05  0.81     0.00  1.42   0.01    0.00    0.00    4.03
Average:   2  67.25  0.00   26.18  0.59     0.00  1.90   0.01    0.00    0.00    4.08
Average:   3  62.85  0.00   26.91  1.31     0.00  4.78   0.02    0.00    0.00    4.14
```

Comment from Mike Snitzer:

Why are you constraining the number of cpus used? The performance issue is a function of cpu count due to spinlock contention. Also, your test device may not be fast enough to expose the spinlock contention as a bottleneck (tmpfs files are used in comment #0, which are backed by ram).
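For reference, the change under test is the upstream commit named in the Summary (merged for Linux 5.15, per Target Upstream Version). Below is a minimal sketch of that change, assuming the upstream crypt_page_alloc() mempool callback in drivers/md/dm-crypt.c; this is a simplified illustration, not the verbatim RHEL backport.

```c
/*
 * Sketch of the crypt_page_alloc() change (simplified from upstream
 * drivers/md/dm-crypt.c; cc->n_allocated_pages is the per-client
 * percpu_counter and dm_crypt_pages_per_client the page limit).
 *
 * Before the fix, the limit check used percpu_counter_compare():
 *
 *     if (unlikely(percpu_counter_compare(&cc->n_allocated_pages,
 *                                         dm_crypt_pages_per_client) >= 0) &&
 *         likely(gfp_mask & __GFP_NORETRY))
 *             return NULL;
 *
 * When the approximate count is within one batch per CPU of the limit,
 * percpu_counter_compare() falls back to an exact sum that takes the
 * counter's spinlock and walks every CPU's partial count. With many
 * CPUs issuing IO concurrently, that spinlock becomes the bottleneck.
 *
 * After the fix, percpu_counter_read_positive() just reads the shared
 * approximate count without locking. The check may over- or
 * under-estimate usage by up to (batch - 1) * num_online_cpus() pages,
 * which is acceptable for a best-effort memory cap.
 */
static void *crypt_page_alloc(gfp_t gfp_mask, void *pool_data)
{
	struct crypt_config *cc = pool_data;
	struct page *page;

	if (unlikely(percpu_counter_read_positive(&cc->n_allocated_pages) >=
		     dm_crypt_pages_per_client) &&
	    likely(gfp_mask & __GFP_NORETRY))
		return NULL;

	page = alloc_page(gfp_mask);
	if (likely(page != NULL))
		percpu_counter_add(&cc->n_allocated_pages, 1);

	return page;
}
```

This trades a slightly fuzzy limit for a lock-free fast path, which is why the benefit scales with CPU count, as Mike describes above.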
Comment from guazhang:

Hi,

Revised test setup:

```
mkdir /mnt/ramdisk
mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
dd if=/dev/zero of=/mnt/ramdisk/file bs=1M count=500
echo 'redhat redhat' | cryptsetup -q luksFormat /mnt/ramdisk/file
echo 'redhat redhat' | cryptsetup -q open /mnt/ramdisk/file test1
mkfs.ext4 /dev/mapper/test1
mkdir /home/test1
mount /dev/mapper/test1 /home/test1
lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
loop0       7:0    0  500M  0 loop
`-test1   253:3    0  484M  0 crypt /home/test1
```

Test kernel (4.18.0-334.el8.mr1218_210823_2059.x86_64), three 120s runs each:

```
cd /home/test1
stress-ng --io 32 --vm 4 --vm-bytes 128M --fork 4 --timeout 120s --hdd 8

Average: CPU  %usr   %nice  %sys   %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
Average: all  13.01  0.00   29.84  3.69     0.52  1.41   0.00    0.00    0.00    51.53
Average: all  13.16  0.00   31.47  3.54     0.50  1.33   0.00    0.00    0.00    50.00
Average: all  13.01  0.00   31.95  3.51     0.51  1.35   0.00    0.00    0.00    49.67

fio --filename=fio_test_file --size=400M --direct=1 --rw=write --bs=4k --ioengine=libaio --iodepth=64 --runtime=120 --time_based --group_reporting --name=throughput-test-job --eta-newline=1

write: IOPS=185k, BW=724MiB/s (759MB/s)(84.8GiB/120001msec); 0 zone resets
write: IOPS=183k, BW=716MiB/s (750MB/s)(83.9GiB/120001msec); 0 zone resets
write: IOPS=184k, BW=719MiB/s (754MB/s)(84.2GiB/120001msec); 0 zone resets
```

Default kernel (4.18.0-335.el8.x86_64):

```
stress-ng --io 32 --vm 4 --vm-bytes 128M --fork 4 --timeout 120s --hdd 8

Average: CPU  %usr   %nice  %sys   %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
Average: all  13.95  0.00   35.25  3.25     0.64  1.65   0.00    0.00    0.00    45.26
Average: all  12.94  0.00   34.75  2.69     0.54  1.23   0.00    0.00    0.00    47.85
Average: all  13.13  0.00   34.47  2.94     0.56  1.26   0.00    0.00    0.00    47.65

fio --filename=fio_test_file --size=400M --direct=1 --rw=write --bs=4k --ioengine=libaio --iodepth=64 --runtime=120 --time_based --group_reporting --name=throughput-test-job --eta-newline=1

write: IOPS=177k, BW=691MiB/s (725MB/s)(81.0GiB/120001msec); 0 zone resets
write: IOPS=173k, BW=677MiB/s (710MB/s)(79.3GiB/120001msec); 0 zone resets
write: IOPS=174k, BW=679MiB/s (712MB/s)(79.6GiB/120001msec); 0 zone resets
```

Please help check whether these results are as expected.

Comment from Mike Snitzer:

(In reply to guazhang from comment #10)

Yes, while the workload is still somewhat limited, it does start to show the benefits of the fix with this revised testing. Thanks.

Comment:

Hi, please update Fixed In Version.

Comment:

Hi, I did a code review of kernel-4.18.0-339.el8.x86_64.rpm. All of the modifications can be found in the source, and no new regressions were found. Moving to VERIFIED.

Comment:

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4356