Description of problem:

Disk IO is significantly reduced on an AMD system when accessing disks via dm-crypt/LUKS.

Version-Release number of selected component (if applicable):

All kernel versions beginning with, and including, 3.10.0-327.el7 through 3.10.0-514.10.2.el7. All other software is up to date. Changing only the kernel to 3.10.0-229.el7.x86_64 results in a dramatic increase in disk IO.

How reproducible:

100%. Affects multiple drives on the system, including two different models of SATA drives.

Steps to Reproduce:
1. Mount encrypted partitions via cryptsetup.
2. hdparm -t /dev/sdX1 /dev/mapper/enc

Actual results:

[root@alpha ~]# uname -r
3.10.0-514.10.2.el7.x86_64
[root@alpha ~]# hdparm -t /dev/sdc1 /dev/mapper/enc

/dev/sdc1:
 Timing buffered disk reads: 496 MB in 3.00 seconds = 165.19 MB/sec

/dev/mapper/enc:
 Timing buffered disk reads: 278 MB in 3.01 seconds = 92.43 MB/sec

Expected results:

[root@alpha ~]# uname -r
3.10.0-229.el7.x86_64

SATA Disks:
[root@alpha ~]# hdparm -t /dev/sda1 /dev/mapper/enc

/dev/sda1:
 Timing buffered disk reads: 496 MB in 3.01 seconds = 164.92 MB/sec

/dev/mapper/enc:
 Timing buffered disk reads: 472 MB in 3.01 seconds = 156.99 MB/sec

Additional info:

The filesystem in use is ext4. No LVM is in use; all partitions are on bare metal.

CPU Info:
vendor_id  : AuthenticAMD
cpu family : 16
model      : 4
model name : AMD Phenom(tm) II X4 955 Processor
stepping   : 2
microcode  : 0x10000db

[root@alpha ~]# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       174066 iterations per second for 256-bit key
PBKDF2-sha256     209380 iterations per second for 256-bit key
PBKDF2-sha512     142935 iterations per second for 256-bit key
PBKDF2-ripemd160  141393 iterations per second for 256-bit key
PBKDF2-whirlpool  142935 iterations per second for 256-bit key
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b   201.3 MiB/s   225.2 MiB/s
 serpent-cbc   128b    76.3 MiB/s   212.3 MiB/s
 twofish-cbc   128b   175.9 MiB/s   232.8 MiB/s
     aes-cbc   256b   159.0 MiB/s   172.2 MiB/s
 serpent-cbc   256b    85.5 MiB/s   212.7 MiB/s
 twofish-cbc   256b   187.7 MiB/s   232.5 MiB/s
     aes-xts   256b   212.8 MiB/s   215.2 MiB/s
 serpent-xts   256b   172.6 MiB/s   198.3 MiB/s
 twofish-xts   256b   212.3 MiB/s   211.7 MiB/s
     aes-xts   512b   164.4 MiB/s   166.4 MiB/s
 serpent-xts   512b   195.0 MiB/s   198.0 MiB/s
 twofish-xts   512b   212.7 MiB/s   211.3 MiB/s

(Benchmark results are comparable under all kernel versions tested.)

This problem is not exhibited on a similar system with an Intel-based processor in a similar configuration.

vendor_id  : GenuineIntel
cpu family : 6
model      : 23
model name : Pentium(R) Dual-Core CPU E5300 @ 2.60GHz
stepping   : 10
microcode  : 0xa0b
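For anyone trying to reproduce this, a minimal sketch of the sequence I use (the device /dev/sdX1 and the mapping name 'enc' are placeholders for your own setup):

cryptsetup luksOpen /dev/sdX1 enc    # open the LUKS partition, creating /dev/mapper/enc
hdparm -t /dev/sdX1 /dev/mapper/enc  # compare raw vs. dm-crypt buffered read throughput
cryptsetup luksClose enc             # tear down the mapping when done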
Verified the problem still exists on the latest kernel: 3.10.0-514.16.1.el7.x86_64

/dev/sda1:
 Timing buffered disk reads: 532 MB in 3.01 seconds = 176.94 MB/sec

/dev/mapper/enc:
 Timing buffered disk reads: 286 MB in 3.02 seconds = 94.85 MB/sec
I'd start by questioning why the CPU performs so badly in the cryptsetup benchmark. It seems the CPU gave very poor results in the synthetic tests. Can it do better with older kernels?
Please compare lsmod output from both kernels while some dm-crypt device is active (we need to check the use count). It is possible that some accelerated crypto module is not used when it should be. This can happen if the initramfs loads only the generic implementation, for example.
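A sketch of one way to capture comparable output on both kernels (the grep pattern and file names are only examples; the third lsmod column is the use count I refer to):

lsmod | grep -i -E 'aes|crypt|xts|cbc' > lsmod-$(uname -r).txt   # capture under each kernel
diff lsmod-3.10.0-229*.txt lsmod-3.10.0-514*.txt                 # compare the two captures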
Created attachment 1274068 [details]
Output of lsmod from system Alpha with encrypted drive mounted and operating normally

Per request: lsmod output of system Alpha with the encrypted drive mounted, performing as expected.
Created attachment 1274069 [details]
Output of lsmod from system Alpha with encrypted drive mounted and operating slowly

Per request: lsmod output of system Alpha with the encrypted drive mounted, performing poorly.
Created attachment 1274070 [details]
Output of lsmod from system Beta with encrypted drives mounted and operating normally. For comparison only.

For comparison to the lsmod output from Alpha operating slowly. The system this was taken from shows no evidence of slowdown on the current kernel while utilizing LUKS.
Ondrej -

The cryptsetup benchmarks were similar under all the kernels I tested, so I only included the output once for brevity. If you would like a full printout from both the fast/normal kernel and the slow/"broken" kernel, I can include them; the numbers were within 1-3% between runs. Also, I have to ask: why do you consider it to be performing poorly? As an FYI, neither of the computers I am running has AES extensions in the CPU. In fact, the system I am experiencing this issue with actually outperforms my Beta system in the synthetic benchmarks, by about 30%, yet hard disk IO through the Alpha system via dm-crypt/LUKS is poorer than the disk IO on Beta.

I've been viewing it this way: as long as the synthetic AES throughput is greater than the actual raw disk throughput, there should not be an encryption bottleneck. If the AES throughput were worse than the disk throughput, I would not expect encrypted disk IO to exceed the benchmark figure; it would in fact be a bit worse, demonstrating that the bottleneck is the encryption layer.

Milan -

I've attached lsmod outputs from both 3.10.0-229 (normal) and 3.10.0-514.16 (current/slow) from the troubled machine, listed by the kernel version they were executed under. I'm also including an lsmod output from the Intel-based machine I mentioned, which operates at "full speed" on the current kernel, for comparison. By observation there appear to be more encryption-specific modules loaded on the older kernel, though I don't see anything mentioning AES specifically, nor do those modules exist on the secondary machine (Beta), which appears to be operating normally on the current kernel.
Yes, it seems the AES modules are compiled in; I did not realize that, sorry. You can see which one is actually used from the system log, and there are also priorities in /proc/crypto. But if the cryptsetup benchmark is similar on all kernels (that is quite important info!), I would not expect an optimized module to be missing (the systems do not have AES-NI, but there are still SSE/x86_64-optimized variants).
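For example, something like this should show which implementation wins for AES (an illustrative filter only; each /proc/crypto entry lists its backing driver and a priority, and the highest priority is used):

grep -E '^(name|driver|priority)' /proc/crypto | grep -A2 '^name.*aes'
dmesg | grep -i aes    # the system log may also note which implementation was picked up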
(In reply to kookyman from comment #8)
> Ondrej - The cryptsetup benchmarks were similar under all the tested kernels

Bummer, I missed the note under the results, sorry for that. No need to post detailed cryptsetup benchmark results, though.

Let's assume for starters that the issue is not within the crypto layer. What exact kernel versions can you perform tests on (or have access to)? Only the two already mentioned?

Could you test the following 4 configurations on the AMD system, using both the old and the current ('slow') kernel? Perform the same hdparm test on both kernels as you did earlier. The first three tests are harmless and affect only low-level performance tuning knobs in dm-crypt.

1) Activate dm-crypt using the cryptsetup utility as usual and add the --perf-same_cpu_crypt option.

2) Same as above, just add the --perf-submit_from_crypt_cpus option instead.

3) Same as above, just add both perf options at the same time.

The fourth one is a bit tricky, but as long as you activate it in read-only mode, it's harmless to the data on /dev/sdX as well. You must not write any data using this dm-crypt table. Writes would destroy your ciphertext on /dev/sdX!

4) Activate a dm-crypt mapping on top of your /dev/sdX manually using the following command:

dmsetup create enc -r --table "0 $(blockdev --getsz /dev/sdX crypt cipher_null-ecb-null - 0 /dev/sdX 0"

When we load the null cipher, we can measure pure dm-crypt overhead. With this 'cipher' we completely rule out the crypto layer from the issue. This would help us better localise the eventual performance drop in dm-crypt.

Thank you.
(In reply to Ondrej Kozina from comment #10)
> dmsetup create enc -r --table "0 $(blockdev --getsz /dev/sdX crypt
> cipher_null-ecb-null - 0 /dev/sdX 0"

Typo! Missing ')'. It's:

dmsetup create enc -r --table "0 $(blockdev --getsz /dev/sdX) crypt cipher_null-ecb-null - 0 /dev/sdX 0"
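And once you are finished with test 4, the mapping can be inspected and removed safely with standard dmsetup calls (nothing below writes to the disk):

dmsetup info enc     # State should read ACTIVE (READ-ONLY)
dmsetup table enc    # should show the cipher_null-ecb-null table line
dmsetup remove enc   # tear down the test mapping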
Ondrej -

(For shorthand, replace 3.10.0 with "x" in version numbers below.)

With regards to kernels tested, I worked with the crew on #CentOS@Freenode and tested against multiple kernels, starting with the previous kernel x-514.10, then backing up using kernels from the vault. While I didn't try every kernel, I started with x-514.10.2, then x-327, then x-229. x-229 operated normally, so I installed the last of the x-229 branch, x-229.20.1, to establish that the change occurred between the 7.1 and 7.2 release cycles. The problem therefore appears to affect all kernels from x-327 forward, unless it was fixed in one and then regressed. I can try other specific kernels if needed, though I would prefer not to have to install and test every single kernel release between x-327 and current.

I've listed the full results below, but the options don't exist on x-229. On x-514.16.1, --perf-same_cpu_crypt appears to "fix" the issue. The null cipher provides higher throughput than raw disk access.

Results from Request:

Using kernel x-514.16.1:

1)
[root@alpha ~]# cryptsetup luksOpen /dev/sda1 enc --perf-same_cpu_crypt
[root@alpha ~]# hdparm -t /dev/sda1 /dev/mapper/enc

/dev/sda1:
 Timing buffered disk reads: 530 MB in 3.00 seconds = 176.58 MB/sec

/dev/mapper/enc:
 Timing buffered disk reads: 510 MB in 3.01 seconds = 169.49 MB/sec

2)
[root@alpha ~]# cryptsetup luksOpen /dev/sda1 enc --perf-submit_from_crypt_cpus
[root@alpha ~]# hdparm -t /dev/sda1 /dev/mapper/enc

/dev/sda1:
 Timing buffered disk reads: 532 MB in 3.01 seconds = 176.84 MB/sec

/dev/mapper/enc:
 Timing buffered disk reads: 296 MB in 3.00 seconds = 98.61 MB/sec

3)
[root@alpha ~]# cryptsetup luksOpen /dev/sda1 enc --perf-submit_from_crypt_cpus --perf-same_cpu_crypt
[root@alpha ~]# hdparm -t /dev/sda1 /dev/mapper/enc

/dev/sda1:
 Timing buffered disk reads: 520 MB in 3.00 seconds = 173.24 MB/sec

/dev/mapper/enc:
 Timing buffered disk reads: 488 MB in 3.00 seconds = 162.62 MB/sec

4)
[root@alpha ~]# dmsetup create enc -r --table "0 $(blockdev --getsz /dev/sda1) crypt cipher_null-ecb-null - 0 /dev/sda1 0"
[root@alpha ~]# hdparm -t /dev/sda1 /dev/mapper/enc

/dev/sda1:
 Timing buffered disk reads: 532 MB in 3.01 seconds = 176.81 MB/sec

/dev/mapper/enc:
 Timing buffered disk reads: 536 MB in 3.01 seconds = 177.98 MB/sec

--

Using kernel x-229.20.1:

[root@alpha ~]# cryptsetup luksOpen /dev/sdc1 enc --key-file MasterTempKey --perf-same_cpu_crypt
device-mapper: reload ioctl on enc failed: Invalid argument
Requested dm-crypt performance options are not supported.

4)
[root@alpha ~]# hdparm -t /dev/sdc1 /dev/mapper/enc

/dev/sdc1:
 Timing buffered disk reads: 536 MB in 3.01 seconds = 178.13 MB/sec

/dev/mapper/enc:
 Timing buffered disk reads: 536 MB in 3.01 seconds = 178.28 MB/sec
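As an aside, if anyone wants to double-check that a perf flag actually reached the kernel, dumping the active table should work (a sketch; I believe dm-crypt appends the optional parameters after the device arguments, e.g. a trailing '1 same_cpu_crypt', though the exact output may vary by kernel):

dmsetup table enc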
(In reply to kookyman from comment #12)
> With regards to kernels tested, I worked with the crew on #CentOS@Freenode
> and tested against multiple kernels, starting with the previous kernel
> x-514.10, then backing up using kernels from the vault. While I didn't try
> every kernel, I started with x-514.10.2, then x-327, then x-229. I can try
> other specific kernels if needed, though I would prefer not to have to
> install and test every single kernel release between x-327 and current.

Definitely no need to test all kernels in between 7.1 and 7.2. Thank you for your great effort anyway!

> I've listed the full results below, but the options don't exist on x-229.
> On x-514.16.1, --perf-same_cpu_crypt appears to "fix" the issue.

That's expected. Within the scope of RHEL7, parallel processing of de/encryption, together with the tuning options, was introduced in 3.10.0-238.

> The null cipher provides higher throughput than raw disk access.

I'd humbly suggest that's a measurement error on a buffered read path. At the same time it tells us the dm-crypt overhead is not the trigger for the drop.

> 1)
> [root@alpha ~]# cryptsetup luksOpen /dev/sda1 enc --perf-same_cpu_crypt
> [root@alpha ~]# hdparm -t /dev/sda1 /dev/mapper/enc
>
> /dev/sda1:
>  Timing buffered disk reads: 530 MB in 3.00 seconds = 176.58 MB/sec
>
> /dev/mapper/enc:
>  Timing buffered disk reads: 510 MB in 3.01 seconds = 169.49 MB/sec

This hints at the possibility that parallel processing of encryption across all available cores of this particular CPU caused the performance drop. I'd suggest you stick with the performance tuning option that helped; the options were introduced exactly for anomalies like this.
I'll do that for now, and test occasionally with future kernel updates.

I took the time today and did some quick testing with other kernels, and discovered that this appears to be exclusive to these kernels. While I know they are a different system, I tried some Fedora LiveCDs that I have, and this slowdown doesn't appear, or is at least nowhere near as bad, on any of those. (Yes, I know this is an apples-to-oranges comparison, but as I understand it, Fedora usually serves as the base for a future RHEL major release; I believe RHEL7 started from Fedora 19/20.)

Fedora 20, Kernel 3.11.10-301.fc20, hdparm result: 160.76 MB/s
Fedora 21, Kernel 3.17.4-301.fc21, hdparm result: 163.12 MB/s
Fedora 22, Kernel 4.0.4-301.fc22, hdparm result: 146.26 MB/s
Fedora 23, Didn't Test
Fedora 24, Didn't Test
Fedora 25, Kernel 4.8.6-300.fc25, hdparm result: 149.76 MB/s

Fedora 22 and up do show a noticeable drop in throughput, but nowhere near the performance-killing drop shown on the CentOS kernels. I didn't have Fedora 23 and 24 LiveCDs loaded on my USB stick to test with, but if you feel they would provide relevant information, I can test them. I was going to test Fedora 19, but forgot that there was a gcrypt bug that reacted negatively with key files, so I moved on to F20 before realizing the mistake. Like F23 and F24, if you think it would provide something, I can go back and do it.

Otherwise I'll hold out for future improvements (the fact that it works on newer kernels leads me to believe there is something better out there) and use "--perf-same_cpu_crypt".
I recently discovered the 'tuned' feature, and after changing from 'balanced' to 'throughput-performance', hdparm is now showing full speed for data access. This may give another angle on what changed between the affected kernels. I plan on updating to 7.4 in the next week or two, and will update this with results on whether the tuning continues to affect the system.

[root@alpha ~]# tuned-adm profile throughput-performance
[root@alpha ~]# hdparm -t /dev/sdb /dev/mapper/rnd

/dev/sdb:
 Timing buffered disk reads: 302 MB in 3.01 seconds = 100.19 MB/sec

/dev/mapper/rnd:
 Timing buffered disk reads: 304 MB in 3.00 seconds = 101.25 MB/sec

[root@alpha ~]# tuned-adm profile balanced
[root@alpha ~]# hdparm -t /dev/sdb /dev/mapper/rnd

/dev/sdb:
 Timing buffered disk reads: 306 MB in 3.01 seconds = 101.72 MB/sec

/dev/mapper/rnd:
 Timing buffered disk reads: 230 MB in 3.00 seconds = 76.56 MB/sec
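For reference, the governor each profile selects can be checked like this (illustrative; paths assume the standard cpufreq sysfs layout):

tuned-adm active                                            # confirm the active profile
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor   # one entry per core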
I suspect this is most probably tied directly to the CPU governor in use for the specific tuned profile. I believe the balanced tuned profile runs with the 'ondemand' CPU governor, whereas the throughput-performance profile sticks with the 'performance' governor.
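If you want to verify that without tuned in the picture, switching the governor directly should reproduce both behaviours (a sketch; cpupower ships in the kernel-tools package on RHEL/CentOS 7):

cpupower frequency-set -g performance   # pin all cores to the performance governor
hdparm -t /dev/sdb /dev/mapper/rnd      # re-run the same test
cpupower frequency-set -g ondemand      # restore the previous governor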
This affects older AMD CPUs, and there is a workaround. We really don't want to change the defaults. I'm closing this as WONTFIX, but a release note (known issue) might be a possibility if people desire it.