Bug 1522962 - dm-crypt fails to load on an Atom system: encrypt test failed
Summary: dm-crypt fails to load on an Atom system: encrypt test failed
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-12-06 20:12 UTC by Georg Sauthoff
Modified: 2020-12-01 11:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-09 17:52:30 UTC
Type: Bug
Embargoed:
fedora: needinfo-


Description Georg Sauthoff 2017-12-06 20:12:57 UTC
Description of problem:
Loading the dm-crypt kernel module fails, both when loading it explicitly via modprobe and when executing `cryptsetup luksOpen`.

Version-Release number of selected component (if applicable):
kernel-4.13.16-302.fc27.x86_64

How reproducible: always


Steps to Reproduce:
1. modprobe dm-crypt
2.
3.

Actual results:
modprobe: ERROR: could not insert 'dm_crypt': Key was rejected by service

dmesg messages:
[Dec 6 21:09] alg: akcipher: encrypt test failed. err -22
[  +0.006056] alg: akcipher: test 1 failed for pkcs1pad(qat-rsa,sha256), err=-22


Expected results: module is successfully loaded without errors


Additional info:
It's a Supermicro system featuring an Intel(R) Atom(TM) CPU C3758.

Comment 1 Georg Sauthoff 2017-12-06 20:30:04 UTC
Running `cryptsetup benchmark` triggers some more 'encrypt test failed' messages and yields several 'N/A' results:

#     Algorithm | Key |  Encryption |  Decryption
        aes-cbc   128b   375.3 MiB/s   375.9 MiB/s
    serpent-cbc   128b           N/A           N/A
    twofish-cbc   128b           N/A           N/A
        aes-cbc   256b   291.6 MiB/s   298.4 MiB/s
    serpent-cbc   256b           N/A           N/A
    twofish-cbc   256b           N/A           N/A
        aes-xts   256b   416.8 MiB/s   381.8 MiB/s
    serpent-xts   256b           N/A           N/A
    twofish-xts   256b           N/A           N/A
        aes-xts   512b   327.3 MiB/s   327.8 MiB/s
    serpent-xts   512b           N/A           N/A
    twofish-xts   512b           N/A           N/A

I ran `cryptsetup benchmark` yesterday from the Fedora 27 Live CD, and there the output didn't contain any 'N/A' values.

Thus, this looks like a regression from the kernel version that is used on the Live CD.

PS: The luksOpen fails like this:

cryptsetup luksOpen /dev/sda backup
Enter passphrase for /dev/sda: 
device-mapper: reload ioctl on backup failed: Invalid argument

Comment 2 Georg Sauthoff 2017-12-07 09:32:10 UTC
I've booted the Fedora 27 Live CD again and the kernel is:

kernel-4.13.9-300.fc27.x86_64

In that environment, besides returning plausible values for all combinations, `cryptsetup benchmark` also reports much better rates for AES. For example:

aes-xts    256b    888.1 MiB/s    896.1 MiB/s

Executing cryptsetup doesn't trigger the load of the dm-crypt kernel module, but explicitly calling `modprobe dm-crypt` works successfully, as expected.

Comment 3 Georg Sauthoff 2017-12-08 09:27:16 UTC
Perhaps this is relevant, as well:

When running `cryptsetup benchmark` under kernel-4.13.9-300.fc27.x86_64 the kernel logs the following message 3 times in a row:

CPU feature 'AVX registers' is not supported.

(grep'ing for 'avx' in /proc/cpuinfo yields zero results - which looks plausible since it's an Intel Atom C3758 CPU. /proc/cpuinfo has the 'aes' flag, though.)
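
The flag check described above can be sketched as a small helper. This is only an illustration: the function name `has_cpu_flag` and its optional file argument are made up here, and it assumes the x86 layout of /proc/cpuinfo with a `flags` line. It compares whole flag names, so a query for `avx` is not satisfied by `avx2`.

```shell
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO_FILE]   (hypothetical helper, not part of any tool)
# Succeeds if FLAG is listed in the first "flags" line of the cpuinfo file.
# CPUINFO_FILE defaults to /proc/cpuinfo; the argument only exists so that
# the helper can be tried against a saved copy.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    awk -v f="$flag" '
        /^flags/ {
            # $1 is "flags", $2 is ":", the flag names start at $3
            for (i = 3; i <= NF; i++) if ($i == f) found = 1
            exit
        }
        END { exit !found }
    ' "$file"
}

# Usage:
#   has_cpu_flag aes && echo "aes: yes"
#   has_cpu_flag avx || echo "avx: no"
```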

Comment 4 Georg Sauthoff 2017-12-11 10:16:24 UTC
With kernel 4.14.3-300.fc27.x86_64 I also see the same issues as with 4.13.16-302.fc27.x86_64 (dm-crypt module can't be loaded; encrypt test failed messages).


Another curious thing: booting the 4.13.9-300.fc27.x86_64 kernel on the installed Fedora 27 system yields different results than in the Fedora Live-CD environment:

1st try: dm-crypt load fails (modprobe: ERROR: could not insert 'dm_crypt': Key was rejected by service) and `cryptsetup benchmark` reports N/A, as with 4.13.16.

2nd try - after a reboot: dm-crypt can be loaded, but the `cryptsetup benchmark` results are worse than when running from the Live-CD:

aes-xts   256b   374.0 MiB/s   375.5 MiB/s

Comment 5 Georg Sauthoff 2017-12-12 21:33:10 UTC
Ok, I think I've identified the root cause for this issue: the Intel QuickAssist (QAT) [1] crypto driver.

Comparing the F27 Live-CD environment with the F27 installation I've noticed that the Live-CD fails to load the qat_c3xxx kernel module while it is loaded when booting the installed system.

The Live-CD boot log contains the following messages:

c3xxx 0000:01:00.0: enabling device (0140 -> 0142)
c3xxx 0000:01:00.0: Direct firmware load for qat_c3xxx_mmp.bin failed with error -2
c3xxx 0000:01:00.0: Failed to load MMP firmware qat_c3xxx_mmp.bin
c3xxx 0000:01:00.0: Failed to load acceleration FW
c3xxx 0000:01:00.0: Resetting device qat_dev0
c3xxx: probe of 0000:01:00.0 failed with error -14

The Live-CD system does have the firmware image available under /lib/firmware/qat_c3xxx_mmp.bin, so perhaps it just isn't available during early boot. The md5sum of the firmware image is the same as on the installed system:

md5sum /lib/firmware/qat_c3xxx_mmp.bin
fb7deea913d87aed7676222269a593e6  /lib/firmware/qat_c3xxx_mmp.bin

As a result, `lsmod` on the Live-CD doesn't list the qat modules.

After booting the installed F27 system, there are no such error messages, and the qat kernel modules are loaded:

lsmod | grep qat
qat_c3xxx              16384  1
intel_qat             135168  2 qat_c3xxx
authenc                16384  1 intel_qat

And the log just has this line:

c3xxx 0000:01:00.0: qat_dev0 started 6 acceleration engines

Thus, my current workaround is to blacklist the qat modules. When the modules are blacklisted, `modprobe dm-crypt` and `cryptsetup benchmark` work as expected, even with the current kernel version 4.14.3-300.fc27.x86_64. The `cryptsetup benchmark` results are as high as on the Live-CD, e.g.:

aes-xts   256b   883.8 MiB/s   885.2 MiB/s

Considering the above, this looks like a bug in the qat_c3xxx module. 
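
For reference, a minimal sketch of what such a blacklist could look like. The comment above doesn't say which file was used, so the file name `qat-blacklist.conf` and the helper `write_qat_blacklist` are assumptions made for illustration; the module names are the ones from the `lsmod` output above.

```shell
#!/bin/sh
# write_qat_blacklist [DIR]   (hypothetical helper for this sketch)
# Writes a modprobe configuration file that blacklists the QAT modules.
# DIR defaults to /etc/modprobe.d; the file name is an arbitrary choice.
write_qat_blacklist() {
    dir=${1:-/etc/modprobe.d}
    cat > "$dir/qat-blacklist.conf" <<'EOF'
# Keep the QAT driver from being auto-loaded for the C3xxx device.
blacklist qat_c3xxx
blacklist intel_qat
EOF
}

# Usage (as root), followed by a reboot:
#   write_qat_blacklist
```

If the driver is also part of the initramfs, the initramfs may need to be regenerated for the blacklist to take effect during early boot; whether that applies to this system is not stated in the report.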

[1]: cf. e.g. http://dpdk.org/doc/guides/cryptodevs/qat.html for some QAT context

Comment 6 Laura Abbott 2017-12-12 21:53:47 UTC
That's odd, that driver hasn't been touched in over a year and Fedora has had it on for just as long. The firmware also hasn't been touched. I wonder if it's always been broken for this hardware?

This is best reported as an e-mail to the crypto devs

$ scripts/get_maintainer.pl -f drivers/crypto/qat/qat_c3xxx/
Giovanni Cabiddu <giovanni.cabiddu> (supporter:QAT DRIVER,commit_signer:1/1=100%)
Salvatore Benedetto <salvatore.benedetto> (supporter:QAT DRIVER)
Herbert Xu <herbert.org.au> (maintainer:CRYPTO API,commit_signer:1/1=100%)
"David S. Miller" <davem> (maintainer:CRYPTO API)
Pablo Marcos Oltra <pablo.marcos.oltra> (commit_signer:1/1=100%,authored:1/1=100%)
qat-linux (open list:QAT DRIVER)
linux-crypto.org (open list:CRYPTO API)
linux-kernel.org (open list)

Go ahead and e-mail that entire set.

Good find!

Comment 7 Georg Sauthoff 2017-12-13 20:51:02 UTC
I've sent an email to the listed recipients:

https://marc.info/?l=linux-crypto-vger&m=151319386420394&w=2

(Pablo Marcos Oltra's email bounces: 550 #5.1.0 Address rejected. (in reply to RCPT TO command))

Perhaps the driver is just broken for some hardware - e.g. the Intel Atom C3758 CPU I've been using is a relatively recent model - it was released in Q3 2017. [1]


[1]: https://ark.intel.com/products/97926/Intel-Atom-Processor-C3758-16M-Cache-up-to-2_20-GHz

Comment 8 Laura Abbott 2018-02-20 19:52:29 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  As kernel maintainers, we try to keep up with bugzilla but due to the rate at which the upstream kernel project moves, bugs may be fixed without any indication to us. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.
 
Fedora 27 has now been rebased to 4.15.3-300.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you experience different issues, please open a new bug report for those.

Comment 9 Georg Sauthoff 2018-02-24 15:20:17 UTC
I updated the system to 4.15.4-300.fc27.x86_64 [1] and I can still reproduce this issue:

modprobe dm-crypt
modprobe: ERROR: could not insert 'dm_crypt': Key was rejected by service

and

Feb 24 16:13:35 example.org kernel: alg: akcipher: encrypt test failed. err -22
Feb 24 16:13:35 example.org kernel: alg: akcipher: test 1 failed for pkcs1pad(qat-rsa,sha256), err=-22

(i.e. same results, as previously)
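
To pull the failing algorithm instance out of such log lines, something like the following could be used. It is a sketch only: the helper name `failed_selftests` is made up, and the pattern is derived solely from the two message formats quoted in this report, so other kernel versions may word these messages differently.

```shell
#!/bin/sh
# failed_selftests   (hypothetical helper for this sketch)
# Reads kernel log lines on stdin and prints, once each, the algorithm
# instance named in lines of the form
#   alg: <type>: test <n> failed for <instance>, err=<code>
failed_selftests() {
    sed -n 's/.*alg: [a-z]*: test [0-9]* failed for \(.*\), err=.*/\1/p' | sort -u
}

# Usage:
#   dmesg | failed_selftests
#   journalctl -k | failed_selftests
```

For the log excerpt above this prints `pkcs1pad(qat-rsa,sha256)`, which points at the QAT-backed RSA implementation.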

The `dnf update` commands didn't update the QAT firmware; it's still:

md5sum /lib/firmware/qat_c3xxx_mmp.bin
fb7deea913d87aed7676222269a593e6  /lib/firmware/qat_c3xxx_mmp.bin

Also, I haven't received any reply to my kernel mailing list posting regarding this issue, so far:

https://marc.info/?l=linux-kernel&m=151319388020395&w=2

[1]: I actually had some issues booting the new kernel as the grub.cfg wasn't updated by the rpm package install - see also Bug 1548729 for details.

Comment 10 Sameer Dhar 2018-03-14 13:17:24 UTC
I'm having the same issue using an Intel Atom C3758 processor. Any fix?

Comment 11 Justin M. Forbes 2018-07-23 15:01:27 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 12 Georg Sauthoff 2018-07-27 07:50:29 UTC
Because of Bug 1572872 (which breaks networking and thus remote access on such Atom boards) I can't execute the above steps on kernel-4.17.7-100.fc27.x86_64.

Comment 13 Georg Sauthoff 2018-11-17 13:20:42 UTC
Ok, I still run into this issue under Fedora 29, 4.18.17-300.fc29.x86_64.

Here it fails in the worst possible way:

After switch-root, a cryptsetup-encrypted root filesystem is unlocked and mounted, but then system processes hang forever on writes/sync, and thus the boot never completes.

That means, systemd basically runs into an infinite loop, where it waits on systemd-tmpfiles-setup to finish, then journald hangs, systemd tries to kill/restart it, which doesn't work, systemd waits, tries again to kill journald and so on. Other stuff like sshd is also hanging.

In the systemd debug shell, because journald is dysfunctional, you don't get any output from journalctl/dmesg.


Thus, it fails in a worse way than before.


I can work around this issue by disabling the QAT hardware in the BIOS. Then the system boots without issues.

What should also work: blacklist the relevant QAT kernel module via a kernel parameter.
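
A sketch of what the kernel-parameter variant might look like. It is untested on the affected board, and the helper name `qat_blacklist_args` is made up for illustration. `modprobe.blacklist=` is read by modprobe from the kernel command line, and `rd.driver.blacklist=` is the corresponding option for a dracut-built initramfs; both accept a comma-separated module list.

```shell
#!/bin/sh
# qat_blacklist_args MODULE...   (hypothetical helper for this sketch)
# Prints kernel command-line arguments that blacklist the given modules
# for modprobe and for a dracut initramfs.
qat_blacklist_args() {
    list=$(printf '%s,' "$@")   # "a,b,"
    list=${list%,}              # strip the trailing comma -> "a,b"
    printf 'modprobe.blacklist=%s rd.driver.blacklist=%s\n' "$list" "$list"
}

# Possible use on Fedora (as root), assuming grubby manages the boot entries:
#   grubby --update-kernel=ALL --args="$(qat_blacklist_args qat_c3xxx intel_qat)"
# For a one-off test, the same arguments can be appended to the kernel line
# in the boot loader menu instead.
```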

Comment 14 Jeremy Cline 2018-12-03 17:29:57 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.
 
Fedora 29 has now been rebased to 4.19.5-300.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you experience different issues, please open a new bug report for those.

Comment 15 Laura Abbott 2019-04-09 17:52:30 UTC
There wasn't a follow-up from the previous needinfo so I'm going to close this for now. Please test on the newest kernel version and re-open if there's still a problem.

Comment 16 Georg Sauthoff 2019-11-13 19:30:34 UTC
I'm clearing the needinfo flag because I think that I've already provided more than enough info.

I reproduced this issue on 3 Fedora releases, which didn't spark any interest on Red Hat's/Intel's side with respect to fixing it.

Since I don't really need this QAT feature, I simply disabled it in the BIOS at some point and I won't spend any more time on this issue.

Comment 17 Vladis Dronov 2020-12-01 11:53:19 UTC
Just a follow-up for reference.

Use of the intel_qat driver for disk encryption has been disabled. An upstream
crypto patchset introduced flag inheritance and the CRYPTO_ALG_ALLOCATES_MEMORY
flag, and set this flag for intel_qat (since v5.9-rc1):

7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY

dm-crypt, in turn, has stopped using crypto drivers with this flag (since v5.10-rc1):

cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
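
To see which implementations of a cipher are registered on a given system, /proc/crypto can be inspected. The helper below is a sketch with a made-up name (`crypto_drivers_for`); it assumes the usual /proc/crypto layout, where each entry lists `name`, `driver`, `module` and `priority` in that order. The crypto API generally prefers the highest-priority implementation, but this listing does not show the CRYPTO_ALG_ALLOCATES_MEMORY flag, so it cannot tell on its own which driver dm-crypt ends up using.

```shell
#!/bin/sh
# crypto_drivers_for ALG [FILE]   (hypothetical helper for this sketch)
# Prints "priority driver module" for every entry whose name equals ALG,
# highest priority first. FILE defaults to /proc/crypto.
crypto_drivers_for() {
    alg=$1
    file=${2:-/proc/crypto}
    awk -F' *: *' -v alg="$alg" '
        $1 == "name"     { name = $2 }
        $1 == "driver"   { driver = $2 }
        $1 == "module"   { module = $2 }
        # priority follows name/driver/module within an entry
        $1 == "priority" { if (name == alg) print $2, driver, module }
    ' "$file" | sort -rn
}

# Usage:
#   crypto_drivers_for 'xts(aes)'
```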

