Bug 2025756
| Summary: | QAT2 firmware and module fail to load on boot | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Vilém Maršík <vmarsik> |
| Component: | kernel | Assignee: | Herbert Xu <herbert.xu> |
| kernel sub component: | Crypto | QA Contact: | Vilém Maršík <vmarsik> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | herbert.xu, vdronov, zhangxufang |
| Version: | 9.0 | Keywords: | Triaged |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-11-03 08:39:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Vilém Maršík
2021-11-23 00:44:54 UTC
Presumably the first load is occurring in an environment where the firmware file is not available (dracut, perhaps)? Could you obtain more information about the system environment during the first load so we can redirect this bug? Thanks!

Sorry, confusing description. What is happening here:

1. Install the latest RHEL9 on an Eaglestream machine.
2. Add "intel_iommu=on" to the kernel cmdline and install qat_4xxx_mmp.bin and qat_4xxx.bin to /lib/firmware (e.g. by running the kernel-kernel-crypto-qat2-intel test).
3. Reboot.
4. Now the module fails.
5. Do the manual rmmod & modprobe.
6. Now the module succeeds; you can do "systemctl start qat" and use the HW.

After rebuilding initrd with "dracut --force" with modules installed, QAT2 boots fine again. This step was not necessary with early RHEL9 versions, and is not necessary with current RHEL8 versions either.

As of RHEL-8.5.0-20211123.d.3 default installation with FW copied to /lib/firmware, where the driver works after boot, neither the qat2 FW nor the module exists in the initrd:

    # lsinitrd initramfs-4.18.0-326.el8.kpq1.x86_64.img | grep qat
    # dmesg | grep qat
    [ 38.104010] 4xxx 0000:6b:00.0: qat_dev0 started 9 acceleration engines
    (...)
    # lsmod | grep qat
    qat_4xxx 16384 0
    intel_qat 151552 1 qat_4xxx
    dh_generic 16384 1 intel_qat

Wondering how the module/FW gets loaded that early in the boot process without being present in the initrd image.

I still think this is an initrd problem, and not an issue with the kernel. One possible theory: the real root filesystem had been mounted but not yet pivoted to when the module was loaded, so when the kernel went to look for the firmware it could not find it, because the real root was not yet in place. It's a completely wild guess, but based on the current information I can't see how the kernel can be responsible for this bug.

Oh, I didn't parse the RHEL8 part.
That makes sense because the module would only be loaded after the real root is mounted, which is why the firmware is guaranteed to be available. Did you take a look inside the RHEL9 initrd? Perhaps the module is present in the initrd without the firmware.

The QAT module / FW is not available in the distribution initrd, for both default RHEL8 and RHEL9. The FW loads in RHEL8 and early RHEL9 during boot, but not in current RHEL9, unless you rebuild the initrd after manually installing the FW to /lib/firmware. I could not find the exact point when root pivots in dmesg.

Was there a recent change that could have caused this? Can you attach the complete RHEL9 boot log? Thanks.

My QAT2 machine went out of service; I will generate a boot log with the errors once I have it fixed again.

The machine is back online; this is what I am getting now on intel-eaglestream-spr-06. Any ideas for cause / workaround?

---

RHEL-9.0.0-20220109.d.3 - fresh installation first:

    # lsmod | grep qat
    qat_4xxx 16384 0
    intel_qat 176128 1 qat_4xxx
    # ls /lib/firmware/qat_4*
    ls: cannot access '/lib/firmware/qat_4*': No such file or directory
    # dmesg | less
    (...)
    [ 42.671184] 4xxx 0000:6b:00.0: enabling device (0140 -> 0142)
    [ 42.690795] QAT: AE0 is inactive!!
    [ 42.694702] QAT: failed to get device out of reset
    [ 42.700168] 4xxx 0000:6b:00.0: qat_hal_clr_reset error
    [ 42.706007] 4xxx 0000:6b:00.0: Failed to init the AEs
    [ 42.711747] 4xxx 0000:6b:00.0: Failed to initialise Acceleration Engine
    [ 42.720158] 4xxx 0000:6b:00.0: Resetting device qat_dev0
    (...)
    [ 42.903987] QAT: AE0 is inactive!!
    [ 42.903990] QAT: failed to get device out of reset
    [ 42.903991] 4xxx 0000:70:00.0: qat_hal_clr_reset error
    [ 42.903993] 4xxx 0000:70:00.0: Failed to init the AEs
    [ 42.903994] 4xxx 0000:70:00.0: Failed to initialise Acceleration Engine
    [ 42.904819] 4xxx 0000:70:00.0: Resetting device qat_dev0
    [ 42.969697] igb: Intel(R) Gigabit Ethernet Network Driver
    [ 42.975841] igb: Copyright (c) 2007-2014 Intel Corporation.
    [ 43.009462] 4xxx: probe of 0000:70:00.0 failed with error -14
    [ 43.027565] 4xxx 0000:75:00.0: enabling device (0140 -> 0142)
    (...)

Now let's install Intel's firmware:

    # grep -i stepping /proc/cpuinfo | head -n1
    stepping : 3
    (added "intel_iommu=on" to kernel cmdline)
    # cp qat_4xxx_mmp.bin /lib/firmware
    # cp c_stepping/qat_4xxx.bin /lib/firmware/
    # dracut -f
    # rhts-reboot
    (...)
    # lsmod | grep qat
    qat_4xxx 16384 0
    intel_qat 176128 1 qat_4xxx
    # ls /lib/firmware/qat_4*
    /lib/firmware/qat_4xxx.bin
    /lib/firmware/qat_4xxx_mmp.bin
    # dmesg | less
    (...)
    [ 47.342780] 4xxx 0000:6b:00.0: enabling device (0140 -> 0142)
    [ 47.343640] libata version 3.00 loaded.
    [ 47.357649] QAT: AE0 is inactive!!
    [ 47.361558] QAT: failed to get device out of reset
    [ 47.367012] 4xxx 0000:6b:00.0: qat_hal_clr_reset error
    [ 47.372849] 4xxx 0000:6b:00.0: Failed to init the AEs
    [ 47.378586] 4xxx 0000:6b:00.0: Failed to initialise Acceleration Engine
    [ 47.386829] 4xxx 0000:6b:00.0: Resetting device qat_dev0
    [ 47.446730] ahci 0000:00:17.0: version 3.0
    [ 47.447315] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 8 ports 6 Gbps 0xff impl SATA mode
    [ 47.456566] ahci 0000:00:17.0: flags: 64bit ncq sntf pm clo only pio slum part ems deso sadm sds
    [ 47.469572] igb: Intel(R) Gigabit Ethernet Network Driver
    [ 47.475723] igb: Copyright (c) 2007-2014 Intel Corporation.
    [ 47.496318] scsi host0: ahci
    [ 47.499838] 4xxx: probe of 0000:6b:00.0 failed with error -14
    [ 47.502064] scsi host1: ahci
    [ 47.509852] 4xxx 0000:70:00.0: enabling device (0140 -> 0142)
    [ 47.524035] QAT: AE0 is inactive!!
    [ 47.527944] QAT: failed to get device out of reset
    [ 47.533412] 4xxx 0000:70:00.0: qat_hal_clr_reset error
    [ 47.539265] 4xxx 0000:70:00.0: Failed to init the AEs
    [ 47.545005] 4xxx 0000:70:00.0: Failed to initialise Acceleration Engine
    [ 47.572495] 4xxx 0000:70:00.0: Resetting device qat_dev0
    (...)
    [ 47.764844] ata8: SATA max UDMA/133 abar m2048@0xa5203000 port 0xa5203480 irq 88
    [ 47.765049] 4xxx: probe of 0000:70:00.0 failed with error -14
    [ 47.779982] 4xxx 0000:75:00.0: enabling device (0140 -> 0142)
    [ 47.795113] QAT: AE0 is inactive!!
    [ 47.795117] QAT: failed to get device out of reset
    [ 47.795119] 4xxx 0000:75:00.0: qat_hal_clr_reset error
    [ 47.795121] 4xxx 0000:75:00.0: Failed to init the AEs
    [ 47.795121] 4xxx 0000:75:00.0: Failed to initialise Acceleration Engine
    [ 47.795502] igb 0000:01:00.0 enp1s0: renamed from eth0
    [ 47.795895] 4xxx 0000:75:00.0: Resetting device qat_dev0
    [ 47.816849] igb 0000:a8:00.0 ens3: renamed from eth1
    [ 47.898661] 4xxx: probe of 0000:75:00.0 failed with error -14
    [ 47.905301] 4xxx 0000:7a:00.0: enabling device (0140 -> 0142)
    [ 47.919718] QAT: AE0 is inactive!!

FYI, Intel thinks that this issue might be fixed by commit ca605f97da.

Error messages changed, did we hit another issue?

Okay, this is a known issue already fixed in a MR build. Will re-check with the fixed kernel once my RHEL8 testing finishes.

Vilém, I'm cancelling the needinfo for now. Please set it again once your testing has finished. Thanks.

This one got through my filters somehow. Going to test now.

Hello, we find that RHEL8.6 has the same issue. Which version will fix it? Thanks.

*** This bug has been marked as a duplicate of bug 2139439 ***
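
For anyone triaging a similar report, the checks done by hand in this thread can be sketched as two small shell helpers. This is only an illustration: the function names `qat_failed_probes` and `initrd_has_qat` are hypothetical and not part of any QAT, dracut, or kernel tooling, and the pattern assumes the "probe of ... failed with error" message format seen in the logs in this bug.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): read dmesg-style text on
# stdin and print each PCI address whose 4xxx probe failed, one per line,
# sorted and de-duplicated. Assumes the message format shown in this bug.
qat_failed_probes() {
    sed -n 's/.*4xxx: probe of \([0-9a-f:.]*\) failed with error.*/\1/p' | sort -u
}

# Hypothetical helper: read an initramfs file listing (e.g. lsinitrd output)
# on stdin and succeed if any line mentions "qat" (module or firmware).
initrd_has_qat() {
    grep -q 'qat'
}
```

Possible usage on an affected machine would be `dmesg | qat_failed_probes` to list the devices that failed to probe, and `lsinitrd <initramfs image> | initrd_has_qat || echo "no QAT files in initrd"` to repeat the initrd check from the thread.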