Bug 2025756 - QAT2 firmware and module fail to load on boot
Summary: QAT2 firmware and module fail to load on boot
Keywords:
Status: CLOSED DUPLICATE of bug 2139439
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Herbert Xu
QA Contact: Vilém Maršík
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-23 00:44 UTC by Vilém Maršík
Modified: 2022-11-03 08:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-03 08:39:43 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-103601 | 0 | None | None | None | 2021-11-23 00:46:57 UTC

Description Vilém Maršík 2021-11-23 00:44:54 UTC
Description of problem:
Booting recent RHEL-9 with kernel-5.14.0-13.el9.x86_64 fails to load QAT2 firmware and module

Version-Release number of selected component (if applicable):
kernel-5.14.0-13.el9.x86_64

How reproducible:
100% (at least on intel-eaglestream-spr-06)

Steps to Reproduce:
1. install early RHEL9 with kernel-5.14.0-13.el9.x86_64
2. dmesg | grep qat

Actual results:
[   41.508044] 4xxx 0000:6b:00.0: Direct firmware load for qat_4xxx_mmp.bin failed with error -2
[   41.517632] 4xxx 0000:6b:00.0: Failed to load MMP firmware qat_4xxx_mmp.bin
[   41.548231] 4xxx 0000:6b:00.0: Resetting device qat_dev0
(...)

Expected results:
[  169.603849] 4xxx 0000:6b:00.0: qat_dev0 started 9 acceleration engines
[  169.805851] 4xxx 0000:70:00.0: qat_dev1 started 9 acceleration engines
(...)

Additional info:
When manually re-inserting the module with
# rmmod qat_4xxx intel_qat
# modprobe qat_4xxx
then the device starts correctly.
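
A minimal sketch of the workaround above as a script, assuming the kernel log is piped in on stdin. The function names and the QAT_APPLY variable are hypothetical and only illustrate the idea; by default the reload commands are printed rather than executed, since running them needs root and the actual hardware.

```shell
# Read kernel log text on stdin; succeed if it contains a QAT firmware
# load failure like the one shown under "Actual results".
qat_fw_load_failed() {
    grep -Eq 'Direct firmware load for qat_[0-9a-z_]+\.bin failed|Failed to load MMP firmware'
}

# Hypothetical helper: when the failure is present, reload the driver stack
# with the same two commands used manually above.
# Dry run by default; set QAT_APPLY=1 to really run them (root required).
qat_reload_if_failed() {
    if qat_fw_load_failed; then
        if [ "${QAT_APPLY:-0}" = "1" ]; then
            rmmod qat_4xxx intel_qat && modprobe qat_4xxx
        else
            echo "rmmod qat_4xxx intel_qat"
            echo "modprobe qat_4xxx"
        fi
    else
        echo "no QAT firmware load failure found"
    fi
}
```

Possible usage: dmesg | qat_reload_if_failed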

Comment 1 Herbert Xu 2021-11-23 05:40:35 UTC
Presumably the first load is occurring in an environment where the firmware file is not available (dracut perhaps)?

Could you obtain more information about the system environment during the first load so we can redirect this bug?

Thanks!

Comment 2 Vilém Maršík 2021-11-23 11:32:15 UTC
Sorry, confusing description. What is happening here:
1. install latest RHEL9 on an Eaglestream machine
2. add "intel_iommu=on" to kernel cmdline and install qat_4xxx_mmp.bin and qat_4xxx.bin to /lib/firmware (e.g. by running kernel-kernel-crypto-qat2-intel test)
3. reboot
4. now the module fails
5. do the manual rmmod & modprobe
6. now the module succeeds, you can do "systemctl start qat" and use the HW

Comment 3 Vilém Maršík 2021-11-24 15:03:56 UTC
After rebuilding initrd with "dracut --force" with modules installed, QAT2 boots fine again. This step was not necessary with early RHEL9 versions, and is not necessary with current RHEL8 versions either.
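
A rough sketch of how the firmware files could be listed explicitly for the initrd rebuild, using dracut's install_items configuration option. The function name and the drop-in file name qat-firmware.conf are hypothetical, and whether this is needed on top of a plain "dracut --force" has not been established in this bug.

```shell
# Write a dracut drop-in that names the two firmware files by path.
# $1: target directory (on a real system this would be /etc/dracut.conf.d).
write_qat_dracut_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    # install_items is a dracut.conf option; the file name is an assumption.
    printf '%s\n' \
        'install_items+=" /lib/firmware/qat_4xxx.bin /lib/firmware/qat_4xxx_mmp.bin "' \
        > "$conf_dir/qat-firmware.conf"
}
```

Possible usage, as root: write_qat_dracut_conf /etc/dracut.conf.d && dracut --force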

Comment 4 Vilém Maršík 2021-11-25 15:46:48 UTC
On a default RHEL-8.5.0-20211123.d.3 installation with the FW copied to /lib/firmware, where the driver works after boot, neither the QAT2 FW nor the module exists in the initrd:

# lsinitrd initramfs-4.18.0-326.el8.kpq1.x86_64.img | grep qat
# dmesg | grep qat
[   38.104010] 4xxx 0000:6b:00.0: qat_dev0 started 9 acceleration engines
(...)
# lsmod | grep qat
qat_4xxx               16384  0
intel_qat             151552  1 qat_4xxx
dh_generic             16384  1 intel_qat

Wondering how the module/FW gets loaded that early in the boot process without being present in the initrd image.

Comment 5 Herbert Xu 2021-11-26 04:05:01 UTC
I still think this is an initrd problem, and not an issue with the kernel.  One possible theory: the real root filesystem was mounted but not yet pivoted to, and the module was loaded at that point. When the kernel then went to look for the firmware, it could not find it because the real root was not yet in place.

It's a completely wild guess but based on the current information I can't see how the kernel can be responsible for this bug.

Comment 6 Herbert Xu 2021-11-26 04:06:53 UTC
Oh, I didn't parse the RHEL8 part.  That makes sense: there the module is only loaded after the real root is mounted, so the firmware is guaranteed to be available.

Did you take a look inside the RHEL9 initrd? Perhaps the module is present in the initrd without the firmware.
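
A small sketch of that check, assuming an lsinitrd listing is piped in on stdin. The function name is hypothetical; it only reports whether the qat_4xxx module and its firmware files appear in the listing.

```shell
# Classify an initrd listing (e.g. lsinitrd output) read from stdin.
# Prints "module=yes|no firmware=yes|no".
qat_initrd_state() {
    listing=$(cat)
    has_mod=no
    has_fw=no
    # Module file may be compressed (e.g. .ko.xz), so match the prefix only.
    printf '%s\n' "$listing" | grep -Eq 'qat_4xxx\.ko' && has_mod=yes
    printf '%s\n' "$listing" | grep -Eq 'firmware/qat_4xxx(_mmp)?\.bin' && has_fw=yes
    echo "module=$has_mod firmware=$has_fw"
}
```

Possible usage: lsinitrd | qat_initrd_state
A result of "module=yes firmware=no" would match the case suspected here.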

Comment 7 Vilém Maršík 2021-11-29 09:36:49 UTC
The QAT module and FW are not present in the distribution initrd on either default RHEL8 or default RHEL9. The FW loads during boot in RHEL8 and early RHEL9, but not in current RHEL9 unless you rebuild the initrd after manually installing the FW to /lib/firmware. I could not find the exact point in dmesg where the root pivots. Was there a recent change that could have caused this?

Comment 8 Herbert Xu 2021-11-30 03:03:01 UTC
Can you attach the complete RHEL9 boot log? Thanks.

Comment 9 Vilém Maršík 2021-12-03 21:57:06 UTC
My QAT2 machine is out of service. I will generate a boot log with the errors once I have it fixed again.

Comment 10 Vilém Maršík 2022-01-11 00:05:34 UTC
Machine back online, this is what I am getting now on intel-eaglestream-spr-06. Any ideas for cause / workaround?

---

RHEL-9.0.0-20220109.d.3 - fresh installation first:
# lsmod | grep qat
qat_4xxx               16384  0
intel_qat             176128  1 qat_4xxx
# ls /lib/firmware/qat_4*
ls: cannot access '/lib/firmware/qat_4*': No such file or directory
# dmesg | less
(...)
[   42.671184] 4xxx 0000:6b:00.0: enabling device (0140 -> 0142)
[   42.690795] QAT: AE0 is inactive!!
[   42.694702] QAT: failed to get device out of reset
[   42.700168] 4xxx 0000:6b:00.0: qat_hal_clr_reset error
[   42.706007] 4xxx 0000:6b:00.0: Failed to init the AEs
[   42.711747] 4xxx 0000:6b:00.0: Failed to initialise Acceleration Engine
[   42.720158] 4xxx 0000:6b:00.0: Resetting device qat_dev0
(...)
[   42.903987] QAT: AE0 is inactive!!
[   42.903990] QAT: failed to get device out of reset
[   42.903991] 4xxx 0000:70:00.0: qat_hal_clr_reset error
[   42.903993] 4xxx 0000:70:00.0: Failed to init the AEs
[   42.903994] 4xxx 0000:70:00.0: Failed to initialise Acceleration Engine
[   42.904819] 4xxx 0000:70:00.0: Resetting device qat_dev0
[   42.969697] igb: Intel(R) Gigabit Ethernet Network Driver
[   42.975841] igb: Copyright (c) 2007-2014 Intel Corporation.
[   43.009462] 4xxx: probe of 0000:70:00.0 failed with error -14
[   43.027565] 4xxx 0000:75:00.0: enabling device (0140 -> 0142)
(...)

Now let's install Intel's firmware:
# grep -i stepping /proc/cpuinfo | head -n1 
stepping        : 3
(added "intel_iommu=on" to kernel cmdline)
# cp qat_4xxx_mmp.bin /lib/firmware
# cp c_stepping/qat_4xxx.bin /lib/firmware/
# dracut -f
# rhts-reboot
(...)
# lsmod | grep qat
qat_4xxx               16384  0
intel_qat             176128  1 qat_4xxx
# ls /lib/firmware/qat_4*
/lib/firmware/qat_4xxx.bin  /lib/firmware/qat_4xxx_mmp.bin
# dmesg | less
(...)
[   47.342780] 4xxx 0000:6b:00.0: enabling device (0140 -> 0142)
[   47.343640] libata version 3.00 loaded.
[   47.357649] QAT: AE0 is inactive!!
[   47.361558] QAT: failed to get device out of reset
[   47.367012] 4xxx 0000:6b:00.0: qat_hal_clr_reset error
[   47.372849] 4xxx 0000:6b:00.0: Failed to init the AEs
[   47.378586] 4xxx 0000:6b:00.0: Failed to initialise Acceleration Engine
[   47.386829] 4xxx 0000:6b:00.0: Resetting device qat_dev0
[   47.446730] ahci 0000:00:17.0: version 3.0
[   47.447315] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 8 ports 6 Gbps 0xff impl SATA mode
[   47.456566] ahci 0000:00:17.0: flags: 64bit ncq sntf pm clo only pio slum part ems deso sadm sds
[   47.469572] igb: Intel(R) Gigabit Ethernet Network Driver
[   47.475723] igb: Copyright (c) 2007-2014 Intel Corporation.
[   47.496318] scsi host0: ahci
[   47.499838] 4xxx: probe of 0000:6b:00.0 failed with error -14
[   47.502064] scsi host1: ahci
[   47.509852] 4xxx 0000:70:00.0: enabling device (0140 -> 0142)
[   47.524035] QAT: AE0 is inactive!!
[   47.527944] QAT: failed to get device out of reset
[   47.533412] 4xxx 0000:70:00.0: qat_hal_clr_reset error
[   47.539265] 4xxx 0000:70:00.0: Failed to init the AEs
[   47.545005] 4xxx 0000:70:00.0: Failed to initialise Acceleration Engine
[   47.572495] 4xxx 0000:70:00.0: Resetting device qat_dev0
(...)
[   47.764844] ata8: SATA max UDMA/133 abar m2048@0xa5203000 port 0xa5203480 irq 88
[   47.765049] 4xxx: probe of 0000:70:00.0 failed with error -14
[   47.779982] 4xxx 0000:75:00.0: enabling device (0140 -> 0142)
[   47.795113] QAT: AE0 is inactive!!
[   47.795117] QAT: failed to get device out of reset
[   47.795119] 4xxx 0000:75:00.0: qat_hal_clr_reset error
[   47.795121] 4xxx 0000:75:00.0: Failed to init the AEs
[   47.795121] 4xxx 0000:75:00.0: Failed to initialise Acceleration Engine
[   47.795502] igb 0000:01:00.0 enp1s0: renamed from eth0
[   47.795895] 4xxx 0000:75:00.0: Resetting device qat_dev0
[   47.816849] igb 0000:a8:00.0 ens3: renamed from eth1
[   47.898661] 4xxx: probe of 0000:75:00.0 failed with error -14
[   47.905301] 4xxx 0000:7a:00.0: enabling device (0140 -> 0142)
[   47.919718] QAT: AE0 is inactive!!

Comment 11 Vilém Maršík 2022-01-11 11:34:54 UTC
FYI, Intel thinks that this issue might be fixed by commit ca605f97da. The error messages have changed; did we hit another issue?

Comment 12 Vilém Maršík 2022-01-11 23:19:04 UTC
Okay, this is a known issue already fixed in an MR build. Will re-check with the fixed kernel once my RHEL8 testing finishes.

Comment 13 Herbert Xu 2022-01-18 05:36:01 UTC
Vilém, I'm cancelling the needinfo for now.  Please set it again once your testing has finished.  Thanks.

Comment 14 Vilém Maršík 2022-05-19 14:56:05 UTC
This one got through my filters somehow. Going to test now.

Comment 15 zhangxufang 2022-07-18 11:48:26 UTC
Hello,
We see the same issue on RHEL8.6.
Which version will contain the fix?
Thanks

Comment 16 Vladis Dronov 2022-11-03 08:39:43 UTC

*** This bug has been marked as a duplicate of bug 2139439 ***

