Bug 2362414 - Fedora 42 Kernel Cannot Initialize Mellanox ConnectX Ethernet Adaptor
Summary: Fedora 42 Kernel Cannot Initialize Mellanox ConnectX Ethernet Adaptor
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 42
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2025-04-25 23:46 UTC by Xinya Zhang
Modified: 2025-05-06 02:25 UTC (History)

Fixed In Version: kernel-6.14.5-300.fc42 kernel-6.14.5-200.fc41 kernel-6.14.5-100.fc40
Clone Of:
Environment:
Last Closed: 2025-05-06 01:16:13 UTC
Type: Bug
Embargoed:


Attachments
mlx4_core failed to initialize the device, reporting command 0x23 timed out (152.98 KB, text/plain)
2025-04-25 23:46 UTC, Xinya Zhang

Description Xinya Zhang 2025-04-25 23:46:42 UTC
Created attachment 2087321
mlx4_core failed to initialize the device, reporting command 0x23 timed out

1. Please describe the problem:

mlx4_core cannot initialize the network device.

2. What is the Version-Release number of the kernel:

Linux version 6.14.3-300.fc42.x86_64 (mockbuild@f38d0f39c8424fc98f6be30363aa5e29) (gcc (GCC) 15.0.1 20250329 (Red Hat 15.0.1-0), GNU ld version 2.44-3.fc42) #1 SMP PREEMPT_DYNAMIC Sun Apr 20 16:08:39 UTC >

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes. It works when booting the Fedora 41 kernel with the rest of the system on Fedora 42. Working kernel version:

Linux version 6.13.12-200.fc41.x86_64 (mockbuild@9778ff19adb64779af61fc903691b7f0) (gcc (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7), GNU ld version 2.43.1-5.fc41) #1 SMP PREEMPT_DYNAMIC Sun Apr 20 15:52:43 U>

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Reboot the system; the initialization fails.
Unloading the mlx4-related modules with modprobe -r and then running modprobe mlx4_en also reproduces the same dmesg output.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Yes. Rawhide still has this problem, kernel version:

kernel: Linux version 6.15.0-0.rc3.20250424gita79be02bba5c.31.fc43.x86_64 (mockbuild@f7d7d97eb2d74025a7388a5e09a583ad) (gcc (GCC) 15.0.1 20250418 (Red Hat 15.0.1-0), GNU ld version 2.44-3.fc43) #1 SMP PREEMPT_DYN>

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No. This is a newly upgraded Fedora 42 system.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Relevant excerpt from the failing boot:

mlx4_core: Mellanox ConnectX core driver v4.0-0
mlx4_core: Initializing 0000:05:00.0
mlx4_core 0000:05:00.0: DMFS high rate steer mode is: disabled performance optimized steering
mlx4_core 0000:05:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:03:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
mlx4_core 0000:05:00.0: command 0x23 timed out (go bit not cleared)
mlx4_core 0000:05:00.0: device is going to be reset
mlx4_core 0000:05:00.0: crdump: FW doesn't support health buffer access, skipping
mlx4_core 0000:05:00.0: device was reset successfully
mlx4_core 0000:05:00.0: Failed to initialize queue pair table, aborting
mlx4_core 0000:05:00.0: probe with driver mlx4_core failed with error -5
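
To check a saved kernel log (for example the dmesg.txt produced by the journalctl command in question 7) for the same failure, something like the following could be used. It is only a sketch: the marker strings are copied from the excerpt above, and the function name `find_mlx4_failures` is made up for illustration. Other firmware or driver versions may word these messages differently.

```python
# Substrings taken from the failing dmesg excerpt in this report.
FAILURE_MARKERS = (
    "timed out (go bit not cleared)",
    "Failed to initialize queue pair table",
    "probe with driver mlx4_core failed",
)


def find_mlx4_failures(log_text):
    """Return the mlx4_core log lines that contain a known failure marker."""
    hits = []
    for line in log_text.splitlines():
        # Only mlx4_core lines are relevant for this bug.
        if "mlx4_core" not in line:
            continue
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append(line.strip())
    return hits
```

Calling `find_mlx4_failures(open("dmesg.txt").read())` returns an empty list for a log without these messages and the matching lines otherwise.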

Working dmesg looks like:

kernel: mlx4_core 0000:05:00.0: DMFS high rate steer mode is: disabled performance optimized steering
kernel: mlx4_core 0000:05:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:03:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
kernel: mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
kernel: mlx4_en 0000:05:00.0: Activating port:1
kernel: mlx4_en: 0000:05:00.0: Port 1: Using 24 TX rings
kernel: mlx4_en: 0000:05:00.0: Port 1: Using 16 RX rings
kernel: mlx4_en: 0000:05:00.0: Port 1: Initializing port
kernel: mlx4_en 0000:05:00.0: registered PHC clock
kernel: mlx4_en 0000:05:00.0: Activating port:2
kernel: mlx4_en: 0000:05:00.0: Port 2: Using 24 TX rings
kernel: mlx4_en: 0000:05:00.0: Port 2: Using 16 RX rings
kernel: mlx4_en: 0000:05:00.0: Port 2: Initializing port

Comment 1 Xinya Zhang 2025-04-26 06:30:55 UTC
Update: rebuilding the kernel with CONFIG_PCI_REALLOC_ENABLE_AUTO=n fixes the problem. (In practice, this means changing the entry in the config file to "# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set".)

Does any feature depend on CONFIG_PCI_REALLOC_ENABLE_AUTO=y?
This option seems to be incompatible with the mainline Linux mlx4 driver.
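
For anyone who wants to see how a given kernel was built with respect to this option, here is a minimal sketch that reports its state from the text of a kernel config file. The helper name `config_option_state` and the return labels "not set" and "absent" are assumptions made for this example. Only the two line forms quoted in this comment ("OPTION=value" and "# OPTION is not set") are recognized.

```python
def config_option_state(config_text, option="CONFIG_PCI_REALLOC_ENABLE_AUTO"):
    """Report how `option` appears in a kernel config file's text.

    Returns the value after '=' (for example "y"), "not set" for the
    commented-out form, or "absent" if the option is not mentioned.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return "not set"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "absent"
```

On Fedora the config of an installed kernel is normally available as /boot/config-<kernel release>, so the text of that file can be passed in. A result of "y" corresponds to the failing configuration described here, and "not set" to the rebuilt kernel that works.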

Comment 2 Xinya Zhang 2025-04-28 16:12:43 UTC
There seems to be a related issue in the upstream Linux kernel: https://bugzilla.kernel.org/show_bug.cgi?id=220016

Comment 3 Fedora Update System 2025-05-02 16:38:44 UTC
FEDORA-2025-309c37bb06 (kernel-6.14.5-100.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-309c37bb06

Comment 4 Fedora Update System 2025-05-02 16:38:54 UTC
FEDORA-2025-8bc830e072 (kernel-6.14.5-200.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-8bc830e072

Comment 5 Fedora Update System 2025-05-02 16:39:04 UTC
FEDORA-2025-c5a31cf649 (kernel-6.14.5-300.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-c5a31cf649

Comment 6 Fedora Update System 2025-05-03 02:50:45 UTC
FEDORA-2025-c5a31cf649 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-c5a31cf649`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-c5a31cf649

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 7 Fedora Update System 2025-05-03 03:04:07 UTC
FEDORA-2025-8bc830e072 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-8bc830e072`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-8bc830e072

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 8 Fedora Update System 2025-05-03 03:26:26 UTC
FEDORA-2025-309c37bb06 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-309c37bb06`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-309c37bb06

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 9 Fedora Update System 2025-05-06 01:16:13 UTC
FEDORA-2025-c5a31cf649 (kernel-6.14.5-300.fc42) has been pushed to the Fedora 42 stable repository.
If the problem persists, please make a note of it in this bug report.

Comment 10 Fedora Update System 2025-05-06 01:37:32 UTC
FEDORA-2025-8bc830e072 (kernel-6.14.5-200.fc41) has been pushed to the Fedora 41 stable repository.
If the problem persists, please make a note of it in this bug report.

Comment 11 Fedora Update System 2025-05-06 02:25:51 UTC
FEDORA-2025-309c37bb06 (kernel-6.14.5-100.fc40) has been pushed to the Fedora 40 stable repository.
If the problem persists, please make a note of it in this bug report.
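
To tell whether a running kernel is at or after the fixed build for its Fedora release, the `uname -r` string can be compared against the "Fixed In Version" builds listed in this bug. The sketch below is a simplification, not a real RPM version comparison: the names `FIXED_BUILDS` and `at_or_after_fixed_build` are invented for illustration, only the plain "X.Y.Z-N.fcNN" form is parsed, and anything else (such as the Rawhide rc build quoted in the description) yields None. A False result only means the build is older than the fixed one. It does not mean the kernel is affected, since the 6.13.12 Fedora 41 kernel from the description predates the fix but works.

```python
import re

# Fixed builds from this bug's "Fixed In Version" field, keyed by dist tag.
FIXED_BUILDS = {
    "fc42": ((6, 14, 5), 300),
    "fc41": ((6, 14, 5), 200),
    "fc40": ((6, 14, 5), 100),
}

# Matches e.g. "6.14.3-300.fc42.x86_64": version, release number, dist tag.
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(fc\d+)")


def at_or_after_fixed_build(uname_release):
    """Return True/False, or None if the string or dist tag is not recognized."""
    match = _RELEASE_RE.match(uname_release)
    if match is None:
        return None
    version = tuple(int(match.group(i)) for i in (1, 2, 3))
    release = int(match.group(4))
    fixed = FIXED_BUILDS.get(match.group(5))
    if fixed is None:
        return None
    # Compare the version first, then the release number.
    return (version, release) >= fixed
```

For example, `at_or_after_fixed_build("6.14.3-300.fc42.x86_64")` is False for the kernel from the original report, while the same call with "6.14.5-300.fc42.x86_64" is True.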

