Created attachment 2087321 [details]
mlx4_core failed to initialize the device, reporting command 0x23 timed out

1. Please describe the problem:
mlx4_core cannot initialize the network device.

2. What is the Version-Release number of the kernel:
Linux version 6.14.3-300.fc42.x86_64 (mockbuild@f38d0f39c8424fc98f6be30363aa5e29) (gcc (GCC) 15.0.1 20250329 (Red Hat 15.0.1-0), GNU ld version 2.44-3.fc42) #1 SMP PREEMPT_DYNAMIC Sun Apr 20 16:08:39 UTC >

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Yes. It works when the Fedora 41 kernel is booted with the rest of the system on Fedora 42. Working kernel version:
Linux version 6.13.12-200.fc41.x86_64 (mockbuild@9778ff19adb64779af61fc903691b7f0) (gcc (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7), GNU ld version 2.43.1-5.fc41) #1 SMP PREEMPT_DYNAMIC Sun Apr 20 15:52:43 U>

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
Reboot the system; the initialization fails. Unloading the mlx4-related modules with modprobe -r and then running modprobe mlx4_en also reproduces the same dmesg output.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
Yes. Rawhide still has this problem. Kernel version:
kernel: Linux version 6.15.0-0.rc3.20250424gita79be02bba5c.31.fc43.x86_64 (mockbuild@f7d7d97eb2d74025a7388a5e09a583ad) (gcc (GCC) 15.0.1 20250418 (Red Hat 15.0.1-0), GNU ld version 2.44-3.fc43) #1 SMP PREEMPT_DYN>

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No. This is a newly upgraded Fedora 42 system.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

Related sections:
mlx4_core: Mellanox ConnectX core driver v4.0-0
mlx4_core: Initializing 0000:05:00.0
mlx4_core 0000:05:00.0: DMFS high rate steer mode is: disabled performance optimized steering
mlx4_core 0000:05:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:03:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
mlx4_core 0000:05:00.0: command 0x23 timed out (go bit not cleared)
mlx4_core 0000:05:00.0: device is going to be reset
mlx4_core 0000:05:00.0: crdump: FW doesn't support health buffer access, skipping
mlx4_core 0000:05:00.0: device was reset successfully
mlx4_core 0000:05:00.0: Failed to initialize queue pair table, aborting
mlx4_core 0000:05:00.0: probe with driver mlx4_core failed with error -5

Working dmesg looks like:
kernel: mlx4_core 0000:05:00.0: DMFS high rate steer mode is: disabled performance optimized steering
kernel: mlx4_core 0000:05:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:03:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
kernel: mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
kernel: mlx4_en 0000:05:00.0: Activating port:1
kernel: mlx4_en: 0000:05:00.0: Port 1: Using 24 TX rings
kernel: mlx4_en: 0000:05:00.0: Port 1: Using 16 RX rings
kernel: mlx4_en: 0000:05:00.0: Port 1: Initializing port
kernel: mlx4_en 0000:05:00.0: registered PHC clock
kernel: mlx4_en 0000:05:00.0: Activating port:2
kernel: mlx4_en: 0000:05:00.0: Port 2: Using 24 TX rings
kernel: mlx4_en: 0000:05:00.0: Port 2: Using 16 RX rings
kernel: mlx4_en: 0000:05:00.0: Port 2: Initializing port
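
For anyone checking whether their own log shows the same failure, here is a minimal Python sketch that scans a saved kernel log (for example the dmesg.txt produced by the journalctl command above) for the two signatures seen in the failing excerpt. The function name find_mlx4_failures and the pattern labels are hypothetical, and the regular expressions are derived only from the log lines quoted in this report, so other kernel versions may word these messages differently.

```python
import re

# Patterns derived from the failing dmesg excerpt in this report (an assumption:
# other kernel versions may phrase these messages differently).
FAILURE_PATTERNS = {
    "command_timeout": re.compile(
        r"mlx4_core (\S+): command (0x[0-9a-fA-F]+) timed out"
    ),
    "probe_failed": re.compile(
        r"mlx4_core (\S+): probe with driver mlx4_core failed with error (-?\d+)"
    ),
}


def find_mlx4_failures(log_text):
    """Return (label, pci_address, detail) tuples for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        for label, pattern in FAILURE_PATTERNS.items():
            match = pattern.search(line)
            if match:
                # detail is the command opcode or the probe error code
                hits.append((label, match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python3 scan_mlx4.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for label, address, detail in find_mlx4_failures(handle.read()):
            print(f"{label}: device {address}, detail {detail}")
```

An empty result only means these two messages were not found; it does not prove the device initialized correctly.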
Update: rebuilding the kernel with CONFIG_PCI_REALLOC_ENABLE_AUTO=n fixes the problem. (In practice, this means changing the config file entry to "# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set".) Does any feature depend on CONFIG_PCI_REALLOC_ENABLE_AUTO=y? This option seems to be incompatible with the mainline Linux mlx4 driver.
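
To check how a given kernel was built without rebuilding it, the option can be looked up in that kernel's config file. The sketch below is a minimal Python example; the helper name kconfig_option_state is hypothetical, and it assumes the usual kernel config syntax of either "OPTION=value" or "# OPTION is not set", as quoted above.

```python
def kconfig_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'not set', or None if absent.

    Assumes standard kernel .config syntax: 'OPTION=value' for enabled options
    and '# OPTION is not set' for disabled ones.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == f"# {option} is not set":
            return "not set"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    import os

    # Assumption: Fedora installs the config for each kernel as /boot/config-<release>.
    path = f"/boot/config-{os.uname().release}"
    with open(path, encoding="utf-8") as handle:
        state = kconfig_option_state(handle.read(), "CONFIG_PCI_REALLOC_ENABLE_AUTO")
    print(f"CONFIG_PCI_REALLOC_ENABLE_AUTO: {state}")
```

A result of "y" would match the configuration described as failing here, and "not set" would match the rebuilt kernel that worked.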
There seems to be a related issue upstream in the Linux kernel: https://bugzilla.kernel.org/show_bug.cgi?id=220016
FEDORA-2025-309c37bb06 (kernel-6.14.5-100.fc40) has been submitted as an update to Fedora 40. https://bodhi.fedoraproject.org/updates/FEDORA-2025-309c37bb06
FEDORA-2025-8bc830e072 (kernel-6.14.5-200.fc41) has been submitted as an update to Fedora 41. https://bodhi.fedoraproject.org/updates/FEDORA-2025-8bc830e072
FEDORA-2025-c5a31cf649 (kernel-6.14.5-300.fc42) has been submitted as an update to Fedora 42. https://bodhi.fedoraproject.org/updates/FEDORA-2025-c5a31cf649
FEDORA-2025-c5a31cf649 has been pushed to the Fedora 42 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-c5a31cf649` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-c5a31cf649 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2025-8bc830e072 has been pushed to the Fedora 41 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-8bc830e072` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-8bc830e072 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2025-309c37bb06 has been pushed to the Fedora 40 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-309c37bb06` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-309c37bb06 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2025-c5a31cf649 (kernel-6.14.5-300.fc42) has been pushed to the Fedora 42 stable repository. If the problem still persists, please make note of it in this bug report.
FEDORA-2025-8bc830e072 (kernel-6.14.5-200.fc41) has been pushed to the Fedora 41 stable repository. If the problem still persists, please make note of it in this bug report.
FEDORA-2025-309c37bb06 (kernel-6.14.5-100.fc40) has been pushed to the Fedora 40 stable repository. If the problem still persists, please make note of it in this bug report.