Bug 2014094
Summary: | Missing infiniband network interfaces after update to 5.14
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 34
Hardware: | x86_64
OS: | Linux
Status: | CLOSED EOL
Severity: | unspecified
Priority: | unspecified
Type: | Bug
Reporter: | Liss Heidrich <lissheidr>
Assignee: | Kernel Maintainer List <kernel-maint>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, redhat, steved, writerthewolf
Last Closed: | 2022-06-07 22:48:52 UTC
Description
Liss Heidrich
2021-10-14 13:18:37 UTC
This is not an isolated problem. I am also suffering from this issue. Kernel 5.13 versions work fine, but any current 5.14 kernel causes the interfaces to simply disappear. Not all the modules seem to get loaded on boot, but loading them manually doesn't help either. I don't remember seeing any obvious warnings in dmesg either (I don't have the logs on hand right now, I'll get them after work today). It's like it partially initializes the kernel modules, then stops, and never finishes getting the device ready.
Created attachment 1839498
Log testing kernel 5.14 (debug)
I had hoped running the debug kernel might, by chance, give something more revealing, but no.
The only difference between the module initialization in 5.14 vs 5.13 kernels is that, after the "mlx4_ib_add" messages, 5.13 continues and registers the device, while 5.14 throws a single error:
kernel: infiniband mlx4_0: Couldn't register device with driver model
No big warnings, just that.
That's all we get.
I'm not a kernel developer, so my knowledge here is minimal, but some tips on how to get more information out of the kernel here might help.
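For anyone comparing logs between kernels, here is a minimal sketch of how a saved kernel log could be checked for the error quoted above. The function name `find_registration_failures` and the idea of saving the log to a file first (for example with `journalctl -k -b > kernel.log`) are assumptions for illustration, not something from this report. Only the message text itself comes from the log above.

```python
import re

# The error line quoted in this report. The device name (mlx4_0 here) is
# captured so that other devices logging the same message are found too.
REGISTER_FAILURE = re.compile(
    r"infiniband (?P<device>\S+): Couldn't register device with driver model"
)


def find_registration_failures(log_text):
    """Return the device names for which the registration error was logged."""
    return [m.group("device") for m in REGISTER_FAILURE.finditer(log_text)]


# Example with the single line from this report.
sample = "kernel: infiniband mlx4_0: Couldn't register device with driver model\n"
print(find_registration_failures(sample))  # prints ['mlx4_0']
```

An empty list on a 5.13 log and a non-empty one on a 5.14 log from the same machine would match the behaviour described in this comment.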
Can you try the fix here, it works for me with ConnectX
03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
https://lore.kernel.org/linux-rdma/20211115101519.27210-1-jinpu.wang@ionos.com/T/#u
You need to apply the patch, and rebuild the kernel with it.

(In reply to redhat from comment #3)
The patch seems to work as expected on my main machine (Fedora 35). Interfaces show up and work as they are supposed to.
My other machine (Fedora 34) kernel panics on boot with "attempted to kill init", but I guess that is probably something unrelated; in the unlikely event that it turns out to be related I will update you.

(In reply to Liss Heidrich from comment #4)
> The patch seems to work as expected on my main machine (Fedora 35).
> Interfaces show up and work as they are supposed to.
Thanks for testing. Good to know it works.
> My other machine (Fedora 34) kernel panics on boot with "attempted to kill
> init", but I guess that is probably something unrelated; in the unlikely
> event that it turns out to be related I will update you.
The error seems unrelated to IB. Please do let me know otherwise.
(In reply to redhat from comment #5)
> The error seems unrelated to IB. Please do let me know otherwise.
I can now confirm that it was indeed unrelated, tried a different kernel version and now it works flawlessly. Thanks for the patch.

Thank you Liss. FTR, the patch is merged to upstream 5.16-rc6, should be soon in stable 5.15.5

This message is a reminder that Fedora Linux 34 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '34'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version.
Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 34 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.

Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07. Fedora Linux 34 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release.
Thank you for reporting this bug and we are sorry it could not be fixed.