Description of problem:
As the InfiniBand link comes up, a kernel oops occurs in the ib_core module. I verified this by plugging in the cable after a complete boot and a remote login. I use IP over InfiniBand for some testing.

Version-Release number of selected component (if applicable):
This is a fresh upgrade from Fedora 30.

How reproducible:
Three out of three times, both on boot and with the plug trick.

Steps to Reproduce:
Boot with the InfiniBand cable plugged in; the crash happens as soon as the system is up and running.
- or -
Plug in the InfiniBand cable after a complete boot.

Actual results:
Crash. Judging by the log, something still seems to be running, but neither console nor networking is available.

Expected results:
The link comes up and works as usual.

Additional info:
Feb 25 11:22:45 odin.langes-netz.home kernel: ib_qib IB0:1 Turning LOS off
Feb 25 11:22:46 odin.langes-netz.home kernel: ib_qib 0000:09:00.0: IB0:1 got a lid: 0x2
Feb 25 11:22:51 odin.langes-netz.home kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ibp9s0: link becomes ready
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info> [1582629771.5231] device (ibp9s0): carrier: link connected
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info> [1582629771.5233] device (ibp9s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info> [1582629771.5240] policy: auto-activating connection 'fastlane' (e55b03b4-79d4-4cf7-89af-ea866965c8ba)
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info> [1582629771.5244] device (ibp9s0): Activation: starting connection 'fastlane' (e55b03b4-79d4-4cf7-89af-ea866965c8ba)
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info> [1582629771.5245] device (ibp9s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 25 11:22:51 odin.langes-netz.home kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
Feb 25 11:22:51 odin.langes-netz.home kernel: #PF: supervisor read access in kernel mode
Feb 25 11:22:51 odin.langes-netz.home kernel: #PF: error_code(0x0000) - not-present page
Feb 25 11:22:51 odin.langes-netz.home kernel: PGD 0 P4D 0
Feb 25 11:22:51 odin.langes-netz.home kernel: Oops: 0000 [#1] SMP NOPTI
Feb 25 11:22:51 odin.langes-netz.home kernel: CPU: 3 PID: 1053 Comm: NetworkManager Not tainted 5.5.5-200.fc31.x86_64 #1
Feb 25 11:22:51 odin.langes-netz.home kernel: Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5220 09/11/2019
Feb 25 11:22:51 odin.langes-netz.home kernel: RIP: 0010:get_pkey_idx_qp_list+0x5a/0x80 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: Code: 06 48 69 ff b8 00 00 00 48 03 bd 98 04 00 00 4c 8b 47 20 48 8d 47 20 49 39 c0 74 26 0f b7 53 04 eb 08 4d 8b 00 49 39 c0 74 18 <66> 41 39 50 10 75 f1 48 83 c7 18 c6 07 00 0f 1f 4>
Feb 25 11:22:51 odin.langes-netz.home kernel: RSP: 0018:ffffb191004d3300 EFLAGS: 00010203
Feb 25 11:22:51 odin.langes-netz.home kernel: RAX: ffff9386b993ae30 RBX: ffff9386ca367180 RCX: 0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9386b993ae10
Feb 25 11:22:51 odin.langes-netz.home kernel: RBP: ffff9386b35f8000 R08: 0000000000000000 R09: ffff9386ca367180
Feb 25 11:22:51 odin.langes-netz.home kernel: R10: ffffb191004d3530 R11: 0000000000000000 R12: 0000000000000071
Feb 25 11:22:51 odin.langes-netz.home kernel: R13: 0000000000000000 R14: ffff9386b35f8000 R15: ffffb191004d3418
Feb 25 11:22:51 odin.langes-netz.home kernel: FS: 00007f9c4b7e6bc0(0000) GS:ffff9386ce8c0000(0000) knlGS:0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 11:22:51 odin.langes-netz.home kernel: CR2: 0000000000000010 CR3: 00000003f9a7a000 CR4: 00000000003406e0
Feb 25 11:22:51 odin.langes-netz.home kernel: Call Trace:
Feb 25 11:22:51 odin.langes-netz.home kernel: port_pkey_list_insert+0x30/0x1a0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: ? kmem_cache_alloc_trace+0x162/0x220
Feb 25 11:22:51 odin.langes-netz.home kernel: ? ib_security_modify_qp+0xae/0x3a0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: ib_security_modify_qp+0x23f/0x3a0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: _ib_modify_qp+0x272/0x3e0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __dev_mc_del+0x53/0x70
Feb 25 11:22:51 odin.langes-netz.home kernel: ? rt6_age_exceptions+0x61/0x70
Feb 25 11:22:51 odin.langes-netz.home kernel: ipoib_init_qp+0x78/0x1a0 [ib_ipoib]
Feb 25 11:22:51 odin.langes-netz.home kernel: ? fib6_walk+0x49/0x60
Feb 25 11:22:51 odin.langes-netz.home kernel: ? fib6_clean_tree+0x58/0x80
Feb 25 11:22:51 odin.langes-netz.home kernel: ? fib6_del+0x290/0x290
Feb 25 11:22:51 odin.langes-netz.home kernel: ? nf_ct_iterate_cleanup+0x6c/0x150 [nf_conntrack]
Feb 25 11:22:51 odin.langes-netz.home kernel: ? rtnl_is_locked+0x11/0x20
Feb 25 11:22:51 odin.langes-netz.home kernel: ? ib_find_pkey+0x98/0xe0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
Feb 25 11:22:51 odin.langes-netz.home kernel: ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
Feb 25 11:22:51 odin.langes-netz.home kernel: ipoib_open+0x44/0x110 [ib_ipoib]
Feb 25 11:22:51 odin.langes-netz.home kernel: __dev_open+0xcf/0x160
Feb 25 11:22:51 odin.langes-netz.home kernel: __dev_change_flags+0x1a1/0x200
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __dev_notify_flags+0x96/0xf0
Feb 25 11:22:51 odin.langes-netz.home kernel: dev_change_flags+0x21/0x60
Feb 25 11:22:51 odin.langes-netz.home kernel: do_setlink+0x693/0xda0
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __nla_validate_parse+0x51/0x830
Feb 25 11:22:51 odin.langes-netz.home kernel: ? dbs_update_util_handler+0x16/0x80
Feb 25 11:22:51 odin.langes-netz.home kernel: ? cpufreq_dbs_governor_start+0x190/0x190
Feb 25 11:22:51 odin.langes-netz.home kernel: ? update_blocked_averages+0x4f2/0x5b0
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __nla_validate_parse+0x51/0x830
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __snmp6_fill_stats64.isra.0+0x66/0x110
Feb 25 11:22:51 odin.langes-netz.home kernel: __rtnl_newlink+0x57b/0x950
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __nla_reserve+0x38/0x50
Feb 25 11:22:51 odin.langes-netz.home kernel: ? prep_new_page+0xc4/0xf0
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __nla_reserve+0x38/0x50
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __nla_put+0xc/0x20
Feb 25 11:22:51 odin.langes-netz.home kernel: ? nla_put+0x28/0x40
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __nla_reserve+0x38/0x50
Feb 25 11:22:51 odin.langes-netz.home kernel: ? __nla_put+0xc/0x20
Feb 25 11:22:51 odin.langes-netz.home kernel: ? nla_put+0x28/0x40
Feb 25 11:22:51 odin.langes-netz.home kernel: ? rt6_fill_node+0x2d4/0x840
Feb 25 11:22:51 odin.langes-netz.home kernel: ? prep_new_page+0xc4/0xf0
Feb 25 11:22:51 odin.langes-netz.home kernel: ? _cond_resched+0x15/0x30
Feb 25 11:22:51 odin.langes-netz.home kernel: ? kmem_cache_alloc_trace+0x162/0x220
Feb 25 11:22:51 odin.langes-netz.home kernel: rtnl_newlink+0x44/0x70
Feb 25 11:22:51 odin.langes-netz.home kernel: rtnetlink_rcv_msg+0x2b0/0x360
Feb 25 11:22:51 odin.langes-netz.home kernel: ? _cond_resched+0x15/0x30
Feb 25 11:22:51 odin.langes-netz.home kernel: ? kmem_cache_alloc+0x165/0x220
Feb 25 11:22:51 odin.langes-netz.home kernel: ? _cond_resched+0x15/0x30
Feb 25 11:22:51 odin.langes-netz.home kernel: ? rtnl_calcit.isra.0+0x110/0x110
Feb 25 11:22:51 odin.langes-netz.home kernel: netlink_rcv_skb+0x49/0x110
Feb 25 11:22:51 odin.langes-netz.home kernel: netlink_unicast+0x191/0x230
Feb 25 11:22:51 odin.langes-netz.home kernel: netlink_sendmsg+0x243/0x480
Feb 25 11:22:51 odin.langes-netz.home kernel: sock_sendmsg+0x5e/0x60
Feb 25 11:22:51 odin.langes-netz.home kernel: ____sys_sendmsg+0x1ef/0x260
Feb 25 11:22:51 odin.langes-netz.home kernel: ? copy_msghdr_from_user+0xd6/0x150
Feb 25 11:22:51 odin.langes-netz.home kernel: ___sys_sendmsg+0x81/0xc0
Feb 25 11:22:51 odin.langes-netz.home kernel: ? do_filp_open+0xa5/0x100
Feb 25 11:22:51 odin.langes-netz.home kernel: ? list_lru_add+0xb5/0x1d0
Feb 25 11:22:51 odin.langes-netz.home kernel: __sys_sendmsg+0x59/0xa0
Feb 25 11:22:51 odin.langes-netz.home kernel: do_syscall_64+0x5b/0x1c0
Feb 25 11:22:51 odin.langes-netz.home kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 25 11:22:51 odin.langes-netz.home kernel: RIP: 0033:0x7f9c4c7c480d
Feb 25 11:22:51 odin.langes-netz.home kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea ec ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e>
Feb 25 11:22:51 odin.langes-netz.home kernel: RSP: 002b:00007ffe3d309e40 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Feb 25 11:22:51 odin.langes-netz.home kernel: RAX: ffffffffffffffda RBX: 000055a97a10d540 RCX: 00007f9c4c7c480d
Feb 25 11:22:51 odin.langes-netz.home kernel: RDX: 0000000000000000 RSI: 00007ffe3d309e90 RDI: 000000000000000c
Feb 25 11:22:51 odin.langes-netz.home kernel: RBP: 00007ffe3d309e90 R08: 0000000000000000 R09: 0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 000055a97a10d540
Feb 25 11:22:51 odin.langes-netz.home kernel: R13: 00007ffe3d30a048 R14: 00007ffe3d30a03c R15: 0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: Modules linked in: xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfi>
Feb 25 11:22:51 odin.langes-netz.home kernel: ib_core eeepc_wmi snd asus_wmi sparse_keymap sp5100_tco crct10dif_pclmul crc32_pclmul wmi_bmof ghash_clmulni_intel rfkill i2c_piix4 k10temp pcspkr soundcore ccp gpio_amdpt gpio_gener>
Feb 25 11:22:51 odin.langes-netz.home kernel: CR2: 0000000000000010
Feb 25 11:22:51 odin.langes-netz.home kernel: ---[ end trace e4d4273a2d9bfd16 ]---
Feb 25 11:22:51 odin.langes-netz.home kernel: RIP: 0010:get_pkey_idx_qp_list+0x5a/0x80 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: Code: 06 48 69 ff b8 00 00 00 48 03 bd 98 04 00 00 4c 8b 47 20 48 8d 47 20 49 39 c0 74 26 0f b7 53 04 eb 08 4d 8b 00 49 39 c0 74 18 <66> 41 39 50 10 75 f1 48 83 c7 18 c6 07 00 0f 1f 4>
Feb 25 11:22:51 odin.langes-netz.home kernel: RSP: 0018:ffffb191004d3300 EFLAGS: 00010203
Feb 25 11:22:51 odin.langes-netz.home kernel: RAX: ffff9386b993ae30 RBX: ffff9386ca367180 RCX: 0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9386b993ae10
Feb 25 11:22:51 odin.langes-netz.home kernel: RBP: ffff9386b35f8000 R08: 0000000000000000 R09: ffff9386ca367180
Feb 25 11:22:51 odin.langes-netz.home kernel: R10: ffffb191004d3530 R11: 0000000000000000 R12: 0000000000000071
Feb 25 11:22:51 odin.langes-netz.home kernel: R13: 0000000000000000 R14: ffff9386b35f8000 R15: ffffb191004d3418
Feb 25 11:22:51 odin.langes-netz.home kernel: FS: 00007f9c4b7e6bc0(0000) GS:ffff9386ce8c0000(0000) knlGS:0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 11:22:51 odin.langes-netz.home kernel: CR2: 0000000000000010 CR3: 00000003f9a7a000 CR4: 00000000003406e0
Feb 25 11:22:52 odin.langes-netz.home abrt-dump-journal-oops[976]: abrt-dump-journal-oops: Found oopses: 1
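Most entries in the call trace above are prefixed with "?", which marks return addresses found on the stack that the unwinder could not confirm as part of the actual call chain. To make the confirmed path easier to read, here is a minimal Python sketch that keeps only the unmarked frames from journal lines in this format. The function name reliable_frames and the regular expression are illustrative assumptions, not part of any existing tool, and the pattern is only tuned to the line layout seen in this log.

```python
import re

# Matches one call-trace entry after the "kernel:" tag, for example
# "ipoib_open+0x44/0x110 [ib_ipoib]" or "? fib6_walk+0x49/0x60".
FRAME_RE = re.compile(
    r"kernel:\s+(?P<guess>\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(journal_lines):
    """Return (function, module) pairs for trace entries not marked with '?'."""
    frames = []
    in_trace = False
    for line in journal_lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            # The first non-frame line (here the user-space "RIP: 0033:..."
            # line) ends the trace.
            break
        if match.group("guess"):
            continue  # skip entries the kernel itself flagged as uncertain
        frames.append((match.group("func"), match.group("module")))
    return frames


if __name__ == "__main__":
    # Hypothetical usage: pipe the journal excerpt into the script.
    import sys

    for func, module in reliable_frames(sys.stdin):
        print(func, f"[{module}]" if module else "")
```

Applied to the log in this report, the confirmed chain starts at port_pkey_list_insert and runs down through ib_security_modify_qp, _ib_modify_qp and the ipoib open path to the netlink sendmsg system call issued by NetworkManager.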
Built a v5.5.6 kernel using the debug config as the base. Same behaviour.
This does not happen if I boot the system with the former 5.4.19-100.fc30.x86_64 kernel, so I will try to find the change that introduced it.
Tracked it down to the Linux v5.4.21 kernel: v5.4.20 still works, but v5.4.21 crashes. According to the changelog for v5.4.21, there have been some changes in the ib_core code related to queue pair handling.
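The narrowing described here is a bisection: given an ordered history with a known-good start and a known-bad end, test the midpoint and halve the range. A minimal Python sketch of that search follows; first_bad and the is_bad callback are hypothetical names, and the example data uses only the three 5.4.y results stated in this report. In practice the same procedure would be run over the individual commits between the v5.4.20 and v5.4.21 tags, for instance with git bisect.

```python
def first_bad(candidates, is_bad):
    """Return the first candidate for which is_bad(candidate) is true.

    Assumes candidates are ordered oldest to newest, the first one is known
    good, the last one is known bad, and the behaviour flips exactly once.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]


# Results taken from this report: 5.4.19 and 5.4.20 work, 5.4.21 crashes.
releases = ["v5.4.19", "v5.4.20", "v5.4.21"]
crashing = {"v5.4.21"}

culprit = first_bad(releases, lambda version: version in crashing)
print(culprit)
```

Each call to is_bad stands for one build, boot and cable-plug test, so the halving keeps the number of test boots logarithmic in the number of candidates.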
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.