Bug 1806981 - Crash on infiniband link comes up
Summary: Crash on infiniband link comes up
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kerneloops
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Orphan Owner
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-25 12:00 UTC by Hans-Juergen
Modified: 2020-11-24 16:10 UTC

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-24 16:10:16 UTC
Type: Bug
Embargoed:


Description Hans-Juergen 2020-02-25 12:00:21 UTC
Description of problem:

When the InfiniBand link comes up, a kernel oops occurs in the ib_core module. I verified this by plugging in the cable after the system had fully booted and I had logged in remotely.

I use IP over InfiniBand (IPoIB) for some testing.


Version-Release number of selected component (if applicable):

This is a fresh upgrade from Fedora 30. The running kernel is 5.5.5-200.fc31.x86_64 (see the oops under "Additional info").


How reproducible:
Always: three out of three attempts at booting and then plugging in the cable.

Steps to Reproduce:
Boot with the InfiniBand cable plugged in; the crash happens as soon as the system is up and running.

- or -

Plug in the InfiniBand cable after the system has fully booted.


Actual results:
Crash. According to the log something still seems to be running, but neither the console nor networking is available.

Expected results:
The link comes up and IPoIB works as usual, without a kernel oops.


Additional info:
Feb 25 11:22:45 odin.langes-netz.home kernel: ib_qib IB0:1 Turning LOS off
Feb 25 11:22:46 odin.langes-netz.home kernel: ib_qib 0000:09:00.0: IB0:1 got a lid: 0x2
Feb 25 11:22:51 odin.langes-netz.home kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ibp9s0: link becomes ready
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info>  [1582629771.5231] device (ibp9s0): carrier: link connected
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info>  [1582629771.5233] device (ibp9s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info>  [1582629771.5240] policy: auto-activating connection 'fastlane' (e55b03b4-79d4-4cf7-89af-ea866965c8ba)
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info>  [1582629771.5244] device (ibp9s0): Activation: starting connection 'fastlane' (e55b03b4-79d4-4cf7-89af-ea866965c8ba)
Feb 25 11:22:51 odin.langes-netz.home NetworkManager[1053]: <info>  [1582629771.5245] device (ibp9s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 25 11:22:51 odin.langes-netz.home kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
Feb 25 11:22:51 odin.langes-netz.home kernel: #PF: supervisor read access in kernel mode
Feb 25 11:22:51 odin.langes-netz.home kernel: #PF: error_code(0x0000) - not-present page
Feb 25 11:22:51 odin.langes-netz.home kernel: PGD 0 P4D 0 
Feb 25 11:22:51 odin.langes-netz.home kernel: Oops: 0000 [#1] SMP NOPTI
Feb 25 11:22:51 odin.langes-netz.home kernel: CPU: 3 PID: 1053 Comm: NetworkManager Not tainted 5.5.5-200.fc31.x86_64 #1
Feb 25 11:22:51 odin.langes-netz.home kernel: Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5220 09/11/2019
Feb 25 11:22:51 odin.langes-netz.home kernel: RIP: 0010:get_pkey_idx_qp_list+0x5a/0x80 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: Code: 06 48 69 ff b8 00 00 00 48 03 bd 98 04 00 00 4c 8b 47 20 48 8d 47 20 49 39 c0 74 26 0f b7 53 04 eb 08 4d 8b 00 49 39 c0 74 18 <66> 41 39 50 10 75 f1 48 83 c7 18 c6 07 00 0f 1f 4>
Feb 25 11:22:51 odin.langes-netz.home kernel: RSP: 0018:ffffb191004d3300 EFLAGS: 00010203
Feb 25 11:22:51 odin.langes-netz.home kernel: RAX: ffff9386b993ae30 RBX: ffff9386ca367180 RCX: 0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9386b993ae10
Feb 25 11:22:51 odin.langes-netz.home kernel: RBP: ffff9386b35f8000 R08: 0000000000000000 R09: ffff9386ca367180
Feb 25 11:22:51 odin.langes-netz.home kernel: R10: ffffb191004d3530 R11: 0000000000000000 R12: 0000000000000071
Feb 25 11:22:51 odin.langes-netz.home kernel: R13: 0000000000000000 R14: ffff9386b35f8000 R15: ffffb191004d3418
Feb 25 11:22:51 odin.langes-netz.home kernel: FS:  00007f9c4b7e6bc0(0000) GS:ffff9386ce8c0000(0000) knlGS:0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 11:22:51 odin.langes-netz.home kernel: CR2: 0000000000000010 CR3: 00000003f9a7a000 CR4: 00000000003406e0
Feb 25 11:22:51 odin.langes-netz.home kernel: Call Trace:
Feb 25 11:22:51 odin.langes-netz.home kernel:  port_pkey_list_insert+0x30/0x1a0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? kmem_cache_alloc_trace+0x162/0x220
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? ib_security_modify_qp+0xae/0x3a0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel:  ib_security_modify_qp+0x23f/0x3a0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel:  _ib_modify_qp+0x272/0x3e0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __dev_mc_del+0x53/0x70
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? rt6_age_exceptions+0x61/0x70
Feb 25 11:22:51 odin.langes-netz.home kernel:  ipoib_init_qp+0x78/0x1a0 [ib_ipoib]
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? fib6_walk+0x49/0x60
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? fib6_clean_tree+0x58/0x80
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? fib6_del+0x290/0x290
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? nf_ct_iterate_cleanup+0x6c/0x150 [nf_conntrack]
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? rtnl_is_locked+0x11/0x20
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? ib_find_pkey+0x98/0xe0 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel:  ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
Feb 25 11:22:51 odin.langes-netz.home kernel:  ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
Feb 25 11:22:51 odin.langes-netz.home kernel:  ipoib_open+0x44/0x110 [ib_ipoib]
Feb 25 11:22:51 odin.langes-netz.home kernel:  __dev_open+0xcf/0x160
Feb 25 11:22:51 odin.langes-netz.home kernel:  __dev_change_flags+0x1a1/0x200
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __dev_notify_flags+0x96/0xf0
Feb 25 11:22:51 odin.langes-netz.home kernel:  dev_change_flags+0x21/0x60
Feb 25 11:22:51 odin.langes-netz.home kernel:  do_setlink+0x693/0xda0
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __nla_validate_parse+0x51/0x830
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? dbs_update_util_handler+0x16/0x80
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? cpufreq_dbs_governor_start+0x190/0x190
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? update_blocked_averages+0x4f2/0x5b0
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __nla_validate_parse+0x51/0x830
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __snmp6_fill_stats64.isra.0+0x66/0x110
Feb 25 11:22:51 odin.langes-netz.home kernel:  __rtnl_newlink+0x57b/0x950
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __nla_reserve+0x38/0x50
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? prep_new_page+0xc4/0xf0
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __nla_reserve+0x38/0x50
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __nla_put+0xc/0x20
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? nla_put+0x28/0x40
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __nla_reserve+0x38/0x50
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? __nla_put+0xc/0x20
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? nla_put+0x28/0x40
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? rt6_fill_node+0x2d4/0x840
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? prep_new_page+0xc4/0xf0
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? _cond_resched+0x15/0x30
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? kmem_cache_alloc_trace+0x162/0x220
Feb 25 11:22:51 odin.langes-netz.home kernel:  rtnl_newlink+0x44/0x70
Feb 25 11:22:51 odin.langes-netz.home kernel:  rtnetlink_rcv_msg+0x2b0/0x360
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? _cond_resched+0x15/0x30
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? kmem_cache_alloc+0x165/0x220
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? _cond_resched+0x15/0x30
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? rtnl_calcit.isra.0+0x110/0x110
Feb 25 11:22:51 odin.langes-netz.home kernel:  netlink_rcv_skb+0x49/0x110
Feb 25 11:22:51 odin.langes-netz.home kernel:  netlink_unicast+0x191/0x230
Feb 25 11:22:51 odin.langes-netz.home kernel:  netlink_sendmsg+0x243/0x480
Feb 25 11:22:51 odin.langes-netz.home kernel:  sock_sendmsg+0x5e/0x60
Feb 25 11:22:51 odin.langes-netz.home kernel:  ____sys_sendmsg+0x1ef/0x260
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? copy_msghdr_from_user+0xd6/0x150
Feb 25 11:22:51 odin.langes-netz.home kernel:  ___sys_sendmsg+0x81/0xc0
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? do_filp_open+0xa5/0x100
Feb 25 11:22:51 odin.langes-netz.home kernel:  ? list_lru_add+0xb5/0x1d0
Feb 25 11:22:51 odin.langes-netz.home kernel:  __sys_sendmsg+0x59/0xa0
Feb 25 11:22:51 odin.langes-netz.home kernel:  do_syscall_64+0x5b/0x1c0
Feb 25 11:22:51 odin.langes-netz.home kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 25 11:22:51 odin.langes-netz.home kernel: RIP: 0033:0x7f9c4c7c480d
Feb 25 11:22:51 odin.langes-netz.home kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea ec ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e>
Feb 25 11:22:51 odin.langes-netz.home kernel: RSP: 002b:00007ffe3d309e40 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Feb 25 11:22:51 odin.langes-netz.home kernel: RAX: ffffffffffffffda RBX: 000055a97a10d540 RCX: 00007f9c4c7c480d
Feb 25 11:22:51 odin.langes-netz.home kernel: RDX: 0000000000000000 RSI: 00007ffe3d309e90 RDI: 000000000000000c
Feb 25 11:22:51 odin.langes-netz.home kernel: RBP: 00007ffe3d309e90 R08: 0000000000000000 R09: 0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 000055a97a10d540
Feb 25 11:22:51 odin.langes-netz.home kernel: R13: 00007ffe3d30a048 R14: 00007ffe3d30a03c R15: 0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: Modules linked in: xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfi>
Feb 25 11:22:51 odin.langes-netz.home kernel:  ib_core eeepc_wmi snd asus_wmi sparse_keymap sp5100_tco crct10dif_pclmul crc32_pclmul wmi_bmof ghash_clmulni_intel rfkill i2c_piix4 k10temp pcspkr soundcore ccp gpio_amdpt gpio_gener>
Feb 25 11:22:51 odin.langes-netz.home kernel: CR2: 0000000000000010
Feb 25 11:22:51 odin.langes-netz.home kernel: ---[ end trace e4d4273a2d9bfd16 ]---
Feb 25 11:22:51 odin.langes-netz.home kernel: RIP: 0010:get_pkey_idx_qp_list+0x5a/0x80 [ib_core]
Feb 25 11:22:51 odin.langes-netz.home kernel: Code: 06 48 69 ff b8 00 00 00 48 03 bd 98 04 00 00 4c 8b 47 20 48 8d 47 20 49 39 c0 74 26 0f b7 53 04 eb 08 4d 8b 00 49 39 c0 74 18 <66> 41 39 50 10 75 f1 48 83 c7 18 c6 07 00 0f 1f 4>
Feb 25 11:22:51 odin.langes-netz.home kernel: RSP: 0018:ffffb191004d3300 EFLAGS: 00010203
Feb 25 11:22:51 odin.langes-netz.home kernel: RAX: ffff9386b993ae30 RBX: ffff9386ca367180 RCX: 0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9386b993ae10
Feb 25 11:22:51 odin.langes-netz.home kernel: RBP: ffff9386b35f8000 R08: 0000000000000000 R09: ffff9386ca367180
Feb 25 11:22:51 odin.langes-netz.home kernel: R10: ffffb191004d3530 R11: 0000000000000000 R12: 0000000000000071
Feb 25 11:22:51 odin.langes-netz.home kernel: R13: 0000000000000000 R14: ffff9386b35f8000 R15: ffffb191004d3418
Feb 25 11:22:51 odin.langes-netz.home kernel: FS:  00007f9c4b7e6bc0(0000) GS:ffff9386ce8c0000(0000) knlGS:0000000000000000
Feb 25 11:22:51 odin.langes-netz.home kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 11:22:51 odin.langes-netz.home kernel: CR2: 0000000000000010 CR3: 00000003f9a7a000 CR4: 00000000003406e0
Feb 25 11:22:52 odin.langes-netz.home abrt-dump-journal-oops[976]: abrt-dump-journal-oops: Found oopses: 1
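
The key fields of the oops above can be pulled out mechanically. The following is a minimal Python sketch, not part of the original report: `parse_oops` and `OOPS_EXCERPT` are hypothetical names, and the excerpt is four lines copied from the log with the journal prefix stripped. It extracts the faulting symbol and the register values, then checks one reading of the trace: the marked instruction bytes `<66> 41 39 50 10` appear to encode a 16-bit compare against memory at R08 + 0x10 (this decoding is an assumption, the report itself does not state it).

```python
import re

# Four lines copied from the oops above, journal prefix removed.
OOPS_EXCERPT = """\
BUG: kernel NULL pointer dereference, address: 0000000000000010
RIP: 0010:get_pkey_idx_qp_list+0x5a/0x80 [ib_core]
RBP: ffff9386b35f8000 R08: 0000000000000000 R09: ffff9386ca367180
CR2: 0000000000000010 CR3: 00000003f9a7a000 CR4: 00000000003406e0
"""

# Kernel-mode RIP line: symbol+offset/size [module]
RIP_RE = re.compile(r"RIP: 0010:([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+) \[(\w+)\]")
# Register dumps such as "R08: 0000000000000000" or "CR2: 0000000000000010"
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}|CR\d): ([0-9a-f]{16})\b")


def parse_oops(text):
    """Return the faulting symbol, its module and the dumped registers."""
    rip = RIP_RE.search(text)
    if rip is None:
        raise ValueError("no kernel-mode RIP line found")
    symbol, offset, size, module = rip.groups()
    registers = {name: int(value, 16) for name, value in REG_RE.findall(text)}
    return {
        "symbol": symbol,
        "offset": int(offset, 16),
        "size": int(size, 16),
        "module": module,
        "registers": registers,
    }


info = parse_oops(OOPS_EXCERPT)
print(f"{info['module']}:{info['symbol']}+{info['offset']:#x}")

# Assumed decoding: the faulting access is at R08 + 0x10.
fault_address = info["registers"]["R08"] + 0x10
print(f"R08 + 0x10 = {fault_address:#x}, CR2 = {info['registers']['CR2']:#x}")
```

In the log, R08 is zero and CR2 is 0x10, so the two values agree. That would be consistent with a NULL pointer being followed inside get_pkey_idx_qp_list, although the report does not confirm the root cause.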

Comment 1 Hans-Juergen 2020-02-26 12:59:46 UTC
Built a v5.5.6 kernel using the debug config as the base.
Same behaviour.

Comment 2 Hans-Juergen 2020-02-27 08:38:52 UTC
This does not happen if I start the system with the former 5.4.19-100.fc30.x86_64 kernel.

So I will try to find the difference between the two kernels.

Comment 3 Hans-Juergen 2020-02-27 10:01:25 UTC
Tracked it down to the Linux v5.4.21 kernel.
v5.4.20 still works, but v5.4.21 crashes. According to the changelog for v5.4.21, that release contains some changes to the ib_core code related to queue pair handling.
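
One way to narrow this down further is to list the commits between the two tags that touch the ib_core sources. The sketch below is only an illustration under assumptions: it presumes a local clone of the stable kernel tree containing both tags (the path is hypothetical), and the helper names `core_changes_cmd` and `list_core_changes` are invented here. It relies only on `git log` with a revision range and a path filter.

```python
import subprocess


def core_changes_cmd(repo, good="v5.4.20", bad="v5.4.21",
                     path="drivers/infiniband/core"):
    """Build the git command listing commits after `good` up to `bad` under `path`."""
    return ["git", "-C", repo, "log", "--oneline", f"{good}..{bad}", "--", path]


def list_core_changes(repo):
    """Run the command and return one summary line per matching commit."""
    result = subprocess.run(core_changes_cmd(repo), capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Hypothetical location of a linux-stable clone; adjust as needed.
    for line in list_core_changes("/usr/src/linux-stable"):
        print(line)
```

If several candidates remain, `git bisect` between the same two tags would identify the first bad commit at the cost of a few kernel builds.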

Comment 4 Ben Cotton 2020-11-03 16:57:25 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Ben Cotton 2020-11-24 16:10:16 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
