Created attachment 1550637
kernel log

1. Please describe the problem:

Periodically we get kernel lockups with this particular kernel report at the root of it, typically under heavy load testing of GT.M, which makes significant use of semaphores.

Mar 30 04:25:33 kernel: list_del corruption, ffff953f1fe70e08->next is LIST_POISON1 (dead000000000100)
Mar 30 04:25:33 kernel: ------------[ cut here ]------------
Mar 30 04:25:33 kernel: kernel BUG at lib/list_debug.c:47!
Mar 30 04:25:33 kernel: invalid opcode: 0000 [#1] SMP PTI
Mar 30 04:25:33 kernel: CPU: 1 PID: 933549 Comm: mumps Not tainted 5.0.3-200.fc29.x86_64 #1
Mar 30 04:25:33 kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
Mar 30 04:25:33 kernel: RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
Mar 30 04:25:33 kernel: Code: c9 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 18 16 12 b1 e8 bc 15 c9 ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 a8 16 12 b1 e8 a8 15 c9 ff <0f> 0b 48 c7 c7 58 17 12 b1 e8 9a 15 c9 ff 0f 0b 48 89 f2 48 89 fe
Mar 30 04:25:33 kernel: RSP: 0018:ffffb02826157e00 EFLAGS: 00010246
Mar 30 04:25:33 kernel: RAX: 000000000000004e RBX: ffff953eed251600 RCX: 0000000000000000
Mar 30 04:25:33 kernel: RDX: 0000000000000000 RSI: ffff95407f4168c8 RDI: ffff95407f4168c8
Mar 30 04:25:33 kernel: RBP: ffff953f1fe70de0 R08: 000000000000065b R09: 0000000000000003
Mar 30 04:25:33 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 000000001af48021
Mar 30 04:25:33 kernel: R13: ffff953f11710d40 R14: ffff953f11710d48 R15: ffff953eed251688
Mar 30 04:25:33 kernel: FS: 00007f9e93f5a440(0000) GS:ffff95407f400000(0000) knlGS:0000000000000000
Mar 30 04:25:33 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 04:25:33 kernel: CR2: 00007fa2cc113000 CR3: 000000155e40e001 CR4: 00000000003606e0
Mar 30 04:25:33 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 30 04:25:33 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 30 04:25:33 kernel: Call Trace:
Mar 30 04:25:33 kernel: exit_sem+0x12d/0x577
Mar 30 04:25:33 kernel: do_exit+0x2a4/0xbb0
Mar 30 04:25:33 kernel: ? __do_page_fault+0x26f/0x500
Mar 30 04:25:33 kernel: do_group_exit+0x3a/0xa0
Mar 30 04:25:33 kernel: __x64_sys_exit_group+0x14/0x20
Mar 30 04:25:33 kernel: do_syscall_64+0x5b/0x160
Mar 30 04:25:33 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 30 04:25:33 kernel: RIP: 0033:0x7f9e9406aad6
Mar 30 04:25:33 kernel: Code: Bad RIP value.
Mar 30 04:25:33 kernel: RSP: 002b:00007ffc79a3ac28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Mar 30 04:25:33 kernel: RAX: ffffffffffffffda RBX: 00007f9e9415d740 RCX: 00007f9e9406aad6
Mar 30 04:25:33 kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Mar 30 04:25:33 kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
Mar 30 04:25:33 kernel: R10: 00007ffc79a3aa8e R11: 0000000000000246 R12: 00007f9e9415d740
Mar 30 04:25:33 kernel: R13: 0000000000000002 R14: 00007f9e94166448 R15: 0000000000000000
Mar 30 04:25:33 kernel: Modules linked in: btrfs xor zstd_compress raid6_pq zstd_decompress fuse vfat fat rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi f2fs intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass raid1 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate mgag200 intel_uncore i2c_algo_bit intel_rapl_perf ttm ipmi_ssif drm_kms_helper drm iTCO_wdt iTCO_vendor_support mei_me dcdbas lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mxm_wmi pcc_cpufreq auth_rpcgss sunrpc binfmt_misc xfs libcrc32c nvme crc32c_intel nvme_core megaraid_sas tg3 wmi loop
Mar 30 04:25:33 kernel: ---[ end trace d208a32963f4ac0e ]---

2. What is the Version-Release number of the kernel:

Linux fed.sanchez.com 5.0.3-200.fc29.x86_64 #1 SMP Tue Mar 19 15:07:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

We have observed the issue for some time, maybe a year or so, though it isn't clear when the problem started.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

Unfortunately, while we do see the issue on a weekly to monthly basis, we don't have a specific trigger for it; I assume it is timing sensitive.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

It would not be practical to test with a rawhide kernel on the affected system.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

The full kernel log from that boot is just under 51MB and quite repetitive, so I'll trim most of it.
We just got another instance with F30 and a more recent if not current kernel.

Jun 09 03:00:41 kernel: ------------[ cut here ]------------
Jun 09 03:00:41 kernel: kernel BUG at lib/list_debug.c:45!
Jun 09 03:00:41 kernel: invalid opcode: 0000 [#1] SMP PTI
Jun 09 03:00:41 kernel: CPU: 0 PID: 324558 Comm: mumps Not tainted 5.0.14-300.fc30.x86_64 #1
Jun 09 03:00:41 kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
Jun 09 03:00:41 kernel: RIP: 0010:__list_del_entry_valid.cold+0xf/0x55
Jun 09 03:00:41 kernel: Code: c8 ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 48 58 10 84 e8 75 05 c8 ff 0f 0b 48 89 fe 48 c7 c7 d8 58 10 84 e8 64 05 c8 ff <0f> 0b 48 c7 c7 88 59 10 84 e8 56 05 c8>
Jun 09 03:00:41 kernel: RSP: 0018:ffff94a3e4c87e00 EFLAGS: 00010246
Jun 09 03:00:41 kernel: RAX: 000000000000004e RBX: ffff893e7648aa00 RCX: 0000000000000000
Jun 09 03:00:41 kernel: RDX: 0000000000000000 RSI: ffff893e7f4168c8 RDI: ffff893e7f4168c8
Jun 09 03:00:41 kernel: RBP: ffff893dcda741e0 R08: ffff893e7f4168c8 R09: 000000000000073e
Jun 09 03:00:41 kernel: R10: 0000000000026c28 R11: 0000000000000003 R12: ffff893df56cdec8
Jun 09 03:00:41 kernel: R13: ffff893df56cdec0 R14: 000000000e7a804c R15: ffff893e7648aa88
Jun 09 03:00:41 kernel: FS: 00007f13340d2dc0(0000) GS:ffff893e7f400000(0000) knlGS:0000000000000000
Jun 09 03:00:41 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 09 03:00:41 kernel: CR2: 000055586b6075f8 CR3: 000000125920e006 CR4: 00000000003606f0
Jun 09 03:00:41 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 09 03:00:41 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 09 03:00:41 kernel: Call Trace:
Jun 09 03:00:41 kernel: exit_sem+0x13b/0x575
Jun 09 03:00:41 kernel: do_exit+0x2ab/0xbd0
Jun 09 03:00:41 kernel: ? handle_mm_fault+0xdc/0x210
Jun 09 03:00:41 kernel: ? do_user_addr_fault+0x218/0x450
Jun 09 03:00:41 kernel: do_group_exit+0x3a/0xa0
Jun 09 03:00:41 kernel: __x64_sys_exit_group+0x14/0x20
Jun 09 03:00:41 kernel: do_syscall_64+0x5b/0x150
Jun 09 03:00:41 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 09 03:00:41 kernel: RIP: 0033:0x7f13341e2c21
Jun 09 03:00:41 kernel: Code: Bad RIP value.
Jun 09 03:00:41 kernel: RSP: 002b:00007ffc46b91f18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Jun 09 03:00:41 kernel: RAX: ffffffffffffffda RBX: 00007f13342d7740 RCX: 00007f13341e2c21
Jun 09 03:00:41 kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Jun 09 03:00:41 kernel: RBP: 0000000000000000 R08: ffffffffffffff78 R09: 00007f1323a205e0
Jun 09 03:00:41 kernel: R10: 000000000000001c R11: 0000000000000246 R12: 00007f13342d7740
Jun 09 03:00:41 kernel: R13: 000000000000000c R14: 00007f13342e0448 R15: 0000000000000000
Jun 09 03:00:41 kernel: Modules linked in: btrfs xor zstd_compress raid6_pq zstd_decompress fuse vfat fat rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache iscsi_tcp libiscsi_tcp libiscsi>
Jun 09 03:00:41 kernel: ---[ end trace 2a01f1e0f661fc83 ]---
Jun 09 03:00:41 kernel: RIP: 0010:__list_del_entry_valid.cold+0xf/0x55
Jun 09 03:00:41 kernel: Code: c8 ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 48 58 10 84 e8 75 05 c8 ff 0f 0b 48 89 fe 48 c7 c7 d8 58 10 84 e8 64 05 c8 ff <0f> 0b 48 c7 c7 88 59 10 84 e8 56 05 c8>
Jun 09 03:00:41 kernel: RSP: 0018:ffff94a3e4c87e00 EFLAGS: 00010246
Jun 09 03:00:41 kernel: RAX: 000000000000004e RBX: ffff893e7648aa00 RCX: 0000000000000000
Jun 09 03:00:41 kernel: RDX: 0000000000000000 RSI: ffff893e7f4168c8 RDI: ffff893e7f4168c8
Jun 09 03:00:41 kernel: RBP: ffff893dcda741e0 R08: ffff893e7f4168c8 R09: 000000000000073e
Jun 09 03:00:41 kernel: R10: 0000000000026c28 R11: 0000000000000003 R12: ffff893df56cdec8
Jun 09 03:00:41 kernel: R13: ffff893df56cdec0 R14: 000000000e7a804c R15: ffff893e7648aa88
Jun 09 03:00:41 kernel: FS: 00007f13340d2dc0(0000) GS:ffff893e7f400000(0000) knlGS:0000000000000000
Jun 09 03:00:41 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 09 03:00:41 kernel: CR2: 00007f13341e2bf7 CR3: 000000125920e006 CR4: 00000000003606f0
Jun 09 03:00:41 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 09 03:00:41 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 09 03:00:41 kernel: Fixing recursive fault but reboot is needed!
Another instance:

Jun 21 15:51:53 kernel: list_del corruption, ffff9f00b8ef58c8->next is LIST_POISON1 (dead000000000100)
Jun 21 15:51:53 kernel: ------------[ cut here ]------------
Jun 21 15:51:53 kernel: kernel BUG at lib/list_debug.c:45!
Jun 21 15:51:53 kernel: invalid opcode: 0000 [#1] SMP PTI
Jun 21 15:51:53 kernel: CPU: 0 PID: 369535 Comm: mumps Not tainted 5.1.5-300.fc30.x86_64 #1
Jun 21 15:51:53 kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
Jun 21 15:51:53 kernel: RIP: 0010:__list_del_entry_valid.cold+0xf/0x55
Jun 21 15:51:53 kernel: Code: c7 ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 50 54 11 99 e8 85 0e c7 ff 0f 0b 48 89 fe 48 c7 c7 e0 54 11 99 e8 74 0e c7 ff <0f> 0b 48 c7 c7 90 55 11 99 e8 66 0e c7 ff 0f 0b 48 89 f2 48 89 fe
Jun 21 15:51:53 kernel: RSP: 0018:ffffaedaa4f07e00 EFLAGS: 00010246
Jun 21 15:51:53 kernel: RAX: 000000000000004e RBX: ffff9f01f7ac0800 RCX: 0000000000000000
Jun 21 15:51:53 kernel: RDX: 0000000000000000 RSI: ffff9ef37f4168c8 RDI: ffff9ef37f4168c8
Jun 21 15:51:53 kernel: RBP: ffff9f00b8ef58a0 R08: ffff9ef37f4168c8 R09: 00000000000006e6
Jun 21 15:51:53 kernel: R10: ffff9f03bff6c348 R11: 0000000000000003 R12: ffff9eeea3e6bda8
Jun 21 15:51:53 kernel: R13: ffff9eeea3e6bda0 R14: 0000000010b28009 R15: ffff9f01f7ac0888
Jun 21 15:51:53 kernel: FS: 00007f063d713dc0(0000) GS:ffff9ef37f400000(0000) knlGS:0000000000000000
Jun 21 15:51:53 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 21 15:51:53 kernel: CR2: 000055af58b84138 CR3: 000000047020e001 CR4: 00000000003606f0
Jun 21 15:51:53 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 21 15:51:53 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 21 15:51:53 kernel: Call Trace:
Jun 21 15:51:53 kernel: exit_sem+0x156/0x572
Jun 21 15:51:53 kernel: do_exit+0x2ab/0xbd0
Jun 21 15:51:53 kernel: ? handle_mm_fault+0xdc/0x210
Jun 21 15:51:53 kernel: ? do_user_addr_fault+0x216/0x450
Jun 21 15:51:53 kernel: do_group_exit+0x3a/0xa0
Jun 21 15:51:53 kernel: __x64_sys_exit_group+0x14/0x20
Jun 21 15:51:53 kernel: do_syscall_64+0x5b/0x170
Jun 21 15:51:53 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 21 15:51:53 kernel: RIP: 0033:0x7f063d823c26
Jun 21 15:51:53 kernel: Code: Bad RIP value.
Jun 21 15:51:53 kernel: RSP: 002b:00007ffd660c7748 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Jun 21 15:51:53 kernel: RAX: ffffffffffffffda RBX: 00007f063d918740 RCX: 00007f063d823c26
Jun 21 15:51:53 kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Jun 21 15:51:53 kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
Jun 21 15:51:53 kernel: R10: 000000000000000e R11: 0000000000000246 R12: 00007f063d918740
Jun 21 15:51:53 kernel: R13: 0000000000000002 R14: 00007f063d921448 R15: 0000000000000000
Jun 21 15:51:53 kernel: Modules linked in: btrfs xor zstd_compress raid6_pq zstd_decompress fuse vfat fat rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi f2fs intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm raid1 mgag200 irqbypass i2c_algo_bit ipmi_ssif ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel drm intel_cstate iTCO_wdt mei_me joydev mei iTCO_vendor_support intel_uncore intel_rapl_perf lpc_ich dcdbas ipmi_si mxm_wmi ipmi_devintf ipmi_msghandler acpi_power_meter pcc_cpufreq auth_rpcgss binfmt_misc sunrpc ip_tables xfs libcrc32c nvme crc32c_intel nvme_core megaraid_sas tg3 wmi loop
Jun 21 15:51:53 kernel: ---[ end trace 0e95669b5b9b14b0 ]---
Jun 21 15:51:53 kernel: RIP: 0010:__list_del_entry_valid.cold+0xf/0x55
Jun 21 15:51:53 kernel: Code: c7 ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 50 54 11 99 e8 85 0e c7 ff 0f 0b 48 89 fe 48 c7 c7 e0 54 11 99 e8 74 0e c7 ff <0f> 0b 48 c7 c7 90 55 11 99 e8 66 0e c7 ff 0f 0b 48 89 f2 48 89 fe
Jun 21 15:51:53 kernel: RSP: 0018:ffffaedaa4f07e00 EFLAGS: 00010246
Jun 21 15:51:53 kernel: RAX: 000000000000004e RBX: ffff9f01f7ac0800 RCX: 0000000000000000
Jun 21 15:51:53 kernel: RDX: 0000000000000000 RSI: ffff9ef37f4168c8 RDI: ffff9ef37f4168c8
Jun 21 15:51:53 kernel: RBP: ffff9f00b8ef58a0 R08: ffff9ef37f4168c8 R09: 00000000000006e6
Jun 21 15:51:53 kernel: R10: ffff9f03bff6c348 R11: 0000000000000003 R12: ffff9eeea3e6bda8
Jun 21 15:51:53 kernel: R13: ffff9eeea3e6bda0 R14: 0000000010b28009 R15: ffff9f01f7ac0888
Jun 21 15:51:53 kernel: FS: 00007f063d713dc0(0000) GS:ffff9ef37f400000(0000) knlGS:0000000000000000
Jun 21 15:51:53 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 21 15:51:53 kernel: CR2: 00007f063d823bfc CR3: 000000047020e001 CR4: 00000000003606f0
Jun 21 15:51:53 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 21 15:51:53 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 21 15:51:53 kernel: Fixing recursive fault but reboot is needed!
Would holding the ulp->lock spin lock across the call to __lookup_undo(ulp, semid) and/or the list_del(&un->list_id) in sem.c:exit_sem() be more correct?
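To make the question above concrete, here is a user-space analogy only; it is not the kernel code and makes no claim about what the right fix in exit_sem() is. It assumes a pthread mutex standing in for a spinlock, and every name in it (struct node, struct undo_list, undo_list_remove, the poison markers) is hypothetical. The point it sketches is narrow: when the lookup and the unlink happen under one hold of the same lock, a second caller cannot find and unlink an entry that has already been removed and poisoned.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; nothing here is taken from ipc/sem.c. */
struct node {
    struct node *prev, *next;
    int id;
};

struct undo_list {
    pthread_mutex_t lock;   /* plays the role of a per-list spinlock */
    struct node head;       /* sentinel of a circular doubly linked list */
};

/* Marker nodes written into an unlinked entry, in the spirit of list poisoning. */
static struct node poison_next, poison_prev;

void undo_list_init(struct undo_list *ul)
{
    pthread_mutex_init(&ul->lock, NULL);
    ul->head.next = ul->head.prev = &ul->head;
    ul->head.id = -1;
}

/* Insert n right after the sentinel. */
void undo_list_add(struct undo_list *ul, struct node *n)
{
    assert(n != NULL);
    pthread_mutex_lock(&ul->lock);
    n->next = ul->head.next;
    n->prev = &ul->head;
    ul->head.next->prev = n;
    ul->head.next = n;
    pthread_mutex_unlock(&ul->lock);
}

/* Caller must hold ul->lock. */
static struct node *lookup_locked(struct undo_list *ul, int id)
{
    for (struct node *n = ul->head.next; n != &ul->head; n = n->next)
        if (n->id == id)
            return n;
    return NULL;
}

/*
 * Look up and unlink in one critical section, so no other thread can
 * unlink the same entry between the lookup and the delete.
 * Returns true if this caller removed the entry, false if it was not found.
 */
bool undo_list_remove(struct undo_list *ul, int id)
{
    bool removed = false;

    pthread_mutex_lock(&ul->lock);
    struct node *n = lookup_locked(ul, id);
    if (n != NULL) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->next = &poison_next;   /* mark the entry as no longer on any list */
        n->prev = &poison_prev;
        removed = true;
    }
    pthread_mutex_unlock(&ul->lock);
    return removed;
}
```

In this sketch a second undo_list_remove() for the same id returns false rather than touching the poisoned entry. Whether that reasoning carries over to exit_sem() is the open question.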
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 30 kernel bugs. Fedora 30 has now been rebased to 5.2.9-200.fc30. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 31, and are still experiencing this issue, please change the version to Fedora 31. If you experience different issues, please open a new bug report for those.
Unfortunately, I don't (yet) have a test to reproduce the issue on demand. However, I will boot into the latest kernel tomorrow, and we'll see if the problem recurs.
Sep 09 16:49:17 kernel: list_del corruption. next->prev should be ffff8a50f45f9180, but was dead000000000200
Sep 09 16:49:17 kernel: ------------[ cut here ]------------
Sep 09 16:49:17 kernel: kernel BUG at lib/list_debug.c:54!
Sep 09 16:49:17 kernel: invalid opcode: 0000 [#1] SMP PTI
Sep 09 16:49:17 kernel: CPU: 18 PID: 383101 Comm: mumps Not tainted 5.2.8-200.fc30.x86_64 #1
Sep 09 16:49:17 kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
Sep 09 16:49:17 kernel: RIP: 0010:__list_del_entry_valid.cold+0x1d/0x55
Sep 09 16:49:17 kernel: Code: c7 c7 b8 92 13 aa e8 f5 36 c6 ff 0f 0b 48 89 fe 48 c7 c7 48 93 13 aa e8 e4 36 c6 ff 0f 0b 48 c7 c7 f8 93 13 aa e8 d6 36 c6 ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 b8 93 13 aa e8 c2 36 c6 ff 0f 0b
Sep 09 16:49:17 kernel: RSP: 0018:ffffb26621e6fdf8 EFLAGS: 00010246
Sep 09 16:49:17 kernel: RAX: 0000000000000054 RBX: ffff8a6262391200 RCX: 0000000000000000
Sep 09 16:49:17 kernel: RDX: 0000000000000000 RSI: ffff8a543f657908 RDI: ffff8a543f657908
Sep 09 16:49:17 kernel: RBP: ffff8a50f45f9180 R08: ffff8a543f657908 R09: 00000000000006f0
Sep 09 16:49:17 kernel: R10: ffff8a647ff6c20c R11: 0000000000000003 R12: ffff8a4d30595348
Sep 09 16:49:17 kernel: R13: ffff8a4d30595340 R14: 0000000050b6002c R15: ffff8a6262393c88
Sep 09 16:49:17 kernel: FS: 0000000000000000(0000) GS:ffff8a543f640000(0000) knlGS:0000000000000000
Sep 09 16:49:17 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 09 16:49:17 kernel: CR2: 00007fdc5841f280 CR3: 00000002e740a004 CR4: 00000000003606e0
Sep 09 16:49:17 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 09 16:49:17 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 09 16:49:17 kernel: Call Trace:
Sep 09 16:49:17 kernel: exit_sem+0x191/0x585
Sep 09 16:49:17 kernel: do_exit+0x2ba/0xbd0
Sep 09 16:49:17 kernel: ? do_user_addr_fault+0x216/0x450
Sep 09 16:49:17 kernel: do_group_exit+0x3a/0xa0
Sep 09 16:49:17 kernel: __x64_sys_exit_group+0x14/0x20
Sep 09 16:49:17 kernel: do_syscall_64+0x5f/0x1a0
Sep 09 16:49:17 kernel: ? page_fault+0x8/0x30
Sep 09 16:49:17 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 09 16:49:17 kernel: RIP: 0033:0x7fdc58df2c26
Sep 09 16:49:17 kernel: Code: Bad RIP value.
Sep 09 16:49:17 kernel: RSP: 002b:00007ffe2635f108 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Sep 09 16:49:17 kernel: RAX: ffffffffffffffda RBX: 00007fdc58ee7740 RCX: 00007fdc58df2c26
Sep 09 16:49:17 kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Sep 09 16:49:17 kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
Sep 09 16:49:17 kernel: R10: 0000000000000017 R11: 0000000000000246 R12: 00007fdc58ee7740
Sep 09 16:49:17 kernel: R13: 0000000000000003 R14: 00007fdc58ef0448 R15: 0000000000000000
Sep 09 16:49:17 kernel: Modules linked in: crypto_user sha512_ssse3 sha512_generic btrfs xor zstd_compress raid6_pq zstd_decompress fuse vfat fat rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi f2fs intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm raid1 irqbypass mgag200 i2c_algo_bit crct10dif_pclmul ttm crc32_pclmul drm_kms_helper ghash_clmulni_intel ipmi_ssif intel_cstate iTCO_wdt drm intel_uncore iTCO_vendor_support dcdbas intel_rapl_perf pcc_cpufreq lpc_ich mei_me ipmi_si mxm_wmi mei ipmi_devintf ipmi_msghandler acpi_power_meter binfmt_misc auth_rpcgss sunrpc ip_tables xfs libcrc32c nvme nvme_core crc32c_intel tg3 megaraid_sas wmi loop
Sep 09 16:49:17 kernel: ---[ end trace 5913f64e05cbe9ad ]---
Sep 09 16:49:17 kernel: RIP: 0010:__list_del_entry_valid.cold+0x1d/0x55
Sep 09 16:49:17 kernel: Code: c7 c7 b8 92 13 aa e8 f5 36 c6 ff 0f 0b 48 89 fe 48 c7 c7 48 93 13 aa e8 e4 36 c6 ff 0f 0b 48 c7 c7 f8 93 13 aa e8 d6 36 c6 ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 b8 93 13 aa e8 c2 36 c6 ff 0f 0b
Sep 09 16:49:17 kernel: RSP: 0018:ffffb26621e6fdf8 EFLAGS: 00010246
Sep 09 16:49:17 kernel: RAX: 0000000000000054 RBX: ffff8a6262391200 RCX: 0000000000000000
Sep 09 16:49:17 kernel: RDX: 0000000000000000 RSI: ffff8a543f657908 RDI: ffff8a543f657908
Sep 09 16:49:17 kernel: RBP: ffff8a50f45f9180 R08: ffff8a543f657908 R09: 00000000000006f0
Sep 09 16:49:17 kernel: R10: ffff8a647ff6c20c R11: 0000000000000003 R12: ffff8a4d30595348
Sep 09 16:49:17 kernel: R13: ffff8a4d30595340 R14: 0000000050b6002c R15: ffff8a6262393c88
Sep 09 16:49:17 kernel: FS: 0000000000000000(0000) GS:ffff8a543f640000(0000) knlGS:0000000000000000
Sep 09 16:49:17 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 09 16:49:17 kernel: CR2: 00007fdc58df2bfc CR3: 00000002e740a004 CR4: 00000000003606e0
Sep 09 16:49:17 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 09 16:49:17 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 09 16:49:17 kernel: Fixing recursive fault but reboot is needed!
Created attachment 1634901
Test Case

I can't confirm that this test exercises exactly the bug in question, because it locks up the machine before any kernel message can be logged, but it targets the same area. I suspect that the issue has been addressed by commit 984035ad7b247ccc62b06e113eea3fc673f114cc, which went into 5.4-rc1; I'll test again once 5.4 appears in F31.
The following patch resolves this issue: https://lkml.org/lkml/2019/12/11/1718 The patch currently sits in linux-next.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 30 kernel bugs. Fedora 30 has now been rebased to 5.5.7-100.fc30. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 31, and are still experiencing this issue, please change the version to Fedora 31. If you experience different issues, please open a new bug report for those.
It looks like a fix for this issue may be in the upcoming 5.6 kernel.
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
I believe this issue has been addressed in the upstream kernel.