Bug 1694779 - list_del corruption in exit_sem
Summary: list_del corruption in exit_sem
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-01 16:28 UTC by Gary Duzan
Modified: 2020-11-03 21:15 UTC (History)
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-03 21:15:12 UTC
Type: Bug


Attachments
kernel log (175.26 KB, text/plain)
2019-04-01 16:28 UTC, Gary Duzan
Test Case (2.46 KB, text/plain)
2019-11-11 14:48 UTC, Gary Duzan

Description Gary Duzan 2019-04-01 16:28:07 UTC
Created attachment 1550637 [details]
kernel log

1. Please describe the problem:

Periodically we get kernel lockups with this particular kernel report at the root of it. This typically happens under heavy load testing of GT.M, which makes significant use of semaphores.
Mar 30 04:25:33 kernel: list_del corruption, ffff953f1fe70e08->next is LIST_POISON1 (dead000000000100)
Mar 30 04:25:33 kernel: ------------[ cut here ]------------
Mar 30 04:25:33 kernel: kernel BUG at lib/list_debug.c:47!
Mar 30 04:25:33 kernel: invalid opcode: 0000 [#1] SMP PTI
Mar 30 04:25:33 kernel: CPU: 1 PID: 933549 Comm: mumps Not tainted 5.0.3-200.fc29.x86_64 #1
Mar 30 04:25:33 kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
Mar 30 04:25:33 kernel: RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
Mar 30 04:25:33 kernel: Code: c9 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 18 16 12 b1 e8 bc 15 c9 ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 a8 16 12 b1 e8 a8 15 c9 ff <0f> 0b 48 c7 c7 58 17 12 b1 e8 9a 15 c9 ff 0f 0b 48 89 f2 48 89 fe
Mar 30 04:25:33 kernel: RSP: 0018:ffffb02826157e00 EFLAGS: 00010246
Mar 30 04:25:33 kernel: RAX: 000000000000004e RBX: ffff953eed251600 RCX: 0000000000000000
Mar 30 04:25:33 kernel: RDX: 0000000000000000 RSI: ffff95407f4168c8 RDI: ffff95407f4168c8
Mar 30 04:25:33 kernel: RBP: ffff953f1fe70de0 R08: 000000000000065b R09: 0000000000000003
Mar 30 04:25:33 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 000000001af48021
Mar 30 04:25:33 kernel: R13: ffff953f11710d40 R14: ffff953f11710d48 R15: ffff953eed251688
Mar 30 04:25:33 kernel: FS:  00007f9e93f5a440(0000) GS:ffff95407f400000(0000) knlGS:0000000000000000
Mar 30 04:25:33 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 04:25:33 kernel: CR2: 00007fa2cc113000 CR3: 000000155e40e001 CR4: 00000000003606e0
Mar 30 04:25:33 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 30 04:25:33 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 30 04:25:33 kernel: Call Trace:
Mar 30 04:25:33 kernel:  exit_sem+0x12d/0x577
Mar 30 04:25:33 kernel:  do_exit+0x2a4/0xbb0
Mar 30 04:25:33 kernel:  ? __do_page_fault+0x26f/0x500
Mar 30 04:25:33 kernel:  do_group_exit+0x3a/0xa0
Mar 30 04:25:33 kernel:  __x64_sys_exit_group+0x14/0x20
Mar 30 04:25:33 kernel:  do_syscall_64+0x5b/0x160
Mar 30 04:25:33 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 30 04:25:33 kernel: RIP: 0033:0x7f9e9406aad6
Mar 30 04:25:33 kernel: Code: Bad RIP value.
Mar 30 04:25:33 kernel: RSP: 002b:00007ffc79a3ac28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Mar 30 04:25:33 kernel: RAX: ffffffffffffffda RBX: 00007f9e9415d740 RCX: 00007f9e9406aad6
Mar 30 04:25:33 kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Mar 30 04:25:33 kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
Mar 30 04:25:33 kernel: R10: 00007ffc79a3aa8e R11: 0000000000000246 R12: 00007f9e9415d740
Mar 30 04:25:33 kernel: R13: 0000000000000002 R14: 00007f9e94166448 R15: 0000000000000000
Mar 30 04:25:33 kernel: Modules linked in: btrfs xor zstd_compress raid6_pq zstd_decompress fuse vfat fat rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi f2fs intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass raid1 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate mgag200 intel_uncore i2c_algo_bit intel_rapl_perf ttm ipmi_ssif drm_kms_helper drm iTCO_wdt iTCO_vendor_support mei_me dcdbas lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mxm_wmi pcc_cpufreq auth_rpcgss sunrpc binfmt_misc xfs libcrc32c nvme crc32c_intel nvme_core megaraid_sas tg3 wmi loop
Mar 30 04:25:33 kernel: ---[ end trace d208a32963f4ac0e ]---

2. What is the Version-Release number of the kernel:

Linux fed.sanchez.com 5.0.3-200.fc29.x86_64 #1 SMP Tue Mar 19 15:07:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

We have observed the issue for some time, maybe a year or so, though it isn't clear when the problem started.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Unfortunately, while we do see the issue on a weekly to monthly basis, we don't have a specific trigger for it; I assume it is timing-sensitive.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

It would not be practical to test with a rawhide kernel on the affected system.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

The full kernel log from that boot is just under 51MB, and quite repetitive, so I'll trim most of it.

Comment 1 Gary Duzan 2019-06-10 14:23:39 UTC
We just got another instance with F30 and a more recent if not current kernel.

Jun 09 03:00:41 kernel: ------------[ cut here ]------------
Jun 09 03:00:41 kernel: kernel BUG at lib/list_debug.c:45!
Jun 09 03:00:41 kernel: invalid opcode: 0000 [#1] SMP PTI
Jun 09 03:00:41 kernel: CPU: 0 PID: 324558 Comm: mumps Not tainted 5.0.14-300.fc30.x86_64 #1
Jun 09 03:00:41 kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
Jun 09 03:00:41 kernel: RIP: 0010:__list_del_entry_valid.cold+0xf/0x55
Jun 09 03:00:41 kernel: Code: c8 ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 48 58 10 84 e8 75 05 c8 ff 0f 0b 48 89 fe 48 c7 c7 d8 58 10 84 e8 64 05 c8 ff <0f> 0b 48 c7 c7 88 59 10 84 e8 56 05 c8>
Jun 09 03:00:41 kernel: RSP: 0018:ffff94a3e4c87e00 EFLAGS: 00010246
Jun 09 03:00:41 kernel: RAX: 000000000000004e RBX: ffff893e7648aa00 RCX: 0000000000000000
Jun 09 03:00:41 kernel: RDX: 0000000000000000 RSI: ffff893e7f4168c8 RDI: ffff893e7f4168c8
Jun 09 03:00:41 kernel: RBP: ffff893dcda741e0 R08: ffff893e7f4168c8 R09: 000000000000073e
Jun 09 03:00:41 kernel: R10: 0000000000026c28 R11: 0000000000000003 R12: ffff893df56cdec8
Jun 09 03:00:41 kernel: R13: ffff893df56cdec0 R14: 000000000e7a804c R15: ffff893e7648aa88
Jun 09 03:00:41 kernel: FS:  00007f13340d2dc0(0000) GS:ffff893e7f400000(0000) knlGS:0000000000000000
Jun 09 03:00:41 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 09 03:00:41 kernel: CR2: 000055586b6075f8 CR3: 000000125920e006 CR4: 00000000003606f0
Jun 09 03:00:41 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 09 03:00:41 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 09 03:00:41 kernel: Call Trace:
Jun 09 03:00:41 kernel:  exit_sem+0x13b/0x575
Jun 09 03:00:41 kernel:  do_exit+0x2ab/0xbd0
Jun 09 03:00:41 kernel:  ? handle_mm_fault+0xdc/0x210
Jun 09 03:00:41 kernel:  ? do_user_addr_fault+0x218/0x450
Jun 09 03:00:41 kernel:  do_group_exit+0x3a/0xa0
Jun 09 03:00:41 kernel:  __x64_sys_exit_group+0x14/0x20
Jun 09 03:00:41 kernel:  do_syscall_64+0x5b/0x150
Jun 09 03:00:41 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 09 03:00:41 kernel: RIP: 0033:0x7f13341e2c21
Jun 09 03:00:41 kernel: Code: Bad RIP value.
Jun 09 03:00:41 kernel: RSP: 002b:00007ffc46b91f18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Jun 09 03:00:41 kernel: RAX: ffffffffffffffda RBX: 00007f13342d7740 RCX: 00007f13341e2c21
Jun 09 03:00:41 kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Jun 09 03:00:41 kernel: RBP: 0000000000000000 R08: ffffffffffffff78 R09: 00007f1323a205e0
Jun 09 03:00:41 kernel: R10: 000000000000001c R11: 0000000000000246 R12: 00007f13342d7740
Jun 09 03:00:41 kernel: R13: 000000000000000c R14: 00007f13342e0448 R15: 0000000000000000
Jun 09 03:00:41 kernel: Modules linked in: btrfs xor zstd_compress raid6_pq zstd_decompress fuse vfat fat rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache iscsi_tcp libiscsi_tcp libiscsi>
Jun 09 03:00:41 kernel: ---[ end trace 2a01f1e0f661fc83 ]---
Jun 09 03:00:41 kernel: RIP: 0010:__list_del_entry_valid.cold+0xf/0x55
Jun 09 03:00:41 kernel: Code: c8 ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 48 58 10 84 e8 75 05 c8 ff 0f 0b 48 89 fe 48 c7 c7 d8 58 10 84 e8 64 05 c8 ff <0f> 0b 48 c7 c7 88 59 10 84 e8 56 05 c8>
Jun 09 03:00:41 kernel: RSP: 0018:ffff94a3e4c87e00 EFLAGS: 00010246
Jun 09 03:00:41 kernel: RAX: 000000000000004e RBX: ffff893e7648aa00 RCX: 0000000000000000
Jun 09 03:00:41 kernel: RDX: 0000000000000000 RSI: ffff893e7f4168c8 RDI: ffff893e7f4168c8
Jun 09 03:00:41 kernel: RBP: ffff893dcda741e0 R08: ffff893e7f4168c8 R09: 000000000000073e
Jun 09 03:00:41 kernel: R10: 0000000000026c28 R11: 0000000000000003 R12: ffff893df56cdec8
Jun 09 03:00:41 kernel: R13: ffff893df56cdec0 R14: 000000000e7a804c R15: ffff893e7648aa88
Jun 09 03:00:41 kernel: FS:  00007f13340d2dc0(0000) GS:ffff893e7f400000(0000) knlGS:0000000000000000
Jun 09 03:00:41 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 09 03:00:41 kernel: CR2: 00007f13341e2bf7 CR3: 000000125920e006 CR4: 00000000003606f0
Jun 09 03:00:41 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 09 03:00:41 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 09 03:00:41 kernel: Fixing recursive fault but reboot is needed!

Comment 2 Gary Duzan 2019-06-24 13:58:09 UTC
Another instance:

Jun 21 15:51:53 kernel: list_del corruption, ffff9f00b8ef58c8->next is LIST_POISON1 (dead000000000100)
Jun 21 15:51:53 kernel: ------------[ cut here ]------------
Jun 21 15:51:53 kernel: kernel BUG at lib/list_debug.c:45!
Jun 21 15:51:53 kernel: invalid opcode: 0000 [#1] SMP PTI
Jun 21 15:51:53 kernel: CPU: 0 PID: 369535 Comm: mumps Not tainted 5.1.5-300.fc30.x86_64 #1
Jun 21 15:51:53 kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
Jun 21 15:51:53 kernel: RIP: 0010:__list_del_entry_valid.cold+0xf/0x55
Jun 21 15:51:53 kernel: Code: c7 ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 50 54 11 99 e8 85 0e c7 ff 0f 0b 48 89 fe 48 c7 c7 e0 54 11 99 e8 74 0e c7 ff <0f> 0b 48 c7 c7 90 55 11 99 e8 66 0e c7 ff 0f 0b 48 89 f2 48 89 fe
Jun 21 15:51:53 kernel: RSP: 0018:ffffaedaa4f07e00 EFLAGS: 00010246
Jun 21 15:51:53 kernel: RAX: 000000000000004e RBX: ffff9f01f7ac0800 RCX: 0000000000000000
Jun 21 15:51:53 kernel: RDX: 0000000000000000 RSI: ffff9ef37f4168c8 RDI: ffff9ef37f4168c8
Jun 21 15:51:53 kernel: RBP: ffff9f00b8ef58a0 R08: ffff9ef37f4168c8 R09: 00000000000006e6
Jun 21 15:51:53 kernel: R10: ffff9f03bff6c348 R11: 0000000000000003 R12: ffff9eeea3e6bda8
Jun 21 15:51:53 kernel: R13: ffff9eeea3e6bda0 R14: 0000000010b28009 R15: ffff9f01f7ac0888
Jun 21 15:51:53 kernel: FS:  00007f063d713dc0(0000) GS:ffff9ef37f400000(0000) knlGS:0000000000000000
Jun 21 15:51:53 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 21 15:51:53 kernel: CR2: 000055af58b84138 CR3: 000000047020e001 CR4: 00000000003606f0
Jun 21 15:51:53 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 21 15:51:53 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 21 15:51:53 kernel: Call Trace:
Jun 21 15:51:53 kernel:  exit_sem+0x156/0x572
Jun 21 15:51:53 kernel:  do_exit+0x2ab/0xbd0
Jun 21 15:51:53 kernel:  ? handle_mm_fault+0xdc/0x210
Jun 21 15:51:53 kernel:  ? do_user_addr_fault+0x216/0x450
Jun 21 15:51:53 kernel:  do_group_exit+0x3a/0xa0
Jun 21 15:51:53 kernel:  __x64_sys_exit_group+0x14/0x20
Jun 21 15:51:53 kernel:  do_syscall_64+0x5b/0x170
Jun 21 15:51:53 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 21 15:51:53 kernel: RIP: 0033:0x7f063d823c26
Jun 21 15:51:53 kernel: Code: Bad RIP value.
Jun 21 15:51:53 kernel: RSP: 002b:00007ffd660c7748 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Jun 21 15:51:53 kernel: RAX: ffffffffffffffda RBX: 00007f063d918740 RCX: 00007f063d823c26
Jun 21 15:51:53 kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Jun 21 15:51:53 kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
Jun 21 15:51:53 kernel: R10: 000000000000000e R11: 0000000000000246 R12: 00007f063d918740
Jun 21 15:51:53 kernel: R13: 0000000000000002 R14: 00007f063d921448 R15: 0000000000000000
Jun 21 15:51:53 kernel: Modules linked in: btrfs xor zstd_compress raid6_pq zstd_decompress fuse vfat fat rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi f2fs intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm raid1 mgag200 irqbypass i2c_algo_bit ipmi_ssif ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel drm intel_cstate iTCO_wdt mei_me joydev mei iTCO_vendor_support intel_uncore intel_rapl_perf lpc_ich dcdbas ipmi_si mxm_wmi ipmi_devintf ipmi_msghandler acpi_power_meter pcc_cpufreq auth_rpcgss binfmt_misc sunrpc ip_tables xfs libcrc32c nvme crc32c_intel nvme_core megaraid_sas tg3 wmi loop
Jun 21 15:51:53 kernel: ---[ end trace 0e95669b5b9b14b0 ]---
Jun 21 15:51:53 kernel: RIP: 0010:__list_del_entry_valid.cold+0xf/0x55
Jun 21 15:51:53 kernel: Code: c7 ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 50 54 11 99 e8 85 0e c7 ff 0f 0b 48 89 fe 48 c7 c7 e0 54 11 99 e8 74 0e c7 ff <0f> 0b 48 c7 c7 90 55 11 99 e8 66 0e c7 ff 0f 0b 48 89 f2 48 89 fe
Jun 21 15:51:53 kernel: RSP: 0018:ffffaedaa4f07e00 EFLAGS: 00010246
Jun 21 15:51:53 kernel: RAX: 000000000000004e RBX: ffff9f01f7ac0800 RCX: 0000000000000000
Jun 21 15:51:53 kernel: RDX: 0000000000000000 RSI: ffff9ef37f4168c8 RDI: ffff9ef37f4168c8
Jun 21 15:51:53 kernel: RBP: ffff9f00b8ef58a0 R08: ffff9ef37f4168c8 R09: 00000000000006e6
Jun 21 15:51:53 kernel: R10: ffff9f03bff6c348 R11: 0000000000000003 R12: ffff9eeea3e6bda8
Jun 21 15:51:53 kernel: R13: ffff9eeea3e6bda0 R14: 0000000010b28009 R15: ffff9f01f7ac0888
Jun 21 15:51:53 kernel: FS:  00007f063d713dc0(0000) GS:ffff9ef37f400000(0000) knlGS:0000000000000000
Jun 21 15:51:53 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 21 15:51:53 kernel: CR2: 00007f063d823bfc CR3: 000000047020e001 CR4: 00000000003606f0
Jun 21 15:51:53 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 21 15:51:53 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 21 15:51:53 kernel: Fixing recursive fault but reboot is needed!

Comment 3 Gary Duzan 2019-06-24 15:27:40 UTC
Would holding the ulp->lock spin lock across the call to __lookup_undo(ulp, semid) and/or the list_del(&un->list_id) in sem.c:exit_sem() be more correct?
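
To make the question concrete, here is a minimal userspace sketch of the pattern being asked about: the lookup and the unlink happen inside one critical section, so two callers racing on the same semid cannot both unlink the same entry. This is an analogy only, not kernel code. A pthread mutex stands in for the ulp->lock spin lock, and struct undo, struct undo_list, undo_list_init(), undo_list_add() and undo_list_take() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-process undo entry. */
struct undo {
    struct undo *next, *prev;
    int semid;
};

/* Hypothetical stand-in for the per-process undo list and its lock. */
struct undo_list {
    pthread_mutex_t lock;
    struct undo head;           /* sentinel node of a circular list */
};

void undo_list_init(struct undo_list *ulp)
{
    assert(ulp != NULL);
    pthread_mutex_init(&ulp->lock, NULL);
    ulp->head.next = ulp->head.prev = &ulp->head;
    ulp->head.semid = -1;
}

void undo_list_add(struct undo_list *ulp, struct undo *un, int semid)
{
    pthread_mutex_lock(&ulp->lock);
    un->semid = semid;
    un->next = ulp->head.next;
    un->prev = &ulp->head;
    ulp->head.next->prev = un;
    ulp->head.next = un;
    pthread_mutex_unlock(&ulp->lock);
}

/*
 * Look up the entry for semid and unlink it while still holding the lock.
 * Returns the unlinked entry, or NULL if no entry is on the list (for
 * example because another caller already took it).
 */
struct undo *undo_list_take(struct undo_list *ulp, int semid)
{
    struct undo *un, *found = NULL;

    pthread_mutex_lock(&ulp->lock);
    for (un = ulp->head.next; un != &ulp->head; un = un->next) {
        if (un->semid == semid) {
            un->prev->next = un->next;
            un->next->prev = un->prev;
            un->next = un->prev = NULL;
            found = un;
            break;
        }
    }
    pthread_mutex_unlock(&ulp->lock);
    return found;
}
```

With this shape, a second undo_list_take() for the same semid returns NULL instead of unlinking an entry that is already off the list. If the lock were dropped between the lookup and the unlink, both callers could obtain the same pointer and delete it twice.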

Comment 4 Justin M. Forbes 2019-08-20 17:40:22 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 30 kernel bugs.

Fedora 30 has now been rebased to 5.2.9-200.fc30.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 31, and are still experiencing this issue, please change the version to Fedora 31.

If you experience different issues, please open a new bug report for those.

Comment 5 Gary Duzan 2019-08-20 18:09:38 UTC
Unfortunately, I don't (yet) have a test to reproduce the issue on demand. However, I will boot into the latest kernel tomorrow, and we'll see if the problem recurs.

Comment 6 Gary Duzan 2019-09-09 20:57:10 UTC
Sep 09 16:49:17 kernel: list_del corruption. next->prev should be ffff8a50f45f9180, but was dead000000000200
Sep 09 16:49:17 kernel: ------------[ cut here ]------------
Sep 09 16:49:17 kernel: kernel BUG at lib/list_debug.c:54!
Sep 09 16:49:17 kernel: invalid opcode: 0000 [#1] SMP PTI
Sep 09 16:49:17 kernel: CPU: 18 PID: 383101 Comm: mumps Not tainted 5.2.8-200.fc30.x86_64 #1
Sep 09 16:49:17 kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
Sep 09 16:49:17 kernel: RIP: 0010:__list_del_entry_valid.cold+0x1d/0x55
Sep 09 16:49:17 kernel: Code: c7 c7 b8 92 13 aa e8 f5 36 c6 ff 0f 0b 48 89 fe 48 c7 c7 48 93 13 aa e8 e4 36 c6 ff 0f 0b 48 c7 c7 f8 93 13 aa e8 d6 36 c6 ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 b8 93 13 aa e8 c2 36 c6 ff 0f 0b
Sep 09 16:49:17 kernel: RSP: 0018:ffffb26621e6fdf8 EFLAGS: 00010246
Sep 09 16:49:17 kernel: RAX: 0000000000000054 RBX: ffff8a6262391200 RCX: 0000000000000000
Sep 09 16:49:17 kernel: RDX: 0000000000000000 RSI: ffff8a543f657908 RDI: ffff8a543f657908
Sep 09 16:49:17 kernel: RBP: ffff8a50f45f9180 R08: ffff8a543f657908 R09: 00000000000006f0
Sep 09 16:49:17 kernel: R10: ffff8a647ff6c20c R11: 0000000000000003 R12: ffff8a4d30595348
Sep 09 16:49:17 kernel: R13: ffff8a4d30595340 R14: 0000000050b6002c R15: ffff8a6262393c88
Sep 09 16:49:17 kernel: FS:  0000000000000000(0000) GS:ffff8a543f640000(0000) knlGS:0000000000000000
Sep 09 16:49:17 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 09 16:49:17 kernel: CR2: 00007fdc5841f280 CR3: 00000002e740a004 CR4: 00000000003606e0
Sep 09 16:49:17 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 09 16:49:17 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 09 16:49:17 kernel: Call Trace:
Sep 09 16:49:17 kernel:  exit_sem+0x191/0x585
Sep 09 16:49:17 kernel:  do_exit+0x2ba/0xbd0
Sep 09 16:49:17 kernel:  ? do_user_addr_fault+0x216/0x450
Sep 09 16:49:17 kernel:  do_group_exit+0x3a/0xa0
Sep 09 16:49:17 kernel:  __x64_sys_exit_group+0x14/0x20
Sep 09 16:49:17 kernel:  do_syscall_64+0x5f/0x1a0
Sep 09 16:49:17 kernel:  ? page_fault+0x8/0x30
Sep 09 16:49:17 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 09 16:49:17 kernel: RIP: 0033:0x7fdc58df2c26
Sep 09 16:49:17 kernel: Code: Bad RIP value.
Sep 09 16:49:17 kernel: RSP: 002b:00007ffe2635f108 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Sep 09 16:49:17 kernel: RAX: ffffffffffffffda RBX: 00007fdc58ee7740 RCX: 00007fdc58df2c26
Sep 09 16:49:17 kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Sep 09 16:49:17 kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
Sep 09 16:49:17 kernel: R10: 0000000000000017 R11: 0000000000000246 R12: 00007fdc58ee7740
Sep 09 16:49:17 kernel: R13: 0000000000000003 R14: 00007fdc58ef0448 R15: 0000000000000000
Sep 09 16:49:17 kernel: Modules linked in: crypto_user sha512_ssse3 sha512_generic btrfs xor zstd_compress raid6_pq zstd_decompress fuse vfat fat rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi f2fs intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm raid1 irqbypass mgag200 i2c_algo_bit crct10dif_pclmul ttm crc32_pclmul drm_kms_helper ghash_clmulni_intel ipmi_ssif intel_cstate iTCO_wdt drm intel_uncore iTCO_vendor_support dcdbas intel_rapl_perf pcc_cpufreq lpc_ich mei_me ipmi_si mxm_wmi mei ipmi_devintf ipmi_msghandler acpi_power_meter binfmt_misc auth_rpcgss sunrpc ip_tables xfs libcrc32c nvme nvme_core crc32c_intel tg3 megaraid_sas wmi loop
Sep 09 16:49:17 kernel: ---[ end trace 5913f64e05cbe9ad ]---
Sep 09 16:49:17 kernel: RIP: 0010:__list_del_entry_valid.cold+0x1d/0x55
Sep 09 16:49:17 kernel: Code: c7 c7 b8 92 13 aa e8 f5 36 c6 ff 0f 0b 48 89 fe 48 c7 c7 48 93 13 aa e8 e4 36 c6 ff 0f 0b 48 c7 c7 f8 93 13 aa e8 d6 36 c6 ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 b8 93 13 aa e8 c2 36 c6 ff 0f 0b
Sep 09 16:49:17 kernel: RSP: 0018:ffffb26621e6fdf8 EFLAGS: 00010246
Sep 09 16:49:17 kernel: RAX: 0000000000000054 RBX: ffff8a6262391200 RCX: 0000000000000000
Sep 09 16:49:17 kernel: RDX: 0000000000000000 RSI: ffff8a543f657908 RDI: ffff8a543f657908
Sep 09 16:49:17 kernel: RBP: ffff8a50f45f9180 R08: ffff8a543f657908 R09: 00000000000006f0
Sep 09 16:49:17 kernel: R10: ffff8a647ff6c20c R11: 0000000000000003 R12: ffff8a4d30595348
Sep 09 16:49:17 kernel: R13: ffff8a4d30595340 R14: 0000000050b6002c R15: ffff8a6262393c88
Sep 09 16:49:17 kernel: FS:  0000000000000000(0000) GS:ffff8a543f640000(0000) knlGS:0000000000000000
Sep 09 16:49:17 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 09 16:49:17 kernel: CR2: 00007fdc58df2bfc CR3: 00000002e740a004 CR4: 00000000003606e0
Sep 09 16:49:17 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 09 16:49:17 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 09 16:49:17 kernel: Fixing recursive fault but reboot is needed!
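
For readers comparing the two message variants in this report ("->next is LIST_POISON1" earlier, "next->prev should be ..., but was dead000000000200" here), the following userspace sketch shows the kind of sanity checks that produce them. It is a simplified model for illustration, not the kernel's lib/list_debug.c. The two poison values are copied from the log lines in this report and assume a 64-bit build; struct list_node, enum del_status, list_init(), list_add_tail() and list_del_checked() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct list_node {
    struct list_node *next, *prev;
};

/* Poison values as they appear in the logs above (64-bit assumed). */
#define LIST_POISON1 ((struct list_node *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_node *)(uintptr_t)0xdead000000000200ULL)

enum del_status {
    DEL_OK,
    DEL_NEXT_POISONED,      /* entry->next is LIST_POISON1 */
    DEL_PREV_POISONED,      /* entry->prev is LIST_POISON2 */
    DEL_PREV_LINK_BROKEN,   /* prev->next does not point back at entry */
    DEL_NEXT_LINK_BROKEN    /* next->prev does not point back at entry */
};

void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/*
 * Validate the entry, unlink it, then poison its pointers so that a
 * later deletion of the same entry is detected instead of silently
 * corrupting the list.
 */
enum del_status list_del_checked(struct list_node *entry)
{
    struct list_node *next = entry->next;
    struct list_node *prev = entry->prev;

    if (next == LIST_POISON1)
        return DEL_NEXT_POISONED;
    if (prev == LIST_POISON2)
        return DEL_PREV_POISONED;
    if (prev->next != entry)
        return DEL_PREV_LINK_BROKEN;
    if (next->prev != entry)
        return DEL_NEXT_LINK_BROKEN;

    next->prev = prev;
    prev->next = next;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
    return DEL_OK;
}
```

In this model, deleting the same entry twice yields DEL_NEXT_POISONED, which corresponds to the first message. Deleting an entry whose neighbour was already deleted behind its back yields DEL_NEXT_LINK_BROKEN, because the neighbour's prev pointer now holds the second poison value; that corresponds to the message in this comment. Both outcomes would be consistent with two code paths removing entries from the same list without a common lock.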

Comment 7 Gary Duzan 2019-11-11 14:48:08 UTC
Created attachment 1634901 [details]
Test Case

I can't confirm that this test exercises exactly the bug in question, as it locks up the machine before any kernel message can be logged, but it is attacking the same area. I suspect that the issue has been addressed by commit 984035ad7b247ccc62b06e113eea3fc673f114cc, which went into 5.4-rc1; I'll test again once 5.4 appears in F31.
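
For context on what "the same area" means, here is a small, harmless sketch of the code path involved: a process performs a semop() with SEM_UNDO and then exits, so the kernel has to apply the recorded adjustment at exit time. This is not the attached test case and does not attempt to trigger the race; it only shows the exit-time undo behaviour on a single private semaphore. The function name sem_value_after_undo_exit() is made up for this illustration.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller has to define union semun itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Create a private semaphore set, let a child raise the semaphore with
 * SEM_UNDO and exit, then return the value the parent observes.
 * Returns -1 on any setup error.
 */
int sem_value_after_undo_exit(void)
{
    union semun arg;
    struct sembuf op;
    pid_t pid;
    int status;
    int result = -1;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid < 0)
        return -1;

    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) < 0)
        goto out;

    pid = fork();
    if (pid < 0)
        goto out;

    if (pid == 0) {
        /* Child: +1 with SEM_UNDO records an adjustment of -1. */
        op.sem_num = 0;
        op.sem_op = 1;
        op.sem_flg = SEM_UNDO;
        if (semop(semid, &op, 1) < 0)
            _exit(2);
        /* The value is 1 while the child is still alive. */
        _exit(semctl(semid, 0, GETVAL) == 1 ? 0 : 1);
    }

    /* Parent: once the child has exited, the undo has been applied. */
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        result = semctl(semid, 0, GETVAL);

out:
    semctl(semid, 0, IPC_RMID);
    return result;
}
```

A real stress test for this bug would presumably need many processes exiting while semaphore sets are being removed concurrently; that part is deliberately left out here.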

Comment 8 joalif 2020-01-08 18:08:46 UTC
The following patch resolves this issue:
https://lkml.org/lkml/2019/12/11/1718

Currently the patch sits in linux-next.

Comment 9 Justin M. Forbes 2020-03-03 16:33:16 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 30 kernel bugs.

Fedora 30 has now been rebased to 5.5.7-100.fc30.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 31, and are still experiencing this issue, please change the version to Fedora 31.

If you experience different issues, please open a new bug report for those.

Comment 10 Gary Duzan 2020-03-03 17:02:01 UTC
It looks like a fix for this issue may be in the upcoming 5.6 kernel.

Comment 11 Ben Cotton 2020-11-03 16:52:23 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 12 Gary Duzan 2020-11-03 21:15:12 UTC
I believe this issue has been addressed in the upstream kernel.

