1. Please describe the problem: System freeze every 1-4 hours. Logs contain the following: > Workqueue: events key_garbage_collector 2. What is the Version-Release number of the kernel: 5.3.xx 3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 : It worked in f30 and stopped after upgrade to f31. 4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below: Just start the system and wait, it will freeze in 1-4 hours. 5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``: It works with kernel-5.2.17-200.fc30.x86_64 6. Are you running any modules that not shipped with directly Fedora's kernel?: Yes and no. I use nvidia.ko but did try without it with no difference. 7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag. nov 19 10:53:55 bear.pdc.lkkm kernel: BUG: kernel NULL pointer dereference, address: 0000000000000014 nov 19 10:53:55 bear.pdc.lkkm kernel: #PF: supervisor read access in kernel mode nov 19 10:53:55 bear.pdc.lkkm kernel: #PF: error_code(0x0000) - not-present page nov 19 10:53:55 bear.pdc.lkkm kernel: PGD 0 P4D 0 nov 19 10:53:56 bear.pdc.lkkm kernel: Oops: 0000 [#1] SMP PTI nov 19 10:53:56 bear.pdc.lkkm kernel: CPU: 14 PID: 216 Comm: kworker/14:1 Tainted: P OE 5.3.11-300.fc31.x86_64 #1 nov 19 10:53:56 bear.pdc.lkkm kernel: Hardware name: ASUS All Series/X99-A, BIOS 3801 08/10/2017 nov 19 10:53:56 bear.pdc.lkkm kernel: Workqueue: events key_garbage_collector nov 19 10:53:56 bear.pdc.lkkm kernel: RIP: 0010:keyring_gc_check_iterator+0x2c/0x40 nov 19 10:53:56 bear.pdc.lkkm kernel: Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 40 14 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f nov 19 10:53:56 bear.pdc.lkkm kernel: RSP: 0018:ffffbbc3016e7dd8 EFLAGS: 00010246 nov 19 10:53:56 bear.pdc.lkkm kernel: RAX: 0000000000000000 RBX: ffff9636b239c1c8 RCX: ffffbbc3016e7e20 nov 19 10:53:56 bear.pdc.lkkm kernel: RDX: 0000000000000000 RSI: ffffbbc3016e7e20 RDI: ffff963895da2800 nov 19 10:53:56 bear.pdc.lkkm kernel: RBP: ffffbbc3016e7e20 R08: ffff96387dd04a08 R09: 000000000000000f nov 19 10:53:56 bear.pdc.lkkm kernel: R10: ffff96389ba3472c R11: 0000000000000018 R12: ffffffff91427f10 nov 19 10:53:56 bear.pdc.lkkm kernel: R13: ffff9636b239c210 R14: ffff963effffff41 R15: ffff9636b239c180 nov 19 10:53:56 bear.pdc.lkkm kernel: FS: 0000000000000000(0000) GS:ffff96389fb80000(0000) knlGS:0000000000000000 nov 19 10:53:56 bear.pdc.lkkm kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 nov 19 10:53:56 bear.pdc.lkkm kernel: CR2: 0000000000000014 CR3: 00000007ee026006 CR4: 00000000001626e0 nov 19 10:53:56 bear.pdc.lkkm kernel: Call Trace: nov 19 10:53:56 bear.pdc.lkkm kernel: assoc_array_subtree_iterate+0x57/0xd0 nov 19 10:53:56 bear.pdc.lkkm kernel: keyring_gc+0x3d/0x80 nov 19 10:53:56 bear.pdc.lkkm kernel: key_garbage_collector+0x363/0x3d0 nov 19 10:53:56 bear.pdc.lkkm kernel: ? __schedule+0x2a7/0x680 nov 19 10:53:56 bear.pdc.lkkm kernel: process_one_work+0x19d/0x340 nov 19 10:53:56 bear.pdc.lkkm kernel: worker_thread+0x50/0x3b0 nov 19 10:53:56 bear.pdc.lkkm kernel: kthread+0xfb/0x130 nov 19 10:53:56 bear.pdc.lkkm kernel: ? process_one_work+0x340/0x340 nov 19 10:53:56 bear.pdc.lkkm kernel: ? kthread_park+0x80/0x80 nov 19 10:53:56 bear.pdc.lkkm kernel: ret_from_fork+0x35/0x40 nov 19 10:53:56 bear.pdc.lkkm kernel: Modules linked in: 8021q garp mrp stp llc nfsv4 dns_resolver nfs fscache vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) rp csec_gss_krb5 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel intel_rapl_msr intel_rapl_common snd_hda_codec joydev x86_pkg_temp_thermal intel_powerclamp snd_hda_core coretemp kvm_intel snd_hwdep snd_seq snd_seq_device snd_pcm kvm snd_timer drm_kms_helper drm ipmi_devintf ipmi_msghandler irqbypass crct10dif_pclmul crc32_pclmul snd raid1 ghash_clmulni_intel intel_cstate mei_me mei eee pc_wmi asus_wmi sparse_keymap rfkill video intel_wmi_thunderbolt wmi_bmof intel_uncore iTCO_wdt iTCO_vendor_support intel_rapl_perf i2c_i801 soundcore e1000e lpc_ich nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables uas usb_storage mxm_wmi crc32c_intel wmi fuse [last unloaded: vboxdrv] nov 19 10:53:56 bear.pdc.lkkm kernel: CR2: 0000000000000014 nov 19 10:53:56 bear.pdc.lkkm kernel: ---[ end trace 86d8799f1807a8d4 ]--- nov 19 10:53:56 bear.pdc.lkkm kernel: RIP: 0010:keyring_gc_check_iterator+0x2c/0x40 nov 19 10:53:56 bear.pdc.lkkm kernel: Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 40 14 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f nov 19 10:53:56 bear.pdc.lkkm kernel: RSP: 0018:ffffbbc3016e7dd8 EFLAGS: 00010246 nov 19 10:53:56 bear.pdc.lkkm kernel: RAX: 0000000000000000 RBX: ffff9636b239c1c8 RCX: ffffbbc3016e7e20 nov 19 10:53:56 bear.pdc.lkkm kernel: RDX: 0000000000000000 RSI: ffffbbc3016e7e20 RDI: ffff963895da2800 nov 19 10:53:56 bear.pdc.lkkm kernel: RBP: ffffbbc3016e7e20 R08: ffff96387dd04a08 R09: 000000000000000f nov 19 10:53:56 bear.pdc.lkkm kernel: R10: ffff96389ba3472c R11: 0000000000000018 R12: ffffffff91427f10 nov 19 10:53:56 bear.pdc.lkkm kernel: R13: ffff9636b239c210 R14: ffff963effffff41 R15: ffff9636b239c180 nov 19 10:53:56 bear.pdc.lkkm kernel: FS: 0000000000000000(0000) GS:ffff96389fb80000(0000) knlGS:0000000000000000 nov 19 10:53:56 bear.pdc.lkkm kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 nov 19 10:53:56 bear.pdc.lkkm kernel: CR2: 0000000000000014 CR3: 00000007ee026006 CR4: 00000000001626e0
We are seeing this issue on four different hosts. Trace is about the same: Nov 23 19:24:21 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000014 Nov 23 19:24:21 kernel: #PF: supervisor read access in kernel mode Nov 23 19:24:21 kernel: #PF: error_code(0x0000) - not-present page Nov 23 19:24:21 kernel: PGD 0 P4D 0 Nov 23 19:24:21 kernel: Oops: 0000 [#1] SMP PTI Nov 23 19:24:21 kernel: CPU: 41 PID: 27467 Comm: kworker/41:2 Tainted: G OE 5.3.11-200.fc30.x86_64 #1 Nov 23 19:24:21 kernel: Hardware name: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.6.2 01/08/2016 Nov 23 19:24:21 kernel: Workqueue: events key_garbage_collector Nov 23 19:24:21 kernel: RIP: 0010:keyring_gc_check_iterator+0x2c/0x40 Nov 23 19:24:21 kernel: Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 40 14 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f Nov 23 19:24:21 kernel: RSP: 0018:ffffbb7d47717dd8 EFLAGS: 00010246 Nov 23 19:24:21 kernel: RAX: 0000000000000000 RBX: ffff9e0e363ff450 RCX: ffffbb7d47717e20 Nov 23 19:24:21 kernel: RDX: 0000000000000000 RSI: ffffbb7d47717e20 RDI: ffff9e1d7168cc00 Nov 23 19:24:21 kernel: RBP: ffffbb7d47717e20 R08: ffff9e0e32e1d208 R09: 8080808080808080 Nov 23 19:24:21 kernel: R10: ffff9e1d726790ec R11: 0000000000000018 R12: ffffffff97427f10 Nov 23 19:24:21 kernel: R13: ffff9e0e363ff4d0 R14: ffff9e1d7168cc00 R15: ffff9e0e363ff440 Nov 23 19:24:21 kernel: FS: 0000000000000000(0000) GS:ffff9e1e3fb00000(0000) knlGS:0000000000000000 Nov 23 19:24:21 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 23 19:24:21 kernel: CR2: 0000000000000014 CR3: 0000001ed2f2a001 CR4: 00000000001606e0 Nov 23 19:24:21 kernel: Call Trace: Nov 23 19:24:21 kernel: assoc_array_subtree_iterate+0x57/0xd0 Nov 23 19:24:21 kernel: keyring_gc+0x3d/0x80 Nov 23 19:24:21 kernel: key_garbage_collector+0x363/0x3d0 Nov 23 19:24:21 kernel: ? __schedule+0x2a7/0x680 Nov 23 19:24:21 kernel: process_one_work+0x19d/0x340 Nov 23 19:24:21 kernel: worker_thread+0x50/0x3b0 Nov 23 19:24:21 kernel: kthread+0xfb/0x130 Nov 23 19:24:21 kernel: ? process_one_work+0x340/0x340 Nov 23 19:24:21 kernel: ? kthread_park+0x80/0x80 Nov 23 19:24:21 kernel: ret_from_fork+0x35/0x40 Nov 23 19:24:21 kernel: Modules linked in: xt_multiport vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) rpcsec_gss_krb5 nfsv4 dns_resolver xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nfsv3 nfs_acl nfs lockd grace fscache cfg80211 rfkill nf_conntrack_net bios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtabl e_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_ conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rp crdma ib_isert iscsi_target_mod vfat ib_iser fat ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 r dma_cm iw_cm ib_cm iw_cxgb3 ib_uverbs ib_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp k vm_intel kvm irqbypass intel_cstate pktcdvd intel_uncore ipmi_ssif Nov 23 19:24:21 kernel: intel_rapl_perf iTCO_wdt iTCO_vendor_support dcdbas mei_me ipmi_si ipmi_devintf mei lpc_ich ipmi_msgh andler acpi_power_meter auth_rpcgss ip_tables mgag200 drm_vram_helper i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul ttm crc 32c_intel drm ghash_clmulni_intel tg3 megaraid_sas wmi dm_multipath sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcx gbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse Nov 23 19:24:21 kernel: CR2: 0000000000000014 Nov 23 19:24:21 kernel: ---[ end trace f0ff9fb68506d008 ]--- Nov 23 19:24:21 kernel: RIP: 0010:keyring_gc_check_iterator+0x2c/0x40 Nov 23 19:24:21 kernel: Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 40 14 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f Nov 23 19:24:21 kernel: RSP: 0018:ffffbb7d47717dd8 EFLAGS: 00010246 Nov 23 19:24:21 kernel: RAX: 0000000000000000 RBX: ffff9e0e363ff450 RCX: ffffbb7d47717e20 Nov 23 19:24:21 kernel: RDX: 0000000000000000 RSI: ffffbb7d47717e20 RDI: ffff9e1d7168cc00 Nov 23 19:24:21 kernel: RBP: ffffbb7d47717e20 R08: ffff9e0e32e1d208 R09: 8080808080808080 Nov 23 19:24:21 kernel: R10: ffff9e1d726790ec R11: 0000000000000018 R12: ffffffff97427f10 Nov 23 19:24:21 kernel: R13: ffff9e0e363ff4d0 R14: ffff9e1d7168cc00 R15: ffff9e0e363ff440 Nov 23 19:24:21 kernel: FS: 0000000000000000(0000) GS:ffff9e1e3fb00000(0000) knlGS:0000000000000000 Nov 23 19:24:21 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 23 19:24:21 kernel: CR2: 0000000000000014 CR3: 0000001ed2f2a001 CR4: 00000000001606e0 We started seeing this on an upgrade from Fedora 29 to Fedora 30, when we switched to the 5.3 kernel. We're currently testing a 5.0.9-301.fc30.x86_64 kernel in the hopes that works as a temporary stop-gap. Time to hit the bug has varied between as little as 25 minutes to over 2 weeks. The systems are multi-user (10-15 users) with NFSv4 home directories. Once the above NULL reference is hit the system starts to degenerate. Many processes go into IO wait, and eventually /usr/lib/systemd/systemd (PID 1) goes into IO Wait as well. A reboot is required to recover (where the "reboot" doesn't actually do anything, and 'echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger' is needed to get things to reset). If I had to bet money on the bug, my bet would be on security/keys/keyring.c keyring_gc(), where the comment above the function says: * Not called with any locks held. The keyring's key struct will not be * deallocated under us as only our caller may deallocate it. I'm betting that is no longer the case, since about the time that key_move() was introduced. I'd be happy to provide any other info as I'm able to... Thanks for looking at this!
> The systems are multi-user (10-15 users) with NFSv4 home directories. Yes, we have nfs mounted homes too. Some google search shows https://lkml.org/lkml/2019/9/20/822 also.
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
kernel-5.4.0-2.fc32.x86_64 tested, still bad.
Seeing this on kernel-5.3.15-300.fc31.x86_64 (and earlier), when hammering NFS4 sec=krb5i, sync doing a kernel compile.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 31 kernel bugs. Fedora 31 has now been rebased to 5.5.7-200.fc31. Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32. If you experience different issues, please open a new bug report for those.
Latest kernel works fine for me. If anyone has this issue please reopen.