Bug 1776572
Summary: | Frequent kernel freeze with key_garbage_collector | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Dmitrij S. Kryzhevich <kryzhev> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 31 | CC: | airlied, bskeggs, hdegoede, ichavero, imc, itamar, james, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mjg59, orion, oster, phil_g, steved, tom |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-03-09 05:11:59 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Dmitrij S. Kryzhevich
2019-11-26 02:55:23 UTC
We are seeing this issue on four different hosts. Trace is about the same: Nov 23 19:24:21 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000014 Nov 23 19:24:21 kernel: #PF: supervisor read access in kernel mode Nov 23 19:24:21 kernel: #PF: error_code(0x0000) - not-present page Nov 23 19:24:21 kernel: PGD 0 P4D 0 Nov 23 19:24:21 kernel: Oops: 0000 [#1] SMP PTI Nov 23 19:24:21 kernel: CPU: 41 PID: 27467 Comm: kworker/41:2 Tainted: G OE 5.3.11-200.fc30.x86_64 #1 Nov 23 19:24:21 kernel: Hardware name: Dell Inc. PowerEdge R430/0HFG24, BIOS 1.6.2 01/08/2016 Nov 23 19:24:21 kernel: Workqueue: events key_garbage_collector Nov 23 19:24:21 kernel: RIP: 0010:keyring_gc_check_iterator+0x2c/0x40 Nov 23 19:24:21 kernel: Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 40 14 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f Nov 23 19:24:21 kernel: RSP: 0018:ffffbb7d47717dd8 EFLAGS: 00010246 Nov 23 19:24:21 kernel: RAX: 0000000000000000 RBX: ffff9e0e363ff450 RCX: ffffbb7d47717e20 Nov 23 19:24:21 kernel: RDX: 0000000000000000 RSI: ffffbb7d47717e20 RDI: ffff9e1d7168cc00 Nov 23 19:24:21 kernel: RBP: ffffbb7d47717e20 R08: ffff9e0e32e1d208 R09: 8080808080808080 Nov 23 19:24:21 kernel: R10: ffff9e1d726790ec R11: 0000000000000018 R12: ffffffff97427f10 Nov 23 19:24:21 kernel: R13: ffff9e0e363ff4d0 R14: ffff9e1d7168cc00 R15: ffff9e0e363ff440 Nov 23 19:24:21 kernel: FS: 0000000000000000(0000) GS:ffff9e1e3fb00000(0000) knlGS:0000000000000000 Nov 23 19:24:21 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 23 19:24:21 kernel: CR2: 0000000000000014 CR3: 0000001ed2f2a001 CR4: 00000000001606e0 Nov 23 19:24:21 kernel: Call Trace: Nov 23 19:24:21 kernel: assoc_array_subtree_iterate+0x57/0xd0 Nov 23 19:24:21 kernel: keyring_gc+0x3d/0x80 Nov 23 19:24:21 kernel: key_garbage_collector+0x363/0x3d0 Nov 23 19:24:21 kernel: ? __schedule+0x2a7/0x680 Nov 23 19:24:21 kernel: process_one_work+0x19d/0x340 Nov 23 19:24:21 kernel: worker_thread+0x50/0x3b0 Nov 23 19:24:21 kernel: kthread+0xfb/0x130 Nov 23 19:24:21 kernel: ? process_one_work+0x340/0x340 Nov 23 19:24:21 kernel: ? kthread_park+0x80/0x80 Nov 23 19:24:21 kernel: ret_from_fork+0x35/0x40 Nov 23 19:24:21 kernel: Modules linked in: xt_multiport vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) rpcsec_gss_krb5 nfsv4 dns_resolver xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nfsv3 nfs_acl nfs lockd grace fscache cfg80211 rfkill nf_conntrack_net bios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtabl e_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_ conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rp crdma ib_isert iscsi_target_mod vfat ib_iser fat ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 r dma_cm iw_cm ib_cm iw_cxgb3 ib_uverbs ib_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp k vm_intel kvm irqbypass intel_cstate pktcdvd intel_uncore ipmi_ssif Nov 23 19:24:21 kernel: intel_rapl_perf iTCO_wdt iTCO_vendor_support dcdbas mei_me ipmi_si ipmi_devintf mei lpc_ich ipmi_msgh andler acpi_power_meter auth_rpcgss ip_tables mgag200 drm_vram_helper i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul ttm crc 32c_intel drm ghash_clmulni_intel tg3 megaraid_sas wmi dm_multipath sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcx gbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse Nov 23 19:24:21 kernel: CR2: 0000000000000014 Nov 23 19:24:21 kernel: ---[ end trace f0ff9fb68506d008 ]--- Nov 23 19:24:21 kernel: RIP: 0010:keyring_gc_check_iterator+0x2c/0x40 Nov 23 19:24:21 kernel: Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 40 14 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f Nov 23 19:24:21 kernel: RSP: 0018:ffffbb7d47717dd8 EFLAGS: 00010246 Nov 23 19:24:21 kernel: RAX: 0000000000000000 RBX: ffff9e0e363ff450 RCX: ffffbb7d47717e20 Nov 23 19:24:21 kernel: RDX: 0000000000000000 RSI: ffffbb7d47717e20 RDI: ffff9e1d7168cc00 Nov 23 19:24:21 kernel: RBP: ffffbb7d47717e20 R08: ffff9e0e32e1d208 R09: 8080808080808080 Nov 23 19:24:21 kernel: R10: ffff9e1d726790ec R11: 0000000000000018 R12: ffffffff97427f10 Nov 23 19:24:21 kernel: R13: ffff9e0e363ff4d0 R14: ffff9e1d7168cc00 R15: ffff9e0e363ff440 Nov 23 19:24:21 kernel: FS: 0000000000000000(0000) GS:ffff9e1e3fb00000(0000) knlGS:0000000000000000 Nov 23 19:24:21 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 23 19:24:21 kernel: CR2: 0000000000000014 CR3: 0000001ed2f2a001 CR4: 00000000001606e0 We started seeing this on an upgrade from Fedora 29 to Fedora 30, when we switched to the 5.3 kernel. We're currently testing a 5.0.9-301.fc30.x86_64 kernel in the hopes that works as a temporary stop-gap. Time to hit the bug has varied between as little as 25 minutes to over 2 weeks. The systems are multi-user (10-15 users) with NFSv4 home directories. Once the above NULL reference is hit the system starts to degenerate. Many processes go into IO wait, and eventually /usr/lib/systemd/systemd (PID 1) goes into IO Wait as well. A reboot is required to recover (where the "reboot" doesn't actually do anything, and 'echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger' is needed to get things to reset). If I had to bet money on the bug, my bet would be on security/keys/keyring.c keyring_gc(), where the comment above the function says: * Not called with any locks held. The keyring's key struct will not be * deallocated under us as only our caller may deallocate it. I'm betting that is no longer the case, since about the time that key_move() was introduced. I'd be happy to provide any other info as I'm able to... Thanks for looking at this! > The systems are multi-user (10-15 users) with NFSv4 home directories. Yes, we have nfs mounted homes too. Some google search shows https://lkml.org/lkml/2019/9/20/822 also. Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed. kernel-5.4.0-2.fc32.x86_64 tested, still bad. Seeing this on kernel-5.3.15-300.fc31.x86_64 (and earlier), when hammering NFS4 sec=krb5i, sync doing a kernel compile. *********** MASS BUG UPDATE ************** We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 31 kernel bugs. Fedora 31 has now been rebased to 5.5.7-200.fc31. Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32. If you experience different issues, please open a new bug report for those. Latest kernel works fine for me. If anyone has this issue please reopen. |