| Summary: | [abrt] kernel BUG at mm/slub.c:3661! | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | James <james> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 24 | CC: | Colin.Simpson, dylan.combs, gansalmon, ichavero, itamar, james, jonathan, kernel-maint, madhu.chinakonda, mchehab |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| URL: | https://retrace.fedoraproject.org/faf/reports/bthash/e1dbeb0454b2906d8878ba6b7111e3d247ed8c86 | | |
| Whiteboard: | abrt_hash:2386d415bc79a66eaf064bc1f222791fe252ba53;VARIANT_ID=server; | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-13 23:57:14 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |

Description
James
2016-09-18 06:41:31 UTC
Created attachment 1202100 [details]
File: dmesg
I'm seeing this too. The system locks up pretty much within half a minute of logging in to the desktop. I'm guessing it is only seen by anyone using Kerberized NFSv4, so maybe not common.

Oddly I've not seen it on any of my workstations. Only on the Kerberized NFSv4 server...

My desktop is also an NFS server for my homedir, maybe I'm hitting it via that from some remote machine that has a process of mine.

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 24 has now been rebased to 4.7.4-200.fc24. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 25, and are still experiencing this issue, please change the version to Fedora 25.

If you experience different issues, please open a new bug report for those.

Well I hammered it (In reply to Colin Simpson from comment #4)
> My desktop is also an NFS server for my homedir, maybe I'm hitting it via
> that from some remote machine that has a process of mine.

Well I hammered mine last night on kernel-4.7.4-200.fc24.x86_64 by building a kernel over NFSv4, it's still alive this morning. Uptime's only about 13 hours though so I'm not yet ready to declare this fixed from my pov.

(In reply to James from comment #6)
> Well I hammered it (In reply to Colin Simpson from comment #4)
> > My desktop is also an NFS server for my homedir, maybe I'm hitting it via
> > that from some remote machine that has a process of mine.
>
> Well I hammered mine last night on kernel-4.7.4-200.fc24.x86_64 by building
> a kernel over NFSv4, it's still alive this morning. Uptime's only about 13
> hours though so I'm not yet ready to declare this fixed from my pov.

Seems like I spoke too soon, it just froze. Still present in 4.7.4-200.

Yup 4.7.4-200 still freezes up for me, and quickly in my case.

This looks like an upstream bug so I took a look at the Kernel bugzilla. These two reports look very similar to ours, but they haven't been linked together by upstream, so maybe they are distinct from each other. One has a patch...

https://bugzilla.kernel.org/show_bug.cgi?id=150831
https://bugzilla.kernel.org/show_bug.cgi?id=154001

I was hoping the probable release of 4.8 this coming weekend might resolve this, but maybe not, as it doesn't seem like these upstream bugs have been actioned yet.

I'm surprised this bug hasn't had more interest from RH Kernel people, as this is a very Enterprise bug by the look of things i.e. Kerberized NFSv4 server. So it will not be great if this bug hits anywhere near the enterprise products.

(In reply to Colin Simpson from comment #8)
> I'm surprised this bug hasn't had more interest from RH Kernel people, as
> this is a very Enterprise bug by the look of things i.e. Kerberized NFSv4
> server. So will not be great if this bug hits anywhere near the enterprise
> products.

I doubt any enterprise users are on 4.7 in production yet.

Looks like I'm hit by this, as well:

Sep 18 15:38:49 hostname kernel: ------------[ cut here ]------------
Sep 18 15:38:49 hostname kernel: kernel BUG at mm/slub.c:3661!
Sep 18 15:38:49 hostname kernel: invalid opcode: 0000 [#1] SMP
Sep 18 15:38:49 hostname kernel: Modules linked in: vhost_net vhost macvtap macvlan vfio_pci vfio_iommu_type1 vfio_virqfd vfio xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ip
Sep 18 15:38:49 hostname kernel: nuvoton_cir i2c_i801 lpc_ich snd_pcm mei_me rc_core ie31200_edac mei edac_core snd_timer snd shpchp soundcore tpm_tis tpm nfsd auth_rpcgss n
Sep 18 15:38:49 hostname kernel: CPU: 3 PID: 420 Comm: usb-storage Not tainted 4.7.3-100.fc23.x86_64 #1
Sep 18 15:38:49 hostname kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
Sep 18 15:38:49 hostname kernel: task: ffff88080af25b80 ti: ffff8800abd34000 task.ti: ffff8800abd34000
Sep 18 15:38:49 hostname kernel: RIP: 0010:[<ffffffff9e21e23d>] [<ffffffff9e21e23d>] kfree+0x12d/0x170
Sep 18 15:38:49 hostname kernel: RSP: 0018:ffff8800abd37c68 EFLAGS: 00010246
Sep 18 15:38:49 hostname kernel: RAX: ffffea0000c0b5e0 RBX: ffff8800302d7872 RCX: 0000000000009175
Sep 18 15:38:49 hostname kernel: RDX: 0000000000000000 RSI: ffff88082f2db560 RDI: ffffea0000c0b5df
Sep 18 15:38:49 hostname kernel: RBP: ffff8800abd37c80 R08: 000000000001b560 R09: ffffffff9e5bdf54
Sep 18 15:38:49 hostname kernel: R10: ffffea0000c0b5c0 R11: 0000000000000054 R12: ffff880806dc08f0
Sep 18 15:38:49 hostname kernel: R13: ffffffff9e5bed34 R14: ffff880806dc08c8 R15: 0000000000000000
Sep 18 15:38:49 hostname kernel: FS: 0000000000000000(0000) GS:ffff88082f2c0000(0000) knlGS:0000000000000000
Sep 18 15:38:49 hostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 18 15:38:49 hostname kernel: CR2: 00001b31d8895000 CR3: 0000000038c30000 CR4: 00000000001426e0
Sep 18 15:38:49 hostname kernel: Stack:
Sep 18 15:38:49 hostname kernel: ffff880806dc08b8 ffff880806dc08f0 0000000000000000 ffff8800abd37c98
Sep 18 15:38:49 hostname kernel: ffffffff9e5bed34 ffff880806dc08b8 ffff8800abd37cd8 ffffffff9e5bf8a0
Sep 18 15:38:49 hostname kernel: 00000002c0008280 ffff880806dc07d8 ffff880806dc08b8 00000000c0008280
Sep 18 15:38:49 hostname kernel: Call Trace:
Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bed34>] sg_clean+0x44/0x60
Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bf8a0>] usb_sg_wait+0x110/0x160
Sep 18 15:38:49 hostname kernel: [<ffffffffc013b56e>] usb_stor_bulk_transfer_sglist.part.4+0x7e/0xd0 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013b62c>] usb_stor_bulk_srb+0x6c/0x80 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013b7d3>] usb_stor_Bulk_transport+0x193/0x420 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e310c>] ? schedule_timeout+0x1ac/0x270
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e3e7e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
Sep 18 15:38:49 hostname kernel: [<ffffffffc013c0ab>] usb_stor_invoke_transport+0x3b/0x520 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffff9e3f441d>] ? list_del+0xd/0x30
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e12fc>] ? wait_for_completion_interruptible+0x17c/0x1a0
Sep 18 15:38:49 hostname kernel: [<ffffffffc013ad0e>] usb_stor_transparent_scsi_command+0xe/0x10 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013da2f>] usb_stor_control_thread+0x15f/0x260 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013d8d0>] ? storage_probe+0x320/0x320 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013d8d0>] ? storage_probe+0x320/0x320 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0bf268>] kthread+0xd8/0xf0
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e44bf>] ret_from_fork+0x1f/0x40
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0bf190>] ? kthread_worker_fn+0x170/0x170
Sep 18 15:38:49 hostname kernel: Code: 08 49 83 c4 18 48 89 da 4c 89 ee ff d0 49 8b 04 24 48 85 c0 75 e6 e9 fd fe ff ff 49 8b 02 f6 c4 40 75 0a 49 8b 42 20 a8 01 75 02 <0f> 0
Sep 18 15:38:49 hostname kernel: RIP [<ffffffff9e21e23d>] kfree+0x12d/0x170
Sep 18 15:38:49 hostname kernel: RSP <ffff8800abd37c68>
Sep 18 15:38:49 hostname kernel: general protection fault: 0000 [#2] SMP
Sep 18 15:38:49 hostname kernel: Modules linked in: vhost_net vhost macvtap macvlan vfio_pci vfio_iommu_type1 vfio_virqfd vfio xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ip
Sep 18 15:38:49 hostname kernel: nuvoton_cir i2c_i801 lpc_ich snd_pcm mei_me rc_core ie31200_edac mei edac_core snd_timer snd shpchp soundcore tpm_tis tpm nfsd auth_rpcgss n
Sep 18 15:38:49 hostname kernel: CPU: 3 PID: 420 Comm: usb-storage Not tainted 4.7.3-100.fc23.x86_64 #1
Sep 18 15:38:49 hostname kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
Sep 18 15:38:49 hostname kernel: task: ffff88080af25b80 ti: ffff8800abd34000 task.ti: ffff8800abd34000
Sep 18 15:38:49 hostname kernel: RIP: 0010:[<ffffffff9e21d6a0>] [<ffffffff9e21d6a0>] __kmalloc+0xa0/0x240
Sep 18 15:38:49 hostname kernel: RSP: 0018:ffff8800abd375e0 EFLAGS: 00010046
Sep 18 15:38:49 hostname kernel: RAX: 1344dca6e86c59cb RBX: 0000000002088020 RCX: 0000000000000001
Sep 18 15:38:49 hostname kernel: RDX: 000000000000538f RSI: 0000000000000000 RDI: 000000000001b4a0
Sep 18 15:38:49 hostname kernel: RBP: ffff8800abd37618 R08: ffff88082f2db4a0 R09: ffff88080ec03cc0
Sep 18 15:38:49 hostname kernel: R10: 1344dca6e86c59cb R11: 0000000000000000 R12: 0000000002088020
Sep 18 15:38:49 hostname kernel: R13: 0000000000000008 R14: ffffffff9e486432 R15: ffff88080ec03cc0
Sep 18 15:38:49 hostname kernel: FS: 0000000000000000(0000) GS:ffff88082f2c0000(0000) knlGS:0000000000000000
Sep 18 15:38:49 hostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 18 15:38:49 hostname kernel: CR2: 00001b31d8895000 CR3: 0000000038c30000 CR4: 00000000001426e0
Sep 18 15:38:49 hostname kernel: Stack:
Sep 18 15:38:49 hostname kernel: ffff8800abd3774d ffff88012bd37745 0000000000000008 ffff8800abd37698
Sep 18 15:38:49 hostname kernel: 0000000000000000 ffff88080ecb5410 0000000000000001 ffff8800abd37628
Sep 18 15:38:49 hostname kernel: ffffffff9e486432 ffff8800abd37680 ffffffff9e48670c ffffffffc00fbe5c
Sep 18 15:38:49 hostname kernel: Call Trace:
Sep 18 15:38:49 hostname kernel: [<ffffffff9e486432>] kzalloc+0xf/0x11
Sep 18 15:38:49 hostname kernel: [<ffffffff9e48670c>] acpi_ns_internalize_name+0x6c/0xc6
Sep 18 15:38:49 hostname kernel: [<ffffffff9e486a4e>] acpi_ns_get_node+0x84/0x104
Sep 18 15:38:49 hostname kernel: [<ffffffff9e490701>] ? acpi_ut_allocate_object_desc_dbg+0x46/0x73
Sep 18 15:38:49 hostname kernel: [<ffffffff9e49079b>] ? acpi_ut_create_internal_object_dbg+0x23/0x89
Sep 18 15:38:49 hostname kernel: [<ffffffff9e483ffe>] acpi_ns_evaluate+0x51/0x24c
Sep 18 15:38:49 hostname kernel: [<ffffffff9e483ffe>] ? acpi_ns_evaluate+0x51/0x24c
Sep 18 15:38:49 hostname kernel: [<ffffffff9e486e2c>] acpi_evaluate_object+0x147/0x257
Sep 18 15:38:49 hostname kernel: [<ffffffff9e461a1d>] acpi_execute_simple_method+0x50/0x66
Sep 18 15:38:49 hostname kernel: [<ffffffffc00f77af>] acpi_video_device_lcd_set_level+0x2c/0xbc [video]
Sep 18 15:38:49 hostname kernel: [<ffffffffc00f7a60>] acpi_video_set_brightness+0x3c/0x41 [video]
Sep 18 15:38:49 hostname kernel: [<ffffffff9e45456f>] fb_notifier_callback+0xff/0x120
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0c01da>] notifier_call_chain+0x4a/0x70
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0c0517>] __blocking_notifier_call_chain+0x47/0x60
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0c0546>] blocking_notifier_call_chain+0x16/0x20
Sep 18 15:38:49 hostname kernel: [<ffffffff9e45475b>] fb_notifier_call_chain+0x1b/0x20
Sep 18 15:38:49 hostname kernel: [<ffffffff9e44b173>] fbcon_blank+0x213/0x350
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0f8495>] ? console_unlock+0x255/0x580
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0f8fea>] ? vprintk_emit+0x2aa/0x520
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e3e7e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
Sep 18 15:38:49 hostname kernel: [<ffffffff9e10dd45>] ? mod_timer+0x105/0x230
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0f93e9>] ? vprintk_default+0x29/0x40
Sep 18 15:38:49 hostname kernel: [<ffffffff9e4d5583>] do_unblank_screen+0xd3/0x1a0
Sep 18 15:38:49 hostname kernel: [<ffffffff9e4d5660>] unblank_screen+0x10/0x20
Sep 18 15:38:49 hostname kernel: [<ffffffff9e3e5dc5>] bust_spinlocks+0x15/0x30
Sep 18 15:38:49 hostname kernel: [<ffffffff9e029745>] oops_end+0x35/0xd0
Sep 18 15:38:49 hostname kernel: [<ffffffff9e029c7b>] die+0x4b/0x70
Sep 18 15:38:49 hostname kernel: [<ffffffff9e026bd3>] do_trap+0xb3/0x140
Sep 18 15:38:49 hostname kernel: [<ffffffff9e026fb9>] do_error_trap+0x89/0x110
Sep 18 15:38:49 hostname kernel: [<ffffffff9e21e23d>] ? kfree+0x12d/0x170
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7dfb11>] ? __schedule+0x2f1/0x760
Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bed34>] ? sg_clean+0x44/0x60
Sep 18 15:38:49 hostname kernel: [<ffffffff9e027510>] do_invalid_op+0x20/0x30
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e5e7e>] invalid_op+0x1e/0x30
Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bed34>] ? sg_clean+0x44/0x60
Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bdf54>] ? urb_destroy+0x24/0x30
Sep 18 15:38:49 hostname kernel: [<ffffffff9e21e23d>] ? kfree+0x12d/0x170
Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bdf54>] ? urb_destroy+0x24/0x30
Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bed34>] sg_clean+0x44/0x60
Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bf8a0>] usb_sg_wait+0x110/0x160
Sep 18 15:38:49 hostname kernel: [<ffffffffc013b56e>] usb_stor_bulk_transfer_sglist.part.4+0x7e/0xd0 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013b62c>] usb_stor_bulk_srb+0x6c/0x80 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013b7d3>] usb_stor_Bulk_transport+0x193/0x420 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e310c>] ? schedule_timeout+0x1ac/0x270
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e3e7e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
Sep 18 15:38:49 hostname kernel: [<ffffffffc013c0ab>] usb_stor_invoke_transport+0x3b/0x520 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffff9e3f441d>] ? list_del+0xd/0x30
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e12fc>] ? wait_for_completion_interruptible+0x17c/0x1a0
Sep 18 15:38:49 hostname kernel: [<ffffffffc013ad0e>] usb_stor_transparent_scsi_command+0xe/0x10 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013da2f>] usb_stor_control_thread+0x15f/0x260 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013d8d0>] ? storage_probe+0x320/0x320 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffffc013d8d0>] ? storage_probe+0x320/0x320 [usb_storage]
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0bf268>] kthread+0xd8/0xf0
Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e44bf>] ret_from_fork+0x1f/0x40
Sep 18 15:38:49 hostname kernel: [<ffffffff9e0bf190>] ? kthread_worker_fn+0x170/0x170
Sep 18 15:38:49 hostname kernel: Code: 49 83 78 10 00 4d 8b 10 0f 84 2e 01 00 00 4d 85 d2 0f 84 25 01 00 00 49 63 41 20 49 8b 39 4c 01 d0 40 f6 c7 0f 0f 85 91 01 00 00 <48> 8
Sep 18 15:38:49 hostname kernel: RIP [<ffffffff9e21d6a0>] __kmalloc+0xa0/0x240
Sep 18 15:38:49 hostname kernel: RSP <ffff8800abd375e0>
Sep 18 15:38:49 hostname kernel: ---[ end trace 911820535e11c961 ]---

The system is used as a virtualization platform; it crashes reliably when a guest domain using USB devices from the host (be they received via IOMMU-based PCI passthrough of the USB hub or standard KVM/QEMU host USB device passthrough) is started or stopped. Usually the system crashes immediately when such a guest domain is stopped; rarely, it crashes when such a guest domain boots. Only once did the system survive three guest domain shutdown/power-on cycles without crashing.

I'm happy to provide more details. Kernel 4.6 worked without this issue.

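For anyone comparing their own crash against the trace above, here is a minimal Python sketch that pulls the stack frames out of journal-style oops text, whether it is flattened onto one line or not. The names `FRAME`, `call_traces` and `SAMPLE` are hypothetical helpers, not part of any tool; the sample is a short excerpt of the first trace in this report. It assumes the 4.x frame format `[<address>] symbol+0xoffset/0xlength`, where a leading `?` marks an address found on the stack that the unwinder could not confirm as part of the call chain.

```python
import re

# One stack frame as printed by 4.x kernels: "[<address>] symbol+0xoffset/0xlength".
# A "?" before the symbol marks an address the unwinder could not confirm.
FRAME = re.compile(
    r"\[<[0-9a-f]{16}>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_traces(log):
    """Return one list of (symbol, is_unreliable) tuples per 'Call Trace:' section."""
    traces = []
    for chunk in log.split("Call Trace:")[1:]:
        # Frames end where the "Code:" byte dump starts (if it is present).
        body = chunk.split("Code:", 1)[0]
        traces.append([(sym, bool(mark)) for mark, sym in FRAME.findall(body)])
    return traces


# Excerpt of the first oops in this report, flattened the way it was pasted.
SAMPLE = (
    "Sep 18 15:38:49 hostname kernel: Call Trace: "
    "Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bed34>] sg_clean+0x44/0x60 "
    "Sep 18 15:38:49 hostname kernel: [<ffffffff9e5bf8a0>] usb_sg_wait+0x110/0x160 "
    "Sep 18 15:38:49 hostname kernel: [<ffffffff9e7e310c>] ? schedule_timeout+0x1ac/0x270 "
    "Sep 18 15:38:49 hostname kernel: [<ffffffff9e0bf268>] kthread+0xd8/0xf0 "
    "Sep 18 15:38:49 hostname kernel: Code: 08 49 83 c4"
)

if __name__ == "__main__":
    for symbol, unreliable in call_traces(SAMPLE)[0]:
        print(("? " if unreliable else "  ") + symbol)
```

Reading only the frames without a `?` is usually the quickest way to tell whether two reports share a code path; in the excerpt, `schedule_timeout` is the only frame flagged as unconfirmed.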
Lots of folks seem to be running into this bug during the sunrpc do_cache_clean process: https://lkml.org/lkml/2016/7/26/499

Googling reveals lots of similar issues.

I have been running with 4.7.5-200 for a few days now and it hasn't repeated this issue for me. Anyone else having success with this version?

(In reply to Colin Simpson from comment #12)
> I have been running with 4.7.5-200 for a few days now and it hasn't
> repeated this issue for me. Anyone else having success with this version?

Been OK for over 2 days now with 4.7.5.

Unfortunately, this is still happening to me all the way through 4.7.8. Any news from anyone else? I didn't see any commits that led me to believe the issue had been addressed, so I'm afraid it's probably still a matter of time for others. It appears the issue occurs most frequently with kmalloc operations, for what it's worth.

Well, I apologize for not getting back to this sooner (I thought I had closed it a while back), but this problem was seemingly resolved for me with kernel version 4.8.

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 25 has now been rebased to 4.10.9-100.fc24. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.

Not seen this in a while, closing.