Created attachment 1063176 [details]
Kernel log with a bunch of oopses
Description of problem:
I can't keep the machine stable for more than a few hours. It generally locks up when idle, but sometimes it locks up solid while I'm using it, possibly two or three times a day. I've managed to capture some oopses after reboot from the systemd log that may or may not be related to the lockup; I'll attach the full dmesg.
Version-Release number of selected component (if applicable):
It's running gnome3 (wayland). The machine was installed via korora but it's basically a standard fc22 package set (kernel, wayland, ...)
Was this machine freshly installed with 4.1.4, or was it running fine with an older kernel version and started misbehaving with 4.1.4? I've seen gen3 X1s here at Flock running just fine with recent kernels. The random nature of the failures coupled with the fact that I've seen identical hardware running without that kind of problem makes me wonder if the hardware you have is completely stable.
Good question.... I installed with 4.1.3 but it almost immediately updated itself to 4.1.4. I will try booting in 4.1.3 again and see if any of this happens. Otherwise, any kernel you have on these machines that seems to work ? Jerome & Stephane also suggested it could be related to the Mesa version. If you have them, give me the kernel & mesa/drm versions from a machine that hasn't reported any issue and I'll try the same versions here and report. If they work I might be able to try a bisection myself and report the results (depending on time, this is my main work laptop now).
So I rebooted in 4.1.3, and started some package downloads. It was large and slow (corporate UDP based VPN, so *very* slow :-) though no kernel module involved at least), and when I came back after dinner, it was dead. Screen was blank, which is expected, but it wouldn't come up. There's no telling whether it was done downloading when it died or not. After a reboot, I dug the following oops with journalctl -k -b-1, it could be unrelated or it could be the same thing, I'll go back to 4.1.4 see if I can observe something similar.
Aug 15 18:47:38 pasglop kernel: ------------[ cut here ]------------
Aug 15 18:47:38 pasglop kernel: WARNING: CPU: 1 PID: 501 at drivers/net/wireless/iwlwifi/mvm/tx.c:1000 iwl_mvm_rx_ba_notif+0x3f1/0x
Aug 15 18:47:38 pasglop kernel: Modules linked in: tun rfcomm fuse hidp ccm cmac nf_conntrack_netbios_ns nf_conntrack_broadcast ip6
Aug 15 18:47:38 pasglop kernel: uvcvideo joydev iwlwifi snd_pcm videobuf2_vmalloc videobuf2_core videobuf2_memops v4l2_common vide
Aug 15 18:47:38 pasglop kernel: CPU: 1 PID: 501 Comm: irq/54-iwlwifi Not tainted 4.1.3-200.fc22.x86_64 #1
Aug 15 18:47:38 pasglop kernel: Hardware name: LENOVO 20BTCTO1WW/20BTCTO1WW, BIOS N14ET31W (1.09 ) 06/26/2015
Aug 15 18:47:38 pasglop kernel: 0000000000000000 000000000dca5ea0 ffff8800b92afba8 ffffffff8179b4cd
Aug 15 18:47:38 pasglop kernel: 0000000000000000 0000000000000000 ffff8800b92afbe8 ffffffff810a163a
Aug 15 18:47:38 pasglop kernel: 0000000000000015 ffff880094c47100 ffff8802234893a8 ffff8800b92afc68
Aug 15 18:47:38 pasglop kernel: Call Trace:
Aug 15 18:47:38 pasglop kernel: [<ffffffff8179b4cd>] dump_stack+0x45/0x57
Aug 15 18:47:38 pasglop kernel: [<ffffffff810a163a>] warn_slowpath_common+0x8a/0xc0
Aug 15 18:47:38 pasglop kernel: [<ffffffff810a176a>] warn_slowpath_null+0x1a/0x20
Aug 15 18:47:38 pasglop kernel: [<ffffffffa06d82d1>] iwl_mvm_rx_ba_notif+0x3f1/0x5d0 [iwlmvm]
Aug 15 18:47:38 pasglop kernel: [<ffffffff810b99b5>] ? __queue_work+0x275/0x370
Aug 15 18:47:38 pasglop kernel: [<ffffffffa06d0ef4>] iwl_mvm_rx_dispatch+0x184/0x250 [iwlmvm]
Aug 15 18:47:38 pasglop kernel: [<ffffffffa0540c48>] iwl_pcie_irq_handler+0xa68/0xf20 [iwlwifi]
Aug 15 18:47:38 pasglop kernel: [<ffffffff810de806>] ? pick_next_task_fair+0x186/0x980
Aug 15 18:47:38 pasglop kernel: [<ffffffff810fae80>] ? irq_finalize_oneshot.part.29+0xf0/0xf0
Aug 15 18:47:38 pasglop kernel: [<ffffffff810faea0>] irq_thread_fn+0x20/0x50
Aug 15 18:47:38 pasglop kernel: [<ffffffff810fb15f>] irq_thread+0x13f/0x170
Aug 15 18:47:38 pasglop kernel: [<ffffffff810faf70>] ? wake_threads_waitq+0x30/0x30
Aug 15 18:47:38 pasglop kernel: [<ffffffff810fb020>] ? irq_thread_dtor+0xb0/0xb0
Aug 15 18:47:38 pasglop kernel: [<ffffffff810c0b88>] kthread+0xd8/0xf0
Aug 15 18:47:38 pasglop kernel: [<ffffffff810c0ab0>] ? kthread_worker_fn+0x180/0x180
Aug 15 18:47:38 pasglop kernel: [<ffffffff817a1e62>] ret_from_fork+0x42/0x70
Aug 15 18:47:38 pasglop kernel: [<ffffffff810c0ab0>] ? kthread_worker_fn+0x180/0x180
Aug 15 18:47:38 pasglop kernel: ---[ end trace 4b9a001261fde05d ]---
There are also some dubious xhci originated messages but at this point I doubt they are related. I'll attach the entire dmesg. I'll then go back to 4.1.4 see if I can snatch another oops in case we start to see a pattern around iwlwifi.
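For anyone following along, pulling these reports out of a long `journalctl -k -b -1` dump by hand gets tedious. The following is a minimal sketch, not anything used in this bug: it assumes a report starts at either a "cut here" line or a "BUG: unable to handle kernel" line and ends at the "end trace" line. The function name `extract_reports` and the shortened sample lines are made up for illustration.

```python
import re

# Assumed start/end markers, based on the reports pasted in this bug.
START = re.compile(r"\[ cut here \]|BUG: unable to handle kernel")
END = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def extract_reports(lines):
    """Group kernel log lines into reports (start marker .. end-trace marker)."""
    reports = []
    current = None  # lines of the report being collected, or None
    for line in lines:
        line = line.rstrip("\n")
        if current is None:
            if START.search(line):
                current = [line]
        else:
            current.append(line)
            if END.search(line):
                reports.append(current)
                current = None
    if current:
        # The log ended before an end-trace line: keep the partial report.
        reports.append(current)
    return reports


# Shortened sample in the same shape as the log above (hypothetical excerpt).
SAMPLE = [
    "Aug 15 18:47:37 pasglop kernel: usb 1-6: reset full-speed USB device",
    "Aug 15 18:47:38 pasglop kernel: ------------[ cut here ]------------",
    "Aug 15 18:47:38 pasglop kernel: WARNING: CPU: 1 PID: 501 at drivers/net/wireless/iwlwifi/mvm/tx.c:1000",
    "Aug 15 18:47:38 pasglop kernel: ---[ end trace 4b9a001261fde05d ]---",
    "Aug 15 18:47:39 pasglop kernel: some unrelated message",
]

for number, report in enumerate(extract_reports(SAMPLE), 1):
    print(f"report {number}: {len(report)} lines, starts with: {report[0]}")
```

Since the function takes any iterable of lines, passing it `sys.stdin` and piping the journalctl output in should work the same way.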
Created attachment 1063237 [details] Kernel log of 4.1.3 with oops
Note that this is only a WARN_ON, but it could be an indication of something going wrong...
So back to 4.1.4, same deal: slow download from that VPN, and same symptom of dead machine with blank screen. This time a different oops:
Aug 15 20:44:09 pasglop kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Aug 15 20:44:09 pasglop kernel: IP: [<ffffffff81640c7e>] hidinput_disconnect+0x2e/0xb0
Aug 15 20:44:09 pasglop kernel: PGD 0
Aug 15 20:44:09 pasglop kernel: Oops: 0000 [#1] SMP
Aug 15 20:44:09 pasglop kernel: Modules linked in: tun hidp rfcomm fuse ccm cmac nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_RE
Aug 15 20:44:09 pasglop kernel: btusb snd_seq videobuf2_vmalloc videobuf2_core snd_seq_device videobuf2_memops btbcm v4l2_common snd_pcm joydev vide
Aug 15 20:44:09 pasglop kernel: CPU: 3 PID: 2205 Comm: khidpd_17ef6002 Not tainted 4.1.4-200.fc22.x86_64 #1
Aug 15 20:44:09 pasglop kernel: Hardware name: LENOVO 20BTCTO1WW/20BTCTO1WW, BIOS N14ET31W (1.09 ) 06/26/2015
Aug 15 20:44:09 pasglop kernel: task: ffff8801fd1ebb40 ti: ffff8801fe774000 task.ti: ffff8801fe774000
Aug 15 20:44:09 pasglop kernel: RIP: 0010:[<ffffffff81640c7e>] [<ffffffff81640c7e>] hidinput_disconnect+0x2e/0xb0
Aug 15 20:44:09 pasglop kernel: RSP: 0018:ffff8801fe777c08 EFLAGS: 00010296
Aug 15 20:44:09 pasglop kernel: RAX: 0000000000000000 RBX: ffff8801fc89e000 RCX: 000000018080007c
Aug 15 20:44:09 pasglop kernel: RDX: 000000018080007d RSI: ffffea000270e3c0 RDI: 0000000040000000
Aug 15 20:44:09 pasglop kernel: RBP: ffff8801fe777c28 R08: 000000009c38f901 R09: 000000018080007c
Aug 15 20:44:09 pasglop kernel: R10: ffff88009c38f9e0 R11: 0000000000000000 R12: ffff8801fc89f8e8
Aug 15 20:44:09 pasglop kernel: R13: ffff8801fc89e000 R14: ffff8801fc89e000 R15: ffff8801fc89f8d0
Aug 15 20:44:09 pasglop kernel: FS: 0000000000000000(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000
Aug 15 20:44:09 pasglop kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 15 20:44:09 pasglop kernel: CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000003407e0
Aug 15 20:44:09 pasglop kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 15 20:44:09 pasglop kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 15 20:44:09 pasglop kernel: Stack:
Aug 15 20:44:09 pasglop kernel: ffff8801fc89e000 ffff8801fc89f8e8 ffff8801fc89e000 ffff8801fc89f8b8
Aug 15 20:44:09 pasglop kernel: ffff8801fe777c48 ffffffff8163d970 00000000fffffffc ffff8801fc89f8e8
Aug 15 20:44:09 pasglop kernel: ffff8801fe777c88 ffffffff8163da45 ffff8801fe777c88 ffff8801fc89f8e8
Aug 15 20:44:09 pasglop kernel: Call Trace:
Aug 15 20:44:09 pasglop kernel: [<ffffffff8163d970>] hid_disconnect+0x80/0x90
Aug 15 20:44:09 pasglop kernel: [<ffffffff8163da45>] hid_device_remove+0xc5/0xe0
Aug 15 20:44:09 pasglop kernel: [<ffffffff814ec107>] __device_release_driver+0x87/0x120
Aug 15 20:44:09 pasglop kernel: [<ffffffff814ec1c3>] device_release_driver+0x23/0x30
Aug 15 20:44:09 pasglop kernel: [<ffffffff814eba38>] bus_remove_device+0x108/0x180
Aug 15 20:44:09 pasglop kernel: [<ffffffff814e7bc1>] device_del+0x141/0x270
Aug 15 20:44:09 pasglop kernel: [<ffffffff8163dae7>] hid_destroy_device+0x27/0x60
Aug 15 20:44:09 pasglop kernel: [<ffffffffa08e0b0b>] hidp_session_remove+0x4b/0xb0 [hidp]
Aug 15 20:44:09 pasglop kernel: [<ffffffffa045af1b>] l2cap_unregister_user+0x5b/0x70 [bluetooth]
Aug 15 20:44:09 pasglop kernel: [<ffffffffa08e04b0>] hidp_session_thread+0x560/0xaf0 [hidp]
Aug 15 20:44:09 pasglop kernel: [<ffffffff810ce710>] ? wake_up_state+0x20/0x20
Aug 15 20:44:09 pasglop kernel: [<ffffffff810ce710>] ? wake_up_state+0x20/0x20
Aug 15 20:44:09 pasglop kernel: [<ffffffffa08dff50>] ? hidp_open+0x10/0x10 [hidp]
Aug 15 20:44:09 pasglop kernel: [<ffffffff810c0ba8>] kthread+0xd8/0xf0
Aug 15 20:44:09 pasglop kernel: [<ffffffff810c0ad0>] ? kthread_worker_fn+0x180/0x180
Aug 15 20:44:09 pasglop kernel: [<ffffffff817a20a2>] ret_from_fork+0x42/0x70
Aug 15 20:44:09 pasglop kernel: [<ffffffff810c0ad0>] ? kthread_worker_fn+0x180/0x180
Aug 15 20:44:09 pasglop kernel: Code: 00 00 55 48 89 e5 41 56 49 89 fe 41 55 41 54 53 48 8b bf b0 1b 00 00 48 85 ff 74 31 e8 3c 1a fb ff 49 8b 86 b0
Aug 15 20:44:09 pasglop kernel: RIP [<ffffffff81640c7e>] hidinput_disconnect+0x2e/0xb0
Aug 15 20:44:09 pasglop kernel: RSP <ffff8801fe777c08>
Aug 15 20:44:09 pasglop kernel: CR2: 0000000000000000
Aug 15 20:44:09 pasglop kernel: ---[ end trace b154a11c8cbf6493 ]---
The wide variety of the oopses makes me feel like we may have a case of memory corruption caused by one of the drivers, though it's hard to tell which one.
And another one just as I disconnected my BT mouse and rebooted. I'll run with BT off for a while to see if it could be related.
Aug 15 21:12:56 pasglop kernel: usb 1-6: reset full-speed USB device number 2 using xhci_hcd
Aug 15 21:13:39 pasglop kernel: usb 1-7: USB disconnect, device number 3
Aug 15 21:13:39 pasglop kernel: Bluetooth: hci0 sending frame failed (-19)
Aug 15 21:13:39 pasglop kernel: Bluetooth: hci0: turning off Intel device LED failed (-19)
Aug 15 21:13:49 pasglop kernel: BUG: unable to handle kernel paging request at 000000310101046c
Aug 15 21:13:50 pasglop kernel: IP: [<ffffffff8120f034>] __kmalloc_node_track_caller+0x1c4/0x320
Aug 15 21:13:50 pasglop kernel: PGD 0
Aug 15 21:13:50 pasglop kernel: Oops: 0000 [#1] SMP
Aug 15 21:13:50 pasglop kernel: Modules linked in: hidp rfcomm fuse ccm cmac nf_conntrack_netbios_ns nf_conntrack_broadcast
Aug 15 21:13:50 pasglop kernel: videobuf2_memops v4l2_common snd_hda_core cfg80211 snd_hwdep videodev snd_seq snd_seq_devi
Aug 15 21:13:50 pasglop kernel: CPU: 0 PID: 640 Comm: dbus-daemon Not tainted 4.1.4-200.fc22.x86_64 #1
Aug 15 21:13:50 pasglop kernel: Hardware name: LENOVO 20BTCTO1WW/20BTCTO1WW, BIOS N14ET31W (1.09 ) 06/26/2015
Aug 15 21:13:50 pasglop kernel: task: ffff880221e1eca0 ti: ffff8800b6528000 task.ti: ffff8800b6528000
Aug 15 21:13:50 pasglop kernel: RIP: 0010:[<ffffffff8120f034>] [<ffffffff8120f034>] __kmalloc_node_track_caller+0x1c4/0x32
Aug 15 21:13:50 pasglop kernel: RSP: 0018:ffff8800b652ba28 EFLAGS: 00010246
Aug 15 21:13:50 pasglop kernel: RAX: 000000310101046c RBX: 00000000000106d0 RCX: ffffffff8166e567
Aug 15 21:13:50 pasglop kernel: RDX: 0000000000003fdc RSI: 0000000000000000 RDI: 000000000001ac00
Aug 15 21:13:50 pasglop kernel: RBP: ffff8800b652ba88 R08: ffff88022dc1ac00 R09: ffff880225403500
Aug 15 21:13:50 pasglop kernel: R10: ffff8800ad377300 R11: 000000310101046c R12: 00000000000106d0
Aug 15 21:13:50 pasglop kernel: R13: 0000000000000240 R14: 00000000ffffffff R15: ffff880225403500
Aug 15 21:13:50 pasglop kernel: FS: 00007fb373ad4880(0000) GS:ffff88022dc00000(0000) knlGS:0000000000000000
Aug 15 21:13:50 pasglop kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 15 21:13:50 pasglop kernel: CR2: 000000310101046c CR3: 00000000b8882000 CR4: 00000000003407f0
Aug 15 21:13:50 pasglop kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 15 21:13:50 pasglop kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 15 21:13:50 pasglop kernel: Stack:
Aug 15 21:13:50 pasglop kernel: ffff880000000000 ffff88022dc57800 ffff8800b652ba58 ffff88022dc57800
Aug 15 21:13:50 pasglop kernel: ffff8800b97aeca0 ffffffff8166e567 0000000000000046 ffff8800ad377300
Aug 15 21:13:50 pasglop kernel: ffff8800b652baff 00000000000004d0 0000000000000240 00000000ffffffff
Aug 15 21:13:50 pasglop kernel: Call Trace:
Aug 15 21:13:50 pasglop kernel: [<ffffffff8166e567>] ? __alloc_skb+0x87/0x210
Aug 15 21:13:50 pasglop kernel: [<ffffffff8166d551>] __kmalloc_reserve.isra.28+0x31/0x90
Aug 15 21:13:50 pasglop kernel: [<ffffffff8166e53b>] ? __alloc_skb+0x5b/0x210
Aug 15 21:13:50 pasglop kernel: [<ffffffff8166e567>] __alloc_skb+0x87/0x210
Aug 15 21:13:50 pasglop kernel: [<ffffffff8166e907>] alloc_skb_with_frags+0x57/0x1f0
Aug 15 21:13:50 pasglop kernel: [<ffffffff8166949e>] sock_alloc_send_pskb+0x1fe/0x280
Aug 15 21:13:50 pasglop kernel: [<ffffffff810e49b4>] ? __wake_up_sync_key+0x54/0x70
Aug 15 21:13:50 pasglop kernel: [<ffffffff817398fd>] unix_stream_sendmsg+0x29d/0x420
Aug 15 21:13:50 pasglop kernel: [<ffffffff8166447d>] sock_sendmsg+0x3d/0x50
Aug 15 21:13:50 pasglop kernel: [<ffffffff81664e93>] ___sys_sendmsg+0x2b3/0x2c0
Aug 15 21:13:50 pasglop kernel: [<ffffffff8120b786>] ? kmem_cache_free+0x1f6/0x220
Aug 15 21:13:50 pasglop kernel: [<ffffffff81663423>] ? sock_destroy_inode+0x33/0x40
Aug 15 21:13:50 pasglop kernel: [<ffffffff8120b786>] ? kmem_cache_free+0x1f6/0x220
Aug 15 21:13:50 pasglop kernel: [<ffffffff8124310f>] ? dentry_free+0x5f/0xb0
Aug 15 21:13:50 pasglop kernel: [<ffffffff81243b31>] ? __dentry_kill+0x151/0x1f0
Aug 15 21:13:50 pasglop kernel: [<ffffffff81665997>] __sys_sendmsg+0x57/0xa0
Aug 15 21:13:50 pasglop kernel: [<ffffffff816659f2>] SyS_sendmsg+0x12/0x20
Aug 15 21:13:50 pasglop kernel: [<ffffffff817a1cae>] system_call_fastpath+0x12/0x71
Aug 15 21:13:50 pasglop kernel: Code: 83 c4 38 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 63 41 20 49 8
Aug 15 21:13:50 pasglop kernel: RIP [<ffffffff8120f034>] __kmalloc_node_track_caller+0x1c4/0x320
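To compare where each report faulted (which is what suggests memory corruption when the sites are unrelated), something like the sketch below could tabulate the warning and oops sites from a kernel log. It is only an illustration: the function name `fault_sites` and the two regular expressions are assumptions fitted to the "WARNING: ... at file:line symbol+0x..." and "IP: [<addr>] symbol+0x..." lines seen in this bug, and the sample holds just the relevant lines copied from the reports above.

```python
import re

# Patterns fitted to the lines pasted in this bug (assumed, not exhaustive).
WARN = re.compile(r"kernel: WARNING: .* at \S+ ([\w.]+)\+0x")
OOPS_IP = re.compile(r"kernel: IP: \[<[0-9a-f]+>\] ([\w.]+)\+0x")


def fault_sites(lines):
    """Return (kind, symbol) pairs for every WARNING or oops IP line."""
    sites = []
    for line in lines:
        match = WARN.search(line)
        if match:
            sites.append(("WARNING", match.group(1)))
            continue
        match = OOPS_IP.search(line)
        if match:
            sites.append(("Oops", match.group(1)))
    return sites


# Only the relevant lines from the three reports in this bug.
SAMPLE = [
    "Aug 15 18:47:38 pasglop kernel: WARNING: CPU: 1 PID: 501 at drivers/net/wireless/iwlwifi/mvm/tx.c:1000 iwl_mvm_rx_ba_notif+0x3f1/0x",
    "Aug 15 20:44:09 pasglop kernel: IP: [<ffffffff81640c7e>] hidinput_disconnect+0x2e/0xb0",
    "Aug 15 21:13:50 pasglop kernel: IP: [<ffffffff8120f034>] __kmalloc_node_track_caller+0x1c4/0x320",
]

for kind, symbol in fault_sites(SAMPLE):
    print(f"{kind:8} {symbol}")
```

Here the three sites land in iwlmvm, the HID input code, and the slab allocator, which matches the "wide variety" observation; it says nothing by itself about which driver did the corrupting.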
BTW. Regarding HW stability, the machine is completely stable under Ubuntu 15.04, which I used for a few days before I decided to try out Fedora due to generally more up to date packages. It also seems stable under Windows 10 but I haven't spent *that* much time with it.
Haven't had a problem since I disabled BT; let's see if that holds, but this thread makes me think BT could be the culprit: https://bbs.archlinux.org/viewtopic.php?id=200691
http://www.spinics.net/lists/linux-mm/msg92700.html That seems to be the same problem. I'll try to build a fedora kernel with that patch applied in the next couple of days and see if that helps as well.
There is a potentially similar issue in https://bugzilla.redhat.com/show_bug.cgi?id=1248741 which is fixed in the recent 4.1.5 kernel package: https://admin.fedoraproject.org/updates/kernel-4.1.5-200.fc22 Might be worth a try but I haven't checked to see if the patch from that thread is actually in that update.
Ok, the fix for the HID input problem isn't in the upstream stable 4.1.5 as far as I can tell, but maybe Fedora included it (how do you check short of downloading the package source?). I'll give it a go later today or tomorrow.
From the changelog, it looks like Laura applied it. Note that I've been up since yesterday with 4.1.4 without the BT mouse, so BT does look like the culprit.
You can check the package git tree and see what patches are in there. Of course, this is the kernel so there are a number of patches, but not really that many. http://pkgs.fedoraproject.org/cgit/kernel.git
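Once the spec file from that tree is at hand, the patch list can be searched without unpacking anything. A rough sketch, assuming the spec declares patches with standard RPM `PatchN:` tags; the function name `find_patches`, the spec excerpt, and both patch file names below are hypothetical and not taken from the real Fedora kernel.spec.

```python
import re

# Matches RPM spec lines such as "Patch101: some-fix.patch".
PATCH_TAG = re.compile(r"^Patch\d*:\s*(\S+)", re.MULTILINE)


def find_patches(spec_text, keyword):
    """Return patch file names from a spec whose name contains keyword."""
    keyword = keyword.lower()
    return [name for name in PATCH_TAG.findall(spec_text)
            if keyword in name.lower()]


# Hypothetical spec excerpt; the patch names are invented for illustration.
SPEC = """\
Name: kernel
Patch100: example-unrelated-fix.patch
Patch101: example-HID-hid-input-disconnect-fix.patch
"""

print(find_patches(SPEC, "hid"))
```

A match only shows that a patch with a similar name is listed; reading the patch itself or the commit in the git tree is still needed to confirm it is the right fix.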
It's in there, Laura added it: http://pkgs.fedoraproject.org/cgit/kernel.git/commit/?h=f22&id=9d542d8c1db8cca1613de49eb040b8b474e76d01 I've also been running it since this morning with the BT mouse connected and so far so good ... give it til tomorrow and we can close the bug as fixed.
*** This bug has been marked as a duplicate of bug 1248741 ***