Created attachment 958756 [details]
crash log from journalctl -b1

Description of problem:
Frequent crashes since kernel-3.17

Version-Release number of selected component (if applicable):
kernel-3.17.2-200.fc20.i686

How reproducible:
random

Steps to Reproduce:
1. Use Fedora with kernel-3.17.2 until it crashes

Actual results:
Kernel BUG crash

Expected results:
years of uptime

Additional info:
I didn't report immediately because it *could* have been one of those 3700 memory errors per DIMM * year - but it keeps happening.
Created attachment 958760 [details] Another crash log from the same day. I may have to go back to kernel-3.16.
With kernel-3.17, I get about 3 crashes a day. I reverted to 3.16.7, and I've been up 2 days so far with no problem. So it is definitely the kernel. The last time 3.17 crashed/locked up, I was able to log in via SSH. However, the Xorg process that got the fault could not be killed - even with kill -9. The system was also unable to shut down.
Severity should be "high". I don't see how to change it.
Can you attach a full dmesg?
I noticed a new update for both kernel-3.17.3, and xorg-x11-drv-intel-2.21.15-9.fc20.i686. I updated them both, and it has been 10 hours with no crash. I suspect it is fixed, but I'll wait a few more days before closing the issue.
Created attachment 960831 [details] dmesg output. Not fixed. :-( I logged in remotely and got the output of dmesg - which looks the same as the journalctl output.
Created attachment 960832 [details] journalctl -b extract journalctl
Still broken in 3.17.4
Still broken in 3.17.7.
Reproduced after every suspend/resume; random Xorg freeze, but the system is accessible over ssh.

Fedora release 21 (Twenty One)
kernel 3.17.7-300.fc21.x86_64
BOOT_IMAGE=/boot/vmlinuz-3.17.7-300.fc21.x86_64 root=UUID=025db430-1171-47df-aa1e-448b72436eee ro vconsole.font=latarcyrheb-sun16 rhgb quiet LANG=ru_RU.UTF-8 selinux=0 resume=PARTUUID=b67ac7ae-04 zswap.enabled=1

kernel BUG at mm/memcontrol.c:6742!
invalid opcode: 0000 [#2] SMP
Modules linked in: xt_TCPMSS ccm hidp rfcomm zram ppdev parport_pc parport fuse uvcvideo bnep vmw_vsock_vmci_transport vsock videobuf2_vmalloc xt_multiport videobuf2_memops xt_conntrack nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack arc4 videobuf2_core vmw_vmci iwl3945 v4l2_common iTCO_wdt videodev coretemp iTCO_vendor_support btusb iwlegacy kvm_intel bluetooth media kvm mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel samsung_laptop snd_hda_controller joydev serio_raw snd_hda_codec sdhci_pci cfg80211 i2c_i801 sdhci snd_hwdep mmc_core snd_seq r592 memstick snd_seq_device lpc_ich rfkill snd_pcm mfd_core snd_timer wmi snd soundcore shpchp acpi_cpufreq binfmt_misc i915 firewire_ohci i2c_algo_bit drm_kms_helper firewire_core yenta_socket ata_generic drm pata_acpi sky2 crc_itu_t video [last unloaded: nf_nat]
CPU: 0 PID: 27136 Comm: swapoff Tainted: G D 3.17.7-300.fc21.x86_64 #1
Hardware name: SAMSUNG ELECTRONICS CO., LTD. SQ45S70S/SQ45S70S, BIOS 18ST 02/04/2010
task: ffff8800bb5c75c0 ti: ffff880095634000 task.ti: ffff880095634000
RIP: 0010:[<ffffffff81203985>] [<ffffffff81203985>] mem_cgroup_migrate+0x1e5/0x250
RSP: 0018:ffff880095637d80 EFLAGS: 00010246
RAX: ffff88013a189e90 RBX: ffffea0002366200 RCX: 0000000000000041
RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000001109e90
RBP: ffff880095637db0 R08: 0000000000000086 R09: 0000000000000541
R10: ffff880095637d80 R11: 0000000000000541 R12: ffffea0004427a40
R13: ffffffff81c717f0 R14: ffff880095637e20 R15: ffffea0002366200
FS: 00007f46c3ddb840(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000f7704000 CR3: 000000012cdbb000 CR4: 00000000000007f0
Stack:
 ffff880095e7aaf0 00000000550f9894 ffff880095637db0 00000000550f9894
 ffffea0004427a40 0000000000000000 ffff880095637df8 ffffffff811ace31
 000a00d4813882a3 ffffea0002366202 ffff8800366bc2d0 ffffea0004427a40
Call Trace:
 [<ffffffff811ace31>] shmem_replace_page+0x131/0x1f0
 [<ffffffff811af6d7>] shmem_unuse+0x257/0x320
 [<ffffffff811d9204>] try_to_unuse+0x4f4/0x5c0
 [<ffffffff811d95cf>] SyS_swapoff+0x1ef/0x5c0
 [<ffffffff81135d96>] ? __audit_syscall_exit+0x1f6/0x2a0
 [<ffffffff817473a9>] system_call_fastpath+0x16/0x1b
Code: e8 11 53 f9 ff 0f 0b 0f 1f 80 00 00 00 00 48 c7 c6 20 44 a5 81 48 89 df e8 f9 52 f9 ff 0f 0b 48 c7 c6 88 43 a5 81 e8 eb 52 f9 ff <0f> 0b 48 8d 75 e4 4c 89 e7 89 55 d4 48 89 45 d8 e8 f6 bb ff ff
RIP [<ffffffff81203985>] mem_cgroup_migrate+0x1e5/0x250

NAME=Fedora
VERSION="21 (Twenty One)"
ID=fedora
ID_LIKE=fedora
VERSION_ID=21
PRETTY_NAME="RFRemix 21 (Twenty One)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:21"
HOME_URL="https://fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=21
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=21
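For anyone comparing traces across the duplicates, here is a minimal Python sketch (not part of the original report) that pulls the BUG location and the call-trace frames out of an oops like the one above. The function name `summarize_oops` and the regular expressions are my own assumptions about the oops text layout. Frames marked `?` are skipped because the kernel prints those for stack entries it could not verify.

```python
import re

# One stack frame: "[<address>] symbol+0xoff/0xsize", optionally prefixed by "? ".
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_oops(text):
    """Return the BUG location and the verified call-trace symbols.

    Hypothetical helper; it works on both line-broken and flattened oops text.
    """
    bug = re.search(r"kernel BUG at (\S+?)!", text)
    # Only look between "Call Trace:" and the "Code:" byte dump.
    trace = text.partition("Call Trace:")[2].partition("Code:")[0]
    frames = [m.group(2) for m in FRAME.finditer(trace) if not m.group(1)]
    return {"bug_at": bug.group(1) if bug else None, "frames": frames}


# Excerpt copied from the oops in this comment.
SAMPLE = (
    "kernel BUG at mm/memcontrol.c:6742! invalid opcode: 0000 [#2] SMP "
    "Call Trace: [<ffffffff811ace31>] shmem_replace_page+0x131/0x1f0 "
    "[<ffffffff811af6d7>] shmem_unuse+0x257/0x320 "
    "[<ffffffff811d9204>] try_to_unuse+0x4f4/0x5c0 "
    "[<ffffffff811d95cf>] SyS_swapoff+0x1ef/0x5c0 "
    "[<ffffffff81135d96>] ? __audit_syscall_exit+0x1f6/0x2a0 "
    "[<ffffffff817473a9>] system_call_fastpath+0x16/0x1b "
    "Code: e8 11 53 f9 ff 0f 0b"
)

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

If two reports both show `shmem_replace_page` as the first frame, that is a reasonable hint (not proof) that they are the same bug.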
This bug makes F21 almost unusable as there is no old kernel to go back to. It constantly logs these faults, and eventually locks.
Maybe I should report it for F21 separately?
*** Bug 1177764 has been marked as a duplicate of this bug. ***
Just to be clear, kernel-3.16.x works perfectly. Only kernel-3.17.x is broken. Only the Xorg process seems to be affected - but that is pretty important for a workstation.
*** Bug 1177885 has been marked as a duplicate of this bug. ***
kernel-3.17.8 still broken. Possible duplicate bug#1167114
kernel-3.18.3 still broken. Xorg still crashes and becomes unkillable after a few hours. Even if you ssh in and try to shut down, it can't shut down. But you can unmount filesystems that Xorg isn't using and do a final sync before powering off. bug#962386 is related I think, as that symptom also appears with the broken kernels. Theory: bug#962386 corrupts memory, eventually causing the full crash. Last known good kernel was 3.16.x
http://kojipkgs.fedoraproject.org/scratch/airlied/task_8760024/ is a test kernel with a suggested patch from upstream, please let me know if it works.
I've started building a 32-bit version of the above kernel for testing (still building): http://copr-fe.cloud.fedoraproject.org/coprs/moggers87/kernel-bz1165369/builds/
I need 32-bit to test. I'll watch koji.
Is the above build different from the 3.18.4-200.fc21 packages on koji.fedoraproject.org/koji ? It has a bz# in the release, which makes me think it has an additional patch.
I've installed kernel-3.18.5-200.fc21.i686 from bodhi, and it boots, and the intel driver errors at boot time are gone. The acid test is whether Xorg can stay up all day without crashing, and that test has commenced. :-) I'll go ahead and give this kernel a smiley on bodhi.
Still up after more than 12 hours, so I think this bug is fixed with kernel-3.18.5. I'll leave it on overnight just to be sure, then close it.
I am getting this periodically (every few hours - so no performance problem):
drm:[intel_connector_check_state] *ERROR* wrong connector dpms state
drm:[intel_connector_check_state] *ERROR* active connector not linked to encoder
drm:[intel_connector_check_state] *ERROR* encoder->connectors_active not set
drm:[intel_connector_check_state] *ERROR* WARN_ON(!encoder->base.crtc)
drm:[check_encoder_state] *ERROR* encoder's hw state doesn't match sw tracking (expected 0, found 1)
No kernel oops or tainting, however.
This kernel does, however, break chronyd! Or rather, the adjtimex kernel call. Not a concern of this bug, will report separately. I'm torn as to whether to give a frowny. I suppose I can use ntpdate while waiting for adjtimex to get fixed. I don't exactly *need* exact time...
Testing on F20 now. No tainting at Xorg startup, so looks promising.
Still crashes within an hour on F20. BUT the kernel oops is no longer tainted, so the crash can be reported automatically. And there have been lots of reports.
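A small Python sketch of how the taint status can be read from the `CPU: ... Comm: ...` header line of an oops, in case it helps when sorting reports. The helper name `taint_info` and the regular expression are assumptions of mine, and the two sample lines are copied from oopses posted in this bug. In the kernel's taint flags, `G` means no proprietary modules were loaded and `D` means the kernel had already hit an oops or BUG earlier in that boot.

```python
import re

# Matches e.g. "Comm: swapoff Tainted: G D 3.17.7-300.fc21.x86_64"
# or           "Comm: Xorg.bin Not tainted 3.18.5-201.fc21.i686".
HEADER = re.compile(
    r"Comm: (\S+) (?:Not tainted|Tainted: ([A-Z ]+?)) +(\d\S*)"
)


def taint_info(line):
    """Return command name, taint flags and kernel release, or None.

    Hypothetical helper; assumes the command name contains no spaces.
    """
    m = HEADER.search(line)
    if m is None:
        return None
    flags = (m.group(2) or "").replace(" ", "")
    return {
        "comm": m.group(1),
        "flags": flags,                 # empty string means "Not tainted"
        "died_before": "D" in flags,    # D: an earlier oops/BUG in this boot
        "kernel": m.group(3),
    }


TAINTED = "CPU: 0 PID: 27136 Comm: swapoff Tainted: G D 3.17.7-300.fc21.x86_64 #1"
CLEAN = "CPU: 1 PID: 1258 Comm: Xorg.bin Not tainted 3.18.5-201.fc21.i686 #1"

if __name__ == "__main__":
    print(taint_info(TAINTED))
    print(taint_info(CLEAN))
```

The `D` flag on the 3.17.7 oops fits with the `[#2]` counter on the same report: it was the second oops of that boot, not the first.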
Well, the 32-bit version above wouldn't build (it timed out - guess kernels take too long on copr?). I still get the crash on 3.18.5-200.fc21.i686, though it does seem to last longer without crashing.
Here is the oops from 3.18.5-201.fc21.i686
Feb 03 15:35:55 localhost kernel: CPU: 1 PID: 1258 Comm: Xorg.bin Not tainted 3.18.5-201.fc21.i686 #1
Feb 03 15:35:56 localhost kernel: Hardware name: Dell Inc. Inspiron 1525 /0U990C, BIOS A16 10/16/2008
Feb 03 15:35:56 localhost kernel: task: f42d2a00 ti: f4396000 task.ti: f4396000
Feb 03 15:35:56 localhost kernel: EIP: 0060:[<c05868ca>] EFLAGS: 00013292 CPU: 1
Feb 03 15:35:56 localhost kernel: EIP is at mem_cgroup_migrate+0x12a/0x180
Feb 03 15:35:56 localhost kernel: EAX: f7102f20 EBX: f4ca4a00 ECX: 00000041 EDX: 00000000
Feb 03 15:35:56 localhost kernel: ESI: f5435ca0 EDI: f5435ca0 EBP: f4397b14 ESP: f4397afc
Feb 03 15:35:56 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Feb 03 15:35:56 localhost kernel: CR0: 8005003b CR2: 093d800c CR3: 2adec000 CR4: 000007d0
Feb 03 15:35:56 localhost kernel: Stack:
Feb 03 15:35:56 localhost kernel: c0d6d540 f4ca4a00 591d5370 f4ca4a00 00000000 f5435ca0 f4397b40 c0549f35
Feb 03 15:35:57 localhost kernel: f4ca4a00 fff91000 f4397b8c 000000c4 c0d16100 f5435ca0 000000c4 ecac5528
Feb 03 15:35:57 localhost kernel: f5435ca0 f4397ba0 c054a63e 0000030e 00000312 f4397bb4 f42d2a00 00000000
Feb 03 15:35:57 localhost kernel: Call Trace:
Feb 03 15:35:57 localhost kernel: [<c0549f35>] shmem_replace_page.isra.29+0x115/0x1d0
Feb 03 15:35:57 localhost kernel: [<c054a63e>] shmem_getpage_gfp+0x64e/0x7c0
Feb 03 15:35:57 localhost kernel: [<c054a880>] shmem_read_mapping_page_gfp+0x40/0x70
Feb 03 15:35:57 localhost kernel: [<f7fb6204>] i915_gem_object_get_pages_gtt+0x144/0x2d0 [i915]
Feb 03 15:35:57 localhost kernel: [<f7fb1cf9>] i915_gem_object_get_pages+0x49/0xa0 [i915]
Feb 03 15:35:57 localhost kernel: [<f7fb67a0>] i915_gem_object_pin+0x370/0x6a0 [i915]
Feb 03 15:35:57 localhost kernel: [<c05a1360>] ? poll_select_copy_remaining+0x130/0x130
Feb 03 15:35:57 localhost kernel: [<f7fb74dc>] i915_gem_object_pin_to_display_plane+0x11c/0x1d0 [i915]
Feb 03 15:35:57 localhost kernel: [<f7fe4a89>] intel_pin_and_fence_fb_obj+0xc9/0x150 [i915]
Feb 03 15:35:57 localhost kernel: [<f7fe78cf>] __intel_set_mode+0x55f/0x14e0 [i915]
Feb 03 15:35:57 localhost kernel: [<c05a1360>] ? poll_select_copy_remaining+0x130/0x130
Feb 03 15:35:57 localhost kernel: [<f7ff0383>] intel_set_mode+0x23/0x40 [i915]
Feb 03 15:35:57 localhost kernel: [<f7ff1244>] intel_crtc_set_config+0x934/0xc70 [i915]
Feb 03 15:35:57 localhost kernel: [<c0a50a84>] ? __ww_mutex_lock+0x14/0x90
Feb 03 15:35:57 localhost kernel: [<f7e0efde>] drm_mode_set_config_internal+0x4e/0xc0 [drm]
Feb 03 15:35:57 localhost kernel: [<f7e133ec>] drm_mode_setcrtc+0x24c/0x590 [drm]
Feb 03 15:35:57 localhost kernel: [<f7e131a0>] ? drm_mode_setplane+0x240/0x240 [drm]
Feb 03 15:35:57 localhost kernel: [<f7e06035>] drm_ioctl+0x1f5/0x560 [drm]
Feb 03 15:35:57 localhost kernel: [<f7e131a0>] ? drm_mode_setplane+0x240/0x240 [drm]
Feb 03 15:35:57 localhost kernel: [<c06864a2>] ? policydb_write+0x7a2/0x7e0
Feb 03 15:35:57 localhost kernel: [<f7e05e40>] ? drm_getmap+0xc0/0xc0 [drm]
Feb 03 15:35:57 localhost kernel: [<c05a07b2>] do_vfs_ioctl+0x302/0x4f0
Feb 03 15:35:57 localhost kernel: [<c0676c82>] ? inode_has_perm.isra.30+0x32/0x50
Feb 03 15:35:57 localhost kernel: [<c0676de7>] ? file_has_perm+0x97/0xa0
Feb 03 15:35:57 localhost kernel: [<c0677a1b>] ? selinux_file_ioctl+0x4b/0xe0
Feb 03 15:35:57 localhost kernel: [<c05a0a00>] SyS_ioctl+0x60/0x90
Feb 03 15:35:57 localhost kernel: [<c06864a2>] ? policydb_write+0x7a2/0x7e0
Feb 03 15:35:57 localhost kernel: [<c06864a2>] ? policydb_write+0x7a2/0x7e0
Feb 03 15:35:57 localhost kernel: [<c0a52a5f>] sysenter_do_call+0x12/0x12
Feb 03 15:35:57 localhost kernel: [<c06864a2>] ? policydb_write+0x7a2/0x7e0
Feb 03 15:35:57 localhost kernel: [<c06864a2>] ? policydb_write+0x7a2/0x7e0
Feb 03 15:35:57 localhost kernel: Code: ff ba b0 ad c1 c0 89 d8 e8 64 14 fd ff 0f 0b 66 90 ba e0 ad c1 c0 89 d8 e8 54 14 fd ff 0f 0b ba 80 ad c1 c0 89 f0 e8 46 14 fd ff <0f> 0b 8d 55 ec 89 f0 89 4d e8 e8 57 d1 ff ff c7 07 00 00 00 00
Feb 03 15:35:57 localhost kernel: EIP: [<c05868ca>] mem_cgroup_migrate+0x12a/0x180 SS:ESP 0068:f4397afc
Feb 03 15:35:57 localhost abrt-dump-journal-oops[784]: Reported 1 kernel oopses to Abrt
Feb 03 15:35:58 localhost abrt-dump-journal-oops[784]: abrt-dump-journal-oops: Found oopses: 1
Feb 03 15:35:58 localhost abrt-dump-journal-oops[784]: abrt-dump-journal-oops: Creating problem directories
Feb 03 15:35:58 localhost abrt-server[10380]: Can't find a meaningful backtrace for hashing in '.'
Feb 03 15:35:58 localhost abrt-server[10380]: Option 'DropNotReportableOopses' is not configured
Feb 03 15:35:58 localhost abrt-server[10380]: Preserving oops '.' because DropNotReportableOopses is 'no'
Feb 03 15:35:58 localhost abrt-server[10380]: Looking for kernel package
Feb 03 15:35:58 localhost abrt-server[10380]: Kernel package kernel-core-3.18.5-201.fc21.i686 found
Grrrr - abrt says "There is not enough information to report this bug, but don't worry, there is nothing really wrong with your computer". Except that it can only be used via SSH, and can't be shut down.
OK, I managed to recompile David Airlie's patched kernel for i686 and my laptop has been running without issue for over a day now (including suspends/resumes). I believe that patch solves this issue. I will try and remember to upload those RPMs tomorrow for others to test, feel free to email me if I forget :)
32 bit kernels can be found here: http://moggers.co.uk/~moggers87/rpms/ I've now been using this laptop for 4 days with this kernel and I've not had a single crash.
*** Bug 1189293 has been marked as a duplicate of this bug. ***
OK, the patch for this went into upstream with:

commit f5e03a4989e80a86f8b514659dca8539132e6e09
Author: Michal Hocko <mhocko>
Date: Thu Feb 5 12:25:14 2015 -0800

    memcg, shmem: fix shmem migration to use lrucare

which landed in the 3.19 upstream release. It's fixed in rawhide. It has also been queued for 3.18.7, so it should be fixed in the next update we do for Fedora 20/21.
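For anyone scripting a check across several machines, here is a rough Python sketch that compares a kernel release string (as printed by `uname -r`) against the first stable version named above. The function names and the `FIRST_FIXED_STABLE` constant are hypothetical, and the rule is based only on this comment (fix in 3.19, queued for 3.18.7). It can't see patches carried by test builds with an older version number, and a `False` for 3.16.x only means the fix isn't there, not that 3.16.x is affected.

```python
import re

# First stable release expected to carry the fix, per the comment above.
FIRST_FIXED_STABLE = (3, 18, 7)


def kernel_version(release):
    """Parse '3.18.5-201.fc21.i686' into (3, 18, 5)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing patch level (e.g. '3.19') is treated as 0.
    return tuple(int(part or 0) for part in m.groups())


def should_include_fix(release):
    """True if the version number alone suggests the fix is present."""
    return kernel_version(release) >= FIRST_FIXED_STABLE


if __name__ == "__main__":
    for rel in ("3.17.8-300.fc21.x86_64",
                "3.18.5-201.fc21.i686",
                "3.18.7-200.fc21.x86_64"):
        print(rel, should_include_fix(rel))
```

On a live system the release string could come from `platform.release()` in the Python standard library.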
Thanks for the update Josh!
*** Bug 1191255 has been marked as a duplicate of this bug. ***
3.18.7 is a huge improvement - Xorg no longer becomes unkillable, and the switch console keys still work, so I can go to another console to kill it and get a new login screen (without having to ssh in from another computer). And of course, this particular kernel oops no longer appears. However, Xorg still crashes and loses all my work in progress. And the problem still goes away when I run kernel-3.16.7. So - start a new bug?
If you aren't seeing this exact kernel backtrace with 3.18.7, yes please file a new bug with the relevant details.
Filed bug#1192550 against xorg-x11-drv-intel (my best guess - although problem goes away with kernel-3.16.7).
(In reply to Josh Boyer from comment #37)
> If you aren't seeing this exact kernel backtrace with 3.18.7, yes please
> file a new bug with the relevant details.

Thanks for fixing this bug. It is now a user-space problem (at least Xorg is killable), and there is a workaround.
This message is a reminder that Fedora 21 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '21'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 21 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.