Description of problem: The laptop is an ASUS Zenbook UX303UB, running up-to-date Fedora 25 with the standard GNOME Wayland desktop. Leaving the desktop idle until screen blank often (usually) renders the machine unresponsive locally. Mouse movements and keypresses do not unblank the screen. Machine is still "up", as I am able to SSH into it. This is a regression - it has only just started happening recently, and I have been running F25 since alpha. I am not sure which component is responsible, but am reporting against the kernel since the following message is invariably found in the journal when the problem occurs: kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle Version-Release number of selected component (if applicable): 4.10.8-200.fc25.x86_64 How reproducible: Almost always Steps to Reproduce: 1. Boot into desktop 2. Wait for screen blank 3. Try to unblank screen via keypresses/mouse movements Actual results: Screen remains blank Expected results: Screen wakes up Additional info:
I seem to have just encountered this now without the screen being blanked - the laptop simply stops responding to any input. ABRT registered a kernel oops associated with the event, but claims the backtrace "does not contain enough meaningful function frames to be reported". The reason for the crash is given as: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 Dmesg contains the oops: [344807.129754] Oops: 0002 [#1] SMP [344807.129766] Modules linked in: uinput cmac rfcomm fuse ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables bnep vfat fat snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_hda_codec_hdmi intel_rapl arc4 x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_conexant snd_hda_codec_generic snd_soc_core coretemp [344807.129992] kvm_intel snd_compress snd_pcm_dmaengine kvm ac97_bus snd_hda_intel iwlmvm snd_hda_codec iTCO_wdt iTCO_vendor_support mac80211 asus_nb_wmi irqbypass crct10dif_pclmul crc32_pclmul asus_wmi sparse_keymap snd_hda_core ghash_clmulni_intel intel_cstate intel_uncore snd_hwdep iwlwifi snd_seq uvcvideo snd_seq_device intel_rapl_perf snd_pcm videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 cfg80211 videobuf2_core videodev btusb hci_uart snd_timer snd btrtl media soundcore i2c_i801 btbcm btqca btintel bluetooth mei_me mei processor_thermal_device joydev intel_soc_dts_iosf shpchp intel_pch_thermal int3403_thermal rfkill intel_lpss_acpi intel_lpss acpi_als pinctrl_sunrisepoint pinctrl_intel kfifo_buf industrialio 
int3402_thermal tpm_crb int340x_thermal_zone int3406_thermal tpm_tis asus_wireless [344807.130208] tpm_tis_core int3400_thermal acpi_thermal_rel acpi_pad tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc hid_multitouch btrfs xor i915 nouveau raid6_pq mxm_wmi ttm i2c_algo_bit drm_kms_helper crc32c_intel drm serio_raw wmi video i2c_hid fjes [344807.130282] CPU: 1 PID: 15666 Comm: totem Not tainted 4.10.8-200.fc25.x86_64 #1 [344807.130305] Hardware name: ASUSTeK COMPUTER INC. UX303UB/UX303UB, BIOS UX303UB.206 03/02/2016 [344807.130331] task: ffff9c8b762d8000 task.stack: ffffc0d186178000 [344807.130368] RIP: 0010:gen8_ppgtt_alloc_page_directories.isra.36+0x115/0x250 [i915] [344807.130393] RSP: 0018:ffffc0d18617b880 EFLAGS: 00010246 [344807.130410] RAX: ffff9c8a33969280 RBX: 0000000000000003 RCX: 0000000000000003 [344807.130433] RDX: 0000000000000000 RSI: ffff9c8b745d5000 RDI: ffff9c8b744d8000 [344807.130457] RBP: ffffc0d18617b8d8 R08: 0000000000000000 R09: 0000000000000000 [344807.130480] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9c8b63766000 [344807.130502] R13: ffff9c8ae2599f10 R14: 00000000fc379000 R15: 0000000000800000 [344807.130525] FS: 00007fe2f6214a80(0000) GS:ffff9c8b83c80000(0000) knlGS:0000000000000000 [344807.130550] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [344807.130568] CR2: 0000000000000018 CR3: 00000001fa9a6000 CR4: 00000000003406e0 [344807.130590] Call Trace: [344807.130615] gen8_alloc_va_range_3lvl+0xfb/0x9e0 [i915] [344807.130634] ? sg_free_table+0x5c/0x70 [344807.130647] ? sg_next+0x4/0x30 [344807.130660] ? swiotlb_map_sg_attrs+0x49/0x110 [344807.130688] gen8_alloc_va_range+0x23d/0x470 [i915] [344807.130719] i915_vma_bind+0x7e/0x170 [i915] [344807.130747] __i915_vma_do_pin+0x2f1/0x4a0 [i915] [344807.130776] i915_gem_execbuffer_reserve_vma.isra.30+0x144/0x1b0 [i915] [344807.130809] i915_gem_execbuffer_reserve.isra.31+0x44a/0x480 [i915] [344807.130841] i915_gem_do_execbuffer.isra.37+0x652/0x1820 [i915] [344807.130861] ? 
___slab_alloc+0x294/0x540 [344807.130880] ? enqueue_entity+0x113/0x6b0 [344807.130915] i915_gem_execbuffer2+0xc5/0x240 [i915] [344807.130950] drm_ioctl+0x21b/0x4c0 [drm] [344807.131001] ? i915_gem_execbuffer+0x310/0x310 [i915] [344807.131028] ? pick_next_task_fair+0x324/0x4d0 [344807.131053] do_vfs_ioctl+0xa3/0x5f0 [344807.131074] SyS_ioctl+0x79/0x90 [344807.131095] entry_SYSCALL_64_fastpath+0x1a/0xa9 [344807.131119] RIP: 0033:0x7fe2edfb7787 [344807.131138] RSP: 002b:00007ffd5eaac918 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [344807.131183] RAX: ffffffffffffffda RBX: 000055dc72621040 RCX: 00007fe2edfb7787 [344807.131216] RDX: 00007ffd5eaac960 RSI: 00000000c0406469 RDI: 000000000000000c [344807.131247] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 [344807.131280] R10: 0000000000000550 R11: 0000000000000246 R12: 00007fe2f40c63d8 [344807.131312] R13: 00007ffd5eaabcc0 R14: 000055dc7224c4e0 R15: 00007fe2f40bf6a0 [344807.131345] Code: e6 48 8b 90 20 03 00 00 48 8b b8 d8 02 00 00 48 8b 52 08 48 83 ca 03 e8 aa cc ff ff 48 8b 45 b0 48 8b 4d c8 48 8b 10 48 8b 45 d0 <4c> 89 24 ca 48 0f ab 08 0f 1f 44 00 00 e9 53 ff ff ff 65 8b 05 [344807.131492] RIP: gen8_ppgtt_alloc_page_directories.isra.36+0x115/0x250 [i915] RSP: ffffc0d18617b880 [344807.131534] CR2: 0000000000000018 [344807.140522] ---[ end trace 8c9f3becf22cb14e ]---
I use a Dell Precision 5520 with a Core i7 7820HQ, running Fedora 25 with kernel 4.10.10-200.fc25.x86_64. When I go to GNOME Settings -> Details, the graphical environment hangs. The journal shows the same message: "[drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle" Thanks for your help.
This seems to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1441906
(In reply to Aaron Sowry from comment #1) > I seem to have just encountered this now without the screen being blanked - > the laptop simply stops responding to any input. ABRT registered a kernel > oops associated with the event, but claims the backtrace "does not contain > enough meaningful function frames to be reported". The reason for the crash > ... I don't think it is. I see this error all the time, but it comes without the stack trace; the only related info in dmesg is the 'i915_gem_idle_work_handler' message. Because of that, I don't think this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1441906
I'm seeing the same behavior as the initial bug report on this bug and I'm not seeing anything like 1441906. I have no abrt report of the "hang" nor do I see the kernel panic from 1441906. As the commenter in #4 states, I think this and 1441906 are potentially separate issues.
I forgot to add I'm seeing the behavior on an XPS 13 (9360) with the qhd+ screen in wayland. I just switched to Xorg to see if the problem occurs there.
The backtrace I supplied as part of comment #1 does look very much like RH 1441906, and this is the one I seem to be encountering now. I haven't seen the "Timeout waiting for engines to idle" message for a while now. Kernel 4.10.15-200.fc25.x86_64. No idea if this is the same bug or not, but the identical timing/behaviour of them seems to suggest a dupe IMO.
Interesting. I've never seen that oops before but I can reproduce the blank screen issue simply by leaving my XPS 13 up and running for about 45 min. Come back and it's unresponsive but still reachable by network.
Also seeing this on a Lenovo X1 Carbon with Wayland on F25; stack trace attached as lenovo-x1.txt, from kernel 4.10.13-200
Created attachment 1281766 [details] full kernel log and system data for this crash on a lenovo X1 carbon
I'm watching both this bug and #1441906. I'm pretty sure they're separate issues. I'm able to reproduce this bug, the drm:i915_gem_idle_work_handler, using kernel 4.11.2-200.fdo99295.fc25.x86_64, which was provided as a potential solution for #1441906.
Gordon - as of this morning I can confirm your findings. Just received the drm:i915_gem_idle_work_handler error with Dave's 4.11.2-200.fdo99295.fc25.x86_64. So separate issues then, I guess.
I tried the latest 4.10.17-200 today and had the same thing. I logged into my XPS 13, left it alone, and came back about 20 minutes later to find input frozen. I ssh'ed in, and the only relevant message in the journal was: May 26 11:12:01 hostname kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle No oopses at this point. I did a 'shutdown -r now' and it rebooted successfully, but it logged a couple of slow-path warnings and drm timeouts. Will attach those to this report.
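For anyone checking whether they are hitting the same thing, here is a minimal sketch of how the journal could be searched for the message quoted above. The function name count_idle_timeouts is made up for illustration, and the journalctl call in the comment assumes a systemd journal that still holds the previous boot.

```shell
#!/bin/sh
# Count i915 idle-timeout errors in kernel log text read from stdin.
# The pattern is the message quoted in this report; the function name
# is hypothetical.
count_idle_timeouts() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so "|| true" keeps the exit status clean.
    grep -c 'i915_gem_idle_work_handler.*Timeout waiting for engines to idle' || true
}

# Example on a live system, e.g. over ssh after the hang and a reboot
# (kernel messages from the previous boot):
#   journalctl -k -b -1 --no-pager | count_idle_timeouts
```

A count of 0 together with a frozen display would suggest a different problem than the one tracked here.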
Created attachment 1282701 [details] relevant dmesg sections kernel messages during shutdown/reboot after "hung" has occurred.
A "me too" on a Lenovo ThinkPad T470 i5-7200U running 4.11.3-200.fc25.x86_64 and GNOME Xorg. After leaving the screen locked for 2 hours I found the display did not come back, however the machine responded on the network. Relevant line in the journal: Lip 07 11:48:01 yakul-local kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle No oopses except a probably unrelated message: Lip 07 09:34:44 yakul-local kernel: Uhhuh. NMI received for unknown reason 2c on CPU 0. Lip 07 09:34:44 yakul-local kernel: Do you have a strange power saving mode enabled? Lip 07 09:34:44 yakul-local kernel: Dazed and confused, but trying to continue
I'm still looking for useful information. So far, I've confirmed that X11 has this problem, just like Wayland, and that while the error text "*ERROR* Timeout waiting for engines to idle" was added in 4.10, the error itself seems to be present in 4.9. Under older kernels, I see a NULL pointer dereference instead of the error: Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [1653], reason: Hang on render ring, action: reset Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error Jun 06 23:02:08 hurricane.ee.washington.edu kernel: drm/i915: Resetting chip after gpu hang Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] RC6 on Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] GuC firmware load skipped Jun 06 23:02:18 hurricane.ee.washington.edu kernel: drm/i915: Resetting chip after gpu hang Jun 06 23:02:18 hurricane.ee.washington.edu kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000070 Jun 06 23:02:18 hurricane.ee.washington.edu kernel: IP: [<ffffffffc03158f3>] reset_common_ring+0xc3/0x170 [i915] Today I booted my system with the kernel arg "i915.enable_rc6=0". I expect to know tomorrow if that resolves the problem. Any additional testing from other users would be useful as well.
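The [drm] messages above say the GPU crash dump is saved to /sys/class/drm/card0/error and should be attached to reports. Since that file lives in kernel memory and is gone after a reboot, a rough sketch of copying it out first might look like this. The function name save_gpu_error and the default output file name are hypothetical, and reading the file normally requires root.

```shell
#!/bin/sh
# Copy the i915 error state to a regular file so it survives a reboot.
# $1: source file (default: the path named in the kernel message)
# $2: destination file (hypothetical default name)
save_gpu_error() {
    src=${1:-/sys/class/drm/card0/error}
    dest=${2:-gpu-error.txt}
    if ! cat "$src" > "$dest" 2>/dev/null; then
        # Remove the empty file left behind by the failed redirect.
        rm -f "$dest"
        echo "could not read $src" >&2
        return 1
    fi
    echo "saved $(wc -c < "$dest" | tr -d ' ') bytes to $dest"
}

# Example (as root, over ssh while the display is hung):
#   save_gpu_error /sys/class/drm/card0/error /root/gpu-error.txt
```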
Confirmed. Booting with kernel arg "i915.enable_rc6=0" appears to effectively solve this problem.
Created attachment 1288224 [details] kernel messages Booting with enable_rc6=0 changes the behavior for me. When the screen "locks" keyboard input is not hung. I'm able to ctrl-alt-fX out to a virtual console and hit ctrl-alt-del to reboot even though I don't see anything until just before the reboot. During this time, I get a bunch of kernel errors (see attached output).
Andy, can you post the content of /proc/cmdline? Asking because the correct arg is "i915.enable_rc6=0" The i915 prefix is required in order for enable_rc6 to be interpreted as an argument to the i915 module. If you're booting with just "enable_rc6=0", that's not going to fix the problem.
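To make the prefix point concrete, a small sketch that classifies how enable_rc6 shows up on a command line. The function name rc6_arg_status and its three output words are invented for this example; only the i915.enable_rc6 spelling comes from this thread.

```shell
#!/bin/sh
# Classify how enable_rc6 appears on a kernel command line.
# $1 is the command line text, e.g. "$(cat /proc/cmdline)".
# Prints "prefixed", "unprefixed" or "absent" (hypothetical labels).
rc6_arg_status() {
    # Pad with spaces so the patterns also match the first and last word.
    case " $1 " in
        *" i915.enable_rc6="*) echo "prefixed" ;;    # seen by the i915 module
        *" enable_rc6="*)      echo "unprefixed" ;;  # not an i915 argument
        *)                     echo "absent" ;;
    esac
}

# Example on a running system:
#   rc6_arg_status "$(cat /proc/cmdline)"
```

Only the "prefixed" case corresponds to the workaround discussed here; "unprefixed" is the mistake described above.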
I'm not doing it via the kernel command line. I'm doing it via module configuration: $ cat /etc/modprobe.d/i915-local.conf options i915 enable_rc6=0 I then rebuilt my initramfs via dracut to ensure the module parameters take effect at boot time; i915 is loaded as a module on Fedora. I've confirmed that enable_rc6 is set to 0 in a couple of ways. Kernel messages show: [ 1.596148] Setting dangerous option enable_rc6 - tainting kernel and $ cat /sys/class/drm/card0/power/rc6_enable 0
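For others taking the modprobe.d route, a sketch of how the configured i915 options could be listed before rebuilding the initramfs. The function name list_i915_options is hypothetical; the "options i915 name=value" line format is the one shown in the comment above.

```shell
#!/bin/sh
# Print every i915 option set in modprobe configuration files,
# one name=value pair per line.
# $1 is the directory to scan (default: /etc/modprobe.d).
list_i915_options() {
    dir=${1:-/etc/modprobe.d}
    for f in "$dir"/*.conf; do
        [ -f "$f" ] || continue
        # Match lines of the form: options i915 name=value [name=value ...]
        awk '$1 == "options" && $2 == "i915" { for (i = 3; i <= NF; i++) print $i }' "$f"
    done
}

# Typical sequence on Fedora (as root); the dracut step regenerates the
# initramfs for the running kernel so the option is applied early in boot:
#   list_i915_options
#   dracut --force
```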
Hi, I'm running Arch Linux on a Dell Latitude E7270 laptop. I have the same issue under kernel 4.11.9. I "solved" my random freezes with "i915.enable_rc6=0". Here is my history: https://bbs.archlinux.org/viewtopic.php?pid=1727435
4.12.4-300.fc26.x86_64 from updates-testing seems to work okay too.
I was wrong. I didn't wait long enough. It still occurred with 4.12.4: Jul 30 23:01:01 host kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
I'm having the same issue with a VAIO Z, kernel 4.11.3: when the screen goes off, I'm unable to get it back after some time. The machine answers pings and accepts ssh logins, but even though ssh logs me in, I never get a proper shell, so I always have to power the laptop off and on. The message host kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle appears in the logs. I haven't tried using "i915.enable_rc6=0" because I think it's not the ideal solution: the GPU would run at high power all the time, destroying battery life.
I'm inclined to think that this is the same bug that affects GT3 and GT4 parts, but I'm seeing it on GT2 (HD Graphics 520, Dell Latitude E7470). https://bugs.freedesktop.org/show_bug.cgi?id=94161 https://osdn.net/projects/uclinux-h8/scm/git/linux/commits/d528a6a0f3fd346bd7cc2de611a4149b6ebaab41 I'm going to see if I can build a kernel with NEEDS_WaRsDisableCoarsePowerGating for GT2 as well...
FYI my VAIO Z flip uses an Intel Iris Graphics 550 GPU. If you need more info let me know. Thanks!
Update: I have just updated the BIOS of my VAIO Z Flip from: [ 0.000000] DMI: VAIO Corporation VJZ13B/VAIO, BIOS R1193SA 09/22/2016 to: [ 0.000000] DMI: VAIO Corporation VJZ13B/VAIO, BIOS R1197SA 06/05/2017 I will see whether this brings more stability and maybe fixes the issue, and will report back later.
Created attachment 1308296 [details] Set NEEDS_WaRsDisableCoarsePowerGating for Skylake GT2 GPUs It's much too early to conclude that this resolves the problem, but I'm using the attached patch with 4.12.4-300.fc26.x86_64. RC6 is on and my laptop is not currently locking up during screen blanking.
That didn't resolve the problem. Looking back over the comments here, I also see several Kaby Lake systems, where NEEDS_WaRsDisableCoarsePowerGating is targeted at Skylake systems. So, it looks like there's a bigger problem with i915 rc6 support.
Bad news: after updating the BIOS of my laptop to the latest version and upgrading to kernel 4.12.4, I'm still experiencing the issue. Same message in the logs: Aug 06 11:26:02 localhost.localdomain kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle The screen is blank and I am unable to unblank it; keys light up for a moment when pressed and then turn off; and I am unable to ssh into the machine, although login asks for the password. This is definitely a bug in i915 rc6 support. In case it matters, my VAIO Z Flip uses a Skylake CPU: model name : Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz Kernel: [root@localhost]# uname -a Linux localhost.localdomain 4.12.4 #1 SMP Sat Aug 5 11:00:30 UYT 2017 x86_64 x86_64 x86_64 GNU/Linux i915 module parameters used: [root@localhost ]# systool -vm i915 Module = "i915" Attributes: coresize = "1277952" initsize = "0" initstate = "live" refcnt = "21" srcversion = "9F705B72B03F193BC3EF19B" taint = "" uevent = <store method only> Parameters: alpha_support = "N" disable_display = "N" disable_power_well = "1" edp_vswing = "0" enable_cmd_parser = "Y" enable_dc = "-1" enable_dp_mst = "Y" enable_dpcd_backlight= "N" enable_execlists = "1" enable_fbc = "0" enable_guc_loading = "0" enable_guc_submission= "0" enable_gvt = "N" enable_hangcheck = "Y" enable_ips = "1" enable_ppgtt = "3" enable_psr = "1" enable_rc6 = "1" error_capture = "Y" fastboot = "N" force_reset_modeset_test= "N" guc_firmware_path = "(null)" guc_log_level = "-1" huc_firmware_path = "(null)" inject_load_failure = "0" invert_brightness = "0" load_detect_test = "N" lvds_channel_mode = "0" lvds_use_ssc = "-1" mmio_debug = "0" modeset = "-1" nuclear_pageflip = "N" panel_ignore_lid = "1" prefault_disable = "N" reset = "Y" semaphores = "0" use_mmio_flip = "0" vbt_sdvo_panel_type = "-1" verbose_state_checks= "Y" Again, setting i915.enable_rc6 to 0 is NOT an option, as it destroys battery life. 
I hope this bug can be fixed, because i915 rc6 bugs have been around for a long time on Skylake and Kaby Lake CPUs :(
Running a Lenovo X270 with an i7-7500U and Intel HD 620, Fedora 26, kernel 4.12.5 (testing), and the issue is happening for me.
Issue happened again, same message in the logs: Aug 12 12:33:43 localhost.localdomain kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
My experience with this issue: On my Lenovo E470 with Intel HD 620 and NVIDIA 940MX graphics, I have experienced the same problem, with the aforementioned i915 timeout error, on the main-repo kernel versions of Fedora 26. The issue was also present on Arch Linux, across kernel versions 4.xx to 4.12 (latest). On Arch it persisted across DEs such as GNOME and XFCE (hence I doubt it's a Wayland/Xorg issue). It was present regardless of driver; I have tried intel-xorg as well as modesetting. On Arch, the issue is non-existent on i3wm with xfce4-power-manager, even with rc6 enabled (and a bunch of other settings applied through the TLP power manager); I have never experienced it with that setup. It affects me on bigger DEs, regardless of running Wayland or Xorg, and regardless of running GDM or LightDM. What baffles me is that this issue is non-existent on my other laptop, a Lenovo L570 with only Intel (HD 620) graphics. That computer is running Fedora 26 with i915.enable_rc6 returning 1, and it has literally never happened there. Both laptops have had their BIOS/UEFI kept updated to the newest available versions and, to this day, both behave in the described manner.
(In reply to Biszkhopt from comment #33) Can you post the output of systool -vm i915 from both machines?? I have also been thinking that somehow Desktop environments as Cinnamon or GNOME could be causing this issue.. Thanks
(In reply to Biszkhopt from comment #33) > As for Arch issue is non-existent on i3wm with xfce4-power-manager, even > with rc6 enabled (and bunch of other stuff through TLP power manager). I > have never experienced issue with i3wm + xfce4-power manager. It's affecting > me on bigger DE's, regardless of running wayland or xorg, regardless of > running GDM or LightDM. I experience this issue on Arch Linux with xmonad and xfce4-power-manager (the last time I checked without enable_rc6=0 was on some late 4.12-rc kernel). Will report back when I have checked on the current 4.13-rc. Hardware is a ThinkPad T470 (Intel(R) HD Graphics 620).
I just checked and apparently this bug hasn't been reported upstream at all. Might make sense to report it there so: https://bugs.freedesktop.org/show_bug.cgi?id=102224
(In reply to Andy Wang from comment #36) > I just checked and apparently this bug hasn't been reported upstream at all. > Might make sense to report it there so: > https://bugs.freedesktop.org/show_bug.cgi?id=102224 Added a post in https://bugs.freedesktop.org/show_bug.cgi?id=102224 Thanks
UPDATE: It seems the issue has disappeared! I modified 3 i915 module options, and after 3 days of testing, including leaving the laptop on overnight, I haven't experienced the problem again. Battery consumption has been great, around 2.5 watts when idle. The module options modified were: enable_guc_loading = "1" enable_guc_submission= "1" disable_power_well = "0" For the GuC module options, make sure you have installed the latest firmware from https://01.org/linuxgraphics/downloads/firmware. In your dmesg after booting you will see these messages: [ 2.303462] Setting dangerous option enable_guc_loading - tainting kernel [ 2.303463] Setting dangerous option enable_guc_submission - tainting kernel [ 2.340111] [drm] GuC submission enabled (firmware i915/skl_guc_ver6_1.bin [version 6.1]) These are the GRUB boot options used: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.12.4 root=UUID=d9310d7b-9422-463c-89ec-e1431caba3c4 ro nosplash quiet noiswmd i915.enable_rc6=1 i915.enable_psr=1 i915.disable_power_well=0 i915.enable_guc_loading=1 i915.enable_guc_submission=1 i915.enable_fbc=1 pcie_aspm=force resume=/dev/nvme0n1p6 Again, this has worked on a VAIO Z Flip: model name : Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz Kernel: [root@localhost]# uname -a Linux localhost.localdomain 4.12.4 #1 SMP Sat Aug 5 11:00:30 UYT 2017 x86_64 x86_64 x86_64 GNU/Linux Please give it a try and let me know if it fixes your issues.
> Can you post the output of systool -vm i915 from both machines?? I have also > been thinking that somehow Desktop environments as Cinnamon or GNOME could > be causing this issue.. > > Thanks I currently have only my e470 on me, so i can post output from this machine; i will have access to l570 in roughly two weeks, so then i can post that one. E470: kernel: 4.12.6-1-ARCH systool -vm i915 output: Module = "i915" Attributes: coresize = "1445888" initsize = "0" initstate = "live" refcnt = "16" taint = "" uevent = <store method only> Parameters: alpha_support = "Y" disable_display = "N" disable_power_well = "1" edp_vswing = "0" enable_cmd_parser = "Y" enable_dc = "-1" enable_dp_mst = "Y" enable_dpcd_backlight= "N" enable_execlists = "1" enable_fbc = "1" enable_guc_loading = "0" enable_guc_submission= "0" enable_gvt = "N" enable_hangcheck = "Y" enable_ips = "1" enable_ppgtt = "3" enable_psr = "0" enable_rc6 = "1" error_capture = "Y" fastboot = "N" force_reset_modeset_test= "N" guc_firmware_path = "(null)" guc_log_level = "-1" huc_firmware_path = "(null)" inject_load_failure = "0" invert_brightness = "0" load_detect_test = "N" lvds_channel_mode = "0" lvds_use_ssc = "-1" mmio_debug = "0" modeset = "-1" nuclear_pageflip = "N" panel_ignore_lid = "1" prefault_disable = "N" reset = "Y" semaphores = "0" use_mmio_flip = "0" vbt_sdvo_panel_type = "-1" verbose_state_checks= "Y" Sections: .altinstr_aux = "0xffffffffc0c8efb4" .altinstr_replacement= "0xffffffffc0c8ee61" .altinstructions = "0xffffffffc0cd0e84" .bss = "0xffffffffc0ce3100" .data..cacheline_aligned= "0xffffffffc0ce2700" .data..read_mostly = "0xffffffffc0ce0760" .data.unlikely = "0xffffffffc0ce06f0" .data = "0xffffffffc0cd8a60" .exit.text = "0xffffffffc0c8effc" .fixup = "0xffffffffc0c8f019" .gnu.linkonce.this_module= "0xffffffffc0ce2dc0" .init.text = "0xffffffffc0ae4000" .note.gnu.build-id = "0xffffffffc0c90000" .parainstructions = "0xffffffffc0ca6a20" .ref.data = "0xffffffffc0ce0980" .rodata.str1.1 = 
"0xffffffffc0ca6d1c" .rodata.str1.8 = "0xffffffffc0cb2c68" .rodata = "0xffffffffc0c900a0" .smp_locks = "0xffffffffc0ccdb4c" .strtab = "0xffffffffc0b05e90" .symtab = "0xffffffffc0ae6000" .text = "0xffffffffc0ba5000" .text.unlikely = "0xffffffffc0c8f06e" __bug_table = "0xffffffffc0ccdec0" __ex_table = "0xffffffffc0cd6d10" __jump_table = "0xffffffffc0cd8000" __kcrctab_gpl = "0xffffffffc0c90080" __ksymtab_gpl = "0xffffffffc0c90030" __ksymtab_strings = "0xffffffffc0cd6cb8" __mcount_loc = "0xffffffffc0cd12f0" __param = "0xffffffffc0cd66a0" __tracepoints_ptrs = "0xffffffffc0cd6d70" __tracepoints = "0xffffffffc0ce1a60" __tracepoints_strings= "0xffffffffc0cd6f00" _ftrace_events = "0xffffffffc0ce07e0"
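Since the request was to compare module parameters between the affected and unaffected machine, here is a sketch of an alternative to systool that produces sorted, diff-friendly output. The function name dump_module_params and the output file names are hypothetical; /sys/module/i915/parameters is the standard sysfs location for a loaded module's parameters, and some of them may be readable only by root.

```shell
#!/bin/sh
# Print module parameters as sorted name=value lines, suitable for diff.
# $1 is the parameters directory (default: the i915 one).
dump_module_params() {
    dir=${1:-/sys/module/i915/parameters}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Parameters the current user may not read are marked as such.
        if value=$(cat "$f" 2>/dev/null); then
            printf '%s=%s\n' "$(basename "$f")" "$value"
        else
            printf '%s=<unreadable>\n' "$(basename "$f")"
        fi
    done | LC_ALL=C sort
}

# Example: run on each laptop, then compare the two files.
#   dump_module_params > e470-i915.txt     # on the affected machine
#   dump_module_params > l570-i915.txt     # on the unaffected machine
#   diff e470-i915.txt l570-i915.txt
```

Unlike the systool listing, this leaves out the Sections addresses, which differ on every boot and only add noise to a comparison.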
(In reply to Kr4t0s from comment #38) Thank you for the tip! I was not happy with disabling RC6 so I've been running my ThinkPad T470 with these options instead. A couple of days in and the blanked screen issue did not appear yet. Will keep testing for a while longer.
(In reply to Tomislav Ivek from comment #40) No problem. 
Since the last time i updated this thread i have not experienced this issue, nearly a week without issues :)
I've been running for about a week with boot options: "i915.enable_guc_loading=1 i915.enable_guc_submission=1" This configuration seems stable. I have not measured power consumption to see if the system is actually entering low power states. I worry that this bug won't actually be solved, and the future of the i915 driver is using binary blobs to manage power, or do power management in software with frequent hangs/crashes.
(In reply to Gordon Messmer from comment #42) > I've been running for about a week with boot options: > "i915.enable_guc_loading=1 i915.enable_guc_submission=1" > > This configuration seems stable. I have not measured power consumption to > see if the system is actually entering low power states. > > I worry that this bug won't actually be solved, and the future of the i915 > driver is using binary blobs to manage power, or do power management in > software with frequent hangs/crashes. One of the developers upstream asked for debug info: https://bugs.freedesktop.org/show_bug.cgi?id=102224 I would encourage you to provide that if you can. I haven't had the opportunity.
(In reply to Gordon Messmer from comment #42) > I've been running for about a week with boot options: > "i915.enable_guc_loading=1 i915.enable_guc_submission=1" > > This configuration seems stable. I have not measured power consumption to > see if the system is actually entering low power states. > > I worry that this bug won't actually be solved, and the future of the i915 > driver is using binary blobs to manage power, or do power management in > software with frequent hangs/crashes. The system does enter low power states with those i915 options. Seems pretty stable for me, weeks without an issue.
Upgrading to a 4.13-rc kernel has completely solved this issue for me, without trying any other options suggested in this ticket; no hangs for 3 weeks now. Power management is working fine, and power consumption is significantly lower than with enable_rc6=0.
(In reply to Matthias Schiffer from comment #45) > Upgrading to a 4.13-rc kernel has completely solved this issue for me, > without trying any other options suggested in this ticket; no hangs for 3 > weeks now. Power management is working fine, power consumption is > significantly lower than with enable_rc=0. Would be interesting to see what changes were made to the 4.13.x kernel in relationship to i915 in comparison to the 4.12.x kernel series..
After finding the enable_guc_loading option stable, I disabled that option and added the debugging options requested by the Intel devs in the freedesktop.org ticket. I think that was on the 1st or 2nd of this month. Since then, I have been unable to reproduce the original problem under kernel 4.12 or 4.11. If the problem recurs, I'll provide additional information. In the meantime, I wonder if loading the GuC firmware introduced a persistent change. If the problem were solved by a firmware update, that would explain why I can no longer reproduce it.
UPDATE: I have found an issue with kernel 4.13 stable: the GPU becomes stuck powered on at 100% for no apparent reason. CPU usage and load are low; the only way to notice is the heat coming from the laptop and the Idle stats in powertop. This is dangerous, as it can kill the battery or maybe even degrade the life of the GPU. I have reverted to my older kernel, 4.12.4. Can anyone confirm this?
(In reply to Gordon Messmer from comment #47) > If the problem recurs, I'll provide additional information. I ran 4.11.11-300.fc26.x86_64 with "drm.debug=0x1e log_buf_len=2M" for a few weeks and was not able to reproduce the problem. Yesterday I removed those options, and today I got the blank-screen hang and the "*ERROR* Timeout waiting for engines to idle" error message. It seems the failure might not manifest while debugging is enabled.
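For readers wondering what drm.debug=0x1e actually turns on, a sketch that decodes the bitmask into message categories. The bit-to-name table is my understanding of the drm module's parameter description for kernels of this era (0x01 core, 0x02 driver, 0x04 kms, 0x08 prime, 0x10 atomic, 0x20 vbl); treat it as an assumption and check `modinfo drm` on your own kernel. The function name decode_drm_debug is hypothetical.

```shell
#!/bin/sh
# Decode a drm.debug bitmask into the names of the enabled categories.
# $1 is the mask, decimal or 0x-prefixed hex.
# The bit assignments below are an assumption; verify with: modinfo drm
decode_drm_debug() {
    mask=$(( $1 ))
    out=""
    for pair in 1:core 2:driver 4:kms 8:prime 16:atomic 32:vbl; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # category name after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    # Drop the leading space before printing.
    echo "${out# }"
}

# Example:
#   decode_drm_debug 0x1e
```

Under that table, 0x1e covers driver, kms, prime and atomic messages, which is a lot of extra logging and could plausibly change timing enough to hide the hang.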
This message is a reminder that Fedora 25 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '25'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 25 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
I still see this on FC27 on a Clevo notebook. I will try the workaround above and check whether the issue persists.
I am also experiencing this on FC27 4.13.16-302.fc27.x86_64 (also experienced with earlier 4.13.x kernels). My system is a Lenovo T470 with Intel integrated graphics.
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
It is possible that this bug is the same as or closely related to a bug I reported here on 5th March 2018 (found on Clevo W515LU laptop): https://bugzilla.redhat.com/show_bug.cgi?id=1551373 I reported the bug as fixed on 21st March 2018 (very impressive speed): since kernel-4.15.10-300.fc27.x86_64