Bug 1440988 - Laptop does not recover from blanked screen
Summary: Laptop does not recover from blanked screen
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2017-04-10 22:54 UTC by Aaron Sowry
Modified: 2018-03-30 09:25 UTC

Clone Of:
: 1526324 (view as bug list)
Last Closed: 2017-12-12 10:21:35 UTC


Attachments
full kernel log and system data for this crash on a lenovo X1 carbon (185.09 KB, text/plain)
2017-05-23 22:50 UTC, Inaky Perez-Gonzalez
relevant dmesg sections (3.42 KB, text/plain)
2017-05-26 17:20 UTC, Andy Wang
kernel messages (10.21 KB, text/plain)
2017-06-16 01:31 UTC, Andy Wang
Set NEEDS_WaRsDisableCoarsePowerGating for Skylake GT2 GPUs (1.41 KB, application/mbox)
2017-08-02 15:20 UTC, Gordon Messmer

Description Aaron Sowry 2017-04-10 22:54:00 UTC
Description of problem:
The laptop is an ASUS Zenbook UX303UB, running up-to-date Fedora 25 with the standard GNOME Wayland desktop.

Leaving the desktop idle until screen blank often (usually) renders the machine unresponsive locally. Mouse movements and keypresses do not unblank the screen. Machine is still "up", as I am able to SSH into it.

This is a regression - it has only just started happening recently, and I have been running F25 since alpha. I am not sure which component is responsible, but am reporting against the kernel since the following message is invariably found in the journal when the problem occurs:

kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

Version-Release number of selected component (if applicable):
4.10.8-200.fc25.x86_64

How reproducible:
Almost always

Steps to Reproduce:
1. Boot into desktop
2. Wait for screen blank
3. Try to unblank screen via keypresses/mouse movements

Actual results:
Screen remains blank

Expected results:
Screen wakes up

Additional info:
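
The journal check described above can be sketched as a small filter. This is only a minimal illustration, assuming the journal has been exported as plain text; the function name `find_idle_timeouts` is hypothetical and not part of any tool mentioned in this report.

```python
import re

# Pattern for the i915 idle-timeout message quoted in this report.
IDLE_TIMEOUT = re.compile(
    r"\[drm:i915_gem_idle_work_handler \[i915\]\] \*ERROR\* "
    r"Timeout waiting for engines to idle"
)


def find_idle_timeouts(journal_text):
    """Return the journal lines that contain the i915 idle-timeout error."""
    return [
        line
        for line in journal_text.splitlines()
        if IDLE_TIMEOUT.search(line)
    ]
```

One way to use it would be to save the kernel messages of the previous boot (for example with `journalctl -k -b -1`) to a file and pass the file contents to the function.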

Comment 1 Aaron Sowry 2017-04-15 00:11:09 UTC
I seem to have just encountered this now without the screen being blanked - the laptop simply stops responding to any input. ABRT registered a kernel oops associated with the event, but claims the backtrace "does not contain enough meaningful function frames to be reported". The reason for the crash is given as:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

Dmesg contains the oops:

[344807.129754] Oops: 0002 [#1] SMP
[344807.129766] Modules linked in: uinput cmac rfcomm fuse ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables bnep vfat fat snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_hda_codec_hdmi intel_rapl arc4 x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_conexant snd_hda_codec_generic snd_soc_core coretemp
[344807.129992]  kvm_intel snd_compress snd_pcm_dmaengine kvm ac97_bus snd_hda_intel iwlmvm snd_hda_codec iTCO_wdt iTCO_vendor_support mac80211 asus_nb_wmi irqbypass crct10dif_pclmul crc32_pclmul asus_wmi sparse_keymap snd_hda_core ghash_clmulni_intel intel_cstate intel_uncore snd_hwdep iwlwifi snd_seq uvcvideo snd_seq_device intel_rapl_perf snd_pcm videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 cfg80211 videobuf2_core videodev btusb hci_uart snd_timer snd btrtl media soundcore i2c_i801 btbcm btqca btintel bluetooth mei_me mei processor_thermal_device joydev intel_soc_dts_iosf shpchp intel_pch_thermal int3403_thermal rfkill intel_lpss_acpi intel_lpss acpi_als pinctrl_sunrisepoint pinctrl_intel kfifo_buf industrialio int3402_thermal tpm_crb int340x_thermal_zone int3406_thermal tpm_tis asus_wireless
[344807.130208]  tpm_tis_core int3400_thermal acpi_thermal_rel acpi_pad tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc hid_multitouch btrfs xor i915 nouveau raid6_pq mxm_wmi ttm i2c_algo_bit drm_kms_helper crc32c_intel drm serio_raw wmi video i2c_hid fjes
[344807.130282] CPU: 1 PID: 15666 Comm: totem Not tainted 4.10.8-200.fc25.x86_64 #1
[344807.130305] Hardware name: ASUSTeK COMPUTER INC. UX303UB/UX303UB, BIOS UX303UB.206 03/02/2016
[344807.130331] task: ffff9c8b762d8000 task.stack: ffffc0d186178000
[344807.130368] RIP: 0010:gen8_ppgtt_alloc_page_directories.isra.36+0x115/0x250 [i915]
[344807.130393] RSP: 0018:ffffc0d18617b880 EFLAGS: 00010246
[344807.130410] RAX: ffff9c8a33969280 RBX: 0000000000000003 RCX: 0000000000000003
[344807.130433] RDX: 0000000000000000 RSI: ffff9c8b745d5000 RDI: ffff9c8b744d8000
[344807.130457] RBP: ffffc0d18617b8d8 R08: 0000000000000000 R09: 0000000000000000
[344807.130480] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9c8b63766000
[344807.130502] R13: ffff9c8ae2599f10 R14: 00000000fc379000 R15: 0000000000800000
[344807.130525] FS:  00007fe2f6214a80(0000) GS:ffff9c8b83c80000(0000) knlGS:0000000000000000
[344807.130550] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[344807.130568] CR2: 0000000000000018 CR3: 00000001fa9a6000 CR4: 00000000003406e0
[344807.130590] Call Trace:
[344807.130615]  gen8_alloc_va_range_3lvl+0xfb/0x9e0 [i915]
[344807.130634]  ? sg_free_table+0x5c/0x70
[344807.130647]  ? sg_next+0x4/0x30
[344807.130660]  ? swiotlb_map_sg_attrs+0x49/0x110
[344807.130688]  gen8_alloc_va_range+0x23d/0x470 [i915]
[344807.130719]  i915_vma_bind+0x7e/0x170 [i915]
[344807.130747]  __i915_vma_do_pin+0x2f1/0x4a0 [i915]
[344807.130776]  i915_gem_execbuffer_reserve_vma.isra.30+0x144/0x1b0 [i915]
[344807.130809]  i915_gem_execbuffer_reserve.isra.31+0x44a/0x480 [i915]
[344807.130841]  i915_gem_do_execbuffer.isra.37+0x652/0x1820 [i915]
[344807.130861]  ? ___slab_alloc+0x294/0x540
[344807.130880]  ? enqueue_entity+0x113/0x6b0
[344807.130915]  i915_gem_execbuffer2+0xc5/0x240 [i915]
[344807.130950]  drm_ioctl+0x21b/0x4c0 [drm]
[344807.131001]  ? i915_gem_execbuffer+0x310/0x310 [i915]
[344807.131028]  ? pick_next_task_fair+0x324/0x4d0
[344807.131053]  do_vfs_ioctl+0xa3/0x5f0
[344807.131074]  SyS_ioctl+0x79/0x90
[344807.131095]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[344807.131119] RIP: 0033:0x7fe2edfb7787
[344807.131138] RSP: 002b:00007ffd5eaac918 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[344807.131183] RAX: ffffffffffffffda RBX: 000055dc72621040 RCX: 00007fe2edfb7787
[344807.131216] RDX: 00007ffd5eaac960 RSI: 00000000c0406469 RDI: 000000000000000c
[344807.131247] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[344807.131280] R10: 0000000000000550 R11: 0000000000000246 R12: 00007fe2f40c63d8
[344807.131312] R13: 00007ffd5eaabcc0 R14: 000055dc7224c4e0 R15: 00007fe2f40bf6a0
[344807.131345] Code: e6 48 8b 90 20 03 00 00 48 8b b8 d8 02 00 00 48 8b 52 08 48 83 ca 03 e8 aa cc ff ff 48 8b 45 b0 48 8b 4d c8 48 8b 10 48 8b 45 d0 <4c> 89 24 ca 48 0f ab 08 0f 1f 44 00 00 e9 53 ff ff ff 65 8b 05 
[344807.131492] RIP: gen8_ppgtt_alloc_page_directories.isra.36+0x115/0x250 [i915] RSP: ffffc0d18617b880
[344807.131534] CR2: 0000000000000018
[344807.140522] ---[ end trace 8c9f3becf22cb14e ]---

Comment 2 Philippe Malinge 2017-04-19 18:22:17 UTC
I use a Dell Precision 5520 with Core i7 7820HQ,
Fedora 25
kernel 4.10.10-200.fc25.x86_64,

when I go to Gnome Settings -> Details, the graphical environment hangs.

In journal : same message "[drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle"

Thanks for your help.

Comment 3 Robert Holmes 2017-04-23 07:38:23 UTC
This seems to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1441906

Comment 4 Tal Zarfati 2017-05-05 18:53:37 UTC
(In reply to Aaron Sowry from comment #1)
> I seem to have just encountered this now without the screen being blanked -
> the laptop simply stops responding to any input. ABRT registered a kernel
> oops associated with the event, but claims the backtrace "does not contain
> enough meaningful function frames to be reported". The reason for the crash
> ...
I don't think it is - I see this error all the time, but it comes without the stack trace - the only related info in dmesg is that 'i915_gem_idle_work_handler' message.

So because of that, I don't think it's a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1441906

Comment 5 Andy Wang 2017-05-22 10:14:54 UTC
I'm seeing the same behavior as the initial bug report on this bug and I'm not seeing anything like 1441906. I have no abrt report of the "hang" nor do I see the kernel panic from 1441906.  As the commenter in #4 states, I think this and 1441906 are potentially separate issues.

Comment 6 Andy Wang 2017-05-22 10:19:08 UTC
I forgot to add I'm seeing the behavior on an XPS 13 (9360) with the qhd+ screen in wayland.  I just switched to Xorg to see if the problem occurs there.

Comment 7 Aaron Sowry 2017-05-23 03:45:51 UTC
The backtrace I supplied as part of comment #1 does look very much like RH 1441906, and this is the one I seem to be encountering now. I haven't seen the "Timeout waiting for engines to idle" message for a while now. Kernel 4.10.15-200.fc25.x86_64. No idea if this is the same bug or not, but the identical timing/behaviour of them seems to suggest a dupe IMO.

Comment 8 Andy Wang 2017-05-23 14:57:04 UTC
Interesting.  I've never seen that oops before but I can reproduce the blank screen issue simply by leaving my XPS 13 up and running for about 45 min.  Come back and it's unresponsive but still reachable by network.

Comment 9 Inaky Perez-Gonzalez 2017-05-23 22:49:34 UTC
Also seeing this in a Lenovo X1 Carbon with Wayland on F25; attached stack trace on lenovo-x1.txt on 4.10.13-200

Comment 10 Inaky Perez-Gonzalez 2017-05-23 22:50 UTC
Created attachment 1281766 [details]
full kernel log and system data for this crash on a lenovo X1 carbon

Comment 11 Gordon Messmer 2017-05-25 14:32:31 UTC
I'm watching both this bug and #1441906.  I'm pretty sure they're separate issues.  I'm able to reproduce this bug, the drm:i915_gem_idle_work_handler, using kernel 4.11.2-200.fdo99295.fc25.x86_64, which was provided as a potential solution for #1441906.

Comment 12 Aaron Sowry 2017-05-25 20:47:46 UTC
Gordon - as of this morning I can confirm your findings. Just received the drm:i915_gem_idle_work_handler error with Dave's 4.11.2-200.fdo99295.fc25.x86_64. So separate issues then, I guess.

Comment 13 Andy Wang 2017-05-26 17:19:31 UTC
I tried the latest 4.10.17-200 today and had the same thing.  I logged into my xps 13, and left it alone and came back about 20 min later and input was frozen.

ssh'ed in and the only relevant message in the journal was:
May 26 11:12:01 hostname kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

no oopses at this point.

I did a 'shutdown -r now' to reboot it and it successfully rebooted but did log a couple of slow paths and drm timeouts.  Will attach those to this report.

Comment 14 Andy Wang 2017-05-26 17:20 UTC
Created attachment 1282701 [details]
relevant dmesg sections

kernel messages during shutdown/reboot after the "hang" has occurred.

Comment 15 Tomislav Ivek 2017-06-07 20:53:57 UTC
A "me too" on a Lenovo ThinkPad T470 i5-7200U running 4.11.3-200.fc25.x86_64 and GNOME Xorg. 
After leaving the screen locked for 2 hours I found the display did not come back, however the machine responded on the network. Relevant line in the journal:
Lip 07 11:48:01 yakul-local kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

No oopses except a probably unrelated message:
Lip 07 09:34:44 yakul-local kernel: Uhhuh. NMI received for unknown reason 2c on CPU 0.
Lip 07 09:34:44 yakul-local kernel: Do you have a strange power saving mode enabled?
Lip 07 09:34:44 yakul-local kernel: Dazed and confused, but trying to continue

Comment 16 Gordon Messmer 2017-06-07 22:57:23 UTC
I'm still looking for useful information.  So far, I've confirmed that X11 has this problem, just like Wayland, and that while the error text "*ERROR* Timeout waiting for engines to idle" was added in 4.10, the error itself seems to be present in 4.9.  Under older kernels, I see a NULL pointer dereference instead of the error:

Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [1653], reason: Hang on render ring, action: reset
Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jun 06 23:02:08 hurricane.ee.washington.edu kernel: drm/i915: Resetting chip after gpu hang
Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] RC6 on
Jun 06 23:02:08 hurricane.ee.washington.edu kernel: [drm] GuC firmware load skipped
Jun 06 23:02:18 hurricane.ee.washington.edu kernel: drm/i915: Resetting chip after gpu hang
Jun 06 23:02:18 hurricane.ee.washington.edu kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
Jun 06 23:02:18 hurricane.ee.washington.edu kernel: IP: [<ffffffffc03158f3>] reset_common_ring+0xc3/0x170 [i915] 

Today I booted my system with the kernel arg "i915.enable_rc6=0".  I expect to know tomorrow if that resolves the problem.  Any additional testing from other users would be useful as well.

Comment 17 Gordon Messmer 2017-06-08 14:10:45 UTC
Confirmed.  Booting with kernel arg "i915.enable_rc6=0" appears to effectively solve this problem.

Comment 18 Andy Wang 2017-06-16 01:31 UTC
Created attachment 1288224 [details]
kernel messages

Booting with enable_rc6=0 changes the behavior for me.  When the screen "locks" keyboard input is not hung.  I'm able to ctrl-alt-fX out to a virtual console and hit ctrl-alt-del to reboot even though I don't see anything until just before the reboot.  During this time, I get a bunch of kernel errors (see attached output).

Comment 19 Gordon Messmer 2017-06-16 02:37:16 UTC
Andy, can you post the content of /proc/cmdline?  Asking because the correct arg is "i915.enable_rc6=0"

The i915 prefix is required in order for enable_rc6 to be interpreted as an argument to the i915 module.  If you're booting with just "enable_rc6=0", that's not going to fix the problem.
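
The prefix rule can be illustrated with a short sketch. This is an assumption-laden example rather than how the kernel itself parses arguments: it only collects the `module.param=value` pairs from a command-line string such as the content of /proc/cmdline, and the function name `module_params_from_cmdline` is hypothetical.

```python
def module_params_from_cmdline(cmdline, module="i915"):
    """Collect 'module.param=value' arguments for one module.

    Arguments without the 'module.' prefix are ignored, which mirrors
    the point above: a bare 'enable_rc6=0' is not addressed to i915.
    """
    prefix = module + "."
    params = {}
    for arg in cmdline.split():
        if arg.startswith(prefix) and "=" in arg:
            name, value = arg[len(prefix):].split("=", 1)
            params[name] = value
    return params
```

Called on a command line that contains only "enable_rc6=0", the function returns an empty dict; with "i915.enable_rc6=0" it returns the parameter and its value.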

Comment 20 Andy Wang 2017-06-16 03:48:25 UTC
I'm not doing it via the kernel command line.  I'm doing it via module configuration:
$ cat /etc/modprobe.d/i915-local.conf 
options i915 enable_rc6=0

And then rebuilt my initramfs via dracut to ensure the module parameters take effect at boot time.  i915 is loaded as a module on Fedora.

I've confirmed that enable_rc6 is set to 0 a couple of ways. 
kernel messages show:
[    1.596148] Setting dangerous option enable_rc6 - tainting kernel
and
$ cat /sys/class/drm/card0/power/rc6_enable 
0
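
A third way to check the live value could be to read the module parameter from sysfs. The sketch below assumes the usual /sys/module/<module>/parameters/<name> layout and that the parameter is exposed as a readable file there; the helper name `read_module_param` and its `sysfs_root` argument are hypothetical and exist only to keep the example testable.

```python
from pathlib import Path


def read_module_param(name, module="i915", sysfs_root="/sys/module"):
    """Return the live value of a module parameter as a string.

    Returns None when the file is missing or unreadable, for example
    when the module is not loaded or the parameter is not exposed.
    """
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

With the configuration from this comment, `read_module_param("enable_rc6")` would be expected to return "0" on the affected kernels, but this has not been verified here.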

Comment 21 nachopro 2017-07-30 05:25:58 UTC
Hi, I'm running ArchLinux on a Dell Latitude E7270 Laptop.

I have the same issue under Kernel 4.11.9.

I "solved" my random freezes with "i915.enable_rc6=0"

Here is my history: https://bbs.archlinux.org/viewtopic.php?pid=1727435

Comment 22 Andy Wang 2017-07-31 03:16:39 UTC
4.12.4-300.fc26.x86_64 from updates-testing seems to work okay too.

Comment 23 Andy Wang 2017-07-31 04:47:43 UTC
I was wrong.  I didn't wait long enough. It still occurred with 4.12.4:
Jul 30 23:01:01 host kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

Comment 24 Kr4t0s 2017-08-01 12:08:27 UTC
I'm having the same issue with a VAIO Z on kernel 4.11.3: when the screen goes off, I'm unable to get it back after some time. The machine answers pings and accepts SSH logins, but even though SSH logs me in, I never get a proper shell, so I always have to power the laptop off and on. The message

host kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

appears in the logs. I haven't tried using "i915.enable_rc6=0" because I don't think it's the ideal solution: the GPU would run at high power all the time, destroying battery life.

Comment 25 Gordon Messmer 2017-08-01 16:50:24 UTC
I'm inclined to think that this is the same bug that affects GT3 and GT4 parts, but I'm seeing it on GT2 (HD Graphics 520, Dell Latitude E7470).

https://bugs.freedesktop.org/show_bug.cgi?id=94161
https://osdn.net/projects/uclinux-h8/scm/git/linux/commits/d528a6a0f3fd346bd7cc2de611a4149b6ebaab41

I'm going to see if I can build a kernel with NEEDS_WaRsDisableCoarsePowerGating for GT2 as well...

Comment 26 Kr4t0s 2017-08-01 17:43:18 UTC
FYI my VAIO Z Flip uses an Intel Iris Graphics 550 GPU. If you need more info let me know.
Thanks!

Comment 27 Kr4t0s 2017-08-01 18:33:25 UTC
Update: I have just updated the BIOS of my VAIO Z Flip from:

[    0.000000] DMI: VAIO Corporation VJZ13B/VAIO, BIOS R1193SA 09/22/2016

to:

[    0.000000] DMI: VAIO Corporation VJZ13B/VAIO, BIOS R1197SA 06/05/2017

I will see if this brings more stability and maybe fixes the issue. I will report back later.

Comment 28 Gordon Messmer 2017-08-02 15:20 UTC
Created attachment 1308296 [details]
Set NEEDS_WaRsDisableCoarsePowerGating for Skylake GT2 GPUs

It's much too early to conclude that this resolves the problem, but I'm using the attached patch with 4.12.4-300.fc26.x86_64.  RC6 is on and my laptop is not currently locking up during screen blanking.

Comment 29 Gordon Messmer 2017-08-03 16:42:13 UTC
That didn't resolve the problem.  Looking back over the comments here, I also see several Kaby Lake systems, where NEEDS_WaRsDisableCoarsePowerGating is targeted at Skylake systems.  So, it looks like there's a bigger problem with i915 rc6 support.

Comment 30 Kr4t0s 2017-08-06 14:55:50 UTC
Bad news: after updating the BIOS of my laptop to the latest version and upgrading to kernel 4.12.4, I'm still experiencing the issue. Same message in the logs:

Aug 06 11:26:02 localhost.localdomain kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

The screen is blank and I'm unable to unblank it. Keys light up for a moment when pressed and then turn off. I'm unable to SSH into the machine, even though the login asks for the password.

This is definitely a bug in i915 rc6 support. In case it matters, my VAIO Z Flip uses a Skylake CPU:

model name	: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz

Kernel:

[root@localhost]# uname -a
Linux localhost.localdomain 4.12.4 #1 SMP Sat Aug 5 11:00:30 UYT 2017 x86_64 x86_64 x86_64 GNU/Linux


i915 module parameters used:

[root@localhost ]# systool -vm i915
Module = "i915"

  Attributes:
    coresize            = "1277952"
    initsize            = "0"
    initstate           = "live"
    refcnt              = "21"
    srcversion          = "9F705B72B03F193BC3EF19B"
    taint               = ""
    uevent              = <store method only>

  Parameters:
    alpha_support       = "N"
    disable_display     = "N"
    disable_power_well  = "1"
    edp_vswing          = "0"
    enable_cmd_parser   = "Y"
    enable_dc           = "-1"
    enable_dp_mst       = "Y"
    enable_dpcd_backlight= "N"
    enable_execlists    = "1"
    enable_fbc          = "0"
    enable_guc_loading  = "0"
    enable_guc_submission= "0"
    enable_gvt          = "N"
    enable_hangcheck    = "Y"
    enable_ips          = "1"
    enable_ppgtt        = "3"
    enable_psr          = "1"
    enable_rc6          = "1"
    error_capture       = "Y"
    fastboot            = "N"
    force_reset_modeset_test= "N"
    guc_firmware_path   = "(null)"
    guc_log_level       = "-1"
    huc_firmware_path   = "(null)"
    inject_load_failure = "0"
    invert_brightness   = "0"
    load_detect_test    = "N"
    lvds_channel_mode   = "0"
    lvds_use_ssc        = "-1"
    mmio_debug          = "0"
    modeset             = "-1"
    nuclear_pageflip    = "N"
    panel_ignore_lid    = "1"
    prefault_disable    = "N"
    reset               = "Y"
    semaphores          = "0"
    use_mmio_flip       = "0"
    vbt_sdvo_panel_type = "-1"
    verbose_state_checks= "Y"

Again, setting i915.enable_rc6 to 0 is NOT an option as it destroys battery life. I hope this bug can be fixed, because i915 rc6 bugs have been around for a long time on Skylake and Kaby Lake CPUs :(

Comment 31 Anderson Silva 2017-08-11 20:16:35 UTC
Running a Lenovo X270 with i7-7500U and Intel HD 620, Fedora 26, kernel 4.12.5 (testing), and the issue is happening for me.

Comment 32 Kr4t0s 2017-08-12 16:42:59 UTC
Issue happened again, same message in the logs:

Aug 12 12:33:43 localhost.localdomain kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

Comment 33 Biszkhopt 2017-08-13 19:19:30 UTC
My experience with this issue:

On my Lenovo E470 with Intel HD 620 and NVIDIA 940MX I have experienced the same problem, with the aforementioned i915 timeout error, with main repo kernel versions on Fedora 26. The issue was also present on Arch Linux, across 4.xx to 4.12 (latest) kernel versions. On Arch it persisted across DEs like GNOME and XFCE (hence I doubt it's a Wayland/Xorg issue). It was present regardless of drivers; I have tried intel-xorg as well as modesetting.

On Arch, the issue does not occur with i3wm and xfce4-power-manager, even with rc6 enabled (and a bunch of other stuff through the TLP power manager). I have never experienced the issue with i3wm + xfce4-power-manager. It affects me on bigger DEs, regardless of running Wayland or Xorg, and regardless of running GDM or LightDM.

What is baffling me is that this issue is non-existent on my other laptop: Lenovo L570 with only intel (hd620) graphics. This computer is running Fedora 26 with i915.enable_rc6 returning 1. It has literally never happened on this one.

Both laptops had their BIOS/UEFI constantly updated to newest possible versions, and, to this day, both behave in described manner.

Comment 34 Kr4t0s 2017-08-14 12:44:52 UTC
(In reply to Biszkhopt from comment #33)
> My experience with this issue:
> 
> On my Lenovo e470 with intel hd620 and nvidia 940mx I have experienced same
> problem, with aforementioned i915 timeout error, with main repo kernel
> versions on Fedora 26. The issue was also present on Arch Linux, across 4.xx
> to 4.12 (latest) kernel versions. On Arch it was persistent across DE's like
> Gnome and XFCE (hence i doubt it's wayland/xorg issue). It was present
> regardless of drivers; I have tried intel-xorg, as well as modesetting.
> 
> As for Arch issue is non-existent on i3wm with xfce4-power-manager, even
> with rc6 enabled (and bunch of other stuff through TLP power manager). I
> have never experienced issue with i3wm + xfce4-power manager. It's affecting
> me on bigger DE's, regardless of running wayland or xorg, regardless of
> running GDM or LightDM.
> 
> What is baffling me is that this issue is non-existent on my other laptop:
> Lenovo L570 with only intel (hd620) graphics. This computer is running
> Fedora 26 with i915.enable_rc6 returning 1. It has literally never happened
> on this one.
> 
> Both laptops had their BIOS/UEFI constantly updated to newest possible
> versions, and, to this day, both behave in described manner.

Can you post the output of systool -vm i915 from both machines? I have also been thinking that desktop environments such as Cinnamon or GNOME could somehow be causing this issue.

Thanks

Comment 35 Matthias Schiffer 2017-08-14 17:25:47 UTC
(In reply to Biszkhopt from comment #33)
> As for Arch issue is non-existent on i3wm with xfce4-power-manager, even
> with rc6 enabled (and bunch of other stuff through TLP power manager). I
> have never experienced issue with i3wm + xfce4-power manager. It's affecting
> me on bigger DE's, regardless of running wayland or xorg, regardless of
> running GDM or LightDM.

I experience this issue on Arch Linux with xmonad and xfce4-power-manager (the last time I checked without enable_rc6=0 was on some late 4.12-rc kernel). Will report back when I have checked on current 4.13-rc.

Hardware is a ThinkPad T470 (Intel(R) HD Graphics 620).

Comment 36 Andy Wang 2017-08-14 20:43:16 UTC
I just checked and apparently this bug hasn't been reported upstream at all.  Might make sense to report it there so:
https://bugs.freedesktop.org/show_bug.cgi?id=102224

Comment 37 Kr4t0s 2017-08-14 22:33:26 UTC
(In reply to Andy Wang from comment #36)
> I just checked and apparently this bug hasn't been reported upstream at all.
> Might make sense to report it there so:
> https://bugs.freedesktop.org/show_bug.cgi?id=102224

Added a post in https://bugs.freedesktop.org/show_bug.cgi?id=102224

Thanks

Comment 38 Kr4t0s 2017-08-16 11:45:00 UTC
UPDATE: 

It seems the issue has disappeared! I modified 3 i915 module options and, after 3 days of testing, including leaving the laptop on overnight, I haven't experienced the problem again. Battery consumption has been great, around 2.5 watts when idle.
 
The module options modified were:

enable_guc_loading  = "1"
enable_guc_submission= "1"
disable_power_well  = "0"

For the GuC module options, make sure you have installed the latest firmware from https://01.org/linuxgraphics/downloads/firmware. In your dmesg after booting you will see these messages:

[    2.303462] Setting dangerous option enable_guc_loading - tainting kernel
[    2.303463] Setting dangerous option enable_guc_submission - tainting kernel
[    2.340111] [drm] GuC submission enabled (firmware i915/skl_guc_ver6_1.bin [version 6.1])

These are the GRUB boot options used:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.12.4 root=UUID=d9310d7b-9422-463c-89ec-e1431caba3c4 ro nosplash quiet noiswmd i915.enable_rc6=1 i915.enable_psr=1 i915.disable_power_well=0 i915.enable_guc_loading=1 i915.enable_guc_submission=1 i915.enable_fbc=1 pcie_aspm=force resume=/dev/nvme0n1p6

Again this has worked on a VAIO Z Flip 
model name	: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz

Kernel:

[root@localhost]# uname -a
Linux localhost.localdomain 4.12.4 #1 SMP Sat Aug 5 11:00:30 UYT 2017 x86_64 x86_64 x86_64 GNU/Linux

Please give it a try and let me know if it fixes your issues
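
For anyone wanting to try the three changed options, the two ways of passing them that appear in this thread (kernel command line and a modprobe.d options line) can be generated from one mapping. This is a minimal sketch; the helper names `as_kernel_args` and `as_modprobe_line` are hypothetical, and only the three options listed in this comment are included.

```python
# The three i915 options changed in this comment.
CHANGED_OPTIONS = {
    "enable_guc_loading": 1,
    "enable_guc_submission": 1,
    "disable_power_well": 0,
}


def as_kernel_args(options, module="i915"):
    """Format options as 'module.name=value' kernel command-line arguments."""
    return " ".join(f"{module}.{name}={value}" for name, value in options.items())


def as_modprobe_line(options, module="i915"):
    """Format options as a single 'options <module> ...' modprobe.d line."""
    pairs = " ".join(f"{name}={value}" for name, value in options.items())
    return f"options {module} {pairs}"
```

Either output would then be added to the boot loader configuration or to a file under /etc/modprobe.d/, as described in the earlier comments.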

Comment 39 Biszkhopt 2017-08-16 12:06:07 UTC
> Can you post the output of systool -vm i915 from both machines?? I have also
> been thinking that somehow Desktop environments as Cinnamon or GNOME could
> be causing this issue..
> 
> Thanks

I currently have only my E470 on me, so I can post the output from this machine; I will have access to the L570 in roughly two weeks and can post that one then.

E470:

kernel: 4.12.6-1-ARCH

systool -vm i915 output:

Module = "i915"

  Attributes:
    coresize            = "1445888"
    initsize            = "0"
    initstate           = "live"
    refcnt              = "16"
    taint               = ""
    uevent              = <store method only>

  Parameters:
    alpha_support       = "Y"
    disable_display     = "N"
    disable_power_well  = "1"
    edp_vswing          = "0"
    enable_cmd_parser   = "Y"
    enable_dc           = "-1"
    enable_dp_mst       = "Y"
    enable_dpcd_backlight= "N"
    enable_execlists    = "1"
    enable_fbc          = "1"
    enable_guc_loading  = "0"
    enable_guc_submission= "0"
    enable_gvt          = "N"
    enable_hangcheck    = "Y"
    enable_ips          = "1"
    enable_ppgtt        = "3"
    enable_psr          = "0"
    enable_rc6          = "1"
    error_capture       = "Y"
    fastboot            = "N"
    force_reset_modeset_test= "N"
    guc_firmware_path   = "(null)"
    guc_log_level       = "-1"
    huc_firmware_path   = "(null)"
    inject_load_failure = "0"
    invert_brightness   = "0"
    load_detect_test    = "N"
    lvds_channel_mode   = "0"
    lvds_use_ssc        = "-1"
    mmio_debug          = "0"
    modeset             = "-1"
    nuclear_pageflip    = "N"
    panel_ignore_lid    = "1"
    prefault_disable    = "N"
    reset               = "Y"
    semaphores          = "0"
    use_mmio_flip       = "0"
    vbt_sdvo_panel_type = "-1"
    verbose_state_checks= "Y"

  Sections:
    .altinstr_aux       = "0xffffffffc0c8efb4"
    .altinstr_replacement= "0xffffffffc0c8ee61"
    .altinstructions    = "0xffffffffc0cd0e84"
    .bss                = "0xffffffffc0ce3100"
    .data..cacheline_aligned= "0xffffffffc0ce2700"
    .data..read_mostly  = "0xffffffffc0ce0760"
    .data.unlikely      = "0xffffffffc0ce06f0"
    .data               = "0xffffffffc0cd8a60"
    .exit.text          = "0xffffffffc0c8effc"
    .fixup              = "0xffffffffc0c8f019"
    .gnu.linkonce.this_module= "0xffffffffc0ce2dc0"
    .init.text          = "0xffffffffc0ae4000"
    .note.gnu.build-id  = "0xffffffffc0c90000"
    .parainstructions   = "0xffffffffc0ca6a20"
    .ref.data           = "0xffffffffc0ce0980"
    .rodata.str1.1      = "0xffffffffc0ca6d1c"
    .rodata.str1.8      = "0xffffffffc0cb2c68"
    .rodata             = "0xffffffffc0c900a0"
    .smp_locks          = "0xffffffffc0ccdb4c"
    .strtab             = "0xffffffffc0b05e90"
    .symtab             = "0xffffffffc0ae6000"
    .text               = "0xffffffffc0ba5000"
    .text.unlikely      = "0xffffffffc0c8f06e"
    __bug_table         = "0xffffffffc0ccdec0"
    __ex_table          = "0xffffffffc0cd6d10"
    __jump_table        = "0xffffffffc0cd8000"
    __kcrctab_gpl       = "0xffffffffc0c90080"
    __ksymtab_gpl       = "0xffffffffc0c90030"
    __ksymtab_strings   = "0xffffffffc0cd6cb8"
    __mcount_loc        = "0xffffffffc0cd12f0"
    __param             = "0xffffffffc0cd66a0"
    __tracepoints_ptrs  = "0xffffffffc0cd6d70"
    __tracepoints       = "0xffffffffc0ce1a60"
    __tracepoints_strings= "0xffffffffc0cd6f00"
    _ftrace_events      = "0xffffffffc0ce07e0"
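
Comparing two such dumps by eye is error-prone, so a small sketch of a comparison helper may be useful. It assumes the `systool -vm i915` layout shown in this thread (indented section headers ending in a colon, followed by `name = "value"` lines); the function names `parse_parameters` and `diff_parameters` are hypothetical.

```python
import re

# Matches lines such as:  enable_fbc          = "1"
PARAM_LINE = re.compile(r'^\s*(\S+?)\s*=\s*"(.*)"\s*$')


def parse_parameters(systool_output):
    """Extract the 'Parameters:' block of systool output as a dict."""
    params = {}
    in_block = False
    for line in systool_output.splitlines():
        stripped = line.strip()
        # Section headers look like 'Attributes:', 'Parameters:', 'Sections:'.
        if stripped.endswith(":") and "=" not in stripped:
            in_block = stripped == "Parameters:"
            continue
        if in_block:
            match = PARAM_LINE.match(line)
            if match:
                params[match.group(1)] = match.group(2)
    return params


def diff_parameters(first, second):
    """Return {name: (first_value, second_value)} for parameters that differ."""
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }
```

Feeding it the dump from comment 30 and the dump above would list only the parameters whose values differ between the two machines.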

Comment 40 Tomislav Ivek 2017-08-21 14:51:48 UTC
(In reply to Kr4t0s from comment #38)
> UPDATE: 
> 
> It seems the issue has disappeared! I modified 3 i915 module options and
> after 3 days of testing including leaving the laptop on overnight i haven't
> experience the problems again. Battery consumption has been great, around
> 2.5 watts when idle.
>  
> The module options modified were:
> 
> enable_guc_loading  = "1"
> enable_guc_submission= "1"
> disable_power_well  = "0"
> 
> for the guc module options make sure you have installed the latest firmware
> from https://01.org/linuxgraphics/downloads/firmware. In your dmesg after
> booting you will see these messages:
> 
> [    2.303462] Setting dangerous option enable_guc_loading - tainting kernel
> [    2.303463] Setting dangerous option enable_guc_submission - tainting
> kernel
> [    2.340111] [drm] GuC submission enabled (firmware
> i915/skl_guc_ver6_1.bin [version 6.1])
> 
> These are the GRUB boot options used:
> 
> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.12.4
> root=UUID=d9310d7b-9422-463c-89ec-e1431caba3c4 ro nosplash quiet noiswmd
> i915.enable_rc6=1 i915.enable_psr=1 i915.disable_power_well=0
> i915.enable_guc_loading=1 i915.enable_guc_submission=1 i915.enable_fbc=1
> pcie_aspm=force resume=/dev/nvme0n1p6
> 
> Again this has worked on a VAIO Z Flip 
> model name	: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
> 
> Kernel:
> 
> [root@localhost]# uname -a
> Linux localhost.localdomain 4.12.4 #1 SMP Sat Aug 5 11:00:30 UYT 2017 x86_64
> x86_64 x86_64 GNU/Linux
> 
> Please give it a try and let me know if it fixes your issues

Thank you for the tip! I was not happy with disabling RC6, so I've been running my ThinkPad T470 with these options instead. A couple of days in, the blanked-screen issue has not reappeared. I will keep testing for a while longer.
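
For anyone comparing setups, it may help to confirm which i915 options the kernel actually booted with. The following is a minimal sketch, not part of the original report: `parse_i915_options` and `running_i915_options` are hypothetical helper names, and reading `/proc/cmdline` assumes a Linux system.

```python
from pathlib import Path


def parse_i915_options(cmdline: str) -> dict:
    """Return the i915.* options found on a kernel command line."""
    options = {}
    for token in cmdline.split():
        # Only tokens of the form i915.<name>=<value> are of interest here.
        if token.startswith("i915.") and "=" in token:
            name, value = token[len("i915."):].split("=", 1)
            options[name] = value
    return options


def running_i915_options(path: str = "/proc/cmdline") -> dict:
    """Parse the command line of the currently running kernel."""
    return parse_i915_options(Path(path).read_text())
```

Calling `running_i915_options()` on a machine booted with the command line quoted above should report `enable_guc_loading`, `enable_guc_submission` and `disable_power_well`, among others. The values the driver ended up using can differ from what was requested, so it is worth comparing against the files under `/sys/module/i915/parameters/` as well.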

Comment 41 Kr4t0s 2017-08-21 15:36:27 UTC
(In reply to Tomislav Ivek from comment #40)
> Thank you for the tip! I was not happy with disabling RC6 so I've been
> running my ThinkPad T470 with these options instead. A couple of days in and
> the blanked screen issue did not appear yet. Will keep testing for a while
> longer.

No problem. Since I last updated this thread I have not experienced this issue: nearly a week without problems :)

Comment 42 Gordon Messmer 2017-08-30 18:38:30 UTC
I've been running for about a week with boot options:
"i915.enable_guc_loading=1 i915.enable_guc_submission=1"

This configuration seems stable.  I have not measured power consumption to see if the system is actually entering low power states.

I worry that this bug won't actually be solved, and that the future of the i915 driver is either using binary blobs to manage power or doing power management in software with frequent hangs/crashes.
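
One way to check whether the GPU is really reaching a low power state is to sample the RC6 residency counter twice and see how much of the interval was spent in RC6. This is a minimal sketch under assumptions: the sysfs path below is where i915 has commonly exposed the counter, but the card index and file layout vary by kernel and machine, and the function names are hypothetical.

```python
import time
from pathlib import Path

# Assumed sysfs location; adjust the card index or path for your system.
RC6_RESIDENCY = Path("/sys/class/drm/card0/power/rc6_residency_ms")


def rc6_residency_percent(start_ms: int, end_ms: int, elapsed_s: float) -> float:
    """Share of the sampling window the GPU spent in RC6, as a percentage."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return 100.0 * (end_ms - start_ms) / (elapsed_s * 1000.0)


def sample_rc6(interval_s: float = 10.0, path: Path = RC6_RESIDENCY) -> float:
    """Read the residency counter twice and report the RC6 share in between."""
    start = int(path.read_text())
    t0 = time.monotonic()
    time.sleep(interval_s)
    end = int(path.read_text())
    return rc6_residency_percent(start, end, time.monotonic() - t0)
```

On an idle desktop a value close to 100 would suggest RC6 is working; a value near 0 would suggest the GPU is staying awake. powertop reports similar information and is probably the better tool for an actual power measurement.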

Comment 43 Andy Wang 2017-08-30 20:03:59 UTC
(In reply to Gordon Messmer from comment #42)
> I've been running for about a week with boot options:
> "i915.enable_guc_loading=1 i915.enable_guc_submission=1"
> 
> This configuration seems stable.  I have not measured power consumption to
> see if the system is actually entering low power states.
> 
> I worry that this bug won't actually be solved, and the future of the i915
> driver is using binary blobs to manage power, or do power management in
> software with frequent hangs/crashes.

One of the developers upstream asked for debug info:
https://bugs.freedesktop.org/show_bug.cgi?id=102224

I would encourage you to provide that if you can.  I haven't had the opportunity.

Comment 44 Kr4t0s 2017-08-31 14:10:04 UTC
(In reply to Gordon Messmer from comment #42)
> I've been running for about a week with boot options:
> "i915.enable_guc_loading=1 i915.enable_guc_submission=1"
> 
> This configuration seems stable.  I have not measured power consumption to
> see if the system is actually entering low power states.
> 
> I worry that this bug won't actually be solved, and the future of the i915
> driver is using binary blobs to manage power, or do power management in
> software with frequent hangs/crashes.

The system does enter low power states with those i915 options. Seems pretty stable for me, weeks without an issue.

Comment 45 Matthias Schiffer 2017-08-31 14:17:00 UTC
Upgrading to a 4.13-rc kernel has completely solved this issue for me, without trying any of the other options suggested in this ticket; no hangs for 3 weeks now. Power management is working fine, and power consumption is significantly lower than with i915.enable_rc6=0.

Comment 46 Kr4t0s 2017-09-01 12:39:52 UTC
(In reply to Matthias Schiffer from comment #45)
> Upgrading to a 4.13-rc kernel has completely solved this issue for me,
> without trying any other options suggested in this ticket; no hangs for 3
> weeks now. Power management is working fine, power consumption is
> significantly lower than with enable_rc6=0.

It would be interesting to see what i915-related changes went into the 4.13.x kernel compared with the 4.12.x series.

Comment 47 Gordon Messmer 2017-09-10 17:03:25 UTC
After finding the enable_guc_loading option stable, I disabled that option and added the debugging options requested by the Intel devs in the freedesktop.org ticket.  I think that was on the 1st or 2nd of this month.  Since then, I have been unable to reproduce the original problem under kernel 4.12 or 4.11.

If the problem recurs, I'll provide additional information.  In the meantime, I wonder if loading the GuC firmware introduced a persistent change.  If the problem were solved by a firmware update, that would explain why I can no longer reproduce the problem.

Comment 48 Kr4t0s 2017-09-14 15:24:59 UTC
UPDATE: I have found an issue with the stable 4.13 kernel: the GPU becomes stuck powered on at 100% for no apparent reason. CPU usage and load are low; the only way to notice is the heat coming from the laptop and the idle stats in powertop. This is dangerous, as it can kill the battery or maybe even degrade the life of the GPU.
I have reverted to my older kernel, 4.12.4.
Can anyone confirm this?

Comment 49 Gordon Messmer 2017-09-21 18:38:10 UTC
(In reply to Gordon Messmer from comment #47)
> If the problem recurs, I'll provide additional information.

I ran 4.11.11-300.fc26.x86_64 with "drm.debug=0x1e log_buf_len=2M" for a few weeks and was not able to reproduce the problem.

Yesterday I removed those options, and today I got the blank-screen hang and "*ERROR* Timeout waiting for engines to idle" error message.

Seems the failure might not manifest while debugging is enabled.
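
Since the hang can take days to show up, it may be easier to scan saved kernel logs for the error than to watch for it. A minimal sketch, assuming the message text stays as quoted in this report; `find_idle_timeouts` is a hypothetical helper name.

```python
import re

# Message text as quoted in this report; other kernel versions may word it differently.
IDLE_TIMEOUT = re.compile(r"\*ERROR\* Timeout waiting for engines to idle")


def find_idle_timeouts(lines):
    """Return the log lines that contain the i915 idle-timeout error."""
    return [line.rstrip("\n") for line in lines if IDLE_TIMEOUT.search(line)]
```

The function accepts any iterable of lines, so it can be fed an open log file or `sys.stdin`, for example with the output of `journalctl -k` piped in.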

Comment 50 Fedora End Of Life 2017-11-16 19:34:52 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 51 Matteo Brancaleoni 2017-11-19 09:14:51 UTC
I still see this on FC27 on a Clevo notebook. I will try the above workaround to check whether the issue persists.

Comment 52 Justin Chiu 2017-12-08 12:00:55 UTC
I am also experiencing this on FC27 4.13.16-302.fc27.x86_64 (also experienced with earlier 4.13.x kernels). My system is a Lenovo T470 with Intel integrated graphics.

Comment 53 Fedora End Of Life 2017-12-12 10:21:35 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 54 aaronsloman 2018-03-30 09:25:41 UTC
It is possible that this bug is the same as or closely related to a bug I reported here on 5th March 2018 (found on Clevo W515LU laptop):
https://bugzilla.redhat.com/show_bug.cgi?id=1551373

I reported that bug as fixed on 21st March 2018 (very impressive speed), as of kernel-4.15.10-300.fc27.x86_64.

