Description of problem: xorg crashes with the following message in the log: Fatal server error: Failed to submit batchbuffer: Input/output error Then the computer locks up. Version-Release number of selected component (if applicable): kernel-PAE-2.6.32.9-70.fc12.i686 How reproducible: always Steps to Reproduce: 1. boot into xorg Actual results: System crashes Expected results: No crash Additional info: xorg-x11-server-Xorg-1.7.5.901-4.fc12 #lspci | grep VGA 00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03) This appears in /var/log/messages: Mar 13 01:23:06 localhost kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Mar 13 01:23:07 localhost kernel: render error detected, EIR: 0x00000000 Mar 13 01:23:07 localhost kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 2080 at 2079) Mar 13 01:23:07 localhost abrt[1979]: can't read /proc/1570/exe link The system is a Dell Inspiron 1100. The system is stable using the kernel-PAE-2.6.31.12-174.2.22.fc12.i686 kernel. I will be happy to provide any additional details the developers require. There is no xorg.conf on this system. The system also crashed with 2.6.32.9-67 in the same way. I'm not sure this is worth anything, but the behavior of the video changed with the switch to 2.6.32. In the 2.6.31 kernel the gdm greeter would be the wrong resolution, although the xfce desktop would reset the screen to the correct resolution upon login. In 2.6.32 the gdm greeter starts with the native lcd resolution.
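A note on the numbers in the report above: the `-5` in `i915_do_wait_request returns -5` and the X server's "Input/output error" are plausibly the same failure seen from two sides, since errno 5 on Linux is `EIO`. A minimal Python sketch to check that mapping (the helper name `describe_kernel_errno` is made up for illustration and is not part of any driver or tool):

```python
import errno
import os


def describe_kernel_errno(ret):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = -ret
    # errno.errorcode maps numeric codes to symbolic names such as "EIO".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the platform's human-readable message for the code.
    return name, os.strerror(code)


name, message = describe_kernel_errno(-5)
print(name, message)
```

This only confirms how the error code is named; it says nothing about why the GPU stopped responding.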
In 2.6.31 I get this message in the kernel log: render error detected, EIR: 0x00000010 [drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010, masking render error detected, EIR: 0x00000010
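The two EIR values seen so far differ: 0x00000010 on 2.6.31 has exactly one bit set, while 0x00000000 on 2.6.32 has none, i.e. no error bit was latched when the hang was reported. A small sketch for listing the set bits of such a register value (the helper `set_bits` is hypothetical; what each bit means depends on the chipset and is not asserted here):

```python
def set_bits(value):
    """Return the positions of all bits that are set in a register value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(set_bits(0x00000010))  # EIR value from the 2.6.31 log
print(set_bits(0x00000000))  # EIR value logged alongside the 2.6.32 hangs
```

The bit positions would still need to be looked up in the register definitions for the specific hardware generation.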
With kernel 2.6.32.10-90.fc12.i686 $ dmesg [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung render error detected, EIR: 0x00000000 [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 5701 at 5700) Works fine on kernel 2.6.31...
Also on x86_64 with kernel 2.6.32.10-90.fc12.x86_64: $ dmesg [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung render error detected, EIR: 0x00000000 [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 712871 at 712843)
Same thing on kernel 2.6.32.11-99.fc12.i686: $ cat messages Apr 14 08:06:06 kathleen kernel: Linux version 2.6.32.11-99.fc12.i686 (mockbuild.fedoraproject.org) (gcc version 4.4.3 20100127 (Red Hat 4.4.3-4) (GCC) ) #1 SMP Mon Apr 5 16:32:08 EDT 2010 ... With Apr 14 09:00:00 kathleen kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Apr 14 09:00:00 kathleen kernel: render error detected, EIR: 0x00000000 Apr 14 09:00:00 kathleen kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 15227 at 15226) ...
The same symptoms occur with: kernel-2.6.32.11-99.fc12.x86_64 xorg-x11-server-Xorg-1.7.6-1.fc12.x86_64 Bill
Started seeing this on Apr 6 on a Fujitsu P7120 with kernel-PAE-2.6.32.11-90.fc12.i686 through kernel-PAE-2.6.32.11-99.fc12.i686 xorg-x11-server-Xorg-1.7.6-1.fc12.i686 Apr 22 09:06:10 localhost kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Apr 22 09:06:10 localhost kernel: render error detected, EIR: 0x00000000 Apr 22 09:06:10 localhost kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 386846 at 386826)
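Since every report so far carries the same `awaiting N at M` line, it may help to pull those numbers out of a log mechanically and compare them across machines. A rough sketch, assuming the message format stays exactly as quoted in this thread; the name `parse_wait_errors` is made up, and reading the two numbers as "awaited" and "current" counters is my interpretation, not something the log states:

```python
import re

# Matches e.g. "i915_do_wait_request returns -5 (awaiting 2080 at 2079)".
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)


def parse_wait_errors(log_text):
    """Return (return_code, awaited, current, gap) for each wait error found."""
    results = []
    for match in WAIT_RE.finditer(log_text):
        ret, awaited, current = (int(group) for group in match.groups())
        results.append((ret, awaited, current, awaited - current))
    return results


# Two lines copied from reports in this thread.
sample = (
    "[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 "
    "(awaiting 2080 at 2079)\n"
    "[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 "
    "(awaiting 386846 at 386826)\n"
)
for entry in parse_wait_errors(sample):
    print(entry)
```

Feeding it a whole /var/log/messages file would give one tuple per hang, which makes it easier to see whether the gap is always small.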
Moving to F13 beta has solved this problem! 2.6.33.2-57.fc13.x86_64 xorg-x11-server-Xorg-1.8.0-6.fc13.x86_64 Bill
Still happening on Asus EeePC 901, running: kernel-PAE-2.6.32.11-99.fc12.i686 xorg-x11-server-Xorg-1.7.6-3.fc12.i686 xorg-x11-drv-intel-2.9.1-1.fc12.i686 $ lspci 00:02.0 VGA compatible controller: Intel Corporation Mobile 945GME Express Integrated Graphics Controller (rev 03) $ lspci -nvvv 00:02.0 0300: 8086:27ae (rev 03) (prog-if 00 [VGA controller]) Subsystem: 1043:830f Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 16 Region 0: Memory at f7f00000 (32-bit, non-prefetchable) [size=512K] Region 1: I/O ports at dc80 [size=8] Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M] Region 3: Memory at f7ec0000 (32-bit, non-prefetchable) [size=256K] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- Address: 00000000 Data: 0000 Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: i915 Kernel modules: i915
Additional info: for me it happens after I leave the machine idling for at least half an hour.
Additional info: For me it happens only after I wake my PC from a "suspend to RAM". I can use it for a few minutes or hours and then it crashes; it does not happen if I don't use suspend. Log: May 10 20:55:54 zengarden kernel: render error detected, EIR: 0x00000000 May 10 20:55:54 zengarden kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 6654648 at 6654643) May 10 20:56:48 zengarden abrt: Kerneloops: Reported 11 kernel oopses to Abrt May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-11' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: New crash /var/cache/abrt/kerneloops-1273535808-11, processing May 10 20:56:48 zengarden abrtd: Can't load '/usr/lib/abrt/libRunApp.so': /usr/lib/abrt/libRunApp.so: cannot open shared object file: No such file or directory May 10 20:56:48 zengarden abrtd: Activation of plugin 'RunApp' was not successful: Plugin 'RunApp' is not registered May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-10' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: New crash /var/cache/abrt/kerneloops-1273535808-10, processing May 10 20:56:48 zengarden abrtd: Can't load '/usr/lib/abrt/libRunApp.so': /usr/lib/abrt/libRunApp.so: cannot open shared object file: No such file or directory May 10 20:56:48 zengarden abrtd: Activation of plugin 'RunApp' was not successful: Plugin 'RunApp' is not registered May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-9' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1273535808-10) May 10 20:56:48 zengarden abrtd: Deleting crash kerneloops-1273535808-9 (dup of kerneloops-1273535808-10), sending dbus signal May 10 20:56:48 zengarden abrtd: Directory 
'kerneloops-1273535808-8' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1273535808-11) May 10 20:56:48 zengarden abrtd: Deleting crash kerneloops-1273535808-8 (dup of kerneloops-1273535808-11), sending dbus signal May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-7' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1273535808-10) May 10 20:56:48 zengarden abrtd: Deleting crash kerneloops-1273535808-7 (dup of kerneloops-1273535808-10), sending dbus signal May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-6' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1273535808-10) May 10 20:56:48 zengarden abrtd: Deleting crash kerneloops-1273535808-6 (dup of kerneloops-1273535808-10), sending dbus signal May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-5' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1273535808-10) May 10 20:56:48 zengarden abrtd: Deleting crash kerneloops-1273535808-5 (dup of kerneloops-1273535808-10), sending dbus signal May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-4' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1273535808-10) May 10 20:56:48 zengarden abrtd: Deleting crash kerneloops-1273535808-4 (dup of kerneloops-1273535808-10), sending dbus signal May 10 20:56:48 
zengarden abrtd: Directory 'kerneloops-1273535808-3' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1273535808-11) May 10 20:56:48 zengarden abrtd: Deleting crash kerneloops-1273535808-3 (dup of kerneloops-1273535808-11), sending dbus signal May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-2' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1273535808-11) May 10 20:56:48 zengarden abrtd: Deleting crash kerneloops-1273535808-2 (dup of kerneloops-1273535808-11), sending dbus signal May 10 20:56:48 zengarden abrtd: Directory 'kerneloops-1273535808-1' creation detected May 10 20:56:48 zengarden abrtd: Getting local universal unique identification May 10 20:56:48 zengarden abrtd: New crash /var/cache/abrt/kerneloops-1273535808-1, processing May 10 20:56:48 zengarden abrtd: Can't load '/usr/lib/abrt/libRunApp.so': /usr/lib/abrt/libRunApp.so: cannot open shared object file: No such file or directory May 10 20:56:48 zengarden abrtd: Activation of plugin 'RunApp' was not successful: Plugin 'RunApp' is not registered
Me too :( # rpm -q kernel kernel-2.6.32.10-90.fc12.i686 # lspci | grep VGA 00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
Also happens on F13 (2.6.33.3-85.fc13.x86_64 and 2.6.33.4-95.fc13.x86_64 from koji tested) on Lenovo T60, so far I've three identical looking crashes like this: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung render error detected, EIR: 0x00000000 [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1211489 at 1211486) ------------[ cut here ]------------ WARNING: at drivers/gpu/drm/i915/i915_gem_tiling.c:332 i915_gem_set_tiling+0x148/0x199 [i915]() Hardware name: 6369Y13 failed to reset object for tiling switch Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss tun fuse rfcomm sco bridge stp llc bnep l2cap sunrpc cpufreq_ondemand acpi_cpufreq freq_table xt_physdev ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput btusb snd_hda_codec_analog bluetooth snd_hda_intel snd_hda_codec arc4 ecb snd_hwdep snd_seq iTCO_wdt iTCO_vendor_support thinkpad_acpi iwl3945 snd_seq_device i2c_i801 iwlcore irda e1000e snd_pcm microcode crc_ccitt mac80211 snd_timer snd cfg80211 soundcore rfkill snd_page_alloc yenta_socket rsrc_nonstatic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: nf_nat] Pid: 1652, comm: Xorg Not tainted 2.6.33.3-85.fc13.x86_64 #1 Call Trace: [<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f [<ffffffff8104b5bd>] warn_slowpath_fmt+0x3c/0x3e [<ffffffffa007896e>] ? i915_gem_object_wait_rendering+0x34/0x36 [i915] [<ffffffffa007db11>] i915_gem_set_tiling+0x148/0x199 [i915] [<ffffffffa002b19b>] drm_ioctl+0x254/0x365 [drm] [<ffffffffa007d9c9>] ? i915_gem_set_tiling+0x0/0x199 [i915] [<ffffffff8132c3e4>] ? might_fault+0x1c/0x1e [<ffffffff8132c586>] ? input_event_to_user+0x6f/0x81 [<ffffffff8110e0ab>] vfs_ioctl+0x2d/0xa1 [<ffffffff8110e614>] do_vfs_ioctl+0x47e/0x4c4 [<ffffffff8109643e>] ? audit_syscall_exit+0x12b/0x147 [<ffffffff8110e6ab>] sys_ioctl+0x51/0x74 [<ffffffff81009db3>] ? 
int_check_syscall_exit_work+0x34/0x3d [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b ---[ end trace c0dbc87fd1ade41c ]---
Same thing on kernel 2.6.31.12-174.2.22.fc12. Went back to 2.3.31.11-99 and it works again.
This bug also appears in Fedora 12, latest kernel 2.6.32.12-115.fc12.i686 on a LunaPier board lspci: lspci -s 00:02.* -vnn 00:02.0 VGA compatible controller [0300]: Intel Corporation Pineview Integrated Graphics Controller [8086:a011] (prog-if 00 [VGA controller]) Subsystem: Intel Corporation Pineview Integrated Graphics Controller [8086:a011] Flags: bus master, fast devsel, latency 0, IRQ 25 Memory at fea00000 (32-bit, non-prefetchable) [size=512K] I/O ports at d000 [size=8] Memory at d0000000 (32-bit, prefetchable) [size=256M] Memory at fe900000 (32-bit, non-prefetchable) [size=1M] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [d0] Power Management version 2 Kernel driver in use: i915 Kernel modules: i915 00:02.1 Display controller [0380]: Intel Corporation Pineview Integrated Graphics Controller [8086:a012] Subsystem: Intel Corporation Device [8086:a011] Flags: bus master, fast devsel, latency 0 Memory at fe880000 (32-bit, non-prefetchable) [size=512K] Capabilities: [d0] Power Management version 2 Partial log: May 20 22:28:40 localhost kernel: [drm] Big FIFO is enabled May 20 22:28:40 localhost kernel: [drm:i915_gem_madvise_ioctl] *ERROR* Attempted i915_gem_madvise_ioctl() on a pinned object May 20 22:28:40 localhost kernel: [drm] Big FIFO is disabled May 20 22:28:40 localhost kernel: [drm] Big FIFO is disabled May 20 22:28:40 localhost kernel: [drm] Big FIFO is disabled May 20 22:28:43 localhost rtkit-daemon[1759]: Sucessfully made thread 1757 of process 1757 (/usr/bin/pulseaudio) owned by '42' high priority at nice level -11. May 20 22:28:43 localhost rtkit-daemon[1759]: Sucessfully made thread 1763 of process 1757 (/usr/bin/pulseaudio) owned by '42' RT at priority 5. May 20 22:28:44 localhost rtkit-daemon[1759]: Sucessfully made thread 1764 of process 1757 (/usr/bin/pulseaudio) owned by '42' RT at priority 5. 
May 20 22:28:45 localhost kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung May 20 22:28:45 localhost kernel: render error detected, EIR: 0x00000000 May 20 22:28:45 localhost kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 66 at 65) May 20 22:28:45 localhost kernel: [drm] Big FIFO is enabled May 20 22:28:45 localhost kernel: [drm] Big FIFO is enabled May 20 22:28:45 localhost kernel: [drm] Big FIFO is enabled May 20 22:28:45 localhost kernel: [drm] Big FIFO is enabled May 20 22:28:45 localhost kernel: [drm] Big FIFO is enabled May 20 22:28:45 localhost kernel: [drm] Big FIFO is disabled May 20 22:28:45 localhost kernel: [drm] Big FIFO is disabled May 20 22:28:45 localhost kernel: [drm] Big FIFO is disabled May 20 22:28:46 localhost gdm-simple-slave[1686]: WARNING: Child process -1709 was already dead. May 20 22:28:46 localhost gdm-simple-slave[1686]: WARNING: Unable to kill D-Bus daemon May 20 22:28:46 localhost gdm-binary[1668]: WARNING: GdmDisplay: display lasted 0.481828 seconds May 20 22:28:47 localhost gdm-binary[1668]: WARNING: GdmDisplay: display lasted 0.464761 seconds May 20 22:28:47 localhost gdm-binary[1668]: WARNING: GdmDisplay: display lasted 0.474511 seconds
Just booted into kernel 2.6.32.12-115.fc12.x86_64 and after a few minutes got a blank screen in Xorg. I rebooted and the logs contain: May 24 14:09:26 puga kernel: Linux version 2.6.32.12-115.fc12.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.4.3 20100127 (Red Hat 4.4.3-4) (GCC) ) #1 SMP Fri Apr 30 19:46:25 UTC 2010 ... May 24 14:13:35 puga kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung May 24 14:13:35 puga kernel: render error detected, EIR: 0x00000000 May 24 14:13:35 puga kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 23471 at 23468) May 24 14:13:36 puga abrt[2191]: saved core dump of pid 1662 (/usr/bin/Xorg) to /var/cache/abrt/ccpp-1274696016-1662.new/coredump (12759040 bytes) If I boot into the older kernel 2.6.31.12-174.2.19.fc12.x86_64, everything works okay. PS: by the way, here's another bug report which looks the same: #571058.
Can the maintainers please tell me whether this is fixed in FC13 or not? I don't want to upgrade from fc12 to fc13 only to end up with a non-working system. Thanks.
I just installed F12 on my wife's desktop, and I'm hitting this very bug. It seems to be random when the machine locks up. It first happened when my wife brought it out of resume. Then it happened to her again while she was checking her email. X freezes and she loses wireless (I have a PCI wireless card), but I have a serial console on the box and the machine is still active. I just can't get it out of X, or restart the network. Xorg.0 shows nothing particular, but I get this in kernel log: May 25 06:13:20 localhost kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung May 25 06:13:20 localhost kernel: render error detected, EIR: 0x00000000 May 25 06:13:20 localhost kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 331648 at 331645) and X crashed. It seems that abrtd saved some data if anyone is interested, including a coredump of Xorg.
Steven, you can try installing an older fc12 kernel, e.g. 2.6.31.12-174.2.19.fc12, and check whether Xorg still crashes.
F13 is certainly affected, I've gotten GPU hangups with at least 2.6.33.3-85.fc13 and 2.6.33.4-95.fc13. Going back to 2.6.32.11-99.fc12 appears to cure it (no hangs for several days, whereas the f13-kernels hang several times a day)
https://bugzilla.kernel.org/show_bug.cgi?id=15659 seems relevant.
Anatoly, I can do that, but I need to remove the wireless card I have. It triggers a bug on boot up (and resume). This bug for that kernel: https://bugzilla.redhat.com/show_bug.cgi?id=501109 This is fine while it is in my office. But since I was an idiot and never wired up my wife's office with CAT5 when I built it, this can only be a temporary solution, as my wife needs the wireless PCI card. It does not lock up right away. But I always do get an error message for i915 on resume: May 24 11:29:08 localhost kernel: render error detected, EIR: 0x00000010 May 24 11:29:08 localhost kernel: [drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010, masking May 24 11:29:08 localhost kernel: render error detected, EIR: 0x00000010 I can see if I get it with the older kernel.
I forgot to mention that I do not get the error with vanilla 2.6.34. I built it last night with the f12 .33 kernel config (and hitting default for all new options). I have not run it long enough to know if the video hangs or not. I just can say that I do not get the error message with that kernel.
Anatoly, I went back to 2.6.31.5-127.fc12.i686.PAE and I still get the EIR stuck error on resume. I don't know if it will lock up or not. I'll keep it running for a while and see.
I'm seeing this bug as well on F13 i686 2.6.33.4-95, but did not see it on any F12 2.6.32 kernels. It's gotten to the point now where this is my third attempt to add to this bug report. My system is almost completely unusable for stretches of more than 5 minutes. X doesn't crash but it hangs indefinitely not allowing me to click anywhere. The mouse cursor still moves though. uname -a: Linux icarus.tenpointone.com 2.6.33.4-95.fc13.i686.PAE #1 SMP Thu May 13 05:38:26 UTC 2010 i686 i686 i386 GNU/Linux lspci -vnn output. I have an IBM T60: 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03) (prog-if 00 [VGA controller]) Subsystem: Lenovo ThinkPad T60/R60 series [17aa:201a] Flags: bus master, fast devsel, latency 0, IRQ 16 Memory at ee100000 (32-bit, non-prefetchable) [size=512K] I/O ports at 1800 [size=8] Memory at d0000000 (32-bit, prefetchable) [size=256M] Memory at ee200000 (32-bit, non-prefetchable) [size=256K] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [d0] Power Management version 2 Kernel driver in use: i915 Kernel modules: i915 00:02.1 Display controller [0380]: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller [8086:27a6] (rev 03) Subsystem: Lenovo ThinkPad T60/R60 series [17aa:201a] Flags: fast devsel Memory at ee180000 (32-bit, non-prefetchable) [size=512K] Capabilities: [d0] Power Management version 2 Here's the bug, which mentions my hardware device 6369Y11. This is the LCD: May 26 11:33:51 icarus ntpd[2591]: 0.0.0.0 c615 05 clock_sync May 26 11:35:24 icarus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... 
GPU hung May 26 11:35:24 icarus kernel: ------------[ cut here ]------------ May 26 11:35:24 icarus kernel: WARNING: at mm/highmem.c:453 debug_kmap_atomic+0xad/0x12a() May 26 11:35:24 icarus kernel: Hardware name: 6369Y11 May 26 11:35:24 icarus kernel: Modules linked in: fuse rfcomm sco bridge stp llc bnep l2cap sunrpc cpufreq_ondemand acpi_cpufreq ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput arc4 ecb iwlagn snd_hda_codec_analog iwlcore snd_hda_intel mac80211 snd_hda_codec btusb snd_hwdep snd_seq snd_seq_device bluetooth iTCO_wdt cfg80211 iTCO_vendor_support snd_pcm thinkpad_acpi snd_timer snd_page_alloc e1000e rfkill snd i2c_i801 soundcore microcode aes_i586 aes_generic xts gf128mul dm_crypt yenta_socket rsrc_nonstatic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] May 26 11:35:24 icarus kernel: Pid: 0, comm: swapper Not tainted 2.6.33.4-95.fc13.i686.PAE #1 May 26 11:35:24 icarus kernel: Call Trace: May 26 11:35:24 icarus kernel: [<c043d629>] warn_slowpath_common+0x65/0x7c May 26 11:35:24 icarus kernel: [<c04b1391>] ? debug_kmap_atomic+0xad/0x12a May 26 11:35:24 icarus kernel: [<c043d64d>] warn_slowpath_null+0xd/0x10 May 26 11:35:24 icarus kernel: [<c04b1391>] debug_kmap_atomic+0xad/0x12a May 26 11:35:24 icarus kernel: [<c042a893>] kmap_atomic_prot+0x5c/0x10c May 26 11:35:24 icarus kernel: [<c04c91ca>] ? __kmalloc+0x103/0x10f May 26 11:35:24 icarus kernel: [<c042a957>] kmap_atomic+0x14/0x16 May 26 11:35:24 icarus kernel: [<f802607f>] i915_error_object_create+0x9f/0xfa [i915] May 26 11:35:24 icarus kernel: [<f80263ee>] i915_handle_error+0x314/0x813 [i915] May 26 11:35:24 icarus kernel: [<f802698c>] i915_hangcheck_elapsed+0x9f/0xdf [i915] May 26 11:35:24 icarus kernel: [<c04486d5>] run_timer_softirq+0x163/0x1e6 May 26 11:35:24 icarus kernel: [<f80268ed>] ? 
i915_hangcheck_elapsed+0x0/0xdf [i915] May 26 11:35:24 icarus kernel: [<c0442a05>] __do_softirq+0xac/0x152 May 26 11:35:24 icarus kernel: [<c0442adc>] do_softirq+0x31/0x3c May 26 11:35:24 icarus kernel: [<c0442bf0>] irq_exit+0x29/0x5c May 26 11:35:24 icarus kernel: [<c041d687>] smp_apic_timer_interrupt+0x6f/0x7d May 26 11:35:24 icarus kernel: [<c07832fd>] apic_timer_interrupt+0x31/0x38 May 26 11:35:24 icarus kernel: [<c045007b>] ? flush_work+0x6c/0x85 May 26 11:35:24 icarus kernel: [<c05ffd95>] ? acpi_idle_enter_bm+0x251/0x282 May 26 11:35:24 icarus kernel: [<c06d874a>] cpuidle_idle_call+0x6d/0xbf May 26 11:35:24 icarus kernel: [<c0407a78>] cpu_idle+0x91/0xad May 26 11:35:24 icarus kernel: [<c077e51f>] start_secondary+0x1f5/0x233 May 26 11:35:24 icarus kernel: ---[ end trace a101ffe8d1c9a3e8 ]---
Brent, Do you see this all the time or just when you come out of resume? I have a T60 too running f12 with no issues. Thanks for the report, I'll avoid upgrading ;-) Actually, why are you running a 32bit distro on it? My T60 is x86_64 (of course I can run either), but it is more efficient to run 64bit, especially because you don't use highmem (which sucks).
I've been running the 2.6.31.5-127.fc12.i686.PAE kernel for a while now, and I've suspended and resumed several times. I only got the EIR stuck message once. I checked: I only get the error on boot up, but not after each resume like I did before. Interestingly, when booting back into 2.6.32.12-115.fc12.i686.PAE I do not get the error on boot up, but I do get the EIR stuck error on each resume. Not sure if that means anything or not, but I figured I'd report it.
I should actually clarify that I was running F12 on x86_64, but I decided to switch back to i686 for this release to see if I could get away from jumping through flaming hoops for some app incompatibilities I was having. And the T60 only recognizes 3GB of memory due to a BIOS bug that IBM decided not to fix. I've put this 2.6.33.4-95 kernel through suspend/resume and it hasn't crashed doing that yet, although every time I open up anything that stresses video (flash websites, ogg video) it's been crashing fairly consistently. I believe code has been added to the driver to detect when you have unsaved work you don't want to lose, because that seems to cause a crash every time under that circumstance.
I wonder if this could be a i915 + i686 bug. Basically, if it worked under f12 on x86_64 but not under f13 on i686, you can not totally blame it on f13. There's a huge difference here. The way IO memory is shared with modules is vastly different on i686 and x86_64. In i686 it is shared, in x86_64 it is not. Clarification... It does not always crash on suspend and resume (in fact, it hardly does crash then). But I've noticed that I could use the system rather intensely before a suspend, but after a resume, X can lockup after a while. I'm not 100% on this. It may lockup without suspending too, I just have yet to see that. The lockup happens to also be quite random. That is, all that is done on the box is email and web, and the browser is not even doing much at all.
I've reverted to Fedora 12 2.6.32.12-115.fc12.i686.PAE, and where my system was locking up every 5-10 minutes on F13 i686, it hasn't locked up in the last few hours on the same hardware using i686. I believe this bug was introduced in the new kernels and isn't specific to i686. I'll repost if I have any stability issues, but I'm not expecting any.
(In reply to comment #28) > I wonder if this could be a i915 + i686 bug. Basically, if it worked under f12 > on x86_64 but not under f13 on i686, you can not totally blame it on f13. > There's a huge difference here. The way IO memory is shared with modules is > vastly different on i686 and x86_64. In i686 it is shared, in x86_64 it is not. It's not limited to i686, I get these hangs on F13 x86_64 constantly. Switching to 2.6.32.11-99.fc12.x86_64 kernel "fixes" it - there were never such hangs on F12 while this system was running it, 2.6.32.11-99.fc12 was likely to be the latest kernel I tried there. > Clarification... It does not always crash on suspend and resume (in fact, it > hardly does crash then). But I've noticed that I could use the system rather > intensely before a suspend, but after a resume, X can lockup after a while. > > I'm not 100% on this. It may lockup without suspending too, I just have yet to > see that. > > The lockup happens to also be quite random. That is, all that is done on the > box is email and web, and the browser is not even doing much at all. The hangups are indeed quite random (no idea how to reproduce), but occur often enough to make the system rather unusable. And at least for me, there's no suspend/resume involved. The system doesn't stay up long enough to be suspended at the end of the day.
I have a fully working 2.6.31.12-174.2.19.fc12.x86_64 here without this "GPU hang" bug. PS: my machine is an HP workstation, so it never does suspend/resume.
Note, after pulling the wireless PCI card, I have not yet been able to make X lock up. I added this comment also to https://bugzilla.redhat.com/show_bug.cgi?id=501109. Maybe I will be drilling a hole in the wall to my wife's office and adding a CAT5 connection.
Never mind. While actually doing work on it (playing with photos) I got X to lock up without the wireless card. Same "Hangcheck timer elapsed... GPU hung" and "render error detected" messages. It happened when I put f-shot into full screen mode. Will try to repeat.
Well, I am going to try F13, IFF there is some way to boot kernel 2.6.31.12-174.2.22.fc12.i686 in case it goes south. Do you think this will work:
- cp -a /boot /boot.fc12
- upgrade from fc12 to fc13
- boot
- wait for gnome-screensaver to blank the screen (my repeat by...)
- Good: done, success
- Bad:
  - cp /boot.fc12/*31.12* /boot
  - vi /etc/grub.conf and add the lines:
    title Fedora (2.6.31.12-174.2.22.fc12.i686)
    root (hd0,1)
    kernel /vmlinuz-2.6.31.12-174.2.22.fc12.i686 ro root=UUID=af94033c-ba1c-4f01-80ff-0b15c1577c7d rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
    initrd /initramfs-2.6.31.12-174.2.22.fc12.i686.img
  - boot
Anybody know?
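For anyone trying the fallback plan above, the boot entry can be generated rather than typed by hand. This is only a sketch of the GRUB legacy stanza layout used in that comment; the function name `grub_legacy_stanza` and its defaults are made up for illustration, and the UUID and `(hd0,1)` values are just the ones quoted above and will differ per machine:

```python
def grub_legacy_stanza(version, root_uuid, grub_root="(hd0,1)",
                       extra_args="rhgb quiet"):
    """Render a GRUB legacy menu entry for a Fedora kernel version string."""
    lines = [
        f"title Fedora ({version})",
        f"\troot {grub_root}",
        f"\tkernel /vmlinuz-{version} ro root=UUID={root_uuid} {extra_args}",
        f"\tinitrd /initramfs-{version}.img",
    ]
    return "\n".join(lines) + "\n"


stanza = grub_legacy_stanza(
    "2.6.31.12-174.2.22.fc12.i686",
    "af94033c-ba1c-4f01-80ff-0b15c1577c7d",
)
print(stanza)
```

Note that a kernel copied back into /boot also needs its modules under /lib/modules/ for the same version, so keeping the old kernel package installed is probably simpler than copying files.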
I'm trying to debug this too. Since I noticed that ftrace is enabled, the following can be done to help debug this. As root user:
# mount -t debugfs nodev /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo 1 > events/i915/enable
# echo :mod:i915 > set_ftrace_filter
# echo i915_handle_error:traceoff >> set_ftrace_filter
Note the '>>' on the second echo into set_ftrace_filter. Then when X crashes, you should still be able to ssh into the box. Then do:
# cat /sys/kernel/debug/tracing/trace > trace.out
and attach the trace.out file here (maybe compress it first). It should give a bit more information about what is happening. If you trigger another error in i915 that disables tracing, just do:
# echo 1 > /sys/kernel/debug/tracing/tracing_on
and that will restart the tracing.
Created attachment 417381: trace output of X lockup. I just triggered the lockup while tracing. Attached is the trace.out.bz2. The trace had function tracing enabled for the i915 module plus the i915 trace events, and it stopped on i915_handle_error, which happened as X locked up.
In comment 35 I forgot to say at the end of those echos: # echo function > current_tracer
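For convenience, the tracing steps from this thread, including the `current_tracer` step just mentioned, could be scripted roughly as follows. This is a sketch only: the function names are made up, it assumes debugfs is already mounted at /sys/kernel/debug, and it must run as root against the real tracing directory. The directory is passed in as a parameter so the logic can be tried on a scratch directory first.

```python
from pathlib import Path


def configure_i915_tracing(tracing_dir):
    """Write the ftrace settings described in this thread under tracing_dir."""
    base = Path(tracing_dir)
    # Enable all i915 trace events.
    (base / "events" / "i915" / "enable").write_text("1\n")
    # Restrict function tracing to the i915 module (truncating write, like '>').
    (base / "set_ftrace_filter").write_text(":mod:i915\n")
    # Append (like '>>') a trigger that stops tracing in i915_handle_error.
    with open(base / "set_ftrace_filter", "a") as handle:
        handle.write("i915_handle_error:traceoff\n")
    # Select the function tracer.
    (base / "current_tracer").write_text("function\n")


def save_trace(tracing_dir, out_path):
    """Copy the trace buffer to out_path, like 'cat trace > trace.out'."""
    Path(out_path).write_text((Path(tracing_dir) / "trace").read_text())
```

On an affected machine this would be called as `configure_i915_tracing("/sys/kernel/debug/tracing")` before reproducing the hang, and `save_trace("/sys/kernel/debug/tracing", "trace.out")` afterwards over ssh.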
Created attachment 417403: trace output of error after suspend. This is the trace output of functions in the i915, drm and drm_kms_helper modules during a suspend and resume. It stopped at the "ERROR EIR stuck" message that I get. I forgot to enable the i915 events. If those are needed I could run it again.
Still happens in 2.6.33.5-112.fc13.x86_64: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung render error detected, EIR: 0x00000000 ------------[ cut here ]------------ WARNING: at drivers/gpu/drm/i915/i915_gem_tiling.c:332 i915_gem_set_tiling+0x148/0x199 [i915]() Hardware name: 6369Y13 failed to reset object for tiling switch Modules linked in: nls_utf8 tun fuse ipt_MASQUERADE iptable_nat nf_nat rfcomm sco bridge stp llc bnep l2cap sunrpc cpufreq_ondemand acpi_cpufreq freq_table xt_physdev ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput snd_hda_codec_analog arc4 snd_hda_intel ecb snd_hda_codec snd_hwdep snd_seq iwl3945 snd_seq_device iwlcore snd_pcm iTCO_wdt iTCO_vendor_support mac80211 snd_timer btusb thinkpad_acpi i2c_i801 snd_page_alloc bluetooth cfg80211 rfkill snd irda e1000e microcode crc_ccitt soundcore yenta_socket rsrc_nonstatic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] Pid: 1670, comm: Xorg Not tainted 2.6.33.5-112.fc13.x86_64 #1 Call Trace: [<ffffffff8104b54c>] warn_slowpath_common+0x77/0x8f [<ffffffff8104b5b1>] warn_slowpath_fmt+0x3c/0x3e [<ffffffffa0078ac3>] ? i915_gem_object_wait_rendering+0x34/0x36 [i915] [<ffffffffa007de61>] i915_gem_set_tiling+0x148/0x199 [i915] [<ffffffffa002b19b>] drm_ioctl+0x254/0x365 [drm] [<ffffffffa007dd19>] ? i915_gem_set_tiling+0x0/0x199 [i915] [<ffffffff81101575>] ? do_sync_write+0xbf/0xfc [<ffffffff8110de43>] vfs_ioctl+0x2d/0xa1 [<ffffffff8110e3ac>] do_vfs_ioctl+0x47e/0x4c4 [<ffffffff8110e443>] sys_ioctl+0x51/0x74 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b Jun 2 11:25:38 dhcp102 kernel: ---[ end trace 16015add6996231d ]---
Note, I've installed vanilla 2.6.34 and I have not had the problem since.
I've reproduced this on F13 (2.6.33.5-112.fc13.x86_64) using x11perf -copywinwin500. It's also happening intermittently during normal usage. Any movement on this? It's making F13 pretty unusable.
I have recompiled the 2.6.34-20 kernel that I pulled from koji and it crashes the same as these stock f12 and f13 kernels. I am doing another compile of 2.6.34-20, but without the drm_gem_object_alloc-i915_gem_alloc_object.patch, to see if that patch contains the bug causing this.
Regarding my last Comment 14 I am still using the same HW but moved on to F13, kernel 2.6.33.5-112.fc13. I finally could log into GNOME. F13 is more stable. I ran x11perf test as suggested by Mike Bonnet (Comment 41) and then had intermittent crashes. I realized that something 'eats' up the memory... I updated this morning to kernel 2.6.33.5-124.fc13.i686 and I am currently running the same x11perf test. So far, F13 is stable and has not crashed. I checked DRM related kernel patches which could have solved the bug and found this patch drm-i915-fix-non-ironlake-965-class-crashes.patch
It just happened to me again, F13 with kernel kernel-PAE-2.6.33.5-124.fc13.i686 Hardware is a Lenovo 3000 V100 laptop, the graphic card is: 00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
This is still reproducible on kernel-2.6.33.5-124.fc13.x86_64 with the x11perf command above: Jun 16 20:09:01 maunalani kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Jun 16 20:09:01 maunalani kernel: render error detected, EIR: 0x00000000 Jun 16 20:09:01 maunalani kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1069028 at 1069027)
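For anyone who wants to pull these hang signatures out of a large /var/log/messages, here is a minimal Python sketch. The function name `find_wait_errors` is made up for illustration, and reading the two numbers as "awaited" and "current" values is an assumption based only on the wording of the message.

```python
import re

# Matches the i915 wait-request error that recurs throughout this report.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)

def find_wait_errors(log_text):
    """Return (return_code, awaited, current) tuples for each match in log_text.

    Hypothetical helper: the names 'awaited' and 'current' are our own
    reading of the "(awaiting N at M)" wording, not taken from the driver.
    """
    return [tuple(int(g) for g in m.groups()) for m in WAIT_RE.finditer(log_text)]

# Sample text copied from the comment above.
sample = (
    "Jun 16 20:09:01 maunalani kernel: [drm:i915_hangcheck_elapsed] *ERROR* "
    "Hangcheck timer elapsed... GPU hung\n"
    "Jun 16 20:09:01 maunalani kernel: [drm:i915_do_wait_request] *ERROR* "
    "i915_do_wait_request returns -5 (awaiting 1069028 at 1069027)\n"
)

for code, awaited, current in find_wait_errors(sample):
    print(f"ret={code} awaited={awaited} current={current} gap={awaited - current}")
```

Feeding it the contents of a real log file instead of `sample` should list every occurrence, which makes it easier to compare how far apart the two numbers are across reports.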
Can someone tell me whether it is possible to use an FC13 installation with the older (last working) fc12 kernel? Thanks.
Anatoly, I'm doing exactly that as a workaround until this issue is resolved. Running a fully up-to-date F13 userspace with kernel-2.6.32.12-115.fc12.x86_64. I haven't noticed any issues running with the F12 kernel, and it doesn't experience the lockups.
My system (Atom N450, ICH8) has been running for more than 42 hours with the latest F13 kernel (2.6.33.5-124.fc13.i686). NO crashes so far. x11perf -copywinwin500 used to crash my system after 3-4 runs on previous .33 kernels. All .32 kernels on F12 failed to boot into the GNOME login screen. I could see the mouse cursor and the login screen for a fraction of a second and then the screen went black. Mike & Anatoly, could you give me your HW specs? I am going to do some tests with other CPU/chip-sets.
Lenovo T60, product ID: 6369Y11 (info below was grabbed while running the 2.6.32.12-115.fc12.x86_64 kernel) # cat /proc/cpuinfo | grep 'model name' model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz # lspci -vvv -s 00:02.0 00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller]) Subsystem: Lenovo ThinkPad T60/R60 series Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 16 Region 0: Memory at ee100000 (32-bit, non-prefetchable) [size=512K] Region 1: I/O ports at 1800 [size=8] Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M] Region 3: Memory at ee200000 (32-bit, non-prefetchable) [size=256K] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- Address: 00000000 Data: 0000 Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: i915 Kernel modules: i915 Let me know if you need any more information, or want me to run any tests/newer kernels. Thanks for looking into this!
2.6.33.5-124.fc13.x86_64 crashed on me last night. ------------[ cut here ]------------ WARNING: at drivers/gpu/drm/i915/i915_gem_tiling.c:332 i915_gem_set_tiling+0x148/0x199 [i915]() Hardware name: 6369CTO failed to reset object for tiling switch Modules linked in: vfat fat usb_storage hidp fuse ipt_MASQUERADE iptable_nat nf_nat rfcomm sco bridge stp llc bnep l2cap sunrpc cpufreq_ondemand acpi_cpufreq freq_table xt_physdev ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec arc4 snd_hwdep snd_seq thinkpad_acpi snd_seq_device ecb snd_pcm iwl3945 iwlcore snd_timer mac80211 snd cfg80211 soundcore iTCO_wdt snd_page_alloc iTCO_vendor_support e1000e btusb bluetooth i2c_i801 rfkill joydev microcode aes_x86_64 aes_generic xts gf128mul dm_crypt yenta_socket rsrc_nonstatic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: kvm] Pid: 1931, comm: Xorg Not tainted 2.6.33.5-124.fc13.x86_64 #1 Call Trace: [<ffffffff8104b54c>] warn_slowpath_common+0x77/0x8f [<ffffffff8104b5b1>] warn_slowpath_fmt+0x3c/0x3e [<ffffffffa0078ac3>] ? i915_gem_object_wait_rendering+0x34/0x36 [i915] [<ffffffffa007de61>] i915_gem_set_tiling+0x148/0x199 [i915] [<ffffffffa002b19b>] drm_ioctl+0x254/0x365 [drm] [<ffffffffa007dd19>] ? i915_gem_set_tiling+0x0/0x199 [i915] [<ffffffff81101571>] ? 
do_sync_write+0xbf/0xfc [<ffffffff8110de3f>] vfs_ioctl+0x2d/0xa1 [<ffffffff8110e3a8>] do_vfs_ioctl+0x47e/0x4c4 [<ffffffff8110e43f>] sys_ioctl+0x51/0x74 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b Lenovo T60 widescreen product number 6369CTO $ cat /proc/cpuinfo | grep 'model name' model name : Intel(R) Core(TM)2 CPU T5500 @ 1.66GHz model name : Intel(R) Core(TM)2 CPU T5500 @ 1.66GHz $ lspci -vvv -s 00:02.0 00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller]) Subsystem: Lenovo ThinkPad T60/R60 series Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 16 Region 0: Memory at ee100000 (32-bit, non-prefetchable) [size=512K] Region 1: I/O ports at 1800 [size=8] Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M] Region 3: Memory at ee200000 (32-bit, non-prefetchable) [size=256K] Expansion ROM at <unassigned> [disabled] Capabilities: <access denied> Kernel driver in use: i915 Kernel modules: i915
upgraded to fc13 last Friday, 1 (working) day of active usage (gnome-terminal, firefox, rhythmbox) no crashes so far. "x11perf -copywinwin500" doesn't crash. Currently running 2.6.33.3-85.fc13.x86_64 #1 SMP Thu May 6 18:09:49 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux kernel, I have kernel-2.6.33.5-124.fc13.x86_64 rpm installed (after running "yum update", but haven't booted to it yet). my public Smolt hardware profile is at http://www.smolts.org/client/show/pub_0b811287-8a93-4afb-8142-70b3738f67f0 Ask if you will need more info, thanks.
Any news about this bug? There are a couple people who stopped by the Fedora booth at the Summit asking about progress. I hadn't seen the comment about using an F12 kernel -- we'll pass that on, thanks.
Sorry for the delay... Here is what I have so far: I tested several different boards with similar chip-sets and CPUs with F13 32/64 bit. PCM-9362 (Atom N450, ICH8M) 2-chip design (LunaPier) DS: http://download.advantech.com//ProductFile/1-EUPFE4/PCM-9362_DS%2804.22.10%29.pdf GMB-945GC (Intel Celeron, 945GC, ICH7) 3-chip design DS: http://download.advantech.com//ProductFile/1-F7FZL3/GMB-945GC_DS_update.pdf PCM-9590 (Core2 Duo T7200, 945GME, ICH7M) 3-chip design DS: http://download.advantech.com//ProductFile/1-F0S4BU/PCM-9590_DS_updated.pdf I stress tested the X server using Mike's suggestion, 'x11perf -copywinwin500', and also ran all other *500 tests like this: for i in $(x11perf 2>&1 | grep 500 | awk '{print $1}'); do x11perf $i; done > x11perf.log 2>&1 All F13 installations were updated with the latest available (stable) updates. To avoid downloading ~200MB of updates (compared to the official F13 LiveCD) I re-built F13 32/64 bit LiveCDs. Tests with F13 on all of the above boards: PCM-9362: F13, kernel-2.6.33.5-124.fc13.i686 (32bit). It did not crash within 48 hours. GMB-945GC: F13, kernel-2.6.33.5-124.fc13.i686 (32bit). 
It did not crash within 8 hours PCM-9590: with regards to Mike Bonnet's and Adam Hough's specs, I tried to get as close as possible to them cat /proc/cpuinfo | grep 'model name' model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz lspci -s 00:02.0 -vnn 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03) (prog-if 00 [VGA controller]) Subsystem: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] Flags: bus master, fast devsel, latency 0, IRQ 16 Memory at fde80000 (32-bit, non-prefetchable) [size=512K] I/O ports at ff00 [size=8] Memory at d0000000 (32-bit, prefetchable) [size=256M] Memory at fdf80000 (32-bit, non-prefetchable) [size=256K] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [d0] Power Management version 2 Kernel driver in use: i915 Kernel modules: i915 F13, kernel-2.6.33.5-124.fc13.i686 (32bit). It did not crash within 48 hours F13, kernel, 2.6.33.5-124.fc13.x86_64 (64bit). It did not crash within 47 hours I even created new LiveCD spins while running the above tests However, 2.6.33.3-85.fc13.x86_64 (F13 officially released LiveCD) crashed after about 15 hours. Anatoly, I finally found a board which matches your Q35. This will be my next test target. After all, F13 is currently stable with regards to those listed test boards. However, some are still having issues with kernel-2.6.33.5-124.fc13. I am under the impression that perhaps the bug could be related to multiple SW packages and not only the kernel itself - just a thought... As for F12, I need to re-spin a LiveCD and re-run the tests on those boards.
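For those who would rather script the stress run than paste the shell loop, the same idea can be sketched in Python. This is only a rough equivalent of the `grep 500 | awk '{print $1}'` pipeline: the helper names `select_tests` and `run_all` are invented here, and the description text in `usage_sample` is illustrative rather than copied from real x11perf output.

```python
import subprocess

def select_tests(usage_text, needle="500"):
    """Mimic `grep 500 | awk '{print $1}'`: first field of each matching line."""
    tests = []
    for line in usage_text.splitlines():
        fields = line.split()
        if needle in line and fields:
            tests.append(fields[0])
    return tests

def run_all(log_path="x11perf.log"):
    """Run every selected x11perf test, sending all output to log_path.

    Assumes x11perf is installed and that running it without arguments
    prints its list of options, as the shell loop in the comment relies on.
    """
    usage = subprocess.run(
        ["x11perf"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ).stdout
    with open(log_path, "w") as log:
        for test in select_tests(usage):
            subprocess.run(["x11perf", test], stdout=log, stderr=subprocess.STDOUT)

# Illustrative usage text only; the descriptions are not verbatim x11perf output.
usage_sample = """\
    -copywinwin100    Copy 100x100 from window to window
    -copywinwin500    Copy 500x500 from window to window
    -copypixwin500    Copy 500x500 from pixmap to window
"""
print(select_tests(usage_sample))
```

Calling `run_all()` on a machine with a running X session should produce an x11perf.log similar to the one from the shell loop. Nothing is executed on import, so the selection logic can be checked without X.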
An update regarding F12 PCM-9590, see spec of my previous post F12, kernel 2.6.32.14-127.fc12.i686 (32bit) seems to work. I need to run it for some time (x11perf) I will also test it with F12 64bit.
I'm one of the people who still have crashes. My smolt profile is http://www.smolts.org/client/show/pub_3a4bcaeb-fcda-423a-9f49-dbca00f26f18 The last crash happened after more than three days of use. Sometimes the kernel keeps running for a couple of days, other times it crashes after a couple of hours. Hope it helps.
This is the *last* kernel that works here: http://www.smolts.org/client/show/pub_69ac12a7-4d9b-4abd-860b-64a3ab444654 All others from the FC12 series fail with a hangcheck error when I come out of (or go into??) the screensaver.
*** Bug 596246 has been marked as a duplicate of this bug. ***
Profile of PCM-9590 on smolts http://www.smolts.org/client/show/pub_80f0f588-9f72-4115-a511-58412dcbc288
If it can be useful: I was running a PAE kernel. I switched to a non PAE one, and I even tried the latest F12 non PAE kernel. All these kernels keep crashing.
Same thing with: kernel-2.6.32.16-141.fc12.i686 and all updates. Can we get this fixed???
I'm seeing the same error with: kernel-2.6.33.6-147.fc13.i686 Jul 13 18:43:18 quingu kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Jul 13 18:43:18 quingu kernel: render error detected, EIR: 0x00000000 Jul 13 18:43:19 quingu kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 50017 at 50011)
https://bugzilla.kernel.org/show_bug.cgi?id=15659 is fixed. So it's probably a good time to backport this git commit to the fc12 kernel, or to release an updated version for people who can't rebuild the kernel src.rpm themselves? I can't test this patch myself, since I upgraded to fc13 and the problem disappeared with my hardware. Thanks.
(In reply to comment #62) > https://bugzilla.kernel.org/show_bug.cgi?id=15659 is fixed. So probably it's > good time to backport this git commit to fc12 kernel or release updated version > for people who can't rebuild kernel src.rpm themselves? I can't test this patch > myself, since upgraded to fc13 and the problem disappeared with my hardware. > Thanks. That commit is in 2.6.32.16-150, which was just submitted for updates-testing.
In 2.6.33.6-147.fc13.i686.PAE #1 SMP Tue Jul 6 22:24:44 UTC 2010 i686 i686 i386 GNU/Linux, similar Aug 1 23:14:00 p180g-f13 kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Aug 1 23:14:00 p180g-f13 kernel: render error detected, EIR: 0x00000000 Aug 1 23:14:00 p180g-f13 kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 19607 at 19605) Aug 1 23:14:00 p180g-f13 kernel: ------------[ cut here ]------------ Aug 1 23:14:00 p180g-f13 kernel: WARNING: at drivers/gpu/drm/i915/i915_gem_tiling.c:332 i915_gem_set_tiling+0x128/0x166 [i915]() Aug 1 23:14:00 p180g-f13 kernel: Hardware name: Aug 1 23:14:00 p180g-f13 kernel: failed to reset object for tiling switch Aug 1 23:14:00 p180g-f13 kernel: Modules linked in: fuse ipt_MASQUERADE xt_mark xt_MARK iptable_mangle nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp i$ Aug 1 23:14:00 p180g-f13 kernel: Pid: 1824, comm: Xorg Not tainted 2.6.33.6-147.fc13.i686.PAE #1 Aug 1 23:14:00 p180g-f13 kernel: Call Trace: Aug 1 23:14:00 p180g-f13 kernel: [<c043d69d>] warn_slowpath_common+0x65/0x7c Aug 1 23:14:00 p180g-f13 kernel: [<f019df5a>] ? i915_gem_set_tiling+0x128/0x166 [i915] Aug 1 23:14:00 p180g-f13 kernel: [<c043d6e8>] warn_slowpath_fmt+0x24/0x27 Aug 1 23:14:00 p180g-f13 kernel: [<f019df5a>] i915_gem_set_tiling+0x128/0x166 [i915] Aug 1 23:14:00 p180g-f13 kernel: [<f0105840>] drm_ioctl+0x237/0x317 [drm] Aug 1 23:14:00 p180g-f13 kernel: [<f019de32>] ? i915_gem_set_tiling+0x0/0x166 [i915] Aug 1 23:14:00 p180g-f13 kernel: [<c05723af>] ? file_has_perm+0x87/0xa1 Aug 1 23:14:00 p180g-f13 kernel: [<c04dadf9>] vfs_ioctl+0x27/0x91 Aug 1 23:14:00 p180g-f13 kernel: [<f0105609>] ? drm_ioctl+0x0/0x317 [drm] Aug 1 23:14:00 p180g-f13 kernel: [<c04db39a>] do_vfs_ioctl+0x48e/0x4cc Aug 1 23:14:00 p180g-f13 kernel: [<c0572635>] ? 
selinux_file_ioctl+0x3e/0x41 Aug 1 23:14:00 p180g-f13 kernel: [<c04db419>] sys_ioctl+0x41/0x61 Aug 1 23:14:00 p180g-f13 kernel: [<c07830fc>] syscall_call+0x7/0xb Aug 1 23:14:00 p180g-f13 kernel: [<c0780000>] ? rcu_init_percpu_data.clone.0+0x88/0xa2 Aug 1 23:14:00 p180g-f13 kernel: ---[ end trace 0c3de01e459e46d9 ]---
(In reply to comment #64) > In 2.6.33.6-147.fc13.i686.PAE #1 SMP Tue Jul 6 22:24:44 UTC 2010 i686 i686 i386 > GNU/Linux, similar > This is fixed in 2.6.32.16-150.fc12 and 2.6.33.6-147.2.4.fc13. But someone should confirm that...
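To check quickly whether a given kernel is at or past the builds named in the comment above, a simplified comparison can be sketched as follows. This is not rpm's own version comparison algorithm, only a numeric tuple compare. The names `parse_kernel`, `FIXED` and `has_claimed_fix` are hypothetical, and the table only records what that comment claims, which is not a confirmed fix.

```python
import re

# Builds named as fixed in the comment above (a claim, not a guarantee).
FIXED = {
    "fc12": ((2, 6, 32, 16), (150,)),
    "fc13": ((2, 6, 33, 6), (147, 2, 4)),
}

def parse_kernel(version_release):
    """Split e.g. '2.6.33.6-147.2.4.fc13.x86_64' into (version, release, dist)."""
    m = re.match(r"^([\d.]+)-([\d.]+)\.(fc\d+)", version_release)
    if m is None:
        raise ValueError(f"unrecognised kernel string: {version_release!r}")
    version = tuple(int(p) for p in m.group(1).split("."))
    release = tuple(int(p) for p in m.group(2).split("."))
    return version, release, m.group(3)

def has_claimed_fix(version_release):
    """True/False if the build is at or past the claimed fix, None if unknown dist."""
    version, release, dist = parse_kernel(version_release)
    fixed = FIXED.get(dist)
    if fixed is None:
        return None
    return (version, release) >= fixed

for kernel in (
    "2.6.33.6-147.fc13.i686.PAE",
    "2.6.33.6-147.2.4.fc13.x86_64",
    "2.6.32.16-141.fc12.i686",
):
    print(kernel, has_claimed_fix(kernel))
```

Strings must be given without the leading "kernel-" package prefix. Anything with an unusual release tag would need the real rpm comparison instead.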
2.6.32.16-150.fc12 confirmed! Works good, lasts a long time... Thanks!
2.6.32.16-150.fc12 does not work with Pineview (Atom N450, ICH8) Apr 21 15:31:08 pcm3362 kernel: [drm:drm_mode_rmfb] *ERROR* tried to remove a fb that we didn't own Apr 21 15:31:08 pcm3362 kernel: [drm:drm_mode_rmfb] *ERROR* tried to remove a fb that we didn't own Apr 21 15:31:09 pcm3362 kernel: [drm] Big FIFO is enabled Apr 21 15:31:09 pcm3362 kernel: [drm:i915_gem_madvise_ioctl] *ERROR* Attempted i915_gem_madvise_ioctl() on a pinned object Apr 21 15:31:09 pcm3362 kernel: [drm] Big FIFO is disabled Apr 21 15:31:09 pcm3362 kernel: [drm] Big FIFO is disabled Apr 21 15:31:09 pcm3362 kernel: [drm] Big FIFO is disabled Apr 21 15:31:15 pcm3362 rtkit-daemon[1583]: Sucessfully made thread 1581 of process 1581 (/usr/bin/pulseaudio) owned by '42' high priority at nice level -11. Apr 21 15:31:16 pcm3362 kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Apr 21 15:31:16 pcm3362 kernel: render error detected, EIR: 0x00000000 Apr 21 15:31:16 pcm3362 kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 59 at 58) Apr 21 15:31:17 pcm3362 abrt[1593]: saved core dump of pid 1513 (/usr/bin/Xorg) to /var/cache/abrt/ccpp-1019428276-1513.new/coredump (4898816 bytes) Apr 21 15:31:17 pcm3362 abrtd: Directory 'ccpp-1019428276-1513' creation detected Apr 21 15:31:17 pcm3362 kernel: [drm] Big FIFO is enabled Apr 21 15:31:17 pcm3362 kernel: [drm] Big FIFO is enabled Apr 21 15:31:17 pcm3362 abrtd: Crash is in database already (dup of /var/cache/abrt/ccpp-1019426824-1589) Apr 21 15:31:17 pcm3362 abrtd: Deleting crash ccpp-1019428276-1513 (dup of ccpp-1019426824-1589), sending dbus signal Apr 21 15:31:17 pcm3362 kernel: [drm] Big FIFO is enabled Apr 21 15:31:17 pcm3362 kernel: [drm] Big FIFO is enabled Apr 21 15:31:17 pcm3362 kernel: [drm] Big FIFO is enabled Apr 21 15:31:17 pcm3362 kernel: [drm] Big FIFO is disabled Apr 21 15:31:17 pcm3362 kernel: [drm] Big FIFO is disabled Apr 21 15:31:17 pcm3362 kernel: [drm] Big FIFO is disabled 
Apr 21 15:31:18 pcm3362 gdm-simple-slave[1512]: WARNING: Child process -1533 was already dead. Apr 21 15:31:18 pcm3362 gdm-simple-slave[1512]: WARNING: Unable to kill D-Bus daemon Apr 21 15:31:18 pcm3362 abrt[1600]: not dumping repeating crash in '/usr/bin/Xorg' Booting a freshly 'baked' LiveCD with kernel 2.6.32.16-150.fc12 works fine. The only difference here is that the LiveCD logs in automatically. After installing the OS and rebooting, I can see the login screen for a fraction of a second and then I get a black screen. I cannot switch to a terminal but I can access the OS through SSH. Enabling 'AutomaticLogin' in /etc/gdm/custom.conf seems to work around this issue. Kernel 2.6.32.16-150.fc12 works fine with the Intel 945GME chipset. I did not experience issues with kernel 2.6.33.6-147.fc13 and 2.6.33.6-147.2.4.fc13 with Atom N450 (Pineview) or i945GME.
I can confirm that this bug still exists in: -bash-4.0# uname -a Linux client-192.168.0.244 2.6.32.16-150.fc12.i686 #1 SMP Sat Jul 24 05:31:53 UTC 2010 i686 i686 i386 GNU/Linux on a: -bash-4.0# lspci | grep VGA 00:02.0 VGA compatible controller: Intel Corporation Mobile 945GME Express Integrated Graphics Controller (rev 03) with this in the log: -bash-4.0# tail -n 3 /var/log/messages Aug 4 07:57:35 client-192 kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Aug 4 07:57:35 client-192 kernel: render error detected, EIR: 0x00000000 Aug 4 07:57:35 client-192 kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 15258 at 15256) The board is a D945GSEJT. The crash happened after using the system for about 2 minutes.
This seems fixed for me in 2.6.33.6-147.2.4.fc13.i686.PAE (fedora13). # lspci | grep VGA 00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03) Thanks!
Not resolved for me.... /var/log/messages ================= Aug 4 08:38:57 client1 kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Aug 4 08:38:57 client1 kernel: render error detected, EIR: 0x00000000 Aug 4 08:38:57 client1 kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 31647 at 31643) cat /proc/cpuinfo | grep 'model name' ===================================== model name : Intel(R) Pentium(R) 4 CPU 2.40GHz uname -a ======== Linux client1 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 i686 i686 i386 GNU/Linux lspci | grep VGA ================ lspci -s 00:02.* -vnn 00:02.0 VGA compatible controller [0300]: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device [8086:2562] (rev 01) (prog-if 00 [VGA controller]) Subsystem: Dell Device [1028:0126] Flags: bus master, fast devsel, latency 0, IRQ 16 Memory at e8000000 (32-bit, prefetchable) [size=128M] Memory at ff680000 (32-bit, non-prefetchable) [size=512K] Expansion ROM at <unassigned> [disabled] Capabilities: [d0] Power Management version 1 Kernel driver in use: i915 Kernel modules: i915
2.6.33.6-147.2.4.fc13.x86_64 appears to fix this issue for me as well
2.6.33.6-147.2.4.fc13.x86_64 seems to have fixed it here too, been running stable for a full day now which is far more than the previous attempts (up to and including 2.6.33.5-112.fc13.x86_64) ever survived.
I installed 2.6.33.6-147.2.4 on July 29 and have not had X / the intel driver crash on me since.
A new report from me on the same machine (as was prev. used with this ticket/bugzilla entry): 1) kernel 2.6.33.3-85.fc13.x86_64, uptime 50 days, crashed/freeze with /var/log/messages: Aug 12 13:50:39 puga kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Aug 12 13:50:39 puga kernel: render error detected, EIR: 0x00000000 Aug 12 13:50:39 puga kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 25696948 at 25696946) done a reboot: 2) same kernel 2.6.33.3-85.fc13.x86_64, uptime 5 minutes, crashed/freeze with the messages: Aug 12 14:01:11 puga kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Aug 12 14:01:11 puga kernel: render error detected, EIR: 0x00000000 Aug 12 14:01:11 puga kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 13149 at 13147) 3) updated kernel by running "yum update kernel" to 2.6.33.6-147.2.4.fc13.x86_64, uptime 7 minutes, crashed/freeze with messages: Aug 12 14:14:23 puga kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 13256 at 13255) Aug 12 14:14:23 puga kernel: [drm:i915_gem_do_execbuffer] *ERROR* Failed to pin buffer 1 of 2, total 83902464 bytes: -5 Aug 12 14:14:23 puga kernel: [drm:i915_gem_do_execbuffer] *ERROR* 594 objects [6 pinned], 433385472 object bytes [24612864 pinned], 192401408/260308992 gtt bytes
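The i915_gem_do_execbuffer line in the third crash carries memory figures that are easier to compare once parsed. Below is a minimal sketch. The field names, and in particular reading "192401408/260308992 gtt bytes" as used/total, are assumptions made for illustration and are not confirmed by the driver source in this report.

```python
import re

# Field names are our own labels; "gtt_used"/"gtt_total" is an assumed reading.
EXECBUF_RE = re.compile(
    r"(?P<objects>\d+) objects \[(?P<pinned_objects>\d+) pinned\], "
    r"(?P<object_bytes>\d+) object bytes \[(?P<pinned_bytes>\d+) pinned\], "
    r"(?P<gtt_used>\d+)/(?P<gtt_total>\d+) gtt bytes"
)

def parse_execbuffer_stats(line):
    """Return the numeric fields of an execbuffer error line, or None if absent."""
    m = EXECBUF_RE.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}

# Line copied from the comment above.
line = (
    "Aug 12 14:14:23 puga kernel: [drm:i915_gem_do_execbuffer] *ERROR* "
    "594 objects [6 pinned], 433385472 object bytes [24612864 pinned], "
    "192401408/260308992 gtt bytes"
)

stats = parse_execbuffer_stats(line)
gtt_fraction = stats["gtt_used"] / stats["gtt_total"]
print(f"objects={stats['objects']} gtt fraction={gtt_fraction:.1%}")
print("object bytes exceed second gtt figure:",
      stats["object_bytes"] > stats["gtt_total"])
```

Running the same parser over several reports would show whether the hangs cluster around a high value of that ratio, which might help tell a memory pressure problem apart from a plain GPU hang.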
I forgot to mention in the last comment that after updating the kernel with yum in the third (3) try, I also (re)booted into it and had a crash. Running x86_64 FC13.
It still exists, but seems to occur less frequently. Linux p180g-f13 2.6.33.6-147.2.4.fc13.i686.PAE #1 SMP Fri Jul 23 17:21:06 UTC 2010 i686 i686 i386 GNU/Linux Aug 12 23:14:59 p180g-f13 kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Aug 12 23:14:59 p180g-f13 kernel: render error detected, EIR: 0x00000000 Aug 12 23:14:59 p180g-f13 kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 2255 at 2253) (In reply to comment #64) > In 2.6.33.6-147.fc13.i686.PAE #1 SMP Tue Jul 6 22:24:44 UTC 2010 i686 i686 i386 > GNU/Linux, similar
-147 had improved the situation, but I'm still seeing this pretty frequently (3 times today). $ uname -a Linux maunalani.home 2.6.33.6-147.2.4.fc13.x86_64 #1 SMP Fri Jul 23 17:14:44 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux Aug 13 17:28:18 maunalani kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Aug 13 17:28:18 maunalani kernel: render error detected, EIR: 0x00000000 Aug 13 17:28:18 maunalani kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 709664 at 709656) Seems to happen a lot when browsing sites that use flash. The Amazon Web Services console was doing it today. I was also seeing a lot of visual corruption before the hangs.
Sorry, that should have been "I thought -147 had improved the situation, but...".
I got hit by this with the following in syslog: 22:52:53,674 ERR kernel:[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung 22:52:53,691 ERR kernel:render error detected, EIR: 0x00000000 22:52:53,698 ERR kernel:[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 7366 at 7364) while in anaconda running an upgrade to Fedora 13. It happened in the middle of package installation. Luckily anaconda continued "blind" with a frozen display (while the keyboard was still operational). The kernel used by this anaconda is 2.6.33.3-85.fc13.i686. So far nothing like that has happened in "normal operation". That uses a different kernel though: 2.6.33.6-147.2.4.fc13.i686.
I also have the same problem with 2.6.33.6-147.2.4.fc13.i686 and Intel 82852/855GM Integrated Graphics Device.
(In reply to comment #76) > It still exists, but seems occur less frequently. Ditto. After a long troublefree period (including numerous suspend+wakeups and all) I got this again yesterday (kernel-2.6.33.6-147.2.4.fc13.x86_64): Aug 24 10:19:38 dhcp102 kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Aug 24 10:19:38 dhcp102 kernel: render error detected, EIR: 0x00000000 Aug 24 10:19:38 dhcp102 kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 5229943 at 5229941) Now to try with kernel-2.6.33.8-149.fc13.x86_64...
I get this with kernel 2.6.34.6-47.fc13.i686 Sep 4 08:58:02 octogonapus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Sep 4 08:58:02 octogonapus kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 8983455 at 8983452) Sep 4 08:58:02 octogonapus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Sep 4 08:58:02 octogonapus kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 8983460 at 8983452) Sep 4 08:58:03 octogonapus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Sep 4 08:58:03 octogonapus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung S I'm happy to send more detail if requested.
Is this issue resolved?
I have not gotten anything from any recent crashes, which have just been hard locks that require a hard reboot. On 2010-09-06: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 420291 at 420290) I have not had any crashes yet on 2.6.34.7-56.fc13.x86_64 but have not been using it that long.
This is still happening. I'm working on an Asus S5200N, and the i915 is pretty unstable on F13. It seems to be the exact same issue. Any extra info I can provide?
The issue of X hanging is still happening, though when it happened tonight it had the messages below in the Xorg log. Maybe I have two issues. [mi] EQ overflowing. The server is probably stuck in an infinite loop. Backtrace: ...
(In reply to comment #86) > The issue of X hanging is still happening, though when it happened tonight it > had the messages below in the Xorg log. Maybe I have two issues. > > [mi] EQ overflowing. The server is probably stuck in an infinite loop. Please ignore this message ... it is a red herring. The X server prints this message whenever it feels nervous, and it doesn't mean much of anything these days.
(In reply to comment #87) > (In reply to comment #86) > > The issue of X hanging is still happening, though when it happened tonight it > > had the messages below in the Xorg log. Maybe I have two issues. > > > > [mi] EQ overflowing. The server is probably stuck in an infinite loop. > > Please, ignore this message ... it is a red herring, Xserver makes this message > whenever it feels nervous and it doesn't mean much anything these days. Agreed, but the next line "Backtrace" and the stuff that follows is an X crash and that hoses up the whole system :(
I had this bug on F13/2.6.33.x and 2.6.34.x i686.PAE kernels, and now on F14/2.6.35.x i686.PAE kernels too. The GPU usually freezes after a few hours to a few days of running. I was also seeing some visual corruption before the hangs. Lines like these appear in /var/log/messages: Oct 13 23:30:15 ws22 kernel: [ 4284.741008] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Oct 13 23:30:15 ws22 kernel: [ 4284.741161] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 123250 at 123249) Oct 13 23:30:15 ws22 kernel: [ 4284.824008] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Oct 13 23:30:15 ws22 kernel: [ 4284.824053] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 123264 at 123249) Oct 13 23:30:15 ws22 kernel: [ 4285.179129] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Oct 13 23:30:16 ws22 kernel: [ 4285.786008] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Oct 13 23:30:16 ws22 kernel: [ 4285.786055] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 123278 at 123249) Oct 13 23:30:16 ws22 kernel: [ 4285.861003] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Oct 13 23:30:16 ws22 kernel: [ 4286.317129] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Oct 13 23:30:17 ws22 kernel: [ 4286.907011] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung "/var/log/Xorg.0.log" ends with: [ 4324.202] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error. (These lines appear in this log more frequently, usually one or two between other records. After the display freezes, there are several tens of these lines in the log.) 
HW is Asus P5E-VM HDMI mainboard with G35 chipset, "lspci -vvvnn" output: 00:02.0 VGA compatible controller [0300]: Intel Corporation 82G35 Express Integrated Graphics Controller [8086:2982] (rev 03) (prog-if 00 [VGA controller]) Subsystem: ASUSTeK Computer Inc. Device [1043:8276] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 44 Region 0: Memory at fe800000 (32-bit, non-prefetchable) [size=1M] Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M] Region 4: I/O ports at cc00 [size=8] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Address: fee0300c Data: 4191 Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: i915 Kernel modules: i915 00:02.1 Display controller [0380]: Intel Corporation 82G35 Express Integrated Graphics Controller [8086:2983] (rev 03) Subsystem: ASUSTeK Computer Inc. Device [1043:8276] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Region 0: Memory at fe900000 (32-bit, non-prefetchable) [size=1M] Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
(In reply to comment #88)
> Agreed, but the next line "Backtrace" and the stuff that follows is an X crash
> and that hoses up the whole system :(

Yes, of course, silly me for skipping over this one! Could you please attach the /var/log/Xorg.0.log with "Backtrace:" in it?
Created attachment 455208 [details] Xorg.1.log after crash Was running `ls` inside a KDE4 konsole immediately before crash. Desktop effects enabled and functional. Mouse still moved after crash but everything else frozen. SSH access worked fine to get the log. Nothing about the X crash logged in /var/log/messages.
(In reply to comment #91)
> Created attachment 455208 [details]
> Xorg.1.log after crash

Hmmm... just realized I got this crash while running an older kernel. Let me try the latest and see if the problem goes away or changes.
OK, so I upgraded the kernel and within a few minutes got another crash. This time I got kicked back to a [dead] text console, with only this at the end of the Xorg.1.log:

Fatal server error: Failed to submit batchbuffer: Input/output error
Please consult the Fedora Project support at http://bodhi.fedoraproject.org/ for help.
Please also check the log file at "/var/log/Xorg.1.log" for additional information.

Hmmm... looks like that may be bug 571525.
Sorry for all the comments... just discovered that /var/log/messages has this at the end:

Oct 22 20:48:51 client-192 kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Oct 22 20:48:51 client-192 kernel: render error detected, EIR: 0x00000000
Oct 22 20:48:51 client-192 kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 87048 at 87043)

Currently running:

Linux client-192.168.0.5 2.6.32.21-168.fc12.i686.PAE #1 SMP Wed Sep 15 16:18:39 UTC 2010 i686 i686 i386 GNU/Linux
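For anyone comparing logs across these reports, here is a minimal sketch that pulls the return code and the two numbers out of the i915_do_wait_request lines quoted in this thread. The reading of the numbers (first is the request being waited for, second is the last one the GPU reported) is an assumption based on the wording of the message, and the function name and sample text are only illustrative.

```python
import re

# Matches the kernel message format quoted in this thread.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)

def parse_wait_errors(log_text):
    """Return (ret, awaiting, current) tuples, one per matching record."""
    return [
        (int(ret), int(awaiting), int(current))
        for ret, awaiting, current in WAIT_RE.findall(log_text)
    ]

# Sample lines copied from the comments above.
SAMPLE = """\
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 123250 at 123249)
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 123264 at 123249)
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 87048 at 87043)
"""

for ret, awaiting, current in parse_wait_errors(SAMPLE):
    # Under the assumption above, a constant "at" value with a growing
    # "awaiting" value would mean the GPU has stopped making progress.
    print(f"ret={ret} awaiting={awaiting} at={current} behind={awaiting - current}")
```

To run it against a real log, pass the contents of /var/log/messages (or the output of dmesg) to parse_wait_errors instead of SAMPLE.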
I have the same problem. Here are my logs. It happens to me when using a lot of flash in firefox. Only a reboot helps me. Killing all processes doesn't help. No app that requires graphics memory can be started.

kernel-2.6.34.7-56.fc13.i686
xorg-x11-drivers-7.3-14.fc13.i686

npviewer.bin[16718]: segfault at 0 ip 01186741 sp bfb16d10 error 4 in libflashplayer.so[de7000+b2c000]
npviewer.bin[16894]: segfault at 418 ip 01035dd6 sp bf884138 error 6 in libflashplayer.so[de7000+b2c000]
[drm:i915_gem_do_execbuffer] *ERROR* Failed to pin buffer 2 of 3, total 58724352 bytes, 0 fences: -28
[drm:i915_gem_do_execbuffer] *ERROR* 555 objects [5 pinned], 179732480 object bytes [43139072 pinned], 43139072/100794368 gtt bytes
npviewer.bin[16929]: segfault at 0 ip 01186741 sp bf932450 error 4 in libflashplayer.so[de7000+b2c000]
npviewer.bin[16948]: segfault at 0 ip 011d4741 sp bf94b760 error 4 in libflashplayer.so[e35000+b2c000]
npviewer.bin[18431]: segfault at 418 ip 01035dd6 sp bf91f328 error 6 in libflashplayer.so[de7000+b2c000]
npviewer.bin[19955]: segfault at b730604c ip 01186757 sp bfe9eb60 error 4 in libflashplayer.so[de7000+b2c000]
npviewer.bin[20123]: segfault at 418 ip 01035dd6 sp bf8e73f8 error 6 in libflashplayer.so[de7000+b2c000]
npviewer.bin[20329]: segfault at 418 ip 01035dd6 sp bfb77098 error 6 in libflashplayer.so[de7000+b2c000]
npviewer.bin[8400]: segfault at 418 ip 01043dd6 sp bfb40d88 error 6 in libflashplayer.so[df5000+b2c000]
npviewer.bin[8523]: segfault at b74e904c ip 01194757 sp bf870690 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[8547]: segfault at 0 ip 01194741 sp bfdb1360 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[8616]: segfault at 0 ip 011f9741 sp bff1f620 error 4 in libflashplayer.so[e5a000+b2c000]
npviewer.bin[8656]: segfault at 418 ip 01043dd6 sp bfa94e78 error 6 in libflashplayer.so[df5000+b2c000]
npviewer.bin[8707]: segfault at 418 ip 01043dd6 sp bfe1ce38 error 6 in libflashplayer.so[df5000+b2c000]
npviewer.bin[8894]: segfault at 418 ip 01043dd6 sp bfa9b708 error 6 in libflashplayer.so[df5000+b2c000]
npviewer.bin[8950]: segfault at b758804c ip 01194757 sp bff07470 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9018]: segfault at b73a604c ip 01194757 sp bfdcf600 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9048]: segfault at b73e304c ip 01194757 sp bfbd7f00 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9067]: segfault at 418 ip 01106dd6 sp bfe78e58 error 6 in libflashplayer.so[eb8000+b2c000]
npviewer.bin[9093]: segfault at b748304c ip 01194757 sp bf931f60 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9391]: segfault at 0 ip 01194741 sp bfafeeb0 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9686]: segfault at 418 ip 01043dd6 sp bfff4bc8 error 6 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9742]: segfault at 418 ip 01043dd6 sp bfde83c8 error 6 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9860]: segfault at b734904c ip 01194757 sp bf9653c0 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9879]: segfault at 418 ip 01043dd6 sp bf826ef8 error 6 in libflashplayer.so[df5000+b2c000]
npviewer.bin[9897]: segfault at 0 ip 01194741 sp bf9b0750 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[10262]: segfault at b731804c ip 01194757 sp bfa32120 error 4 in libflashplayer.so[df5000+b2c000]
npviewer.bin[10283]: segfault at 418 ip 01043dd6 sp bfbb7278 error 6 in libflashplayer.so[df5000+b2c000]
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 4188907 at 4188904)
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/i915/i915_gem_tiling.c:337 i915_gem_set_tiling+0x156/0x1ad [i915]()
Hardware name: DW137A-ABA A445W
failed to reset object for tiling switch
Modules linked in: vfat fat sit tunnel4 aes_i586 aes_generic fuse tun ipv6 p4_clockmod arc4 ecb zd1211rw snd_intel8x0 snd_ac97_codec ac97_bus mac80211 snd_seq cfg80211 snd_seq_device snd_pcm snd_timer 8139too snd iTCO_wdt iTCO_vendor_support 8139cp serio_raw ppdev parport_pc mii i2c_i801 parport soundcore rfkill joydev snd_page_alloc microcode usb_storage firewire_ohci firewire_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Pid: 1331, comm: Xorg Tainted: G W 2.6.34.7-56.fc13.i686 #1
Call Trace:
 [<c0438822>] warn_slowpath_common+0x6a/0x81
 [<f7dc4570>] ? i915_gem_set_tiling+0x156/0x1ad [i915]
 [<c0438877>] warn_slowpath_fmt+0x29/0x2c
 [<f7dc4570>] i915_gem_set_tiling+0x156/0x1ad [i915]
 [<f7d33ad8>] drm_ioctl+0x26d/0x359 [drm]
 [<f7dc441a>] ? i915_gem_set_tiling+0x0/0x1ad [i915]
 [<c04095cf>] ? restore_i387_fxsave+0x68/0x79
 [<c04dc69d>] vfs_ioctl+0x2c/0x96
 [<f7d3386b>] ? drm_ioctl+0x0/0x359 [drm]
 [<c04dcc33>] do_vfs_ioctl+0x488/0x4c6
 [<c0409ba4>] ? restore_i387_xstate+0x1a9/0x1e0
 [<c04d16f3>] ? fsnotify_access+0x54/0x5f
 [<c0479599>] ? audit_syscall_entry+0x118/0x13a
 [<c04dccb7>] sys_ioctl+0x46/0x66
 [<c079093c>] syscall_call+0x7/0xb
---[ end trace 6e81fd0f0f850f59 ]---
npviewer.bin[10320]: segfault at b753b054 ip 01194733 sp bfde1600 error 4 in libflashplayer.so[df5000+b2c000]
wlan0: deauthenticating from 00:1c:10:92:20:bc by local choice (reason=3)
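The logs above contain two different negative return values, -5 and -28. Kernel functions conventionally return negated errno values, so a short sketch like the following can translate them into names. The describe helper is hypothetical, and the numeric-to-name mapping shown is the Linux one; other platforms may number errors differently.

```python
import errno
import os

def describe(ret):
    """Map a negative kernel-style return value to (errno name, message)."""
    code = -ret
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

for ret in (-5, -28):
    name, text = describe(ret)
    print(ret, name, text)
```

On Linux, 5 is EIO, which matches the "Input/output error" text in the Xorg log, and 28 is ENOSPC, which is consistent with the "Failed to pin buffer" message being a space problem rather than a hang.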
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I have this problem with FC13.
Can you please update the OS on this to FC13?
Which problem are you all having? The original bug was about messages like this:

render error detected, EIR: 0x00000010
[drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010, masking
render error detected, EIR: 0x00000010

For me, this has been solved for a couple of months. Based on the past couple of posts, I think it's time to close this bug, which has been solved, and open new bugs for new problems.
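To help sort reports into the right bug, here is a rough sketch that checks a log for the two message patterns discussed in this thread: the "EIR stuck" messages described in the comment above, and the "Hangcheck timer elapsed... GPU hung" messages from other comments. The classify function and its labels are made up for illustration, and matching on these substrings is only a heuristic, not a diagnosis.

```python
def classify(log_text):
    """Return labels for the i915 message patterns found in log_text."""
    found = []
    if "EIR stuck" in log_text:
        found.append("eir_stuck")   # pattern quoted in the comment above
    if "Hangcheck timer elapsed" in log_text:
        found.append("gpu_hung")    # pattern quoted in the hang reports
    return found

# Sample lines copied from this thread.
EIR_SAMPLE = (
    "render error detected, EIR: 0x00000010\n"
    "[drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010, masking\n"
)
HANG_SAMPLE = (
    "[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung\n"
    "render error detected, EIR: 0x00000000\n"
)

print(classify(EIR_SAMPLE))
print(classify(HANG_SAMPLE))
```

A log that returns only the second label would, by the reasoning in the comment above, belong in a separate bug report.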
I see. I'll open a new one. The problem that I and many others have been experiencing is a different one, but with the same consequences.
Thank you.