Bug 570517 - intel crash with KMS on resume/startup ... "Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error."
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-intel
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
Whiteboard: card_845G
Keywords: Triaged
Duplicates: 572772 573462
 
Reported: 2010-03-04 10:47 EST by Jason Long
Modified: 2011-06-27 11:05 EDT (History)
Doc Type: Bug Fix
Last Closed: 2011-06-27 11:05:09 EDT
Attachments
Xorg server log file (148.14 KB, text/plain), 2010-03-04 10:55 EST, Jason Long
Kernel messages (dmesg) (28.16 KB, text/plain), 2010-03-04 10:58 EST, Jason Long
Xorg.0.log (287.63 KB, text/plain), 2010-04-15 07:23 EDT, Antal KICSI
Output of dmesg (31.45 KB, text/plain), 2010-04-15 07:25 EDT, Antal KICSI
content of /var/log/messages after activating desktop effects (120.53 KB, text/plain), 2010-04-19 06:51 EDT, Antal KICSI

Description Jason Long 2010-03-04 10:47:44 EST
Description of problem:

When selecting "Enable Desktop Effects", all windows disappear and never come back. I can still move the mouse pointer and switch VTs (e.g. with Ctrl+Alt+F2), but the X server is pretty much unusable.


Hardware:

Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device rev 3

http://www.smolts.org/client/show/pub_da300dbf-46a7-4008-bb51-a07d472fc6bd


Version-Release number of relevant components:

xorg-x11-drv-intel-2.10.0-4.fc13.i686
xorg-x11-server-Xorg-1.7.99.901-8.20100223.fc13.i686
libdrm-2.4.18-0.1.fc13.i686
kernel-PAE-2.6.33-1.fc13.i686 



How reproducible:
very much. It happened 4 times out of 4 tries.


Steps to Reproduce:
1. desktop effects are not enabled
2. menu -> System -> Preferences -> Desktop Effects
3. click Compiz

  
Actual results:
The Gnome panels disappear. All open windows disappear. The icons on the desktop disappear. The desktop wallpaper is still visible, and the mouse pointer is still visible. I can move the mouse pointer. When I move the mouse pointer to certain locations on the screen, the cursor changes depending on what was at that point on the screen (i.e. it looks like the X server still knows where all the windows were and acts as if they were still there).


Expected results:
Open windows may briefly flash, but they should come back with desktop effects enabled.


Additional info:
KMS is enabled (i.e. "nomodeset" is NOT specified)
The file /etc/X11/xorg.conf does not exist.
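
For anyone comparing setups, the KMS condition above can be checked from a shell. This is only a minimal sketch, assuming the kernel command line is readable at /proc/cmdline; the helper name has_nomodeset is made up for illustration and is not part of any Fedora tool.

```shell
#!/bin/sh
# Hypothetical helper (illustrative name): succeeds if the given kernel
# command line contains the standalone word "nomodeset".
has_nomodeset() {
    case " $1 " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the setting for the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    if has_nomodeset "$(cat /proc/cmdline)"; then
        echo "nomodeset is set (KMS disabled on the command line)"
    else
        echo "nomodeset is not set"
    fi
fi
```

The match is done on whole words, so an option that merely contains the string "nomodeset" is not counted.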
Comment 1 Jason Long 2010-03-04 10:55:23 EST
Created attachment 397834 [details]
Xorg server log file

This is the Xorg log file from startup to after enabling desktop effects.

I clicked Compiz at approx. timestamp 127. It appears the server performed a card scan/initialization at that time.

Then the log file just started filling up with lines like this:

intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
Comment 2 Jason Long 2010-03-04 10:58:40 EST
Created attachment 397836 [details]
Kernel messages (dmesg)

I'm attaching the complete output of dmesg, after trying to enable desktop effects.

These lines (from the attachment) occurred shortly AFTER I clicked the "Compiz" button.

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 616 at 615)
compiz[2210]: segfault at 2eeaa0 ip 002eeaa0 sp bff1cf10 error 4 in libXfixes.so.3.1.0[397000+4000]
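
To check another machine for the same symptoms, something like the following could be used. It is a rough sketch only: the function names are invented for illustration, and the log path in the usage note is an assumption about where the X server writes its log on a given system.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any existing tool).

# Print how many lines of the given Xorg log contain the batch buffer failure.
count_batch_failures() {
    grep -c 'Failed to submit batch buffer' "$1"
}

# Read kernel messages on stdin and print only the lines that mention the
# i915 hang check or the failed wait request.
filter_gpu_hang() {
    grep -E 'i915_hangcheck_elapsed|i915_do_wait_request'
}
```

Possible usage, assuming the log lives at /var/log/Xorg.0.log: run `count_batch_failures /var/log/Xorg.0.log` for the X server side and `dmesg | filter_gpu_hang` for the kernel side.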
Comment 3 Jason Long 2010-03-04 11:16:21 EST
Correction.

In comment 1, that should say timestamp 160. It was timestamp 160 when I tried to enable desktop effects.

Also, regarding the kernel logs in comment 2: the first two messages appear almost immediately. The compiz segfault message occurs 40 seconds later.
Comment 4 Adam Williamson 2010-03-04 12:37:33 EST
Thanks for the comprehensive report.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 5 Antal KICSI 2010-03-11 15:35:39 EST
I have the same problem with F13-Alpha-i686-Live.iso

Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
Comment 6 Matěj Cepl 2010-03-15 15:18:05 EDT
*** Bug 573462 has been marked as a duplicate of this bug. ***
Comment 7 Matěj Cepl 2010-03-15 15:21:59 EDT
*** Bug 572772 has been marked as a duplicate of this bug. ***
Comment 8 Matěj Cepl 2010-03-15 15:27:59 EDT
*** Bug 537494 has been marked as a duplicate of this bug. ***
Comment 9 Andrew Duggan 2010-03-15 15:57:58 EDT
(In reply to comment #8)
> *** Bug 537494 has been marked as a duplicate of this bug. ***    

I disagree that 537494 is a duplicate of this. The first item in the description of 537494 is about hibernate/thaw with KMS, not a compiz / X crash. I can trigger 537494 even without X running, just by booting to single-user mode, running pm-hibernate, and then thawing. In the ancestor bug of 537494 (500983), the segfault is in "ld-linux.so.2[4704]"; see comment 18 of 500983. If you really must close 537494, it would be more honest to close it with a wontfix.
Comment 10 Bojan Smojver 2010-03-15 17:08:10 EDT
(In reply to comment #9)
 
> I disagree that 537494 a duplicate of this.

Yes. Matej, based on what did you conclude that bug #537494 is a duplicate of this bug? Bug #537494 is _completely_ different.
Comment 11 Matěj Cepl 2010-03-15 17:44:51 EDT
(In reply to comment #10)
> Yes. Matej, based on what did you conclude that bug #537494 is a duplicate of
> this bug? Bug #537494 is _completely_ different.    

There seems to be a mix of different issues in that one bug. But when looking at attachment 399319 [details] (which is from the original reporter, so hopefully relevant), take a look at the end ... all those "Failed submit batch buffer" errors, which are what we cover here.

> If you really must close 537494, it would be more honest to close it with
> a wontfix.    

Nonsense, I never close as duplicate something which I would like to WONTFIX. I know how to use WONTFIX and I don't hesitate to use it when I want it ... that isn't the case here.
Comment 12 Andrew Duggan 2010-03-15 18:22:08 EDT
(In reply to comment #11)

> take a look at the end ... all those "Failed submit batch buffer" errors, which
> are what we cover here.
> 

What does that have to do with userspace segfaults on thaw? If bug #537494 is really the same, how can I trigger it without X running?
 
> > If you really must close 537494, it would be more honest to close it with
> > a wontfix.    
> 
> Nonsense, I never close as duplicate something which I would like to WONTFIX. I
> know how to use WONTFIX and I don't hesitate to use it when I want it ... that
> isn't the case here.    

I wasn't questioning your expertise with respect to bugzilla, which I know is considerable, I was questioning your understanding of the issues at stake in bug #537494.  

Bottom line -- are you willing to keep this bug open even when Jason's problem is fixed, but the symptoms of bug #537494 and its ancestor #500983 are still present, meaning hibernate/thaw still doesn't work with KMS?
Comment 13 Bojan Smojver 2010-03-15 18:30:30 EDT
(In reply to comment #11)

> There seem to be mix of different issues in that one bug. But when looking at
> attachment 399319 [details] (which is from the original reporter, so hopefully
> relevant), take a look at the end ... all those "Failed submit batch buffer"
> errors, which are what we cover here.

These are all from F-13 alpha, because I got asked to install that and see if
it's still the same there. I cannot replicate the behaviour of F-12 exactly,
because hibernate/thaw doesn't work in F-13 yet.

Bug #537494 is about F-12. I haven't noticed any "failed to submit batch
buffer" errors there, but maybe the verbosity of X in F-12 is different or
something. Or, it could be a different problem.

As Andrew pointed out, he can replicate bug #537494 on his hardware (which is
essentially the same as mine) using single user mode only - no X at all. I
haven't really tried, but given the symptoms, I do not doubt it. Just look at
that photo I attached - programs are getting signal 11 on thaw.
Comment 14 Witek Mozga 2010-03-16 10:44:08 EDT
I guess this bug is also duplicated here: https://bugzilla.redhat.com/show_bug.cgi?id=530179 (there I tried to enable compiz).

However, I'm not sure if it's duplicated here: https://bugzilla.redhat.com/show_bug.cgi?id=573462 (there I wasn't able to start X at all).

Anyway, both issues might be closely related.
Comment 15 Bojan Smojver 2010-03-17 18:20:49 EDT
Matej,

Can we please get a reply regarding bug #537494?
Comment 16 Matěj Cepl 2010-03-20 19:40:41 EDT
(In reply to comment #13)
> Bug #537494 is about F-12. I haven't noticed any "failed to submit batch
> buffer" errors there, but maybe the verbosity of X in F-12 is different or
> something. Or, it could be a different problem.

Yes, you are right. My mistake. Reopening bug 537494
Comment 17 Bojan Smojver 2010-04-08 02:11:13 EDT
Just updated my F-13 instance to the latest, including -24 kernel. Still seeing artefacts on the screen.
Comment 18 Bojan Smojver 2010-04-11 19:06:23 EDT
Time for 2.11.0?
Comment 19 Bojan Smojver 2010-04-13 03:51:13 EDT
Just checked and artefacts are only present when compiz is used. With metacity, there is no trouble.
Comment 20 Bojan Smojver 2010-04-13 06:17:19 EDT
The artefacts problem appears to be also described in bug #581424.
Comment 21 Antal KICSI 2010-04-15 07:19:08 EDT
I still have the same problem stated by Jason Long in comment#1 with F13-Beta-i686-Live.iso.

My graphics device is: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
Comment 22 Antal KICSI 2010-04-15 07:23:29 EDT
Created attachment 406742 [details]
Xorg.0.log

Xorg.0.log with F13-Beta-i686-Live.iso after activating Desktop Effects using default configuration (KMS enabled, no xorg.conf).
Comment 23 Antal KICSI 2010-04-15 07:25:55 EDT
Created attachment 406743 [details]
Output of dmesg

Output of dmesg with F13-Beta-i686-Live.iso after activating Desktop Effects using default configuration (KMS enabled, no xorg.conf). The X server is unusable, so I got these files via ssh.
Comment 24 Fedora Update System 2010-04-16 13:54:53 EDT
xorg-x11-drv-intel-2.11.0-1.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/xorg-x11-drv-intel-2.11.0-1.fc13
Comment 25 Adam Williamson 2010-04-16 14:20:25 EDT
A new version of the Intel driver, xorg-x11-drv-intel-2.11.0-1.fc13, has just been added: https://admin.fedoraproject.org/updates/xorg-x11-drv-intel-2.11.0-1.fc13 . Can you please test with this version and see if the bug is reproducible? If you have an installed Fedora 13, you can download and install the driver from the Koji link. If you are testing with live images, the nightly live images from 2010-04-17 (or possibly 2010-04-18) onwards should include this version: http://alt.fedoraproject.org/pub/alt/nightly-composes/desktop/ . This comment is being added to all open Fedora 13 intel bugs, please ignore if it does not make sense in the context of your bug.
Comment 26 Rodrigo Ayala 2010-04-17 22:30:52 EDT
Hi! I have installed Fedora 13 Beta and the "xorg-x11-drv-intel-2.11.0-1.fc13" package, but the problem still persists when I enable Compiz.

Sorry, this version of the Intel driver doesn't work, at least for me.
Comment 27 Antal KICSI 2010-04-19 06:51:31 EDT
Created attachment 407547 [details]
content of /var/log/messages after activating desktop effects

I tested the new version of the Intel driver with the desktop-i386-20100418.17.iso livecd. After activating desktop effects the system did not become unusable, but compiz crashed and the desktop was restored. At the same time an abrt message appeared with the following content:

Package:    	compiz-0.8.6-1.fc13
Latest Crash:	Mon 19 Apr 2010 10:34:51 AM 
Command:    	compiz --ignore-desktop-hints glib gconf gnomecompat --replace
Reason:     	Process /usr/bin/compiz was killed by signal 11 (SIGSEGV)
Comment:    	None
Bug Reports:	

I attached the content of /var/log/messages; there is no sign of any error in Xorg.0.log.

I also tested the Intel driver on an installed system, but there the results are the same as in Rodrigo Ayala's comment #26.
Comment 28 Bojan Smojver 2010-04-20 05:30:13 EDT
This combination:

kernel-PAE-2.6.33.2-56.fc13.i686
xorg-x11-drv-intel-2.11.0-1.fc13.i686

Still leaving artefacts with compiz. Hardware here:

http://www.smolts.org/client/show/pub_15c8a2a8-ccd6-4aeb-99b7-14fa34899e63
Comment 29 Bojan Smojver 2010-04-20 05:45:17 EDT
(In reply to comment #28)
 
> Still leaving artefacts with compiz. Hardware here:

Actually, it's worse than that - sometimes menus and windows become completely unusable. They appear for a fraction of a second when items are supposed to be highlighted and then disappear behind the background.
Comment 30 Fedora Update System 2010-04-20 09:21:48 EDT
xorg-x11-drv-intel-2.11.0-1.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update xorg-x11-drv-intel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/xorg-x11-drv-intel-2.11.0-1.fc13
Comment 31 Fedora Update System 2010-04-21 17:51:54 EDT
xorg-x11-drv-intel-2.11.0-1.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update xorg-x11-drv-intel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/xorg-x11-drv-intel-2.11.0-1.fc13
Comment 32 Bojan Smojver 2010-05-04 02:37:50 EDT
With this combo:

kernel-PAE-2.6.33.3-72.fc13.i686
xorg-x11-server-Xorg-1.8.0-8.fc13.i686
xorg-x11-drv-intel-2.11.0-3.fc13.i686

I see no refresh problems nor artefacts on the screen. Excellent!
Comment 33 Jason Long 2010-05-12 11:25:04 EDT
With the latest F13 updates, the symptoms have changed since my last report (on 2010-03-04).

Now when I activate desktop effects I get the following behavior.


The Gnome panels disappear. All open windows disappear. A moment later, the panel and the open windows reappear, but this time they have *no window decorations* (i.e. no title bar). A bubble appears in the upper right reporting that "a crash in package compiz-0.8.6... has been detected". X seems to function ok without the window manager. If I turn off desktop effects, the window decorations are restored.

The bug reporting tool gives me this:
"Program /usr/bin/compiz terminated with signal 11, Segmentation fault. #0 0x00f87437 in i830_emit_state (intel=0x91824a8) at i830_vtbl.c:440."



This is an Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device, rev 3.


xorg-x11-drv-intel-2.11.0-4.fc13.i686
xorg-x11-server-Xorg-1.8.0-12.fc13.i686
kernel-PAE-2.6.33.3-85.fc13.i686



I'll try to get the full backtrace, but maybe this should be filed as a separate bug?
Comment 34 trevik 2010-05-18 09:11:53 EDT
It seems that the bug is gone with my configuration. F-13 is updated, driver xorg-x11-drv-intel-2.11.0-4.fc13.i686. Bug 572772 has also disappeared (no artifacts on the screen).
HW:
[    42.354] (--) PCI:*(0:0:2:0) 8086:27ae:1462:0110 Intel Corporation Mobile 945GME Express Integrated Graphics Controller rev 3, Mem @ 0xdfe80000/524288, 0xc0000000/268435456, 0xdff00000/262144, I/O @ 0x0000d0f0/8, BIOS @ 0x????????/131072
[    42.354] (--) PCI: (0:0:2:1) 8086:27a6:1462:0110 Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller rev 3, Mem @ 0xdfe00000/524288, BIOS @ 0x????????/65536

S/W:
X.Org X Server 1.8.0
Release Date: 2010-04-02
[    41.966] X Protocol Version 11, Revision 0
[    41.967] Build Operating System: x86-01 2.6.18-164.15.1.el5 
[    41.967] Current Operating System: Linux trevnote-lg 2.6.33.3-85.fc13.i686 #1 SMP Thu May 6 18:44:12 UTC 2010 i686
[    41.968] Kernel command line: ro root=/dev/sda4 resume=/dev/sda2 enforcing=0 intel_iommu=off LANG=en_US.UTF-8 KEYTABLE=us
[    41.968] Build Date: 02 May 2010  02:56:54PM
[    41.968] Build ID: xorg-x11-server 1.8.0-12.fc13
Comment 35 Werner Gold 2010-07-04 05:38:08 EDT
Here X still hangs after some activity, with the error message from $subj in Xorg.log.
[wgold@joe log]$ uname -a
Linux joe 2.6.33.5-124.fc13.i686.PAE #1 SMP Fri Jun 11 09:42:24 UTC 2010 i686 i686 i386 GNU/Linux
[wgold@joe log]$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
Comment 36 Bojan Smojver 2010-07-04 05:51:11 EDT
(In reply to comment #35)

> [wgold@joe log]$ uname -a
> Linux joe 2.6.33.5-124.fc13.i686.PAE #1 SMP Fri Jun 11 09:42:24 UTC 2010 i686
> i686 i386 GNU/Linux

You don't seem  to be running the .6-142 kernel from koji, that is supposed to have the fix.
Comment 37 Bojan Smojver 2010-07-04 05:54:22 EDT
(In reply to comment #36)
> (In reply to comment #35)
> 
> > [wgold@joe log]$ uname -a
> > Linux joe 2.6.33.5-124.fc13.i686.PAE #1 SMP Fri Jun 11 09:42:24 UTC 2010 i686
> > i686 i386 GNU/Linux
> 
> You don't seem  to be running the .6-142 kernel from koji, that is supposed to
> have the fix.    

OOPS! Commented on the wrong bug, sorry :-(
Comment 38 Jeff Raber 2010-07-12 20:09:57 EDT
Jason Long, from comment 33 it looks like you are now hitting bug 586236, which is a CommonBug. See: https://fedoraproject.org/wiki/Common_F13_bugs#compiz-i8xx


Ajax, should this still be ON_QA?
Comment 39 Bug Zapper 2011-06-02 12:19:49 EDT
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 40 Bug Zapper 2011-06-27 11:05:09 EDT
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
