Bug 612063 - xorg: screen went black with only mouse cursor visible (segfault)
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: xorg-x11-server
Version: 6.0
Hardware: All Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Adam Jackson
QA Contact: Desktop QE
Keywords: Triaged
Depends On:
Blocks:
Reported: 2010-07-07 04:34 EDT by Stefan Assmann
Modified: 2011-10-19 13:58 EDT

Doc Type: Bug Fix
Last Closed: 2011-10-19 13:58:18 EDT

Attachments
Xorg.0.log (130.98 KB, text/plain)
2010-07-07 04:34 EDT, Stefan Assmann

Description Stefan Assmann 2010-07-07 04:34:42 EDT
Created attachment 429997
Xorg.0.log

Description of problem:
During normal work, both of my screens went black. After a second, only the mouse cursor reappeared, but it no longer moved. The Xorg log shows a segfault. abrt didn't catch this; I'm not sure why.

Version-Release number of selected component (if applicable):
xorg-x11-server-Xorg-1.7.7-2.el6.x86_64
xorg-x11-server-utils-7.4-15.el6.x86_64
xorg-x11-server-common-1.7.7-2.el6.x86_64
xorg-x11-drv-intel-2.11.0-4.el6.x86_64

How reproducible:
Happened once since I installed Beta 2 (that was 2 days ago).
Comment 1 Stefan Assmann 2010-07-07 04:44:49 EDT
After the incident I rebooted to runlevel 1 to save the log and afterwards changed back to runlevel 5. Now I'm seeing constant 100% CPU utilization from Xorg:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2253 root      20   0  235m  37m  11m R 100.6  1.0  16:32.86 Xorg
Comment 3 Adam Jackson 2010-07-07 16:22:17 EDT
2: /lib64/libpthread.so.0 (0x356fa00000+0xf440) [0x356fa0f440]
3: /usr/bin/Xorg (0x400000+0x227af) [0x4227af]
4: /usr/bin/Xorg (OtherClientGone+0x50) [0x4252f0]
5: /usr/bin/Xorg (FreeClientResources+0xd3) [0x449fc3]
6: /usr/bin/Xorg (CloseDownClient+0x60) [0x43b1b0]

Frame 3 is where the segfault happened, which resolves to:

% echo 0x4227af | eu-addr2line -e xorg-x11-server-debuginfo-1.7.7-2.el6.x86_64/usr/lib/debug/usr/bin/Xorg.debug 
/usr/src/debug/xorg-server-1.7.7/dix/events.c:4078

Which is:

        while (!pChild->nextSib && (pChild != pWin))
            pChild = pChild->parent;

which means the window hierarchy got smashed somewhere along the line.
Comment 4 Stefan Assmann 2010-07-09 07:38:47 EDT
Happened again. I thought I had the necessary debuginfo (xorg-x11-server-debuginfo-1.7.7-2.el6.x86_64) installed, but it seems I'm still missing something. Here's the backtrace. Any suggestions as to which additional debuginfo I should install? The debuginfo install script doesn't work on my machine, don't ask why. :)

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4adf38]
1: /usr/bin/Xorg (0x400000+0x629e9) [0x4629e9]
2: /lib64/libpthread.so.0 (0x356fa00000+0xf440) [0x356fa0f440]
3: /usr/bin/Xorg (ReadRequestFromClient+0x1f) [0x46900f]
4: /usr/bin/Xorg (0x400000+0x3f9db) [0x43f9db]
5: /usr/bin/Xorg (0x400000+0x21ffa) [0x421ffa]
6: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x356ee1ec5d]
7: /usr/bin/Xorg (0x400000+0x21bb9) [0x421bb9]
Segmentation fault at address 0x8

Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting

Btw, I was running Beta 1 for quite some time and never had this problem.
Comment 5 Adam Jackson 2010-07-09 14:13:08 EDT
That's not the same crash at all.  That's:

% echo 0x46900f | eu-addr2line -e usr/lib/debug/usr/bin/Xorg.debug 
/usr/src/debug/xorg-server-1.7.7/os/io.c:197

which is

    ConnectionInputPtr oci = oc->input;

In both cases, you're violating some pretty strong invariants; in the first crash you've managed to corrupt the window hierarchy, in the second you've got a client whose network buffer is trashed.  I'm inclined to suspect your hardware at this rate.
Comment 6 Stefan Assmann 2010-07-12 03:34:50 EDT
Hmm... I somewhat doubt it's a hardware error. This is my primary workstation, which has been in use for 1.5 years now, and until I upgraded to Beta 2 I had never seen this behaviour.

Btw, the two crashes might be different, but the symptoms were exactly the same. As I said, the screen went black, the mouse cursor reappeared, and then came the segfault. It could be caused by some memory corruption.
Comment 7 Adam Jackson 2010-07-12 10:43:03 EDT
Is there anything characteristic about the crashes?  Any particular apps you're running?
Comment 8 Stefan Assmann 2010-07-13 03:10:18 EDT
Hmm nothing unusual. The apps I'm usually running all the time are thunderbird, firefox, xchat, gnote and xterm. IIRC it happened while I was in xterm.
Comment 9 Dave Airlie 2010-07-13 20:14:15 EDT
what kernel is this?

I can plausibly blame this on the iommu stuff possibly.
Comment 10 Stefan Assmann 2010-07-14 03:11:29 EDT
kernel-2.6.32-37.el6.x86_64
Comment 11 Adam Jackson 2010-07-14 10:58:52 EDT
(In reply to comment #9)
> what kernel is this?
> 
> I can plausibly blame this on the iommu stuff possibly.    

If so, booting with iommu=off would fix this.  Stefan, can you try this?
Comment 12 Stefan Assmann 2010-07-15 05:07:37 EDT
I have just updated from Beta 2 to snap7. Let's see how that behaves. If I see any more strange crashes I'll update this bugzilla and also try with intel_iommu=off. Thanks guys! :)
Comment 13 RHEL Product and Program Management 2010-07-15 10:58:47 EDT
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Comment 14 Matěj Cepl 2010-07-19 17:27:41 EDT
(In reply to comment #12)
> I have just updated from Beta 2 to snap7. Let's see how that behaves. If I
> see any more strange crashes I'll update this bugzilla and also try with
> intel_iommu=off. Thanks guys! :)

Let's put NEEDINFO on it.
Comment 15 Stefan Assmann 2010-07-20 02:20:26 EDT
Looks like updating to snap7 made the problem disappear. I've been running my workstation for a couple of days now without any issues. I guess it's fine to close this then.
Comment 16 RHEL Product and Program Management 2011-01-06 23:35:20 EST
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.
Comment 17 Suzanne Yeghiayan 2011-01-07 11:19:27 EST
This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.
Comment 18 RHEL Product and Program Management 2011-02-01 01:06:34 EST
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.
Comment 19 RHEL Product and Program Management 2011-02-01 13:27:15 EST
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.
Comment 20 RHEL Product and Program Management 2011-04-03 22:34:26 EDT
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 21 RHEL Product and Program Management 2011-10-07 12:19:00 EDT
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 22 Adam Jackson 2011-10-19 13:58:18 EDT
Closing per comment #15, thanks.
