Bug 618356

Summary: [RHEL6] Desktop hangs/crashes (infinite loop inside EXA core)
Product: Red Hat Enterprise Linux 6
Reporter: Jeff Burke <jburke>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED CURRENTRELEASE
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Priority: low
Version: 6.0
CC: ajschult784, atodorov, borgan, cmeadors, ddumas, jglisse, jhunt, jwest, mgordon, vengmd
Target Milestone: rc
Keywords: Reopened, RHELNAK, Triaged
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clones: 618743
Last Closed: 2013-05-02 18:27:31 UTC
Bug Blocks: 782183, 960058
Attachments:
  xorg log (flags: none)
  full bt (flags: none)

Description Jeff Burke 2010-07-26 18:35:57 UTC
Description of problem:
When launching emacs, I periodically get a "hang" crash: the GUI is hung and I can't do anything, although I am still able to ssh into the system.

Version-Release number of selected component (if applicable):
xorg-x11-server-Xorg-1.7.7-21.el6.x86_64

How reproducible:
Intermittent

Steps to Reproduce:
1. Install RHEL6.0-Snapshot-7-Refresh
2. Log in to the desktop
3. Open a terminal window and run: emacs /tmp/foo &
  
Actual results:
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x469138]
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x4a2fe4]
2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x4739d4]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f0c40a09000+0x524f) [0x7f0c40a0e24f]
4: /usr/bin/Xorg (0x400000+0x74897) [0x474897]
5: /usr/bin/Xorg (0x400000+0x10dad3) [0x50dad3]
6: /lib64/libpthread.so.0 (0x31b4800000+0xf4c0) [0x31b480f4c0]
7: /lib64/libc.so.6 (0x31b4400000+0xf0dce) [0x31b44f0dce]
8: /lib64/libc.so.6 (0x31b4400000+0x7c1f8) [0x31b447c1f8]
9: /lib64/libc.so.6 (__libc_malloc+0x62) [0x31b4479af2]
10: /lib64/libc.so.6 (0x31b4400000+0x6fdbb) [0x31b446fdbb]
11: /lib64/libc.so.6 (0x31b4400000+0x75736) [0x31b4475736]
12: /lib64/libc.so.6 (0x31b4400000+0x78e78) [0x31b4478e78]
13: /lib64/libc.so.6 (__libc_malloc+0x6d) [0x31b4479afd]
14: /usr/bin/Xorg (miRegionCreate+0x23) [0x454be3]
15: /usr/bin/Xorg (miRectsToRegion+0x33) [0x455e43]
16: /usr/bin/Xorg (miChangeClip+0x8e) [0x55491e]
17: /usr/lib64/xorg/modules/libexa.so (0x7f0c42287000+0x2c6d) [0x7f0c42289c6d]
18: /usr/bin/Xorg (0x400000+0xd42b4) [0x4d42b4]
19: /usr/bin/Xorg (SetClipRects+0xbf) [0x4368ef]
20: /usr/bin/Xorg (0x400000+0x297a6) [0x4297a6]
21: /usr/bin/Xorg (0x400000+0x2ab5c) [0x42ab5c]
22: /usr/bin/Xorg (0x400000+0x21ffa) [0x421ffa]
23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x31b441ec5d]
24: /usr/bin/Xorg (0x400000+0x21bb9) [0x421bb9]


Expected results:
The desktop should continue to operate normally.

Additional info:

Comment 1 Jeff Burke 2010-07-26 18:36:59 UTC
nouveau - nVidia Corporation G96 [Quadro FX 580] (rev a1)

Comment 3 Jeff Burke 2010-07-26 18:42:56 UTC
Created attachment 434494
xorg log

Comment 4 RHEL Program Management 2010-07-26 18:57:38 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 7 Jérôme Glisse 2010-08-16 18:45:22 UTC
So I first thought it was something to do with input, but the input functions showing in the backtrace are expected in case of a GPU lockup. What happens is that the Xorg event queue (EQ) fills up because the GPU driver stops processing it, and we end up with the EQ overflowing. Now we need to extract more information from the nouveau driver to try to figure out what is going wrong on the GPU side.

(Ben, the kernel log seemed empty. I will try to get more information out of nouveau; tell me if you want me to run any specific tools.)
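The overflow mechanism described above can be modelled with a fixed-size ring buffer: the input path keeps posting events (in these traces, evdev calling xf86PostMotionEventP and then mieqEnqueue), while only the server's main loop drains them. The C sketch below is a hypothetical, simplified model, not the X server's actual mieq code; the names `event_queue`, `eq_enqueue`, `eq_dequeue`, `simulate` and the queue size of 8 are illustrative assumptions. It shows why a stalled consumer produces an overflow message even though the input code itself is working correctly.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified event queue. NOT the real X server mieq code;
 * names and the queue size are illustrative assumptions. */
#define EQ_SIZE 8

typedef struct {
    int x, y;              /* pointer position carried by a motion event */
} motion_event;

typedef struct {
    motion_event events[EQ_SIZE];
    unsigned head;         /* next slot to dequeue (consumer side) */
    unsigned tail;         /* next slot to fill (producer side) */
    unsigned count;        /* events currently queued */
    unsigned dropped;      /* events discarded because the queue was full */
} event_queue;

/* Producer: called for every incoming input event. */
static bool eq_enqueue(event_queue *q, motion_event ev)
{
    if (q->count == EQ_SIZE) {
        /* Nobody has drained the queue: warn once, then drop events. */
        if (q->dropped == 0)
            fprintf(stderr, "EQ overflowing. The consumer is probably stuck.\n");
        q->dropped++;
        return false;
    }
    q->events[q->tail] = ev;
    q->tail = (q->tail + 1) % EQ_SIZE;
    q->count++;
    return true;
}

/* Consumer: in this model, run only by the main dispatch loop. */
static bool eq_dequeue(event_queue *q, motion_event *out)
{
    if (q->count == 0)
        return false;
    *out = q->events[q->head];
    q->head = (q->head + 1) % EQ_SIZE;
    q->count--;
    return true;
}

/* Feed n_events motion events. When consumer_running is false the main
 * loop never drains the queue, as if it were stuck somewhere else.
 * Returns the number of dropped events. */
static unsigned simulate(int n_events, bool consumer_running)
{
    event_queue q = {0};
    motion_event ev;
    for (int i = 0; i < n_events; i++) {
        eq_enqueue(&q, (motion_event){ i, i });
        if (consumer_running)
            eq_dequeue(&q, &ev);
    }
    return q.dropped;
}
```

With the consumer running, `simulate(100, true)` drops nothing; with it stalled, `simulate(100, false)` fills the 8 slots, prints the overflow warning once, and drops the remaining 92 events. The producer is never the culprit in this model, which is why the input frames at the top of these backtraces are expected rather than suspicious.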

Comment 8 Jérôme Glisse 2010-08-16 21:51:15 UTC
Created attachment 439019
full bt

half full bt

Comment 9 Ben Skeggs 2010-08-17 03:47:37 UTC
More specifically, those types of backtraces indicate that *something* linked with the X server is stuck, and input events can no longer be processed as a result.

This particular backtrace does not indicate a GPU lockup of any kind, however (typical traces showing one will have nouveau_pushbuf_flush() or nouveau_bo_map_range() in them).

I'm still unable to reproduce this FWIW, even on the exact same graphics chip.
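Ben's rule of thumb for reading these traces can be expressed as a tiny check. The sketch below is a hypothetical helper: the function name `looks_like_gpu_lockup` and the approach of substring-matching symbol names with `strstr` are assumptions for illustration. It encodes only the two function names given in this comment and is no substitute for reading the full backtrace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Function names that, per this comment, typically appear in traces
 * where the server is waiting on a locked-up GPU. */
static const char *const gpu_wait_symbols[] = {
    "nouveau_pushbuf_flush",
    "nouveau_bo_map_range",
};

/* Hypothetical helper: returns true if any backtrace frame mentions one of
 * the symbols above, false otherwise (i.e. something else is stuck). */
static bool looks_like_gpu_lockup(const char *const *frames, size_t n_frames)
{
    size_t n_symbols = sizeof gpu_wait_symbols / sizeof gpu_wait_symbols[0];
    for (size_t i = 0; i < n_frames; i++) {
        for (size_t j = 0; j < n_symbols; j++) {
            if (strstr(frames[i], gpu_wait_symbols[j]) != NULL)
                return true;
        }
    }
    return false;
}
```

Applied to the frames in the original report, which go through miRegionCreate and miChangeClip rather than either nouveau function, the check returns false, matching Ben's reading that this trace does not show a GPU lockup.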

Comment 10 Jeff Burke 2010-08-17 18:19:01 UTC
I have updated my system to the latest tree and I am no longer able to reproduce.

Comment 11 Jeff Burke 2010-12-03 14:47:37 UTC
This issue does still occur. I hadn't seen it in a while, but it is still here. My system has crashed twice in the last two days with this. It seems that if I leave emacs open all the time, it happens a lot quicker.

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4aaaf8]
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x4a0304]
2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x47be14]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fa28e780000+0x551f) [0x7fa28e78551f]
4: /usr/bin/Xorg (0x400000+0x6c507) [0x46c507]
5: /usr/bin/Xorg (0x400000+0x11aef3) [0x51aef3]
6: /lib64/libpthread.so.0 (0x7fa2935f6000+0xf520) [0x7fa293605520]
7: /lib64/libc.so.6 (0x7fa292645000+0xf0dce) [0x7fa292735dce]
8: /lib64/libc.so.6 (0x7fa292645000+0x7c138) [0x7fa2926c1138]
9: /lib64/libc.so.6 (__libc_malloc+0x62) [0x7fa2926bea32]
10: /lib64/libc.so.6 (0x7fa292645000+0x6fcfb) [0x7fa2926b4cfb]
11: /lib64/libc.so.6 (0x7fa292645000+0x75676) [0x7fa2926ba676]
12: /lib64/libc.so.6 (0x7fa292645000+0x79108) [0x7fa2926be108]
13: /lib64/libc.so.6 (__libc_malloc+0x6d) [0x7fa2926bea3d]
14: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0x73d8) [0x7fa28ffc53d8]
15: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0xf9b9) [0x7fa28ffcd9b9]
16: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0xc83f) [0x7fa28ffca83f]
17: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0xcd24) [0x7fa28ffcad24]
18: /usr/bin/Xorg (0x400000+0xdab47) [0x4dab47]
19: /usr/bin/Xorg (0x400000+0x40a8c) [0x440a8c]
20: /usr/bin/Xorg (0x400000+0x2208a) [0x42208a]
21: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fa292663c5d]
22: /usr/bin/Xorg (0x400000+0x21c49) [0x421c49]

Comment 12 Ben Skeggs 2011-01-07 02:33:38 UTC
Hey Jeff,

I've noticed a similar bug report upstream recently, and the EXA maintainer believes that it may be related to another EXA bug he also recently fixed.  I've done a scratch build containing the fix.  Are you able to test this and see if your problem goes away?

https://brewweb.devel.redhat.com/taskinfo?taskID=3013796


Thanks!
Ben.

Comment 13 Alexander Todorov 2011-02-15 10:37:19 UTC
Hi all,
I had a similar issue with Xvnc. Let me know if you need it reported separately:

# [mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: Xvnc (xorg_backtrace+0x28) [0x5af748]
1: Xvnc (mieqEnqueue+0x1ec) [0x59817c]
2: Xvnc (_ZN11InputDevice11PointerMoveERKN3rfb5PointE+0xab) [0x50aeab]
3: Xvnc (_ZN14XserverDesktop12pointerEventERKN3rfb5PointEi+0x1f) [0x50852f]
4: Xvnc (_ZN3rfb10SMsgReader16readPointerEventEv+0x12b) [0x53491b]
5: Xvnc (_ZN3rfb16VNCSConnectionST15processMessagesEv+0x38) [0x52b238]
6: Xvnc (_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x11e) [0x50877e]
7: Xvnc (0x400000+0x101144) [0x501144]
8: Xvnc (WakeupHandler+0x4b) [0x56920b]
9: Xvnc (WaitForSomething+0x1d6) [0x5ad386]
10: Xvnc (Dispatch+0xb2) [0x563ca2]
11: Xvnc (main+0x355) [0x5005c5]
12: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fbda1b6bc9d]
13: Xvnc (0x400000+0x3fae9) [0x43fae9]

Comment 14 Jeff Burke 2011-02-22 21:29:51 UTC
Ben,
  Sorry for the late response. Can you drop another scratch build on brew? It has been purged. Or has this patch been included in an official build?

Thanks,
Jeff

Comment 15 Ben Skeggs 2011-03-28 23:24:49 UTC
Sorry for the delay with this.  I've submitted a new scratch build for you: https://brewweb.devel.redhat.com/taskinfo?taskID=3211308

Could you give it a go please?

Comment 16 Jeff Burke 2011-04-01 19:01:15 UTC
Ben,
 I have been running with the following packages for a couple of days now:
xorg-x11-server-debuginfo-1.7.7-30.el6.x86_64
xorg-x11-server-common-1.7.7-30.el6.x86_64
xorg-x11-server-Xorg-1.7.7-30.el6.x86_64

 I have not seen the issue. If I hit it, I will update the BZ.

Comment 18 Jeff Burke 2011-04-12 17:30:41 UTC
Ben,
 I was running your version of xorg-x11-server-Xorg-1.7.7-30.el6 for a couple of days. I did not hit any specific issues. I then upgraded the rest of my workstation to RHEL6.1 Snap1. After doing that, the dual-head setup became very erratic: my right monitor would come and go, my display would lock, and X would restart.

 I have since downgraded my system from your test version, and the problem persisted. I contacted Jerome and he came over to take a look at it, but his impression is that this is a different bug. I am not sure what data to use for a new BZ. Let me know if you can think of anything.

 This new issue has reached the point where it interferes with my day-to-day job. I have removed the NVIDIA Quadro from my machine and replaced it with an nVidia Corporation G96 [GeForce 9500 GT]. Since doing that, my system is stable again.

Comment 20 RHEL Program Management 2011-10-07 16:16:09 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 22 Ben Skeggs 2012-07-11 06:56:24 UTC
Jeff, would you consider the original bug fixed now (as per comment #18)?

Comment 23 Jeff Burke 2012-07-12 12:34:34 UTC
Ben,
 I apologize for the very, very late response. I'm not sure why I did not see this until now. I can't say for sure that the issue is resolved. When I had your test package installed I still had issues, but Jerome believed that was a different issue. I took the big-hammer approach and swapped out my video card.

Best,
Jeff