Bug 618356
| Summary: | [RHEL6] Desktop hangs/crashes (infinite loop inside EXA core) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jeff Burke <jburke> |
| Component: | xorg-x11-drv-nouveau | Assignee: | Ben Skeggs <bskeggs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Desktop QE <desktop-qa-list> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 6.0 | CC: | ajschult784, atodorov, borgan, cmeadors, ddumas, jglisse, jhunt, jwest, mgordon, vengmd |
| Target Milestone: | rc | Keywords: | Reopened, RHELNAK, Triaged |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 618743 | Environment: | |
| Last Closed: | 2013-05-02 18:27:31 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 782183, 960058 | | |
| Attachments: | | | |
Description
Jeff Burke
2010-07-26 18:35:57 UTC
nouveau - nVidia Corporation G96 [Quadro FX 580] (rev a1)

Created attachment 434494
xorg log

This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

So I first thought it was something to do with input, but the input functions showing in the backtrace are expected in the case of a GPU lockup. What happens is that the Xorg event queue (EQ) fills up because the GPU driver stops processing it, and we end up with the EQ overflowing. Now we need to extract more information from the nouveau driver to try to figure out what is going wrong on the GPU side. (Ben, the kernel log seemed empty. I will try to get more information out of nouveau; tell me if you want me to run any specific tools.)

Created attachment 439019
full bt
half full bt
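
The event queue behaviour described above (the input side keeps enqueuing while the main loop has stopped draining) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `EventQueue`, `simulate`, and the capacity values are hypothetical names and numbers chosen for illustration, and none of this is the X server's actual mieq code.

```python
from collections import deque


class EventQueue:
    """Toy bounded queue; a hypothetical stand-in, not the X server's mieq code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0  # events rejected because the queue was full

    def enqueue(self, event):
        """Called from the input side. Returns False when the queue is full."""
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self):
        """Called from the main loop. Returns the events it processed."""
        processed = list(self.events)
        self.events.clear()
        return processed


def simulate(motion_events, capacity, main_loop_stuck):
    """Feed motion events into the queue and return how many overflowed."""
    queue = EventQueue(capacity)
    for i in range(motion_events):
        queue.enqueue(("motion", i))
        if not main_loop_stuck:
            queue.drain()  # a healthy main loop empties the queue regularly
    return queue.dropped


if __name__ == "__main__":
    print("healthy main loop, dropped:", simulate(1000, 8, main_loop_stuck=False))
    print("stuck main loop, dropped:", simulate(1000, 8, main_loop_stuck=True))
```

In this model the overflow says only that the consumer stopped running. It does not say why, which is why the input frames at the top of such a backtrace are a symptom rather than the cause.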

More specifically, those types of backtraces indicate that *something* linked with the X server is stuck, and input events can no longer be processed as a result. This particular backtrace does not indicate a GPU lockup of any kind, however (typical traces showing this will have nouveau_pushbuf_flush() or nouveau_bo_map_range() in them). I'm still unable to reproduce this FWIW, even on the exact same graphics chip.

I have updated my system to the latest tree and I am no longer able to reproduce.

This issue does still occur. I hadn't seen it in a while but it is still here. My system has crashed twice in two days with this. It seems that if I leave emacs open all the time it happens a lot quicker.

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4aaaf8]
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x4a0304]
2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x47be14]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fa28e780000+0x551f) [0x7fa28e78551f]
4: /usr/bin/Xorg (0x400000+0x6c507) [0x46c507]
5: /usr/bin/Xorg (0x400000+0x11aef3) [0x51aef3]
6: /lib64/libpthread.so.0 (0x7fa2935f6000+0xf520) [0x7fa293605520]
7: /lib64/libc.so.6 (0x7fa292645000+0xf0dce) [0x7fa292735dce]
8: /lib64/libc.so.6 (0x7fa292645000+0x7c138) [0x7fa2926c1138]
9: /lib64/libc.so.6 (__libc_malloc+0x62) [0x7fa2926bea32]
10: /lib64/libc.so.6 (0x7fa292645000+0x6fcfb) [0x7fa2926b4cfb]
11: /lib64/libc.so.6 (0x7fa292645000+0x75676) [0x7fa2926ba676]
12: /lib64/libc.so.6 (0x7fa292645000+0x79108) [0x7fa2926be108]
13: /lib64/libc.so.6 (__libc_malloc+0x6d) [0x7fa2926bea3d]
14: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0x73d8) [0x7fa28ffc53d8]
15: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0xf9b9) [0x7fa28ffcd9b9]
16: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0xc83f) [0x7fa28ffca83f]
17: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0xcd24) [0x7fa28ffcad24]
18: /usr/bin/Xorg (0x400000+0xdab47) [0x4dab47]
19: /usr/bin/Xorg (0x400000+0x40a8c) [0x440a8c]
20: /usr/bin/Xorg (0x400000+0x2208a) [0x42208a]
21: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fa292663c5d]
22: /usr/bin/Xorg (0x400000+0x21c49) [0x421c49]

Hey Jeff, I've noticed a similar bug report upstream recently, and the EXA maintainer believes that it may be related to another EXA bug he also recently fixed. I've done a scratch build containing the fix. Are you able to test this and see if your problem goes away?
https://brewweb.devel.redhat.com/taskinfo?taskID=3013796
Thanks! Ben.

Hi all, I had a similar issue with Xvnc. Let me know if you need it reported separately:

# [mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: Xvnc (xorg_backtrace+0x28) [0x5af748]
1: Xvnc (mieqEnqueue+0x1ec) [0x59817c]
2: Xvnc (_ZN11InputDevice11PointerMoveERKN3rfb5PointE+0xab) [0x50aeab]
3: Xvnc (_ZN14XserverDesktop12pointerEventERKN3rfb5PointEi+0x1f) [0x50852f]
4: Xvnc (_ZN3rfb10SMsgReader16readPointerEventEv+0x12b) [0x53491b]
5: Xvnc (_ZN3rfb16VNCSConnectionST15processMessagesEv+0x38) [0x52b238]
6: Xvnc (_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x11e) [0x50877e]
7: Xvnc (0x400000+0x101144) [0x501144]
8: Xvnc (WakeupHandler+0x4b) [0x56920b]
9: Xvnc (WaitForSomething+0x1d6) [0x5ad386]
10: Xvnc (Dispatch+0xb2) [0x563ca2]
11: Xvnc (main+0x355) [0x5005c5]
12: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fbda1b6bc9d]
13: Xvnc (0x400000+0x3fae9) [0x43fae9]

Ben, sorry for the late response. Can you drop another scratch build on brew? The previous one has been purged. Or has this patch been included in an official build? Thanks, Jeff

Sorry for the delay with this. I've submitted a new scratch build for you:
https://brewweb.devel.redhat.com/taskinfo?taskID=3211308
Could you give it a go please?

Ben, I have been running with the following packages for a couple of days now.
xorg-x11-server-debuginfo-1.7.7-30.el6.x86_64
xorg-x11-server-common-1.7.7-30.el6.x86_64
xorg-x11-server-Xorg-1.7.7-30.el6.x86_64
I have not seen the issue.
If I hit the issue I will update the BZ.

Ben, I was running your version of xorg-x11-server-Xorg-1.7.7-30.el6 for a couple of days. I did not hit any specific issues. I then upgraded the rest of my workstation to use RHEL6.1 Snap1. After doing that the dual head setup became very erratic. My right monitor would come and go. My display would lock. X would restart. I have since downgraded my system from your test version and the problem persisted. I contacted Jerome and he came over to take a look at it, but his impression is that this is a different bug. I am not sure what data to use for a new bz. Let me know if you can think of anything.

I have gotten to the point where this new issue is problematic for doing my day-to-day job. I have removed the NVIDIA QUADRO from my machine and replaced it with nVidia Corporation G96 [GeForce 9500 GT]. Since doing that my system is stable again.

Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Jeff, would you consider the original bug fixed now (as per comment #18)?

Ben, I apologize for the very, very late response. Not sure why I did not see this until now. I can't say for sure the issue is resolved. When I had your test package installed I still had issues, but Jerome believed that was a different issue. I took the big hammer approach and swapped out my video card. Best, Jeff
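
The triage rule from earlier in the thread (a trace pointing at a GPU lockup will typically contain nouveau_pushbuf_flush() or nouveau_bo_map_range()) could be applied to pasted "EQ overflowing" backtraces with a small script. The following is a minimal sketch, not a tool used in this bug: `FRAME_RE`, `parse_frames`, and `looks_like_gpu_lockup` are hypothetical names, and the frame format is assumed from the traces quoted in this report.

```python
import re

# One frame looks like "<index>: <module> (<symbol or base>+0x<offset>) [0x<addr>]".
# The pattern is an assumption based on the traces quoted in this report.
FRAME_RE = re.compile(r"(\d+):\s+(\S+)\s+\(([^\s)+]+)\+0x[0-9a-fA-F]+\)")

# Symbols named in the thread as typical of a GPU lockup.
GPU_WAIT_SYMBOLS = {"nouveau_pushbuf_flush", "nouveau_bo_map_range"}


def parse_frames(text):
    """Return (index, module, symbol) tuples; symbol is None when unresolved."""
    frames = []
    for match in FRAME_RE.finditer(text):
        index, module, symbol = match.groups()
        if symbol.startswith("0x"):
            symbol = None  # only a load address was printed, no symbol name
        frames.append((int(index), module, symbol))
    return frames


def looks_like_gpu_lockup(frames):
    """Apply the heuristic from the thread: look for the nouveau wait symbols."""
    return any(symbol in GPU_WAIT_SYMBOLS for _, _, symbol in frames)


if __name__ == "__main__":
    # A few frames copied from the Xorg trace in this report.
    sample = (
        "0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4aaaf8] "
        "1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x4a0304] "
        "13: /lib64/libc.so.6 (__libc_malloc+0x6d) [0x7fa2926bea3d] "
        "14: /usr/lib64/xorg/modules/libexa.so (0x7fa28ffbe000+0x73d8) [0x7fa28ffc53d8]"
    )
    frames = parse_frames(sample)
    for index, module, symbol in frames:
        print(index, module, symbol or "<unresolved>")
    print("GPU lockup suspected:", looks_like_gpu_lockup(frames))
```

A negative result only means that the named nouveau symbols are absent. Frames with unresolved symbols, such as the libexa.so ones here, still need debuginfo to interpret.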