Bug 695063

Summary: [NV4b] crash in /usr/lib64/xorg/modules/libexa.so
Product: [Fedora] Fedora
Reporter: Martin Kho <rh-bugzilla>
Component: xorg-x11-server
Assignee: Adam Jackson <ajax>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: rawhide
CC: ajax, dwmw2, mcepl, xgl-maint
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-04-12 17:47:49 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Xorg.0.log (flags: none)
dmesg (flags: none)
yum update list April 9th (flags: none)

Description Martin Kho 2011-04-10 09:48:34 UTC
Created attachment 491054 [details]
Xorg.0.log

Description of problem:
After a few minutes the system freezes and a hard reset is needed. A backtrace can be found in Xorg.0.log(.old).

Version-Release number of selected component (if applicable):
xorg-x11-server-1.10.0-5.fc16


How reproducible:
Every time

Steps to Reproduce:
1. Boot into graphical UI (KDE)
2. Open a few applications (use mouse and keyboard?)
3. ...freeze (no possibility to switch to a text console or anything else)  
  
Actual results:
System totally hangs

Expected results:
System functions normally

Additional info:

smolt profile: http://www.smolts.org/client/show/pub_18cb4280-e750-417d-9039-af536b838dbf

Comment 1 Martin Kho 2011-04-10 09:48:59 UTC
Created attachment 491055 [details]
dmesg

Comment 2 Matěj Cepl 2011-04-11 19:07:15 UTC
Backtrace:
[   643.778] 0: /usr/bin/X (xorg_backtrace+0x2f) [0x4a11cf]
[   643.778] 1: /usr/bin/X (mieqEnqueue+0x1e9) [0x4a06e9]
[   643.778] 2: /usr/bin/X (xf86PostMotionEventM+0xa3) [0x47d9a3]
[   643.779] 3: /usr/bin/X (xf86PostMotionEventP+0x52) [0x47dac2]
[   643.779] 4: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f84af236000+0x498b) [0x7f84af23a98b]
[   643.779] 5: /usr/bin/X (0x400000+0x6b478) [0x46b478]
[   643.779] 6: /usr/bin/X (0x400000+0x119ad0) [0x519ad0]
[   643.779] 7: /lib64/libpthread.so.0 (0x3061000000+0xf4e0) [0x306100f4e0]
[   643.779] 8: /lib64/libc.so.6 (ioctl+0x7) [0x30608d8957]
[   643.779] 9: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x307e003318]
[   643.779] 10: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x307e0053fb]
[   643.780] 11: /usr/lib64/libdrm_nouveau.so.1 (0x7f84b20d0000+0x2eb7) [0x7f84b20d2eb7]
[   643.780] 12: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0x109) [0x7f84b20d34d9]
[   643.780] 13: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f84b22d5000+0x6ddb) [0x7f84b22dbddb]
[   643.780] 14: /usr/lib64/xorg/modules/libexa.so (0x7f84b1687000+0xbf50) [0x7f84b1692f50]
[   643.780] 15: /usr/bin/X (0x400000+0x155f25) [0x555f25]
[   643.780] 16: /usr/bin/X (0x400000+0xa40a5) [0x4a40a5]
[   643.780] 17: /usr/lib64/xorg/modules/libexa.so (exaGetPixmapFirstPixel+0x78) [0x7f84b169b408]
[   643.780] 18: /usr/lib64/xorg/modules/libexa.so (0x7f84b1687000+0x105ec) [0x7f84b16975ec]
[   643.780] 19: /usr/bin/X (0x400000+0xd985d) [0x4d985d]
[   643.781] 20: /usr/bin/X (0x400000+0xd42d5) [0x4d42d5]
[   643.781] 21: /usr/bin/X (0x400000+0x2eb41) [0x42eb41]
[   643.781] 22: /usr/bin/X (0x400000+0x22dca) [0x422dca]
[   643.781] 23: /lib64/libc.so.6 (__libc_start_main+0xed) [0x306082131d]
[   643.781] 24: /usr/bin/X (0x400000+0x230b1) [0x4230b1]
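A backtrace like the one above is easier to read once the frames are split into module, symbol and address. The following is a minimal Python sketch, assuming the frame layout seen in this log ("[time] index: module (location) [address]"); the names FRAME_RE and parse_backtrace are made up for illustration and are not part of any Xorg tooling. The sample text is a few frames copied from this comment.

```python
import re

# Matches frame lines in the layout seen in this report, for example:
# [   643.778] 1: /usr/bin/X (mieqEnqueue+0x1e9) [0x4a06e9]
FRAME_RE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<location>[^)]*)\)\s+\[(?P<address>0x[0-9a-fA-F]+)\]\s*$"
)


def parse_backtrace(text):
    """Return one dict per backtrace frame found in text (hypothetical helper)."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a backtrace frame
        location = match.group("location")
        # Resolved frames look like "symbol+0xoffset";
        # unresolved frames look like "0xbase+0xoffset".
        symbol = None
        if not location.startswith("0x"):
            symbol = location.split("+", 1)[0]
        frames.append(
            {
                "index": int(match.group("index")),
                "module": match.group("module"),
                "symbol": symbol,
                "address": match.group("address"),
            }
        )
    return frames


# A few frames copied from the backtrace in this comment.
SAMPLE = """\
[   643.778] 0: /usr/bin/X (xorg_backtrace+0x2f) [0x4a11cf]
[   643.778] 1: /usr/bin/X (mieqEnqueue+0x1e9) [0x4a06e9]
[   643.779] 4: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f84af236000+0x498b) [0x7f84af23a98b]
[   643.780] 12: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0x109) [0x7f84b20d34d9]
"""

if __name__ == "__main__":
    for frame in parse_backtrace(SAMPLE):
        print(frame["index"], frame["module"], frame["symbol"] or "<unresolved>")
```

Frames whose location starts with "0x" have no symbol name in the log, so the sketch reports them as unresolved rather than guessing.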

Comment 3 Martin Kho 2011-04-11 19:31:49 UTC
Hi,

I can tell you a little more. The total hang started after the update from April 9th. Downgrading systemd from version 24-1 (updated April 9th) to 23-1 'solved' the issue. Maybe this gives a clue?

Martin Kho

Btw, I first tried disabling SELinux and downgrading KDE-related packages, without success.

Comment 4 Martin Kho 2011-04-11 19:35:21 UTC
Created attachment 491332 [details]
yum update list April 9th

Comment 5 Martin Kho 2011-04-11 20:19:38 UTC
Hi,

Just after I uploaded the yum list my system froze ;-( I'll investigate a bit further. You'll hear from me later :-)

Martin Kho

Comment 6 Martin Kho 2011-04-12 10:35:06 UTC
Hi,

Argh... I am really ashamed. Today I had a freeze on my production OS (Fedora 14). This made me think I had a hardware problem. After re-installing my video card (I had fiddled a little with another card) everything is fine again. So sorry for the noise :-) For me this report can be closed.

Martin Kho

Btw, I had my printer and mouse on the same USB controller. They are now on different controllers. I don't know if it matters.

@Matej: Maybe it's useful to check hardware?

Comment 7 Matěj Cepl 2011-04-12 17:47:49 UTC
(In reply to comment #6)
> For me this report can be closed.

Thank you for letting us know.

> @Matej: Maybe it's useful to check hardware?

Well, if you come up with some way to do it without bothering reporters too much, I am all ears (go ahead and continue in this bug; I am still following it even though it is closed).

Comment 8 Martin Kho 2011-04-12 20:41:26 UTC
Hi Matej,

With 'check hardware' I meant two things:

1. Physically inspect your hardware. I'm using a desktop PC, so I can take out some components (e.g. video card, HD, DVD) and clean the connectors. On a laptop this is much more delicate or maybe even impossible :-)

2. See if there are potential conflicts (interrupts etc.; this is from the old days, but you never know). KDE has a very useful program for this: KInfoCenter. Under Device Information you'll find info about interrupts, USB devices, etc. In my case my printer and mouse were both sitting on the 8th host controller. Now I've put my printer on the 6th controller. You can also check whether e.g. the mouse is sitting on a USB port that shares its interrupt with the graphics driver. Again, in the old days this was a problem, and maybe it still can be today :-)
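
The interrupt check in point 2 can also be done from a terminal. The following is a minimal Python sketch, assuming a Linux system that exposes /proc/interrupts in its usual column layout (a header row of CPU names, then one row per interrupt with devices that share a line separated by commas). The function name shared_irqs is made up for illustration, the parsing is only a heuristic, and the sample table is invented, not taken from my machine.

```python
def shared_irqs(text):
    """Map IRQ number -> device names for lines shared by several devices.

    Hypothetical helper: assumes devices that share an interrupt line are
    listed comma-separated at the end of the row.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip non-numeric rows such as NMI or LOC
        fields = rest.split()[cpu_count:]  # drop the per-CPU counters
        # The device list starts at the first field that ends with a comma.
        first = next((i for i, f in enumerate(fields) if f.endswith(",")), None)
        if first is None:
            continue  # only one device on this line
        names = " ".join(fields[first:]).split(",")
        shared[int(label)] = [name.strip() for name in names if name.strip()]
    return shared


# Invented sample in the usual /proc/interrupts layout (illustration only).
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
 16:       1234       5678   IO-APIC-fasteoi   uhci_hcd:usb3, nouveau
 23:        100        200   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb8
NMI:          0          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as handle:
            table = handle.read()
    except OSError:
        table = SAMPLE  # fall back to the invented sample
    for irq, devices in sorted(shared_irqs(table).items()):
        print(f"IRQ {irq}: {', '.join(devices)}")
```

A shared line is not a fault by itself; the sketch only lists candidates worth a second look.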

What I've learned from this report is that the EQ overflow can also be caused by a hardware failure. It's not always a software problem! As Adam Jackson put in [1]: ""EQ overflowing" is not a bug, dang it.  It's a symptom." A symptom - in my case - of a hardware failure.

Martin Kho

[1] http://marc.info/?l=fedora-devel-list&m=124101535025331&w=2

Comment 9 Matěj Cepl 2011-04-13 10:10:30 UTC
(In reply to comment #8)
> What I've learned from this report is that the EQ overflow can also be caused
> by a hardware failure. It's not always a software problem! As Adam Jackson put
> in [1]: ""EQ overflowing" is not a bug, dang it.  It's a symptom." A symptom -
> in my case - of a hardware failure.

Yes, or it is quite often caused by a crash of something (so the events are not processed) or a zillion other things. See bug 465884 for an example where everybody jumping on this message got completely out of hand and made the bug completely useless.
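
Why an overflowing event queue is a symptom rather than a cause can be shown with a toy model. The following is a minimal Python sketch, assuming a fixed-capacity queue that drops events when full; it is not the X server's actual event queue code, and the class name, function name and numbers are invented for illustration.

```python
from collections import deque


class BoundedEventQueue:
    """Toy fixed-capacity queue (hypothetical, not the X server's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def enqueue(self, event):
        # When the queue is full the event is lost: this is the "overflow".
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self):
        drained = list(self.events)
        self.events.clear()
        return drained


def simulate(capacity, events_per_tick, ticks, consumer_stalled_from):
    """Count dropped events when the consumer stops draining at a given tick."""
    queue = BoundedEventQueue(capacity)
    for tick in range(ticks):
        for n in range(events_per_tick):
            queue.enqueue((tick, n))  # input keeps arriving regardless
        if tick < consumer_stalled_from:
            queue.drain()  # healthy consumer empties the queue every tick
    return queue.dropped


if __name__ == "__main__":
    print("healthy consumer:", simulate(8, 4, 10, consumer_stalled_from=10))
    print("stalled consumer:", simulate(8, 4, 10, consumer_stalled_from=3))
```

In the model the producer behaves identically in both runs; events are lost only once the consumer stops draining, whatever the reason for the stall.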

Concerning your suggestions about hardware diagnostics: either these issues are obvious, and then I would expect a minimal level of sanity on the part of the reporter (checking that all cables are plugged in, etc., before filing the bug), or they are not obvious (as apparently in your case), and then we are moving into the dark areas of hardware malfunction diagnostics, IRQ numbers and similar stuff, where most ordinary reporters cry out in horror. Remember, Linux is now supposed to work even for people who are more familiar with ICQ than IRQ. That's what I meant by “bothering reporters too much”.

Besides, hardware issues are really not that much of a problem for us. Usually either the reporter comes back with their tail between their legs (“Ehm … there was a disconnected cable …”) or there is something more serious (like in your case) and we hear from them. Or they simply evaporate, never to be seen again.

Comment 10 Martin Kho 2011-04-14 21:18:50 UTC
Hi Matej,

I was thinking about the reason why this report went the way it went. All that talk about hardware etc. Now I saw the reason. Your comment (#2) on my report was just an excerpt from my Xorg.0.log file and I thought that you got the same error. Damn, what a sucker am I :-$. Sorry, again for the noise.

And thanks for the discussion. ;-)

Martin Kho

Comment 11 Matěj Cepl 2011-04-15 15:59:26 UTC
(In reply to comment #10)
> I was thinking about the reason why this report went the way it went. All that
> talk about hardware etc. Now I saw the reason. Your comment (#2) on my report
> was just an excerpt from my Xorg.0.log file and I thought that you got the same
> error. Damn, what a sucker am I :-$. Sorry, again for the noise.

Yes, I am sorry, this has happened to me a couple of times … I should really add a line like "What a nice backtrace in your Xorg.0.log" or something.