Bug 587371

Summary: System freezes. EQ Overflow in log. Nvidia graphics card.
Product: [Fedora] Fedora
Reporter: David Alan Hjelle <dahjelle.redhat.com>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 12
CC: airlied, ajax, barbara.xxx1975, bskeggs, jan.public, luigi.3010, ss
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-06-01 22:54:15 UTC
Attachments:
Description         Flags
Full dmesg output.  none
Xorg.0.log          none

Description David Alan Hjelle 2010-04-29 18:12:59 UTC
Description of problem:
Random freezes, at intervals ranging from a few minutes to a day or two, rarely longer. Xorg pegs the CPU. The mouse pointer still moves, but no interaction is possible apart from SSH'ing in. Killing X does not seem to resolve the issue.


Version-Release number of selected component (if applicable):
0.0.15-21.20091105gite1c2efd.fc12

How reproducible:


Steps to Reproduce:
1. Wait.
  
Actual results:
Freeze.

Expected results:
Continued normal operation.

Additional info:

Looks exactly like https://bugzilla.redhat.com/show_bug.cgi?id=568591 except under F12.

The end of the kernel log is:

SELinux: initialized (dev rpc_pipefs, type rpc_pipefs), uses genfs_contexts
[drm] nouveau 0000:04:00.0: Allocating FIFO number 2
[drm] nouveau 0000:04:00.0: nouveau_channel_alloc: initialised FIFO 2
[drm] nouveau 0000:04:00.0: PGRAPH_ERROR - nSource: DATA_ERROR, nStatus:
[drm] nouveau 0000:04:00.0: PGRAPH_ERROR - Ch 2/5 Class 0x8597 Mthd 0x16b0 Data 0x00000000:0x00000000
[drm] nouveau 0000:04:00.0: PGRAPH_ERROR - nSource: DATA_ERROR, nStatus:
[drm] nouveau 0000:04:00.0: PGRAPH_ERROR - Ch 2/5 Class 0x8597 Mthd 0x16b0 Data 0x00000000:0x00000000
eth0: no IPv6 routers present
type=1305 audit(1272501313.631:27843): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 res=1
fuse init (API version 7.13)
SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
ip6_tables: (C) 2000-2006 Netfilter Core Team
warning: `VirtualBox' uses 32-bit capabilities (legacy support in use)
vboxdrv: Trying to deactivate the NMI watchdog permanently...
vboxdrv: Successfully done.
vboxdrv: Found 4 processor cores.
VBoxDrv: dbg - g_abExecMemory=ffffffffa027ad80
vboxdrv: fAsync=0 offMin=0x3b8 offMax=0x1581
vboxdrv: TSC mode is 'synchronous', kernel timer mode is 'normal'.
vboxdrv: Successfully loaded version 3.1.4_OSE (interface 0x00100001).
device eth0 entered promiscuous mode
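
For anyone comparing this log against other reports, a minimal Python sketch for pulling the PGRAPH_ERROR detail lines out of a saved kernel log might look like the following. The field layout is inferred only from the two sample lines above, and reading the two numbers after "Ch" as channel and subchannel is an assumption, as are the names `PGRAPH_RE` and `pgraph_errors`.

```python
import re

# Matches the "Ch ... Class ... Mthd ... Data ..." form seen in the log above.
# Assumption: the two numbers after "Ch" are channel/subchannel.
PGRAPH_RE = re.compile(
    r"PGRAPH_ERROR - Ch (?P<channel>\d+)/(?P<subchannel>\d+) "
    r"Class (?P<cls>0x[0-9a-fA-F]+) "
    r"Mthd (?P<method>0x[0-9a-fA-F]+) "
    r"Data (?P<data>0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)"
)


def pgraph_errors(log_text):
    """Return one dict of string fields per PGRAPH_ERROR detail line."""
    return [m.groupdict() for m in PGRAPH_RE.finditer(log_text)]


# Two lines copied from the kernel log in this report.
SAMPLE = """\
[drm] nouveau 0000:04:00.0: PGRAPH_ERROR - nSource: DATA_ERROR, nStatus:
[drm] nouveau 0000:04:00.0: PGRAPH_ERROR - Ch 2/5 Class 0x8597 Mthd 0x16b0 Data 0x00000000:0x00000000
"""

if __name__ == "__main__":
    for err in pgraph_errors(SAMPLE):
        print(err)
```

The "nSource" lines are deliberately skipped, since in this log they carry no status value after "nStatus:".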



The end of my Xorg.0.log is:

(II) NOUVEAU(0): Modeline "640x480"x66.7   30.24  640 704 768 864  480 483 486 525 -hsync -vsync (35.0 kHz)
(II) NOUVEAU(0): Modeline "640x480"x60.0   25.20  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz)
(II) NOUVEAU(0): Modeline "720x400"x70.1   28.32  720 738 846 900  400 412 414 449 -hsync +vsync (31.5 kHz)
(II) NOUVEAU(0): EDID for output VGA-0
(WW) Apple Inc. Apple Keyboard: unable to handle keycode 464
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x49ef28]
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49e8f4]
2: /usr/bin/Xorg (xf86PostButtonEventP+0xcf) [0x47905f]
3: /usr/bin/Xorg (xf86PostButtonEvent+0xbe) [0x47918e]
4: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fea0f775000+0x51f3) [0x7fea0f77a1f3]
5: /usr/bin/Xorg (0x400000+0x6c337) [0x46c337]
6: /usr/bin/Xorg (0x400000+0x117703) [0x517703]
7: /lib64/libpthread.so.0 (0x3bd3600000+0xf0f0) [0x3bd360f0f0]
8: /lib64/libc.so.6 (ioctl+0x7) [0x3bd2ed6917]
9: /usr/lib64/libdrm.so.2 (drmIoctl+0x23) [0x3beec03383]
10: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x3beec0360b]
11: /usr/lib64/libdrm_nouveau.so.1 (0x7fea12b41000+0x304d) [0x7fea12b4404d]
12: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfc) [0x7fea12b4424c]
13: /usr/lib64/libdrm_nouveau.so.1 (0x7fea12b41000+0x2286) [0x7fea12b43286]
14: /usr/lib64/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x29c) [0x7fea12b4361c]
15: /usr/lib64/xorg/modules/libexa.so (0x7fea106d3000+0x9625) [0x7fea106dc625]
16: /usr/lib64/xorg/modules/libexa.so (0x7fea106d3000+0xa1da) [0x7fea106dd1da]
17: /usr/bin/Xorg (0x400000+0xd305b) [0x4d305b]
18: /usr/lib64/xorg/modules/libexa.so (0x7fea106d3000+0xb42d) [0x7fea106de42d]
19: /usr/bin/Xorg (0x400000+0xd2a5e) [0x4d2a5e]
20: /usr/bin/Xorg (0x400000+0xcce0e) [0x4cce0e]
21: /usr/bin/Xorg (0x400000+0x2c93c) [0x42c93c]
22: /usr/bin/Xorg (0x400000+0x21f1a) [0x421f1a]
23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3bd2e1eb1d]
24: /usr/bin/Xorg (0x400000+0x21ad9) [0x421ad9]
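
To make the shape of this backtrace easier to see, here is a small Python sketch that splits each frame into index, module, and symbol (when one is printed). It assumes the frame format shown above; `FRAME_RE`, `parse_frames`, and `named_symbols` are illustrative names, not part of any X.Org tool. Read this way, frames 0-4 show the evdev driver posting a button event into mieqEnqueue, while frames 8-14 show the server inside an ioctl reached from nouveau_pushbuf_flush. Whether the server is actually blocked in that ioctl is an inference from the trace, not something the log states.

```python
import re

# Frame format assumed from the trace above:
#   "<n>: <module> (<symbol-or-base>+0x<offset>) [0x<address>]"
FRAME_RE = re.compile(
    r"^(?P<index>\d+): (?P<module>\S+) "
    r"\((?P<where>[^+\s]+)\+0x[0-9a-fA-F]+\) \[0x[0-9a-fA-F]+\]$"
)


def parse_frames(text):
    """Return (index, module, symbol) tuples; symbol is None if unresolved."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue
        where = m.group("where")
        # Unresolved frames print the module base address instead of a name.
        symbol = None if where.startswith("0x") else where
        frames.append((int(m.group("index")), m.group("module"), symbol))
    return frames


def named_symbols(frames):
    """List the resolved symbol names, innermost frame first."""
    return [sym for _, _, sym in frames if sym is not None]


# A few frames copied from the backtrace in this report.
SAMPLE = """\
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49e8f4]
4: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fea0f775000+0x51f3) [0x7fea0f77a1f3]
8: /lib64/libc.so.6 (ioctl+0x7) [0x3bd2ed6917]
9: /usr/lib64/libdrm.so.2 (drmIoctl+0x23) [0x3beec03383]
14: /usr/lib64/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x29c) [0x7fea12b4361c]
"""

if __name__ == "__main__":
    print(named_symbols(parse_frames(SAMPLE)))
```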

Comment 1 Luigi Pardey 2010-05-02 12:28:54 UTC
Have you tried the steps they suggested in https://bugzilla.redhat.com/show_bug.cgi?id=568591 ?

- boot with "nouveau.noagp=1"
- install kernel 2.6.33 (it's not released in the repositories; you can either get it from F13, which is not advisable, or download the source from kernel.org and compile it yourself)

It appears to me to be a problem between kernel 2.6.32 and the newest nouveau driver, and they solved it by installing kernel 2.6.33.

Luigi



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 2 David Alan Hjelle 2010-05-03 19:48:00 UTC
I hadn't tried those solutions, as neither appeared particularly conclusive. I'll give them a shot, though.

So far I've run all day with the only change being "nouveau.noagp=1". I'll let you know if that doesn't prove stable after a couple days and what my next steps are.

Thanks!
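
When testing a boot option such as "nouveau.noagp=1", it can help to confirm that the running kernel actually received it. The following is a minimal Python sketch that checks a kernel command line string for a given parameter. The helper names `cmdline_params` and `has_param` are hypothetical, and the splitting is deliberately naive: it assumes no quoted values containing spaces.

```python
def cmdline_params(cmdline):
    """Split a kernel command line into a dict.

    Flags without '=' (for example "quiet") map to None.
    Assumption: no quoted values containing spaces.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def has_param(cmdline, key, value):
    """True if the command line sets key to exactly value."""
    return cmdline_params(cmdline).get(key) == value


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running Linux kernel.
    with open("/proc/cmdline") as f:
        running = f.read()
    print("nouveau.noagp=1 active:", has_param(running, "nouveau.noagp", "1"))
```

This only shows that the option was passed at boot; it says nothing about whether the driver honoured it.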

Comment 3 David Alan Hjelle 2010-05-04 15:00:04 UTC
"nouveau.noagp=1" didn't make any difference, as the same problem occurred a couple hours after I posted. I'll try the new kernel when I get a chance; I'm afraid it won't be until later this week at best. Thanks!

Comment 4 David Alan Hjelle 2010-05-11 16:40:16 UTC
I'm afraid I couldn't test this. I wasn't able to get a new kernel installed, due to a poorly sized /boot partition. I ended up just upgrading to prerelease F13.

Interestingly, I seem to have had the *same* problem at least once with F13. I updated the associated bug here https://bugzilla.redhat.com/show_bug.cgi?id=568591 .

Sorry for not being more help on F12! This is my main work environment, so I don't have as much time to try things as one might wish.

Comment 5 David Alan Hjelle 2010-05-19 22:24:01 UTC
Though I'm running F13 now, I thought maybe something that seemed to help there might help on F12, too. I ended up following an idea at https://bugs.freedesktop.org/show_bug.cgi?id=15473#c4 and compiling my own Xorg 1.8.1 with the QUEUE_SIZE of 8192. So far, so good. I've not run all day, but, considering I was freezing 3 or 4 times a day, it seems to be an improvement.
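
As a rough illustration of why a larger QUEUE_SIZE can only postpone the "EQ overflowing" message when the server stops draining events, here is a simplified Python model of a bounded event queue with a stalled consumer. It is a sketch, not the X server's implementation: the class and function names are hypothetical, the stock value of 512 is an assumption about the unpatched server, and the 125 events per second input rate is an arbitrary example figure.

```python
from collections import deque


class BoundedEventQueue:
    """Fixed-capacity FIFO that counts events dropped once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def enqueue(self, event):
        """Add an event; return False (and count a drop) if the queue is full."""
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def dequeue(self):
        """Remove and return the oldest event, or None if empty."""
        return self.events.popleft() if self.events else None


def seconds_until_overflow(capacity, events_per_second):
    """Time until the first dropped event if nothing is ever dequeued."""
    queue = BoundedEventQueue(capacity)
    sent = 0
    while queue.enqueue(sent):
        sent += 1
    return sent / events_per_second


if __name__ == "__main__":
    # 512 is assumed to be the stock size; 8192 is the value tried above.
    for size in (512, 8192):
        print(size, seconds_until_overflow(size, events_per_second=125))
```

Under these assumptions the queue fills in about 4 seconds at 512 entries and about 65 seconds at 8192, so a stall that lasts indefinitely still overflows either way.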

Comment 6 David Alan Hjelle 2010-05-20 16:02:31 UTC
Never mind. Odd. It was stable until this morning, and now I'm back to frequent crashes.

Comment 7 Jan Vlug 2010-05-29 15:09:56 UTC
See also
https://bugzilla.redhat.com/show_bug.cgi?id=588036

Comment 8 Ben Skeggs 2010-05-30 22:58:29 UTC
David, can I see your *full* dmesg output, please?

Comment 9 Scott Smedley 2010-06-01 07:44:25 UTC
Is this the same as:

https://bugzilla.redhat.com/show_bug.cgi?id=575575

??

Scott. :)

Comment 10 David Alan Hjelle 2010-06-01 20:08:51 UTC
Created attachment 418798 [details]
Full dmesg output.

Comment 11 David Alan Hjelle 2010-06-01 20:09:21 UTC
Created attachment 418799 [details]
Xorg.0.log

Comment 12 David Alan Hjelle 2010-06-01 20:15:05 UTC
I've uploaded both my dmesg output and the output of my Xorg.0.log. Interestingly, within the last week or so of updates, I've not been getting the message about overflowing EQ in the Xorg.0.log. Otherwise, the behavior is the same: mouse moves, but no other interaction is possible, and everything on the screen freezes. SSH'ing in shows that Xorg has pegged a CPU core.

Please note that I upgraded to F13 with hopes of resolving this, but no dice. So all of my output is on F13.

$ yum list installed | grep nouveau
xorg-x11-drv-nouveau.x86_64         1:0.0.16-6.20100423git13c1043.fc13 @updates 

$ uname -a
Linux wimsey 2.6.33.5-112.fc13.x86_64 #1 SMP Thu May 27 02:28:31 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

$ Xorg -version

X.Org X Server 1.8.1
Release Date: 2010-05-11
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.33.3-85.fc13.x86_64 x86_64 
Current Operating System: Linux wimsey 2.6.33.5-112.fc13.x86_64 #1 SMP Thu May 27 02:28:31 UTC 2010 x86_64
Kernel command line: ro root=/dev/mapper/Wimsey-WimseyRoot rd_LVM_LV=Wimsey/WimseyRoot rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet
Build Date: 19 May 2010  10:42:55AM
 
Current version of pixman: 0.18.0
	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.

(Yes, I did compile my own X in hopes of resolving the issue. It didn't help, so I can certainly return to a packaged version if that's preferred.)

Anything else that would help?

Comment 13 Ben Skeggs 2010-06-01 22:54:15 UTC
Unfortunately, there's still no resolution to this problem as yet. I will close this bug as a duplicate now that you've moved to F13; however, you can track the progress of this bug at https://bugzilla.redhat.com/show_bug.cgi?id=596330

*** This bug has been marked as a duplicate of bug 596330 ***

Comment 14 David Alan Hjelle 2010-06-02 13:07:04 UTC
Okey-doke. Thanks for the heads-up!

Comment 15 Barbara 2010-06-02 23:22:09 UTC
Did you try adding pcie_aspm=off to the kernel parameters?

Comment 16 Scott Smedley 2010-06-04 00:45:28 UTC
pcie_aspm=off didn't work for me - X still crashed with the same error message.

As an aside, is there any way I can restart X without rebooting? I am
able to ssh in from another machine.

Scott.