Bug 524135

Summary: X-server and/or GPU lockup
Product: [Fedora] Fedora
Reporter: Peter Trenholme <PTrenholme>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 12
CC: airlied, ajax, bskeggs, mcepl, xgl-maint
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: card_NV67
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2010-02-25 06:21:43 UTC
Type: ---

Description Peter Trenholme 2009-09-18 03:02:15 UTC
Description of problem: X-server lockup. Occasional GPU lockup


Version-Release number of selected component (if applicable):
xorg-x11-drv-nouveau-0.0.15-10.20090914git1b72020.fc12.x86_64
xorg-x11-server-Xorg-1.6.99.901-2.fc12.x86_64
kernel-2.6.31-23.fc12.x86_64

Hardware: nVidia M/B chipset MCP67-M

How reproducible:
Every time

Steps to Reproduce:
1. Log on to KDE (or GNOME) desktop (I use KDM for both)
2. Work for 5-10 minutes
3. X-server freezes
  
Actual results:
Freeze

Expected results:
No freeze

Additional info:
1 -----------------------------------------
From the tail of the Xorg.0.log after the last freeze:
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x45e898]
1: /usr/bin/X (mieqEnqueue+0x1f4) [0x457644]
2: /usr/bin/X (xf86PostMotionEventP+0xde) [0x4794ce]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f2e78600000+0x3dff) [0x7f2e78603dff]
4: /usr/bin/X (0x400000+0x6c3b7) [0x46c3b7]
5: /usr/bin/X (0x400000+0x1167f3) [0x5167f3]
6: /lib64/libpthread.so.0 (0x7f2e7fb2c000+0xf320) [0x7f2e7fb3b320]
7: /lib64/libc.so.6 (ioctl+0x7) [0x7f2e7e3b8617]
8: /usr/lib64/libdrm.so.2 (drmIoctl+0x23) [0x7f2e7bc3e203]
9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f2e7bc3e48b]
10: /usr/lib64/libdrm_nouveau.so.1 (0x7f2e7b5dd000+0x2d6d) [0x7f2e7b5dfd6d]
11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0x10b) [0x7f2e7b5dff7b]
12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f2e7b7e2000+0xbfd8) [0x7f2e7b7edfd8]
13: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f2e7b7e2000+0xd1c0) [0x7f2e7b7ef1c0]
14: /usr/lib64/xorg/modules/libexa.so (0x7f2e79771000+0x8160) [0x7f2e79779160]
15: /usr/bin/X (0x400000+0x152994) [0x552994]
16: /usr/bin/X (0x400000+0xcbf77) [0x4cbf77]
17: /usr/bin/X (0x400000+0x2c66c) [0x42c66c]
18: /usr/bin/X (0x400000+0x21caa) [0x421caa]
19: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7f2e7e2fdb4d]
20: /usr/bin/X (0x400000+0x21859) [0x421859]
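
For anyone triaging a similar log, here is a minimal Python sketch (not part of the original report) that pulls the frames with resolved symbol names out of a backtrace in this format. The function name `named_frames` and the regular expression are illustrative assumptions based only on the lines shown above. In this trace the named frames below the libpthread frame (ioctl, drmIoctl, drmCommandWrite, nouveau_bo_map_range) suggest the server was waiting inside a DRM ioctl when the input event queue overflowed.

```python
import re

# One frame looks like: "7: /lib64/libc.so.6 (ioctl+0x7) [0x7f2e7e3b8617]"
FRAME_RE = re.compile(
    r'^(\d+): (\S+) \((\S+?)\+0x[0-9a-fA-F]+\) \[0x[0-9a-fA-F]+\]$'
)

def named_frames(backtrace_text):
    """Return (index, module, symbol) for frames that carry a symbol name."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue
        index, module, symbol = match.groups()
        # Unresolved frames show a hex base address (e.g. 0x400000) instead of a name.
        if symbol.startswith("0x"):
            continue
        frames.append((int(index), module, symbol))
    return frames

# A few frames copied from the backtrace above.
sample = """\
1: /usr/bin/X (mieqEnqueue+0x1f4) [0x457644]
4: /usr/bin/X (0x400000+0x6c3b7) [0x46c3b7]
7: /lib64/libc.so.6 (ioctl+0x7) [0x7f2e7e3b8617]
11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0x10b) [0x7f2e7b5dff7b]
"""

for index, module, symbol in named_frames(sample):
    print(index, module, symbol)
```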

2 ---------------------------------------------
Note that I specified the kernel above. Right now, I've booted the -2 kernel instead of the -23 one, and haven't locked up in the last 41 minutes, so this may be a kernel issue rather than a driver issue.

Comment 1 Ben Skeggs 2009-11-05 06:16:43 UTC
Can you update to the latest rawhide packages and retry please.

Comment 2 Peter Trenholme 2009-11-05 16:07:58 UTC
Did that two days ago. I'm still having the problem, although there are no longer any crash reports (or any other indication of a problem) in the Xorg.0.log or dmesg output. The server just reboots.

I noted that a new "nv" driver was installed as part of the update, so I switched from "nouveau" to "nv" and, even though the on-board chipset was not "recognized" by the "nv" driver (so the "unknown" settings were used), I haven't had an X-server crash since the switch.

Note also that running from a late October nightly build appeared to work without any problems, so I suspect that there's an update to the "nouveau" driver in the pipeline that fixes the problem.

I'll run an update again today and repost here if there's any change from two days ago.

Comment 3 Matěj Cepl 2009-11-05 17:16:54 UTC
Since this bugzilla report was filed, there have been several major updates in various components of the Xorg system, which may have resolved this issue. Users who have experienced this problem are encouraged to upgrade their system to the latest version of their packages (at least F12Beta, but even better if the very latest versions).

Please, if you experience this problem on the up-to-date system, let us know in a comment on this bug, or tell us whether the upgraded system works for you.

If you won't be able to reply in one month, I will have to close this bug as INSUFFICIENT_DATA. Thank you.

[This is a bulk message for all open Fedora Rawhide Xorg-related bugs. I'm adding myself to the CC list for each bug, so I'll see any comments you make after this and do my best to make sure every issue gets proper attention.]

Comment 4 Peter Trenholme 2009-11-06 02:47:24 UTC
O.K., I updated to the most current "Rawhide" as of this afternoon. The update included a new version of the "nouveau" driver.

COMPLETE BUST!

Using the new driver, either with my xorg.conf or with NO xorg.conf, resulted in a BLANK DISPLAY and NO KEYBOARD RESPONSE. The Xorg.0.log files (for both xorg.conf states) show NO ERRORS. The "poweroff" button did cleanly shut down the system, but that's the only key that seemed to work.

I'm back on F12 using the "nv" driver, which does seem to work (for now).

Note that the nVidia chipset in this laptop is "MCP67 Co-processor rev 162" which is NOT listed as a supported chipset by the "nouveau" driver:

(II) NOUVEAU driver for NVIDIA chipset families :
        RIVA TNT    (NV04)
        RIVA TNT2   (NV05)
        GeForce 256 (NV10)
        GeForce 2   (NV11, NV15)
        GeForce 4MX (NV17, NV18)
        GeForce 3   (NV20)
        GeForce 4Ti (NV25, NV28)
        GeForce FX  (NV3x)
        GeForce 6   (NV4x)
        GeForce 7   (G7x)
        GeForce 8   (G8x)

(Unless, of course, the 4MX somehow implies MCP67.)

Comment 5 Ben Skeggs 2009-11-06 03:06:28 UTC
So for starters, what does "rpm -q kernel xorg-x11-server-Xorg libdrm xorg-x11-drv-nouveau" tell you? If you're using kernel-2.6.31.5-117.fc12, update to http://koji.fedoraproject.org/koji/buildinfo?buildID=139823 (kernel-2.6.31.5-122.fc12).

MCP67 is GeForce 7 IIRC.  For reference, this line:

(--) NOUVEAU(0): Chipset: "NVIDIA NV84"

Is the useful line, it's the exact chipset the card reports itself to be.
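
As an illustration of picking out that line programmatically, here is a minimal Python sketch. The function name `reported_chipset` and the regular expression are assumptions based only on the log format quoted above, not anything provided by the driver or the X server.

```python
import re

# Matches the probed-chipset line, e.g. (--) NOUVEAU(0): Chipset: "NVIDIA NV84"
# The (II) "chipset families" banner line is deliberately not matched.
CHIPSET_RE = re.compile(r'\(--\)\s+NOUVEAU\(\d+\):\s+Chipset:\s+"([^"]+)"')

def reported_chipset(log_text):
    """Return the chipset string the card reported, or None if it is absent."""
    match = CHIPSET_RE.search(log_text)
    return match.group(1) if match else None

# Sample text in the format quoted above.
sample = (
    '(II) NOUVEAU driver for NVIDIA chipset families :\n'
    '(--) NOUVEAU(0): Chipset: "NVIDIA NV84"\n'
)

print(reported_chipset(sample))
```

On a real system the input would be the contents of /var/log/Xorg.0.log, the same file that "grep -i chipset" is run against in this thread.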

Comment 6 Peter Trenholme 2009-11-06 13:59:55 UTC
Well, the -122 kernel made no difference. Still no display, no keyboard. (Note: Lack of keyboard confirmed by, e.g., no disk activity light on ctrl-alt-del.)

Here's the info you requested:
$ rpm -q kernel xorg-x11-server-Xorg libdrm xorg-x11-drv-nouveau
kernel-2.6.31.1-56.fc12.x86_64
kernel-2.6.31.5-96.fc12.x86_64
kernel-2.6.31.5-117.fc12.x86_64
kernel-2.6.31.5-122.fc12.x86_64
xorg-x11-server-Xorg-1.7.1-6.fc12.x86_64
libdrm-2.4.15-4.fc12.x86_64
libdrm-2.4.15-4.fc12.i686
xorg-x11-drv-nouveau-0.0.15-17.20091105gite1c2efd.fc12.x86_64

$ uname -a
Linux dv9810us 2.6.31.5-122.fc12.x86_64 #1 SMP Thu Nov 5 01:37:34 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

$ grep -i chipset /var/log/Xorg.0.log.old
(II) NOUVEAU driver for NVIDIA chipset families :
(--) NOUVEAU(0): Chipset: "NVIDIA NV67"

(That's from the "old" log file because I had to revert to the "nv" driver and reboot before I could reply.)

Comment 7 Bug Zapper 2009-11-16 12:35:39 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Ben Skeggs 2010-02-22 03:44:30 UTC
How did you fare with the packages in the F12 release (+ updates)?  If still bad, http://koji.fedoraproject.org/koji/buildinfo?buildID=157041 (kernel-2.6.32.8-58.fc12) has all the fixes from nouveau upstream backported and would be worth a try.

Comment 9 Peter Trenholme 2010-02-23 03:50:07 UTC
I moved to the nVidia driver from the RPM Fusion repositories. It installs an akmod file which, among other things, rewrites the xorg.conf file to use the nvidia driver and re-runs Plymouth to rebuild the initramfs file with the nouveau driver blacklisted.

So, reverting to try the nouveau driver will require some system reconfiguration. Please give me a few more days. (I want to finish a project I'm about done with, and run a system backup before reverting to a non-nvidia driver configuration.)

Comment 10 Peter Trenholme 2010-02-24 18:21:24 UTC
O.K., the nouveau driver seems to be working well with the latest release. (I found an old 120 GB USB drive, did an install from the x86_64 DVD, then updated it.) Here's the info:

$ rpm -q kernel xorg-x11-server-Xorg libdrm xorg-x11-drv-nouveau
kernel-2.6.31.12-174.2.22.fc12.x86_64
xorg-x11-server-Xorg-1.7.4-6.fc12.x86_64
libdrm-2.4.17-1.fc12.x86_64
libdrm-2.4.17-1.fc12.i686
xorg-x11-drv-nouveau-0.0.15-20.20091105gite1c2efd.fc12.x86_64

$ uname -r
2.6.31.12-174.2.22.fc12.x86_64

$ grep -i chipset /var/log/Xorg.0.log
(II) NOUVEAU driver for NVIDIA chipset families :
(II) VESA: driver for VESA chipsets: vesa
(--) NOUVEAU(0): Chipset: "NVIDIA NV67"

I also ran a few glxgears (5 simultaneous ones) with no problem, and I've been running from the USB drive for an hour or so now.

I'll try to use the USB drive setup for a few days, but, for now, it looks good.

Comment 11 Ben Skeggs 2010-02-25 06:21:43 UTC
That's good to know it was fixed :)  I'll close this for now then, please reopen if you see anything change!