Bug 495764 - Xorg freezes at 100% cpu
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: high
Assigned To: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2009-04-14 13:28 EDT by paolo borelli
Modified: 2009-04-17 03:08 EDT
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-04-17 03:08:48 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---


Attachments
Xorg.0.log (45.32 KB, text/plain)
2009-04-14 13:38 EDT, paolo borelli

Description paolo borelli 2009-04-14 13:28:30 EDT
Description of problem:

With the latest nouveau driver in rawhide, X easily freezes at 100% CPU. The most reliable way to trigger it is simply to open a page in Firefox, but it has also happened to me while scrolling a terminal, etc.

Version-Release number of selected component (if applicable):

[root@localhost paolo]# rpm -qa | grep xorg-x11-server
xorg-x11-server-Xorg-1.6.0-19.fc11.i586
xorg-x11-server-common-1.6.0-19.fc11.i586
xorg-x11-server-utils-7.4-7.fc11.i586
[root@localhost paolo]# rpm -qa | grep nouveau
xorg-x11-drv-nouveau-0.0.12-26.20090413git7100c06.fc11.i586
xorg-x11-drv-nouveau-debuginfo-0.0.12-26.20090413git7100c06.fc11.i586
[root@localhost paolo]# uname -a
Linux localhost.localdomain 2.6.29.1-68.fc11.i586 #1 SMP Sat Apr 11 02:06:17 EDT 2009 i686 i686 i386 GNU/Linux


How reproducible:
100% reproducible


Additional info:

I attached gdb to X, reproduced the 100% CPU hang, then hit Ctrl+C and ran bt. The output was consistently:

Program received signal SIGINT, Interrupt.
0x00a17e87 in nouveau_dma_wait () from /usr/lib/libdrm_nouveau.so.1
(gdb) bt
#0  0x00a17e87 in nouveau_dma_wait () from /usr/lib/libdrm_nouveau.so.1
#1  0x00a1613a in nouveau_pushbuf_flush () from /usr/lib/libdrm_nouveau.so.1
#2  0x001a74bc in FIRE_RING (chan=<value optimized out>)
    at /usr/include/nouveau/nouveau_pushbuf.h:98
#3  NV04EXASolid (chan=<value optimized out>) at nv04_exa.c:156


tail of dmesg after it happens:

SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 1
nouveau 0000:01:00.0: nouveau_fifo_free: freeing fifo 1
nouveau 0000:01:00.0: Failed to idle channel 1.  Prepare for strangeness..
nouveau 0000:01:00.0: Unhandled PGRAPH_INTR - 0x00000100
nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 1/6 Mthd 0x0184 Data 0xd8000002
nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 1/6 Mthd 0x0188 Data 0xd8000001
nouveau 0000:01:00.0: Unhandled PGRAPH_INTR - 0x00000080
nouveau 0000:01:00.0: nouveau_fifo_free: freeing fifo 0
nouveau 0000:01:00.0: Allocating FIFO number 0
nouveau 0000:01:00.0: nouveau_fifo_alloc: initialised FIFO 0
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x18c4 Data 0x00000000:0xc1500000
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x18c8 Data 0xbf800000:0x00000000
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x18cc Data 0xbf800000:0xbf800000
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x1900 Data 0x00000000:0xfff40001
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x18c0 Data 0x41500000:0x00000000
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x18c4 Data 0x41500000:0x41500000
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x18c8 Data 0x3f800000:0x00000000
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x18cc Data 0x3f800000:0x3f800000
nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 0/7 Class 0x0000 Mthd 0x1900 Data 0x00000000:0x000e0001
nouveau 0000:01:00.0: Allocating FIFO number 1
nouveau 0000:01:00.0: nouveau_fifo_alloc: initialised FIFO 1
SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
Comment 1 paolo borelli 2009-04-14 13:38:25 EDT
Created attachment 339530
Xorg.0.log

Xorg log file, in case it has some useful info.
Comment 2 Ben Skeggs 2009-04-14 18:40:56 EDT
Can you downgrade to kernel-2.6.29-0.258.2.3.rc8.git2.fc11 and see if the issue still occurs? There are a couple of other similar bug reports, but I can't track down exactly what the cause is as of yet.
Comment 3 paolo borelli 2009-04-14 19:09:50 EDT
same thing happens with kernel-2.6.29-0.258.2.3.rc8.git2.fc11

(If you want me to try older kernels, I'll need a brief explanation of how to downgrade; I just happened to still have 2.6.29-0.258.2.3 in the GRUB list.)
Comment 4 Ben Skeggs 2009-04-14 21:18:29 EDT
OK, interesting. Can you now downgrade xorg-x11-drv-nouveau to 0.0.12-10.20090310git8f9a580 and try it with both your latest kernel and the 258 kernel?

You can grab the RPM for -10 from: http://koji.fedoraproject.org/koji/buildinfo?buildID=93599, and install with "rpm -Uvh --force <filename>".

Thank you!
Comment 5 paolo borelli 2009-04-15 04:40:15 EDT
Downgrading to xorg-x11-drv-nouveau-0.0.12-10 works; I was not able to reproduce the bug.
Comment 6 Ben Skeggs 2009-04-15 04:52:32 EDT
Excellent, that's the second report I've had of bugs like this pointing the finger at the 2D driver rather than the kernel! I'll look at all the changes since then tomorrow and see what stands out. If you're feeling really keen, narrowing it down a bit more would be very helpful (I can't reproduce it on any of my hardware)!

There's a list of all the nouveau 2d driver packages at http://koji.fedoraproject.org/koji/packageinfo?packageID=5871 :)

Thanks!
Ben.
Comment 7 paolo borelli 2009-04-15 05:56:50 EDT
Ok, tracked down when the problem starts:

0.0.12-11 -> OK

0.0.12-15 -> BUG


(Note that -12 is not on koji, and -13 and -14 were failed builds.)
Comment 8 Ben Skeggs 2009-04-15 08:11:07 EDT
Thank you, that's perfect :)  I'll try and track this down in the morning!
Comment 9 Ben Skeggs 2009-04-16 01:51:55 EDT
There's a build of plain upstream nouveau at http://koji.fedoraproject.org/koji/taskinfo?taskID=1301622. Do you see the issue there?
Comment 10 paolo borelli 2009-04-16 02:56:39 EDT
I need an x86 build to test...
Comment 11 paolo borelli 2009-04-16 03:14:57 EDT
I compiled the upstream driver from the git repo (see below for the version), and it seems to survive a bit of testing.

git log | head
commit 7100c06be099bacc0f8bb8898bbf7eb34ff1cc6e
Author: Ben Skeggs <skeggsb@gmail.com>
Date:   Mon Apr 13 20:21:51 2009 +1000
Comment 12 paolo borelli 2009-04-16 03:34:34 EDT
Even weirder: I compiled the following revision from git:

commit 4067ab466fe3aa817e0323959f70c7dd3494de0a
Author: Ben Skeggs <skeggsb@gmail.com>
Date:   Mon Mar 23 14:43:22 2009 +1000

and I still cannot reproduce it. However, reinstalling the "-15" RPM (which should correspond to the same version) still triggers the bug easily.
Comment 13 Ben Skeggs 2009-04-16 04:52:52 EDT
Yeah, I figured as much. Nothing stood out as an obvious issue as I scoured the diffs. It'll be one of the Fedora-specific patches interacting badly with one of the commits between -11 and -15, then. Fun fun! I'll keep looking, as this really should be fixed before the release. It would be much easier if I could reproduce it myself!
Comment 14 paolo borelli 2009-04-16 09:19:23 EDT
OK, I tried git HEAD plus the patches in the Fedora repo and pinpointed it to this patch:

nouveau-multiple-xserver.patch

Applying just that one on top of git triggers the issue.
Comment 15 Ben Skeggs 2009-04-17 03:08:48 EDT
This is fixed now in libdrm-2.4.6-6.fc11 and xorg-x11-drv-nouveau-0.0.12-29.20090417gitfa2f111.fc11.

Thank you again for all your help tracking this down on IRC earlier :)
