Bug 493222 - Post Beta updates cause frequent hangs when using nouveau
Summary: Post Beta updates cause frequent hangs when using nouveau
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: rawhide
Hardware: All
OS: Linux
high
medium
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F11Blocker, F11FinalBlocker
Reported: 2009-03-31 23:36 UTC by David Nielsen
Modified: 2013-01-10 05:08 UTC
12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-23 18:47:22 UTC
Type: ---
Embargoed:


Attachments
contents of /var/log (171.04 KB, application/x-gzip)
2009-03-31 23:36 UTC, David Nielsen
Xorg.0.log, read via SSH after hang. (38.78 KB, application/octet-stream)
2009-04-12 07:05 UTC, Joonas Sarajärvi
Dmesg output after X hanging. (37.90 KB, application/octet-stream)
2009-04-12 07:05 UTC, Joonas Sarajärvi

Description David Nielsen 2009-03-31 23:36:35 UTC
Created attachment 337422
contents of /var/log

Description of problem:

I have no idea what is causing this, but F11 Beta was stable. After applying updates on the first day after the freeze was lifted, my machine started locking up frequently. I cannot figure out the trigger, but I have included all the logs in /var/log in the hope that they will narrow things down.

How reproducible:
100% on this machine

Steps to Reproduce:
1. install F11 Beta
2. update
  
Actual results:
Frequent lockups occur.

Expected results:
business as usual

Additional info:
x86_64, da_DK.UTF-8

Comment 1 David Nielsen 2009-04-01 09:51:29 UTC
I am not sure this is a lockup as such: the cursor can be moved, but nothing responds when clicked. I can't switch to a VT, and letting the machine sit in this state for hours does not bring it out of it. The only recovery seems to be a hard reset, followed by seeing how many delightful minutes of computing one gets before it happens again.

Comment 2 Peter Staubach 2009-04-01 17:59:06 UTC
This happens very quickly for me, on a Toshiba laptop.    It doesn't seem
to matter whether I am actively using it or not.

Some debugging: the Xorg process seems to be stuck, and the strace output looks like this:

...
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
...

Comment 3 David Nielsen 2009-04-01 18:49:55 UTC
Would that laptop happen to have an NVIDIA card like mine, or can we cross off nouveau?

Comment 4 Peter Staubach 2009-04-01 18:59:23 UTC
That laptop would have an Nvidia card.  The lspci output looks like:

01:00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M] (rev a1)

Sorry, can't cross off nouveau, I guess.

Comment 5 David Nielsen 2009-04-01 19:19:21 UTC
01:00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)

Curious coincidence. Help us, Obi-Wan Skeggs, you're our only hope.

Comment 6 Ben Skeggs 2009-04-01 21:53:10 UTC
Could you try the nv driver and see whether this still happens there? I'll look at the updates between the beta and the latest updates, but I can't think of anything off the top of my head that could have caused this.

Comment 7 David Nielsen 2009-04-01 22:37:09 UTC
I see quite a lot of these:

Apr  1 23:23:52 localhost kernel: [drm] PGRAPH_ERROR - nSource: DATA_ERROR, nStatus: INVALID_STATE BAD_ARGUMENT
Apr  1 23:23:52 localhost kernel: [drm] PGRAPH_ERROR - Ch 1/0 Class 0x0062 Mthd 0x0308 Data 0x052a0b00:0x7fff7fff

and this, even though nouveau.modeset=1 is not set:

Apr  2 00:00:23 localhost kernel: [drm:nouveau_load] *ERROR* Kernel modesetting requested but not supported on this chipset.
Apr  2 00:25:07 localhost kernel: [drm:nouveau_load] *ERROR* Kernel modesetting requested but not supported on this chipset.

Regardless, as requested, I have switched to nv; let's see whether it crashes.

Comment 8 Ben Skeggs 2009-04-01 22:45:38 UTC
It's definitely nouveau in your case, then; the GPU is reporting a lot of errors to the driver. Oops, I'll fix that mistake with the KMS warnings now. Thank you!

Comment 9 David Nielsen 2009-04-01 23:46:11 UTC
Yeah, the machine has now been up for nearly 2 hours, which is, sadly, unprecedented since the Beta freeze was lifted. Is there any additional information you need?

Comment 10 Ben Skeggs 2009-04-02 00:15:27 UTC
Nope, that will be enough info for the moment I think.  I'll see what I can find out.

Comment 11 Peter Staubach 2009-04-02 14:15:30 UTC
Hmm.  My symptoms looked like David's, but with more looking, not really.

My system does not exhibit the kernel errors like his.  Instead, I see
the Xorg process looping as described in Comment #2.  I will try the
nv driver though.  (Just to be sure, how do I do this?)

Comment 12 David Nielsen 2009-04-04 22:27:54 UTC
I tried re-enabling nouveau after seeing a few upgrades, but as of:
kernel-2.6.29.1-46.fc11.x86_64
xorg-x11-drv-nouveau-0.0.12-22.20090404git836d985.fc11.x86_64

this still happens, and I still get errors claiming that modesetting was requested, although it was not.

Regardless, nv is rock solid for me and has been for days.

Comment 13 Ben Skeggs 2009-04-08 03:15:38 UTC
Are you both able to test kernel-2.6.29-16.fc11 (http://koji.fedoraproject.org/koji/buildinfo?buildID=95660) and kernel-2.6.29-21.fc11 (http://koji.fedoraproject.org/koji/buildinfo?buildID=95835) and report which (if either) of these kernels works better? I'm only seeing one likely candidate so far.

Thanks!

Comment 14 David Nielsen 2009-04-08 12:00:05 UTC
I am still seeing this behaviour with -16, -21 testing is up next.

Comment 15 David Nielsen 2009-04-08 23:47:18 UTC
-21 is showing good behavior, with no PGRAPH_ERROR errors in the logs. It's only 2 hours into testing, but considering how quickly the problem was triggered before, it should have popped up by now.

Comment 16 David Nielsen 2009-04-09 00:08:44 UTC
And the very definition of irony: 5 minutes after I said it was doing well, the hang occurred on -21 as well.

Comment 17 Joonas Sarajärvi 2009-04-12 07:02:16 UTC
I seem to get these hangs as well with the nouveau driver. On the other hand, the nv driver works fine.

Hardware info:
http://www.smolts.org/client/show/pub_c40b091e-a50f-4fbd-949c-8fa330d8bde5

I'll post dmesg output and Xorg.0.log as attachments shortly.

Comment 18 Joonas Sarajärvi 2009-04-12 07:05:00 UTC
Created attachment 339212
Xorg.0.log, read via SSH after hang.

Comment 19 Joonas Sarajärvi 2009-04-12 07:05:48 UTC
Created attachment 339213
Dmesg output after X hanging.

Comment 20 Ben Skeggs 2009-04-13 02:27:30 UTC
Can you give the kernel from the F11 beta (2.6.29-0.258.2.3.rc8.git2.fc11) a try to confirm that the issue is definitely on the kernel side? If the issues still occur there, it would also be useful to see what happens after downgrading to xorg-x11-drv-nouveau-0.0.12-10.20090310git8f9a580.fc11.

Comment 21 Joonas Sarajärvi 2009-04-13 07:39:47 UTC
I'm sorry, but at the moment, I don't have access to the machine where I encountered these hangs.

Thanks

Comment 22 Jens Petersen 2009-04-14 04:29:25 UTC
I see hangs with 0.0.12-25 and -26; I can provide chipset details if that helps.

Comment 23 Jens Petersen 2009-04-14 04:30:18 UTC
I can also try the latest builds with F11Beta Live and see if that hangs too.

Comment 24 Jens Petersen 2009-04-14 04:52:12 UTC
nv driver also seems fine to me.

Comment 25 Jens Petersen 2009-04-15 06:48:07 UTC
(I just note for the record that -27 also hangs for me.)

Comment 26 Jens Petersen 2009-04-15 07:01:39 UTC
(In reply to comment #20)
> Can you give the kernel from the f11 beta (2.6.29-0.258.2.3.rc8.git2.fc11) a
> try to confirm the issue is definitely on the kernel side.

Ok I just tried -27 with F11Beta Live and that hung quickly for me too.

Comment 27 Jens Petersen 2009-04-15 07:21:14 UTC
(In reply to comment #20)
> If the issues still occur there, downgrading to
> xorg-x11-drv-nouveau-0.0.12-10.20090310git8f9a580.fc11
> would be useful to see also.  

That looks to be fine so far with current rawhide.

Comment 28 Jens Petersen 2009-04-15 07:32:37 UTC
Some sample output from /var/log/messages:

# grep nouveau /var/log/messages
:
Apr 15 16:34:39 localhost yum: Updated: 1:xorg-x11-drv-nouveau-0.0.12-27.20090413git7100c06.fc11.i586
<reboot>
Apr 15 16:38:24 localhost kernel: nouveau 0000:01:00.0: Detected an NV44 generation card (0x044500a2)
Apr 15 16:38:24 localhost kernel: [drm] Initialized nouveau 0.0.12 20060213 for 0000:01:00.0 on minor 0
Apr 15 16:38:33 localhost kernel: nouveau 0000:01:00.0: Allocating FIFO number 0
Apr 15 16:38:33 localhost kernel: nouveau 0000:01:00.0: nouveau_fifo_alloc: initialised FIFO 0
Apr 15 16:38:33 localhost kernel: nouveau 0000:01:00.0: Allocating FIFO number 1
Apr 15 16:38:33 localhost kernel: nouveau 0000:01:00.0: nouveau_fifo_alloc: initialised FIFO 1
Apr 15 16:39:42 localhost kernel: nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 1/6 Mthd 0x0184 Data 0xffffffff
Apr 15 16:39:42 localhost kernel: nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 1/6 Mthd 0x0188 Data 0x32222222
Apr 15 16:39:44 localhost kernel: nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: DATA_ERROR, nStatus: BAD_ARGUMENT
Apr 15 16:39:44 localhost kernel: nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 1/1 Class 0x004a Mthd 0x0300 Data 0x00000000:0x00000000
Apr 15 16:39:50 localhost kernel: nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 1
<hang>
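The grep above lists every nouveau line. When comparing package versions, as in this report, it can help to tally only the GPU error tags so that runs are easy to compare. The following is a minimal Python sketch, assuming the syslog line formats quoted in this report; the function names `summarize_nouveau_errors` and `print_summary` are hypothetical and not part of any Fedora or nouveau tool, and the tag list covers only the tags seen here.

```python
import re
from collections import Counter

# Error tags that appear in the kernel log excerpts in this report. Both the
# older "[drm] PGRAPH_ERROR ..." and the newer "nouveau 0000:01:00.0: ..." line
# prefixes contain the tag as a standalone word, so one pattern covers both.
ERROR_RE = re.compile(r"\b(PGRAPH_ERROR|PFIFO_CACHE_ERROR|PFIFO_DMA_PUSHER)\b")
# Channel field such as "Ch 1/6" or "Ch 1"; only the first number is kept.
CHANNEL_RE = re.compile(r"\bCh (\d+)")


def summarize_nouveau_errors(lines):
    """Count error tags per (tag, channel); channel is None when absent."""
    counts = Counter()
    for line in lines:
        tag = ERROR_RE.search(line)
        if tag is None:
            continue  # not one of the GPU error lines we care about
        ch = CHANNEL_RE.search(line)
        channel = int(ch.group(1)) if ch else None
        counts[(tag.group(1), channel)] += 1
    return counts


def print_summary(path="/var/log/messages"):
    """Print the tally for a syslog file, most frequent first."""
    with open(path, errors="replace") as f:
        counts = summarize_nouveau_errors(f)
    for (tag, channel), n in counts.most_common():
        shown = "-" if channel is None else channel
        print(f"{tag:<18} ch={shown:<3} {n}")
```

Calling `print_summary()` as root (the messages file is usually not world-readable) before and after a package change gives a rough count of how often each tag fired. The sketch only counts lines; it does not interpret what the channel, class, or method values mean.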

Comment 29 Ben Skeggs 2009-04-15 07:44:18 UTC
Ok, just to confirm.  xorg-x11-drv-nouveau -10 is working OK with whichever kernel?  That gives me somewhere else to look :)  All the bug reports seem to have blamed the kernel, and I was running out of ideas as to what changed there that could possibly cause this.

Comment 30 James Cassell 2009-04-15 19:51:42 UTC
Here's how I can reproduce it (which I mentioned on bug #473347):
1. Open Firefox, or Gnome Help
2. Select some text
3. drag the text, such that the copy icon is the cursor

at this point, the cursor stays like that, and clicks have no effect, I can't
switch to a VT, and ctrl+alt+del doesn't do anything.  I seem to have to do a
hard power-off.


I'm running on a ThinkPad T61 with nVidia Quadro NVS 140m.

Comment 31 Ben Skeggs 2009-04-17 07:05:12 UTC
(In reply to comment #30)
> Here's how I can reproduce it (which I mentioned on bug #473347):
> 1. Open Firefox, or Gnome Help
> 2. Select some text
> 3. drag the text, such that the copy icon is the cursor
> 
> at this point, the cursor stays like that, and clicks have no effect, I can't
> switch to a VT, and ctrl+alt+del doesn't do anything.  I seem to have to do a
> hard power-off.
> 
> 
> I'm running on a ThinkPad T61 with nVidia Quadro NVS 140m.  
This issue isn't related; it's more likely related to rh#489101...

Everyone else, this should be fixed now as of libdrm-2.4.6-6.fc11 and xorg-x11-drv-nouveau-0.0.12-29.20090417gitfa2f111.fc11.

I'll close after a couple of confirmations :)

Comment 32 Jens Petersen 2009-04-20 04:38:51 UTC
Looks good to me.  Thanks

Comment 33 David Nielsen 2009-04-20 06:14:14 UTC
Seems to have sent the crasher to the wild green yonder.

Comment 34 Adam Williamson 2009-04-23 18:47:22 UTC
Sounds like this is fixed. Closing.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

