Red Hat Bugzilla – Bug 684907
[NVa8] X freezes frequently
Last modified: 2011-04-25 09:46:23 EDT
Created attachment 484281 [details]
/var/log/messages from last boot (no hang)
Description of problem:
X freezes around 2 or 3 times a day here. This has happened for quite some time (some 2 months), but I was on rawhide ~~-> 16, and the machine did misbehave in an entertaining variety of ways. I reinstalled from scratch last weekend, and the problem persists.
What happens is that the screen freezes completely (no date/time update, no reaction to keyboard like ctrl-alt-DEL or ctrl-BS). I can move the mouse pointer, but once when the spinner was active it didn't even spin. The keyboard LEDs don't react either (i.e., CapsLock doesn't turn the LED on).
Curiously this has happened on my two Toshiba notebooks, this one here with Nouveau, the other one with some oldish intel GPU (haven't checked that one lately, sorry). I also have a Samsung netbook, that one isn't prone to freezing.
Version-Release number of selected component (if applicable):
01:00.0 VGA compatible controller : nVidia Corporation Device [10de:0a75] (rev a2)
How reproducible:
Around 2 to 3 times a day
Steps to Reproduce:
1. (Happens at random; once even while the screensaver was running)
Created attachment 484282 [details]
Created attachment 484283 [details]
~/.xsession errors from current session (no hang up to here)
Can you paste your dmesg output from *after* a hang has happened. If you don't have a way of accessing it after it's hung, /var/log/messages from right after you reboot will do in its place.
Created attachment 484463 [details]
Output from dmesg just after a hang
Created attachment 484464 [details]
Output from gdb
Via SSH I ran "gdb /usr/bin/Xorg 1320" as root, and did an "info stack"
The output from "ps -p 1320 -l" is:
F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
4 S     0  1320  1317  1  80   0 -    43891 nouvea tty1     00:01:01 Xorg
The process Xorg still used CPU, albeit slowly.
"kill 1320" and such had no effect. Tried to "telinit 3", nothing. A "reboot" finally rebooted the machine, but it took some time to react.
Connected via SSH (dead keyboard), as root in the above. Tried to "kill 1320" and such, nothing; tried "telinit 3", nothing; "reboot" finally worked (but it took some 30 seconds to kick in).
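For anyone repeating the capture over SSH, the steps described above (process state, a stack trace of the stuck Xorg, and the kernel log) could be scripted roughly as follows. This is only a sketch: the helper name `hang_diagnostics` is made up for illustration, the PID 1320 is the one from the ps output above, and the commands need to run as root.

```python
import shlex
from typing import List


def hang_diagnostics(pid: int) -> List[List[str]]:
    """Build the commands to run as root while X is hung (hypothetical helper)."""
    return [
        # Process state in long format, including the wait channel (WCHAN)
        ["ps", "-l", "-p", str(pid)],
        # Attach to the process, dump a backtrace of every thread, then detach
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
        # Kernel ring buffer, as requested earlier in this report
        ["dmesg"],
    ]


if __name__ == "__main__":
    # Print the commands so they can be pasted into the SSH session
    for cmd in hang_diagnostics(1320):
        print(shlex.join(cmd))
```

The script only prints the command lines; redirecting each command's output to a file before rebooting keeps the data for attaching here.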
BTW, I had just restarted the machine and was editing a comment just like this one when it froze again...
Just froze again...
And again. Not funny anymore...
Can you add "nouveau.noaccel=1" to your boot options? It's likely that this will become the default for F15 if this problem still persists. There was some hope it was solved. I have an NVA8 laptop now, and see *no* problem.
Froze twice in the meantime... now as suggested in comment 9. Just Gnome fallback mode now :-(
Yes, the problem is *very* frustrating. We have zero real idea about what is wrong. I had hoped that this laptop would actually have the problem, so I could use a trial-and-error approach to fixing it, but for some odd reason it does not... Very, very frustrating.
The smolt profile for this machine is pub_258fc546-3757-4b93-8019-c3ff4fa31e90 (Toshiba Satellite A505 PSAT9U-009LM1).
Tell if I can be of some help.
BTW, no freezes since the change suggested in comment 9 (but I moved to XFCE, Gnome fallback isn't up to snuff).
Created attachment 485781 [details]
/var/log/messages, last two boots
Even with the change suggested in comment 9 I got a hang. Sorry, I rebooted before remembering to do the capture dance with dmesg et al. Attached the full /var/log/messages for the hung session (XFCE4, no screensaver; I was off for lunch and had nothing running) and the messages for the current session.
Oh good (for me, anyway), it doesn't seem likely nouveau can be responsible for your hang in that case. With "noaccel", after the mode is set, nouveau doesn't really do anything.
Your /var/log/messages is filled with a *lot* of messages from something else, not sure what it is, but it's looking like the next candidate to me. Can you "modprobe -r intel_ips" after you've booted?
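To confirm whether intel_ips is actually loaded before and after the "modprobe -r", one option is to look at /proc/modules, where the first field of each line is the module name. A minimal sketch, with the function names `loaded_modules` and `is_loaded` invented for this example:

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Return the module names found in the text of /proc/modules."""
    # The first whitespace-separated field of every line is the module name
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_loaded(module: str, proc_modules_text: str) -> bool:
    """True if the named module appears in the given /proc/modules text."""
    return module in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    with open("/proc/modules") as f:
        print("intel_ips loaded:", is_loaded("intel_ips", f.read()))
```

Running it once right after boot and once after removing the module shows whether the removal took effect.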
(In reply to comment #14)
> Oh good (for me, anyway), it doesn't seem likely nouveau can be responsible for
> your hang in that case. With "noaccel", after the mode is set, nouveau doesn't
> really do anything.
It is just a _lot_ less frequent (1/day vs 5/day), so I wouldn't rule it out just yet.
> Your /var/log/messages is filled with a *lot* of messages from something else,
> not sure what it is, but it's looking like the next candidate to me. Can you
> "modprobe -r intel_ips" after you've booted?
Just checked, it is loaded. Will remove that one and go with accelerated nouveau...
Rebooted, now in XFCE. Curiously, both times I ran non-accelerated with intel_ips I got a messed up taskbar (see bug # 688254).
Created attachment 485886 [details]
/var/log/messages, last two boots
One run without intel_ips and with accelerated nouveau (hung like clockwork ;-) and now with intel_ips and acceleration.
(Sorry, haven't anything handy for ssh connection to this machine right now).
Will remove intel_ips again and see what happens.
No freeze til now. Need to shut down for today.
Started this machine up around 9 AM, now it is 3 PM. No freezes, Nouveau accelerated (no kernel parameter), intel_ips rmmod'ed soon after boot. So it looks like the culprit is really intel_ips (normally it would have frozen 2 or 3 times by now).
Interesting, to clarify the situation before I reassign this to the kernel:
nouveau.noaccel=0 + intel_ips = hang
nouveau.noaccel=1 + intel_ips = hang
nouveau.noaccel=0 - intel_ips = no hang
nouveau.noaccel=1 - intel_ips = no hang
It is a bit more complicated than that...
Some notation: NA == Nouveau, accelerated (+/-); II == intel_ips enabled (+/-)
NA+ II+ Hangs rather reliably (within a half hour or so right now)
NA- II+ Hangs, but less frequently
NA+ II- No hangs (*)
NA- II- Not tested
(*) It did hang twice for me, but the second time (it happened today) I had started my XFCE session before remembering to disable II; I killed the (starting) session with ctrl-BS, and on tty2 I rmmod'ed intel_ips. Shortly after logging in again it hung. It seems that the II- has to be before there was any serious use of X, i.e. very soon after boot.
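To keep track of which combinations have actually been covered, the notation above can be written down as a small table. This is just a bookkeeping sketch: the outcomes are copied from the list above, and the names `REPORTED`, `label` and `untested` are made up for the example.

```python
from itertools import product

# Outcomes as reported above; keys are (nouveau_accelerated, intel_ips_loaded)
REPORTED = {
    (True, True): "hangs rather reliably",
    (False, True): "hangs, but less frequently",
    (True, False): "no hangs (*)",
}


def label(accel: bool, ips: bool) -> str:
    """Render a combination in the NA+/- II+/- notation."""
    return "NA{} II{}".format("+" if accel else "-", "+" if ips else "-")


def untested() -> list:
    """Combinations of the two switches with no reported outcome yet."""
    return [
        label(a, i)
        for a, i in product((True, False), repeat=2)
        if (a, i) not in REPORTED
    ]


if __name__ == "__main__":
    for a, i in product((True, False), repeat=2):
        print(label(a, i), "->", REPORTED.get((a, i), "not tested"))
```

With the data above, the only combination left without a result is NA- II-.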
I did run the full day yesterday (9 AM to around 6 PM) without any hang with NA+ II- (Using XFCE, presumably not so 3D-demanding? But then again, Gnome vs XFCE didn't make much of a difference before...)
Count another day without incident (from around 9 AM to 4 PM or so), NA+, II- (but II- set right after a cold boot).
For me, NA+ II- still results in lock-ups, even with the intel_ips module blacklisted.
Got another hang with NA+ II-, setup just like comment 22.
(In reply to comment #24)
> Got another hang with NA+ II-, setup just like comment 22.
I updated rsyslog at that time, which broke the system (see bug 689121), that might have been the cause for the hang (I tried to login via SSH, but got no response and had to reboot the hard way).
NA+, II-. Uptime is 22 hours.
NA+, II-. Two crashes in <24h.
Now kernel-2.6.38-1.fc15.x86_64, xorg-x11-drv-nouveau-0.0.16-23.20110303git92db2bc.fc15.x86_64. By accident I left out the "disable intel_ips" bit a few times, and it has hung much less (some 5 hours between hangs vs less than half an hour).
(In reply to comment #28)
> Now kernel-2.6.38-1.fc15.x86_64,
> xorg-x11-drv-nouveau-0.0.16-23.20110303git92db2bc.fc15.x86_64. By accident I
> left out the "disable intel_ips" bit a few times, and it has hung much less
> (some 5 hours between hangs vs less than half an hour).
Still hanging then, even with noaccel? Okay.. I really doubt what you're seeing is nouveau's fault now. It's still possible though I guess.
Any chance you can brave using the vesa driver for a day or so? Just add "nomodeset" to your boot options and X should automatically fall back to it.
(In reply to comment #29)
> (In reply to comment #28)
> > Now kernel-2.6.38-1.fc15.x86_64,
> > xorg-x11-drv-nouveau-0.0.16-23.20110303git92db2bc.fc15.x86_64. By accident I
> > left out the "disable intel_ips" bit a few times, and it has hung much less
> > (some 5 hours between hangs vs less than half an hour).
> Still hanging then, even with noaccel? Okay.. I really doubt what you're
> seeing is nouveau's fault now. It's still possible though I guess.
No, this is without noaccel and with intel_ips, but the newer kernel is _much_ less prone to X hangs (current uptime is almost 5 hours, no hang; used to freeze at most after an hour or so).
Had a couple of hangs with NA+ II- (kernel-2.6.38-1.fc15.x86_64 and kernel-18.104.22.168-6.fc15.x86_64, xorg-x11-drv-nouveau-0.0.16-24.20110324git8378443.fc15.x86_64)
(In reply to comment #31)
> Had a couple of hangs with NA+ II- (kernel-2.6.38-1.fc15.x86_64 and
If you're still getting hangs with noaccel (I presume that is NA+?), I think it's probably almost time to reassign this elsewhere. But, where, I'm not exactly sure. Probably the kernel. One last ditch effort to rule out nouveau completely, are you able to brave the vesa driver for a bit?
OK, yet again: noaccel == NA- (Nouveau, acceleration off).
intel_ips == II+ (Intel ips on)
And as I said: This was _without_ noaccel, and with intel_ips rmmod'ed.
Currently my uptime is almost 20 hours, NA+, II- (no noaccel, intel_ips rmmod'ed). Yesterday it hung after 2 hours with the same (old kernel), then with the new kernel it hung after some 1/2 hour.
By the timing, I think this is a different bug than the original one.
A new hang, NA+, II- after some 3 hours.
Currently with nomodeset, II+, uptime 2 1/2 hours.
(In reply to comment #34)
> A new hang, NA+, II- after some 3 hours.
> Currently with nomodeset, II+, uptime 2 1/2 hours.
BTW, I now see strange compositing errors (once an XEmacs window was all vertical stripes until I refreshed it; now the XFCE taskbar has the battery icon replicated 4 times and neither Bluetooth nor NetworkManager).
Ran most of today with VESA (nomodeset) + intel_ips, no hangs; right now nouveau with acceleration + intel_ips is up for some 4 hours.
That's thoroughly confusing.. We don't really *do* anything in the noaccel case.
Can you use noaccel again, and get the X log from that, and a gdb backtrace of where X is stuck when it hangs?
Yet again: The one that made (!) the most difference was intel_ips, it did hang noticeably less without it, noaccel didn't make much of a difference lately. Right now, I've had this machine running all night (but the screensaver did kick in) and all day (mostly away, so also screensaver) with _no_ special configuration at all, no more hangs. I saw the frequency of hanging diminish with kernel version, right now it is kernel-22.214.171.124-8.fc15.x86_64.
VESA looks butt-ugly, and has compositing problems.
(In reply to comment #38)
> Yet again: The one that made (!) the most difference was intel_ips, it did hang
> noticeably less without it, noaccel didn't make much of a difference lately.
> Right now, I've had this machine running all night (but the screensaver did
> kick in) and all day (mostly away, so also screensaver) with _no_ special
> configuration at all, no more hangs. I saw the frequency of hanging diminish
> with kernel version, right now it is kernel-126.96.36.199-8.fc15.x86_64.
> VESA looks butt-ugly, and has compositing problems.
Argh... scratch that, I had configured the kernel with nomodeset :-(
OK, now really without any special configuration (except for selinux=0 due to /run breakage), running a few hours without any trouble.
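Since a leftover boot option invalidated a run here, a quick look at the command line of the running kernel before starting a test might avoid repeating that. A rough sketch, assuming the only options of interest are the two discussed in this report; the function name `boot_flags` is made up:

```python
def boot_flags(cmdline: str) -> dict:
    """Report which of the options discussed in this bug are on the kernel command line."""
    params = cmdline.split()
    return {
        # Disables kernel modesetting, so X falls back to the vesa driver
        "nomodeset": "nomodeset" in params,
        # Turns off nouveau acceleration
        "nouveau.noaccel": "nouveau.noaccel=1" in params,
    }


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with
    with open("/proc/cmdline") as f:
        print(boot_flags(f.read()))
```

Both values should be False for a run that is meant to be "no special configuration".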
Ran most of today without trouble (no special configuration), did hang on shutdown (it looked like it, everything froze and I had to turn the machine off as I had no time to wait).
Recently it seems to have frozen for a few seconds.
If it still happens here, it is certainly after some 5 hours running at the very least.
Updating with latest data: kernel-188.8.131.52-9.fc15.x86_64, xorg-x11-drv-nouveau-0.0.16-24.20110324git8378443.fc15.x86_64. Running a few days with nomodeset (== VESA) gives no hang, running some 36 hours with no special configuration gave 2 (or perhaps 3) hangs. Currently running a few hours without intel_ips.
Had a hang yesterday in the afternoon with no special configuration, booted and rmmod'ed intel_ips and had no hang after that. Today I logged into XFCE and after that I rmmod'ed intel_ips, and had a hang a half hour later. Now trying the same again (remembered intel_ips too late ;-)
Yet again: No special configuration, froze after an hour or so; rmmod intel_ips _before_ entering XFCE and running for some 6 hours now. Doing the rmmod _after_ the desktop starts seems to be much less effective.
This is driving me nuts... No special configuration, two days straight (some 20 hours) no hangs, then a hang. Reboot, a new hang 20 minutes later, and again after 10 minutes. Today a hang after running an hour or so, then a hang 15 minutes later. Then hours without a problem, and a new hang.
BTW, the hangs with the short period between them were when I was running something under qemu-kvm (i686 program under x86_64).
Now I rmmod'ed intel_ips before logging in, had a hang (after suspending + waking up, and some 30 minutes of use). Again, a hang with the same configuration today after working some 2 hours. Currently running with nomodeset, no intel_ips (intel_ips now blacklisted, but I doubt it will do much good).
Problem is that the very same configuration can work 2 days straight (some 20 hours) and then hang thrice in an hour. Very frustrating.
(And Murphy's law makes sure it happens at the worst possible moments too, at least that part of the universe is working fine).
Tried with nouveau.noaccell=1, but that doesn't work now: At most 1024x768 (my notebook monitor does 1366x768, the external VGA monitor does 1680x1050; both look awful as you can imagine); can't configure an external monitor at all. It seems nomodeset is better here :-(
OK, new data points: After removing all configuration, it hung after some 20 minutes of use. And then the rest of Friday without trouble, currently (Sunday) uptime is 25 hours (not all in active use, though ;-), no troubles.
Looks like I'll just ride it out. I guess there was some intel_ips problem (now fixed) with the same symptoms (or which triggered the Nouveau bug somehow).
Can you please try upgrading to the latest kernel from koji?
Will do. Note that the frequency of freezing is around once a day or so right now, it'll take a few days to confirm this is really fixed.
Have been running a few hours without freezes now.
Ben, this should probably be closed as a dupe of bug #684608
(In reply to comment #52)
> Ben, this should probably be closed as a dupe of bug #684608
Perhaps. I left it open as Horst also mentioned hangs with NoAccel.. Which, well, should not happen.. I'm not entirely sure how nouveau *could* cause them with NoAccel turned on...
(In reply to comment #53)
> (In reply to comment #52)
> > Ben, this should probably be closed as a dupe of bug #684608
> Perhaps. I left it open as Horst also mentioned hangs with NoAccel.. Which,
> well, should not happen.. I'm not entirely sure how nouveau *could* cause them
> with NoAccel turned on...
I did see infrequent hangs with noaccel (I didn't use it much), but that was way back (when hangs without configuration happened every 10 to 20 minutes). Also note that my older notebook (also Toshiba, but with an intel GPU; same Fedora) was prone to freezing the same way, but less. I haven't used it much lately, but it didn't freeze at all recently.
The situation has changed substantially (due to kernel evolution?), so perhaps there were several different bugs at work. And yes, the symptoms are as in bug #684608.
Currently I'm running the kernel suggested in comment 49 for more than a day without incidents. I'll let this weekend pass and call it fixed unless something shows up.
Updated the kernel today. Had kernel-184.108.40.206-15.rc1.fc15.x86_64 running for some 20 hours straight without freezes, now kernel-220.127.116.11-18.fc15.x86_64 for a few hours. I guess this can be closed...
Almost a day with kernel-18.104.22.168-18.fc15.x86_64, no problems up to now.
No further freezes. Any other experiment worth doing, or just close this sucker?
We can close this then. We do have 684608 covering this bug already (and, actually, another one which I originally intended to cover this issue); I didn't duplicate it on suspicion that you were also seeing another bug.
But, let's do it!