Bug 560147 - Xorg suddenly crashes and console "loose input"
Summary: Xorg suddenly crashes and console "loose input"
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 14
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-01-29 23:51 UTC by Barbara
Modified: 2018-04-11 07:15 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-16 21:48:58 UTC
Type: ---
Embargoed:


Attachments
/var/log/Xorg.0.log obtained running 'startx -- -logverbose 9' (8.81 KB, application/x-bzip2)
2010-01-29 23:51 UTC, Barbara
dmesg output after freeze on runlevel 3 (11.20 KB, text/plain)
2010-01-31 10:50 UTC, Barbara
two sets of logs after X crash (30.55 KB, application/x-bzip-compressed-tar)
2010-02-22 21:38 UTC, Barbara
some Xorg.0.log containg backtrace (150.00 KB, text/plain)
2010-02-26 00:00 UTC, Barbara
lspci -vvvxxx -s 02:0 (3.05 KB, text/plain)
2010-04-06 05:53 UTC, Barbara
display using nouveau WITHOUT pcie_aspm=off (563.51 KB, image/png)
2010-06-20 12:21 UTC, Barbara

Description Barbara 2010-01-29 23:51:54 UTC
Created attachment 387679
/var/log/Xorg.0.log obtained running 'startx -- -logverbose 9'

Description of problem:
Before trying the upgrade from Fedora 11, I tried an install of Fedora 12 on a spare partition.
The installer tried to use nouveau, but after many attempts I switched to vesa, because the installation could never finish: it locked up at different points of the procedure.
Once the installation finished, I was unable to run X. Anyway, I decided to keep that installation and keep it updated.
Now I run it at runlevel 3, and very often it seems that I lose the keyboard, but what is really happening is that the commands are not echoed to the screen. In fact I can "blindly" shut down the PC. This is with nomodeset and rdblacklist=nouveau; otherwise the situation gets even worse. For example, the boot process takes a long time (minutes) because output to the console is so slow, while on F11 it takes about 40 seconds.
Previously I got a corrupted desktop when running startx (with no xorg.conf). Now it starts and I can also start gnome-terminal, but as soon as I start firefox, for example, I lose the keyboard and the CPU goes to 100%, and the only remedy is to connect via ssh and shut down the PC.

In /var/log/Xorg.0.log there are entries like
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
and a backtrace. I'll attach the file.
The last two lines in /var/log/messages are:
hrtimer: interrupt too slow, forcing clock min delta to 2333931 ns
[drm] nouveau 0000:02:00.0: PFIFO_DMA_PUSHER - Ch 2
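
The two signatures quoted above can be picked out of the logs mechanically. This is a minimal sketch, assuming the default log paths named in this report; `scan_gpu_hang` is a hypothetical helper name, not an existing tool.

```shell
# scan_gpu_hang: hypothetical helper, not part of any package.
# Reads log text on stdin and prints only the lines that carry the
# two hang signatures quoted in this report.
scan_gpu_hang() {
    grep -E 'EQ overflowing|PFIFO_DMA_PUSHER'
}

# Example use (as root if the logs are not world-readable):
#   scan_gpu_hang < /var/log/Xorg.0.log
#   scan_gpu_hang < /var/log/messages
```

Reading from stdin keeps the helper usable on dmesg output and on saved copies of the logs as well.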

This is what lspci -v says about my GPU:
02:00.0 VGA compatible controller: nVidia Corporation GeForce 8500 GT (rev a1) (prog-if 00 [VGA controller])
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at cc00 [size=128]
	Expansion ROM at fbbe0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 2
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel <?>
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information <?>
	Kernel modules: nouveau, nvidiafb



Version-Release number of selected component (if applicable):
# uname -rsm
Linux 2.6.31.12-174.2.3.fc12.x86_64 x86_64
all rpms updated via yum

How reproducible:
startx

Steps to Reproduce:
1. startx
  
Actual results:
The PC locks up locally

Expected results:
a working graphical environment


Additional info:

Comment 1 Barbara 2010-01-31 10:50:11 UTC
Created attachment 387832
dmesg output after freeze on runlevel 3

Comment 2 Barbara 2010-01-31 11:22:44 UTC
I've added the full output of dmesg after a freeze at runlevel 3 with nouveau.modeset=1 set at boot.

After it froze the following line was added:
[drm] nouveau 0000:02:00.0: GPU lockup - switching to software fbcon

As I've said, it seems that the keyboard is not working, but I can actually send commands (e.g. shutdown -r now). What happens is that what I type is not echoed.
Switching VT doesn't help.
It often also happens while the OS is booting, leaving (incomplete) boot messages on the screen; I can still "blindly" log in and shut down.


I don't know which other information could be helpful to you, so please ask.
I can do any test you need.

Comment 3 Barbara 2010-02-07 15:59:11 UTC
In the meantime I tried installing the i386 version, but I gave up because I was experiencing the same random installer lockups with the default graphics driver. So it shouldn't be an x86_64-only problem.

Comment 4 Barbara 2010-02-18 20:52:29 UTC
I'm still keeping my installation updated running yum update every day.
This is probably a nouveau problem, as the loss of visual feedback at runlevel 3 does not happen when nouveau is blacklisted.
With vesa, X no longer crashes as soon as I run startx, but it's very sluggish and barely usable at 1440x900. With nouveau loaded, startx still freezes and I lose the keyboard.

Comment 5 Ben Skeggs 2010-02-18 22:57:29 UTC
Are you able to give the most recent kernel build from koji a try, and see if this still happens? http://koji.fedoraproject.org/koji/buildinfo?buildID=157041

Thanks!

Comment 6 Barbara 2010-02-19 08:43:10 UTC
I tried downloading the rpm and installing it with rpm -ivh, but it requires a dep which I'm not able to find:
error: Failed dependencies:
kernel-firmware >= 2.6.32.8-58.fc12 is needed by kernel-2.6.32.8-58.fc12.x86_64
Where can I get it?

Comment 7 Barbara 2010-02-19 19:26:20 UTC
OK, I forgot to add that I finally found it. It is on the same page, but I wasn't seeing it because the page takes minutes to scroll: it gets slowly repainted after every few pixels of scrolling.
Now I'm going to test it.
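
The dependency error in comment 6 names what rpm could not resolve. Installing the kernel and kernel-firmware packages from the same build in a single rpm transaction should avoid it. A small sketch follows; `missing_deps` is a hypothetical helper, and the .rpm file names in the usage comment are assumptions derived from the version string in the error message.

```shell
# missing_deps: hypothetical helper. Reads rpm's error output on stdin
# and prints each unresolved capability, one per line.
missing_deps() {
    sed -n 's/^[[:space:]]*\(.*\) is needed by .*$/\1/p'
}

# Example use (file names are assumed, adjust to what was downloaded):
#   rpm -ivh kernel-2.6.32.8-58.fc12.x86_64.rpm 2>&1 | missing_deps
# Installing both packages together lets rpm resolve the dependency:
#   rpm -ivh kernel-2.6.32.8-58.fc12.x86_64.rpm \
#            kernel-firmware-2.6.32.8-58.fc12.noarch.rpm
```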

Comment 8 Barbara 2010-02-20 10:35:34 UTC
The problem at runlevel 3 still exists with the new kernel and I still get the following line in /var/log/messages:
kernel: [drm] nouveau 0000:02:00.0: GPU lockup - switching to software fbcon

Comment 9 Barbara 2010-02-22 21:34:18 UTC
I'm attaching a tarball with two sets of dmesg, messages and Xorg.0.log obtained after 2 crashes.
Both Xorg.0.log contain backtraces.
They both happened with the kernel you suggested after running startx and starting firefox as soon as the desktop was loaded.

Comment 10 Barbara 2010-02-22 21:38:12 UTC
Created attachment 395581
two sets of logs after X crash

Comment 11 Barbara 2010-02-22 22:38:49 UTC
I should have added that when those crashes happened, the screen remained on the VT where Xorg spawned instead of going back to the one where startx was launched (I think that should be the normal behavior).
The mouse pointer was stuck and ctrl-alt-f[1-5] apparently had no effect.
*Sometimes*, after waiting for a while, I am able to "blindly" switch to a VT, log in and shut down the system.
By "blindly" I mean that the monitor still shows the frozen GNOME desktop.
In some other tests I had another couple of crashes: one after maximizing aisleriot soon after starting it, and one when starting it (it had been left maximized by the previous run). gnome-terminal seems to run, with some artifacts, though I haven't tried maximizing it.

Comment 12 Barbara 2010-02-24 22:50:59 UTC
I'm still updating every day.
Today I got xorg-x11-server-common-1.7.5-1.fc12.x86_64 and xorg-x11-server-Xorg-1.7.5-1.fc12.x86_64 among others, but nothing has changed since comment #9.

Comment 13 Barbara 2010-02-25 23:58:33 UTC
Today I installed fluxbox and then ran startx.
As it looked more stable (judging by the way it started), I opened xterm and enjoyed myself a lot searching for similar problems on bugzilla(*) using elinks.

Then I started epiphany; it didn't start maximized, but then it automatically went fullscreen, and when it reached ~75% of loading the home page, the lockup happened again.
Looking at top, X was consuming 100% of the CPU.
So after waiting a while, I started pressing ctrl+alt+f[1-3] and numlock repeatedly while running tail -f /var/log/Xorg.0.log from my laptop connected via ssh, and the backtrace appeared.
And I can always reproduce it.
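
The capture procedure described above can be sketched as a filter that keeps only the backtrace blocks from the log. It assumes the X server writes a `Backtrace:` line followed by numbered frames (`0: ...`, `1: ...`); `extract_backtraces` is a hypothetical helper name and the host name in the usage comment is a placeholder.

```shell
# extract_backtraces: hypothetical helper. Reads Xorg log text on stdin
# and prints each "Backtrace:" header with its numbered frames,
# followed by a blank line between blocks.
extract_backtraces() {
    awk '
        /Backtrace:/          { in_bt = 1; print; next }
        in_bt && /^[0-9]+: /  { print; next }
        in_bt                 { in_bt = 0; print "" }
    '
}

# Example use from a second machine (host name is a placeholder):
#   ssh root@desktop 'cat /var/log/Xorg.0.log' | extract_backtraces
```

Saving the output of each run to a separate file makes it easier to diff backtraces that look similar but are not identical.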

I repeated this pattern and collected about 10 Xorg.0.log files containing backtraces.
They look very similar, but not identical.
And also:
- numlock is always dead.
- most of the time the mouse is still alive, but windows are frozen and don't receive focus.
- often, after the crash or after pressing alt+sysrq+r, numlock starts working again, so I can blindly (with epiphany still on the screen) switch to a VT, log in and shut down.

BTW, I also had a couple of occurrences of the previously described loss of keyboard feedback on a VT without nouveau loaded. They just happened after a longer uptime.
This is very confusing.

(*) IMHO, at least 566987, 559791 and 556302 have the same cause.

Comment 14 Barbara 2010-02-26 00:00:03 UTC
Created attachment 396432
some Xorg.0.log containg backtrace

Comment 15 Barbara 2010-03-05 07:57:07 UTC
My smolt profile
http://www.smolts.org/client/show/pub_4f7873bb-2a08-4d1a-9615-fd7f3a450603

BTW
"smoltSendProfile -p" prints absolutely nothing

Comment 16 Barbara 2010-03-06 07:10:25 UTC
I'm still updating and it's still not working.
I've found that a way to stop the problem is to rename /lib/modules/`uname -r`/kernel/drivers/gpu/drm/nouveau.ko and rebuild the initramfs so that the module doesn't get loaded.
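
An untested sketch of that workaround follows. It only prints the commands so they can be reviewed before being run as root. The module path is copied from this comment; `plan_disable_nouveau`, the `.disabled` suffix and the dracut invocation with the /boot/initramfs-<version>.img name are assumptions, not something stated in the thread.

```shell
# plan_disable_nouveau: hypothetical helper. Prints (does not execute)
# the commands to rename the module and rebuild the initramfs.
# Optional argument: kernel release (defaults to the running kernel).
plan_disable_nouveau() {
    kver=${1:-$(uname -r)}
    mod=/lib/modules/$kver/kernel/drivers/gpu/drm/nouveau.ko
    echo "mv $mod $mod.disabled"
    # Assumed initramfs naming and tool; verify on the target system.
    echo "dracut --force /boot/initramfs-$kver.img $kver"
}

# Example use:
#   plan_disable_nouveau            # review the printed commands
#   plan_disable_nouveau | sh -x    # run them as root once reviewed
```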

Now a suggestion would be really appreciated.

Comment 17 Barbara 2010-03-13 11:56:52 UTC
I'm not even sure about comment #16.

Comment 18 Barbara 2010-04-05 23:50:55 UTC
Hey!
I tried this: https://bugzilla.redhat.com/show_bug.cgi?id=566987#c12
and it finally seems to have fixed the problem for me.
I can start Xorg using nouveau now.
Anyway, I have to do some more tests, for example to see whether it still locks up at runlevel 3, as I did a lot of things, including downgrading the BIOS, changing various options, etc.

Comment 19 Ben Skeggs 2010-04-06 01:02:30 UTC
CC'ing Matthew Garrett, pcie_aspm=off fixes the problem.
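
Whether a given boot actually carries the workaround can be checked from the kernel command line. A minimal sketch, in which `has_aspm_off` is a hypothetical helper; the grubby command in the comments is one assumed way to make the parameter persistent, the other being to append it by hand to the kernel line in /boot/grub/grub.conf.

```shell
# has_aspm_off: hypothetical helper. Reads a kernel command line on
# stdin and succeeds only if pcie_aspm=off is present as a whole word.
has_aspm_off() {
    tr ' ' '\n' | grep -qx 'pcie_aspm=off'
}

# Example use:
#   has_aspm_off < /proc/cmdline && echo "booted with pcie_aspm=off"
# One assumed way to add the parameter to all installed kernels (as root):
#   grubby --update-kernel=ALL --args="pcie_aspm=off"
```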

Comment 20 Matthew Garrett 2010-04-06 01:48:11 UTC
Could you attach the output of

lspci -vvvxxx -s 02:0

run as root?
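
For a quick look at the ASPM state before attaching the full dump, the link control line of the verbose lspci output can be filtered out. This is a sketch that assumes lspci prints a line of the form `LnkCtl: ASPM <state>; ...` for the device; `aspm_state` is a hypothetical helper and is not part of pciutils.

```shell
# aspm_state: hypothetical helper. Reads `lspci -vv` output on stdin
# and prints the ASPM field of each LnkCtl line (e.g. "Disabled").
aspm_state() {
    sed -n 's/^[[:space:]]*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Example use (as root, so the capability list is fully readable):
#   lspci -vv -s 02:0 | aspm_state
```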

Comment 21 Barbara 2010-04-06 05:53:01 UTC
Created attachment 404614
lspci -vvvxxx -s 02:0

Comment 22 Barbara 2010-04-08 16:32:17 UTC
Please contact me ASAP if you need further tests/probes/etc. for F12, as I plan to install F13-Alpha on next Sunday.
In fact the workaround seems to have fixed the problem (573207) with the latter too.
As F13 should be released in less than a month and a half, I have no intention of moving from F11 to F12 now; I will simply skip it, as I consider testing and providing feedback for F13 more important at this moment, in the hope of having fewer unpleasant surprises with the final release.

Comment 23 Matěj Cepl 2010-06-18 22:02:07 UTC
(In reply to comment #22)
> Please contact me ASAP if you need further tests/probes/etc. for F12, as I plan
> to install F13-Alpha on next Sunday.

What is the current status of this bug for you? Can you reproduce this on F13?

Thank you

Comment 24 Barbara 2010-06-20 12:21:10 UTC
Created attachment 425437
display using nouveau WITHOUT pcie_aspm=off

This is how the display looks with nouveau and without pcie_aspm=off as soon as Xorg starts.

Comment 25 Barbara 2010-06-20 12:37:38 UTC
Yes, I can reproduce it on a fully updated F13.

But I've found that I have two kinds of bugs with nouveau.

Using nouveau:
If I boot WITHOUT pcie_aspm=off in grub.conf:
    - the display looks like the picture attached in comment 24
    - xorg uses 100% cpu
    - the keyboard is not responding (num-lock LED)
If I boot WITH pcie_aspm=off, xorg starts normally.
But I can freeze it just by playing a movie with totem, and this is always reproducible. When it happens:
    - the display stops updating a few seconds after the movie starts
    - xorg starts using 100% cpu
    - the keyboard is not responding
    - the following line is added to /var/log/messages
        "kernel: [drm] nouveau 0000:02:00.0: PFIFO_DMA_PUSHER - Ch 2"

In both cases, sysrq+r unlocks the keyboard (num-lock), but I can't switch VT and I can't restore the display. I can only restart the machine if I'm connected via ssh.
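
When the local keyboard is dead, the same sysrq 'r' request can be issued over ssh by writing to the kernel's sysrq trigger file. A minimal sketch; `sysrq_unraw` is a hypothetical helper, and the optional path argument exists only so the function can be tried against an ordinary file.

```shell
# sysrq_unraw: hypothetical helper. Writes the sysrq 'r' command
# (take the keyboard out of raw mode) to the trigger file.
# Optional argument: alternate target path, for dry runs.
sysrq_unraw() {
    printf 'r' > "${1:-/proc/sysrq-trigger}"
}

# Example use from the ssh session, as root:
#   sysrq_unraw
```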

So I've tried the nVidia binary driver.
If I don't add pcie_aspm=off in grub.conf, it doesn't work: I only get a black screen.
With pcie_aspm=off, it starts normally and I have no problem playing the same movie that makes nouveau hang.

Comment 26 Matěj Cepl 2010-06-20 23:08:41 UTC
Looks similar to bug 596330, but not sure whether it is the same.

Comment 27 Bug Zapper 2010-11-03 23:33:39 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 28 Barbara 2010-12-05 13:10:46 UTC
(In reply to comment #27)

The problem still exists in F13 and F14.

Comment 29 Fedora End Of Life 2012-08-16 21:49:00 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

