Bug 811531 - nouveau: Kernel panics do not allow the system to boot or to be re-installed
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-04-11 10:55 UTC by Panos Kavalagios
Modified: 2012-06-13 12:25 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-13 12:25:08 UTC
Type: Bug
Embargoed:


Attachments
Log file /tmp/syslog (109.63 KB, text/plain)
2012-04-11 10:56 UTC, Panos Kavalagios
Log file /var/log/Xorg.0.log of installed system (19.49 KB, text/plain)
2012-04-11 11:18 UTC, Panos Kavalagios

Description Panos Kavalagios 2012-04-11 10:55:15 UTC
Description of problem: An installed Fedora 16 system was unable to boot after an upgrade. A kernel panic for drm_nouveau was printed and nothing more. I booted in rescue mode and installed the latest updates, to no avail. Now the monitor simply goes blank. An attempt to re-install the system from the Fedora 16 DVD media also fails with a kernel exception.


Version-Release number of selected component (if applicable): kernel-3.3.1-3.fc16.x86_64


How reproducible: Boot Fedora 16 installation DVD


Steps to Reproduce:
1. Insert Fedora 16 DVD
2. Select the option to install Fedora (default)
3. Wait for a while for the boot process to complete
  
Actual results: A kernel exception occurs.


Expected results: No exception should have been printed. Even though you can continue in language and keyboard selection, after that the installation hangs on anaconda startup.


Additional info: The /tmp/syslog is attached. Any workarounds to revive the system, even with a clean installation, are highly appreciated.

Comment 1 Panos Kavalagios 2012-04-11 10:56:06 UTC
Created attachment 576743
Log file /tmp/syslog

Comment 2 Panos Kavalagios 2012-04-11 11:18:54 UTC
Created attachment 576745
Log file /var/log/Xorg.0.log of installed system

The already installed system replied to ping. I was able to log in and retrieve the Xorg.0.log.

I apologise for the confusion. I am reporting two issues: one for the Fedora 16 DVD and another for the installed system. I don't know if they are related. Please advise if they should be split.

Comment 3 Josh Boyer 2012-04-11 12:32:34 UTC
(In reply to comment #2)
> Created attachment 576745
> Log file /var/log/Xorg.0.log of installed system
> 
> The already installed system replied to ping. I was able to log in and retrieve
> the Xorg.0.log.
> 
> I apologise for the confusion. I am reporting two issues: one for the Fedora 16
> DVD and another for the installed system. I don't know if they are related.
> Please advise if they should be split.

We can't remaster the F16 install media, so we'll ignore that one for now.  I'm not sure why it would have worked the first time for you and not now though.

You might want to note which kernel version produced the nouveau backtrace on the installed system, and provide dmesg or /var/log/messages containing the backtrace if you can ssh in.

Comment 4 Panos Kavalagios 2012-04-11 17:49:24 UTC
Hello Josh,

It seems that my logs are clean, since "grep -i oops /var/log/*" does not reveal any kernel backtrace. I am trying to remember whether the last thing displayed was an exception or a regular nouveau message, which is how it behaves now after installing the latest updates. I don't think the logs have been rotated (even though I changed weekly to monthly rotation in /etc/logrotate.conf today). I think we may safely conclude that there was no exception on the installed system.
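
A broader search than "grep -i oops" can be sketched as follows, since kernel backtraces in the logs are not always labelled "Oops". This is only a minimal Python sketch and not part of the original report: the marker list and the function name find_backtrace_lines are assumptions chosen for illustration, and the list is not exhaustive.

```python
import re
import sys

# Strings that commonly appear in kernel backtraces. This list is an
# assumption for illustration and is not exhaustive.
MARKERS = re.compile(
    r"\b(?:[Oo]ops|Call Trace:|Kernel panic|BUG:|WARNING: at|general protection fault)"
)


def find_backtrace_lines(lines):
    """Return (line_number, text) pairs for lines matching a backtrace marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKERS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Each command-line argument is treated as a log file path.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, text in find_backtrace_lines(handle):
                print(f"{path}:{number}: {text}")
```

Saved under a hypothetical name such as scan_log.py, it could be run as "python3 scan_log.py /var/log/messages". An empty result from such a scan would support the conclusion that no backtrace was logged, although it cannot rule out messages that never reached the log.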

It is really surprising that I managed to perform a clean F16 installation in the first place. I have changed the hardware configuration since then, but only to install another Nvidia GeForce 550 Ti (a replacement of the same card model).

Current kernel versions installed:

root@alcestis:[66] ~ # rpm -q kernel
kernel-3.1.2-1.fc16.x86_64
kernel-3.2.2-1.fc16.x86_64
kernel-3.3.1-3.fc16.x86_64

The remote ssh access is a good sign that recovery is possible. It is odd that the virtual consoles are not working, though (Ctrl-Alt-F2 etc.); that is why it gives the impression of a totally stuck system.

Looking forward to the Nouveau experts' investigation. Just to note that exactly the same hardware configuration runs fine with the nvidia drivers from rpmfusion.

Comment 5 Panos Kavalagios 2012-04-20 05:55:25 UTC
Same error with the new kernel-3.3.2-1.fc16.x86_64. The monitor goes to standby as there is no signal from the video card. Only remote access is available to the affected system.

Comment 6 Panos Kavalagios 2012-05-29 06:26:38 UTC
(In reply to comment #4)
> The remote ssh access is a good sign that recovery is possible. It is odd
> that the virtual consoles are not working, though (Ctrl-Alt-F2 etc.); that is
> why it gives the impression of a totally stuck system.

This is also observed in Bug 825092, on a VirtualBox system rather than a physical one this time. The guest additions had not been installed after the upgrade, and the video driver that could not be loaded led to a stuck system (unable to log in). It is very important to be able to at least log in in text mode when the graphics driver fails. At least that was the case before. Now it seems that the weird auto-spawning of ttys, and possibly systemd, have managed to lock the system :)

Comment 7 Panos Kavalagios 2012-06-09 19:11:58 UTC
I've seen in forums that many users with the same graphics card, the Nvidia GeForce GTX 550 Ti, are affected by the same problem. The problem occurs on kernels 3.2 and 3.3. Looking forward to kernel 3.4, where the issue has been resolved.

This is related to the issue described in bug 802751. I think it should be reassigned to the kernel team.

Comment 8 Ben Skeggs 2012-06-13 12:25:08 UTC
This issue has been fixed upstream, and should also be fixed in newer Fedora kernels already.

At some point NVIDIA modified one of their VBIOS tables in a way that made nouveau crash; nouveau has since been fixed.

