Red Hat Bugzilla – Bug 815577
F16: Nouveau fails on reboot but OK on power-cycle - using NVS-295 - later soft lockup
Last modified: 2013-02-13 07:50:26 EST
Created attachment 579717
Powerup (good) /var/log/messages
Description of problem: When the system is rebooted, the nouveau driver fails in a way that does NOT occur when the system is started from a power-off state (even a brief one). It looks as if the NVS-295 hardware is left in a different state (i.e. one that only a power-off resets) and that this confuses nouveau's initialization logic.
Version-Release number of selected component (if applicable):
kernel = 3.3.2-1.fc16.x86_64
xorg-x11-drv-nouveau.x86_64 = 1:0.0.16-27.20110720gitb806e3f.fc16
How reproducible: Quite repeatable
Steps to Reproduce:
1. Boot Fedora-16 from power-off, wait until up
2. Issue reboot command
Actual results: On reboot, the display usually (possibly always) freezes after "Loading initial ramdisk ...". The console switches from a large font to a smaller one (the two lines shown do not change), and nothing more appears. The system does continue to boot, however, and is reachable via SSH.
Expected results: The system reboots with a working display, as it does after a power-cycle.
The first kernel warning logged to /var/log/messages occurs after 134 seconds (see below), well after the problem has become obvious to a human observer. I have attached both a reboot (bad) and a powerup (good) excerpt of /var/log/messages. After 2-3 seconds, some of the hardware-driven activity (especially the network devices) starts to happen in a different order.
In the "reboot" case the nouveau driver is initialized 2 seconds later (at 5 seconds rather than 3), and then everything appears to stall for 60 seconds. In the "powerup" case there is no such 60-second pause.
At around 135 seconds the first warning is issued (in nv50_display_flip_stop). Subsequent warnings show the kernel as tainted.
I logged in over SSH and eventually ran "poweroff -f". This produced a series of "BUG: soft lockup - CPU#0 stuck for 23s! [poweroff:3143]" messages (also visible in the SSH client) until I gave up and power-cycled the machine.
---------- First Warning in /var/log/messages ---------------------
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973839] ------------[ cut here ]------------
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973862] WARNING: at drivers/gpu/drm/nouveau/nv50_display.c:414 nv50_display_flip_stop+0x221/0x4f0 [nouveau]()
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973864] Hardware name: Precision WorkStation T3500
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973866] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack vhost_net macvtap macvlan tun virtio_net kvm_intel kvm snd_hda_codec_analog iTCO_wdt joydev i2c_i801 iTCO_vendor_support uinput serio_raw ppdev parport_pc dell_wmi parport sparse_keymap snd_hda_intel snd_hda_codec snd_hwdep i7core_edac edac_core tg3 snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc microcode dcdbas nfsd lockd nfs_acl auth_rpcgss sunrpc raid1 nouveau ttm drm_kms_helper drm i2c_core mxm_wmi video wmi [last unloaded: scsi_wait_scan]
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973895] Pid: 212, comm: plymouthd Not tainted 3.3.2-1.fc16.x86_64 #1
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973897] Call Trace:
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973903] [<ffffffff81057abf>] warn_slowpath_common+0x7f/0xc0
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973906] [<ffffffff81057b1a>] warn_slowpath_null+0x1a/0x20
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973917] [<ffffffffa011c3e1>] nv50_display_flip_stop+0x221/0x4f0 [nouveau]
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973928] [<ffffffffa01150fe>] nv50_crtc_mode_set_base+0x2e/0x80 [nouveau]
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973933] [<ffffffffa007cfb8>] drm_crtc_helper_set_config+0x778/0xb10 [drm_kms_helper]
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973943] [<ffffffffa003e257>] drm_mode_setcrtc+0x127/0x480 [drm]
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973950] [<ffffffffa002f464>] drm_ioctl+0x444/0x510 [drm]
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973957] [<ffffffffa003e130>] ? drm_mode_setplane+0x3b0/0x3b0 [drm]
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973961] [<ffffffff8108fb7d>] ? set_next_entity+0xad/0xd0
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973964] [<ffffffff81193228>] do_vfs_ioctl+0x98/0x550
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973967] [<ffffffff815f2e04>] ? __schedule+0x3c4/0x7b0
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973969] [<ffffffff81193771>] sys_ioctl+0x91/0xa0
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973972] [<ffffffff815fc5a9>] system_call_fastpath+0x16/0x1b
Apr 23 15:38:25 pc-110-cb kernel: [ 134.973974] ---[ end trace 7075723de90c731c ]---
Created attachment 579718
Is this still happening with 3.4?
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '16'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 16's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 16 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" and open it against that version of Fedora.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.