Bug 694936

Summary: kernel-2.6.38.2-13.fc15.x86_64 hard locks on Geforce 9400 GT (10de:0641)
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 15
CC: airlied, ajax, bskeggs, hdegoede
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-04-15 11:55:47 UTC
Attachments:
  Dmesg of lattitude 6400 + nvs160m (flags: none)
  xorg.log of lattitude 6400 + nvs160m (flags: none)

Description Adam Williamson 2011-04-08 23:33:28 UTC
See summary. I installed kernel-2.6.38.2-13.fc15.x86_64 and booted it twice; each time it hard locked (cursor stuck, couldn't switch to a VT, couldn't ssh in) within ten minutes. I went back to -9 and that hasn't locked in an hour of use, so I'm pretty sure it's the kernel.

/var/log/messages doesn't have anything at the time of the lock; the log just stops. I can try again with drm.debug if needed, but I know Ben has this same adapter, so he may be able to reproduce easily.

Comment 1 Hans de Goede 2011-04-09 18:33:36 UTC
I'm seeing the exact same thing (I did not try to ssh in, though) on my Dell Latitude 6400 laptop with a Quadro NVS 160M (G98M), which is also an NV50 card and is actually pretty close to the 9400 GT all around.

I'm quite experienced with most forms of debugging, so let me know if there is anything I can do to help. I'm hansg@freenode on IRC.

Comment 2 Hans de Goede 2011-04-09 18:50:20 UTC
Created attachment 490987 [details]
Dmesg of lattitude 6400 + nvs160m

Comment 3 Hans de Goede 2011-04-09 18:53:46 UTC
Created attachment 490988 [details]
xorg.log of lattitude 6400 + nvs160m

Note: I believe I've been seeing this since 2.6.38.2-11 (-10 is DOA, -9 is OK), but I did not file this bug earlier because I wasn't sure that it did not happen with -9. However, I'm now pretty confident that this is a regression from -9.

Note 2: The attached logs contain some VC/VT switches. The lockup happens without them too; this was just me switching to a text VC to scp the logs off the system, since when running -13 it sometimes does not even stay up long enough to browse to a bug and attach logs.

Comment 4 Adam Williamson 2011-04-11 07:03:23 UTC
Backtrace (via netconsole):

[11299.704013] ------------[ cut here ]------------
[11299.704013] WARNING: at kernel/watchdog.c:226 watchdog_overflow_callback+0x9b/0xa6()
[11299.704013] Hardware name: System Product Name
[11299.704013] Watchdog detected hard LOCKUP on cpu 0
[11299.704013] Modules linked in: tcp_lp tun fuse netconsole configfs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc coretemp snd_hda_codec_realtek snd_ice1724 snd_ice17xx_ak4xxx snd_ac97_codec ac97_bus snd_ak4xxx_adda snd_hda_intel nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[11299.704013] Pid: 1554, comm: Xorg Not tainted 2.6.38.2-13.fc15.x86_64 #1
[11299.704013] Call Trace:
[11299.704013]  <NMI>  [<ffffffff81055066>] ? warn_slowpath_common+0x83/0x9b
[11299.704013]  [<ffffffff81055121>] ? warn_slowpath_fmt+0x46/0x48
[11299.704013]  [<ffffffff810ac1a3>] ? watchdog_overflow_callback+0x9b/0xa6
[11299.704013]  [<ffffffff810d3b5d>] ? __perf_event_overflow+0x135/0x191
[11299.704013]  [<ffffffff81016296>] ? paravirt_write_msr+0xf/0x13
[11299.704013]  [<ffffffff810d41b6>] ? perf_event_overflow+0x14/0x16
[11299.704013]  [<ffffffff8101988c>] ? intel_pmu_handle_irq+0x37e/0x3e1
[11299.704013]  [<ffffffff8147655e>] ? perf_event_nmi_handler+0x67/0xb3
[11299.704013]  [<ffffffff81478207>] ? notifier_call_chain+0x37/0x63
[11299.704013]  [<ffffffff8147825f>] ? atomic_notifier_call_chain+0x18/0x1a
[11299.704013]  [<ffffffff8147828f>] ? notify_die+0x2e/0x30
[11299.704013]  [<ffffffff814759f4>] ? do_nmi+0x6d/0x217
[11299.704013]  [<ffffffff81475710>] ? nmi+0x20/0x30
[11299.704013]  [<ffffffff81474c2f>] ? _raw_spin_lock_irqsave+0x27/0x2f
[11299.704013]  <<EOE>>  <IRQ>  [<ffffffffa008c4d7>] ? nouveau_irq_handler+0x4c/0x116 [nouveau]
[11299.704013]  [<ffffffff810ac961>] ? handle_IRQ_event+0x58/0x11f
[11299.704013]  [<ffffffff8101012c>] ? sched_clock+0x9/0xd
[11298.652093] ------------[ cut here ]------------
[11298.652093] WARNING: at kernel/watchdog.c:226 watchdog_overflow_callback+0x9b/0xa6()
[11298.652093] Hardware name: System Product Name
[11298.652093] Watchdog detected hard LOCKUP on cpu 1
[11298.652093] Modules linked in: tcp_lp tun fuse netconsole configfs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc coretemp snd_hda_codec_realtek snd_ice1724 snd_ice17xx_ak4xxx snd_ac97_codec ac97_bus snd_ak4xxx_adda snd_hda_intel
[11298.652093]  [<ffffffffa00e1626>] ? nv50_vm_flush_engine+0x27/0x9f [nouveau]
[11298.652093]  [<ffffffffa00b695a>] ? nv84_graph_tlb_flush+0x16a/0x19c [nouveau]
[11298.652093]  [<ffffffffa00e16f5>] ? nv50_vm_flush+0x57/0x6e [nouveau]
[11298.652093]  [<ffffffffa00a6534>] ? nouveau_vm_unmap_at+0xbd/0xcc [nouveau]
[11298.652093]  [<ffffffffa00a655e>] ? nouveau_vm_unmap+0x1b/0x1d [nouveau]
[11298.652093]  [<ffffffffa008deab>] ? nouveau_bo_del_ttm+0x66/0x7b [nouveau]
[11298.652093]  [<ffffffffa006f819>] ? ttm_bo_release_list+0x9d/0xc1 [ttm]
[11298.652093]  [<ffffffffa006f77c>] ? ttm_bo_release_list+0x0/0xc1 [ttm]
[11298.652093]  [<ffffffff8122babf>] ? kref_put+0x43/0x4d
[11298.652093]  [<ffffffffa00701eb>] ? ttm_bo_delayed_delete+0xb3/0x111 [ttm]
[11298.652093]  [<ffffffffa0070249>] ? ttm_bo_delayed_workqueue+0x0/0x31 [ttm]
[11298.652093]  [<ffffffffa0070265>] ? ttm_bo_delayed_workqueue+0x1c/0x31 [ttm]
[11298.652093]  [<ffffffff8106ae83>] ? process_one_work+0x186/0x298
[11298.652093]  [<ffffffff8106b210>] ? worker_thread+0xda/0x15d
[11298.652093]  [<ffffffff8106b136>] ? worker_thread+0x0/0x15d
[11298.652093]  [<ffffffff8106b136>] ? worker_thread+0x0/0x15d
[11298.652093]  [<ffffffff8106ea73>] ? kthread+0x84/0x8c
[11298.652093]  [<ffffffff8100a9e4>] ? kernel_thread_helper+0x4/0x10
[11298.652093]  [<ffffffff8106e9ef>] ? kthread+0x0/0x8c
[11298.652093]  [<ffffffff8100a9e0>] ? kernel_thread_helper+0x0/0x10
[11298.652093] ---[ end trace d29049a9b791b5a5 ]---
[11339.831000] ------------[ cut here ]------------
[11339.831000] WARNING: at kernel/watchdog.c:226 watchdog_overflow_callback+0x9b/0xa6()
[11339.831000] Hardware name: System Product Name
[11339.831000] Watchdog detected hard LOCKUP on cpu 2
[11339.831000] Modules linked in: tcp_lp tun fuse netconsole configfs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc coretemp snd_hda_codec_realtek snd_ice1724 snd_ice17xx_ak4xxx snd_ac97_codec ac97_bus snd_ak4xxx_adda snd_hda_intel drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[11339.831000] Pid: 2723, comm: qemu-kvm Tainted: G        W   2.6.38.2-13.fc15.x86_64 #1
[11339.831000] Call Trace:
[11339.831000]  <NMI>  [<ffffffff81055066>] ? warn_slowpath_common+0x83/0x9b
[11339.831000]  [<ffffffff81055121>] ? warn_slowpath_fmt+0x46/0x48
[11339.831000]  [<ffffffff810ac1a3>] ? watchdog_overflow_callback+0x9b/0xa6
[11339.831000]  [<ffffffff810d3b5d>] ? __perf_event_overflow+0x135/0x191
[11339.831000]  [<ffffffff81016296>] ? paravirt_write_msr+0xf/0x13
[11339.831000]  [<ffffffff810d41b6>] ? perf_event_overflow+0x14/0x16
[11339.831000]  [<ffffffff8101988c>] ? intel_pmu_handle_irq+0x37e/0x3e1
[11339.831000]  [<ffffffff8147655e>] ? perf_event_nmi_handler+0x67/0xb3
[11339.831000]  [<ffffffff81478207>] ? notifier_call_chain+0x37/0x63
[11339.831000]  [<ffffffff8147825f>] ? atomic_notifier_call_chain+0x18/0x1a
[11339.831000]  [<ffffffff8147828f>] ? notify_die+0x2e/0x30
[11339.831000]  [<ffffffff814759f4>] ? do_nmi+0x6d/0x217
[11339.831000]  [<ffffffff81475710>] ? nmi+0x20/0x30
[11339.831000]  <<EOE>> 
[11339.831000] ---[ end trace d29049a9b791b5a6 ]---
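
When reading a trace like the one above, it can help to pull out just the function names and the module each frame belongs to. The following is a minimal sketch of that idea, not a tool used in this bug: the regular expression, the helper names `parse_frames` and `frames_in_module`, and the label "kernel" for frames with no module tag are all assumptions made for illustration. The sample text is a few frames copied from the trace.

```python
import re

# One call-trace frame looks like:
#   [<ffffffffa00e1626>] ? nv50_vm_flush_engine+0x27/0x9f [nouveau]
# The trailing "[module]" tag is absent for symbols built into the kernel image.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module) pairs for every frame found in log_text.

    Frames without a module tag are labelled "kernel" (an illustrative choice).
    """
    return [
        (m.group("symbol"), m.group("module") or "kernel")
        for m in FRAME_RE.finditer(log_text)
    ]


def frames_in_module(log_text, module):
    """Return the symbols of all frames that belong to the given module."""
    return [sym for sym, mod in parse_frames(log_text) if mod == module]


# A few frames copied from the trace in this comment.
sample = """\
[11299.704013]  [<ffffffff81474c2f>] ? _raw_spin_lock_irqsave+0x27/0x2f
[11299.704013]  <<EOE>>  <IRQ>  [<ffffffffa008c4d7>] ? nouveau_irq_handler+0x4c/0x116 [nouveau]
[11298.652093]  [<ffffffffa00e1626>] ? nv50_vm_flush_engine+0x27/0x9f [nouveau]
[11298.652093]  [<ffffffffa006f819>] ? ttm_bo_release_list+0x9d/0xc1 [ttm]
"""

print(frames_in_module(sample, "nouveau"))
# → ['nouveau_irq_handler', 'nv50_vm_flush_engine']
```

Filtering on `nouveau` this way highlights that CPU 0 was in `nouveau_irq_handler` waiting in `_raw_spin_lock_irqsave` while CPU 1 was in `nv50_vm_flush_engine`.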

Comment 5 Hans de Goede 2011-04-15 11:55:47 UTC
I've been running kernel 2.6.38-2.14:

* Tue Apr 12 2011 Ben Skeggs <bskeggs> 2.6.38-2.14 - nouveau: correct lock ordering problem 

I ran it on the machine in question, with gnome-shell, for the entire day yesterday and had 0 lockups, so this seems to fix the problem. Closing.
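
For readers unfamiliar with the term in the changelog entry, a lock ordering problem generally means two code paths take the same pair of locks in opposite orders, so each can end up holding one lock while waiting forever for the other. The sketch below is a generic userspace illustration of the usual remedy (always acquire locks in one fixed order). It is not the nouveau patch, which this report does not show; `RankedLock`, `hold_all`, `lock_a`, and `lock_b` are hypothetical names.

```python
import threading
from contextlib import contextmanager


class RankedLock:
    """A lock with a fixed rank; lower ranks are always acquired first."""

    def __init__(self, name, rank):
        self.name = name
        self.rank = rank
        self._lock = threading.Lock()


@contextmanager
def hold_all(*locks):
    """Acquire the given locks in rank order, regardless of argument order."""
    ordered = sorted(locks, key=lambda lock: lock.rank)
    for lock in ordered:
        lock._lock.acquire()
    try:
        yield [lock.name for lock in ordered]
    finally:
        # Release in the reverse of the acquisition order.
        for lock in reversed(ordered):
            lock._lock.release()


lock_a = RankedLock("lock_a", rank=1)  # hypothetical stand-ins, not driver locks
lock_b = RankedLock("lock_b", rank=2)
counter = {"value": 0}


def worker(first, second, rounds=1000):
    # Each worker names the locks in its own order; hold_all normalizes it.
    for _ in range(rounds):
        with hold_all(first, second):
            counter["value"] += 1


t1 = threading.Thread(target=worker, args=(lock_a, lock_b))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(counter["value"])  # → 2000
```

If each worker instead acquired the locks directly in the order it names them, the two threads could deadlock, which is the userspace analogue of the hard lockups the watchdog reported.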

Comment 6 Adam Williamson 2011-04-15 15:09:55 UTC
yeah, ditto.