Bug 1302816

Summary: possible spinlock issue in nouveau on quadro k420
Product: Red Hat Enterprise Linux 7
Component: xorg-x11-drv-nouveau
Version: 7.2
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Reporter: Joe Wright <jwright>
Assignee: Ben Skeggs <bskeggs>
QA Contact: Desktop QE <desktop-qa-list>
Docs Contact:
CC: cww, tpelka
Keywords: Desktop
Target Milestone: rc
Target Release: ---
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-01 16:53:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- sosreport (flags: none)

Description Joe Wright 2016-01-28 17:00:28 UTC
Description of problem:
- Possible spinlock issue in nouveau on a Quadro K420

Version-Release number of selected component (if applicable):
- RHEL 7.1
- xorg-x11-drv-nouveau-1.0.10-5.el7.x86_64  
- libdrm-2.4.56-2.el7.x86_64 

How reproducible:
- intermittent

Steps to Reproduce:
1. Unknown; the lockup happens occasionally with no identified trigger.

Actual results:
- Soft lockup that appears to be caused by a spinlock

Expected results:


Additional info:

01:00.0 VGA compatible controller: NVIDIA Corporation GK107GL [Quadro K420] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)


Jan 25 11:11:06 dhws122 kernel: [1536034.220466] nouveau E[   PFIFO][0000:01:00.0] write fault at 0x0005924000 [PTE] from CE2/GR_COPY on channel 0x003fa31000 [Xorg[18982]]
Jan 25 11:11:06 dhws122 kernel: [1536034.220471] nouveau E[   PFIFO][0000:01:00.0] PCE2 engine fault on channel 2, recovering...
Jan 25 11:11:21 dhws122 kernel: [1536049.624322] nouveau E[     DRM] GPU lockup - switching to software fbcon
Jan 25 11:11:31 dhws122 kernel: [1536060.033767] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:15476]
Jan 25 11:11:31 dhws122 kernel: [1536060.033800] Modules linked in: binfmt_misc bnep bluetooth fuse mvfs(OF) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache intel_powerclamp coretemp snd_hda_codec_hdmi hp_wmi snd_hda_codec_realtek sparse_keymap snd_hda_codec_generic intel_rapl kvm snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller crc32c_intel ghash_clmulni_intel snd_hda_codec aesni_intel snd_hwdep lrw gf128mul glue_helper ablk_helper cryptd snd_seq snd_seq_device iTCO_wdt snd_pcm iTCO_vendor_support rfkill ppdev serio_raw pcspkr i2c_i801 lpc_ich mei_me snd_timer mfd_core mei snd soundcore shpchp parport_pc parport tpm_infineon uinput ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common nouveau video mxm_wmi i2c_algo_bit drm_kms_helper ttm ahci e1000e libahci drm libata ptp pps_core i2c_core wmi dm_mirror dm_region_hash dm_log dm_mod
Jan 25 11:11:31 dhws122 kernel: [1536060.033850] CPU: 0 PID: 15476 Comm: kworker/0:1 Tainted: GF          O--------------   3.10.0-229.7.2.el7.x86_64 #1
Jan 25 11:11:31 dhws122 kernel: [1536060.033851] Hardware name: Hewlett-Packard HP Z230 SFF Workstation/1906, BIOS L51 v01.52 07/20/2015
Jan 25 11:11:31 dhws122 kernel: [1536060.033877] Workqueue: events nve0_fifo_recover_work [nouveau]
Jan 25 11:11:31 dhws122 kernel: [1536060.033878] task: ffff8804129896c0 ti: ffff8800bde18000 task.ti: ffff8800bde18000
Jan 25 11:11:31 dhws122 kernel: [1536060.033879] RIP: 0010:[<ffffffffa01ef688>]  [<ffffffffa01ef688>] nve0_fifo_recover_work+0x88/0x330 [nouveau]
Jan 25 11:11:31 dhws122 kernel: [1536060.033896] RSP: 0018:ffff8800bde1bde0  EFLAGS: 00000202
Jan 25 11:11:31 dhws122 kernel: [1536060.033897] RAX: 0000000100000000 RBX: ffff8800bde1bd70 RCX: 0000000000000020
Jan 25 11:11:31 dhws122 kernel: [1536060.033898] RDX: 00000000fffffffe RSI: 0000000000000282 RDI: 0000000000000282
Jan 25 11:11:31 dhws122 kernel: [1536060.033899] RBP: ffff8800bde1be18 R08: 0000000000000282 R09: dfef3a34a013dcd8
Jan 25 11:11:31 dhws122 kernel: [1536060.033899] R10: dfef3a34a013dcd8 R11: 0000000000000293 R12: ffff8800bde1bd60
Jan 25 11:11:31 dhws122 kernel: [1536060.033900] R13: ffff88041d010100 R14: 0000000000000400 R15: 0000000200000000
Jan 25 11:11:31 dhws122 kernel: [1536060.033901] FS:  0000000000000000(0000) GS:ffff88042dc00000(0000) knlGS:0000000000000000
Jan 25 11:11:31 dhws122 kernel: [1536060.033902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 25 11:11:31 dhws122 kernel: [1536060.033903] CR2: 00007f64da427000 CR3: 00000001970d4000 CR4: 00000000001407f0
Jan 25 11:11:31 dhws122 kernel: [1536060.033904] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 25 11:11:31 dhws122 kernel: [1536060.033905] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 25 11:11:31 dhws122 kernel: [1536060.033905] Stack:
Jan 25 11:11:31 dhws122 kernel: [1536060.033906]  ffff8802de7a5140 ffff88042df87d00 ffff88041373dcd8 ffff880307610580
Jan 25 11:11:31 dhws122 kernel: [1536060.033908]  ffff88042dc12ec0 ffff88042dc17000 0000000000000000 ffff8800bde1be60
Jan 25 11:11:31 dhws122 kernel: [1536060.033910]  ffffffff8108f0bb 000000002dc12ed8 0000000000000000 ffff88042dc12ed8
Jan 25 11:11:31 dhws122 kernel: [1536060.033912] Call Trace:
Jan 25 11:11:31 dhws122 kernel: [1536060.033915]  [<ffffffff8108f0bb>] process_one_work+0x17b/0x470
Jan 25 11:11:31 dhws122 kernel: [1536060.033917]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Jan 25 11:11:31 dhws122 kernel: [1536060.033919]  [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Jan 25 11:11:31 dhws122 kernel: [1536060.033921]  [<ffffffff8109726f>] kthread+0xcf/0xe0
Jan 25 11:11:31 dhws122 kernel: [1536060.033922]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:11:31 dhws122 kernel: [1536060.033925]  [<ffffffff81614158>] ret_from_fork+0x58/0x90
Jan 25 11:11:31 dhws122 kernel: [1536060.033927]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:11:31 dhws122 kernel: [1536060.033927] Code: 00 00 00 83 ea 17 83 fa 0b 0f 87 64 02 00 00 ff 24 d5 90 c7 24 a0 0f 1f 44 00 00 ba 01 00 00 00 0f 1f 00 41 09 d6 ba fe ff ff ff <d3> c2 48 63 d2 48 21 d0 f3 48 0f bc c8 48 85 c0 89 ca 75 c4 49
Jan 25 11:11:36 dhws122 kernel: [1536064.638747] nouveau E[Xorg[18982]] failed to idle channel 0xcccc0001 [Xorg[18982]]
Jan 25 11:11:51 dhws122 kernel: [1536079.645191] nouveau E[Xorg[18982]] failed to idle channel 0xcccc0001 [Xorg[18982]]
Jan 25 11:11:53 dhws122 kernel: [1536081.646068] nouveau E[   PFIFO][0000:01:00.0] runlist 0 update timeout
Jan 25 11:11:55 dhws122 kernel: [1536083.941920] nouveau E[   PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
Jan 25 11:11:58 dhws122 /etc/gdm/Xsession[19164]: (abrt:19922): libnotify-WARNING **: Failed to connect to proxy
Jan 25 11:11:58 dhws122 /etc/gdm/Xsession[19164]: abrt-applet: Failed to receive server caps
Jan 25 11:11:59 dhws122 kernel: [1536088.045796] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:15476]
Jan 25 11:11:59 dhws122 kernel: [1536088.045823] Modules linked in: binfmt_misc bnep bluetooth fuse mvfs(OF) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache intel_powerclamp coretemp snd_hda_codec_hdmi hp_wmi snd_hda_codec_realtek sparse_keymap snd_hda_codec_generic intel_rapl kvm snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller crc32c_intel ghash_clmulni_intel snd_hda_codec aesni_intel snd_hwdep lrw gf128mul glue_helper ablk_helper cryptd snd_seq snd_seq_device iTCO_wdt snd_pcm iTCO_vendor_support rfkill ppdev serio_raw pcspkr i2c_i801 lpc_ich mei_me snd_timer mfd_core mei snd soundcore shpchp parport_pc parport tpm_infineon uinput ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common nouveau video mxm_wmi i2c_algo_bit drm_kms_helper ttm ahci e1000e libahci drm libata ptp pps_core i2c_core wmi dm_mirror dm_region_hash dm_log dm_mod
Jan 25 11:11:59 dhws122 kernel: [1536088.045868] CPU: 0 PID: 15476 Comm: kworker/0:1 Tainted: GF          O--------------   3.10.0-229.7.2.el7.x86_64 #1
Jan 25 11:11:59 dhws122 kernel: [1536088.045869] Hardware name: Hewlett-Packard HP Z230 SFF Workstation/1906, BIOS L51 v01.52 07/20/2015
Jan 25 11:11:59 dhws122 kernel: [1536088.045889] Workqueue: events nve0_fifo_recover_work [nouveau]
Jan 25 11:11:59 dhws122 kernel: [1536088.045890] task: ffff8804129896c0 ti: ffff8800bde18000 task.ti: ffff8800bde18000
Jan 25 11:11:59 dhws122 kernel: [1536088.045891] RIP: 0010:[<ffffffffa01ef678>]  [<ffffffffa01ef678>] nve0_fifo_recover_work+0x78/0x330 [nouveau]
Jan 25 11:11:59 dhws122 kernel: [1536088.045905] RSP: 0018:ffff8800bde1bde0  EFLAGS: 00000293
Jan 25 11:11:59 dhws122 kernel: [1536088.045906] RAX: 0000000100000000 RBX: ffff8800bde1bd70 RCX: 0000000000000020
Jan 25 11:11:59 dhws122 kernel: [1536088.045907] RDX: 0000000000000009 RSI: 0000000000000282 RDI: 0000000000000282
Jan 25 11:11:59 dhws122 kernel: [1536088.045907] RBP: ffff8800bde1be18 R08: 0000000000000282 R09: dfef3a34a013dcd8
Jan 25 11:11:59 dhws122 kernel: [1536088.045908] R10: dfef3a34a013dcd8 R11: 0000000000000293 R12: ffff8800bde1bd60
Jan 25 11:11:59 dhws122 kernel: [1536088.045909] R13: ffff88041d010100 R14: 0000000000000400 R15: 0000000200000000
Jan 25 11:11:59 dhws122 kernel: [1536088.045910] FS:  0000000000000000(0000) GS:ffff88042dc00000(0000) knlGS:0000000000000000
Jan 25 11:11:59 dhws122 kernel: [1536088.045910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 25 11:11:59 dhws122 kernel: [1536088.045911] CR2: 00007f64da427000 CR3: 00000001970d4000 CR4: 00000000001407f0
Jan 25 11:11:59 dhws122 kernel: [1536088.045912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 25 11:11:59 dhws122 kernel: [1536088.045913] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 25 11:11:59 dhws122 kernel: [1536088.045914] Stack:
Jan 25 11:11:59 dhws122 kernel: [1536088.045914]  ffff8802de7a5140 ffff88042df87d00 ffff88041373dcd8 ffff880307610580
Jan 25 11:11:59 dhws122 kernel: [1536088.045916]  ffff88042dc12ec0 ffff88042dc17000 0000000000000000 ffff8800bde1be60
Jan 25 11:11:59 dhws122 kernel: [1536088.045917]  ffffffff8108f0bb 000000002dc12ed8 0000000000000000 ffff88042dc12ed8
Jan 25 11:11:59 dhws122 kernel: [1536088.045918] Call Trace:
Jan 25 11:11:59 dhws122 kernel: [1536088.045922]  [<ffffffff8108f0bb>] process_one_work+0x17b/0x470
Jan 25 11:11:59 dhws122 kernel: [1536088.045923]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Jan 25 11:11:59 dhws122 kernel: [1536088.045925]  [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Jan 25 11:11:59 dhws122 kernel: [1536088.045927]  [<ffffffff8109726f>] kthread+0xcf/0xe0
Jan 25 11:11:59 dhws122 kernel: [1536088.045928]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:11:59 dhws122 kernel: [1536088.045931]  [<ffffffff81614158>] ret_from_fork+0x58/0x90
Jan 25 11:11:59 dhws122 kernel: [1536088.045932]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:11:59 dhws122 kernel: [1536088.045933] Code: 86 02 00 00 89 ca 48 89 d8 45 31 f6 0f 1f 80 00 00 00 00 83 ea 17 83 fa 0b 0f 87 64 02 00 00 ff 24 d5 90 c7 24 a0 0f 1f 44 00 00 <ba> 01 00 00 00 0f 1f 00 41 09 d6 ba fe ff ff ff d3 c2 48 63 d2
Jan 25 11:12:00 dhws122 kernel: [1536088.238462] nouveau E[   PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
Jan 25 11:12:04 dhws122 kernel: [1536092.535003] nouveau E[   PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
Jan 25 11:12:06 dhws122 kernel: [1536094.246459] INFO: rcu_sched self-detected stall on CPU { 0}  (t=60000 jiffies g=37003578 c=37003577 q=0)
Jan 25 11:12:06 dhws122 kernel: [1536094.246495] sending NMI to all CPUs:
Jan 25 11:12:06 dhws122 kernel: [1536094.246497] NMI backtrace for cpu 0
Jan 25 11:12:06 dhws122 kernel: [1536094.246499] CPU: 0 PID: 15476 Comm: kworker/0:1 Tainted: GF          O--------------   3.10.0-229.7.2.el7.x86_64 #1
Jan 25 11:12:06 dhws122 kernel: [1536094.246512] Hardware name: Hewlett-Packard HP Z230 SFF Workstation/1906, BIOS L51 v01.52 07/20/2015
Jan 25 11:12:06 dhws122 kernel: [1536094.246546] Workqueue: events nve0_fifo_recover_work [nouveau]
Jan 25 11:12:06 dhws122 kernel: [1536094.246547] task: ffff8804129896c0 ti: ffff8800bde18000 task.ti: ffff8800bde18000
Jan 25 11:12:06 dhws122 kernel: [1536094.246549] RIP: 0010:[<ffffffff812e1852>]  [<ffffffff812e1852>] __const_udelay+0x12/0x30
Jan 25 11:12:06 dhws122 kernel: [1536094.246553] RSP: 0018:ffff88042dc03de0  EFLAGS: 00000046
Jan 25 11:12:06 dhws122 kernel: [1536094.246554] RAX: 0000000001062560 RBX: 0000000000002710 RCX: 0000000000000008
Jan 25 11:12:06 dhws122 kernel: [1536094.246556] RDX: 000000000033c311 RSI: 0000000000000008 RDI: 0000000000418958
Jan 25 11:12:06 dhws122 kernel: [1536094.246557] RBP: ffff88042dc03df8 R08: 0000000000000092 R09: 00000000000003c4
Jan 25 11:12:06 dhws122 kernel: [1536094.246558] R10: 0000000000000000 R11: ffff88042dc03b06 R12: ffffffff81962680
Jan 25 11:12:06 dhws122 kernel: [1536094.246559] R13: ffffffff81962680 R14: ffff88042dc0de00 R15: 0000000000000000
Jan 25 11:12:06 dhws122 kernel: [1536094.246561] FS:  0000000000000000(0000) GS:ffff88042dc00000(0000) knlGS:0000000000000000
Jan 25 11:12:06 dhws122 kernel: [1536094.246562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 25 11:12:06 dhws122 kernel: [1536094.246563] CR2: 00007f64da427000 CR3: 00000001970d4000 CR4: 00000000001407f0
Jan 25 11:12:06 dhws122 kernel: [1536094.246565] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 25 11:12:06 dhws122 kernel: [1536094.246566] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 25 11:12:06 dhws122 kernel: [1536094.246567] Stack:
Jan 25 11:12:06 dhws122 kernel: [1536094.246568]  ffff88042dc03df8 ffffffff81045e9a ffffffff81a21a00 ffff88042dc03e58
Jan 25 11:12:06 dhws122 kernel: [1536094.246571]  ffffffff81115753 ffff8804129896c0 0000000000000000 00057478247380ab
Jan 25 11:12:06 dhws122 kernel: [1536094.246573]  ffff8800bde18000 0000000000000000 ffff8804129896c0 0000000000000000
Jan 25 11:12:06 dhws122 kernel: [1536094.246575] Call Trace:
Jan 25 11:12:06 dhws122 kernel: [1536094.246576]  <IRQ>
Jan 25 11:12:06 dhws122 kernel: [1536094.246578]  [<ffffffff81045e9a>] ? arch_trigger_all_cpu_backtrace+0x7a/0xa0
Jan 25 11:12:06 dhws122 kernel: [1536094.246584]  [<ffffffff81115753>] rcu_check_callbacks+0x313/0x5f0
Jan 25 11:12:06 dhws122 kernel: [1536094.246588]  [<ffffffff81080f37>] update_process_times+0x47/0x80
Jan 25 11:12:06 dhws122 kernel: [1536094.246591]  [<ffffffff810d0405>] tick_sched_handle.isra.16+0x25/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246593]  [<ffffffff810d0481>] tick_sched_timer+0x41/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246596]  [<ffffffff8109b1a7>] __run_hrtimer+0x77/0x1d0
Jan 25 11:12:06 dhws122 kernel: [1536094.246598]  [<ffffffff810d0440>] ? tick_sched_handle.isra.16+0x60/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246601]  [<ffffffff8109b9e7>] hrtimer_interrupt+0xf7/0x240
Jan 25 11:12:06 dhws122 kernel: [1536094.246603]  [<ffffffff810441c7>] local_apic_timer_interrupt+0x37/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246606]  [<ffffffff8161678f>] smp_apic_timer_interrupt+0x3f/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246609]  [<ffffffff81614e5d>] apic_timer_interrupt+0x6d/0x80
Jan 25 11:12:06 dhws122 kernel: [1536094.246610]  <EOI>
Jan 25 11:12:06 dhws122 kernel: [1536094.246615]  [<ffffffffa01ef678>] ? nve0_fifo_recover_work+0x78/0x330 [nouveau]
Jan 25 11:12:06 dhws122 kernel: [1536094.246634]  [<ffffffff8108f0bb>] process_one_work+0x17b/0x470
Jan 25 11:12:06 dhws122 kernel: [1536094.246637]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Jan 25 11:12:06 dhws122 kernel: [1536094.246639]  [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Jan 25 11:12:06 dhws122 kernel: [1536094.246641]  [<ffffffff8109726f>] kthread+0xcf/0xe0
Jan 25 11:12:06 dhws122 kernel: [1536094.246643]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:12:06 dhws122 kernel: [1536094.246646]  [<ffffffff81614158>] ret_from_fork+0x58/0x90
Jan 25 11:12:06 dhws122 kernel: [1536094.246648]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:12:06 dhws122 kernel: [1536094.246649] Code: 89 e5 ff 15 a9 43 6d 00 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 28 2e 01 00 <48> 89 e5 48 69 d2 fa 00 00 00 f7 e2 48 8d 7a 01 ff 15 70 43 6d


The suspicion is that this soft lockup is caused by line 436 of 'drivers/gpu/drm/nouveau/core/engine/fifo/nve0.c', which tries to acquire a spinlock:

 427 static void
 428 nve0_fifo_recover_work(struct work_struct *work)
 429 {
 430     struct nve0_fifo_priv *priv = container_of(work, typeof(*priv), fault);
 431     struct nouveau_object *engine;
 432     unsigned long flags;
 433     u32 engn, engm = 0;
 434     u64 mask, todo;
 435 
 436     spin_lock_irqsave(&priv->base.lock, flags);
 437     mask = priv->mask;
 438     priv->mask = 0ULL;
 439     spin_unlock_irqrestore(&priv->base.lock, flags);
 440 
 441     for (todo = mask; engn = __ffs64(todo), todo; todo &= ~(1 << engn))
 442         engm |= 1 << nve0_fifo_engidx(priv, engn);
 443     nv_mask(priv, 0x002630, engm, engm);
 444 
 445     for (todo = mask; engn = __ffs64(todo), todo; todo &= ~(1 << engn)) {
 446         if ((engine = (void *)nouveau_engine(priv, engn))) {
 447             nv_ofuncs(engine)->fini(engine, false);
 448             WARN_ON(nv_ofuncs(engine)->init(engine));
 449         }
 450         nve0_fifo_runlist_update(priv, nve0_fifo_engidx(priv, engn));
 451     }
 452 
 453     nv_wr32(priv, 0x00262c, engm);
 454     nv_mask(priv, 0x002630, engm, 0x00000000);
 455 }
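
A second possibility, not confirmed against the driver history: the loops on lines 441 and 445 clear bits in the 64-bit `todo` with `~(1 << engn)`, where `1` is a 32-bit int. In both soft lockup dumps RAX is 0000000100000000 and RCX is 0000000000000020, which would fit `todo` having only bit 32 set and `engn` being 32 (that mapping of registers to variables is an assumption). The sketch below is a standalone C model, not kernel code; `clear_bit_int_width`, `clear_bit_u64`, `lowest_set_bit` and `drain` are hypothetical helper names. It models the 32-bit shift count wrapping at 32, as x86 does, in which case bit 32 is never cleared and the loop does not terminate, while a 64-bit constant clears it in one step.

```c
#include <stdint.h>

/* Model of `todo &= ~(1 << engn)` as it may behave on x86-64 (assumption):
 * the constant 1 is a 32-bit int, the hardware takes the shift count
 * modulo 32, and the 32-bit result is sign-extended before the AND.
 * Written with unsigned arithmetic so the sketch itself has no
 * undefined behaviour. */
static uint64_t clear_bit_int_width(uint64_t todo, unsigned engn)
{
    uint32_t bit = UINT32_C(1) << (engn & 31);         /* count wraps at 32 */
    uint64_t keep = (uint64_t)(int64_t)(int32_t)~bit;  /* sign-extend to 64 bits */
    return todo & keep;
}

/* The same update with a 64-bit constant, so bits 32..63 can be cleared. */
static uint64_t clear_bit_u64(uint64_t todo, unsigned engn)
{
    return todo & ~(UINT64_C(1) << (engn & 63));
}

/* Index of the lowest set bit; stand-in for the kernel's __ffs64().
 * Only valid for v != 0. */
static unsigned lowest_set_bit(uint64_t v)
{
    unsigned i = 0;
    while (!(v & 1)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Run the "find lowest bit, clear it" loop and count iterations,
 * giving up after `limit` so a non-terminating case can be observed. */
static unsigned drain(uint64_t todo,
                      uint64_t (*clear)(uint64_t, unsigned),
                      unsigned limit)
{
    unsigned n = 0;
    while (todo && n < limit) {
        todo = clear(todo, lowest_set_bit(todo));
        n++;
    }
    return n;
}
```

With the mask value taken from RAX, `drain(0x100000000ULL, clear_bit_int_width, 1000)` runs into the 1000-iteration limit, whereas `drain(0x100000000ULL, clear_bit_u64, 1000)` returns 1. If this reading is right, the worker would be spinning in the loop on line 441 rather than waiting on the spinlock on line 436.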


These reports may be relevant to what the customer is seeing:

https://bugs.centos.org/view.php?id=8700
https://bugs.freedesktop.org/show_bug.cgi?id=85086
https://bugs.freedesktop.org/show_bug.cgi?id=89985

Comment 2 Joe Wright 2016-01-28 17:04:04 UTC
Created attachment 1119210 [details]
sosreport