Bug 1302816

Summary: possible spinlock issue in nouveau on quadro k420
Product: Red Hat Enterprise Linux 7
Reporter: Joe Wright <jwright>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED WONTFIX
QA Contact: Desktop QE <desktop-qa-list>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.2
CC: cww, tpelka
Target Milestone: rc
Keywords: Desktop
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-01 16:53:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- sosreport (flags: none)

Description Joe Wright 2016-01-28 17:00:28 UTC
Description of problem:
- Possible spinlock issue in nouveau on a Quadro K420

Version-Release number of selected component (if applicable):
- RHEL 7.1
- xorg-x11-drv-nouveau-1.0.10-5.el7.x86_64  
- libdrm-2.4.56-2.el7.x86_64 

How reproducible:
- intermittent

Steps to Reproduce:
1. Unknown; the lockup happens occasionally and no reliable reproducer has been found.

Actual results:
- Soft lockup caused by what appears to be a spinlock issue

Expected results:


Additional info:

01:00.0 VGA compatible controller: NVIDIA Corporation GK107GL [Quadro K420] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)


Jan 25 11:11:06 dhws122 kernel: [1536034.220466] nouveau E[   PFIFO][0000:01:00.0] write fault at 0x0005924000 [PTE] from CE2/GR_COPY on channel 0x003fa31000 [Xorg[18982]]
Jan 25 11:11:06 dhws122 kernel: [1536034.220471] nouveau E[   PFIFO][0000:01:00.0] PCE2 engine fault on channel 2, recovering...
Jan 25 11:11:21 dhws122 kernel: [1536049.624322] nouveau E[     DRM] GPU lockup - switching to software fbcon
Jan 25 11:11:31 dhws122 kernel: [1536060.033767] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:15476]
Jan 25 11:11:31 dhws122 kernel: [1536060.033800] Modules linked in: binfmt_misc bnep bluetooth fuse mvfs(OF) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache intel_powerclamp coretemp snd_hda_codec_hdmi hp_wmi snd_hda_codec_realtek sparse_keymap snd_hda_codec_generic intel_rapl kvm snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller crc32c_intel ghash_clmulni_intel snd_hda_codec aesni_intel snd_hwdep lrw gf128mul glue_helper ablk_helper cryptd snd_seq snd_seq_device iTCO_wdt snd_pcm iTCO_vendor_support rfkill ppdev serio_raw pcspkr i2c_i801 lpc_ich mei_me snd_timer mfd_core mei snd soundcore shpchp parport_pc parport tpm_infineon uinput ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common nouveau video mxm_wmi i2c_algo_bit drm_kms_helper ttm ahci e1000e libahci drm libata ptp pps_core i2c_core wmi dm_mirror dm_region_hash dm_log dm_mod
Jan 25 11:11:31 dhws122 kernel: [1536060.033850] CPU: 0 PID: 15476 Comm: kworker/0:1 Tainted: GF          O--------------   3.10.0-229.7.2.el7.x86_64 #1
Jan 25 11:11:31 dhws122 kernel: [1536060.033851] Hardware name: Hewlett-Packard HP Z230 SFF Workstation/1906, BIOS L51 v01.52 07/20/2015
Jan 25 11:11:31 dhws122 kernel: [1536060.033877] Workqueue: events nve0_fifo_recover_work [nouveau]
Jan 25 11:11:31 dhws122 kernel: [1536060.033878] task: ffff8804129896c0 ti: ffff8800bde18000 task.ti: ffff8800bde18000
Jan 25 11:11:31 dhws122 kernel: [1536060.033879] RIP: 0010:[<ffffffffa01ef688>]  [<ffffffffa01ef688>] nve0_fifo_recover_work+0x88/0x330 [nouveau]
Jan 25 11:11:31 dhws122 kernel: [1536060.033896] RSP: 0018:ffff8800bde1bde0  EFLAGS: 00000202
Jan 25 11:11:31 dhws122 kernel: [1536060.033897] RAX: 0000000100000000 RBX: ffff8800bde1bd70 RCX: 0000000000000020
Jan 25 11:11:31 dhws122 kernel: [1536060.033898] RDX: 00000000fffffffe RSI: 0000000000000282 RDI: 0000000000000282
Jan 25 11:11:31 dhws122 kernel: [1536060.033899] RBP: ffff8800bde1be18 R08: 0000000000000282 R09: dfef3a34a013dcd8
Jan 25 11:11:31 dhws122 kernel: [1536060.033899] R10: dfef3a34a013dcd8 R11: 0000000000000293 R12: ffff8800bde1bd60
Jan 25 11:11:31 dhws122 kernel: [1536060.033900] R13: ffff88041d010100 R14: 0000000000000400 R15: 0000000200000000
Jan 25 11:11:31 dhws122 kernel: [1536060.033901] FS:  0000000000000000(0000) GS:ffff88042dc00000(0000) knlGS:0000000000000000
Jan 25 11:11:31 dhws122 kernel: [1536060.033902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 25 11:11:31 dhws122 kernel: [1536060.033903] CR2: 00007f64da427000 CR3: 00000001970d4000 CR4: 00000000001407f0
Jan 25 11:11:31 dhws122 kernel: [1536060.033904] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 25 11:11:31 dhws122 kernel: [1536060.033905] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 25 11:11:31 dhws122 kernel: [1536060.033905] Stack:
Jan 25 11:11:31 dhws122 kernel: [1536060.033906]  ffff8802de7a5140 ffff88042df87d00 ffff88041373dcd8 ffff880307610580
Jan 25 11:11:31 dhws122 kernel: [1536060.033908]  ffff88042dc12ec0 ffff88042dc17000 0000000000000000 ffff8800bde1be60
Jan 25 11:11:31 dhws122 kernel: [1536060.033910]  ffffffff8108f0bb 000000002dc12ed8 0000000000000000 ffff88042dc12ed8
Jan 25 11:11:31 dhws122 kernel: [1536060.033912] Call Trace:
Jan 25 11:11:31 dhws122 kernel: [1536060.033915]  [<ffffffff8108f0bb>] process_one_work+0x17b/0x470
Jan 25 11:11:31 dhws122 kernel: [1536060.033917]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Jan 25 11:11:31 dhws122 kernel: [1536060.033919]  [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Jan 25 11:11:31 dhws122 kernel: [1536060.033921]  [<ffffffff8109726f>] kthread+0xcf/0xe0
Jan 25 11:11:31 dhws122 kernel: [1536060.033922]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:11:31 dhws122 kernel: [1536060.033925]  [<ffffffff81614158>] ret_from_fork+0x58/0x90
Jan 25 11:11:31 dhws122 kernel: [1536060.033927]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:11:31 dhws122 kernel: [1536060.033927] Code: 00 00 00 83 ea 17 83 fa 0b 0f 87 64 02 00 00 ff 24 d5 90 c7 24 a0 0f 1f 44 00 00 ba 01 00 00 00 0f 1f 00 41 09 d6 ba fe ff ff ff <d3> c2 48 63 d2 48 21 d0 f3 48 0f bc c8 48 85 c0 89 ca 75 c4 49
Jan 25 11:11:36 dhws122 kernel: [1536064.638747] nouveau E[Xorg[18982]] failed to idle channel 0xcccc0001 [Xorg[18982]]
Jan 25 11:11:51 dhws122 kernel: [1536079.645191] nouveau E[Xorg[18982]] failed to idle channel 0xcccc0001 [Xorg[18982]]
Jan 25 11:11:53 dhws122 kernel: [1536081.646068] nouveau E[   PFIFO][0000:01:00.0] runlist 0 update timeout
Jan 25 11:11:55 dhws122 kernel: [1536083.941920] nouveau E[   PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
Jan 25 11:11:58 dhws122 /etc/gdm/Xsession[19164]: (abrt:19922): libnotify-WARNING **: Failed to connect to proxy
Jan 25 11:11:58 dhws122 /etc/gdm/Xsession[19164]: abrt-applet: Failed to receive server caps
Jan 25 11:11:59 dhws122 kernel: [1536088.045796] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:15476]
Jan 25 11:11:59 dhws122 kernel: [1536088.045823] Modules linked in: binfmt_misc bnep bluetooth fuse mvfs(OF) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache intel_powerclamp coretemp snd_hda_codec_hdmi hp_wmi snd_hda_codec_realtek sparse_keymap snd_hda_codec_generic intel_rapl kvm snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller crc32c_intel ghash_clmulni_intel snd_hda_codec aesni_intel snd_hwdep lrw gf128mul glue_helper ablk_helper cryptd snd_seq snd_seq_device iTCO_wdt snd_pcm iTCO_vendor_support rfkill ppdev serio_raw pcspkr i2c_i801 lpc_ich mei_me snd_timer mfd_core mei snd soundcore shpchp parport_pc parport tpm_infineon uinput ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common nouveau video mxm_wmi i2c_algo_bit drm_kms_helper ttm ahci e1000e libahci drm libata ptp pps_core i2c_core wmi dm_mirror dm_region_hash dm_log dm_mod
Jan 25 11:11:59 dhws122 kernel: [1536088.045868] CPU: 0 PID: 15476 Comm: kworker/0:1 Tainted: GF          O--------------   3.10.0-229.7.2.el7.x86_64 #1
Jan 25 11:11:59 dhws122 kernel: [1536088.045869] Hardware name: Hewlett-Packard HP Z230 SFF Workstation/1906, BIOS L51 v01.52 07/20/2015
Jan 25 11:11:59 dhws122 kernel: [1536088.045889] Workqueue: events nve0_fifo_recover_work [nouveau]
Jan 25 11:11:59 dhws122 kernel: [1536088.045890] task: ffff8804129896c0 ti: ffff8800bde18000 task.ti: ffff8800bde18000
Jan 25 11:11:59 dhws122 kernel: [1536088.045891] RIP: 0010:[<ffffffffa01ef678>]  [<ffffffffa01ef678>] nve0_fifo_recover_work+0x78/0x330 [nouveau]
Jan 25 11:11:59 dhws122 kernel: [1536088.045905] RSP: 0018:ffff8800bde1bde0  EFLAGS: 00000293
Jan 25 11:11:59 dhws122 kernel: [1536088.045906] RAX: 0000000100000000 RBX: ffff8800bde1bd70 RCX: 0000000000000020
Jan 25 11:11:59 dhws122 kernel: [1536088.045907] RDX: 0000000000000009 RSI: 0000000000000282 RDI: 0000000000000282
Jan 25 11:11:59 dhws122 kernel: [1536088.045907] RBP: ffff8800bde1be18 R08: 0000000000000282 R09: dfef3a34a013dcd8
Jan 25 11:11:59 dhws122 kernel: [1536088.045908] R10: dfef3a34a013dcd8 R11: 0000000000000293 R12: ffff8800bde1bd60
Jan 25 11:11:59 dhws122 kernel: [1536088.045909] R13: ffff88041d010100 R14: 0000000000000400 R15: 0000000200000000
Jan 25 11:11:59 dhws122 kernel: [1536088.045910] FS:  0000000000000000(0000) GS:ffff88042dc00000(0000) knlGS:0000000000000000
Jan 25 11:11:59 dhws122 kernel: [1536088.045910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 25 11:11:59 dhws122 kernel: [1536088.045911] CR2: 00007f64da427000 CR3: 00000001970d4000 CR4: 00000000001407f0
Jan 25 11:11:59 dhws122 kernel: [1536088.045912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 25 11:11:59 dhws122 kernel: [1536088.045913] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 25 11:11:59 dhws122 kernel: [1536088.045914] Stack:
Jan 25 11:11:59 dhws122 kernel: [1536088.045914]  ffff8802de7a5140 ffff88042df87d00 ffff88041373dcd8 ffff880307610580
Jan 25 11:11:59 dhws122 kernel: [1536088.045916]  ffff88042dc12ec0 ffff88042dc17000 0000000000000000 ffff8800bde1be60
Jan 25 11:11:59 dhws122 kernel: [1536088.045917]  ffffffff8108f0bb 000000002dc12ed8 0000000000000000 ffff88042dc12ed8
Jan 25 11:11:59 dhws122 kernel: [1536088.045918] Call Trace:
Jan 25 11:11:59 dhws122 kernel: [1536088.045922]  [<ffffffff8108f0bb>] process_one_work+0x17b/0x470
Jan 25 11:11:59 dhws122 kernel: [1536088.045923]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Jan 25 11:11:59 dhws122 kernel: [1536088.045925]  [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Jan 25 11:11:59 dhws122 kernel: [1536088.045927]  [<ffffffff8109726f>] kthread+0xcf/0xe0
Jan 25 11:11:59 dhws122 kernel: [1536088.045928]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:11:59 dhws122 kernel: [1536088.045931]  [<ffffffff81614158>] ret_from_fork+0x58/0x90
Jan 25 11:11:59 dhws122 kernel: [1536088.045932]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:11:59 dhws122 kernel: [1536088.045933] Code: 86 02 00 00 89 ca 48 89 d8 45 31 f6 0f 1f 80 00 00 00 00 83 ea 17 83 fa 0b 0f 87 64 02 00 00 ff 24 d5 90 c7 24 a0 0f 1f 44 00 00 <ba> 01 00 00 00 0f 1f 00 41 09 d6 ba fe ff ff ff d3 c2 48 63 d2
Jan 25 11:12:00 dhws122 kernel: [1536088.238462] nouveau E[   PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
Jan 25 11:12:04 dhws122 kernel: [1536092.535003] nouveau E[   PFIFO][0000:01:00.0] SCHED_ERROR [ CTXSW_TIMEOUT ]
Jan 25 11:12:06 dhws122 kernel: [1536094.246459] INFO: rcu_sched self-detected stall on CPU { 0}  (t=60000 jiffies g=37003578 c=37003577 q=0)
Jan 25 11:12:06 dhws122 kernel: [1536094.246495] sending NMI to all CPUs:
Jan 25 11:12:06 dhws122 kernel: [1536094.246497] NMI backtrace for cpu 0
Jan 25 11:12:06 dhws122 kernel: [1536094.246499] CPU: 0 PID: 15476 Comm: kworker/0:1 Tainted: GF          O--------------   3.10.0-229.7.2.el7.x86_64 #1
Jan 25 11:12:06 dhws122 kernel: [1536094.246512] Hardware name: Hewlett-Packard HP Z230 SFF Workstation/1906, BIOS L51 v01.52 07/20/2015
Jan 25 11:12:06 dhws122 kernel: [1536094.246546] Workqueue: events nve0_fifo_recover_work [nouveau]
Jan 25 11:12:06 dhws122 kernel: [1536094.246547] task: ffff8804129896c0 ti: ffff8800bde18000 task.ti: ffff8800bde18000
Jan 25 11:12:06 dhws122 kernel: [1536094.246549] RIP: 0010:[<ffffffff812e1852>]  [<ffffffff812e1852>] __const_udelay+0x12/0x30
Jan 25 11:12:06 dhws122 kernel: [1536094.246553] RSP: 0018:ffff88042dc03de0  EFLAGS: 00000046
Jan 25 11:12:06 dhws122 kernel: [1536094.246554] RAX: 0000000001062560 RBX: 0000000000002710 RCX: 0000000000000008
Jan 25 11:12:06 dhws122 kernel: [1536094.246556] RDX: 000000000033c311 RSI: 0000000000000008 RDI: 0000000000418958
Jan 25 11:12:06 dhws122 kernel: [1536094.246557] RBP: ffff88042dc03df8 R08: 0000000000000092 R09: 00000000000003c4
Jan 25 11:12:06 dhws122 kernel: [1536094.246558] R10: 0000000000000000 R11: ffff88042dc03b06 R12: ffffffff81962680
Jan 25 11:12:06 dhws122 kernel: [1536094.246559] R13: ffffffff81962680 R14: ffff88042dc0de00 R15: 0000000000000000
Jan 25 11:12:06 dhws122 kernel: [1536094.246561] FS:  0000000000000000(0000) GS:ffff88042dc00000(0000) knlGS:0000000000000000
Jan 25 11:12:06 dhws122 kernel: [1536094.246562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 25 11:12:06 dhws122 kernel: [1536094.246563] CR2: 00007f64da427000 CR3: 00000001970d4000 CR4: 00000000001407f0
Jan 25 11:12:06 dhws122 kernel: [1536094.246565] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 25 11:12:06 dhws122 kernel: [1536094.246566] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 25 11:12:06 dhws122 kernel: [1536094.246567] Stack:
Jan 25 11:12:06 dhws122 kernel: [1536094.246568]  ffff88042dc03df8 ffffffff81045e9a ffffffff81a21a00 ffff88042dc03e58
Jan 25 11:12:06 dhws122 kernel: [1536094.246571]  ffffffff81115753 ffff8804129896c0 0000000000000000 00057478247380ab
Jan 25 11:12:06 dhws122 kernel: [1536094.246573]  ffff8800bde18000 0000000000000000 ffff8804129896c0 0000000000000000
Jan 25 11:12:06 dhws122 kernel: [1536094.246575] Call Trace:
Jan 25 11:12:06 dhws122 kernel: [1536094.246576]  <IRQ>
Jan 25 11:12:06 dhws122 kernel: [1536094.246578]  [<ffffffff81045e9a>] ? arch_trigger_all_cpu_backtrace+0x7a/0xa0
Jan 25 11:12:06 dhws122 kernel: [1536094.246584]  [<ffffffff81115753>] rcu_check_callbacks+0x313/0x5f0
Jan 25 11:12:06 dhws122 kernel: [1536094.246588]  [<ffffffff81080f37>] update_process_times+0x47/0x80
Jan 25 11:12:06 dhws122 kernel: [1536094.246591]  [<ffffffff810d0405>] tick_sched_handle.isra.16+0x25/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246593]  [<ffffffff810d0481>] tick_sched_timer+0x41/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246596]  [<ffffffff8109b1a7>] __run_hrtimer+0x77/0x1d0
Jan 25 11:12:06 dhws122 kernel: [1536094.246598]  [<ffffffff810d0440>] ? tick_sched_handle.isra.16+0x60/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246601]  [<ffffffff8109b9e7>] hrtimer_interrupt+0xf7/0x240
Jan 25 11:12:06 dhws122 kernel: [1536094.246603]  [<ffffffff810441c7>] local_apic_timer_interrupt+0x37/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246606]  [<ffffffff8161678f>] smp_apic_timer_interrupt+0x3f/0x60
Jan 25 11:12:06 dhws122 kernel: [1536094.246609]  [<ffffffff81614e5d>] apic_timer_interrupt+0x6d/0x80
Jan 25 11:12:06 dhws122 kernel: [1536094.246610]  <EOI>
Jan 25 11:12:06 dhws122 kernel: [1536094.246615]  [<ffffffffa01ef678>] ? nve0_fifo_recover_work+0x78/0x330 [nouveau]
Jan 25 11:12:06 dhws122 kernel: [1536094.246634]  [<ffffffff8108f0bb>] process_one_work+0x17b/0x470
Jan 25 11:12:06 dhws122 kernel: [1536094.246637]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Jan 25 11:12:06 dhws122 kernel: [1536094.246639]  [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Jan 25 11:12:06 dhws122 kernel: [1536094.246641]  [<ffffffff8109726f>] kthread+0xcf/0xe0
Jan 25 11:12:06 dhws122 kernel: [1536094.246643]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:12:06 dhws122 kernel: [1536094.246646]  [<ffffffff81614158>] ret_from_fork+0x58/0x90
Jan 25 11:12:06 dhws122 kernel: [1536094.246648]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan 25 11:12:06 dhws122 kernel: [1536094.246649] Code: 89 e5 ff 15 a9 43 6d 00 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 28 2e 01 00 <48> 89 e5 48 69 d2 fa 00 00 00 f7 e2 48 8d 7a 01 ff 15 70 43 6d


The suspicion is that this soft lockup is caused by line 436 of 'drivers/gpu/drm/nouveau/core/engine/fifo/nve0.c', which tries to acquire a spinlock:

 427 static void
 428 nve0_fifo_recover_work(struct work_struct *work)
 429 {
 430     struct nve0_fifo_priv *priv = container_of(work, typeof(*priv), fault);
 431     struct nouveau_object *engine;
 432     unsigned long flags;
 433     u32 engn, engm = 0;
 434     u64 mask, todo;
 435 
 436     spin_lock_irqsave(&priv->base.lock, flags);
 437     mask = priv->mask;
 438     priv->mask = 0ULL;
 439     spin_unlock_irqrestore(&priv->base.lock, flags);
 440 
 441     for (todo = mask; engn = __ffs64(todo), todo; todo &= ~(1 << engn))
 442         engm |= 1 << nve0_fifo_engidx(priv, engn);
 443     nv_mask(priv, 0x002630, engm, engm);
 444 
 445     for (todo = mask; engn = __ffs64(todo), todo; todo &= ~(1 << engn)) {
 446         if ((engine = (void *)nouveau_engine(priv, engn))) {
 447             nv_ofuncs(engine)->fini(engine, false);
 448             WARN_ON(nv_ofuncs(engine)->init(engine));
 449         }
 450         nve0_fifo_runlist_update(priv, nve0_fifo_engidx(priv, engn));
 451     }
 452 
 453     nv_wr32(priv, 0x00262c, engm);
 454     nv_mask(priv, 0x002630, engm, 0x00000000);
 455 }


The following reports may be relevant to what the customer is seeing:

https://bugs.centos.org/view.php?id=8700
https://bugs.freedesktop.org/show_bug.cgi?id=85086
https://bugs.freedesktop.org/show_bug.cgi?id=89985

Comment 2 Joe Wright 2016-01-28 17:04:04 UTC
Created attachment 1119210 [details]
sosreport