Bug 1245875

Summary: nouveau PGRAPH TLB flush idle timeout fail (F22)
Product: [Fedora] Fedora
Reporter: Matt Domsch <matt_domsch>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED CANTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 22
CC: airlied, ajax, bskeggs, eugenemah
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2015-11-12 02:35:27 UTC
Type: Bug
Attachments:
kernel log messages (flags: none)

Description Matt Domsch 2015-07-23 03:30:02 UTC
Created attachment 1055143
kernel log messages

Description of problem:
The system locks up (console unresponsive) while the kernel repeatedly reports the following errors:

[  441.989044] nouveau E[     PGR][0000:01:00.0] vm flush timeout
[  443.995127] nouveau E[     PGR][0000:01:00.0] PGRAPH TLB flush idle timeout fail
[  444.002432] nouveau E[     PGR][0000:01:00.0] PGRAPH_STATUS  : 0x00be0001 BUSY ENG2D RMASK TPC_RAST TPC_PROP TPC_TEX TPC_MP
[  444.013495] nouveau E[     PGR][0000:01:00.0] PGRAPH_VSTATUS0: 0x00000000
[  444.020206] nouveau E[     PGR][0000:01:00.0] PGRAPH_VSTATUS1: 0x0000106d TPC_TEX TPC_MP
[  444.028224] nouveau E[     PGR][0000:01:00.0] PGRAPH_VSTATUS2: 0x00148000 ENG2D
[  446.035627] nouveau E[     PGR][0000:01:00.0] vm flush timeout
[  448.041675] nouveau E[     PGR][0000:01:00.0] PGRAPH TLB flush idle timeout fail
[  448.048978] nouveau E[     PGR][0000:01:00.0] PGRAPH_STATUS  : 0x00be0001 BUSY ENG2D RMASK TPC_RAST TPC_PROP TPC_TEX TPC_MP
[  448.060041] nouveau E[     PGR][0000:01:00.0] PGRAPH_VSTATUS0: 0x00000000
[  448.066749] nouveau E[     PGR][0000:01:00.0] PGRAPH_VSTATUS1: 0x0000106d TPC_TEX TPC_MP
[  448.074763] nouveau E[     PGR][0000:01:00.0] PGRAPH_VSTATUS2: 0x00148000 ENG2D
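
For reading the status words in these messages, here is a minimal sketch that turns a PGRAPH_STATUS value back into flag names. The bit positions are inferred only from the two PGRAPH_STATUS lines quoted in this report (0x00be0001 and 0x01982703), on the assumption that the kernel prints names lowest bit first. It is not the driver's full table, and `PGRAPH_STATUS_BITS` and `decode_pgraph_status` are hypothetical names chosen for illustration.

```python
# Bit names inferred from the PGRAPH_STATUS lines quoted in this report,
# assuming flag names are printed in ascending bit order.
# This is NOT the driver's complete table; bits not seen in the report
# are rendered as UNKNOWN_0x... placeholders.
PGRAPH_STATUS_BITS = {
    0x00000001: "BUSY",
    0x00000002: "DISPATCH",
    0x00000100: "CTXPROG",
    0x00000200: "VFETCH",
    0x00000400: "CCACHE_PREGEOM",
    0x00002000: "RATTR_APLANE",
    0x00020000: "ENG2D",
    0x00040000: "RMASK",
    0x00080000: "TPC_RAST",
    0x00100000: "TPC_PROP",
    0x00200000: "TPC_TEX",
    0x00800000: "TPC_MP",
    0x01000000: "ROP",
}


def decode_pgraph_status(value):
    """Return the flag names set in a 32-bit status word, lowest bit first."""
    names = []
    for bit in range(32):
        mask = 1 << bit
        if value & mask:
            names.append(PGRAPH_STATUS_BITS.get(mask, "UNKNOWN_0x%08x" % mask))
    return names


# Decode the status word from the first failure above.
print(" ".join(decode_pgraph_status(0x00be0001)))
```

Decoding 0x00be0001 this way gives the same names the kernel printed, which is a quick sanity check that the units still busy when the flush timed out were ENG2D, RMASK, and the TPC blocks.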

Version-Release number of selected component (if applicable):
Fedora release 22 (Twenty Two)
Kernel 4.0.4-303.fc22.x86_64 on an x86_64 (ttyS0)
xorg-x11-drv-nouveau-1.0.11-2.fc22.x86_64

How reproducible:
Trivially reproducible with this hardware and Fedora 22. I stopped using GNOME around Fedora 18 because of similar lockups; KDE was fine. With Fedora 22, I had to stop using KDE and switch to LXDE. After a dnf upgrade yesterday to the newest Fedora 22 packages, the failure now happens within a few minutes of booting, even with LXDE.

Steps to Reproduce:
1. Boot Fedora 22 on this graphics hardware with the nouveau driver, Xorg, and the LXDE desktop
2. Start a web browser
3. The system locks up within a few minutes

Actual results:
system lockup

Expected results:
no lockup

Additional info:
Kernel messages related to nouveau at bootup:

[    3.729680] nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x0a3000a2
[    3.738835] nouveau  [  DEVICE][0000:01:00.0] Chipset: GT215 (NVA3)
[    3.748288] nouveau  [  DEVICE][0000:01:00.0] Family : NV50
[    3.880804] nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
[    3.887635] nouveau  [   VBIOS][0000:01:00.0] BIT signature found
[    3.893972] nouveau  [   VBIOS][0000:01:00.0] version 70.15.32.00.00
[    3.920937] nouveau  [     PMC][0000:01:00.0] MSI interrupts enabled
[    3.920962] nouveau  [     PFB][0000:01:00.0] RAM type: DDR3
[    3.920963] nouveau  [     PFB][0000:01:00.0] RAM size: 1024 MiB
[    3.920964] nouveau  [     PFB][0000:01:00.0]    ZCOMP: 2048 tags
[    3.923543] nouveau  [    VOLT][0000:01:00.0] GPU voltage: 900000uv
[    3.951128] nouveau  [  PTHERM][0000:01:00.0] FAN control: PWM
[    3.951138] nouveau  [  PTHERM][0000:01:00.0] fan management: automatic
[    3.951156] nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes
[    3.971188] nouveau  [     CLK][0000:01:00.0] 03: core 135 MHz shader 270 MHz memory 135 MHz
[    3.971191] nouveau  [     CLK][0000:01:00.0] 07: core 405 MHz shader 810 MHz memory 324 MHz
[    3.971193] nouveau  [     CLK][0000:01:00.0] 0f: core 550 MHz shader 1340 MHz memory 790 MHz
[    3.971214] nouveau  [     CLK][0000:01:00.0] --: core 405 MHz shader 810 MHz memory 324 MHz
[    3.971398] nouveau  [     DRM] VRAM: 1024 MiB
[    3.971398] nouveau  [     DRM] GART: 1048576 MiB
[    3.971401] nouveau  [     DRM] TMDS table version 2.0
[    3.971402] nouveau  [     DRM] DCB version 4.0
[    3.971403] nouveau  [     DRM] DCB outp 00: 01000302 00020030
[    3.971404] nouveau  [     DRM] DCB outp 01: 02000300 00000000
[    3.971405] nouveau  [     DRM] DCB outp 02: 040113b6 0f220010
[    3.971405] nouveau  [     DRM] DCB outp 03: 04011372 00020010
[    3.971407] nouveau  [     DRM] DCB conn 00: 00001030
[    3.971407] nouveau  [     DRM] DCB conn 01: 00202146
[    4.043294] nouveau  [     DRM] MM: using COPY for buffer copies
[    4.146305] nouveau  [     DRM] allocated 1920x1200 fb: 0x70000, bo ffff880409716c00
[    4.161841] fbcon: nouveaufb (fb0) is primary device
[    4.333332] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[    4.333333] nouveau 0000:01:00.0: registered panic notifier
[    4.350656] [drm] Initialized nouveau 1.2.1 20120801 for 0000:01:00.0 on minor 0

Comment 1 Matt Domsch 2015-07-23 03:36:19 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=754882 asks for the installed versions of related packages, so I include them here:

libdrm-2.4.61-3.fc22.x86_64
libdrm-2.4.61-3.fc22.i686
mesa-dri-drivers-10.6.1-1.20150629.fc22.x86_64
mesa-dri-drivers-10.6.1-1.20150629.fc22.i686

Comment 2 Matt Domsch 2015-07-24 01:22:23 UTC
Ben tried to fix this in the following upstream commit:
commit 464d636bd0a7a905209816d1dee0838ccb79e57a
Author: Ben Skeggs <bskeggs>
Date:   Mon May 13 20:55:46 2013 +1000

    drm/nv50/vm: remove explicit vm knowledge from engines
    
    This reverses the lock ordering between VM and gr/nv84:nvc0.
    
    Signed-off-by: Ben Skeggs <bskeggs>


With reference to 
/* unfortunate hw bug workaround... */

I'm not sure what else to do here, other than perhaps buy a different video card that doesn't crash.

Comment 3 Matt Domsch 2015-07-25 03:58:45 UTC
Running google-chrome with --disable-gpu seems to reduce the crashes. At least, the system hasn't crashed in several hours of use today.

Comment 4 Eugene Mah 2015-08-04 17:36:18 UTC
I have been running into this issue on my system as well. It never happens while I'm working at the computer. When I leave the computer for a while, I come back to a frozen screensaver and a system that doesn't respond to the mouse or keyboard.

I see
Aug  4 11:36:12 tungsten kernel: nouveau E[  PGRAPH][0000:09:00.0] PGRAPH TLB flush idle timeout fail
Aug  4 11:36:12 tungsten kernel: nouveau E[  PGRAPH][0000:09:00.0] PGRAPH_STATUS  : 0x01982703 BUSY DISPATCH CTXPROG VFETCH CCACHE_PREGEOM RATTR_APLANE TPC_RAST TPC_PROP TPC_MP ROP
Aug  4 11:36:12 tungsten kernel: nouveau E[  PGRAPH][0000:09:00.0] PGRAPH_VSTATUS0: 0x0000000d CCACHE
Aug  4 11:36:12 tungsten kernel: nouveau E[  PGRAPH][0000:09:00.0] PGRAPH_VSTATUS1: 0x0000102d TPC_MP
Aug  4 11:36:12 tungsten kernel: nouveau E[  PGRAPH][0000:09:00.0] PGRAPH_VSTATUS2: 0x00200028 ROP

These messages repeat about every two seconds in /var/log/messages.
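
To check how often the failure is logged and which PCI device reports it, a small scan of the log text can help. This is only a sketch: the pattern is based on the message layout quoted in this report, and `TLB_FAIL` and `count_tlb_failures` are hypothetical names, not part of any tool.

```python
import re
from collections import Counter

# Matches both dmesg-style lines ("[  443.995127] nouveau E[...") and
# syslog-style lines ("Aug  4 11:36:12 host kernel: nouveau E[..."),
# because only the part from "nouveau E[" onward is examined.
# The capture group is the PCI address, e.g. "0000:09:00.0".
TLB_FAIL = re.compile(
    r"nouveau E\[\s*\w+\]"
    r"\[([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"PGRAPH TLB flush idle timeout fail"
)


def count_tlb_failures(lines):
    """Count 'PGRAPH TLB flush idle timeout fail' messages per PCI device."""
    counts = Counter()
    for line in lines:
        match = TLB_FAIL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the lines of /var/log/messages (or saved dmesg output) to `count_tlb_failures` returns a per-device tally, which makes it easier to tell which card is failing on a machine with more than one nouveau device.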

From dmesg
[    2.791703] nouveau  [  DEVICE][0000:0a:00.0] BOOT0  : 0x298c00a2
[    2.791709] nouveau  [  DEVICE][0000:0a:00.0] Chipset: G98 (NV98)
[    2.791712] nouveau  [  DEVICE][0000:0a:00.0] Family : NV50
[    2.908645] nouveau  [   VBIOS][0000:0a:00.0] using image from PROM
[    2.908802] nouveau  [   VBIOS][0000:0a:00.0] BIT signature found
[    2.908806] nouveau  [   VBIOS][0000:0a:00.0] version 62.98.6f.00.07
[    2.909268] nouveau  [ DEVINIT][0000:0a:00.0] adaptor not initialised
[    2.909276] nouveau  [   VBIOS][0000:0a:00.0] running init tables
[    2.962965] nouveau  [     PMC][0000:0a:00.0] MSI interrupts enabled
[    2.963028] nouveau  [     PFB][0000:0a:00.0] RAM type: GDDR3
[    2.963031] nouveau  [     PFB][0000:0a:00.0] RAM size: 256 MiB
[    2.963034] nouveau  [     PFB][0000:0a:00.0]    ZCOMP: 960 tags
[    3.605032] nouveau  [  PTHERM][0000:0a:00.0] FAN control: none / external
[    3.605054] nouveau  [  PTHERM][0000:0a:00.0] fan management: automatic
[    3.605061] nouveau  [  PTHERM][0000:0a:00.0] internal sensor: yes
[    3.625085] nouveau  [     CLK][0000:0a:00.0] 03: core 169 MHz shader 358 MHz memory 100 MHz
[    3.625091] nouveau  [     CLK][0000:0a:00.0] 0f: core 550 MHz shader 1400 MHz memory 700 MHz
[    3.625160] nouveau  [     CLK][0000:0a:00.0] --: core 550 MHz shader 1400 MHz memory 702 MHz
[    3.625371] nouveau  [     DRM] VRAM: 256 MiB
[    3.625374] nouveau  [     DRM] GART: 1048576 MiB
[    3.625380] nouveau  [     DRM] TMDS table version 2.0
[    3.625383] nouveau  [     DRM] DCB version 4.0
[    3.625387] nouveau  [     DRM] DCB outp 00: 02000386 0f220010
[    3.625391] nouveau  [     DRM] DCB outp 01: 02000302 00020010
[    3.625394] nouveau  [     DRM] DCB outp 02: 040113a6 0f220010
[    3.625397] nouveau  [     DRM] DCB outp 03: 04011312 00020010
[    3.625400] nouveau  [     DRM] DCB conn 00: 00005046
[    3.625404] nouveau  [     DRM] DCB conn 01: 00006146
[    3.680555] nouveau  [     DRM] MM: using M2MF for buffer copies
[    3.680577] [drm] Initialized nouveau 1.2.2 20120801 for 0000:0a:00.0 on minor 1

Currently running Fedora 22
Kernel: 4.1.3-200.fc22.x86_64
xorg-x11-drv-nouveau 1.0.11-2.fc22
libdrm.i686 2.4.61-3.fc22
libdrm.x86_64 2.4.61-3.fc22
mesa-dri-drivers.x86_64 10.6.3-1.20150729.fc22
xorg-x11-server-Xorg.x86_64 1.17.2-2.fc22

Comment 5 Matt Domsch 2015-08-19 17:07:16 UTC
I gave up and replaced my video card with an ATI Radeon HD 5450. Bye bye nouveau.