Bug 1564759 - Display freeze caused by nouveau
Summary: Display freeze caused by nouveau
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-07 13:12 UTC by Stefano Biagiotti
Modified: 2021-11-05 04:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-24 15:26:14 UTC
Type: Bug
Embargoed:


Attachments
journalctl -k -b -1 --no-pager --no-hostname (97.81 KB, text/plain)
2018-04-07 13:12 UTC, Stefano Biagiotti
journalctl -k -b -1 --no-pager --no-hostname (98.55 KB, text/plain)
2018-04-15 12:22 UTC, Stefano Biagiotti
Output of journalctl -k -b -1 --no-pager --no-hostname (175.97 KB, text/plain)
2018-04-21 07:50 UTC, Stefano Biagiotti
Output of journalctl -k -b -3 --no-pager --no-hostname (375.99 KB, text/plain)
2018-07-23 22:19 UTC, Stefano Biagiotti
Output of journalctl -k -b -1 --no-pager --no-hostname (501.91 KB, text/plain)
2019-11-10 18:07 UTC, Stefano Biagiotti
dmesg log showing call trace after screen freeze (98.73 KB, text/plain)
2020-02-18 20:22 UTC, Andrew M. Shooman
Log from /var/crash (86.55 KB, text/plain)
2020-05-05 16:06 UTC, Stefano Biagiotti


Links
System ID Private Priority Status Summary Last Updated
FreeDesktop.org 105940 0 'medium' 'RESOLVED' 'Display freeze caused by nouveau' 2019-12-09 08:39:27 UTC
Linux Kernel 173041 0 None None None 2019-10-21 21:07:54 UTC

Description Stefano Biagiotti 2018-04-07 13:12:41 UTC
Created attachment 1418582
journalctl -k -b -1 --no-pager --no-hostname

Description of problem:
Display often freezes unpredictably.

Version-Release number of selected component (if applicable):
xorg-x11-drv-nouveau-1.0.15-3.fc27.x86_64
kernel-4.15.14-300.fc27.x86_64

How reproducible:
It happens randomly. Most frequently when viewing videos.

Actual results:
Display is frozen.

Expected results:
Display doesn't freeze. :-)

Additional info:
While the display is frozen, mouse and keyboard don't work, but system is still pingable and I can login using ssh from another PC.

I have two monitors connected to the display adapter. Hardware is (from lspci):
01:00.0 VGA compatible controller: NVIDIA Corporation GT215 [GeForce GT 320] (rev a2)

Excerpt of journalctl (full journalctl in attachment):

apr 07 12:51:35 kernel: nouveau 0000:01:00.0: gr: PGRAPH TLB flush idle timeout fail
apr 07 12:51:35 kernel: nouveau 0000:01:00.0: gr: PGRAPH_STATUS 00be0003 [BUSY DISPATCH ENG2D RMASK TPC_RAST TPC_PROP TPC_TEX TPC_MP]
apr 07 12:51:35 kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS0: 00000000 []
apr 07 12:51:35 kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS1: 0000106d [TPC_TEX TPC_MP]
apr 07 12:51:35 kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS2: 00148000 [ENG2D]
apr 07 12:51:35 kernel: ------------[ cut here ]------------
apr 07 12:51:35 kernel: nouveau 0000:01:00.0: timeout
apr 07 12:51:35 kernel: WARNING: CPU: 3 PID: 1929 at drivers/gpu/drm/nouveau/nvkm/engine/gr/g84.c:171 g84_gr_tlb_flush+0x2ce/0x360 [nouveau]

Comment 1 Stefano Biagiotti 2018-04-15 12:22:08 UTC
Created attachment 1422178
journalctl -k -b -1 --no-pager --no-hostname

Bug still present with kernel-4.15.15-300.fc27.x86_64.

Comment 2 Stefano Biagiotti 2018-04-21 07:50:09 UTC
Created attachment 1424911
Output of journalctl -k -b -1 --no-pager --no-hostname

Bug still present with kernel-4.15.17-300.fc27.x86_64.

Comment 3 Stefano Biagiotti 2018-07-23 22:19:44 UTC
Created attachment 1470084
Output of journalctl -k -b -3 --no-pager --no-hostname

Bug still present with kernel-4.17.6-100.fc27.x86_64.

Comment 4 Ben Cotton 2018-11-27 14:56:21 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Ben Cotton 2018-11-30 22:14:39 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 6 Stefano Biagiotti 2019-10-21 21:00:15 UTC
Bug still present on Fedora 30 with
01:00.0 VGA compatible controller: NVIDIA Corporation GT215 [GeForce GT 320] (rev a2)
kernel-5.3.6-200.fc30.x86_64
xorg-x11-drv-nouveau-1.0.15-7.fc30.x86_64

ott 21 21:40:11 kernel: ------------[ cut here ]------------
ott 21 21:40:11 kernel: nouveau 0000:01:00.0: timeout
ott 21 21:40:11 kernel: WARNING: CPU: 3 PID: 4700 at drivers/gpu/drm/nouveau/nvkm/engine/gr/g84.c:168 g84_gr_tlb_flush+0x2e2/0x2f0 [nouveau]
ott 21 21:40:11 kernel: Modules linked in: fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables hwmon_vid sunrpc snd_hda_codec_hdmi intel_powerclamp coretemp nouveau kvm_intel kvm iTCO_wdt iTCO_vendor_support gpio_ich snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_hda_codec irqbypass mxm_wmi video i2c_algo_bit intel_cstate ttm intel_uncore snd_hda_core snd_hwdep snd_seq snd_seq_device drm_kms_helper snd_pcm snd_timer drm i2c_i801 snd lpc_ich soundcore acpi_cpufreq binfmt_misc uas crc32c_intel serio_raw usb_storage r8169 wmi
ott 21 21:40:11 kernel: CPU: 3 PID: 4700 Comm: kworker/3:2 Kdump: loaded Not tainted 5.3.6-200.fc30.x86_64 #1
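
The WARNING line above carries the source location of the timeout. The first report pointed at g84.c:171 and this one at g84.c:168, so extracting these fields may make it easier to compare traces across kernel versions. Below is a minimal Python sketch, assuming the line format seen in this report; `parse_warning` and `WARNING_RE` are illustrative names and not part of any kernel or nouveau tooling.

```python
import re

# Assumed format, taken from the WARNING lines quoted in this report:
# "WARNING: CPU: <n> PID: <n> at <file>:<line> <func>+0x<off>/0x<size> [<module>]"
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_warning(log_line):
    """Return a dict of fields from a kernel WARNING line, or None if no match."""
    match = WARNING_RE.search(log_line)
    if match is None:
        return None
    info = match.groupdict()
    # Numeric fields are converted so they can be compared between reports.
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    return info


if __name__ == "__main__":
    sample = (
        "ott 21 21:40:11 kernel: WARNING: CPU: 3 PID: 4700 at "
        "drivers/gpu/drm/nouveau/nvkm/engine/gr/g84.c:168 "
        "g84_gr_tlb_flush+0x2e2/0x2f0 [nouveau]"
    )
    print(parse_warning(sample))
```

The same function can be run over every line of a saved journalctl output to list all distinct warning locations.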

Comment 7 Stefano Biagiotti 2019-11-10 18:07:50 UTC
Created attachment 1634648
Output of journalctl -k -b -1 --no-pager --no-hostname

Bug still present with kernel-5.3.8-200.fc30.x86_64 and xorg-x11-drv-nouveau-1.0.15-7.fc30.x86_64.

Comment 8 Stefano Biagiotti 2020-01-20 22:08:37 UTC
Bug still present with kernel-5.4.10-100.fc30.x86_64 and xorg-x11-drv-nouveau-1.0.15-7.fc30.x86_64.

Comment 9 Henry Kroll 2020-02-13 06:32:20 UTC
I too have started experiencing display freezes, under GNOME but also under a minimal jwm desktop setup on a fresh user account.

Two external monitors (3 total) used to work great with nouveau, until Rawhide (f32). The display has started freezing randomly after 10-30 minutes. Sound, mouse, and hotkeys continue to work, though, so I created a hotkey to log out when it happens. The screen does not freeze when I boot without the HDMI monitor. To repeat, they all did work in Fedora 31.

The behavior is that of a memory issue with nouveau and HDMI. It could be related to a couple of other problems that appeared in Rawhide: 1) xorg-x11-drv-intel GPU hangs at boot, and 2) the HDMI sound device is no longer being detected. I don't know as much about this as it might seem, but intuitively, because the GPU crash at boot doesn't prevent system startup, it could be causing the intel driver to load a failsafe configuration that later interferes with nouveau and with the sound driver, snd-hda-intel. Just grasping at straws. Ideas?


Dell Precision M4800 
CPU: Quad Core Intel Core i7-4800MQ (-MT MCP-) speed/min/max: 898/800/3700 MHz Kernel: 5.6.0-0.rc1.git0.1.fc32.x86_64 x86_64 
Up: 8h 03m Mem: 2331.8/15912.3 MiB (14.7%) Storage: 447.13 GiB (26.7% used) Procs: 249 Shell: bash 5.0.11 inxi: 3.0.37 


Display configuration that precedes eventual desktop graphics freeze:
xrandr \
--output eDP-1 --primary --mode 1920x1080 --pos 0x1592 -r 59.93 \
--output VGA-1 --mode 1440x900 --pos 1920x1080 -r 59.89 \
--output DP-1-4 --mode 1920x1080 --pos 1920x0 -r 60

Comment 10 Andrew M. Shooman 2020-02-18 20:22:45 UTC
Created attachment 1663887
dmesg log showing call trace after screen freeze

Comment 11 Andrew M. Shooman 2020-02-18 20:28:55 UTC
CPU:      Dell Precision 5820
GPU:      0000:65:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
Fedora:   5.4.19-200.fc31.x86_64 #1 SMP Wed Feb 12 15:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Nouveau:  xorg-x11-drv-nouveau-1.0.15-8.fc31

Display freezes on screen lock 100% of the time.  Can still connect remotely to machine via ssh.

[  269.747439] RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
[  269.747639] RIP: 0033:0x7f761c53538b
[  270.376128] RIP: 0010:evo_wait+0x5a/0x130 [nouveau]

(see dmesg log attachment for full call trace after display freeze)

If you need any more information to debug the problem, please let me know.
Please advise on any workarounds while problem is being debugged.
Right now, I disabled screen lock on timeout.
Thanks.


Best Regards,

Andy Shooman

Comment 12 Andrew M. Shooman 2020-02-27 00:26:17 UTC
CPU:      Dell Precision 5820
GPU:      0000:65:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
Fedora:   5.3.7-301.fc31.x86_64 #1 SMP Mon Oct 21 19:18:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Nouveau:  xorg-x11-drv-nouveau-1.0.15-8.fc31

Updated BIOS from 1.12.1 to 2.0.2

The bug seems to be a kernel page fault (reported as a supervisor write access in the trace below) that follows a series of MMIO read faults.

Call trace from original test on DP DP with older BIOS:
[  269.746859] nouveau 0000:65:00.0: bus: MMIO read of 00000000 FAULT at 616e18 [ IBUS ]
[  269.747247] nouveau 0000:65:00.0: bus: MMIO read of 00000000 FAULT at 616f98 [ IBUS ]
[  269.747367] BUG: unable to handle page fault for address: ffffa43e7b2be000
[  269.747369] #PF: supervisor write access in kernel mode
[  269.747370] #PF: error_code(0x0002) - not-present page
[  269.747371] PGD 87c833067 P4D 87c833067 PUD 0 
[  269.747373] Oops: 0002 [#1] SMP PTI
[  269.747375] nouveau 0000:65:00.0: bus: MMIO read of 00000000 FAULT at 642000 [ IBUS ]
[  269.747376] CPU: 2 PID: 1759 Comm: gnome-shell Not tainted 5.4.19-200.fc31.x86_64 #1
[  269.747378] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.12.1 08/19/2019
[  269.747439] RIP: 0010:evo_wait+0x5a/0x130 [nouveau]

Same error with other test cases and newer BIOS.

More testing of display freeze with different video interfaces (DP, HDMI, DVI, VGA) with one and two monitors:

Monitor 1   Monitor 2   Result
DP          DP          bad
HDMI        HDMI        bad
DVI         DVI         good
VGA         (none)      good
DP          VGA         bad
HDMI        VGA         bad

Waiting for additional DP to VGA converter to test VGA with two monitors.
If VGA with two monitors does not work, next step may be to replace NVIDIA graphics card with AMD graphics card.
Current monitors do not support DVI (only good case with two monitors).  Old monitors support DVI but are too small.
Would like to help debug the problem with xorg-x11-drv-nouveau, but need to get some work done.
Repeated display freeze and rebooting machine is not good for productivity...
Thanks.

Comment 13 Andrew M. Shooman 2020-02-27 14:28:47 UTC
Additional test cases:

DP (single monitor)     good
HDMI (single monitor)   good


Summary:

Nouveau has trouble in the case of two or more monitors when at least one of the monitors is using DP or HDMI.
Nouveau works in the case of single monitor, any video interface (DP, HDMI, DVI, VGA), or with two monitors using DVI.
Waiting for additional DP to VGA converter to test VGA with two monitors.
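
The summary above can be checked mechanically against the test cases reported in comments 12 and 13. Below is a minimal Python sketch that encodes the observations and the stated hypothesis (freeze expected with two or more monitors when at least one uses DP or HDMI). The names `AFFECTED`, `OBSERVED`, `predicted_bad`, and `mismatches` are illustrative only, and the rule is just the reporter's summary, not a confirmed root cause.

```python
# Interfaces named in the summary as triggering the problem in multi-monitor setups.
AFFECTED = {"DP", "HDMI"}

# Observations copied from comments 12 and 13: (connected outputs, result).
OBSERVED = [
    (("DP", "DP"), "bad"),
    (("HDMI", "HDMI"), "bad"),
    (("DVI", "DVI"), "good"),
    (("VGA",), "good"),
    (("DP", "VGA"), "bad"),
    (("HDMI", "VGA"), "bad"),
    (("DP",), "good"),
    (("HDMI",), "good"),
]


def predicted_bad(outputs):
    """Hypothesis: freeze with 2+ monitors when at least one uses DP or HDMI."""
    return len(outputs) >= 2 and any(out in AFFECTED for out in outputs)


def mismatches(observations):
    """Return the observations that the hypothesis does not explain."""
    return [
        (outputs, result)
        for outputs, result in observations
        if predicted_bad(outputs) != (result == "bad")
    ]


if __name__ == "__main__":
    print(mismatches(OBSERVED))
```

The pending VGA + VGA test is not covered by the observations; under this hypothesis it would be expected to work, which is what the additional DP to VGA converter is meant to verify.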

Comment 14 Stefano Biagiotti 2020-05-05 16:06:01 UTC
Created attachment 1685277
Log from /var/crash

Bug still present on a fully updated Fedora 31.

kernel-5.6.8-200.fc31.x86_64
xorg-x11-drv-nouveau-1.0.15-8.fc31.x86_64

Excerpt of the attachment:
[ 4341.729525] nouveau 0000:01:00.0: gr: PGRAPH TLB flush idle timeout fail
[ 4341.729534] nouveau 0000:01:00.0: gr: PGRAPH_STATUS 00be0003 [BUSY DISPATCH ENG2D RMASK TPC_RAST TPC_PROP TPC_TEX TPC_MP]
[ 4341.729538] nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS0: 00000000 []
[ 4341.729542] nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS1: 0000106d [TPC_TEX TPC_MP]
[ 4341.729545] nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS2: 00148000 [ENG2D]
[ 4343.729566] ------------[ cut here ]------------
[ 4343.729570] nouveau 0000:01:00.0: timeout
[ 4343.729716] WARNING: CPU: 0 PID: 2328 at drivers/gpu/drm/nouveau/nvkm/engine/gr/g84.c:168 g84_gr_tlb_flush+0x2e2/0x2f0 [nouveau]
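
The PGRAPH status registers in this excerpt are identical to those in the original 2018 report, which can be confirmed by extracting them from both logs and comparing. Below is a minimal Python sketch, assuming the line format shown in this report; `parse_pgraph_status` and `PGRAPH_RE` are illustrative names, and the bracketed names are simply recorded as printed by the driver without interpreting them.

```python
import re

# Matches e.g. "gr: PGRAPH_STATUS 00be0003 [BUSY DISPATCH ...]" and
# "gr: PGRAPH_VSTATUS1: 0000106d [TPC_TEX TPC_MP]" (the colon is optional).
PGRAPH_RE = re.compile(
    r"gr: (PGRAPH_[A-Z0-9]+):? ([0-9a-fA-F]{8}) \[([^\]]*)\]"
)


def parse_pgraph_status(log_text):
    """Return {register: (value, [names])} for each PGRAPH status line found."""
    registers = {}
    for line in log_text.splitlines():
        match = PGRAPH_RE.search(line)
        if match:
            name, value, names = match.groups()
            registers[name] = (int(value, 16), names.split())
    return registers


SAMPLE = """\
[ 4341.729525] nouveau 0000:01:00.0: gr: PGRAPH TLB flush idle timeout fail
[ 4341.729534] nouveau 0000:01:00.0: gr: PGRAPH_STATUS 00be0003 [BUSY DISPATCH ENG2D RMASK TPC_RAST TPC_PROP TPC_TEX TPC_MP]
[ 4341.729538] nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS0: 00000000 []
[ 4341.729542] nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS1: 0000106d [TPC_TEX TPC_MP]
[ 4341.729545] nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS2: 00148000 [ENG2D]
"""

if __name__ == "__main__":
    for register, (value, names) in parse_pgraph_status(SAMPLE).items():
        print(f"{register}: 0x{value:08x} {names}")
```

Running the function on two saved logs and comparing the returned dictionaries shows whether two freezes left the GPU in the same reported state.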

Comment 15 Ian Laurie 2020-11-03 02:57:01 UTC
My bug #1893931 may be a duplicate of this one. However, for me it's happening in Fedora 33, and I have a single monitor on both of the systems where it happens. I agree with Stefano's original description in that it happens more often when playing videos (in my case using VLC).

In my case, however, the problems started with Fedora 33; I never had the issue in previous versions.

Comment 16 Ben Cotton 2020-11-03 16:51:06 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 17 Ben Cotton 2020-11-24 15:26:14 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 18 Yonatan 2021-11-05 04:13:57 UTC
[11188.869825] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 022c data 00000004
[11188.869843] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 020c data 00000004
[11188.869934] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 022c data 00000004
[11188.869944] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 020c data 00000004
[11189.004561] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 022c data 00000004
[11189.004580] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 020c data 00000004
[11189.012490] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 022c data 00000004
[11189.012532] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 020c data 00000004
[11189.012541] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 022c data 00000004
[11189.012550] nouveau 0000:00:0d.0: gr: intr 00100000 [ERROR] nsource 00000002 [DATA_ERROR] nstatus 02000000 [BAD_ARGUMENT] ch 7 [00072000 Xwayland[3948]] subc 7 class 4497 mthd 020c data 00000004