Bug 1655788 - Screen freeze with nouveau timeout and EVGA Nvidia GeForce GTX 1050 Ti SC
Summary: Screen freeze with nouveau timeout and EVGA Nvidia GeForce GTX 1050 Ti SC
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 29
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-03 22:25 UTC by Ian Shields
Modified: 2019-11-27 20:45 UTC
CC List: 7 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-11-27 20:45:38 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Last 272 lines of journalctl log around time of failure (41.90 KB, text/plain)
2018-12-03 22:25 UTC, Ian Shields

Description Ian Shields 2018-12-03 22:25:00 UTC
Created attachment 1511128
Last 272 lines of journalctl log around time of failure

Description of problem:
Fedora 29 runs for a short while, then becomes so unresponsive that it appears to freeze. The mouse and keyboard still respond, but so slowly that they appear dead. The problem occurs when booting vmlinuz-4.19.4-300.fc29.x86_64, but does not seem to happen with vmlinuz-0-rescue.


Version-Release number of selected component (if applicable):
Fedora 29 kernel.x86_64 4.19.4-300.fc29
xorg-x11-drv-nouveau.x86_64 1:1.0.15-6.fc29 


How reproducible:
My setup has three computers connected to a Samsung U28E590D 28-inch 4K UHD monitor, two via HDMI and this one via DP. I have low vision, so I run GNOME on this monitor with a display scale of 200%. The keyboard and mouse are shared via a KVM switch, so I typically switch the keyboard and mouse first, then switch the display to the right system.


Steps to Reproduce:
1. Boot Fedora 29 system. 
2. Run for a while. The system freezes. The problem may be exacerbated by switching to another system/display and then returning to the Fedora 29 system and DP input. However, I believe I have seen it freeze almost immediately without this switch.

Actual results:
The system becomes unresponsive to keyboard and mouse.


Expected results:
System should be responsive.


Additional info:
There are many nouveau timeouts and window manager errors in the journalctl log. I am attaching the log that covers the failure. Here are some example lines to help others searching for this bug:
Dec 03 16:42:01 localhost.localdomain org.gnome.Shell.desktop[1640]: Window manager warning: last_user_time (2469524) is greater than comparison timestamp (2468476).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
Dec 03 16:42:01 localhost.localdomain org.gnome.Shell.desktop[1640]: Window manager warning: W1 (ian@localh) appears to be one of the offending windows with a timestamp of 2469524.  Working around...
Dec 03 16:42:01 localhost.localdomain kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Dec 03 16:42:03 localhost.localdomain kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Dec 03 16:42:05 localhost.localdomain kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Dec 03 16:42:05 localhost.localdomain org.gnome.Shell.desktop[1640]: Window manager warning: last_focus_time (2474628) is greater than comparison timestamp (2474436).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
Dec 03 16:42:05 localhost.localdomain org.gnome.Shell.desktop[1640]: Window manager warning: last_user_time (2474628) is greater than comparison timestamp (2474436).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
Dec 03 16:42:05 localhost.localdomain org.gnome.Shell.desktop[1640]: Window manager warning: 0x1a00010 (Command-li) appears to be one of the offending windows with a timestamp of 2474628.  Working around...
Dec 03 16:42:05 localhost.localdomain org.gnome.Shell.desktop[1640]: Window manager warning: W1 (ian@localh) appears to be one of the offending windows with a timestamp of 2474628.  Working around...
Dec 03 16:42:07 localhost.localdomain kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Dec 03 16:42:09 localhost.localdomain kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Dec 03 16:42:09 localhost.localdomain kernel: usb 4-5.4: USB disconnect, device number 6
Dec 03 16:42:11 localhost.localdomain kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Dec 03 16:42:11 localhost.localdomain fwupd[2173]: disabling polling: failed to open /dev/hidraw1
Dec 03 16:42:13 localhost.localdomain kernel: nouveau 0000:01:00.0: DRM: base-0: timeout

Comment 1 Ian Shields 2018-12-04 02:17:26 UTC
BTW, this system works just fine with Fedora 28 as well as with the Fedora 29 rescue image.

Comment 2 Ian Shields 2018-12-05 18:43:28 UTC
I updated Fedora 28 this morning and the mouse is now behaving strangely. The pointer has the shakes and leaves droppings. Sometimes two or three pointers are showing.

Comment 3 Lars E. Pettersson 2018-12-06 11:37:40 UTC
Just now noticed something similar on Fedora 28.

Kernels not OK:
kernel-4.19.4-200.fc28.x86_64
kernel-4.19.5-200.fc28.x86_64

OK:
kernel-4.18.16-200.fc28.x86_64

In my case the GUI freezes up quite quickly, a few minutes after reboot. The system is still reachable via SSH; only the GUI is affected.

xorg-x11-drv-nouveau-1.0.15-4.fc28.x86_64

Graphic card is NVIDIA Corporation GP107GL [Quadro P600] with three monitors using display port.

I see similar errors etc in the journal as Ian Shields.

Comment 4 Ian Shields 2018-12-06 11:59:58 UTC
SSH access also works for me. Using 'top', FWIW, I don't see any unusual CPU activity.

Comment 5 Fred New 2018-12-10 04:54:14 UTC
This problem first started for me with the 4.19 kernels. The 4.18.17-300.fc29.x86_64 kernel was the last one that didn't have this problem. I have a GeForce GTX 1050 Ti.

Comment 6 Fred New 2018-12-10 17:27:54 UTC
I booted kernel-4.19.6-300.fc29.x86_64 and the resulting journal ends with the following messages. The mouse was a little sticky while I was only getting the "evo channel stalled" messages. Things got really bad after 19:04:55. The clock on my GNOME panel displays seconds, and it was updating only once every 6 seconds. The keyboard and mouse were erratic:

Dec 10 18:46:56 kernel: fuse init (API version 7.27)
Dec 10 18:46:57 kernel: rfkill: input handler disabled
Dec 10 19:02:44 kernel: nouveau 0000:08:00.0: disp: chid 0 mthd 0000 data 00000000 00001000 00000001
Dec 10 19:02:46 kernel: nouveau: evo channel stalled
Dec 10 19:03:07 kernel: nouveau: evo channel stalled
Dec 10 19:03:24 kernel: nouveau: evo channel stalled
Dec 10 19:03:35 kernel: nouveau: evo channel stalled
Dec 10 19:03:47 kernel: nouveau: evo channel stalled
Dec 10 19:03:57 kernel: nouveau: evo channel stalled
Dec 10 19:04:55 kernel: nouveau 0000:08:00.0: DRM: core notifier timeout
Dec 10 19:04:57 kernel: nouveau 0000:08:00.0: DRM: base-0: timeout
Dec 10 19:04:59 kernel: nouveau: evo channel stalled
Dec 10 19:05:01 kernel: nouveau 0000:08:00.0: DRM: base-0: timeout
Dec 10 19:05:03 kernel: nouveau 0000:08:00.0: DRM: base-0: timeout
Dec 10 19:05:05 kernel: nouveau 0000:08:00.0: DRM: base-0: timeout
Dec 10 19:05:07 kernel: nouveau 0000:08:00.0: DRM: base-0: timeout
(and many more of this last line)

Comment 7 Fred New 2018-12-12 17:14:31 UTC
I'm still having this problem with
     kernel-4.19.7-300.fc29.x86_64
     xorg-x11-drv-nouveau-1.0.15-6.fc29.x86_64

Comment 8 Fred New 2018-12-12 17:56:24 UTC
With kernel-4.19.8-300.fc29.x86_64 my journal gives perhaps a little more information:

Dec 12 19:23:55 kernel: fuse init (API version 7.27)
Dec 12 19:23:56 kernel: rfkill: input handler disabled
Dec 12 19:47:22 kernel: nouveau 0000:08:00.0: disp: chid 1 mthd 0000 data 00000000 00003000 00000000
Dec 12 19:47:22 kernel: nouveau 0000:08:00.0: disp: chid 1 mthd 0004 data 04380780 10003004 00000000
Dec 12 19:47:22 kernel: nouveau 0000:08:00.0: disp: chid 1 mthd 0008 data 00007804 10003008 00000000
Dec 12 19:47:22 kernel: nouveau 0000:08:00.0: disp: chid 1 mthd 000c data 0000cf00 1000300c 00000000
Dec 12 19:47:22 kernel: nouveau 0000:08:00.0: disp: chid 1 mthd 0010 data 20000000 10003010 00000000
Dec 12 19:47:22 kernel: nouveau 0000:08:00.0: disp: chid 1 mthd 0014 data 00000000 10003014 00000000
Dec 12 19:47:22 kernel: nouveau 0000:08:00.0: disp: chid 1 mthd 0000 data 00000400 10001000 00000002
Dec 12 19:47:24 kernel: nouveau 0000:08:00.0: DRM: base-0: timeout
Dec 12 19:47:26 kernel: nouveau 0000:08:00.0: DRM: base-0: timeout
Dec 12 19:47:28 kernel: nouveau 0000:08:00.0: DRM: base-0: timeout
...

Comment 9 Ben Skeggs 2018-12-13 23:22:59 UTC
I recently pushed a fix[1] upstream for an issue that could cause the problems that are being seen here.  Hopefully it'll be picked up by a Fedora kernel update after it's landed in the stable branches.

[1] https://github.com/skeggsb/linux/commit/970a5ee41c72df46e3b0f307528c7d8ef7734a2e

Comment 10 Ian Shields 2018-12-14 01:46:15 UTC
Thanks. Let us know which kernel it finally arrives in. Meantime I have a Radeon RX 550 that works fine on my system.

Comment 11 Ludek Finstrle 2019-01-22 11:03:21 UTC
Ben, you're my superhero. The patch fixes my problem with CentOS 7.6. I applied it against the latest CentOS 7.6 kernel, 3.10.0-957.1.3.el7.x86_64. Thanks!
BTW, is it possible to push it into the RHEL kernel as well?

Comment 12 Ben Cotton 2019-10-31 20:08:28 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Ben Cotton 2019-11-27 20:45:38 UTC
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

