Bug 1783765 - nouveau hangs video with TU116 - regression in kernel 5.3-rc4/5.4
Summary: nouveau hangs video with TU116 - regression in kernel 5.3-rc4/5.4
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-12-15 12:45 UTC by Marcin Zajaczkowski
Modified: 2020-01-18 21:20 UTC
CC List: 17 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-01-18 21:20:58 UTC
Type: Bug
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
freedesktop.org Gitlab | xorg/driver/xf86-video-nouveau/issues/516 | 0 | None | None | None | 2019-12-15 12:45:36 UTC

Description Marcin Zajaczkowski 2019-12-15 12:45:36 UTC
1. Please describe the problem:

System hangs with:
> kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
...
> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> kernel: ------------[ cut here ]------------
> kernel: nouveau 0000:01:00.0: timeout
> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
(and so on, until the end)

after upgrading to kernel v5.3-rc4+ (tested, among others, with 5.3-rc4, 5.3.0 and kernel-5.4.2-300.fc31).

2. What is the Version-Release number of the kernel:

kernel-5.4.2-300.fc31 and any >= kernel-5.3.0-0.rc4.git0.1.fc31.

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

The first affected version is kernel-5.3.0-0.rc4.git0.1.fc31 (kernel-5.3.0-0.rc3.git1.1.fc31 works fine).

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Boot a system with a GeForce GTX 1660 Ti (TU116; TU117 is probably affected too) on kernel 5.3/5.4 (tested on a Hyperbook NH5/Clevo NH55RCQ).

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Yes, some time ago, definitely. I haven't tested the current Rawhide kernel, but most likely it still does.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

vboxdrv from rpmfusion.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Attached in the upstream bug: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/516
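The nouveau messages quoted at the top of this report can be pulled out of a `dmesg.txt` captured with the `journalctl` command above and grouped by the subdevice tag that follows the PCI address (`fifo`, `DRM`, `i2c`, `tmr`). Below is a minimal Python sketch of such a filter. `group_by_subdevice` and the `NOUVEAU_LINE` pattern are hypothetical helpers, not part of any tool mentioned in this bug, and they assume the `nouveau <PCI address>: [<subdevice>: ]<message>` layout seen in the excerpt.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed layout (taken from the excerpt in this report), e.g.
#   "kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []"
# Captures the PCI address, an optional subdevice tag and the message text.
NOUVEAU_LINE = re.compile(
    r"nouveau (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?:(?P<sub>[A-Za-z0-9_]+): )?(?P<msg>.*)$"
)


def group_by_subdevice(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group nouveau log messages by subdevice tag ("(none)" if there is no tag)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = NOUVEAU_LINE.search(line.rstrip("\n"))
        if match:  # lines such as the WARNING backtrace header are skipped
            groups[match.group("sub") or "(none)"].append(match.group("msg"))
    return dict(groups)
```

For example, `group_by_subdevice(open("dmesg.txt"))` would put `SCHED_ERROR 08 []` under `fifo` and the bare `timeout` line under `(none)`.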

Reporting here mostly due to "Test Day:2019-12-09 Kernel 5.4 Test Week".

8. Additional information.

I spent some time narrowing the scope by bisecting. As a result, I know that kernel 5.3.0-0.rc3.git1.1 (commit 33920f1ec5bf) works fine, but 5.3.0-0.rc4.git0.1 (tag v5.3-rc4) does not. That leaves "just" ~260 commits, made between 7 and 11 Aug.

However, none of them are nouveau-related, and only a few are drm-related. Maybe a nouveau/drm developer here could help me find out which change broke it.
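For scale, narrowing ~260 candidate commits by bisection takes about ceil(log2(260)) = 9 test boots, because each good/bad verdict halves the remaining range. The minimal Python sketch below shows that binary search. It assumes a linear, chronologically ordered commit list whose last entry is known bad (here, v5.3-rc4) and whose predecessor is known good; real `git bisect` walks a history that contains merges. `bisect_first_bad`, `max_tests` and the commit names in the usage note are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def bisect_first_bad(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of tests run).

    Assumes commits[-1] is already known bad (it is never tested) and the
    commit just before commits[0] is known good.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one build-and-boot of commits[mid]
        if is_bad(commits[mid]):
            hi = mid      # first bad commit is mid or earlier
        else:
            lo = mid + 1  # first bad commit is after mid
    return commits[lo], tests


def max_tests(n: int) -> int:
    """Worst-case number of tests needed to bisect n candidate commits."""
    return math.ceil(math.log2(n)) if n > 1 else 0
```

For example, with `commits = [f"c{i:03d}" for i in range(260)]` and a predicate that turns bad from `c137` onward, `bisect_first_bad` returns `"c137"` after at most 9 tests.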

Comment 1 Marcin Zajaczkowski 2020-01-18 21:20:58 UTC
After the discussion on the mailing list [1], and with help from Ilia Mirkin, further bisecting pointed to commit [2] as the culprit. Later on it turned out that the problem had already been solved: with 5.5.0-rc2+ (tested with kernel-core-5.5.0-0.rc2.git0.1.fc32.x86_64 and kernel-core-5.5.0-0.rc5.git0.1.fc32.x86_64) the problem doesn't occur and my TU116 works fine. Closing.

[1] - https://lists.freedesktop.org/archives/nouveau/2019-December/034943.html
[2] - https://github.com/torvalds/linux/commit/0acf5676dc0ffe0683543a20d5ecbd112af5b8ee

