Bug 1783765

Summary: nouveau hangs video with TU116 - regression in kernel 5.3-rc4/5.4
Product: [Fedora] Fedora
Reporter: Marcin Zajaczkowski <mszpak>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 31
CC: airlied, bskeggs, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mjg59, steved
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-18 21:20:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Marcin Zajaczkowski 2019-12-15 12:45:36 UTC
1. Please describe the problem:

System hangs with:
> kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
...
> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> kernel: ------------[ cut here ]------------
> kernel: nouveau 0000:01:00.0: timeout
> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
(and the end)

after upgrading to kernel v5.3-rc4 or later (tested, among others, with 5.3-rc4, 5.3.0 and kernel-5.4.2-300.fc31).
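
For anyone who wants to check whether their own log shows the same hang, here is a minimal sketch that scans saved kernel log text for the messages quoted above. The signature list and the function name are my own illustration (not part of nouveau or any existing tool), and the patterns only assume that the quoted lines are representative of this hang.

```python
import re

# Signatures derived from the messages quoted in this report. Treating them
# as characteristic of the hang is an assumption, not an exhaustive list.
SIGNATURES = {
    "sched_error": re.compile(r"nouveau \S+: fifo: SCHED_ERROR"),
    "idle_channel": re.compile(r"nouveau \S+: DRM: failed to idle channel"),
    "aux_timeout": re.compile(r"nouveau \S+: i2c: aux \S+: begin idle timeout"),
    "timer_stall": re.compile(r"nouveau \S+: tmr: stalled at"),
    "bar_flush": re.compile(r"g84_bar_flush\+0x"),
}


def find_hang_signatures(log_text):
    """Return {signature name: first matching line} for every signature found."""
    found = {}
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if name not in found and pattern.search(line):
                found[name] = line.strip()
    return found
```

The function takes the log as a string, so it can be fed the contents of a file produced with the ``journalctl --no-hostname -k > dmesg.txt`` command from item 7, for example ``find_hang_signatures(open("dmesg.txt").read())``.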

2. What is the Version-Release number of the kernel:

kernel-5.4.2-300.fc31 and any >= kernel-5.3.0-0.rc4.git0.1.fc31.

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

The first affected version is kernel-5.3.0-0.rc4.git0.1.fc31 (kernel-5.3.0-0.rc3.git1.1.fc31 works fine).

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Run a system with a GeForce GTX 1660 Ti (TU116; probably also TU117) on kernel 5.3 or 5.4 (tested on a Hyperbook NH5 / Clevo NH55RCQ).

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

It definitely did some time ago. Not tested with the current Rawhide, but most likely yes.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

vboxdrv from rpmfusion.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Attached in the upstream bug - https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/516

Reporting here mostly due to "Test Day:2019-12-09 Kernel 5.4 Test Week".

8. Additional information.

I spent some time trying to narrow the scope by bisecting. As a result I know that kernel 5.3.0-0.rc3.git1.1 (commit 33920f1ec5bf) works fine, but 5.3.0-0.rc4.git0.1 (tag v5.3-rc4) does not. That is "just" ~260 commits, made between 7 and 11 Aug.

However, there are no nouveau-related commits in that range and just a few drm-related ones. Maybe a nouveau/drm developer here could help me find out which change broke it.
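
To give a sense of the effort involved, a bisect over a range of that size can be sketched as a plain binary search. This is only an illustration under the assumption of a linear history; real `git bisect` works on the commit graph, so its step count can differ slightly. The function names and the `is_bad` callback (which stands for "build and boot this kernel, then check for the hang") are hypothetical.

```python
def max_bisect_steps(commit_count):
    """Worst-case number of kernels to test for `commit_count` candidate commits."""
    if commit_count < 1:
        raise ValueError("need at least one candidate commit")
    # Equals ceil(log2(commit_count)), computed exactly with integers.
    return (commit_count - 1).bit_length()


def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit.

    `commits` is ordered oldest to newest. The last entry is known to be bad
    and the commit just before the first entry is known to be good.
    Returns (first bad commit, number of commits tested).
    """
    if not commits:
        raise ValueError("need at least one candidate commit")
    low, high = 0, len(commits) - 1  # the first bad commit is in commits[low..high]
    tests = 0
    while low < high:
        mid = (low + high) // 2
        tests += 1
        if is_bad(commits[mid]):
            high = mid       # the culprit is at mid or earlier
        else:
            low = mid + 1    # the culprit is after mid
    return commits[low], tests
```

With about 260 candidates, `max_bisect_steps(260)` gives 9, so the range can be narrowed down with at most nine kernel builds and test boots.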

Comment 1 Marcin Zajaczkowski 2020-01-18 21:20:58 UTC
After the discussion on the mailing list [1], with help from Ilia Mirkin, further bisecting pointed to commit [2] as the culprit. Later on it turned out that the problem had already been solved: with 5.5.0-rc2+ (tested with kernel-core-5.5.0-0.rc2.git0.1.fc32.x86_64 and kernel-core-5.5.0-0.rc5.git0.1.fc32.x86_64) the problem doesn't occur and my TU116 works fine. Closing.

[1] - https://lists.freedesktop.org/archives/nouveau/2019-December/034943.html
[2] - https://github.com/torvalds/linux/commit/0acf5676dc0ffe0683543a20d5ecbd112af5b8ee