Bug 1899060 - Nouveau driver crashing on P15 with latest Fedora 33
Summary: Nouveau driver crashing on P15 with latest Fedora 33
Keywords:
Status: CLOSED DUPLICATE of bug 1902798
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 816768
Reported: 2020-11-18 13:54 UTC by Mark Pearson
Modified: 2021-02-10 13:54 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-10 13:54:32 UTC
Type: Bug
Embargoed:


Attachments
dmesg log (622.67 KB, text/plain)
2020-11-18 13:54 UTC, Mark Pearson
Issue seen when switching from USB-c to HDMI (112.25 KB, text/plain)
2020-11-18 17:53 UTC, Mark Pearson

Description Mark Pearson 2020-11-18 13:54:27 UTC
Created attachment 1730560 [details]
dmesg log

1. Please describe the problem:
I have F33 installed on my P15 to do some final sanity checks and bug review as we get ready for web sales of this system.
Everything is updated to the latest versions: software, BIOS, EC, and ME.

When I boot in hybrid mode the Nouveau driver is getting a lot of crashes - I've attached the dmesg log but unfortunately wasn't able to catch the first crash.

I'm unable to get an external monitor working (HDMI, USB-C, or Thunderbolt). I get:
DRM: Dropped ACPI reprobe event due to RPM error: -22

It seems to work in discrete mode (though I hit some other issues there).
I'm afraid this one will gate the release of the Fedora P15, so it is high priority for us. Let me know what I can try, do, or collect to help. I need to collect more details from other systems, as it seems odd that nobody else has noticed this.

Nvidia card is a Quadro RTX 4000 mobile (TU104GLM)

2. What is the Version-Release number of the kernel: 5.9.8


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
The test team doesn't seem to have seen this, so I have to assume it worked previously. I haven't gone back to older versions myself to check; I will do that. This machine is new to me, so I don't have any historical results.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below: Can reproduce - no special steps needed


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Yes - see dmesg.txt2

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No, but as a note, I did enable the NVIDIA driver (RPM Fusion), did some testing with it (it works), and then disabled it again.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
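
For triaging a log collected this way, a small filter can pull out the GPU-related lines. The following is a minimal sketch in Python and is not part of the original report; the function name `filter_gpu_lines` and the pattern list are assumptions, built only from messages quoted in this bug.

```python
import re

# Assumed patterns: the driver name plus the two messages quoted in this bug.
# Extend the list for other hardware or other failure signatures.
GPU_PATTERNS = re.compile(
    r"nouveau|Dropped ACPI reprobe event|acr: .* binary failed",
    re.IGNORECASE,
)


def filter_gpu_lines(lines):
    """Return the log lines that match any of the assumed GPU patterns."""
    return [line.rstrip("\n") for line in lines if GPU_PATTERNS.search(line)]


# Example use on a saved kernel log (the file name is whatever was passed
# to journalctl, for example dmesg.txt):
#
#   with open("dmesg.txt", encoding="utf-8", errors="replace") as fh:
#       for line in filter_gpu_lines(fh):
#           print(line)
```

The filter only narrows a large log for reading; the full dmesg output is still what should be attached to the bug.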

Comment 1 Mark Pearson 2020-11-18 17:52:18 UTC
I reinstalled F33 from scratch and it is now (mostly) working.

The only issue I had was when switching from USB-C to HDMI: on one occasion the display didn't come up, and there is a kernel warning in the dmesg log (I'll attach it).

I also hit an issue where gnome-shell crashes if I change the display to external-only while using Thunderbolt. I think we already have a bug report open for that, so I will update it (or raise a separate issue if not). With HDMI it works fine.

I have no idea why it all worked when reinstalling from scratch. Let me know if you want to keep this bug open for the HDMI kernel warning or if a separate bug for that is better...

Thanks
Mark

Comment 2 Mark Pearson 2020-11-18 17:53:06 UTC
Created attachment 1730635 [details]
Issue seen when switching from USB-c to HDMI

Comment 3 Mark Pearson 2020-11-18 18:16:58 UTC
Note: it seems the errors in the log, up to the "acr: AHESASC binary failed" line, happen when either the HDMI or the USB-C connection is removed.

The section after that line appears when a new connection is made. This seems to happen regardless of which connection was plugged in. The display can be made to work again by rebooting, but as soon as the connection is changed the issue happens again.

Comment 4 Mark Pearson 2020-12-16 13:59:33 UTC
I believe this has been fixed by the changes in bug 1896904 (though the issue in bug 1902798 is still present).

I'll do some checking, but we can probably close this as a dupe.

Mark

Comment 5 Mark Pearson 2021-02-10 13:54:32 UTC
The issue is still there, but it is only seen on the RTX 4000 (including on a P17 with the same card).
I'm going to mark this as a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1902798 so that we're only tracking this in one place.

Mark

*** This bug has been marked as a duplicate of bug 1902798 ***

