Bug 2224839 - 6.4.4 nvidia driver 535.54.03 doesn't work after kernel update
Summary: 6.4.4 nvidia driver 535.54.03 doesn't work after kernel update
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-23 13:05 UTC by aligoldenhat
Modified: 2023-08-02 15:56 UTC
CC List: 17 users

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:



Description aligoldenhat 2023-07-23 13:05:57 UTC
I am using a Lenovo Legion 5 Pro 16ITH6 with an NVIDIA RTX 3050 Mobile GPU.
After upgrading the kernel from 6.3.12 to 6.4.4, the NVIDIA driver fails at boot (nvidia-powerd.service fails to start). It still works fine when I boot into 6.3.12.

lspci output under the 6.3.12 kernel:
00:00.0 Host bridge: Intel Corporation 11th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller #1 (rev 05)
00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01)
00:04.0 Signal processing controller: Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant (rev 05)
00:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller #0 (rev 05)
00:07.0 PCI bridge: Intel Corporation Tiger Lake-H Thunderbolt 4 PCI Express Root Port #0 (rev 05)
00:07.2 PCI bridge: Intel Corporation Tiger Lake-H Thunderbolt 4 PCI Express Root Port #2 (rev 05)
00:0a.0 Signal processing controller: Intel Corporation Tigerlake Telemetry Aggregator Driver (rev 01)
00:0d.0 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 USB Controller (rev 05)
00:0d.2 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 NHI #0 (rev 05)
00:0d.3 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 NHI #1 (rev 05)
00:14.0 USB controller: Intel Corporation Tiger Lake-H USB 3.2 Gen 2x1 xHCI Host Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Tiger Lake-H Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Tiger Lake PCH CNVi WiFi (rev 11)
00:15.0 Serial bus controller: Intel Corporation Tiger Lake-H Serial IO I2C Controller #0 (rev 11)
00:15.1 Serial bus controller: Intel Corporation Tiger Lake-H Serial IO I2C Controller #1 (rev 11)
00:15.2 Serial bus controller: Intel Corporation Device 43ea (rev 11)
00:16.0 Communication controller: Intel Corporation Tiger Lake-H Management Engine Interface (rev 11)
00:17.0 SATA controller: Intel Corporation Tiger Lake SATA AHCI Controller (rev 11)
00:1d.0 PCI bridge: Intel Corporation Tiger Lake-H PCI Express Root Port #9 (rev 11)
00:1d.6 PCI bridge: Intel Corporation Device 43b6 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Tiger Lake-H LPC/eSPI Controller (rev 11)
00:1f.3 Audio device: Intel Corporation Tiger Lake-H HD Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Tiger Lake-H SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Tiger Lake-H SPI Controller (rev 11)
01:00.0 VGA compatible controller: NVIDIA Corporation GA107BM [GeForce RTX 3050 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 2291 (rev a1)
02:00.0 Non-Volatile memory controller: SK hynix Gold P31/BC711/PC711 NVMe Solid State Drive
5c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)




Reproducible: Always

Steps to Reproduce:
1. Upgrade the kernel to 6.4.4
2. Reboot

Comment 1 aligoldenhat 2023-07-23 13:10:41 UTC
Here is the lspci output, having booted into 6.4.4:

00:00.0 Host bridge: Intel Corporation 11th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller #1 (rev 05)
00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01)
00:04.0 Signal processing controller: Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant (rev 05)
00:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller #0 (rev 05)
00:07.0 PCI bridge: Intel Corporation Tiger Lake-H Thunderbolt 4 PCI Express Root Port #0 (rev 05)
00:07.2 PCI bridge: Intel Corporation Tiger Lake-H Thunderbolt 4 PCI Express Root Port #2 (rev 05)
00:0a.0 Signal processing controller: Intel Corporation Tigerlake Telemetry Aggregator Driver (rev 01)
00:0d.0 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 USB Controller (rev 05)
00:0d.2 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 NHI #0 (rev 05)
00:0d.3 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 NHI #1 (rev 05)
00:14.0 USB controller: Intel Corporation Tiger Lake-H USB 3.2 Gen 2x1 xHCI Host Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Tiger Lake-H Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Tiger Lake PCH CNVi WiFi (rev 11)
00:15.0 Serial bus controller: Intel Corporation Tiger Lake-H Serial IO I2C Controller #0 (rev 11)
00:15.1 Serial bus controller: Intel Corporation Tiger Lake-H Serial IO I2C Controller #1 (rev 11)
00:15.2 Serial bus controller: Intel Corporation Device 43ea (rev 11)
00:16.0 Communication controller: Intel Corporation Tiger Lake-H Management Engine Interface (rev 11)
00:17.0 SATA controller: Intel Corporation Tiger Lake SATA AHCI Controller (rev 11)
00:1d.0 PCI bridge: Intel Corporation Tiger Lake-H PCI Express Root Port #9 (rev 11)
00:1d.6 PCI bridge: Intel Corporation Device 43b6 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Tiger Lake-H LPC/eSPI Controller (rev 11)
00:1f.3 Audio device: Intel Corporation Tiger Lake-H HD Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Tiger Lake-H SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Tiger Lake-H SPI Controller (rev 11)
01:00.0 VGA compatible controller: NVIDIA Corporation GA107BM [GeForce RTX 3050 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 2291 (rev a1)
02:00.0 Non-Volatile memory controller: SK hynix Gold P31/BC711/PC711 NVMe Solid State Drive
5c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
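
The two lspci listings (6.3.12 and 6.4.4) appear to be identical, which would suggest the GPU is still enumerated on the PCI bus under the new kernel. A minimal Python sketch for checking that mechanically is given here; the function name `lspci_diff` and the two-line sample are illustrative assumptions, not part of the original report.

```python
import difflib


def lspci_diff(before: str, after: str) -> list[str]:
    """Return only the lines that differ between two lspci listings.

    Lines prefixed "- " appear only in `before`, "+ " only in `after`.
    """
    diff = difflib.ndiff(before.strip().splitlines(),
                         after.strip().splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Two lines copied from the listings above, used as a small sample.
SAMPLE = """\
01:00.0 VGA compatible controller: NVIDIA Corporation GA107BM [GeForce RTX 3050 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 2291 (rev a1)
"""

# An empty result means both kernels see the same set of devices.
changes = lspci_diff(SAMPLE, SAMPLE)
```

Saving each kernel's lspci output to a file and passing both file contents to `lspci_diff` would show any device that appears or disappears between boots.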


And the "systemctl status nvidia-powerd.service" output, having booted into 6.4.4:
× nvidia-powerd.service - nvidia-powerd service
     Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Sun 2023-07-23 16:36:48 +0330; 1min 43s ago
    Process: 1138 ExecStart=/usr/bin/nvidia-powerd (code=exited, status=1/FAILURE)
   Main PID: 1138 (code=exited, status=1/FAILURE)
        CPU: 8ms

Jul 23 16:36:48 fedora systemd[1]: Starting nvidia-powerd.service - nvidia-powerd service...
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: nvidia-powerd version:1.0(build 1)
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: Allocate client failed 38
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: Failed to initialize RM Client
Jul 23 16:36:48 fedora systemd[1]: nvidia-powerd.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 16:36:48 fedora systemd[1]: nvidia-powerd.service: Failed with result 'exit-code'.
Jul 23 16:36:48 fedora systemd[1]: Failed to start nvidia-powerd.service - nvidia-powerd service.

nvidia-smi output in 6.4.4:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Comment 2 Garrett Mitchener 2023-08-02 14:54:02 UTC
I have an Acer Predator Helios 300 from 2019, which is a laptop, with an Nvidia GTX 1660Ti.
Using Nvidia's drivers from rpmfusion, video works properly with kernel 6.3.13.
Specifically, I can plug an external monitor into the HDMI port, and it and the laptop's built-in display are both fully functional.
I'm using version 535.86.05 of the Nvidia drivers, which is a bit newer than in the original report.
After upgrading to kernel 6.4.6 and with these packages installed

nvidia-gpu-firmware-20230625-151.fc38.noarch
kmod-nvidia-6.3.11-200.fc38.x86_64-535.54.03-1.fc38.x86_64
xorg-x11-drv-nvidia-cuda-libs-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-libs-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-kmodsrc-535.86.05-1.fc38.x86_64
akmod-nvidia-535.86.05-1.fc38.x86_64
nvidia-settings-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-power-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-535.86.05-1.fc38.x86_64
nvidia-persistenced-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-cuda-535.86.05-1.fc38.x86_64
kmod-nvidia-6.3.12-200.fc38.x86_64-535.86.05-1.fc38.x86_64
kmod-nvidia-6.4.6-200.fc38.x86_64-535.86.05-1.fc38.x86_64

the laptop's screen is always blank after I select the new kernel in grub.

I can plug an external monitor into the laptop's HDMI port, and that works.
But the laptop's built-in screen isn't recognized by gnome or xrandr.
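
The package list above includes a kmod-nvidia build for 6.4.6, so the module does seem to have been built for the new kernel. A small Python sketch for tabulating which kernel each kmod-nvidia package targets follows; the name layout `kmod-nvidia-<kernel>-<driver>-<release>` is inferred from the list above rather than from any documented format, and `kmod_builds` is a hypothetical helper.

```python
import re

# Assumed layout: kmod-nvidia-<kernel release>-<driver version>-<pkg release.arch>
KMOD_RE = re.compile(
    r"^kmod-nvidia-(?P<kernel>.+?)-(?P<driver>\d+(?:\.\d+)+)-(?P<rest>[^-]+)$"
)


def kmod_builds(packages: list[str]) -> dict[str, str]:
    """Map kernel release -> NVIDIA driver version for each kmod-nvidia package."""
    builds = {}
    for name in packages:
        match = KMOD_RE.match(name.strip())
        if match:  # akmod-nvidia and other packages do not match and are skipped
            builds[match["kernel"]] = match["driver"]
    return builds


# Package names copied from the list above.
PACKAGES = [
    "kmod-nvidia-6.3.11-200.fc38.x86_64-535.54.03-1.fc38.x86_64",
    "akmod-nvidia-535.86.05-1.fc38.x86_64",
    "kmod-nvidia-6.3.12-200.fc38.x86_64-535.86.05-1.fc38.x86_64",
    "kmod-nvidia-6.4.6-200.fc38.x86_64-535.86.05-1.fc38.x86_64",
]

builds = kmod_builds(PACKAGES)
```

Looking up the running kernel release (as printed by `uname -r`) in `builds` indicates whether a module package exists for it and which driver version it carries.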

Kernel command line from /etc/default/grub is

pci=noaer rd.driver.blacklist=nouveau modprobe.blacklist=nouveau resume=UUID=733ad07d-7906-4fd7-baff-e0e24f52c9f6 rhgb quiet


Here's the result of lspci:

00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 07)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.3 Network controller: Intel Corporation Cannon Lake PCH CNVi WiFi (rev 10)
00:15.0 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1d.4 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #13 (rev f0)
00:1f.0 ISA bridge: Intel Corporation HM470 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU116M [GeForce GTX 1660 Ti Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)
06:00.0 Non-Volatile memory controller: SK hynix BC501 NVMe Solid State Drive
07:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
08:00.0 Ethernet controller: Qualcomm Atheros Killer E2500 Gigabit Ethernet Controller (rev 10)

Comment 3 Garrett Mitchener 2023-08-02 15:04:00 UTC
I just tried kernel 6.4.7 from updates-testing.
It's just like 6.4.6: built-in display is not recognized, external monitor works fine.

Comment 4 Garrett Mitchener 2023-08-02 15:53:28 UTC
I tried adding initcall_blacklist=simpledrm_platform_driver_init and nvidia-drm.modeset=1 to the kernel command line.
They didn't improve anything.

I'm also seeing a secondary symptom: the other virtual consoles, which should be accessible with Ctrl+Alt+F4 etc., are not working under kernel 6.4.7.
They work under 6.3.12.

Further discussion here:

https://forums.developer.nvidia.com/t/fedora-38-nvidia-driver-530-41-03/255724

https://forums.developer.nvidia.com/t/nvidia-driver-isnt-compatible-with-simpledrm-so-boot-output-and-ttys-are-blank/238007

Comment 5 Garrett Mitchener 2023-08-02 15:56:31 UTC
Also related:

https://bugzilla.redhat.com/show_bug.cgi?id=2071209

