Bug 2224839
| Summary: | 6.4.4 nvidia driver 535.54.03 doesn't work after kernel update | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | aligoldenhat |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | NEW | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 38 | CC: | acaringi, adscvr, airlied, alciregi, bskeggs, garrett.mitchener, hdegoede, hpa, jarodwilson, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved |
| Target Milestone: | --- | Keywords: | Upgrades |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
Description
aligoldenhat
2023-07-23 13:05:57 UTC
Here is the lspci output, having booted into 6.4.4:
00:00.0 Host bridge: Intel Corporation 11th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller #1 (rev 05)
00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01)
00:04.0 Signal processing controller: Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant (rev 05)
00:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller #0 (rev 05)
00:07.0 PCI bridge: Intel Corporation Tiger Lake-H Thunderbolt 4 PCI Express Root Port #0 (rev 05)
00:07.2 PCI bridge: Intel Corporation Tiger Lake-H Thunderbolt 4 PCI Express Root Port #2 (rev 05)
00:0a.0 Signal processing controller: Intel Corporation Tigerlake Telemetry Aggregator Driver (rev 01)
00:0d.0 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 USB Controller (rev 05)
00:0d.2 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 NHI #0 (rev 05)
00:0d.3 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 NHI #1 (rev 05)
00:14.0 USB controller: Intel Corporation Tiger Lake-H USB 3.2 Gen 2x1 xHCI Host Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Tiger Lake-H Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Tiger Lake PCH CNVi WiFi (rev 11)
00:15.0 Serial bus controller: Intel Corporation Tiger Lake-H Serial IO I2C Controller #0 (rev 11)
00:15.1 Serial bus controller: Intel Corporation Tiger Lake-H Serial IO I2C Controller #1 (rev 11)
00:15.2 Serial bus controller: Intel Corporation Device 43ea (rev 11)
00:16.0 Communication controller: Intel Corporation Tiger Lake-H Management Engine Interface (rev 11)
00:17.0 SATA controller: Intel Corporation Tiger Lake SATA AHCI Controller (rev 11)
00:1d.0 PCI bridge: Intel Corporation Tiger Lake-H PCI Express Root Port #9 (rev 11)
00:1d.6 PCI bridge: Intel Corporation Device 43b6 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Tiger Lake-H LPC/eSPI Controller (rev 11)
00:1f.3 Audio device: Intel Corporation Tiger Lake-H HD Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Tiger Lake-H SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Tiger Lake-H SPI Controller (rev 11)
01:00.0 VGA compatible controller: NVIDIA Corporation GA107BM [GeForce RTX 3050 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 2291 (rev a1)
02:00.0 Non-Volatile memory controller: SK hynix Gold P31/BC711/PC711 NVMe Solid State Drive
5c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
And the "systemctl status nvidia-powerd.service" output, having booted into 6.4.4:
× nvidia-powerd.service - nvidia-powerd service
Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; enabled; preset: enabled)
Drop-In: /usr/lib/systemd/system/service.d
└─10-timeout-abort.conf
Active: failed (Result: exit-code) since Sun 2023-07-23 16:36:48 +0330; 1min 43s ago
Process: 1138 ExecStart=/usr/bin/nvidia-powerd (code=exited, status=1/FAILURE)
Main PID: 1138 (code=exited, status=1/FAILURE)
CPU: 8ms
Jul 23 16:36:48 fedora systemd[1]: Starting nvidia-powerd.service - nvidia-powerd service...
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: nvidia-powerd version:1.0(build 1)
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: Allocate client failed 38
Jul 23 16:36:48 fedora /usr/bin/nvidia-powerd[1138]: Failed to initialize RM Client
Jul 23 16:36:48 fedora systemd[1]: nvidia-powerd.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 16:36:48 fedora systemd[1]: nvidia-powerd.service: Failed with result 'exit-code'.
Jul 23 16:36:48 fedora systemd[1]: Failed to start nvidia-powerd.service - nvidia-powerd service.
nvidia-smi output in 6.4.4:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I have a 2019 Acer Predator Helios 300 laptop with an Nvidia GTX 1660 Ti. Using Nvidia's drivers from RPM Fusion, video works properly with kernel 6.3.13: I can plug an external monitor into the HDMI port, and both it and the laptop's built-in display are fully functional. I'm using version 535.86.05 of the Nvidia drivers, which is a bit newer than the version in the original report.
After upgrading to kernel 6.4.6, the laptop's screen is always blank after I select the new kernel in GRUB. I can plug an external monitor into the laptop's HDMI port, and that works, but the laptop's built-in screen isn't recognized by GNOME or xrandr. These packages are installed:
nvidia-gpu-firmware-20230625-151.fc38.noarch
kmod-nvidia-6.3.11-200.fc38.x86_64-535.54.03-1.fc38.x86_64
xorg-x11-drv-nvidia-cuda-libs-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-libs-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-kmodsrc-535.86.05-1.fc38.x86_64
akmod-nvidia-535.86.05-1.fc38.x86_64
nvidia-settings-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-power-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-535.86.05-1.fc38.x86_64
nvidia-persistenced-535.86.05-1.fc38.x86_64
xorg-x11-drv-nvidia-cuda-535.86.05-1.fc38.x86_64
kmod-nvidia-6.3.12-200.fc38.x86_64-535.86.05-1.fc38.x86_64
kmod-nvidia-6.4.6-200.fc38.x86_64-535.86.05-1.fc38.x86_64
The kernel command line from /etc/default/grub is:
pci=noaer rd.driver.blacklist=nouveau modprobe.blacklist=nouveau resume=UUID=733ad07d-7906-4fd7-baff-e0e24f52c9f6 rhgb quiet
Here's the result of lspci:
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 07)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.3 Network controller: Intel Corporation Cannon Lake PCH CNVi WiFi (rev 10)
00:15.0 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1d.4 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #13 (rev f0)
00:1f.0 ISA bridge: Intel Corporation HM470 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU116M [GeForce GTX 1660 Ti Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)
06:00.0 Non-Volatile memory controller: SK hynix BC501 NVMe Solid State Drive
07:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
08:00.0 Ethernet controller: Qualcomm Atheros Killer E2500 Gigabit Ethernet Controller (rev 10)
I just tried kernel 6.4.7 from updates-testing. It behaves just like 6.4.6: the built-in display is not recognized, and the external monitor works fine.
I tried adding initcall_blacklist=simpledrm_platform_driver_init and nvidia-drm.modeset=1 to the kernel command line. Neither improved anything.
I'm also seeing a secondary symptom: the other virtual consoles, which should be accessible with Ctrl+Alt+F4 etc., are not working under kernel 6.4.7. They work under 6.3.12.
Further discussion here:
https://forums.developer.nvidia.com/t/fedora-38-nvidia-driver-530-41-03/255724
https://forums.developer.nvidia.com/t/nvidia-driver-isnt-compatible-with-simpledrm-so-boot-output-and-ttys-are-blank/238007
Also related: https://bugzilla.redhat.com/show_bug.cgi?id=2071209