Bug 1650224 - kernel 4.19.2 and nvidia issue
Summary: kernel 4.19.2 and nvidia issue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1651452 1652510
Depends On:
Blocks:
 
Reported: 2018-11-15 15:51 UTC by Sammy
Modified: 2018-12-01 20:41 UTC

Fixed In Version: kernel-4.19.5-200.fc28 kernel-4.19.5-300.fc29
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-01 02:05:43 UTC



Description Sammy 2018-11-15 15:51:29 UTC
I installed kernel 4.19.2 from koji on my system and something weird happened: no browser would start (firefox, gnome, konqueror), and when I tried to reboot, the system became unresponsive and required a power-down. I looked in /var/log/messages, and the problem may be the interaction with the NVIDIA drivers from rpmfusion (version 410.73, the latest). Going back to 4.18.18 resolves the problem.

The messages are:
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
kernel: PGD 0 P4D 0 
kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 11 PID: 6759 Comm: gst-plugin-scan Tainted: P           OE     4.19.2-300.fc29.x86_64 #1
kernel: Hardware name: Dell Inc. Precision 7920 Tower/0RN4PJ, BIOS 1.8.4 10/05/2018
kernel: RIP: 0010:drm_lease_owner+0xd/0x20 [drm]
kernel: Code: 83 c4 18 5b 5d c3 b8 ea ff ff ff eb e2 b8 ed ff ff ff eb db e8 b4 79 55 c8 0f 1f 40 00 0f 1f 44 00 00 48 89 f8 eb 03 48 89 d0 <48> 8b 90 80 00 00 00 48 85 d2 75 f1 c3 66 0f 1f 44 00 00 0f 1f 44
kernel: RSP: 0018:ffff9db20fbcfb90 EFLAGS: 00010202
kernel: RAX: 0000000000000000 RBX: ffff88adf1db0200 RCX: ffff88adf1db02c8
kernel: RDX: ffff88adf47a0000 RSI: 0000000000000000 RDI: 0000000000000000
kernel: RBP: ffff88ae5b93d800 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: ffffea2ba0bcf300 R11: 0000000000000002 R12: ffff88ae5b93d888
kernel: R13: 0000000000000000 R14: ffff88adf1db02c8 R15: dead000000000100
kernel: FS:  00007f70496cd740(0000) GS:ffff88a65fec0000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000080 CR3: 0000000806ef4002 CR4: 00000000007606e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: drm_is_current_master+0x1a/0x30 [drm]
kernel: drm_master_release+0x3e/0x140 [drm]
kernel: drm_file_free.part.4+0x2db/0x2e0 [drm]
kernel: drm_open+0x1e5/0x200 [drm]
kernel: ? drm_dev_enter+0x19/0x50 [drm]
kernel: drm_stub_open+0xaf/0xf0 [drm]
kernel: chrdev_open+0xa2/0x1c0
kernel: ? cdev_put.part.0+0x20/0x20
kernel: do_dentry_open+0x132/0x340
kernel: path_openat+0x33a/0x1610
kernel: ? sprintf+0x56/0x70
kernel: ? uevent_show+0xde/0x100
kernel: do_filp_open+0x93/0x100
kernel: ? __check_object_size+0xa3/0x181
kernel: do_sys_open+0x186/0x210
kernel: do_syscall_64+0x5b/0x160
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7f704992859f
kernel: Code: 52 89 f0 25 00 00 41 00 3d 00 00 41 00 74 44 8b 05 b6 ee 00 00 85 c0 75 65 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 9d 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
kernel: RSP: 002b:00007ffe430966a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
kernel: RAX: ffffffffffffffda RBX: 000055dd3ee2f170 RCX: 00007f704992859f
kernel: RDX: 0000000000080002 RSI: 000055dd3ee31ec0 RDI: 00000000ffffff9c
kernel: RBP: 000055dd3ee2f210 R08: 00007f7049276a60 R09: 0048544150564544
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055dd3ee33250
kernel: R13: 000055dd3ee2f170 R14: 000055dd3ee31be0 R15: 000055dd3ee31420
kernel: Modules linked in: fuse ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c sunrpc vfat fat nvidia_drm(POE) usblp nvidia_modeset(POE) nvidia_uvm(POE) snd_hda_codec_hdmi intel_rapl skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel nvidia(POE) kvm irqbypass snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mei_wdt intel_cstate iTCO_wdt iTCO_vendor_support snd_hda_intel snd_usb_audio snd_hda_codec dell_smm_hwmon snd_hda_core snd_usbmidi_lib snd_rawmidi uvcvideo snd_hwdep drm_kms_helper snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_device videobuf2_v4l2 videobuf2_common snd_pcm drm dell_wmi intel_uncore videodev snd_timer dell_smbios sparse_keymap
Nov 15 07:28:04 compsci kernel: dcdbas snd mei_me ipmi_devintf media video intel_rapl_perf wmi_bmof dell_wmi_descriptor intel_wmi_thunderbolt ipmi_msghandler e1000e soundcore mei lpc_ich ioatdma i2c_i801 dca pcc_cpufreq binfmt_misc crc32c_intel uas usb_storage vmd ata_generic pata_acpi wmi
kernel: CR2: 0000000000000080
kernel: ---[ end trace 933d11f6746e2b1a ]---
kernel: RIP: 0010:drm_lease_owner+0xd/0x20 [drm]
kernel: Code: 83 c4 18 5b 5d c3 b8 ea ff ff ff eb e2 b8 ed ff ff ff eb db e8 b4 79 55 c8 0f 1f 40 00 0f 1f 44 00 00 48 89 f8 eb 03 48 89 d0 <48> 8b 90 80 00 00 00 48 85 d2 75 f1 c3 66 0f 1f 44 00 00 0f 1f 44
kernel: RSP: 0018:ffff9db20fbcfb90 EFLAGS: 00010202
kernel: RAX: 0000000000000000 RBX: ffff88adf1db0200 RCX: ffff88adf1db02c8
kernel: RDX: ffff88adf47a0000 RSI: 0000000000000000 RDI: 0000000000000000
kernel: RBP: ffff88ae5b93d800 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: ffffea2ba0bcf300 R11: 0000000000000002 R12: ffff88ae5b93d888
kernel: R13: 0000000000000000 R14: ffff88adf1db02c8 R15: dead000000000100
kernel: FS:  00007f70496cd740(0000) GS:ffff88a65fec0000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000080 CR3: 0000000806ef4002 CR4: 00000000007606e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: PKRU: 55555554
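
For triage, the key fields of the oops above (faulting address, faulting function and module, taint flags) can be pulled out mechanically. The following is a minimal Python sketch, not part of any kernel tooling: the helper name `summarize_oops` and its regular expressions are assumptions written against the log lines quoted in this report, not a general oops parser.

```python
import re


def summarize_oops(text):
    """Extract a few fields from oops text shaped like the log in this report.

    Hypothetical helper for illustration; fields that are not found are
    simply left out of the returned dict.
    """
    summary = {}

    # "BUG: unable to handle kernel NULL pointer dereference at <hex addr>"
    m = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    if m:
        summary["fault_address"] = int(m.group(1), 16)

    # "RIP: 0010:drm_lease_owner+0xd/0x20 [drm]" (kernel-mode RIP only)
    m = re.search(
        r"RIP: 0010:(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?", text
    )
    if m:
        summary["function"] = m.group(1)
        summary["module"] = m.group(2)

    # "Tainted: P           OE     4.19.2-..." -> "POE"
    m = re.search(r"Tainted: ([A-Z ]+?)\s+\d", text)
    if m:
        summary["taint_flags"] = m.group(1).replace(" ", "")

    return summary


sample = """\
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
kernel: CPU: 11 PID: 6759 Comm: gst-plugin-scan Tainted: P           OE     4.19.2-300.fc29.x86_64 #1
kernel: RIP: 0010:drm_lease_owner+0xd/0x20 [drm]
"""
print(summarize_oops(sample))
```

For this log the sketch reports a fault address of 0x80 in `drm_lease_owner` in the `drm` module, with taint flags `POE` (proprietary, out-of-tree, and unsigned modules loaded). The fault address together with RAX being 0 in the register dump is consistent with a read at offset 0x80 from a NULL pointer.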

Comment 1 Jeremy Cline 2018-11-15 16:41:47 UTC
Hi Sammy,

It usually takes NVidia a while after a new kernel is released to update their closed source drivers to work with it. Unfortunately, there's nothing we can do about that. You'll need to run 4.18 in the meantime or try to use the nouveau driver.

Comment 2 Sammy 2018-11-16 18:18:52 UTC
Unfortunately, nouveau also froze after an hour of use, with no major activity other than browser use (firefox, google-chrome). I went back to 4.18.18 and the nvidia drivers (410.78), with which everything works fine.

Comment 3 Niklas Fischer 2018-11-22 14:02:32 UTC
*** Bug 1652510 has been marked as a duplicate of this bug. ***

Comment 4 Niklas Fischer 2018-11-22 14:03:45 UTC
It seems a patch for this issue was submitted on the LKML: https://lkml.org/lkml/2018/11/19/93

Comment 5 Bernard 2018-11-23 17:52:41 UTC
Interesting. Even more interesting is that I am personally using and have access to nine other Linux distributions running kernel 4.19, all using NVIDIA drivers, and none of them exhibit this problem, only Fedora 29.

Linux Mint 19 and Ubuntu 18.10 with mainline kernel 4.19.4-041904-generic and NVIDIA 415.13, openSUSE Leap 15.0 with kernel 4.19.3-2.1.gedac906 and NVIDIA 410.78, Sabayon Linux with kernel 4.19.2 and NVIDIA 410.57, Arch Linux and Antergos Linux with kernel 4.19.2-arch1-1-ARCH and NVIDIA 415.18, openSUSE Tumbleweed with kernel 4.19.2-1.3 and NVIDIA 410.78 and Solus 3 with kernel 4.19.1 and NVIDIA 410.73.

Ironically, I've also tested Korora 28, an offshoot of Fedora, using kernel 4.19.3-200.fc28 and NVIDIA 410.78 from RPMFusion, and it worked perfectly. So it seems that only on Fedora 29 have I encountered this problem.

Good that there is a patch, but for those of us who do not compile their own systems, is this problem going to be fixed for Fedora 29? Since kernel 4.18 was just EOLed, that leaves no upgrade path forward.

Comment 6 Sammy 2018-11-24 23:12:47 UTC
Setting nvidia-drm.modeset=0 on the kernel line in grub.cfg results in a working system, and the bug message goes away as well. Naturally, this is not the optimal solution. Could we add this patch to the Fedora kernel?
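
To confirm that the workaround parameter is actually present on a kernel command line, the line can be split into name/value pairs and checked. This is a minimal Python sketch under stated assumptions: `parse_cmdline` and `nvidia_modeset_disabled` are hypothetical helper names, the example command line is made up for illustration, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "ro" map to None. Quoted values containing
    spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # Dashes and underscores are equivalent in kernel parameter
        # names, so normalize to one form before storing.
        params[name.replace("-", "_")] = value if sep else None
    return params


def nvidia_modeset_disabled(cmdline):
    """True if the command line carries nvidia-drm.modeset=0."""
    return parse_cmdline(cmdline).get("nvidia_drm.modeset") == "0"


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz-4.19.2-300.fc29.x86_64 ro quiet nvidia-drm.modeset=0"
print(nvidia_modeset_disabled(example))
```

On a running system the same function can be given the contents of /proc/cmdline instead of the example string.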

I would build the kernel rpm myself with the patch, but for some reason when I try to build the kernel it uses a tremendous amount of disk space and fills my root partition. Is there a way to build without this much disk space?

Why was this closed as cantfix?

Comment 7 Sammy 2018-11-24 23:24:09 UTC
Longer thread for the patch:

https://patchwork.kernel.org/patch/10688303/

Comment 8 Bernard 2018-11-26 00:57:12 UTC
I believe it was closed because an earlier responder quickly blamed the NVIDIA drivers. While it is true that new kernel releases have often broken the NVIDIA drivers until NVIDIA could produce an updated version, I believe the culprit this time is the Linux kernel itself, as indicated by the fact that a kernel patch is being worked on. This wouldn't be done for a third-party driver issue, especially not for NVIDIA. There is much hostility toward NVIDIA within the open-source community.

I agree that this bug should be reopened. As long as it remains as it is, no other bug can be opened for the same problem as they are then flagged as duplicates. Hopefully the patch for this problem will be eventually added to the mainstream kernel code and show up in a future release.

Comment 9 Wolfgang Ulbrich 2018-11-26 10:18:56 UTC
Please re-open.
The Nvidia driver works fine with kernel-4.18.18.
And I don't see the issue in rawhide with kernel-4.20 rc1.
So why should this be a problem with the nvidia driver?
And using the nouveau driver as an alternative is not a solution for newer cards.
With my geforce-1030 card I have serious glitches with the nouveau driver.
As a Fedora maintainer I need a working Fedora system.

Btw.
The kernel changelog for 4.19.2 contains a few commits related to `drm` or `HDMI` outputs.

Comment 10 Sammy 2018-11-26 13:48:26 UTC
I built a patched kernel 4.19.4 using the above patch and Fedora koji kernel
source rpm. Everything is working well with nvidia-drm.modeset=1.

Comment 11 Jeremy Cline 2018-11-26 15:38:59 UTC
(In reply to Sammy from comment #6)
> Setting nvidia-drm.modeset=0 on the kernel line in grub.cfg results in a
> working system, and the bug message goes away as well. Naturally, this is
> not the optimal solution. Could we add this patch to the Fedora kernel?

The patch is Cc'd for a future v4.19 stable update. I will add it to the next kernel build (v4.19.5).

> 
> I would build the kernel rpm myself with the patch, but for some reason when
> I try to build the kernel it uses a tremendous amount of disk space and fills
> my root partition. Is there a way to build without this much disk space?
> 
> Why was this closed as cantfix?

It was closed because typically, nothing can be done from our side for NVidia driver problems.

Comment 12 Sammy 2018-11-26 16:38:13 UTC
There seems to be a v2 of the patch that has been committed to the drm-misc-fixes git tree:

https://cgit.freedesktop.org/drm-misc/commit/?h=drm-misc-fixes&id=afca3f41dc386e9020ab560937d52bb6f19bb6d4

Comment 13 Jeremy Cline 2018-11-26 19:52:57 UTC
*** Bug 1651452 has been marked as a duplicate of this bug. ***

Comment 14 Fedora Update System 2018-11-28 15:10:21 UTC
kernel-headers-4.19.5-300.fc29 kernel-4.19.5-300.fc29 kernel-tools-4.19.5-300.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-87ba0312c2

Comment 15 Fedora Update System 2018-11-28 15:13:36 UTC
kernel-headers-4.19.5-200.fc28 kernel-tools-4.19.5-200.fc28 kernel-4.19.5-200.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-3857a8b41a

Comment 16 Fedora Update System 2018-11-29 02:03:32 UTC
kernel-4.19.5-200.fc28, kernel-headers-4.19.5-200.fc28, kernel-tools-4.19.5-200.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-3857a8b41a

Comment 17 Fedora Update System 2018-11-29 03:18:30 UTC
kernel-4.19.5-300.fc29, kernel-headers-4.19.5-300.fc29, kernel-tools-4.19.5-300.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-87ba0312c2

Comment 18 Bernard 2018-11-29 17:25:57 UTC
I'm happy to report that kernel 4.19.5-300.fc29 does indeed seem to fix the problem, at least on my platform. Hopefully the patch will be accepted into the mainstream kernel so that the next iteration does not need to be manually patched. Thank you for helping with this.

Comment 19 Fedora Update System 2018-12-01 02:05:43 UTC
kernel-4.19.5-200.fc28, kernel-headers-4.19.5-200.fc28, kernel-tools-4.19.5-200.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2018-12-01 20:41:07 UTC
kernel-4.19.5-300.fc29, kernel-headers-4.19.5-300.fc29, kernel-tools-4.19.5-300.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.
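
To check whether a Fedora kernel release string is at or past the builds listed under "Fixed In Version" (kernel-4.19.5), the upstream version can be compared numerically. This is a minimal Python sketch: the helper names `kernel_version` and `at_or_past_fixed_build` are assumptions, and the comparison only looks at the leading `x.y.z` part of the release string. It says nothing about whether kernels from other distributions carry the patch, and kernels older than 4.19 (such as the unaffected 4.18.18) simply compare as earlier.

```python
import re

# Upstream version of the first fixed Fedora builds,
# taken from "Fixed In Version": kernel-4.19.5-200.fc28 / -300.fc29.
FIXED = (4, 19, 5)


def kernel_version(release):
    """Return the leading x.y.z of a release string as a tuple of ints."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not m:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in m.groups())


def at_or_past_fixed_build(release):
    """True if the release is version 4.19.5 or later (numeric compare)."""
    return kernel_version(release) >= FIXED


for release in ("4.19.2-300.fc29.x86_64", "4.19.5-300.fc29.x86_64"):
    print(release, at_or_past_fixed_build(release))
```

Comparing tuples of integers rather than raw strings keeps a release such as 4.19.10 ordered after 4.19.5. On a live system, `platform.release()` from the Python standard library returns the running kernel's release string and could be passed in directly.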

