Bug 1806257 - refcount_t: underflow; use-after-free
Status: NEW
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: Unspecified
OS: Linux
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2020-02-23 11:45 UTC by Sergei LITVINENKO
Modified: 2020-04-22 05:39 UTC (History)

Type: Bug


Attachments
dmesg.txt (118.92 KB, text/plain), 2020-02-23 11:45 UTC, Sergei LITVINENKO
nvidia module build log (113.57 KB, text/plain), 2020-03-14 22:05 UTC, Sergei LITVINENKO

Description Sergei LITVINENKO 2020-02-23 11:45:43 UTC
Created attachment 1665189 [details]
dmesg.txt

1. Please describe the problem:

An error message appears in the kernel log while the system boots:

[   40.918156] ------------[ cut here ]------------
[   40.919555] refcount_t: underflow; use-after-free.
[   40.920923] WARNING: CPU: 1 PID: 1205 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[   40.922313] Modules linked in: bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter nct6775 hwmon_vid vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) intel_rapl_msr intel_rapl_common uvcvideo x86_pkg_temp_thermal videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common intel_powerclamp videodev snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic coretemp ledtrig_audio mc kvm_intel joydev snd_hda_intel kvm snd_intel_dspcfg snd_hda_codec irqbypass snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm crct10dif_pclmul drm_kms_helper crc32_pclmul drm ghash_clmulni_intel snd_timer eeepc_wmi asus_wmi sparse_keymap rfkill video iTCO_wdt intel_cstate iTCO_vendor_support intel_uncore wmi_bmof intel_wmi_thunderbolt snd ipmi_devintf ipmi_msghandler soundcore lpc_ich pcspkr intel_rapl_perf mei_me mei i2c_i801 binfmt_misc ip_tables hid_logitech_hidpp mxm_wmi e1000e crc32c_intel r8169 hid_logitech_dj wmi fuse
[   40.935326] CPU: 1 PID: 1205 Comm: Xorg Tainted: P           OE     5.5.5-200.fc31.x86_64 #1
[   40.937114] Hardware name: ASUS All Series/SABERTOOTH X99, BIOS 3801 08/10/2017
[   40.938921] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[   40.940723] Code: 05 ee 0e 2e 01 01 e8 ab 95 bc ff 0f 0b c3 80 3d dc 0e 2e 01 00 75 95 48 c7 c7 70 8f 3c b0 c6 05 cc 0e 2e 01 01 e8 8c 95 bc ff <0f> 0b c3 80 3d bb 0e 2e 01 00 0f 85 72 ff ff ff 48 c7 c7 c8 8f 3c
[   40.944525] RSP: 0018:ffffae3f8385bd80 EFLAGS: 00010282
[   40.946459] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   40.948414] RDX: ffffa02ddfc67b80 RSI: ffffa02ddfc59cc8 RDI: ffffa02ddfc59cc8
[   40.950357] RBP: ffffa02dd91b44e8 R08: 0000000000000581 R09: ffffae3f806ff01c
[   40.952306] R10: 0000000000aaaaaa R11: 0000000000000000 R12: ffffa02dc27fe2e8
[   40.954272] R13: ffffa02dc27fe000 R14: 0000000000000008 R15: 0000000000000000
[   40.956232] FS:  00007f1950dbcf00(0000) GS:ffffa02ddfc40000(0000) knlGS:0000000000000000
[   40.958211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   40.960200] CR2: 0000557eb5c06b78 CR3: 000000080508a005 CR4: 00000000001606e0
[   40.962205] Call Trace:
[   40.964218]  nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
[   40.966278]  nv_drm_master_drop+0x22/0x60 [nvidia_drm]
[   40.968349]  drm_drop_master+0x1e/0x30 [drm]
[   40.970401]  drm_master_release+0x9f/0xb0 [drm]
[   40.972461]  drm_file_free.part.0+0x21d/0x270 [drm]
[   40.974532]  drm_release+0xa7/0xe0 [drm]
[   40.976580]  __fput+0xc1/0x250
[   40.978635]  task_work_run+0x8a/0xb0
[   40.980685]  exit_to_usermode_loop+0x102/0x130
[   40.982736]  do_syscall_64+0x1a4/0x1c0
[   40.984787]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   40.986853] RIP: 0033:0x7f195131b8e7
[   40.988905] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 e3 fb ff ff
[   40.993202] RSP: 002b:00007ffe0e48d3d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[   40.995398] RAX: 0000000000000000 RBX: 0000557eb5be1f20 RCX: 00007f195131b8e7
[   40.997618] RDX: 0000557eb5bf5000 RSI: 0000557eb5be2090 RDI: 000000000000000c
[   40.999782] RBP: 000000000000000c R08: 0000000000000001 R09: 0000557eb5be29f0
[   41.001907] R10: fffffffffffff206 R11: 0000000000000246 R12: 0000557eb5be2090
[   41.004033] R13: 0000557eb5be1f60 R14: 0000000000000000 R15: 0000000000000000
[   41.006117] ---[ end trace 872afc79335796e0 ]---


2. What is the Version-Release number of the kernel:

5.5.5-200.fc31.x86_64


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

5.4.19-200.fc31.x86_64 is not affected.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Boot the system into the default graphical runlevel. I did not find the message when booting into runlevel 3.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

File dmesg.txt is added
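
For anyone who wants to pull just the warning out of a long dmesg.txt, here is a minimal Python sketch. It assumes the log uses the usual "cut here" and "end trace" markers seen in this report; the function name `extract_warning_blocks` and the abridged sample are only illustrative.

```python
import re

# Markers that bracket a kernel WARNING/oops report in the log.
CUT_HERE = re.compile(r"-+\[ cut here \]-+")
END_TRACE = re.compile(r"-+\[ end trace [0-9a-f]+ \]-+")


def extract_warning_blocks(log_text):
    """Return a list of blocks; each block is the list of log lines
    from a 'cut here' marker through the matching 'end trace' marker."""
    blocks, current = [], None
    for line in log_text.splitlines():
        if CUT_HERE.search(line):
            current = [line]          # start a new block
        elif current is not None:
            current.append(line)
            if END_TRACE.search(line):
                blocks.append(current)
                current = None        # block finished
    return blocks


# Abridged sample made of lines quoted in this report.
sample = """\
[   41.517145] ------------[ cut here ]------------
[   41.518530] refcount_t: underflow; use-after-free.
[   41.519933] WARNING: CPU: 2 PID: 1214 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[   41.603971] ---[ end trace 404712e0d1005a69 ]---
[   42.652326] usb 5-1: reset high-speed USB device number 2 using xhci_hcd
"""

blocks = extract_warning_blocks(sample)
for block in blocks:
    print("\n".join(block))
```

To run it against a real log, read the file with `open("dmesg.txt").read()` and pass the text to the function.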

Comment 1 Justin M. Forbes 2020-03-03 16:17:52 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 31 kernel bugs.

Fedora 31 has now been rebased to 5.5.7-200.fc31.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32.

If you experience different issues, please open a new bug report for those.

Comment 2 Sergei LITVINENKO 2020-03-03 18:15:58 UTC
Kernel update did not help

[root@homedesk ~]# uname -r
5.5.7-200.fc31.x86_64


[root@homedesk ~]# uname -r
...
[   41.517145] ------------[ cut here ]------------
[   41.518530] refcount_t: underflow; use-after-free.
[   41.519933] WARNING: CPU: 2 PID: 1214 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[   41.521357] Modules linked in: bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter nct6775 hwmon_vid vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) uvcvideo intel_rapl_msr videobuf2_vmalloc videobuf2_memops intel_rapl_common nvidia(POE) videobuf2_v4l2 videobuf2_common videodev x86_pkg_temp_thermal mc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_powerclamp snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core coretemp joydev snd_hwdep kvm_intel drm_kms_helper snd_seq kvm eeepc_wmi snd_seq_device snd_pcm asus_wmi drm sparse_keymap irqbypass rfkill snd_timer crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_wmi_thunderbolt wmi_bmof snd ipmi_devintf intel_cstate video iTCO_wdt iTCO_vendor_support ipmi_msghandler intel_uncore soundcore mei_me pcspkr i2c_i801 intel_rapl_perf mei lpc_ich binfmt_misc ip_tables hid_logitech_hidpp mxm_wmi crc32c_intel e1000e r8169 wmi hid_logitech_dj fuse
[   41.534492] CPU: 2 PID: 1214 Comm: Xorg Tainted: P           OE     5.5.7-200.fc31.x86_64 #1
[   41.536307] Hardware name: ASUS All Series/SABERTOOTH X99, BIOS 3801 08/10/2017
[   41.538164] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[   41.540004] Code: 05 ee 07 2e 01 01 e8 6b 90 bc ff 0f 0b c3 80 3d dc 07 2e 01 00 75 95 48 c7 c7 c0 96 3c a7 c6 05 cc 07 2e 01 01 e8 4c 90 bc ff <0f> 0b c3 80 3d bb 07 2e 01 00 0f 85 72 ff ff ff 48 c7 c7 18 97 3c
[   41.543863] RSP: 0018:ffffb7fb0299bd80 EFLAGS: 00010282
[   41.545824] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   41.547821] RDX: ffff95871fca7bc0 RSI: ffff95871fc99cc8 RDI: ffff95871fc99cc8
[   41.549839] RBP: ffff958713021ce8 R08: 000000000000058a R09: ffffb7fb006ff01c
[   41.551881] R10: 0000000000aaaaaa R11: 0000000000000000 R12: ffff95871267eae8
[   41.553919] R13: ffff95871267e800 R14: 0000000000000008 R15: 0000000000000000
[   41.555958] FS:  00007f7ab1ea1f00(0000) GS:ffff95871fc80000(0000) knlGS:0000000000000000
[   41.558025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.560107] CR2: 0000558fc0273b78 CR3: 000000080311e004 CR4: 00000000001626e0
[   41.562245] Call Trace:
[   41.564375]  nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
[   41.566495]  nv_drm_master_drop+0x22/0x60 [nvidia_drm]
[   41.568568]  drm_drop_master+0x1e/0x30 [drm]
[   41.570625]  drm_master_release+0x9f/0xb0 [drm]
[   41.572669]  drm_file_free.part.0+0x21d/0x270 [drm]
[   41.574634]  drm_release+0xa7/0xe0 [drm]
[   41.576586]  __fput+0xc1/0x250
[   41.578509]  task_work_run+0x8a/0xb0
[   41.580424]  exit_to_usermode_loop+0x102/0x130
[   41.582347]  do_syscall_64+0x1a4/0x1c0
[   41.584241]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   41.586142] RIP: 0033:0x7f7ab24008e7
[   41.588036] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 e3 fb ff ff
[   41.592020] RSP: 002b:00007fffabff7b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[   41.594016] RAX: 0000000000000000 RBX: 0000558fc024ef20 RCX: 00007f7ab24008e7
[   41.596011] RDX: 0000558fc0262000 RSI: 0000558fc024f090 RDI: 000000000000000c
[   41.597986] RBP: 000000000000000c R08: 0000000000000001 R09: 0000558fc024f9f0
[   41.599976] R10: fffffffffffff206 R11: 0000000000000246 R12: 0000558fc024f090
[   41.601978] R13: 0000558fc024ef60 R14: 0000000000000000 R15: 0000000000000000
[   41.603971] ---[ end trace 404712e0d1005a69 ]---
[   42.652326] usb 5-1: reset high-speed USB device number 2 using xhci_hcd
[   51.803617] logitech-hidpp-device 0003:046D:4019.0005: HID++ 2.0 device connected.
[   54.061047] br0: port 1(eno1) entered learning state
[   69.420855] br0: port 1(eno1) entered forwarding state
[   69.420869] br0: topology change detected, propagating

Comment 3 Artem Silenkov 2020-03-04 10:19:00 UTC
I can confirm the same crash.

We see the same crash on Fedora 31 with the latest drivers from the RPM Fusion repo:
https://ask.fedoraproject.org/t/kernel-tainted-after-running-updates/5487/7

The crash was introduced with the 5.5.5 upgrade and is still present on 5.5.6 and 5.5.7-200.fc31.x86_64.


[code][ 7.503213] CPU: 1 PID: 414 Comm: plymouthd Tainted: P OE 5.5.6-201.fc31.x86_64 #1
[ 7.503218] Hardware name: Micro-Star International Co., Ltd. PS42 Modern 8RC/MS-14B2, BIOS E14B2IMS.106 12/06/2018
[ 7.503230] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[ 7.503239] Code: 05 fe 09 2e 01 01 e8 bb 92 bc ff 0f 0b c3 80 3d ec 09 2e 01 00 75 95 48 c7 c7 08 95 3c bb c6 05 dc 09 2e 01 01 e8 9c 92 bc ff <0f> 0b c3 80 3d cb 09 2e 01 00 0f 85 72 ff ff ff 48 c7 c7 60 95 3c
[ 7.503244] RSP: 0018:ffffb290407cbcb8 EFLAGS: 00010286
[ 7.503250] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
[ 7.503254] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff97732ec59cc0
[ 7.503258] RBP: ffff9773150874e8 R08: 0000000000000382 R09: 0000000000000003
[ 7.503262] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9773257282e8
[ 7.503265] R13: ffff977325728000 R14: 0000000000000000 R15: ffff977325750a00
[ 7.503272] FS: 00007fe4f52e9f00(0000) GS:ffff97732ec40000(0000) knlGS:0000000000000000
[ 7.503276] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.503280] CR2: 00007f7ed9315a30 CR3: 0000000464a1e001 CR4: 00000000003606e0
[ 7.503284] Call Trace:
[ 7.503313] nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
[ 7.503333] nv_drm_master_drop+0x22/0x60 [nvidia_drm]
[ 7.503396] drm_drop_master+0x1e/0x30 [drm]
[ 7.503452] drm_dropmaster_ioctl+0x4c/0x90 [drm]
[ 7.503506] ? drm_setmaster_ioctl+0xb0/0xb0 [drm]
[ 7.503565] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 7.503631] drm_ioctl+0x208/0x390 [drm]
[ 7.503686] ? drm_setmaster_ioctl+0xb0/0xb0 [drm]
[ 7.503701] ? do_filp_open+0xa5/0x100
[ 7.503718] do_vfs_ioctl+0x461/0x6d0
[ 7.503743] ksys_ioctl+0x5e/0x90
[ 7.503756] __x64_sys_ioctl+0x16/0x20
[ 7.503769] do_syscall_64+0x5b/0x1c0
[ 7.503785] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7.503794] RIP: 0033:0x7fe4f55a738b
[ 7.503802] Code: 0f 1e fa 48 8b 05 fd 9a 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cd 9a 0c 00 f7 d8 64 89 01 48
[ 7.503806] RSP: 002b:00007ffc4d2ede78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7.503813] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe4f55a738b
[ 7.503817] RDX: 0000000000000000 RSI: 000000000000641f RDI: 000000000000000b
[ 7.503821] RBP: 000000000000641f R08: 0000555981b9bd50 R09: 00007fe4f56ba380
[ 7.503824] R10: 0000000000000000 R11: 0000000000000246 R12: 0000555981b9bd80
[ 7.503828] R13: 000000000000000b R14: 0000000000000000 R15: 0000000000000000
[ 7.503839] ---[ end trace fe605e9abea0643f ]---

[/code]

The good news is that this crash happens only once, during the initial boot. Everything works fine afterwards.

~/# dnf list installed akmod-nvidia
Installed Packages
akmod-nvidia.x86_64 3:440.59-1.fc31 @rpmfusion-nonfree-updates

The crash occurs in the nv_drm_atomic_helper_disable_all function, either during plymouth init or in Xorg if plymouth is disabled.
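
To see at a glance which module owns the top frame, the "Call Trace" lines can be parsed mechanically. The following is a minimal Python sketch, assuming the `symbol+0xoffset/0xsize [module]` frame layout shown in the traces in this report; the name `parse_frames` and the "(core kernel)" label for frames without a module are just illustrative choices.

```python
import re

# One stack frame: optional "? " marker, symbol, offset/size, optional [module].
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\? )?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?: \[(\w+)\])?\s*$"
)


def parse_frames(trace_text):
    """Return (symbol, module, reliable) tuples for each stack frame line.
    Frames printed with a leading '?' are ones the unwinder could not
    verify, so they are reported with reliable=False."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            unreliable, symbol, _offset, _size, module = m.groups()
            frames.append((symbol, module or "(core kernel)", unreliable is None))
    return frames


# Abridged sample made of lines quoted in this comment.
sample = """\
[ 7.503230] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[ 7.503284] Call Trace:
[ 7.503313] nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
[ 7.503333] nv_drm_master_drop+0x22/0x60 [nvidia_drm]
[ 7.503396] drm_drop_master+0x1e/0x30 [drm]
[ 7.503506] ? drm_setmaster_ioctl+0xb0/0xb0 [drm]
[ 7.503756] __x64_sys_ioctl+0x16/0x20
"""

frames = parse_frames(sample)
for symbol, module, reliable in frames:
    print(f"{module:15} {symbol}{'' if reliable else '  (unverified)'}")
```

The first tuple in the result is the innermost caller of the warning function; in every trace posted here that frame belongs to nvidia_drm.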

Related thread on nvidia devs forum 
https://devtalk.nvidia.com/default/topic/1071120/linux/-bug-nvidia-440-64-kernel-5-5-6-stable-boot-trace-was-nvidia-440-59-kernel-5-5-1-stable-boot-trace/2

Comment 4 David Juran 2020-03-06 15:26:08 UTC
Clearing need_info based on Artem's comment

Comment 5 Steve 2020-03-14 18:44:20 UTC
(In reply to Sergei LITVINENKO from comment #0)
...
> [   40.922313] Modules linked in: ... nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) ...
...

(In reply to Artem Silenkov from comment #3)
...
> [code][ 7.503213] CPU: 1 PID: 414 Comm: plymouthd Tainted: P OE 5.5.6-201.fc31.x86_64 #1
...

The kernel is tainted with non-Fedora modules. Can you reproduce the problem with all non-Fedora packages removed?
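
For reference, the letters in parentheses after a module name are per-module taint flags: P means a proprietary license, O means the module was built out of tree, and E means it is unsigned. A minimal Python sketch for listing the flagged modules from a "Modules linked in:" line follows; the function name `tainting_modules` is hypothetical and only the three flags seen in this report are decoded.

```python
import re

# Only the flags that appear in this report are decoded here.
TAINT_FLAGS = {
    "P": "proprietary license",
    "O": "out-of-tree build",
    "E": "unsigned module",
}

# A module token is a name optionally followed by flags, e.g. "nvidia(POE)".
MODULE_RE = re.compile(r"^([A-Za-z0-9_]+)(?:\(([A-Z]+)\))?$")


def tainting_modules(modules_line):
    """Map each flagged module name to a list of decoded taint reasons."""
    _, _, names = modules_line.partition("Modules linked in:")
    flagged = {}
    for token in names.split():
        m = MODULE_RE.match(token)
        if m and m.group(2):
            flagged[m.group(1)] = [
                TAINT_FLAGS.get(flag, "unknown flag " + flag) for flag in m.group(2)
            ]
    return flagged


# Abridged from the module list in comment #0.
sample = (
    "[   40.922313] Modules linked in: bridge stp llc nvidia_drm(POE) "
    "nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) drm_kms_helper drm fuse"
)

found = tainting_modules(sample)
for name, reasons in found.items():
    print(name + ": " + ", ".join(reasons))
```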

Also, Fedora kernel-5.5.9-200.fc31 is currently in updates-testing:

# dnf update kernel --enablerepo=updates-testing

kernel-5.5.9-200.fc31, kernel-headers-5.5.9-200.fc31, & 1 more 
https://bodhi.fedoraproject.org/updates/FEDORA-2020-90a64eda89

Comment 6 Steve 2020-03-14 19:24:56 UTC
BTW, if anyone wants to test with Fedora pre-release kernels, they are here:

https://bodhi.fedoraproject.org/updates/?packages=kernel

(They have "gitN" in the version string.)

You have to manually install them:

1. Click "Builds".
2. Click the "kernel-..." link.
3. Download kernel-core and kernel-modules (at a minimum) into an empty directory.
4. Install with:

# dnf install kernel*.rpm

Pre-release kernels are not thoroughly tested, so take pertinent precautions.

Comment 7 Sergei LITVINENKO 2020-03-14 22:05:35 UTC
Created attachment 1670204 [details]
nvidia module build log

Comment 8 Sergei LITVINENKO 2020-03-14 22:08:06 UTC
I installed the packages listed below, but no luck: the nvidia module can't be built.



[root@homedesk kernel]# ls -1
kernel-5.6.0-0.rc5.git2.1.fc33.x86_64.rpm
kernel-core-5.6.0-0.rc5.git2.1.fc33.x86_64.rpm
kernel-devel-5.6.0-0.rc5.git2.1.fc33.x86_64.rpm
kernel-modules-5.6.0-0.rc5.git2.1.fc33.x86_64.rpm
kernel-modules-extra-5.6.0-0.rc5.git2.1.fc33.x86_64.rpm
kernel-modules-internal-5.6.0-0.rc5.git2.1.fc33.x86_64.rpm

Comment 9 Steve 2020-03-14 22:20:32 UTC
(In reply to Sergei LITVINENKO from comment #8)
> ... nvidia module can't be build ...

I'm sorry, but I don't know anything about building nvidia software. That's a tech support issue for nvidia. Did you try posting your log on the nvidia web site?

Comment 3 has a link at the end that might be useful.

Comment 10 Artem Silenkov 2020-03-15 07:06:30 UTC
Tried 5.6.0-0.rc5.git0.2.fc32.x86_64 #1 SMP Tue Mar 10 19:09:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Yes, the NVIDIA 440.64 drivers are partly compatible.
By the way, a patch is here: https://gitlab.com/snippets/1945940 (building it at the moment).

@steve - we can't reproduce this with all the proprietary stuff removed; this bug is an nvidia+kernel thing. I know we can't report it because of the tainted modules, but we might try to narrow it down to help the nvidia folks fix it. The kernel is evolving very fast these days. They keep removing and refactoring a lot of things.

Comment 11 Artem Silenkov 2020-03-15 07:18:46 UTC
No luck, crash is still here on 5.6.0-0.rc5.git0.2.fc32.x86_64 #1

[   10.461553] ------------[ cut here ]------------
[   10.461554] refcount_t: underflow; use-after-free.
[   10.461573] WARNING: CPU: 4 PID: 1183 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[   10.461574] Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat xfs snd_hda_codec_hdmi snd_soc_skl snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_codec_realtek intel_rapl_msr snd_hda_ext_core intel_rapl_common snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_generic ledtrig_audio snd_soc_core x86_pkg_temp_thermal intel_powerclamp snd_compress nvidia_drm(POE) ac97_bus iwlmvm nvidia_modeset(POE) snd_pcm_dmaengine coretemp snd_hda_intel kvm_intel mac80211 nvidia_uvm(OE) snd_intel_dspcfg snd_hda_codec rtsx_usb_sdmmc iTCO_wdt rtsx_usb_ms kvm iTCO_vendor_support libarc4 mmc_core snd_hda_core memstick msi_wmi mei_hdcp mxm_wmi i915 snd_hwdep irqbypass
[   10.461608]  sparse_keymap iwlwifi snd_seq crct10dif_pclmul crc32_pclmul snd_seq_device btusb snd_pcm nvidia(POE) btrtl ghash_clmulni_intel intel_cstate btbcm btintel intel_uncore cec snd_timer bluetooth intel_rapl_perf i2c_algo_bit snd ipmi_devintf cfg80211 i2c_i801 soundcore drm_kms_helper ipmi_msghandler rtsx_usb ecdh_generic joydev drm mei_me intel_xhci_usb_role_switch mei roles rfkill intel_pch_thermal ecc wmi video acpi_pad binfmt_misc ip_tables btrfs blake2b_generic xor zstd_decompress zstd_compress raid6_pq libcrc32c nvme crc32c_intel nvme_core serio_raw fuse
[   10.461633] CPU: 4 PID: 1183 Comm: Xorg Tainted: P           OE     5.6.0-0.rc5.git0.2.fc32.x86_64 #1
[   10.461634] Hardware name: Micro-Star International Co., Ltd. PS42 Modern 8RC/MS-14B2, BIOS E14B2IMS.106 12/06/2018
[   10.461636] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[   10.461638] Code: 05 d8 66 30 01 01 e8 63 03 bd ff 0f 0b c3 80 3d c6 66 30 01 00 75 95 48 c7 c7 b0 3a 3b a1 c6 05 b6 66 30 01 01 e8 44 03 bd ff <0f> 0b c3 80 3d a5 66 30 01 00 0f 85 72 ff ff ff 48 c7 c7 08 3b 3b
[   10.461640] RSP: 0018:ffff99bd415b3db0 EFLAGS: 00010282
[   10.461641] RAX: 0000000000000026 RBX: 0000000000000000 RCX: 00000000000003ae
[   10.461642] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff89486ed19cc0
[   10.461643] RBP: ffff89486aa35ce8 R08: 00000000000003ae R09: 0000000000000003
[   10.461644] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8948660c5ae8
[   10.461645] R13: ffff8948660c5800 R14: 0000000000000000 R15: dead000000000100
[   10.461647] FS:  00007f127b787f00(0000) GS:ffff89486ed00000(0000) knlGS:0000000000000000
[   10.461648] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.461649] CR2: 0000559649ba89e8 CR3: 000000040d8f0002 CR4: 00000000003606e0
[   10.461650] Call Trace:
[   10.461658]  nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
[   10.461679]  nv_drm_master_drop+0x22/0x60 [nvidia_drm]
[   10.461704]  drm_master_release+0xd1/0x130 [drm]
[   10.461724]  drm_file_free.part.0+0x228/0x280 [drm]
[   10.461744]  drm_release+0xa8/0x120 [drm]
[   10.461748]  __fput+0xc1/0x250
[   10.461752]  task_work_run+0x8a/0xb0
[   10.461756]  prepare_exit_to_usermode+0x1ba/0x1e0
[   10.461760]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   10.461762] RIP: 0033:0x7f127bce68e7
[   10.461764] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 e3 fb ff ff
[   10.461765] RSP: 002b:00007ffebd5ffa98 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[   10.461766] RAX: 0000000000000000 RBX: 0000559649b954f0 RCX: 00007f127bce68e7
[   10.461768] RDX: 0000559649b94ce0 RSI: 0000559649b95660 RDI: 000000000000000c
[   10.461769] RBP: 000000000000000c R08: 0000000000000006 R09: 0000559649b95ea0
[   10.461770] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559649b95660
[   10.461771] R13: 0000559649b95530 R14: 0000000000000000 R15: 0000000000000000
[   10.461773] ---[ end trace 86bd2e7c3aaf4ad1 ]---

I think we should wait a bit until the nvidia devs sort out their forum migration; the forum is read-only for now.

Comment 12 Sergei LITVINENKO 2020-03-15 09:01:43 UTC
In my opinion, we first need to answer another question: is this a new bug, or does the new kernel simply have better diagnostics?
Perhaps the error in the nvidia module is an old one that was never seen before because effective diagnostics were lacking.

Comment 13 Steve 2020-03-15 13:58:27 UTC
(In reply to Sergei LITVINENKO from comment #12)

Could you update the bug summary so that it is clear that this involves nvidia code:

"[TAINTED] [nvidia_drm] refcount_t: underflow; use-after-free"

> As for me, first we need to find the answer to another question. Is this a
> new bug or does the new kernel have better diagnostics?
> Perhaps the error in the nvidia module is outdated, but due to the lack of
> an effective diagnosis, it has never been seen.

If you are referring to the "use-after-free" error, the best way to answer your question is to clone the kernel git repo and use git to research the kernel code yourself. Here is what I have installed:

$ rpm -qa git\* | sort
git-2.21.1-1.fc30.x86_64
git-core-2.21.1-1.fc30.x86_64
git-core-doc-2.21.1-1.fc30.noarch
git-gui-2.21.1-1.fc30.noarch
gitk-2.21.1-1.fc30.noarch

See also:
https://git-scm.com/

As for your specific question, here is what I did:

$ git grep -l use-after-free

Found, among other files:

include/linux/refcount.h

I have a shallow git repo, which lacks a complete history, so here is a link to the history of refcount.h in the online kernel git repo:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/include/linux/refcount.h

Comment 14 Steve 2020-03-15 14:38:12 UTC
(In reply to Artem Silenkov from comment #10)
...
> @steve - we can't reproduce with all proprietory stuff removed, this bug is nvidia+kernel thing.

nvidia is in the best position to debug their drivers.

> I know we can't report it because of tainted modules
> but we might try to identify it to help nvidia folks fixing it.

OK, you can test with vanilla (non-Fedora) kernels built as RPM packages:

Kernel Vanilla Repositories
https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories

Those appear to be getting updated, and there are a lot of variants. I suggest the latest from kernel-vanilla-mainline:
https://repos.fedorapeople.org/repos/thl/

> Kernel is evolving very fast our days. They keep removing and refactoring a lot of things.

True enough:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/

Comment 15 Steve 2020-03-15 15:46:09 UTC
[   10.461554] refcount_t: underflow; use-after-free.

Here is some technical documentation on that kernel feature:

refcount_t API compared to atomic_t
https://www.kernel.org/doc/html/latest/core-api/refcount-vs-atomic.html

The API itself is documented in include/linux/refcount.h:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/refcount.h
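
To make the message concrete, here is a toy Python model of what a reference-count underflow looks like. It is a simplified sketch and not kernel code: the class name `ToyRefcount` is hypothetical, there is no atomicity, and the saturation value and message text are only modeled on what lib/refcount.c appears to do. The point is that dropping a reference after the count has already reached zero is flagged as a possible use-after-free, and the counter is then pinned to a "saturated" value.

```python
# Modeled on the kernel's saturation value (INT_MIN / 2); an assumption of this sketch.
SATURATED = -(2**31) // 2


class ToyRefcount:
    """Toy, single-threaded model of a saturating reference counter."""

    def __init__(self, value=1):
        self.value = value
        self.warnings = []   # the kernel prints a warning instead of storing it

    def _warn_saturate(self, message):
        # Pin the counter so that later operations cannot bring it back to life.
        self.value = SATURATED
        self.warnings.append("refcount_t: " + message)

    def inc(self):
        old = self.value
        self.value += 1
        if old == 0:
            # Taking a reference on an object whose count already hit zero.
            self._warn_saturate("addition on 0; use-after-free.")

    def dec_and_test(self):
        """Drop one reference; return True if it was the last one."""
        old = self.value
        self.value -= 1
        if old == 1:
            return True      # last reference gone, the owner may free the object
        if old <= 0:
            # More puts than gets: the object may already have been freed.
            self._warn_saturate("underflow; use-after-free.")
        return False


ref = ToyRefcount(1)
first_put = ref.dec_and_test()    # legitimate final put
second_put = ref.dec_and_test()   # one put too many
print(first_put, second_put, ref.warnings)
```

In the kernel source the message appears to be emitted through a warn-once macro, which may be why several reporters see the trace only once per boot; the toy model above records every occurrence instead.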

Comment 16 Robert 2020-03-24 09:34:36 UTC
Same here on a Lenovo E93Z AIO. Whenever it happens my second display blanks for a moment.

$ uname -r
5.5.10-200.fc31.x86_64

$ inxi -Gx
Graphics:  Device-1: Intel Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics vendor: Lenovo driver: i915 
           v: kernel bus ID: 00:02.0 
           Device-2: NVIDIA GF117M [GeForce 610M/710M/810M/820M / GT 620M/625M/630M/720M] vendor: Lenovo driver: nvidia 
           v: 390.132 bus ID: 01:00.0 
           Display: x11 server: Fedora Project X.org 1.20.6 driver: modesetting,nvidia unloaded: fbdev,nouveau,vesa 
           resolution: 1920x1080~60Hz, 1920x1200~60Hz 
           OpenGL: renderer: GeForce GT 720A/PCIe/SSE2 v: 4.6.0 NVIDIA 390.132 direct render: Yes

akmod-nvidia-390xx-390.132-4.fc31.x86_64
kmod-nvidia-390xx-5.5.10-200.fc31.x86_64-390.132-4.fc31.x86_64
kmod-nvidia-390xx-390.132-4.fc31.x86_64

[   10.357250] ------------[ cut here ]------------
[   10.357251] refcount_t: underflow; use-after-free.
[   10.357264] WARNING: CPU: 0 PID: 439 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[   10.357264] Modules linked in: intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mei_wdt mei_hdcp kvm iTCO_wdt iTCO_vendor_support uvcvideo irqbypass iwldvm rtsx_usb_ms videobuf2_vmalloc videobuf2_memops memstick videobuf2_v4l2 mac80211 videobuf2_common crct10dif_pclmul videodev crc32_pclmul btusb mc libarc4 btrtl btbcm btintel bluetooth ghash_clmulni_intel intel_cstate joydev snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi ecdh_generic iwlwifi ecc snd_hda_intel intel_uncore intel_rapl_perf nvidia_drm(POE) snd_intel_dspcfg nvidia_modeset(POE) snd_hda_codec snd_hda_core cfg80211 snd_hwdep nvidia_uvm(POE) snd_seq wmi_bmof snd_seq_device nvidia(POE) snd_pcm asus_wmi pcspkr sparse_keymap i2c_i801 rfkill snd_timer lpc_ich mei_me snd ipmi_devintf mei ipmi_msghandler soundcore vhba(OE) vboxnetadp(OE) vboxnetflt(OE) binfmt_misc vboxdrv(OE) ip_tables i915 uas usb_storage crc32c_intel i2c_algo_bit drm_kms_helper drm e1000e wmi
[   10.357290]  rtsx_usb_sdmmc mmc_core video rtsx_usb hid_multitouch hid_lenovo fuse
[   10.357293] CPU: 0 PID: 439 Comm: plymouthd Tainted: P           OE     5.5.10-200.fc31.x86_64 #1
[   10.357294] Hardware name: LENOVO 10BA003MMB/SHARKBAY, BIOS FFKT49AUS 06/29/2018
[   10.357296] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[   10.357297] Code: 05 2e 03 2e 01 01 e8 7b 8e bc ff 0f 0b c3 80 3d 1c 03 2e 01 00 75 95 48 c7 c7 18 99 3c ad c6 05 0c 03 2e 01 01 e8 5c 8e bc ff <0f> 0b c3 80 3d fb 02 2e 01 00 0f 85 72 ff ff ff 48 c7 c7 70 99 3c
[   10.357297] RSP: 0018:ffffba51c046fcb8 EFLAGS: 00010286
[   10.357298] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
[   10.357299] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff8f921ea19cc0
[   10.357299] RBP: ffff8f921470dc28 R08: 00000000000003b0 R09: 0000000000000003
[   10.357300] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f921cff2ae8
[   10.357300] R13: ffff8f921cff2800 R14: 0000000000000000 R15: ffff8f9217397e00
[   10.357301] FS:  00007f39022b9f00(0000) GS:ffff8f921ea00000(0000) knlGS:0000000000000000
[   10.357302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.357302] CR2: 00007f761e170be0 CR3: 000000041466c002 CR4: 00000000001606f0
[   10.357303] Call Trace:
[   10.357308]  nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
[   10.357312]  nv_drm_master_drop+0x22/0x60 [nvidia_drm]
[   10.357326]  drm_drop_master+0x1e/0x30 [drm]
[   10.357333]  drm_dropmaster_ioctl+0x4c/0x90 [drm]
[   10.357339]  ? drm_setmaster_ioctl+0xb0/0xb0 [drm]
[   10.357346]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[   10.357354]  drm_ioctl+0x208/0x390 [drm]
[   10.357360]  ? drm_setmaster_ioctl+0xb0/0xb0 [drm]
[   10.357362]  ? do_filp_open+0xa5/0x100
[   10.357364]  do_vfs_ioctl+0x461/0x6d0
[   10.357366]  ksys_ioctl+0x5e/0x90
[   10.357367]  __x64_sys_ioctl+0x16/0x20
[   10.357369]  do_syscall_64+0x5b/0x1c0
[   10.357372]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   10.357374] RIP: 0033:0x7f390257738b
[   10.357375] Code: 0f 1e fa 48 8b 05 fd 9a 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cd 9a 0c 00 f7 d8 64 89 01 48
[   10.357375] RSP: 002b:00007ffdbc472b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   10.357376] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f390257738b
[   10.357377] RDX: 0000000000000000 RSI: 000000000000641f RDI: 000000000000000b
[   10.357377] RBP: 000000000000641f R08: 000055ba59f02540 R09: 00007f390268a380
[   10.357377] R10: 0000000000000000 R11: 0000000000000246 R12: 000055ba59f04160
[   10.357378] R13: 000000000000000b R14: 0000000000000000 R15: 0000000000000000
[   10.357379] ---[ end trace 7937ac512bd46031 ]---

Comment 17 Daiver 2020-03-28 15:36:18 UTC
inxi -Gx
Graphics:  Device-1: NVIDIA TU106 [GeForce RTX 2060 Rev. A] vendor: ASUSTeK driver: nvidia v: 440.64 bus ID: 01:00.0 
           Display: server: Fedora Project X.org 1.20.6 driver: nvidia resolution: 3840x2160~60Hz 
           OpenGL: renderer: GeForce RTX 2060/PCIe/SSE2 v: 4.6.0 NVIDIA 440.64 direct render: Yes

uname -r
5.5.11-200.fc31.x86_64

[  128.081419] ------------[ cut here ]------------
[  128.081420] refcount_t: underflow; use-after-free.
[  128.081429] WARNING: CPU: 1 PID: 8484 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[  128.081430] Modules linked in: nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper drm ipmi_devintf ipmi_msghandler vfio_iommu_type1 vfio ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm irqbypass iTCO_wdt iTCO_vendor_support mei_hdcp crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg snd_hda_codec ghash_clmulni_intel intel_cstate btusb snd_hda_core intel_uncore btrtl snd_hwdep btbcm eeepc_wmi btintel snd_seq asus_wmi snd_seq_device sparse_keymap intel_rapl_perf intel_wmi_thunderbolt wmi_bmof pcspkr i2c_i801 bluetooth
[  128.081445]  snd_pcm snd_timer snd mei_me ecdh_generic rfkill joydev ecc mei soundcore ie31200_edac acpi_pad acpi_tad ip_tables uas hid_logitech_hidpp usb_storage mxm_wmi e1000e crc32c_intel nvme nvme_core wmi video pinctrl_cannonlake pinctrl_intel hid_logitech_dj fuse [last unloaded: ipmi_msghandler]
[  128.081451] CPU: 1 PID: 8484 Comm: Xorg Tainted: P           OE     5.5.11-200.fc31.x86_64 #1
[  128.081452] Hardware name: System manufacturer System Product Name/PRIME Z390-A, BIOS 1401 11/26/2019
[  128.081453] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[  128.081454] Code: 05 1e 02 2e 01 01 e8 6b 8d bc ff 0f 0b c3 80 3d 0c 02 2e 01 00 75 95 48 c7 c7 68 99 3c 82 c6 05 fc 01 2e 01 01 e8 4c 8d bc ff <0f> 0b c3 80 3d eb 01 2e 01 00 0f 85 72 ff ff ff 48 c7 c7 c0 99 3c
[  128.081454] RSP: 0018:ffffa72780767d80 EFLAGS: 00010282
[  128.081455] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
[  128.081455] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff941f5ea59cc0
[  128.081456] RBP: ffff941ecc567ce8 R08: 000000000000041c R09: 0000000000000001
[  128.081456] R10: 0000000000000000 R11: 0000000000000001 R12: ffff941ef68742e8
[  128.081456] R13: ffff941ef6874000 R14: 0000000000000008 R15: 0000000000000000
[  128.081457] FS:  00007f19c0f21f00(0000) GS:ffff941f5ea40000(0000) knlGS:0000000000000000
[  128.081457] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  128.081458] CR2: 000055e9f779c318 CR3: 00000003c1b76003 CR4: 00000000003606e0
[  128.081458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  128.081458] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  128.081459] Call Trace:
[  128.081463]  nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
[  128.081466]  ? nv_drm_master_drop+0x22/0x60 [nvidia_drm]
[  128.081475]  ? drm_drop_master+0x1e/0x30 [drm]
[  128.081479]  ? drm_master_release+0x9f/0xb0 [drm]
[  128.081483]  ? drm_file_free.part.0+0x21d/0x270 [drm]
[  128.081487]  ? drm_release+0xa7/0xe0 [drm]
[  128.081489]  ? __fput+0xc1/0x250
[  128.081491]  ? task_work_run+0x8a/0xb0
[  128.081492]  ? exit_to_usermode_loop+0x102/0x130
[  128.081493]  ? do_syscall_64+0x1a6/0x1c0
[  128.081495]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  128.081497] ---[ end trace 894678ced787e8b3 ]---

Comment 18 Heiko 2020-04-05 05:56:27 UTC
Hi, I have been experiencing the same issue for the last couple of kernel versions.

$ inxi -Gx
Graphics:  Device-1: NVIDIA GK208B [GeForce GT 730] vendor: ASUSTeK driver: nvidia v: 440.64 bus ID: 2e:00.0 
           Display: x11 server: Fedora Project X.org 1.20.6 driver: nvidia unloaded: fbdev,modesetting,vesa 
           resolution: 1280x1024~60Hz, 1920x1080~60Hz 
           OpenGL: renderer: GeForce GT 730/PCIe/SSE2 v: 4.6.0 NVIDIA 440.64 direct render: Yes 

$ uname -r
5.5.13-200.fc31.x86_64

 kernel: ------------[ cut here ]------------
 kernel: refcount_t: underflow; use-after-free.
 kernel: WARNING: CPU: 7 PID: 2465 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
 kernel: Modules linked in: vboxsf vboxguest vboxvideo drm_vram_helper drm_ttm_helper ttm rfkill ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) snd_hda_codec_hdmi edac_mce_amd snd_hda_codec_realtek kvm_amd ppdev snd_hda_codec_generic ledtrig_audio kvm nvidia(POE) uvcvideo snd_hda_intel irqbypass videobuf2_vmalloc snd_intel_dspcfg videobuf2_memops snd_hda_codec snd_usb_audio videobuf2_v4l2 videobuf2_common snd_hda_core snd_usbmidi_lib crct10dif_pclmul crc32_pclmul drm_kms_helper videodev snd_hwdep snd_seq snd_rawmidi ghash_clmulni_intel drm snd_seq_device mc snd_pcm ipmi_devintf sp5100_tco ipmi_msghandler
 kernel: i2c_piix4 snd_timer wmi_bmof k10temp pcspkr snd soundcore ccp parport_pc parport gpio_amdpt gpio_generic acpi_cpufreq vboxnetadp(OE) vboxnetflt(OE) binfmt_misc vboxdrv(OE) ip_tables raid1 mxm_wmi crc32c_intel r8169 uas usb_storage wmi pinctrl_amd fuse
 kernel: CPU: 7 PID: 2465 Comm: Xorg Tainted: P           OE     5.5.13-200.fc31.x86_64 #1
 kernel: Hardware name: Micro-Star International Co., Ltd. MS-7A33/X370 SLI PLUS (MS-7A33), BIOS 3.JR 11/29/2019
 kernel: RIP: 0010:refcount_warn_saturate+0xa6/0xf0
 kernel: Code: 05 3e fe 2d 01 01 e8 0b 8a bc ff 0f 0b c3 80 3d 2c fe 2d 01 00 75 95 48 c7 c7 08 9a 3c 97 c6 05 1c fe 2d 01 01 e8 ec 89 bc ff <0f> 0b c3 80 3d 0b fe 2d 01 00 0f 85 72 ff ff ff 48 c7 c7 60 9a 3c
 kernel: RSP: 0018:ffffc0e600717d80 EFLAGS: 00010282
 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
 kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff9b934e9d9cc0
 kernel: RBP: ffff9b9344e05ce8 R08: 000000000000049c R09: ffffc0e600aff01c
 kernel: R10: 0000000000aaaaaa R11: 0000000000000000 R12: ffff9b9347264ae8
 kernel: R13: ffff9b9347264800 R14: 0000000000000008 R15: 0000000000000000
 kernel: FS:  00007ff34df2ef00(0000) GS:ffff9b934e9c0000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 000055cde943eb48 CR3: 00000003c855e000 CR4: 00000000003406e0
 kernel: Call Trace:
 kernel: nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
 kernel: nv_drm_master_drop+0x22/0x60 [nvidia_drm]
 kernel: drm_drop_master+0x1e/0x30 [drm]
 kernel: drm_master_release+0x9f/0xb0 [drm]
 kernel: drm_file_free.part.0+0x21d/0x270 [drm]
 kernel: drm_release+0xa7/0xe0 [drm]
 kernel: __fput+0xc1/0x250
 kernel: task_work_run+0x8a/0xb0
 kernel: exit_to_usermode_loop+0x102/0x130
 kernel: do_syscall_64+0x1a6/0x1c0
 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
 kernel: RIP: 0033:0x7ff34e48d8e7
 kernel: Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 e3 fb ff ff
 kernel: RSP: 002b:00007ffea8e53ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
 kernel: RAX: 0000000000000000 RBX: 000055cde941a580 RCX: 00007ff34e48d8e7
 kernel: RDX: 000055cde9417620 RSI: 000055cde941a6f0 RDI: 000000000000000c
 kernel: RBP: 000000000000000c R08: 0000000000000001 R09: 000055cde941a970
 kernel: R10: fffffffffffff206 R11: 0000000000000246 R12: 000055cde941a6f0
 kernel: R13: 000055cde941a5c0 R14: 0000000000000000 R15: 0000000000000000
 kernel: ---[ end trace ccfd00bf51ecd556 ]---

Comment 19 Michael 2020-04-07 11:05:27 UTC
Same here on my Desktop.

$ uname -r
5.6.2-300.fc32.x86_64

$ inxi -Gx
Graphics:  Device-1: NVIDIA GK208 [GeForce GT 630 Rev. 2] vendor: Palit Microsystems driver: nvidia v: 440.64 bus ID: 01:00.0 
           Display: server: Fedora Project X.org 1.20.8 driver: nvidia unloaded: fbdev,modesetting,nouveau,vesa 
           resolution: 1920x1080~60Hz 
           OpenGL: renderer: GeForce GT 630/PCIe/SSE2 v: 4.6.0 NVIDIA 440.64 direct render: Yes

$ dmesg | sed -n -e '/cut here/,/end trace/ p'
[   15.381590] ------------[ cut here ]------------
[   15.381697] refcount_t: underflow; use-after-free.
[   15.381808] WARNING: CPU: 2 PID: 1007 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[   15.381938] Modules linked in: rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter zstd zstd_compress zstd_decompress sunrpc nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel nvidia(POE) edac_mce_amd kvm_amd snd_intel_dspcfg ccp snd_hda_codec kvm snd_hda_core snd_hwdep snd_seq irqbypass snd_seq_device snd_pcm joydev drm_kms_helper wmi_bmof snd_timer snd k10temp pcspkr ipmi_devintf ipmi_msghandler soundcore sp5100_tco i2c_piix4 acpi_cpufreq drm ip_tables uas usb_storage ata_generic serio_raw pata_acpi pata_atiixp atl1c wmi fuse
[   15.382642] CPU: 2 PID: 1007 Comm: Xorg Tainted: P           OE     5.6.2-300.fc32.x86_64 #1
[   15.382772] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./960GM-VGS3 FX, BIOS P1.40 07/23/2015
[   15.382904] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[   15.382991] Code: 05 58 63 30 01 01 e8 e0 fd bc ff 0f 0b c3 80 3d 46 63 30 01 00 75 95 48 c7 c7 b8 d9 3a b2 c6 05 36 63 30 01 01 e8 c1 fd bc ff <0f> 0b c3 80 3d 25 63 30 01 00 0f 85 72 ff ff ff 48 c7 c7 10 da 3a
[   15.383175] RSP: 0018:ffffb671020b7db0 EFLAGS: 00010282
[   15.383260] RAX: 0000000000000026 RBX: 0000000000000000 RCX: 0000000000000007
[   15.383347] RDX: 00000000fffffff8 RSI: 0000000000000082 RDI: ffff8de553c99cc0
[   15.383434] RBP: ffff8de52dd9d4e8 R08: 00000000000003bb R09: 0000000000000019
[   15.383521] R10: 000000000000072e R11: 0000000000000000 R12: ffff8de54c0bcae8
[   15.383609] R13: ffff8de54c0bc800 R14: 0000000000000008 R15: 0000000000000000
[   15.383697] FS:  00007fded6389f00(0000) GS:ffff8de553c80000(0000) knlGS:0000000000000000
[   15.383826] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.383911] CR2: 000055c1e078a9d8 CR3: 000000030d71e000 CR4: 00000000000006e0
[   15.383999] Call Trace:
[   15.384088]  nv_drm_atomic_helper_disable_all+0xec/0x290 [nvidia_drm]
[   15.384178]  nv_drm_master_drop+0x22/0x60 [nvidia_drm]
[   15.384293]  drm_master_release+0xd0/0x130 [drm]
[   15.384389]  drm_file_free.part.0+0x229/0x290 [drm]
[   15.384485]  drm_release+0xa7/0x110 [drm]
[   15.384570]  __fput+0xc1/0x250
[   15.384654]  task_work_run+0x8a/0xb0
[   15.384739]  prepare_exit_to_usermode+0x198/0x1c0
[   15.384825]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   15.384911] RIP: 0033:0x7fded68f7a17
[   15.384994] Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 f3 fb ff ff
[   15.385177] RSP: 002b:00007ffe953b24a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[   15.385306] RAX: 0000000000000000 RBX: 000055c1e0767560 RCX: 00007fded68f7a17
[   15.385393] RDX: 000055c1e074c010 RSI: 000055c1e07676d0 RDI: 000000000000000c
[   15.385480] RBP: 000000000000000c R08: 000055c1e07799e0 R09: 00007fded68dda40
[   15.385567] R10: fffffffffffff26e R11: 0000000000000246 R12: 000055c1e07676d0
[   15.385654] R13: 000055c1e07675a0 R14: 0000000000000000 R15: 0000000000000000
[   15.385745] ---[ end trace 3f53978e00a7cb4f ]---
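
For anyone collecting these traces from a saved log rather than a live dmesg, the sed range filter above can be approximated in Python. This is only a sketch: the function names `extract_warning_blocks` and `first_module_frame` are made up for illustration, and the frame regex assumes the `symbol+0xOFF/0xLEN [module]` layout seen in the traces in this report.

```python
import re

def extract_warning_blocks(log_text):
    """Collect blocks that start at a 'cut here' line and end at the next
    'end trace' line, like sed -n '/cut here/,/end trace/ p'."""
    blocks, current = [], None
    for line in log_text.splitlines():
        if current is None:
            if "cut here" in line:
                current = [line]
        else:
            current.append(line)
            if "end trace" in line:
                blocks.append("\n".join(current))
                current = None
    if current is not None:
        # sed prints an unterminated range up to end of input; keep it too.
        blocks.append("\n".join(current))
    return blocks

# Assumed frame layout: symbol+0xOFFSET/0xLENGTH [module]
FRAME_RE = re.compile(r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")

def first_module_frame(block):
    """Return (symbol, module) for the first frame that names a module, or None."""
    match = FRAME_RE.search(block)
    return (match.group(1), match.group(2)) if match else None
```

Run against the traces posted here, the first module frame in each block should point at nvidia_drm, which is a quick way to tell whether a new report is the same issue.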

Comment 20 Patrick Dung 2020-04-08 12:37:38 UTC
I upgraded to the F32 beta (kernel 5.6.2) and am using the Nvidia driver (440.82) released on 7 April; it is fine now.

Comment 21 Christoph Maser 2020-04-15 06:56:20 UTC
First impression: Nvidia driver 440.82 seems to fix the issue on F31 as well (5.5.15-200.fc31.x86_64).

Comment 22 Sergei LITVINENKO 2020-04-16 15:33:06 UTC
The issue looks fixed on F31.

5.5.16-200.fc31.x86_64
akmod-nvidia-440.82-1.fc31.x86_64

kmod-nvidia-5.5.16-200.fc31.x86_64-440.82-1.fc31.x86_64

Comment 23 Robert 2020-04-16 16:07:09 UTC
The issue is NOT solved, at least not here. I got another underflow; use-after-free error after upgrading.

kernel-5.5.16-200.fc31.x86_64
akmod-nvidia-390xx-390.132-4.fc31.x86_64

Comment 24 Ioannis Panteleakis 2020-04-16 18:40:33 UTC
(In reply to Robert from comment #23)
> The issue is NOT solved, at least not here. I got another underflow;
> use-after-free error after upgrading.
> 
> kernel-5.5.16-200.fc31.x86_64
> akmod-nvidia-390xx-390.132-4.fc31.x86_64

The issue is solved by upgrading to the 440.82 nvidia drivers. Hopefully nvidia will provide an update for the 390 series of the drivers fixing the issue for the 5.5+ kernels (although, kinda unlikely imo).

Comment 25 Artem Silenkov 2020-04-16 20:58:11 UTC
I can confirm it is fixed for me with the latest Fedora updates and RPM Fusion binaries.

artem  mylaptop  ~  %  uname -a 
Linux mylaptop 5.5.16-200.fc31.x86_64 #1 SMP Wed Apr 8 16:43:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
 artem  mylaptop  ~  %  modinfo nvidia-drm
filename:       /lib/modules/5.5.16-200.fc31.x86_64/extra/nvidia/nvidia-drm.ko
version:        440.82
supported:      external
license:        MIT
srcversion:     D65980CAE08DFB7B82FE1D1
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm,drm_kms_helper,nvidia-modeset
retpoline:      Y
name:           nvidia_drm
vermagic:       5.5.16-200.fc31.x86_64 SMP mod_unload 
parm:           modeset:Enable atomic kernel modesetting (1 = enable, 0 = disable (default)) (bool)
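
A rough way to check the loaded driver against the version reported as good in this thread is to parse the `version:` field out of the modinfo output. This is a sketch under assumptions: the helper names are hypothetical, the 440.82 threshold comes only from the comments in this report, and the comparison says nothing reliable about legacy branches such as 390xx, which are versioned separately.

```python
# Threshold taken from the reports in this thread, not from NVIDIA release notes.
REPORTED_GOOD = (440, 82)

def parse_modinfo_version(modinfo_text):
    """Return the 'version:' field of modinfo output as a tuple of ints, or None.

    Assumes a purely numeric dotted version such as '440.82'.
    """
    for line in modinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so 'srcversion' and 'vermagic' are skipped.
        if sep and key.strip() == "version":
            return tuple(int(part) for part in value.strip().split("."))
    return None

def at_least_reported_good(version):
    """True if the parsed version is at or above the one reported as fixed here."""
    return version is not None and version >= REPORTED_GOOD
```

The text to parse can be captured with `modinfo nvidia-drm`, as in the output above.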

Comment 26 Heiko 2020-04-17 04:35:36 UTC
Works now for me as well using the latest nvidia driver and kernel.

$ uname -r
5.5.16-200.fc31.x86_64

$ inxi -Gx
Graphics:  Device-1: NVIDIA GK208B [GeForce GT 730] vendor: ASUSTeK driver: nvidia v: 440.82 bus ID: 2e:00.0 
           Display: x11 server: Fedora Project X.org 1.20.6 driver: nvidia unloaded: fbdev,modesetting,vesa 
           resolution: 1280x1024~60Hz, 1920x1080~60Hz 
           OpenGL: renderer: GeForce GT 730/PCIe/SSE2 v: 4.6.0 NVIDIA 440.82 direct render: Yes

Comment 27 Sergei LITVINENKO 2020-04-22 05:39:26 UTC
Good,

That was the answer to my question.

The problem is on NVIDIA's side, and the new kernel provides better diagnostics.

Nothing for kernel developers to do.

