Bug 2012882 - WARNING: CPU: 1 PID: 407 at drivers/gpu/drm/ttm/ttm_bo.c:409 ttm_bo_release+0x2d2/0x300 [ttm] [amdgpu]
Summary: WARNING: CPU: 1 PID: 407 at drivers/gpu/drm/ttm/ttm_bo.c:409 ttm_bo_release+0...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 36
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-11 14:17 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2022-11-07 13:03 UTC
CC List: 19 users

Fixed In Version: kernel-6.0.5-200.fc36
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-07 13:03:06 UTC
Type: Bug
Embargoed:


Attachments
kernel-5.14.9 dmesg (journalctl -b0 --no-hostname --output=short-monotonic -k) (106.63 KB, text/plain)
2021-10-11 14:17 UTC, Dominik 'Rathann' Mierzejewski


Links
System: freedesktop.org Gitlab
ID: drm amd issues 1813
Private: 0
Priority: None
Status: opened
Summary: [5.14 regression] WARNING: CPU: 1 PID: 408 at drivers/gpu/drm/ttm/ttm_bo.c:409 ttm_bo_release+0x2d2/0x300 [ttm]
Last Updated: 2022-05-31 09:24:36 UTC

Description Dominik 'Rathann' Mierzejewski 2021-10-11 14:17:07 UTC
Created attachment 1831891
kernel-5.14.9 dmesg (journalctl -b0 --no-hostname --output=short-monotonic -k)

1. Please describe the problem:
Since upgrading to 5.14.9-200.fc34, I'm getting this WARNING on every boot.

2. What is the Version-Release number of the kernel:
5.14.9-200.fc34

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Yes. No WARNING with 5.13.x and earlier kernels. I haven't tried earlier 5.14.x koji kernels yet.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Yes, it happens on every boot.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Unknown, I haven't tried yet.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
Attached.

Additional info:
This looks similar to bug 1985880, but the stack trace differs after task_work_run:

WARNING: CPU: 1 PID: 407 at drivers/gpu/drm/ttm/ttm_bo.c:409 ttm_bo_release+0x2d2/0x300 [ttm]
Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables rfkill nfnetlink ip6table_filter ip6_tables iptable_filter drivetemp f71882fg sunrpc intel_rapl_msr intel_rapl_common vfat fat x86_pkg_temp_thermal pktcdvd intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio mei_hdcp snd_usb_audio snd_hda_intel at24 snd_intel_dspcfg iTCO_wdt intel_pmc_bxt snd_intel_sdw_acpi iTCO_vendor_support snd_hda_codec snd_usbmidi_lib irqbypass rapl intel_cstate snd_hda_core snd_hwdep snd_rawmidi snd_seq intel_uncore uvcvideo snd_seq_device snd_pcm videobuf2_vmalloc videobuf2_memops mxm_wmi videobuf2_v4l2 videobuf2_common snd_timer mei_me videodev snd
 mc joydev i2c_i801 mei soundcore i2c_smbus lpc_ich binfmt_misc zram ip_tables hid_logitech_hidpp hid_jabra hid_logitech_dj r8152 mii amdgpu i915 iommu_v2 gpu_sched drm_ttm_helper i2c_algo_bit ttm crct10dif_pclmul crc32_pclmul crc32c_intel drm_kms_helper ghash_clmulni_intel uas e1000e usb_storage cec drm wmi video i2c_dev fuse
CPU: 1 PID: 407 Comm: plymouthd Tainted: G        W         5.14.9-200.fc34.x86_64 #1
Hardware name: MSI MS-7751/Z77A-GD65 (MS-7751), BIOS V10.11 10/09/2013
RIP: 0010:ttm_bo_release+0x2d2/0x300 [ttm]
Code: 8d b6 b8 fe ff ff e8 dd ea dd ff 49 8b 76 08 48 89 ef e8 91 21 00 00 49 8b 7e 98 e9 6f fd ff ff e8 93 82 19 cc e9 a4 fd ff ff <0f> 0b e9 4f fd ff ff e8 c2 80 19 cc e9 f6 fe ff ff be 03 00 00 00
RSP: 0018:ffffa95540453d10 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffa95540453d58 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8a09cb4b79b8
RBP: ffff8a09cb545288 R08: ffff8a09cb4b79b8 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8a09c9682000
R13: ffff8a09cb4b7858 R14: ffff8a09cb4b79b8 R15: ffff8a09c05484b8
FS:  0000000000000000(0000) GS:ffff8a0cdf680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4840aeb000 CR3: 00000003a7c10005 CR4: 00000000001706e0
Call Trace:
 amdgpu_bo_unref+0x1a/0x30 [amdgpu]
 amdgpu_gem_object_free+0x20/0x30 [amdgpu]
 drm_gem_object_release_handle+0x6b/0x80 [drm]
 ? drm_gem_object_handle_put_unlocked+0xd0/0xd0 [drm]
 idr_for_each+0x4e/0xc0
 drm_gem_release+0x1c/0x30 [drm]
 drm_file_free.part.0+0x1e3/0x250 [drm]
 drm_release+0x65/0x110 [drm]
 __fput+0x94/0x240
 task_work_run+0x65/0xa0
 do_exit+0x33d/0xa90
 ? __audit_syscall_entry+0x100/0x130
 do_group_exit+0x33/0xa0
 __x64_sys_exit_group+0x14/0x20
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f4841954021
Code: Unable to access opcode bytes at RIP 0x7f4841953ff7.
RSP: 002b:00007ffd554cc208 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f4841a4c470 RCX: 00007f4841954021
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
R10: 00007f484189a468 R11: 0000000000000246 R12: 00007f4841a4c470
R13: 0000000000000001 R14: 00007f4841a4c948 R15: 0000000000000000

Comment 1 Dominik 'Rathann' Mierzejewski 2021-10-21 21:40:31 UTC
Still reproducible on F35 kernel 5.14.14-300.fc35

Comment 2 Dominik 'Rathann' Mierzejewski 2021-10-27 11:59:43 UTC
Note that the kernel is "tainted" only because I'm getting hit by bug 1985090 on every boot as well.

Comment 3 Dominik 'Rathann' Mierzejewski 2021-12-02 15:47:23 UTC
5.15.4 is still showing the issue, but it is no longer reproducible with 5.15.6 (I haven't tested 5.15.5).

Comment 4 Dominik 'Rathann' Mierzejewski 2022-05-31 09:24:02 UTC
Still reproducible on F36 with kernel 5.17.11-300.fc36.x86_64.

Comment 5 Dominik 'Rathann' Mierzejewski 2022-06-13 09:11:11 UTC
I think I forgot to mention that this is on a TAHITI Pro GPU, which is driven by the radeon module by default. I enabled si_support in the amdgpu module instead:
$ cat /etc/modprobe.d/amdgpu.conf 
blacklist radeon
options amdgpu si_support=1
options amdgpu cik_support=1
options amdgpu hw_i2c=1
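
To confirm that options like these actually took effect on a running system, one possible check is to read the values back from sysfs. The sketch below assumes the parameters are exported with read permission under /sys/module/<module>/parameters/ (the kernel only creates those files for parameters declared with non-zero permissions). The helper name read_module_param and its sysfs_root argument are hypothetical, introduced only for illustration.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, param: str,
                      sysfs_root: str = "/sys") -> Optional[str]:
    """Return the current value of a loaded module's parameter.

    Returns None when the module is not loaded, the parameter is not
    exported to sysfs, or the file cannot be read.
    """
    path = Path(sysfs_root) / "module" / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Parameter names taken from the modprobe.d snippet above.
    for name in ("si_support", "cik_support"):
        print(name, "=", read_module_param("amdgpu", name))
```

A result of None for every parameter would suggest that amdgpu is not loaded at all, for example because radeon is still bound to the card.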

Still reproducible with 5.17.14-300.fc36.x86_64:

[   10.559066] ------------[ cut here ]------------
[   10.559069] WARNING: CPU: 2 PID: 412 at drivers/gpu/drm/ttm/ttm_bo.c:411 ttm_bo_release+0x34d/0x370 [ttm]
[   10.559078] Modules linked in: nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nf_tables nfnetlink ip6table_filter iptable_filter drivetemp f71882fg sunrpc binfmt_misc vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp at24 kvm_intel mei_hdcp iTCO_wdt intel_pmc_bxt mei_pxp iTCO_vendor_support pktcdvd kvm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi irqbypass rapl snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi snd_hda_codec snd_usbmidi_lib intel_cstate snd_hda_core snd_rawmidi snd_hwdep uvcvideo intel_uncore videobuf2_vmalloc snd_seq videobuf2_memops mxm_wmi videobuf2_v4l2 snd_seq_device videobuf2_common snd_pcm joydev videodev snd_timer mei_me snd mc i2c_i801 soundcore lpc_ich
[   10.559112]  i2c_smbus mei zram hid_logitech_hidpp amdgpu hid_logitech_dj hid_jabra i915 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel e1000e uas usb_storage iommu_v2 gpu_sched drm_ttm_helper ttm wmi video r8152 mii ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse i2c_dev
[   10.559126] CPU: 2 PID: 412 Comm: plymouthd Not tainted 5.17.14-300.fc36.x86_64 #1
[   10.559128] Hardware name: MSI MS-7751/Z77A-GD65 (MS-7751), BIOS V10.11 10/09/2013
[   10.559129] RIP: 0010:ttm_bo_release+0x34d/0x370 [ttm]
[   10.559134] Code: 00 e8 97 47 49 d3 48 8b 43 e8 eb a8 be 03 00 00 00 e8 e7 eb 21 d3 e9 96 fd ff ff e8 2d 26 49 d3 e9 8c fd ff ff 48 89 e8 eb 8a <0f> 0b e9 d6 fc ff ff e8 17 26 49 d3 e9 dd fe ff ff be 03 00 00 00
[   10.559136] RSP: 0018:ffffb735c18e3cf8 EFLAGS: 00010202
[   10.559137] RAX: 0000000000000001 RBX: ffff9beb6dc211b8 RCX: 0000000000000000
[   10.559139] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9beb6dc211b8
[   10.559139] RBP: ffff9beb6ed25280 R08: 0000000000000000 R09: 000000008040003f
[   10.559140] R10: ffff9beb6d08cfc0 R11: 0000000000000000 R12: ffff9beb6dc21058
[   10.559141] R13: 0000000000000001 R14: ffff9be840d0de40 R15: ffff9be843eec700
[   10.559142] FS:  0000000000000000(0000) GS:ffff9beb5c500000(0000) knlGS:0000000000000000
[   10.559143] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.559145] CR2: 00007ff6c4f0f000 CR3: 00000003f6e10005 CR4: 00000000001706e0
[   10.559146] Call Trace:
[   10.559148]  <TASK>
[   10.559149]  ? drm_vma_node_revoke+0x63/0x70
[   10.559154]  ? kfree+0x1eb/0x220
[   10.559158]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[   10.559318]  amdgpu_gem_object_free+0x20/0x30 [amdgpu]
[   10.559458]  drm_gem_object_release_handle+0x69/0x80
[   10.559463]  ? drm_gem_object_handle_put_unlocked+0xe0/0xe0
[   10.559465]  idr_for_each+0x4e/0xb0
[   10.559468]  drm_gem_release+0x1c/0x30
[   10.559470]  drm_file_free.part.0+0x1e1/0x250
[   10.559473]  drm_release+0x65/0x110
[   10.559475]  __fput+0x91/0x250
[   10.559479]  task_work_run+0x5c/0x90
[   10.559483]  do_exit+0x31d/0xad0
[   10.559486]  ? __audit_syscall_entry+0xec/0x130
[   10.559490]  do_group_exit+0x2d/0x90
[   10.559491]  __x64_sys_exit_group+0x14/0x20
[   10.559493]  do_syscall_64+0x3a/0x80
[   10.559496]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   10.559500] RIP: 0033:0x7ff6c648a711
[   10.559516] Code: Unable to access opcode bytes at RIP 0x7ff6c648a6e7.
[   10.559517] RSP: 002b:00007ffc043845d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   10.559518] RAX: ffffffffffffffda RBX: 00007ff6c65a09e0 RCX: 00007ff6c648a711
[   10.559520] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[   10.559520] RBP: 0000000000000000 R08: ffffffffffffff80 R09: 00007ff6c65abb20
[   10.559521] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6c65a09e0
[   10.559522] R13: 0000000000000000 R14: 00007ff6c65a5ee8 R15: 00007ff6c65a5f00
[   10.559525]  </TASK>
[   10.559525] ---[ end trace 0000000000000000 ]---
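
The source line and offset in the WARNING header shift between kernel builds (ttm_bo.c:409 on 5.14.9, ttm_bo.c:411 on 5.17.14), so comparing traces by eye is error-prone. As a rough aid, the following sketch extracts the warning site from a header line and compares two sites by file, function, and module while ignoring the line number. The regular expression is an assumption based only on the two header lines quoted in this report, and the names WarnSite, parse_warning, and same_site are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches headers such as:
#   WARNING: CPU: 1 PID: 407 at path/file.c:409 func+0x2d2/0x300 [module]
# An optional "[   10.559069] " timestamp prefix is skipped by search().
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


class WarnSite(NamedTuple):
    file: str
    line: int
    func: str
    module: Optional[str]


def parse_warning(text: str) -> Optional[WarnSite]:
    """Extract the warning site from one log line, or None if absent."""
    m = WARN_RE.search(text)
    if m is None:
        return None
    return WarnSite(m["file"], int(m["line"]), m["func"], m["module"])


def same_site(a: WarnSite, b: WarnSite) -> bool:
    """Compare two sites, ignoring the build-dependent line number."""
    return (a.file, a.func, a.module) == (b.file, b.func, b.module)
```

Applied to the two headers in this report, both resolve to ttm_bo_release in drivers/gpu/drm/ttm/ttm_bo.c, which is consistent with them being the same warning on different kernel versions.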

Comment 6 Dominik 'Rathann' Mierzejewski 2022-11-07 13:03:06 UTC
FWIW, this seems to be gone in the 6.0.5 and 6.0.7 F36 kernels. I haven't tested any other versions. 5.19.16 seems to be the last version where this occurred, so I'm closing this.

