With a 7900 GPU installed and running Einstein@Home, the system is unstable to the point of being unusable. It will run for as little as 15 minutes or as long as several hours; a whole day is possible, but rare. I will include logs, but in short: AMDGPU reports an error, some type of hardware reset is then attempted and fails, and all video on the system hangs and then crashes, taking down the desktop. The whole system doesn't crash or reset, though; it stays running and on the network.

Reproducible: Always

Steps to Reproduce:
1. Install BOINC & Einstein@Home
2. Install 7900
3. Crashes randomly, but often, under load.

All *I* have to do is swap out my 6800XT for my 7900XTX.

Actual Results:
Faults reported in/by AMDGPU and the OpenCL application.

Expected Results:
This is a well-known load that hasn't changed in over a decade, although it is not a benchmark. The software should work as it has in the past, and as it does on older hardware.

I have finally confirmed that the same GPU works with the same OpenCL application on Ubuntu LTS with the official AMDGPU installer. It's not clear how much of the OSS stack is used in that setup, but clearly the official ROCm pkgs are being installed. AMDGPU-PRO may or may not be used; I can't tell. This may be an AMDGPU issue or a ROCm issue. I realize Fedora isn't officially supported by AMD, but past experience shows that Fedora is usually quite stable with the OSS stack for AMD GPUs.
Created attachment 1972901 [details] Journal logs of a representative but elaborated failure
Sorry, I'm a bit busy. Can you try installing the 5.6 packages from rawhide and see if that works? I don't mind back-porting ROCm 5.6 to Fedora 38 if it resolves this issue easily.
Yes! I will do that; more soon...
Thanks for your help, Jeremy. Well, I tried, but the rawhide ROCm pkgs depend on a newer glibc, so dnf pulled that in from rawhide, too. Consequently, the whole device was unavailable in clinfo; it didn't even show up as a platform. glxinfo still saw it. I reverted with distro-sync, and things are back to normal. (Although I now have a slightly newer glibc than I did before; I think 2.37 must have dropped for f38 very recently, because I did an update just two days ago.) I was told by another Fedora maintainer *never* to pull from another release version in this way, and I can see why doing so for libs, especially libc, is a bad idea. But I couldn't think of another way to do what you asked. I think a testing build would be better. I assume from the fact that you didn't do this already that it is not as easy as it should be? The other option is that I commit to rawhide. I keep meaning to do that, but, as a test, I've been running rawhide in a VM for many years, and it periodically experiences fatal consequences after weekly updates. This makes me concerned. In any case, let me know what you think.
Ah, sorry, I should have thought of that. Please use my copr: https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/ I use it for testing the packages before I put them into rawhide, as I don't actually run Fedora rawhide on my local machine. The repo is usually in flux, so your mileage may vary.
I installed your ROCm packages. Thank you for making this copr. They seem to work very well on my 6800XT. I haven't had time to swap in the 7900 yet, but I'll try to do that this week or weekend. Fingers crossed.
No problem! I use it for my own testing, and I might advertise it more for people looking to use the latest ROCm. I'm trying to avoid back-porting ROCm too much unless there's something broken, e.g. this ticket.
Just an update: I'm still working on testing it. A new problem has cropped up, so testing is delayed. I'll get to it ASAP. I'm on your upstream repo, though, so I'm prepared. I did want to say that rc1 & rc2 have been unstable. The OpenCL problem causes a video crash, but doesn't really hurt the system. When rc1/rc2 have crashed, it has caused FS errors that require fsck, which isn't the worst thing, but is concerning, takes a bit of effort to fix, and could potentially cause a serious problem. These crashes also produce no information; the journal is damaged and nothing useful survives. FYI.
Okay, I'm finally getting back to this. I have now replaced 100% of my computer, and the problem seems to persist with this card + BOINC. It has now been nearly 1 year since this card was released, and all the evidence still points to a driver problem. I see people using RDNA3 for the same work, but from Windows. I cannot find anyone using Linux for this workload. Unfortunately, the original workload that first exhibited these symptoms is no longer available from BOINC, but a different workload from the same project is now causing similar symptoms.

Kernel 6.6.0-0rc7
ROCm 5.6.0-1

See latest log, attached...
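For anyone reading the attached log: much of it is unrelated sshd/audit noise from failed login attempts, which makes the GPU failure sequence hard to follow. Below is a minimal Python sketch, assuming the journal was saved as plain text with one entry per line, that keeps only kernel lines mentioning amdgpu or drm. The helper name `gpu_lines` and the regular expression are illustrative assumptions, not part of any tool mentioned in this report.

```python
import re

# Hypothetical filter: kernel entries that mention amdgpu or a [drm...] tag.
GPU_LINE = re.compile(r"\bkernel: .*(amdgpu|\[drm)")


def gpu_lines(journal_text):
    """Return kernel lines that mention amdgpu or drm, in original order."""
    return [line for line in journal_text.splitlines() if GPU_LINE.search(line)]


# Three entries copied from the attached log, one per line.
sample = """\
Oct 29 10:52:01 host sshd[552138]: Failed password for invalid user user from 170.64.158.52 port 50068 ssh2
Oct 29 10:52:07 host kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Oct 29 10:52:17 host kernel: amdgpu 0000:11:00.0: amdgpu: GPU reset begin!
"""

for line in gpu_lines(sample):
    print(line)
```

Note that this deliberately narrow pattern skips kernel lines that don't name the driver (for example "Failed to wait all pipes clean"), so the unfiltered log is still the authoritative record.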
Created attachment 1996165 [details] Latest log

Oct 29 10:51:00 host sshd[551613]: Unable to negotiate with 222.116.73.78 port 63426: no matching host key type found. Their offer: ssh-rsa,ssh-dss [preauth]
Oct 29 10:51:00 host audit[551613]: CRYPTO_KEY_USER pid=551613 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:b6:97:6a:73:a4:af:7d:3c:c9:53:e0:ba:dc:28:65:e8:61:fa:c4:60:bd:7c:85:b7:ec:2d:f2:59:a2:bc:22 direction=? spid=551614 suid=74 exe="/usr/sbin/sshd" hostname=? addr=222.116.73.78 terminal=? res=success'
Oct 29 10:51:00 host audit[551613]: CRYPTO_KEY_USER pid=551613 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:b6:97:6a:73:a4:af:7d:3c:c9:53:e0:ba:dc:28:65:e8:61:fa:c4:60:bd:7c:85:b7:ec:2d:f2:59:a2:bc:22 direction=? spid=551613 suid=0 exe="/usr/sbin/sshd" hostname=? addr=222.116.73.78 terminal=? res=success'
Oct 29 10:51:00 host audit[551613]: USER_LOGIN pid=551613 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login acct="(unknown)" exe="/usr/sbin/sshd" hostname=? addr=222.116.73.78 terminal=ssh res=failed'
Oct 29 10:51:06 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:51:07 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:51:11 host audit[551777]: CRYPTO_KEY_USER pid=551777 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:b6:97:6a:73:a4:af:7d:3c:c9:53:e0:ba:dc:28:65:e8:61:fa:c4:60:bd:7c:85:b7:ec:2d:f2:59:a2:bc:22 direction=? spid=551777 suid=0 exe="/usr/sbin/sshd" hostname=? addr=161.132.180.115 terminal=? res=success'
Oct 29 10:51:12 host audit[551776]: CRYPTO_SESSION pid=551776 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-server cipher=aes256-gcm ksize=256 mac=<implicit> pfs=curve25519-sha256 spid=551777 suid=74 rport=2636 laddr=192.168.20.110 lport=22 exe="/usr/sbin/sshd" hostname=? addr=161.132.180.115 terminal=? res=success'
Oct 29 10:51:12 host audit[551776]: CRYPTO_SESSION pid=551776 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-client cipher=aes256-gcm ksize=256 mac=<implicit> pfs=curve25519-sha256 spid=551777 suid=74 rport=2636 laddr=192.168.20.110 lport=22 exe="/usr/sbin/sshd" hostname=? addr=161.132.180.115 terminal=? res=success'
Oct 29 10:51:12 host audit[551776]: USER_AUTH pid=551776 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=? acct="?" exe="/usr/sbin/sshd" hostname=161.132.180.115 addr=161.132.180.115 terminal=ssh res=failed'
Oct 29 10:51:34 host sshd[551776]: Failed password for root from 161.132.180.115 port 2636 ssh2
Oct 29 10:51:34 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:51:35 host audit[552139]: CRYPTO_KEY_USER pid=552139 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:b6:97:6a:73:a4:af:7d:3c:c9:53:e0:ba:dc:28:65:e8:61:fa:c4:60:bd:7c:85:b7:ec:2d:f2:59:a2:bc:22 direction=? spid=552139 suid=0 exe="/usr/sbin/sshd" hostname=? addr=170.64.158.52 terminal=? res=success'
Oct 29 10:51:35 host audit[552138]: CRYPTO_SESSION pid=552138 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-server cipher=aes128-gcm ksize=128 mac=<implicit> pfs=curve25519-sha256 spid=552139 suid=74 rport=50068 laddr=192.168.20.110 lport=22 exe="/usr/sbin/sshd" hostname=? addr=170.64.158.52 terminal=? res=success'
Oct 29 10:51:35 host audit[552138]: CRYPTO_SESSION pid=552138 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-client cipher=aes128-gcm ksize=128 mac=<implicit> pfs=curve25519-sha256 spid=552139 suid=74 rport=50068 laddr=192.168.20.110 lport=22 exe="/usr/sbin/sshd" hostname=? addr=170.64.158.52 terminal=? res=success'
Oct 29 10:51:36 host sshd[552138]: Invalid user user from 170.64.158.52 port 50068
Oct 29 10:51:36 host audit[552138]: USER_AUTH pid=552138 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=? acct="?" exe="/usr/sbin/sshd" hostname=170.64.158.52 addr=170.64.158.52 terminal=ssh res=failed'
Oct 29 10:51:38 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:51:42 host sshd[551776]: Received disconnect from 161.132.180.115 port 2636:11: Bye Bye [preauth]
Oct 29 10:51:42 host audit[551776]: CRYPTO_KEY_USER pid=551776 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=551777 suid=74 rport=2636 laddr=192.168.20.110 lport=22 exe="/usr/sbin/sshd" hostname=? addr=161.132.180.115 terminal=? res=success'
Oct 29 10:51:42 host audit[551776]: CRYPTO_KEY_USER pid=551776 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:b6:97:6a:73:a4:af:7d:3c:c9:53:e0:ba:dc:28:65:e8:61:fa:c4:60:bd:7c:85:b7:ec:2d:f2:59:a2:bc:22 direction=? spid=551777 suid=74 exe="/usr/sbin/sshd" hostname=? addr=161.132.180.115 terminal=? res=success'
Oct 29 10:51:42 host sshd[551776]: Disconnected from authenticating user root 161.132.180.115 port 2636 [preauth]
Oct 29 10:51:42 host audit[551776]: CRYPTO_KEY_USER pid=551776 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:b6:97:6a:73:a4:af:7d:3c:c9:53:e0:ba:dc:28:65:e8:61:fa:c4:60:bd:7c:85:b7:ec:2d:f2:59:a2:bc:22 direction=? spid=551776 suid=0 exe="/usr/sbin/sshd" hostname=? addr=161.132.180.115 terminal=? res=success'
Oct 29 10:51:42 host audit[551776]: USER_LOGIN pid=551776 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login acct="root" exe="/usr/sbin/sshd" hostname=? addr=161.132.180.115 terminal=ssh res=failed'
Oct 29 10:51:48 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:51:49 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:52:01 host sshd[552138]: Failed password for invalid user user from 170.64.158.52 port 50068 ssh2
Oct 29 10:52:04 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:52:05 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:52:06 host kernel: amdgpu: manual fan speed control should be enabled first
Oct 29 10:52:07 host kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Oct 29 10:52:11 host kernel: amdgpu 0000:11:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Oct 29 10:52:11 host kernel: amdgpu 0000:11:00.0: amdgpu: Failed to export SMU metrics table!
Oct 29 10:52:13 host audit[552138]: CRYPTO_KEY_USER pid=552138 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=552139 suid=74 rport=50068 laddr=192.168.20.110 lport=22 exe="/usr/sbin/sshd" hostname=? addr=170.64.158.52 terminal=? res=success'
Oct 29 10:52:13 host audit[552138]: CRYPTO_KEY_USER pid=552138 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:b6:97:6a:73:a4:af:7d:3c:c9:53:e0:ba:dc:28:65:e8:61:fa:c4:60:bd:7c:85:b7:ec:2d:f2:59:a2:bc:22 direction=? spid=552139 suid=74 exe="/usr/sbin/sshd" hostname=? addr=170.64.158.52 terminal=? res=success'
Oct 29 10:52:13 host sshd[552138]: Connection closed by invalid user user 170.64.158.52 port 50068 [preauth]
Oct 29 10:52:13 host audit[552138]: CRYPTO_KEY_USER pid=552138 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:b6:97:6a:73:a4:af:7d:3c:c9:53:e0:ba:dc:28:65:e8:61:fa:c4:60:bd:7c:85:b7:ec:2d:f2:59:a2:bc:22 direction=? spid=552138 suid=0 exe="/usr/sbin/sshd" hostname=? addr=170.64.158.52 terminal=? res=success'
Oct 29 10:52:13 host audit[552138]: USER_LOGIN pid=552138 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login acct="(unknown)" exe="/usr/sbin/sshd" hostname=? addr=170.64.158.52 terminal=ssh res=failed'
Oct 29 10:52:15 host kernel: amdgpu 0000:11:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Oct 29 10:52:15 host kernel: amdgpu 0000:11:00.0: amdgpu: Failed to export SMU metrics table!
Oct 29 10:52:17 host kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=48969795, emitted seq=48969797
Oct 29 10:52:17 host kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 7841 thread Xwayland:cs0 pid 7871
Oct 29 10:52:17 host kernel: amdgpu 0000:11:00.0: amdgpu: GPU reset begin!
Oct 29 10:52:17 host kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Oct 29 10:52:17 host kernel: amdgpu: failed to remove hardware queue from MES, doorbell=0x1216
Oct 29 10:52:17 host kernel: amdgpu: MES might be in unrecoverable state, issue a GPU reset
Oct 29 10:52:17 host kernel: amdgpu: Failed to evict queue 4
Oct 29 10:52:17 host kernel: amdgpu: Failed to evict process queues
Oct 29 10:52:17 host kernel: amdgpu: Failed to suspend process 0x802b
Oct 29 10:52:17 host kernel: amdgpu: Failed to evict queue 4
Oct 29 10:52:17 host kernel: amdgpu: Failed to suspend process 0x802d
Oct 29 10:52:17 host kernel: amdgpu: Failed to evict queue 4
Oct 29 10:52:17 host kernel: amdgpu: Failed to suspend process 0x8008
Oct 29 10:52:17 host kernel: amdgpu: Failed to evict queue 4
Oct 29 10:52:17 host kernel: amdgpu: Failed to suspend process 0x8029
Oct 29 10:52:18 host kernel: amdgpu 0000:11:00.0: amdgpu: IP block:gfx_v11_0 is hung!
Oct 29 10:52:19 host kernel: [drm:sdma_v6_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out
Oct 29 10:52:19 host kernel: amdgpu 0000:11:00.0: amdgpu: IP block:sdma_v6_0 is hung!
Oct 29 10:52:20 host audit[7794]: ANOM_ABEND auid=13013 uid=13013 gid=13013 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=7794 comm="radeon-profile" exe="/usr/bin/radeon-profile" sig=11 res=1
Oct 29 10:52:20 host kernel: amdgpu 0000:11:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Oct 29 10:52:20 host kernel: amdgpu 0000:11:00.0: amdgpu: [smu_v13_0_0_get_power_profile_mode] Failed to get activity monitor!
Oct 29 10:52:20 host kernel: radeon-profile[7794]: segfault at 8 ip 00007fbe798ab4b6 sp 00007ffe7f1b4a90 error 4 in libQt5Widgets.so.5.15.10[7fbe7976f000+417000] likely on CPU 25 (core 9, socket 0)
Oct 29 10:52:20 host kernel: Code: 1f 44 00 00 49 89 d4 eb e9 e8 36 2c ee ff 66 0f 1f 44 00 00 f3 0f 1e fa 55 48 89 e5 41 56 41 55 49 89 fd 41 54 53 48 83 ec 40 <4c> 8b 77 08 64 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 41 0f b6
Oct 29 10:52:20 host audit: BPF prog-id=108 op=LOAD
Oct 29 10:52:20 host audit: BPF prog-id=109 op=LOAD
Oct 29 10:52:20 host audit: BPF prog-id=110 op=LOAD
Oct 29 10:52:20 host systemd[1]: Started systemd-coredump - Process Core Dump (PID 552763/UID 0).
Oct 29 10:52:20 host audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-552763-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 29 10:52:20 host kernel: Failed to wait all pipes clean
Oct 29 10:52:20 host kernel: amdgpu 0000:11:00.0: amdgpu: soft reset failed, will fallback to full reset!
Oct 29 10:52:21 host systemd-coredump[552764]: Process 7794 (radeon-profile) of user 13013 dumped core.
Module libpk-gtk-module.so from rpm PackageKit-1.2.6-6.fc38.x86_64 Module libogg.so.0 from rpm libogg-1.3.5-5.fc38.x86_64 Module libvorbis.so.0 from rpm libvorbis-1.3.7-7.fc38.x86_64 Module libltdl.so.7 from rpm libtool-2.4.7-6.fc38.x86_64 Module libtdb.so.1 from rpm libtdb-1.4.8-1.fc38.x86_64 Module libvorbisfile.so.3 from rpm libvorbis-1.3.7-7.fc38.x86_64 Module libgthread-2.0.so.0 from rpm glib2-2.76.5-2.fc38.x86_64 Module libqadwaitadecorations.so from rpm qadwaitadecorations-0.1.2-5.fc38.x86_64 Module libxdg-shell.so from rpm qt5-qtwayland-5.15.10-1.fc38.x86_64 Module libnss_sss.so.2 from rpm sssd-2.9.1-1.fc38.x86_64 Module libpciaccess.so.0 from rpm libpciaccess-0.16-8.fc38.x86_64 Module libtinfo.so.6 from rpm ncurses-6.4-3.20230114.fc38.x86_64 Module libedit.so.0 from rpm libedit-3.1-45.20221030cvs.fc38.x86_64 Module libdrm_intel.so.1 from rpm libdrm-2.4.117-1.fc38.x86_64 Module libdrm_nouveau.so.2 from rpm libdrm-2.4.117-1.fc38.x86_64 Module libdrm_amdgpu.so.1 from rpm libdrm-2.4.117-1.fc38.x86_64 Module libelf.so.1 from rpm elfutils-0.189-3.fc38.x86_64 Module libdrm_radeon.so.1 from rpm libdrm-2.4.117-1.fc38.x86_64 Module libsensors.so.4 from rpm lm_sensors-3.6.0-13.fc38.x86_64 Module radeonsi_dri.so from rpm mesa-23.1.9-1.fc38.x86_64 Module libxshmfence.so.1 from rpm libxshmfence-1.3-12.fc38.x86_64 Module libxcb-sync.so.1 from rpm libxcb-1.13.1-11.fc38.x86_64 Module libxcb-present.so.0 from rpm libxcb-1.13.1-11.fc38.x86_64 Module libxcb-dri3.so.0 from rpm libxcb-1.13.1-11.fc38.x86_64 Module libwayland-server.so.0 from rpm wayland-1.22.0-1.fc38.x86_64 Module libdrm.so.2 from rpm libdrm-2.4.117-1.fc38.x86_64 Module libxcb-xfixes.so.0 from rpm libxcb-1.13.1-11.fc38.x86_64 Module libxcb-randr.so.0 from rpm libxcb-1.13.1-11.fc38.x86_64 Module libxcb-dri2.so.0 from rpm libxcb-1.13.1-11.fc38.x86_64 Module libX11-xcb.so.1 from rpm libX11-1.8.7-1.fc38.x86_64 Module libexpat.so.1 from rpm expat-2.5.0-2.fc38.x86_64 Module libglapi.so.0 from rpm 
mesa-23.1.9-1.fc38.x86_64 Module libgbm.so.1 from rpm mesa-23.1.9-1.fc38.x86_64 Module libEGL_mesa.so.0 from rpm mesa-23.1.9-1.fc38.x86_64 Module libEGL.so.1 from rpm libglvnd-1.6.0-2.fc38.x86_64 Module libqt-plugin-wayland-egl.so from rpm qt5-qtwayland-5.15.10-1.fc38.x86_64 Module libwebpdemux.so.2 from rpm libwebp-1.3.2-2.fc38.x86_64 Module libwebpmux.so.3 from rpm libwebp-1.3.2-2.fc38.x86_64 Module libqwebp.so from rpm qt5-qtimageformats-5.15.10-2.fc38.x86_64 Module libqwbmp.so from rpm qt5-qtimageformats-5.15.10-2.fc38.x86_64 Module libjbig.so.2.1 from rpm jbigkit-2.1-25.fc38.x86_64 Module libLerc.so.4 from rpm liblerc-4.0.0-3.fc38.x86_64 Module libwebp.so.7 from rpm libwebp-1.3.2-2.fc38.x86_64 Module libtiff.so.5 from rpm libtiff-4.4.0-8.fc38.x86_64 Module libqtiff.so from rpm qt5-qtimageformats-5.15.10-2.fc38.x86_64 Module libqtga.so from rpm qt5-qtimageformats-5.15.10-2.fc38.x86_64 Module libQt5Svg.so.5 from rpm qt5-qtsvg-5.15.10-1.fc38.x86_64 Module libqsvg.so from rpm qt5-qtsvg-5.15.10-1.fc38.x86_64 Module liblcms2.so.2 from rpm lcms2-2.15-1.fc38.x86_64 Module libmng.so.2 from rpm libmng-2.0.3-17.fc38.x86_64 Module libqmng.so from rpm qt5-qtimageformats-5.15.10-2.fc38.x86_64 Module libqjpeg.so from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libjasper.so.6 from rpm jasper-3.0.6-2.fc38.x86_64 Module libqjp2.so from rpm qt5-qtimageformats-5.15.10-2.fc38.x86_64 Module libqico.so from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libqicns.so from rpm qt5-qtimageformats-5.15.10-2.fc38.x86_64 Module libbrotlienc.so.1 from rpm brotli-1.0.9-11.fc38.x86_64 Module libvmaf.so.1 from rpm vmaf-2.3.0-5.fc38.x86_64 Module libjxl.so.0.7 from rpm jpegxl-0.7.0-6.fc38.x86_64 Module libsharpyuv.so.0 from rpm libwebp-1.3.2-2.fc38.x86_64 Module libaom.so.3 from rpm aom-3.7.0-1.fc38.x86_64 Module libdav1d.so.6 from rpm dav1d-1.2.1-1.fc38.x86_64 Module libheif.so.1 from rpm libheif-1.16.2-2.fc38.x86_64 Module libqheif.so from rpm qt-heif-image-plugin-0.3.4-1.fc38.x86_64 Module 
libqgif.so from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libQt5X11Extras.so.5 from rpm qt5-qtx11extras-5.15.10-1.fc38.x86_64 Module adwaita.so from rpm adwaita-qt-1.4.2-2.fc38.x86_64 Module libgvfscommon.so from rpm gvfs-1.50.6-1.fc38.x86_64 Module libgvfsdbus.so from rpm gvfs-1.50.6-1.fc38.x86_64 Module libdconfsettings.so from rpm dconf-0.40.0-8.fc38.x86_64 Module libblkid.so.1 from rpm util-linux-2.38.1-4.fc38.x86_64 Module libdatrie.so.1 from rpm libdatrie-0.2.13-5.fc38.x86_64 Module libjson-glib-1.0.so.0 from rpm json-glib-1.6.6-4.fc38.x86_64 Module libatspi.so.0 from rpm at-spi2-core-2.48.3-1.fc38.x86_64 Module libpixman-1.so.0 from rpm pixman-0.42.2-1.fc38.x86_64 Module libxcb-shm.so.0 from rpm libxcb-1.13.1-11.fc38.x86_64 Module libxcb-render.so.0 from rpm libxcb-1.13.1-11.fc38.x86_64 Module libQt5QmlModels.so.5 from rpm qt5-qtdeclarative-5.15.10-1.fc38.x86_64 Module libmount.so.1 from rpm util-linux-2.38.1-4.fc38.x86_64 Module libadwaitaqtpriv.so.1 from rpm adwaita-qt-1.4.2-2.fc38.x86_64 Module libjpeg.so.62 from rpm libjpeg-turbo-2.1.4-2.fc38.x86_64 Module libthai.so.0 from rpm libthai-0.1.29-4.fc38.x86_64 Module libXinerama.so.1 from rpm libXinerama-1.1.5-2.fc38.x86_64 Module libXcomposite.so.1 from rpm libXcomposite-0.4.5-9.fc38.x86_64 Module libXdamage.so.1 from rpm libXdamage-1.1.5-9.fc38.x86_64 Module libXcursor.so.1 from rpm libXcursor-1.2.1-3.fc38.x86_64 Module libwayland-egl.so.1 from rpm wayland-1.22.0-1.fc38.x86_64 Module libXfixes.so.3 from rpm libXfixes-6.0.0-5.fc38.x86_64 Module libtracker-sparql-3.0.so.0 from rpm tracker-3.5.3-2.fc38.x86_64 Module libcloudproviders.so.0 from rpm libcloudproviders-0.3.2-1.fc38.x86_64 Module libatk-bridge-2.0.so.0 from rpm at-spi2-core-2.48.3-1.fc38.x86_64 Module libXi.so.6 from rpm libXi-1.8.1-1.fc38.x86_64 Module libepoxy.so.0 from rpm libepoxy-1.5.10-3.fc38.x86_64 Module libatk-1.0.so.0 from rpm at-spi2-core-2.48.3-1.fc38.x86_64 Module libcairo-gobject.so.2 from rpm cairo-1.17.8-4.fc38.x86_64 Module 
libfribidi.so.0 from rpm fribidi-1.0.12-3.fc38.x86_64 Module libpangoft2-1.0.so.0 from rpm pango-1.50.14-1.fc38.x86_64 Module libcairo.so.2 from rpm cairo-1.17.8-4.fc38.x86_64 Module libpangocairo-1.0.so.0 from rpm pango-1.50.14-1.fc38.x86_64 Module libgmodule-2.0.so.0 from rpm glib2-2.76.5-2.fc38.x86_64 Module libQt5QuickTemplates2.so.5 from rpm qt5-qtquickcontrols2-5.15.10-1.fc38.x86_64 Module libQt5Qml.so.5 from rpm qt5-qtdeclarative-5.15.10-1.fc38.x86_64 Module libQt5Quick.so.5 from rpm qt5-qtdeclarative-5.15.10-1.fc38.x86_64 Module libgio-2.0.so.0 from rpm glib2-2.76.5-2.fc38.x86_64 Module libadwaitaqt.so.1 from rpm adwaita-qt-1.4.2-2.fc38.x86_64 Module libgobject-2.0.so.0 from rpm glib2-2.76.5-2.fc38.x86_64 Module libgdk_pixbuf-2.0.so.0 from rpm gdk-pixbuf2-2.42.10-2.fc38.x86_64 Module libpango-1.0.so.0 from rpm pango-1.50.14-1.fc38.x86_64 Module libgdk-3.so.0 from rpm gtk3-3.24.38-1.fc38.x86_64 Module libgtk-3.so.0 from rpm gtk3-3.24.38-1.fc38.x86_64 Module libQt5QuickControls2.so.5 from rpm qt5-qtquickcontrols2-5.15.10-1.fc38.x86_64 Module libqgnomeplatform.so from rpm qgnomeplatform-0.9.1-8.fc38.x86_64 Module libqgnomeplatformtheme.so from rpm qgnomeplatform-0.9.1-8.fc38.x86_64 Module libibusplatforminputcontextplugin.so from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libdbus-1.so.3 from rpm dbus-1.14.10-1.fc38.x86_64 Module libxml2.so.2 from rpm libxml2-2.10.4-1.fc38.x86_64 Module libffi.so.8 from rpm libffi-3.4.4-2.fc38.x86_64 Module libxkbcommon.so.0 from rpm libxkbcommon-1.5.0-2.fc38.x86_64 Module libQt5DBus.so.5 from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libfontconfig.so.1 from rpm fontconfig-2.14.2-1.fc38.x86_64 Module libwayland-client.so.0 from rpm wayland-1.22.0-1.fc38.x86_64 Module libwayland-cursor.so.0 from rpm wayland-1.22.0-1.fc38.x86_64 Module libQt5WaylandClient.so.5 from rpm qt5-qtwayland-5.15.10-1.fc38.x86_64 Module libqwayland-generic.so from rpm qt5-qtwayland-5.15.10-1.fc38.x86_64 Module libbrotlicommon.so.1 from rpm 
brotli-1.0.9-11.fc38.x86_64 Module libselinux.so.1 from rpm libselinux-3.5-1.fc38.x86_64 Module libbrotlidec.so.1 from rpm brotli-1.0.9-11.fc38.x86_64 Module libbz2.so.1 from rpm bzip2-1.0.8-13.fc38.x86_64 Module libpcre2-8.so.0 from rpm pcre2-10.42-1.fc38.1.x86_64 Module libicudata.so.72 from rpm icu-72.1-2.fc38.x86_64 Module liblz4.so.1 from rpm lz4-1.9.4-2.fc38.x86_64 Module liblzma.so.5 from rpm xz-5.4.1-1.fc38.x86_64 Module libcap.so.2 from rpm libcap-2.48-6.fc38.x86_64 Module libkeyutils.so.1 from rpm keyutils-1.6.1-6.fc38.x86_64 Module libkrb5support.so.0 from rpm krb5-1.21-3.fc38.x86_64 Module libcom_err.so.2 from rpm e2fsprogs-1.46.5-4.fc38.x86_64 Module libk5crypto.so.3 from rpm krb5-1.21-3.fc38.x86_64 Module libkrb5.so.3 from rpm krb5-1.21-3.fc38.x86_64 Module libgraphite2.so.3 from rpm graphite2-1.3.14-11.fc38.x86_64 Module libfreetype.so.6 from rpm freetype-2.13.0-2.fc38.x86_64 Module libGLdispatch.so.0 from rpm libglvnd-1.6.0-2.fc38.x86_64 Module libGLX.so.0 from rpm libglvnd-1.6.0-2.fc38.x86_64 Module libXau.so.6 from rpm libXau-1.0.11-2.fc38.x86_64 Module libglib-2.0.so.0 from rpm glib2-2.76.5-2.fc38.x86_64 Module libzstd.so.1 from rpm zstd-1.5.5-1.fc38.x86_64 Module libpcre2-16.so.0 from rpm pcre2-10.42-1.fc38.1.x86_64 Module libicuuc.so.72 from rpm icu-72.1-2.fc38.x86_64 Module libicui18n.so.72 from rpm icu-72.1-2.fc38.x86_64 Module libdouble-conversion.so.3 from rpm double-conversion-3.1.5-8.fc38.x86_64 Module libsystemd.so.0 from rpm systemd-253.12-1.fc38.x86_64 Module libcrypto.so.3 from rpm openssl-3.0.9-2.fc38.x86_64 Module libssl.so.3 from rpm openssl-3.0.9-2.fc38.x86_64 Module libproxy.so.1 from rpm libproxy-0.4.18-6.fc38.x86_64 Module libgssapi_krb5.so.2 from rpm krb5-1.21-3.fc38.x86_64 Module libharfbuzz.so.0 from rpm harfbuzz-7.1.0-1.fc38.x86_64 Module libz.so.1 from rpm zlib-1.2.13-3.fc38.x86_64 Module libpng16.so.16 from rpm libpng-1.6.37-14.fc38.x86_64 Module libGL.so.1 from rpm libglvnd-1.6.0-2.fc38.x86_64 Module libxcb.so.1 from rpm 
libxcb-1.13.1-11.fc38.x86_64 Module libXrender.so.1 from rpm libXrender-0.9.11-2.fc38.x86_64 Module libXext.so.6 from rpm libXext-1.3.5-2.fc38.x86_64 Module libQt5Core.so.5 from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libQt5Network.so.5 from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libQt5Gui.so.5 from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libQt5Widgets.so.5 from rpm qt5-qtbase-5.15.10-9.fc38.x86_64 Module libQt5Charts.so.5 from rpm qt5-qtcharts-5.15.10-1.fc38.x86_64 Module libX11.so.6 from rpm libX11-1.8.7-1.fc38.x86_64 Module libXrandr.so.2 from rpm libXrandr-1.5.2-10.fc38.x86_64 Module radeon-profile from rpm radeon-profile-20200824-8.fc38.x86_64 Stack trace of thread 7794: #0 0x00007fbe798ab4b6 _ZN15QAbstractButton10setCheckedEb (libQt5Widgets.so.5 + 0x2ab4b6) #1 0x000055cecccbfa38 _ZN14radeon_profile9refreshUIEv (radeon-profile + 0x44a38) #2 0x000055ceccccb2ba _ZN14radeon_profile14mainTimerEventEv (radeon-profile + 0x502ba) #3 0x00007fbe78ae8608 _Z10doActivateILb0EEvP7QObjectiPPv (libQt5Core.so.5 + 0x2e8608) #4 0x00007fbe78aeb9fd _ZN6QTimer7timeoutENS_14QPrivateSignalE (libQt5Core.so.5 + 0x2eb9fd) #5 0x00007fbe78adecab _ZN7QObject5eventEP6QEvent (libQt5Core.so.5 + 0x2decab) #6 0x00007fbe797aeb75 _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x1aeb75) #7 0x00007fbe78ab41a8 _ZN16QCoreApplication15notifyInternal2EP7QObjectP6QEvent (libQt5Core.so.5 + 0x2b41a8) #8 0x00007fbe78b05a9b _ZN14QTimerInfoList14activateTimersEv (libQt5Core.so.5 + 0x305a9b) #9 0x00007fbe78b063d1 _ZL23idleTimerSourceDispatchP8_GSourcePFiPvES1_ (libQt5Core.so.5 + 0x3063d1) #10 0x00007fbe773134fc g_main_context_dispatch (libglib-2.0.so.0 + 0x5c4fc) #11 0x00007fbe773716b8 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba6b8) #12 0x00007fbe77310b83 g_main_context_iteration (libglib-2.0.so.0 + 0x59b83) #13 0x00007fbe78b06749 _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5 + 0x306749) #14 
0x00007fbe78ab2b6b _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5 + 0x2b2b6b) #15 0x00007fbe78abadfb _ZN16QCoreApplication4execEv (libQt5Core.so.5 + 0x2badfb) #16 0x000055cecccaa2cc main (radeon-profile + 0x2f2cc) #17 0x00007fbe78249b8a __libc_start_call_main (libc.so.6 + 0x27b8a) #18 0x00007fbe78249c4b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c4b) #19 0x000055cecccacba5 _start (radeon-profile + 0x31ba5) Stack trace of thread 7796: #0 0x00007fbe7832734d __poll (libc.so.6 + 0x10534d) #1 0x00007fbe77371629 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba629) #2 0x00007fbe77310b83 g_main_context_iteration (libglib-2.0.so.0 + 0x59b83) #3 0x00007fbe78b06749 _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5 + 0x306749) #4 0x00007fbe78ab2b6b _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5 + 0x2b2b6b) #5 0x00007fbe788f45d0 _ZN7QThread4execEv (libQt5Core.so.5 + 0xf45d0) #6 0x00007fbe67985dab _ZN22QDBusConnectionManager3runEv (libQt5DBus.so.5 + 0x1bdab) #7 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #8 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #9 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7798: #0 0x00007fbe7832734d __poll (libc.so.6 + 0x10534d) #1 0x00007fbe750cfc6c _ZN15QtWaylandClient11EventThread3runEv (libQt5WaylandClient.so.5 + 0x7fc6c) #2 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #3 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #4 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7799: #0 0x00007fbe7832734d __poll (libc.so.6 + 0x10534d) #1 0x00007fbe750cfc6c _ZN15QtWaylandClient11EventThread3runEv (libQt5WaylandClient.so.5 + 0x7fc6c) #2 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #3 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #4 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) 
Stack trace of thread 7800: #0 0x00007fbe7832cb4d syscall (libc.so.6 + 0x10ab4d) #1 0x00007fbe773687bd g_cond_wait (libglib-2.0.so.0 + 0xb17bd) #2 0x00007fbe772de13b g_async_queue_pop_intern_unlocked (libglib-2.0.so.0 + 0x2713b) #3 0x00007fbe773435d3 g_thread_pool_spawn_thread (libglib-2.0.so.0 + 0x8c5d3) #4 0x00007fbe773419f3 g_thread_proxy (libglib-2.0.so.0 + 0x8a9f3) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7804: #0 0x00007fbe7832734d __poll (libc.so.6 + 0x10534d) #1 0x00007fbe77371629 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba629) #2 0x00007fbe77310b83 g_main_context_iteration (libglib-2.0.so.0 + 0x59b83) #3 0x00007fbe66efa5c5 dconf_gdbus_worker_thread (libdconfsettings.so + 0x75c5) #4 0x00007fbe773419f3 g_thread_proxy (libglib-2.0.so.0 + 0x8a9f3) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7801: #0 0x00007fbe7832734d __poll (libc.so.6 + 0x10534d) #1 0x00007fbe77371629 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba629) #2 0x00007fbe77310b83 g_main_context_iteration (libglib-2.0.so.0 + 0x59b83) #3 0x00007fbe77310bd9 glib_worker_main (libglib-2.0.so.0 + 0x59bd9) #4 0x00007fbe773419f3 g_thread_proxy (libglib-2.0.so.0 + 0x8a9f3) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7803: #0 0x00007fbe7832734d __poll (libc.so.6 + 0x10534d) #1 0x00007fbe77371629 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba629) #2 0x00007fbe77312aff g_main_loop_run (libglib-2.0.so.0 + 0x5baff) #3 0x00007fbe663c67b2 gdbus_shared_thread_func.lto_priv.0 (libgio-2.0.so.0 + 0x11a7b2) #4 0x00007fbe773419f3 g_thread_proxy (libglib-2.0.so.0 + 0x8a9f3) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7811: #0 
0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7808: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7806: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7809: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 
0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7819: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7812: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7825: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7810: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 
(libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7821: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7830: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7828: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 
0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7833: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7837: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adbb9 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8bbb9) #2 0x00007fbdb1912d9d cnd_wait (radeonsi_dri.so + 0x112d9d) #3 0x00007fbdb18c3c5b util_queue_thread_func (radeonsi_dri.so + 0xc3c5b) #4 0x00007fbdb1912ccc impl_thrd_routine (radeonsi_dri.so + 0x112ccc) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7835: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7840: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adbb9 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8bbb9) #2 0x00007fbdb1912d9d cnd_wait (radeonsi_dri.so + 0x112d9d) #3 0x00007fbdb18c3c5b util_queue_thread_func (radeonsi_dri.so + 0xc3c5b) #4 0x00007fbdb1912ccc impl_thrd_routine 
(radeonsi_dri.so + 0x112ccc) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7832: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7838: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adbb9 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8bbb9) #2 0x00007fbdb1912d9d cnd_wait (radeonsi_dri.so + 0x112d9d) #3 0x00007fbdb18c3c5b util_queue_thread_func (radeonsi_dri.so + 0xc3c5b) #4 0x00007fbdb1912ccc impl_thrd_routine (radeonsi_dri.so + 0x112ccc) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7813: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 + 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7805: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22) #2 0x00007fbe788fb627 _ZN14QWaitCondition4waitEP6QMutex14QDeadlineTimer (libQt5Core.so.5 
+ 0xfb627) #3 0x00007fbe788f8d91 _ZN17QThreadPoolThread3runEv (libQt5Core.so.5 + 0xf8d91) #4 0x00007fbe788f59dd _ZN14QThreadPrivate5startEPv (libQt5Core.so.5 + 0xf59dd) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) Stack trace of thread 7839: #0 0x00007fbe782ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219) #1 0x00007fbe782adbb9 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8bbb9) #2 0x00007fbdb1912d9d cnd_wait (radeonsi_dri.so + 0x112d9d) #3 0x00007fbdb18c3c5b util_queue_thread_func (radeonsi_dri.so + 0xc3c5b) #4 0x00007fbdb1912ccc impl_thrd_routine (radeonsi_dri.so + 0x112ccc) #5 0x00007fbe782ae947 start_thread (libc.so.6 + 0x8c947) #6 0x00007fbe78334860 __clone3 (libc.so.6 + 0x112860) ELF object binary architecture: AMD x86-64 Oct 29 10:52:21 host systemd[1]: systemd-coredump: Deactivated successfully. Oct 29 10:52:21 host radeon-profile-daemon[2419]: Client disconnected Oct 29 10:52:21 host radeon-profile-daemon[2419]: Awaiting connections... Oct 29 10:52:21 host audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-552763-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Oct 29 10:52:21 host audit: BPF prog-id=110 op=UNLOAD Oct 29 10:52:21 host audit: BPF prog-id=109 op=UNLOAD Oct 29 10:52:21 host audit: BPF prog-id=108 op=UNLOAD Oct 29 10:52:24 host kernel: amdgpu 0000:11:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005 Oct 29 10:52:24 host kernel: amdgpu 0000:11:00.0: amdgpu: Failed to export SMU metrics table! Oct 29 10:52:29 host kernel: amdgpu 0000:11:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005 Oct 29 10:52:29 host kernel: amdgpu 0000:11:00.0: amdgpu: Failed to export SMU metrics table! 
Oct 29 10:52:33 host kernel: amdgpu 0000:11:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005 Oct 29 10:52:33 host kernel: amdgpu 0000:11:00.0: amdgpu: Failed to export SMU metrics table! Oct 29 10:52:36 host audit[113200]: ANOM_ABEND auid=13013 uid=13013 gid=13013 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=113200 comm="GpuWatchdog" exe="/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/steamwebhelper" sig=11 res=1 Oct 29 10:52:36 host kernel: GpuWatchdog[113236]: segfault at 0 ip 00007f0ebb992b86 sp 00007f0eb2750960 error 6 in libcef.so[7f0eb74ef000+7770000] likely on CPU 27 (core 11, socket 0) Oct 29 10:52:36 host kernel: Code: 89 de e8 5d ee 6e ff 80 7d cf 00 79 09 48 8b 7d b8 e8 2e 66 2c 03 41 8b 84 24 e0 00 00 00 89 45 b8 48 8d 7d b8 e8 7a d1 b5 fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e Oct 29 10:52:36 host audit: BPF prog-id=111 op=LOAD Oct 29 10:52:36 host audit: BPF prog-id=112 op=LOAD Oct 29 10:52:36 host audit: BPF prog-id=113 op=LOAD Oct 29 10:52:36 host systemd[1]: Started systemd-coredump - Process Core Dump (PID 553080/UID 0). Oct 29 10:52:36 host audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@2-553080-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Oct 29 10:52:37 host systemd-coredump[553082]: Process 113200 (steamwebhelper) of user 13013 dumped core. 
Stack trace of thread 2360: #0 0x00007f0ebb992b86 n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/libcef.so + 0x5f92b86) #1 0x00007f0ebb992413 n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/libcef.so + 0x5f92413) #2 0x00007f0eb9e16136 n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/libcef.so + 0x4416136) #3 0x00007f0eb9e26b7c n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/libcef.so + 0x4426b7c) #4 0x00007f0eb9ddf6ea n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/libcef.so + 0x43df6ea) #5 0x00007f0eb9e27244 n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/libcef.so + 0x4427244) #6 0x00007f0eb9dfedfe n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/libcef.so + 0x43fedfe) #7 0x00007f0eb9e40f07 n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/libcef.so + 0x4440f07) #8 0x00007f0eb9e62a05 n/a (/home/<username>/.var/app/com.valvesoftware.Steam/.local/share/S
Ok, so I've been talking with another person with similar HW (I'm having trouble sourcing it for myself), and it seems upgrading to ROCm 5.7 solves most of the issues. Right now I have a ROCm 5.7 backport for Fedora 39 available: https://bodhi.fedoraproject.org/updates/FEDORA-2023-56199fe8e2 I haven't tried backporting to Fedora 38 yet, but the other person is using Fedora 39 and said this works really well, excluding some issues around efficiency. I assume ROCm 6.0 will be more finely tuned for the HW, but it should be workable with ROCm 5.7.
Excellent news! Oh, I'm pleased to hear that. Okay, I guess I need to upgrade to F39 or force install it? I think I can make it install on F38, but that probably wouldn't be a strictly valid test anyway. ...Ah, okay, I checked the schedule; F39 is just around the corner. Great, that'll work out fine. Thanks, Jeremy. I appreciate just getting taken seriously about this brutally difficult troubleshooting situation I'm in. I will test ROCm 5.7 ASAP after F39 goes official.
I upgraded to F39, which I *thought* had gone well. I remember checking and it seemed like things were working as before the upgrade, with ROCm 5.7.1 installed. I noticed a few unrelated things that broke, and spent some time fixing them, which involved a post-upgrade pkg update of about 10-15 pkgs, they didn't seem related to any graphics/opencl stuff. However, after a few hours, I realized that the OpenCL wasn't working right, power use was down, tasks were taking a long time, and there were a lot of errors in the journal: Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:44 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:49 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 
address=0xbe42f000 flags=0x0000] Nov 19 04:35:49 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:49 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Nov 19 04:35:49 <host> kernel: amdgpu 0000:11:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0024 address=0xbe42f000 flags=0x0000] Now, this symptom is different than the one I'm reporting in this bug. It wasn't crashing the GPU or the system, either, which is indicative of this bug. (Also, some games that worked right before the upgrade weren't working, but some were.) These symptoms remind me of when MESA OpenCL is installed and conflicting, or when something is messed up with the ICD. So, I downgraded my kernel and ROCm to 5.6, and now things are back to working as they were before the upgrade. I'm hoping there was a problem with the installation of these packages or maybe the icd, in which case, I can upgrade to 5.7, as you suggest, and resume testing. *I have not yet tested the application that was reliably causing the bug to manifest with ROCm 5.7.* More soon.
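One way to check the ICD suspicion mentioned above is to list which OpenCL ICD files are registered on the system. This is only a minimal sketch: it assumes the conventional ICD loader directory `/etc/OpenCL/vendors`, where each `.icd` file is a small text file naming a vendor library, and the function name `list_icds` is hypothetical.

```python
from pathlib import Path


def list_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_named_inside} for each *.icd file.

    Assumption: the conventional ICD loader layout, where the first line
    of each .icd file names the vendor's OpenCL library.
    """
    result = {}
    directory = Path(vendor_dir)
    if not directory.is_dir():
        # No ICD directory at all: nothing is registered at this location.
        return result
    for icd in sorted(directory.glob("*.icd")):
        lines = icd.read_text(errors="replace").splitlines()
        result[icd.name] = lines[0].strip() if lines else ""
    return result
```

Calling `list_icds()` and printing the result shows every registered implementation. More than one entry (for example, a ROCm library alongside a Mesa one) would be consistent with the kind of conflict described above, though it would not prove it.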
Dang, I cannot catch a break. I returned to my system today, and it was not crashed, but again not crunching properly and throwing yet a new error in dmesg. Nov 20 20:14:45 wrangler kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait Nov 20 20:14:45 wrangler kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14 Nov 20 20:14:45 wrangler kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait Nov 20 20:14:45 wrangler kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14 Nov 20 20:14:45 wrangler kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait Nov 20 20:14:45 wrangler kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14 Nov 20 20:14:45 wrangler kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait Nov 20 20:14:46 wrangler kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14 Nov 20 20:14:46 wrangler kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait Nov 20 20:14:46 wrangler kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14 Nov 20 20:14:46 wrangler kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait Nov 20 20:14:46 wrangler kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14 So, I'm still not able to get a good baseline since upgrading to F39. Part of the issue is that I'm also fighting a problem with the COPR_copr:copr.fedorainfracloud.org:group_kernel-vanilla:mainline-wo-mergew . Have you seen this problem where the latest rc kernels will not boot? 
This started on F38, before the upgrade. I got kernel-6.7-rc0 or something, and it wouldn't boot. I just tried to install the latest wo-mergew kernel, and got the exact same result. The system boots (via UEFI & systemd-boot), but hangs quite early, after plymouth-something and after paths start. Normally, the system would prompt me for the dm-crypt passwd, but it just stalls there and never prompts. I'm really glad you recommended this COPR, because I do want to be using the very latest AMDGPU kernel driver, generally, and for this troubleshooting, specifically. But, this problem is preventing me from completing the testing you asked me to do with ROCm 5.7. If you have an idea about this other issue, please let me know. I had hoped it would just go away on the next rc release, but it didn't. I'm going to try to research it, next.
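Since the journal excerpts above repeat the same amdgpu messages many times, it may help to collapse them into counts before attaching logs. The following is a rough sketch, not a definitive tool: it assumes the default short journal line format seen in this report (`Mon DD HH:MM:SS host kernel: message`), and the name `summarize_kernel_messages` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   Nov 20 20:14:45 wrangler kernel: [drm:...] *ERROR* ...
# Assumption: the short syslog-style format shown in this report.
KERNEL_LINE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+(?P<msg>.*)$"
)


def summarize_kernel_messages(lines):
    """Count distinct kernel messages that mention amdgpu.

    Timestamps and host names are dropped, so identical messages logged
    at different times collapse into a single counted entry.
    """
    counts = Counter()
    for line in lines:
        match = KERNEL_LINE.match(line.strip())
        if match and "amdgpu" in match.group("msg"):
            counts[match.group("msg")] += 1
    return counts
```

For example, the lines of a saved journal excerpt can be passed in, and `counts.most_common()` then lists the most frequent messages first, which makes it easier to see whether a new failure signature has appeared.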
Okay, I upgraded to ROCm 5.7 and the symptoms are unchanged. I upgraded and then started crunching E@H, and, in less than 10 minutes, the system graphics subsystem failed. I tried to CTRL-ALT-DELETE, but the reboot hung too and I had to do a hardware reset. After rebooting, I tried again, and the same thing happened after about 5 minutes of crunching. By contrast, PrimeGrid is running fine, but that's just integer work, AFAIK. I'll attach new logs. They look similar, but a little different. I would like to be on kernel 6.7, but I've explained why I cannot do that at the moment. I would also like to try ROCm 6 that you mentioned. Is that in the pipeline?
Created attachment 2001345 [details] Journal logs Latest logs from kernel 6.5.12 & ROCm 5.7.1.
More bad news. I got kernel 6.7rc4 installed and it boots, so that is actually good news, but not for this issue. Then, I tried running Einstein@Home with ROCm 5.7.1 & kernel 6.7 and it crashed my system hard. CTRL-ALT-DEL and power button did nothing; I had to force a poweroff. So, an even harder lockup than before, when it was just crashing the video subsystem. I've been doing OpenCL crunching via BOINC on AMD for 13 years or so and this is by far the longest outage I've ever experienced. I just had no idea it could break this badly. Doesn't seem like much hope on the horizon. I'm assuming ROCm 6.0 is a long way away?
So a few quick updates:
- There are some known LLVM related issues that I need to work on getting fixes pulled into Fedora 39/40, see https://bugzilla.redhat.com/show_bug.cgi?id=2216594
- ROCm 6.0 is due soon, but will only be in Fedora 40+ due to breaking interface changes with 5.7
Okay, got it. Sounds good. F40 is pretty far away, but not as far off as major releases used to be, IIRC. As always, I'm happy to help test if that's useful for you.
So, quick update: I spoke with upstream, and it seems that in ROCm 6.1 they're going to reorg the source code relating to LLVM, which will make fixing LLVM issues much, much easier. I hope that helps us debug issues like this more quickly.
Ah, nice. Good news. Thanks for the update!
Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21. Fedora Linux 38 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.
I just wanted to report back here that, on kernel 6.10, I finally got up the courage to try Einstein@Home again. It seems okay, now. I'm back on the default packages, no copr.
Spoke too soon. Nearly instant crash is observed with one of two tested Einstein@Home apps. This is ridiculous. It's been nearly two years since this product was released and OCL still causes whole system crashes.
I'm not sure how to set this properly in the bug report, but this is still happening with F40.
I have reported it upstream, though, and it might be receiving some attention.
After getting someone to recreate my environment (as well as possible), upstream was NOT able to reproduce the problems that I reproduced quite easily about four weeks ago. I was told that upstream was aware of modifications made by Fedora maintainers to ROCm pkgs that are distributed. They made a point that such code would not receive the testing that they do for supported distros. Is there any reason to suspect that such changes affect the stability of AMDGPU+ROCm under heavy OpenCL workloads?
I'm going to close this. I assume it's a duplicate of 2330958? Also Fedora 38 is long past EOL and "rocm-opencl" was replaced with "rocclr". Feel free to reopen against rocclr if 2330958 doesn't capture the issue. I believe Tom Rix was helping out there. Note that we've sort of redone a lot of the packaging to be more aligned with upstream (we originally used a different compiler). *** This bug has been marked as a duplicate of bug 2330958 ***