Bug 2218022
| Summary: | Frequent Graphics System Crashes with AMD 7900 + OpenCL application E@H | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Paul DeStefano <prd-fedora> | ||||
| Component: | rocm-opencl | Assignee: | Jeremy Newton <alexjnewt> | ||||
| Status: | NEW --- | QA Contact: | |||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 38 | CC: | alexjnewt, dkxls23 | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | | Story Points: | --- | ||||
| Clone Of: | | Environment: | | ||||
| Last Closed: | | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | | Category: | --- | ||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Paul DeStefano
2023-06-27 21:07:54 UTC
Created attachment 1972901
Journal logs of a representative but elaborated failure
Sorry, I'm a bit busy. Can you try installing the 5.6 packages from Rawhide and see if that works? I don't mind back-porting ROCm 5.6 to Fedora 38 if it resolves this issue easily.

Yes! I will do that; more soon... Thanks for your help, Jeremy.

Well, I tried, but the Rawhide ROCm packages depend on a newer glibc, so dnf pulled that in from Rawhide, too. Consequently, the device was completely unavailable in clinfo; it didn't even show up as a platform, although glxinfo still saw it. I reverted with distro-sync, and things are back to normal. (I do have a slightly newer glibc now than I did before; I think 2.37 must have landed in F38 very recently, because I ran an update just two days ago.) I was told by another Fedora maintainer *never* to pull packages from another release version this way, and I can see why doing so for libraries, especially libc, is a bad idea. But I couldn't think of another way to do what you asked. I think a testing build would be better. I assume, since you haven't done this already, that it isn't as easy as it should be? The other option is for me to commit to Rawhide. I keep meaning to, but I've been running Rawhide in a VM as a test for many years, and it periodically breaks fatally after weekly updates, which concerns me. In any case, let me know what you think.

Ah, sorry, I should have thought of that. Please use my copr: https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/ I use it to test the packages before I put them into Rawhide, since I don't actually run Fedora Rawhide on my local machine. The repo is usually in flux, so your mileage may vary.

I installed your ROCm packages; thank you for making this copr. They seem to work very well on my 6800 XT, but I haven't had time to swap in the 7900 yet. I'll try to do that this week or weekend. Fingers crossed.

No problem! I use it for my own testing; I might advertise it more for people looking to use the latest ROCm. I'm trying to avoid back-porting ROCm too much unless something is broken, e.g. this ticket.

Just an update: I'm still working on testing it. A new problem has cropped up, so testing is delayed; I'll get to it ASAP. I'm on your upstream repo, though, so I'm prepared. I did want to mention that rc1 and rc2 have been unstable. The OpenCL problem causes a video crash but doesn't really harm the system. When rc1/rc2 have crashed, however, they have caused filesystem errors that require fsck. That isn't the worst thing, but it is concerning, takes some effort to fix, and could potentially cause a serious problem. These crashes also leave no diagnostic information: the journal is damaged and nothing useful survives. FYI.