Bug 2187824
Summary: | Folding@Home client crashes after Fedora 38 upgrade | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Brad Jackson <bjackson0971> |
Component: | llvm | Assignee: | Tulio Magno Quites Machado Filho <tuliom> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | 38 | CC: | dmalcolm, fzatlouk, jakub, jchecahi, jistone, kkleine, npopov, rh, scottt.tw, sergesanspaille, siddharth.kde, tbaeder, tstellar, tuliom |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2023-04-27 12:26:05 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Brad Jackson
2023-04-18 19:32:02 UTC
@bjackson0971 , if this has been fixed with LLVM 16.0.1, could you check if the following update fixes the issue for you, please?
https://bodhi.fedoraproject.org/updates/FEDORA-2023-36b95f852a

(In reply to Tulio Magno Quites Machado Filho from comment #1)
> @bjackson0971 , if this has been fixed with LLVM 16.0.1, could you
> check if the following update fixes the issue for you, please?
> https://bodhi.fedoraproject.org/updates/FEDORA-2023-36b95f852a

The crash still happens with this llvm and llvm-libs 16.0.1-1.fc38 build and the stock spirv-llvm-translator package. I have to also install my patched spirv-llvm-translator srpm build to fix the crash.

I also tried downgrading llvm and llvm-libs to 16.0.0-2.fc38 and kept my patched spirv-llvm-translator, and that also fixes it. It appears the problem is actually in the translator package.

@bjackson0971 , Could you confirm which version of spirv-llvm-translator is installed when the issue happens, please?

(In reply to Tulio Magno Quites Machado Filho from comment #3)
> @bjackson0971 , Could you confirm which version of
> spirv-llvm-translator is installed when the issue happens, please?

Version spirv-llvm-translator-16.0.0-1.fc38.x86_64 is the only stock version available for Fedora 38 and the crash happens with it installed. My patched srpm build is the only fix I've found.

FEDORA-2023-36b95f852a has been submitted as an update to Fedora 38.
https://bodhi.fedoraproject.org/updates/FEDORA-2023-36b95f852a

I've updated spirv-llvm-translator downstream patch which should, together with llvm 16.0.1, address the issue.

Confirmed that spirv-llvm-translator-16.0.0-2.fc38 fixes the crash with both llvm-16.0.0-2.fc38 and llvm-16.0.1-1.fc38.

FEDORA-2023-36b95f852a has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-36b95f852a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-36b95f852a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

This seems to fix my pyopencl program ( https://github.com/ali1234/vhs-teletext ) - but the performance is AWFUL - at least 3/4 times slower than f37.

(In reply to Dr. David Alan Gilbert from comment #9)
> This seems to fix my pyopencl program (
> https://github.com/ali1234/vhs-teletext ) - but the performance is AWFUL -
> at least 3/4 times slower than f37.

David, could you report this performance issue in a new bug, please?
We'll need some details, e.g.:
1. How can I reproduce this slowdown? i.e. which steps do I have to execute. A small reproducer is ideal.
2. If you can profile the code before and after is even better.
3. Details about the execution, e.g. processor, OS used before and after.

(In reply to Tulio Magno Quites Machado Filho from comment #10)
> (In reply to Dr. David Alan Gilbert from comment #9)
> > This seems to fix my pyopencl program (
> > https://github.com/ali1234/vhs-teletext ) - but the performance is AWFUL -
> > at least 3/4 times slower than f37.
>
> David, could you report this performance issue in a new bug, please?

Sure, will do - what component would you like it against?

> We'll need some details, e.g.:
> 1. How can I reproduce this slowdown? i.e. which steps do I have to execute.
> A small reproducer is ideal.

It's tricky, since I've only got the one OpenCL application I've been using regularly and have perf numbers for; it is open but you need a datafile to process with it.

> 2. If you can profile the code before and after is even better.
There's very little host CPU usage (before or after), so I assume it's one of:
a) The SPIR code generated (except that I tried forcing the old code in and that's still slow as far as I can tell)
b) The translation of the SPIR to the native Radeon
c) Something else in the environment (but I have tried downgrading the kernel to f37)

Tips on profiling of the GPU behaviour are welcome.

> 3. Details about the execution, e.g. processor, OS used before and after.

Sure.

Dave

(In reply to Dr. David Alan Gilbert from comment #11)
> Sure, will do - what component would you like it against?

LLVM is fine. We can change that later as we get more details.

> It's tricky, since I've only got the one OpenCL application I've been using
> regularly and have perf numbers for; it is open but you need a datafile to
> process with it.

No problem. All we need is to reproduce and debug the issue.

FEDORA-2023-36b95f852a has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.

(In reply to Tulio Magno Quites Machado Filho from comment #12)
> (In reply to Dr. David Alan Gilbert from comment #11)
> > Sure, will do - what component would you like it against?
>
> LLVM is fine. We can change that later as we get more details.
>
> > It's tricky, since I've only got the one OpenCL application I've been using
> > regularly and have perf numbers for; it is open but you need a datafile to
> > process with it.
>
> No problem. All we need is to reproduce and debug the issue.

This was a red herring; so it's actually all fine - sorry for the noise.
(The speed is data dependent, and I'd previously seen ranges of 700-1200 lps on this test; the recovered data on the day I upgraded to f38 triggered a case of ~200 lps which I'd never seen something anywhere that bad before; bad luck it happened the same day)

(In reply to Dr. David Alan Gilbert from comment #14)
> This was a red herring; so it's actually all fine - sorry for the noise.
> (The speed is data dependent, and I'd previously seen ranges of 700-1200 lps
> on this test; the recovered data on the day I upgraded to f38 triggered a
> case of ~200 lps which I'd never seen something anywhere that bad before;
> bad luck it happened the same day)

Great! Thanks for the update!
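
For anyone checking whether a machine already carries the build confirmed earlier in this report (spirv-llvm-translator-16.0.0-2.fc38), a rough Python sketch follows. The helper names (`split_evr`, `has_fix`, `installed_version`) are hypothetical and not part of any Fedora tool. The comparison is a simplification that assumes purely numeric dotted versions and a release starting with a number; it does not reimplement rpm's own version comparison.

```python
import re
import subprocess

# Version-release of the build reported in this bug as fixing the crash.
FIXED = "16.0.0-2.fc38"


def split_evr(version_release):
    """Split e.g. '16.0.0-2.fc38' into ((16, 0, 0), 2).

    Simplified: assumes a numeric dotted version and a release that
    begins with an integer. Not equivalent to rpm's comparison rules.
    """
    version, release = version_release.split("-", 1)
    release_num = int(re.match(r"\d+", release).group())
    return tuple(int(part) for part in version.split(".")), release_num


def has_fix(installed, fixed=FIXED):
    """Return True if the installed version-release is at least the fixed one."""
    return split_evr(installed) >= split_evr(fixed)


def installed_version(package="spirv-llvm-translator"):
    """Ask rpm for the installed version-release of a package.

    Raises if rpm is missing or the package is not installed.
    """
    result = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}\n", package],
        capture_output=True,
        text=True,
        check=True,
    )
    # Take the first line in case several architectures are installed.
    return result.stdout.splitlines()[0].strip()
```

On a Fedora system, `has_fix(installed_version())` should report whether the installed translator is at or beyond the fixed build; on other systems only `has_fix` with an explicit string is meaningful.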