Bug 2156595
| Summary: | rocm-opencl makes clinfo crash when installed in parallel with mesa-libOpenCL | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Dominik 'Rathann' Mierzejewski <dominik> |
| Component: | rocm-opencl | Assignee: | Jeremy Newton <alexjnewt> |
| Status: | CLOSED ERRATA | Type: | Bug |
| Severity: | unspecified | Priority: | unspecified |
| Version: | 37 | CC: | alexjnewt, chplee, dkxls23, maigurs, obmun.h, vovkap97 |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | rocm-opencl-5.5.1-1.fc38 | Last Closed: | 2023-06-03 02:44:28 UTC |

Description
Dominik 'Rathann' Mierzejewski
2022-12-27 22:03:48 UTC
Same issue still present with the more recent rocm-opencl-5.4.1-1.fc37.x86_64

*** Bug 2143687 has been marked as a duplicate of this bug. ***

This error:
> mesa: CommandLine Error: Option 'h' registered more than once!
> LLVM ERROR: inconsistency in registered CommandLine options
> Aborted (core dumped)
is fixed in this Fedora 38 update:
https://bodhi.fedoraproject.org/updates/FEDORA-2023-05720f124e
If you already upgraded to Fedora 38, please test. I'll see if I can backport it to Fedora 37.

It's showing another error, but the problem is still present:
sudo dnf install mesa-libOpenCL
rocm-clinfo
: CommandLine Error: Option 'abort-on-max-devirt-iterations-reached' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
fish: Job 1, 'rocm-clinfo' terminated by signal SIGABRT (Abort)
sudo dnf rm mesa-libOpenCL
rocm-clinfo
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3513.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 3.0 PoCL 3.1 Linux, Release, RELOC, SPIR, LLVM 16.0.0, SLEEF, FP16, DISTRO, POCL_DEBUG
Platform Name: Portable Computing Language
Platform Vendor: The pocl project
Platform Extensions: cl_khr_icd cl_pocl_content_size
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon RX 6900 XT
...
Versions:
dnf list --installed | grep rocm
rocm-clinfo.x86_64 5.4.3-2.fc38 @updates
rocm-comgr.x86_64 16.0-2.fc38 @updates
rocm-comgr-debuginfo.x86_64 5.3.0-1.fc37 @updates-debuginfo
rocm-comgr-devel.x86_64 16.0-2.fc38 @updates
rocm-compilersupport-debugsource.x86_64 5.3.0-1.fc37 @updates-debuginfo
rocm-device-libs.x86_64 16.0-1.fc38 @fedora
rocm-opencl.x86_64 5.4.3-2.fc38 @updates
rocm-opencl-devel.x86_64 5.4.3-2.fc38 @updates
rocm-runtime.x86_64 5.4.1-3.fc38 @fedora
rocm-runtime-devel.x86_64 5.4.1-3.fc38 @fedora
rocm-smi.noarch 4.0.0-8.fc38 @fedora
rocminfo.x86_64 5.4.1-2.fc38 @fedora

Thanks for the feedback, I'll contact upstream with this info.

So I back-ported the fix to f37 and I can't reproduce any error right now with this update:
https://bodhi.fedoraproject.org/updates/FEDORA-2023-994e29c721
It's possible the LLVM 16 upgrade in Fedora 38 causes a regression (as compared to Fedora 37's LLVM 15), or maybe there's something unique to your system that makes it not reproduce on my end. I think I might have an AMD Radeon RX 6700 accessible to me that I can test with, as the current HW on my system is from the RX 5xxx series.

I also spoke to the upstream developers, and the fix that they suggested might require major packaging changes in other Fedora packages. Either way, I'll need to reproduce it before I can proceed with any fix.

I've tested with f37 in toolbox:
toolbox create --release 37
sudo dnf install 'rocm-*'
rocm-clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3513.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
sudo dnf install mesa-libOpenCL
rocm-clinfo
Segmentation fault (core dumped)
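
The steps above switch Mesa's OpenCL driver on and off by installing or removing the package. A less invasive check can be sketched as below, assuming the OpenCL loader is ocl-icd and that it honours its documented `OCL_ICD_VENDORS` environment variable when pointed at a single `.icd` file. `probe_icds` is a hypothetical helper name, not part of any package; it runs a command once per ICD file so that only one driver is visible to the loader each time.

```shell
# Hypothetical helper: run a command once per ICD file in a directory,
# exposing only that one ICD to the loader via OCL_ICD_VENDORS
# (assumption: the loader in use is ocl-icd, which documents this variable).
probe_icds() {
    dir="$1"
    shift
    for icd in "$dir"/*.icd; do
        # Skip the unexpanded glob when the directory has no .icd files.
        [ -e "$icd" ] || continue
        if OCL_ICD_VENDORS="$icd" "$@" >/dev/null 2>&1; then
            echo "${icd##*/}: ok"
        else
            # In the else branch, $? still holds the command's exit status.
            echo "${icd##*/}: failed (exit status $?)"
        fi
    done
}
```

A possible invocation is `probe_icds /etc/OpenCL/vendors rocm-clinfo`. If every ICD passes on its own while a plain `rocm-clinfo` run still aborts, that would point at the combination of drivers rather than at a single one.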
backtrace:
Thread 1 "rocm-clinfo" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff72cc2de in clover::device::supports_ir (ir=PIPE_SHADER_IR_NATIVE,
this=0x55555569dc10) at ../src/gallium/frontends/clover/core/device.cpp:502
#2 clover::device::device (this=this@entry=0x55555569dc10, platform=..., ldev=0x5555556a1210)
at ../src/gallium/frontends/clover/core/device.cpp:165
#3 0x00007ffff72db5cf in clover::create<clover::device, clover::platform&, pipe_loader_device*&>
() at ../src/gallium/frontends/clover/util/pointer.hpp:240
#4 clover::platform::platform (
this=this@entry=0x7ffff7537100 <(anonymous namespace)::_clover_platform>)
at ../src/gallium/frontends/clover/core/platform.cpp:41
#5 0x00007ffff729f7fd in __static_initialization_and_destruction_0 (__priority=65535,
__initialize_p=1) at ../src/gallium/frontends/clover/api/platform.cpp:34
#6 0x00007ffff7fcccde in call_init (env=0x7fffffffe1d8, argv=0x7fffffffe1c8, argc=1,
l=<optimized out>) at dl-init.c:70
#7 call_init (l=<optimized out>, argc=1, argv=0x7fffffffe1c8, env=0x7fffffffe1d8)
at dl-init.c:26
#8 0x00007ffff7fccdcc in _dl_init (main_map=0x555555614f20, argc=1, argv=0x7fffffffe1c8,
env=0x7fffffffe1d8) at dl-init.c:117
#9 0x00007ffff7ca8f14 in __GI__dl_catch_exception (exception=<optimized out>,
operate=<optimized out>, args=<optimized out>)
at /usr/src/debug/glibc-2.36-9.fc37.x86_64/elf/dl-error-skeleton.c:182
#10 0x00007ffff7fd3736 in dl_open_worker (a=a@entry=0x7fffffffd7c0) at dl-open.c:808
#11 0x00007ffff7ca8ebe in __GI__dl_catch_exception (exception=<optimized out>,
operate=<optimized out>, args=<optimized out>)
at /usr/src/debug/glibc-2.36-9.fc37.x86_64/elf/dl-error-skeleton.c:208
#12 0x00007ffff7fd3acc in _dl_open (file=0x555555613970 "libMesaOpenCL.so.1",
mode=<optimized out>, caller_dlopen=0x7ffff7f9789f <_open_driver+303>, nsid=<optimized out>,
argc=1, argv=0x7fffffffe1c8, env=0x7fffffffe1d8) at dl-open.c:884
#13 0x00007ffff7be123c in dlopen_doit (a=a@entry=0x7fffffffda30) at dlopen.c:56
#14 0x00007ffff7ca8ebe in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd990,
operate=<optimized out>, args=<optimized out>)
at /usr/src/debug/glibc-2.36-9.fc37.x86_64/elf/dl-error-skeleton.c:208
#15 0x00007ffff7ca8f73 in __GI__dl_catch_error (objname=0x7fffffffd9e8,
errstring=0x7fffffffd9f0, mallocedp=0x7fffffffd9e7, operate=<optimized out>,
args=<optimized out>) at /usr/src/debug/glibc-2.36-9.fc37.x86_64/elf/dl-error-skeleton.c:227
#16 0x00007ffff7be0d0f in _dlerror_run (operate=operate@entry=0x7ffff7be11e0 <dlopen_doit>,
args=args@entry=0x7fffffffda30) at dlerror.c:138
#17 0x00007ffff7be12f1 in dlopen_implementation (dl_caller=<optimized out>,
mode=<optimized out>, file=<optimized out>) at dlopen.c:71
#18 ___dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:81
#19 0x00007ffff7f9789f in _load_icd (lib_path=0x555555613970 "libMesaOpenCL.so.1", num_icds=1)
at /usr/src/debug/ocl-icd-2.3.1-2.fc37.x86_64/ocl_icd_loader.c:208
#20 _open_driver (num_icds=num_icds@entry=1,
dir_path=dir_path@entry=0x7ffff7fac0a4 "/etc/OpenCL/vendors",
file_path=file_path@entry=0x555555578f43 "mesa.icd")
at /usr/src/debug/ocl-icd-2.3.1-2.fc37.x86_64/ocl_icd_loader.c:261
#21 0x00007ffff7f9ad16 in _open_drivers (dir_path=<optimized out>, dir=<optimized out>)
at /usr/src/debug/ocl-icd-2.3.1-2.fc37.x86_64/ocl_icd_loader.c:274
#22 __initClIcd () at /usr/src/debug/ocl-icd-2.3.1-2.fc37.x86_64/ocl_icd_loader.c:767
#23 _initClIcd_real () at /usr/src/debug/ocl-icd-2.3.1-2.fc37.x86_64/ocl_icd_loader.c:824
#24 0x00007ffff7f9ce14 in _initClIcd ()
at /usr/src/debug/ocl-icd-2.3.1-2.fc37.x86_64/ocl_icd_loader.c:853
#25 clGetPlatformIDs (num_entries=0, platforms=0x0, num_platforms=0x7fffffffdc14)
at /usr/src/debug/ocl-icd-2.3.1-2.fc37.x86_64/ocl_icd_loader.c:1018
#26 0x000055555555e547 in cl::Platform::get (platforms=platforms@entry=0x7fffffffdd90)
at /usr/src/debug/rocm-opencl-5.4.3-1.fc37.x86_64/tools/clinfo/../../khronos/headers/opencl2.2/CL/../CL/cl2.hpp:2474
#27 0x0000555555556f58 in main (argc=<optimized out>, argv=<optimized out>)
at /usr/src/debug/rocm-opencl-5.4.3-1.fc37.x86_64/tools/clinfo/clinfo.cpp:75
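
Frames #19 to #22 of the backtrace show the ICD loader reading `mesa.icd` from `/etc/OpenCL/vendors` and then calling `dlopen` on `libMesaOpenCL.so.1`. To see which libraries the loader would try on a given system, a minimal sketch along these lines may help. `list_icds` is a hypothetical helper name, and the sketch assumes each `.icd` file names its library on the first line.

```shell
# Hypothetical helper: print each ICD file in a vendors directory together
# with the library it names. Defaults to the directory seen in the backtrace.
list_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    for icd in "$dir"/*.icd; do
        # Skip the unexpanded glob when the directory has no .icd files.
        [ -e "$icd" ] || continue
        # Assumption: the first line holds the library name or path.
        lib=$(head -n 1 "$icd")
        printf '%s -> %s\n' "${icd##*/}" "$lib"
    done
}
```

Calling `list_icds` with no argument inspects `/etc/OpenCL/vendors`; passing another directory inspects that one instead.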
*** Bug 2149162 has been marked as a duplicate of this bug. ***

*** Bug 2157619 has been marked as a duplicate of this bug. ***

Some observations:
- I can't reproduce this on an up-to-date Fedora 37 system
- I can reproduce it with an RX 6750 XT on Fedora 38
- I can't reproduce it on Fedora 37 in a Fedora 38 toolbox with the same HW

Seems strange. I'll update this if I ever figure it out.

* Note: I can't reproduce this on any other HW at all.

I believe this update fixes the issue:
https://bodhi.fedoraproject.org/updates/FEDORA-2023-68012d0819
I can't reproduce it anymore. Can anyone confirm?

It's working now, with and without mesa. Good job!

FEDORA-2023-68012d0819 has been submitted as an update to Fedora 38.
https://bodhi.fedoraproject.org/updates/FEDORA-2023-68012d0819

No problem! I tagged it on the update.

FEDORA-2023-68012d0819 has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make note of it in this bug report.