Bug 2207599
| Field | Value |
|---|---|
| Summary | linking module flags 'amdgpu_code_object_version': IDs have conflicting values in '' and 'llvm-link' |
| Product | [Fedora] Fedora |
| Component | rocm-opencl |
| Version | 38 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Reporter | szymondudek9 |
| Assignee | Jeremy Newton <alexjnewt> |
| CC | adrian.gygax, alexjnewt, dkxls23, mejh63, philipp-dev |
| Keywords | Reopened |
| Fixed In Version | rocm-opencl-5.5.1-1.fc39, rocm-opencl-5.5.1-1.fc38 |
| Last Closed | 2023-05-27 01:19:35 UTC |
| Type | Bug |
Description
szymondudek9
2023-05-16 10:35:04 UTC
Same problem here after upgrading from F37 to F38.

So to be clear, installing 5.5.0-1.fc38 on an up-to-date Fedora 38 system does not work, but taking the same system and installing the rocm-opencl-5.4.3-2.fc37 package works?

Just to confirm as well, can you get me the output of:

    sudo yum list installed rocm*

(In reply to Jeremy Newton from comment #2)
> So to be clear, installing 5.5.0-1.fc38 on an up-to-date Fedora 38 system
> does not work, but taking the same system and installing the
> rocm-opencl-5.4.3-2.fc37 package works?
>
> Just to confirm as well, can you get me the output of:
>
> sudo yum list installed rocm*

Right, this package (rocm-opencl-5.4.3-2.fc37) works on the same system. I didn't change the system version; I just installed one package from the F37 repo. I can't give you the output of that command right now, but I can tell you with certainty that after running

    sudo dnf downgrade --releasever=37 rocm-opencl

only the selected package (rocm-opencl) was reinstalled, so the rest of its dependencies are the same as before. Nothing else changed, just rocm-opencl-5.5.0-1.fc38 to rocm-opencl-5.4.3-2.fc37. I hope that is clear now.
BEFORE

    sudo yum list installed rocm*
    Installed Packages
    rocm-comgr.x86_64      16.1-2.fc38     @updates
    rocm-opencl.x86_64     5.5.0-1.fc38    @updates
    rocm-runtime.x86_64    5.5.0-1.fc38    @updates

AFTER

    sudo yum list installed rocm*
    Installed Packages
    rocm-comgr.x86_64      16.1-2.fc38     @updates
    rocm-opencl.x86_64     5.4.3-2.fc37    @updates
    rocm-runtime.x86_64    5.5.0-1.fc38    @updates

There is a similar issue on GitHub: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/53

Following the procedure in that issue, I get the following output for clinfo:

    AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 clinfo

    amd_comgr_do_action:
        ActionKind: AMD_COMGR_ACTION_ADD_PRECOMPILED_HEADERS
        IsaName: amdgcn-amd-amdhsa--gfx1010:xnack-
        Options: "-O3" "-cl-kernel-arg-info" "-D__OPENCL_VERSION__=200" "-D__IMAGE_SUPPORT__=1" "-Xclang" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-mcode-object-version=5"
        Path:
        Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

    amd_comgr_do_action:
        ActionKind: AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
        IsaName: amdgcn-amd-amdhsa--gfx1010:xnack-
        Options: "-O3" "-cl-kernel-arg-info" "-D__OPENCL_VERSION__=200" "-D__IMAGE_SUPPORT__=1" "-Xclang" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-mcode-object-version=5"
        Path:
        Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
        COMGR::executeInProcessDriver argv: clang "-cc1" "-mcode-object-version=5" "-mllvm" "--amdhsa-code-object-version=5" "-triple" "amdgcn-amd-amdhsa" "-emit-llvm-bc" "-emit-llvm-uselists" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "CompileSource" "-mrelocation-model" "pic" "-pic-level" "1" "-fhalf-no-semantic-interposition" "-mframe-pointer=none" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-fvisibility=hidden" "-fapply-global-visibility-to-externs" "-target-cpu" "gfx1010" "-target-feature" "-xnack" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-debugger-tuning=gdb" "-resource-dir" "lib64/clang/16" "-include-pch" "/tmp/comgr-b43a8d/include/opencl1.2-c.pch" "-I" "/tmp/comgr-b43a8d/include" "-D" "__OPENCL_VERSION__=200" "-D" "__IMAGE_SUPPORT__=1" "-O3" "-std=cl1.2" "-fdebug-compilation-dir=/home/Philipp/Test/2020-11-22_ROCM/ROCclr" "-ferror-limit" "19" "-cl-kernel-arg-info" "-nogpulib" "-fgnuc-version=4.2.1" "-fno-threadsafe-statics" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-fno-validate-pch" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-faddrsig" "-o" "/tmp/comgr-b43a8d/output/CompileSource.bc" "-x" "cl" "/tmp/comgr-b43a8d/input/CompileSource"
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

    amd_comgr_do_action:
        ActionKind: AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
        IsaName: amdgcn-amd-amdhsa--gfx1010:xnack-
        Options: "code_object_v5"
        Path:
        Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

    amd_comgr_do_action:
        ActionKind: AMD_COMGR_ACTION_LINK_BC_TO_BC
        IsaName: amdgcn-amd-amdhsa--gfx1010:xnack-
        Options: "code_object_v5"
        Path:
        Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
        ERROR: linking module flags 'amdgpu_code_object_version': IDs have conflicting values in '' and 'llvm-link'
        ReturnStatus: AMD_COMGR_STATUS_ERROR

    === CL_PROGRAM_BUILD_LOG ===
    Error: Linking bitcode failed: linking source & IR libraries.

It looks like most of the code is compiled as code object v5, but linking with the device libs fails.
Interestingly, LLVM itself does not make v5 the default (at least according to the docs): https://www.llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

But even with v5 code objects built, comgr should in theory link in matching v5 device libs: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/commit/5a1beb6417b7680c29ac131933dd99791141995e

However, if I run

    llvm-dis -o - /usr/lib64/amdgcn/bitcode/oclc_abi_version_500.bc

I get

    ; ModuleID = '/usr/lib64/amdgcn/bitcode/oclc_abi_version_500.bc'
    source_filename = "llvm-link"
    target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7"
    target triple = "amdgcn-amd-amdhsa"

    @__oclc_ABI_version = linkonce_odr protected local_unnamed_addr addrspace(4) constant i32 500, align 4

    !opencl.ocl.version = !{!0}
    !llvm.ident = !{!1}
    !llvm.module.flags = !{!2, !3, !4}

    !0 = !{i32 2, i32 0}
    !1 = !{!"clang version 16.0.0 (Fedora 16.0.0-2.fc38)"}
    !2 = !{i32 1, !"amdgpu_code_object_version", i32 400}
    !3 = !{i32 1, !"wchar_size", i32 4}
    !4 = !{i32 8, !"PIC Level", i32 1}

I wonder where this goes wrong, given that amdgpu_code_object_version is set to 400 even for the v5 file. I suspect that clang-16 does not support the -mcode-object-version=none option, which is used in https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/087bef746cd0422b0bfef2f3f713e4deb38803d1/cmake/OCL.cmake#L42

That's at least what the Koji builds suggest: https://kojipkgs.fedoraproject.org//packages/rocm-device-libs/16.1/1.fc38/data/logs/x86_64/build.log

Until a workaround for -mcode-object-version=none becomes generally available, I tried to patch the .bc files and strip the amdgpu_code_object_version line with something like

    for file in oclc*; do llvm-dis -o - "$file" | sed 's/^!2.*//g' | sed 's/!{!2, !3/!{!3/g' | llvm-as -o "$file" -; done

and edited the remaining .bc files likewise. However, the issue persists.
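The manual llvm-dis check above could be repeated across all device-library bitcode files with a small script. This is a minimal Python sketch, assuming the textual IR looks like the dump above. The helper names (`code_object_flag`, `abi_version`, `is_suspicious`, `disassemble`) are hypothetical. The comparison rule (flag present and different from `__oclc_ABI_version`) is an assumption for illustration and only applies to the oclc_abi_version_*.bc files.

```python
import re
import subprocess

# Matches e.g.  !2 = !{i32 1, !"amdgpu_code_object_version", i32 400}
FLAG_RE = re.compile(r'!\{i32 \d+, !"amdgpu_code_object_version", i32 (\d+)\}')
# Matches e.g.  @__oclc_ABI_version = ... constant i32 500, align 4
ABI_RE = re.compile(r'@__oclc_ABI_version\s*=.*\bconstant i32 (\d+)')


def code_object_flag(ir_text):
    """Return the amdgpu_code_object_version flag value, or None if absent."""
    match = FLAG_RE.search(ir_text)
    return int(match.group(1)) if match else None


def abi_version(ir_text):
    """Return the __oclc_ABI_version constant, or None if absent."""
    match = ABI_RE.search(ir_text)
    return int(match.group(1)) if match else None


def is_suspicious(ir_text):
    """True if the module flag is present and disagrees with the ABI constant."""
    flag = code_object_flag(ir_text)
    abi = abi_version(ir_text)
    return flag is not None and abi is not None and flag != abi


def disassemble(path):
    """Run llvm-dis on a bitcode file and return the textual IR."""
    result = subprocess.run(
        ["llvm-dis", "-o", "-", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, `is_suspicious(disassemble("/usr/lib64/amdgcn/bitcode/oclc_abi_version_500.bc"))` would report the file shown above, since its flag is 400 while the ABI constant is 500.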
Maybe somebody else has an idea?

Hi. I have had this problem on my Fedora box too. I grabbed the source RPM, hardcoded the two references to "-mcode-object-version=" in ROCclr-rocm-5.5.0/device/devprogram.cpp to 4, built it, and it works.

    // driverOptions.push_back("-mcode-object-version=" + std::to_string(options->oVariables->LCCodeObjectVersion));
    driverOptions.push_back("-mcode-object-version=" + std::to_string(4));

and

    // codegenOptions.push_back("-mcode-object-version=" + std::to_string(options->oVariables->LCCodeObjectVersion));
    codegenOptions.push_back("-mcode-object-version=" + std::to_string(4));

Actually, it seems to work if you hardcode the version in both places to '5'.

Sorry, I missed a step in rebuilding the RPM. It does not work with both set to 5.

Wow, thanks! That really helped narrow down the issue for me. I was busy trying to get HIP working in Fedora (RHBZ#2209759), so I wasn't paying enough attention to this. I'll push an update later today and link this bug.

I believe I just need to cherry-pick this to comgr: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/commit/79948e1807bca7108722982b9018d61dde9420f2

Mike, if you can, can you try applying that? Or wait for my Bodhi update and test that?

It's building: https://koji.fedoraproject.org/koji/taskinfo?taskID=101538720

If that doesn't work, I'll revert and just hardcode to 4 in rocclr until I can contact upstream for suggestions.

FEDORA-2023-f4164e5c06 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-f4164e5c06

FEDORA-2023-f4164e5c06 has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.

Hi. I can test, but I need a Fedora 38 build. I installed rocm-comgr-16.1-3.fc39.x86_64 and I am still getting the errors.

OK, I am not sure what I need to do here either. In the spec file for opencl, I see these dependencies:
    BuildRequires: rocm-comgr-devel
    BuildRequires: rocm-runtime-devel

I assume that for your fix to work, opencl needs to be rebuilt with the fixed comgr?

I don't think you need to rebuild OpenCL on a comgr change. I might need to just sit down and experiment with a few combinations to understand the real cause. I'm curious why upstream is not seeing this issue, but maybe opencl from 5.5.0 is just buggy with code object version 5, and we need to force it to use 4.

I think for now I'll roll back comgr, since that seems to have done nothing, and just revert this change in my next update: https://github.com/ROCm-Developer-Tools/ROCclr/commit/041c00465b7adcee78085dc42253d42d1bb1f250

It does effectively the same thing as hardcoding to 4.

FEDORA-2023-68012d0819 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-68012d0819

I pushed the F38 update first. Please try that and let me know if it resolves your issue.

Looks good to me! I installed this over the one I built and everything is still working.

FEDORA-2023-80eb7f41de has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-80eb7f41de

FEDORA-2023-80eb7f41de has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.

FEDORA-2023-68012d0819 has been pushed to the Fedora 38 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-68012d0819` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-68012d0819 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Thanks! I pushed the fix to Rawhide, but the F38 update will be in testing for the next week.

FEDORA-2023-68012d0819 has been pushed to the Fedora 38 stable repository. If problem still persists, please make note of it in this bug report.

FEDORA-2023-d024444040 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-d024444040

FEDORA-2023-d024444040 has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.