Bug 2209759 - Review Request: rocclr - ROCm Compute Language Runtime
Summary: Review Request: rocclr - ROCm Compute Language Runtime
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Tom Rix
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-05-24 16:32 UTC by Jeremy Newton
Modified: 2023-06-19 02:09 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-06 16:24:08 UTC
Type: ---
Embargoed:
trix: fedora-review+


Attachments (Terms of Use)

Description Jeremy Newton 2023-05-24 16:32:52 UTC
Spec URL: https://mystro256.fedorapeople.org/rocclr.spec
SRPM URL: https://mystro256.fedorapeople.org/rocclr-5.5.0-1.fc39.src.rpm
Description: ROCm Compute Language Runtime
Fedora Account System Username: mystro256
Copr Build: https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/build/5952345/

Background:
I maintain rocm-opencl in Fedora and wanted to introduce rocm-hip to Fedora too. I noticed that rocm-hip shares a lot of code with rocm-opencl, so I was working with upstream via email to help clarify the source. They have agreed that for ROCm 5.7.0, they will merge all the source into one rocclr tree to reduce confusion and allow for easier distribution of source code. As a result, I think I makes more sense to introduce a ROCclr package with opencl/hip as subpackages.

The "preview" of rocm 5.7's new source organisation is located here:
https://github.com/ROCm-Developer-Tools/clr

This spec file emulates the future source organisation, so it should be very simple to transition to ROCm 5.7.0 when it release. Therefore when I update to 5.7.0 in the future, I should be able to drop Source 2 and 3, Patch 0 and 1, as well as the extraction logic in %prep.

Notes:
- As per above, this will replace and supersede rocm-opencl, so a lot of it is copied from the rocm-opencl spec file
- I will retire rocm-opencl after this is approved, so ignore any conflicts with it
- HIP attempts to be a platform generic API, while hipamd is AMD's vendor specific implementation, so I've wrote the spec file to allow easy separation later if HIP becomes more separated/detached from ROCm (see hip-devel) or there is value to Fedora keeping them separated

RPMlint output:
> rocm-hip.x86_64: E: shared-library-without-dependency-information /usr/lib64/libhiprtc-builtins.so.5.5.30201

GDB fails to extract debug data from this library. I believe it's because it's in a format that's not supported. Let me know if this is a blocker for inclusion.

> hip-devel.noarch: W: no-manual-page-for-binary hipcc
> hip-devel.noarch: W: no-manual-page-for-binary hipcc.pl
> hip-devel.noarch: W: no-manual-page-for-binary hipcc_cmake_linker_helper
> hip-devel.noarch: W: no-manual-page-for-binary hipconfig
> hip-devel.noarch: W: no-manual-page-for-binary hipconfig.pl
> hip-devel.noarch: W: no-manual-page-for-binary hipdemangleatp
> rocm-clinfo.x86_64: W: no-manual-page-for-binary rocm-clinfo
> rocm-hip-devel.x86_64: W: no-manual-page-for-binary roc-obj
> rocm-hip-devel.x86_64: W: no-manual-page-for-binary roc-obj-extract
> rocm-hip-devel.x86_64: W: no-manual-page-for-binary roc-obj-ls

I have expressed to upstream over email that I would like to contribute manpages, but I'm a bit busy, so they might take some time.

> rocm-clinfo.x86_64: W: no-documentation
> rocm-hip-devel.x86_64: W: no-documentation
> rocm-opencl.x86_64: W: no-documentation
> rocm-opencl-devel.x86_64: W: no-documentation

Not applicable/no docs available.

> rocm-hip.x86_64: E: executable-stack /usr/lib64/libamdhip64.so.5.5.30201
> rocm-hip.x86_64: E: executable-stack /usr/lib64/libhiprtc-builtins.so.5.5.30201
> rocm-hip.x86_64: E: executable-stack /usr/lib64/libhiprtc.so.5.5.30201

I've looked through the source and don't see anything obviously wrong. I'll do a deeper dive later, but I suspect it's just due to how hipamd is written, not a mistake per say, so it would take some time to resolve. Let me know if it's a blocking issue.

Comment 1 Jeremy Newton 2023-05-24 16:56:45 UTC
A note about the copr build, I didn't mean to build Fedora 37, since I wrote this spec file with Fedora 38 onwards in mind. Please ignore the Fedora 37 failures.

I'll need to look into the aarch64 failure on rawhide, but it works fine on Fedora 38, so it's probably trivial to fix or unrelated generic F39 issue.

Comment 2 Tom Rix 2023-05-28 18:20:55 UTC
The documentation could be improved by adding
BuildRequires: doxygen
Lots of docs will be generated.

A change to fix the no executable stack 
diff --git a/rocclr.spec b/rocclr.spec
index 0752e05..73fb54e 100644
--- a/rocclr.spec
+++ b/rocclr.spec
@@ -10,7 +10,7 @@
 
 Name:           rocclr
 Version:        %{rocm_version}
-Release:        1%{?dist}
+Release:        2%{?dist}
 Summary:        ROCm Compute Language Runtime
 Url:            https://github.com/ROCm-Developer-Tools/clr
 License:        MIT
@@ -152,6 +152,7 @@ sed -i "s|\(/usr/bin/\)env perl|\1perl|" HIP-rocm-%{version}/bin/hipcc.pl
 
 %build
 %cmake \
+    -DCMAKE_SHARED_LINKER_FLAGS=-Wl,-z,noexecstack \
     -DHIP_COMMON_DIR=$(realpath HIP-rocm-%{version}) \
     -DCMAKE_INSTALL_LIBDIR=%{_lib} \
     -DHIP_PLATFORM=amd \
@@ -251,5 +252,8 @@ fi
 %exclude %{_datadir}/hip/samples
 
 %changelog
+* Sun May 28 2023 Tom Rix <trix> - 5.5.0-2
+- Add noexecstack linker option
+
 * Tue May 16 2023 Jeremy Newton <alexjnewt at hotmail dot com> - 5.5.0-1
 - Initial package

A couple of things. in the config step of this package

-- ROCm Agent Enumurator Not Found
BuildRequires and Requires should have rocminfo package

I am testing by building rocBLAS, so far i have had to
export DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode

there should be a automatic connect between where rocm-device-libs installs this bitcode dir
and where hipcc/clang picks them up with '--rocm-device-lib-path' 

Tom Stellar what do you think ?

Comment 3 Philipp K. 2023-05-28 21:16:32 UTC
I really like this moving forward :)
In fact I once tried to package ROCM 3.9 on my own, and sifted through some of my patches, and it seems some might still be noteworthy. Note that I am usually not a packager, so it is quite likely that one or the other things needs some polishing. I will go through some of them one comment at a time, to structure this stuff a bit.

Regarding the device libs, I also ran in the issue with clang not finding them in the default paths. Clang seems to expect them in whatever `clang --print-resource-dir` returns. My spec file accordingly contained the following:

> %define clang_resource_dir "%(clang --print-resource-dir)"
>
> # TODO: usually cmake would install the .bc files to /usr/amdgcn/bincode/
> #       Moving things to the clang resource dir should be the best choice currently:
> #       https://reviews.llvm.org/D82930
> mkdir -p %{buildroot}%{clang_resource_dir}
> mv %{buildroot}%{_prefix}/amdgcn %{buildroot}%{clang_resource_dir}/
> mv %{buildroot}%{_prefix}/lib/cmake %{buildroot}%{_libdir}

In order to make other depending packages find the moved device libs, my builds also contained the following patch

From 861de633dd5c6c9ecefa2adf49872f337ee9daa6 Mon Sep 17 00:00:00 2001
From: Philipp Knechtges <philipp-dev>
Date: Mon, 23 Nov 2020 19:40:05 +0100
Subject: [PATCH] adjust CMake Target to match install path

---
 cmake/Packages.cmake | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cmake/Packages.cmake b/cmake/Packages.cmake
index 715ed95..55569cd 100644
--- a/cmake/Packages.cmake
+++ b/cmake/Packages.cmake
@@ -30,6 +30,9 @@ foreach(p ${count})
   set(AMD_DEVICE_LIBS_PREFIX_CODE "${AMD_DEVICE_LIBS_PREFIX_CODE}
 get_filename_component(AMD_DEVICE_LIBS_PREFIX \"\${AMD_DEVICE_LIBS_PREFIX}\" PATH)")
 endforeach()
+
+execute_process(COMMAND clang --print-resource-dir OUTPUT_VARIABLE CLANG_RESOURCE_DIR OUTPUT_STRIP_TRAILING_WHITESPACE)
+set(AMD_DEVICE_LIBS_PREFIX_CODE "set(AMD_DEVICE_LIBS_PREFIX \"${CLANG_RESOURCE_DIR}\" )")
 set(AMD_DEVICE_LIBS_TARGET_CODE)
 foreach(target ${AMDGCN_LIB_LIST})
   get_target_property(target_name ${target} ARCHIVE_OUTPUT_NAME)
-- 
2.28.0

I suspect I mostly did this for comgr to find the device libs in their new location. I did not execute anything, but from inspecting the github code, this patch might still be applicable.

Comment 4 Jeremy Newton 2023-05-29 03:03:27 UTC
Thanks for the input.

I've updated to 5.5.1 and have taken in the suggestions:
Spec URL: https://mystro256.fedorapeople.org/rocclr.spec
SRPM URL: https://mystro256.fedorapeople.org/rocclr-5.5.1-1.fc39.src.rpm
copr build: https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/build/5971554/

Enabling docs throws a bunch of file-contains-date-and-time rpmlint warnings, which I haven't looked into yet.
I'm still somewhat puzzled by the aarch64 error. There might be a bug in the code defining something it shouldn't or similar. I might just only build opencl on aarch64.

As for rocm-device-libs, I am the maintainer of that too, and I've actually been considering moving it to the clang resource dir for a while now based on upstream's suggestion.
I was really hoping upstream would fix the default install location for me, but it seems like I'll need to deal with it myself.

If you're not aware, I patch the location in source as the default is not FHS compliant:
https://src.fedoraproject.org/rpms/rocm-device-libs/blob/rawhide/f/0001-Use-FHS-compliant-install.patch

Tom or Tom, let me know if you have any thoughts on this.

Comment 5 Philipp K. 2023-05-29 05:56:33 UTC
With the device libs stuff out of the way, I used to and with your new version still have problems with <math.h> not being found. This is mostly due to hipcc doing -isystem /usr/include to early, which breaks the #include_next logic of libstdc++. There are also a lot of paths to fix in hipcc and hipconfig.

A modernized patch of these issues would look sth like this

From ba9778102dc7d3ae6268b3ad0c4e8c165d44e153 Mon Sep 17 00:00:00 2001
From: Philipp Knechtges <philipp-dev>
Date: Mon, 29 May 2023 07:50:05 +0200
Subject: [PATCH] fixing some paths and avoid math.h not being found

---
 bin/hipcc.pl   | 22 ++++++++++++----------
 bin/hipvars.pm |  4 ++--
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/bin/hipcc.pl b/bin/hipcc.pl
index 2cd37529..49a5755f 100644
--- a/bin/hipcc.pl
+++ b/bin/hipcc.pl
@@ -131,6 +131,9 @@ $ROCM_PATH      =   $hipvars::ROCM_PATH;
 $HIP_VERSION    =   $hipvars::HIP_VERSION;
 $HIP_ROCCLR_HOME =   $hipvars::HIP_ROCCLR_HOME;
 
+$CLANG_RESOURCE_DIR = `$HIP_CLANG_PATH/$HIP_COMPILER --print-resource-dir`;
+chomp($CLANG_RESOURCE_DIR);
+
 if ($HIP_PLATFORM eq "amd") {
   # If using ROCclr runtime, need to find HIP_ROCCLR_HOME
   if (!defined $DEVICE_LIB_PATH and -e "$HIP_ROCCLR_HOME/lib/bitcode") {
@@ -138,7 +141,7 @@ if ($HIP_PLATFORM eq "amd") {
   }
   $HIP_INCLUDE_PATH = "$HIP_ROCCLR_HOME/include";
   if (!defined $HIP_LIB_PATH) {
-    $HIP_LIB_PATH = "$HIP_ROCCLR_HOME/lib";
+    $HIP_LIB_PATH = "$HIP_ROCCLR_HOME/lib64";
   }
 
   if (!defined $DEVICE_LIB_PATH) {
@@ -146,9 +149,7 @@ if ($HIP_PLATFORM eq "amd") {
       $DEVICE_LIB_PATH = "$ROCM_PATH/amdgcn/bitcode";
     }
     else {
-      # This path is to support an older build of the device library
-      # TODO: To be removed in the future.
-      $DEVICE_LIB_PATH = "$ROCM_PATH/lib";
+      $DEVICE_LIB_PATH = "$CLANG_RESOURCE_DIR/amdgcn/bitcode";
     }
   }
 }
@@ -194,7 +195,7 @@ if ($HIP_PLATFORM eq "amd") {
         $HIP_INCLUDE_PATH = "$HIP_PATH/include";
     }
     if (! defined $HIP_LIB_PATH) {
-        $HIP_LIB_PATH = "$HIP_PATH/lib";
+        $HIP_LIB_PATH = "$HIP_PATH/lib64";
     }
     if ($verbose & 0x2) {
         print ("ROCM_PATH=$ROCM_PATH\n");
@@ -231,9 +232,10 @@ if ($HIP_PLATFORM eq "amd") {
     exit (-1);
 }
 
-# Add paths to common HIP includes:
-$HIPCXXFLAGS .= " -isystem \"$HIP_INCLUDE_PATH\"" ;
-$HIPCFLAGS .= " -isystem \"$HIP_INCLUDE_PATH\"" ;
+# Including /usr/include too early breaks the #include_next logic of libstdc++
+# and one gets these nice errors "'math.h' file not found"
+#$HIPCXXFLAGS .= " -isystem \"$HIP_INCLUDE_PATH\"" ;
+#$HIPCFLAGS .= " -isystem \"$HIP_INCLUDE_PATH\"" ;
 
 my $compileOnly = 0;
 my $needCXXFLAGS = 0;  # need to add CXX flags to compile step
@@ -583,8 +585,8 @@ if ($HIP_PLATFORM eq "amd") {
     }
 
     if ($hasHIP) {
-        if ($DEVICE_LIB_PATH ne "$ROCM_PATH/amdgcn/bitcode") {
-            $HIPCXXFLAGS .= " --hip-device-lib-path=\"$DEVICE_LIB_PATH\"";
+        if ($DEVICE_LIB_PATH ne "$CLANG_RESOURCE_DIR/amdgcn/bitcode") {
+            $HIPCXXFLAGS .= " --rocm-device-lib-path=\"$DEVICE_LIB_PATH\"";
         }
     }
 
diff --git a/bin/hipvars.pm b/bin/hipvars.pm
index c0c2eaa3..c93fd469 100644
--- a/bin/hipvars.pm
+++ b/bin/hipvars.pm
@@ -79,7 +79,7 @@ if (-e "$HIP_PATH/bin/rocm_agent_enumerator") {
 }elsif (-e "$HIP_PATH/../bin/rocm_agent_enumerator") { # case for backward compatibility
     $ROCM_PATH=$ENV{'ROCM_PATH'} // dirname("$HIP_PATH"); # use parent directory of HIP_PATH
 } else {
-    $ROCM_PATH=$ENV{'ROCM_PATH'} // "/opt/rocm";
+    $ROCM_PATH=$ENV{'ROCM_PATH'} // "/usr";
 }
 $CUDA_PATH=$ENV{'CUDA_PATH'} // '/usr/local/cuda';
 
@@ -87,7 +87,7 @@ $CUDA_PATH=$ENV{'CUDA_PATH'} // '/usr/local/cuda';
 if ($isWindows) {
     $HIP_CLANG_PATH=$ENV{'HIP_CLANG_PATH'} // "$HIP_PATH/bin";
 } else {
-    $HIP_CLANG_PATH=$ENV{'HIP_CLANG_PATH'} // "$ROCM_PATH/llvm/bin";
+    $HIP_CLANG_PATH=$ENV{'HIP_CLANG_PATH'} // "$ROCM_PATH/bin";
 }
 # HIP_ROCCLR_HOME is used by Windows builds
 $HIP_ROCCLR_HOME=$ENV{'HIP_ROCCLR_HOME'};
-- 
2.40.1

Changing this at least gets me to compile https://github.com/ROCm-Developer-Tools/hip-tests/blob/develop/samples/0_Intro/square/square.hipref.cpp . However, it does not run with hipMemcpy failing in line 78.

Comment 6 Philipp K. 2023-05-29 06:06:35 UTC
There also seem to be missing some perl requirements in your current spec:

hip-devel needs in hipconfig:
perl-Getopt-Long

rocm-hip-devel needs in roc-obj-*:
perl-URI-Encode

Thx for your work :)

Comment 7 Philipp K. 2023-05-29 07:04:51 UTC
Ok, sry for the noise, but square.hipref.cpp with the latest build that include the patches from RHBZ#2207599 work now. However, I find it a bit odd that this square example pulls in the device libs, even though it is already compiled and does not use hiprtc.

Comment 8 Jeremy Newton 2023-05-30 19:13:45 UTC
> There also seem to be missing some perl requirements in your current spec:

Ah sorry, I forgot to include perl-generators to generate the perl requires. The update below is fixed.

> A modernized patch of these issues would look sth like this

Thanks! I've updated with sed in %prep to make it a bit more flexible than a patch file, but thanks for the patch for illustration.

> -        if ($DEVICE_LIB_PATH ne "$ROCM_PATH/amdgcn/bitcode") {
> -            $HIPCXXFLAGS .= " --hip-device-lib-path=\"$DEVICE_LIB_PATH\"";
> +        if ($DEVICE_LIB_PATH ne "$CLANG_RESOURCE_DIR/amdgcn/bitcode") {
> +            $HIPCXXFLAGS .= " --rocm-device-lib-path=\"$DEVICE_LIB_PATH\"";

If I understand correctly, we always want to use --rocm-device-lib-path for redundancy, since I change the default device libs path in the fedora package.

> I find it a bit odd that this square example pulls in the device libs

I think hipcc just always uses it. The perl hipcc is a bit hacky and limited, which is why I think they are rewriting it in c++ (see https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/README.md).

Here's the updated files:
Spec URL: https://mystro256.fedorapeople.org/rocclr.spec
SRPM URL: https://mystro256.fedorapeople.org/rocclr-5.5.1-2.fc39.src.rpm
copr build: https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/build/5988523/

I still need to look into those docs related rpmlint warnings, but I added a fix for aarch64, along with the fixes related to Philipp's comments.
I'm going to see if I can get blender working, as that seems to be a good way to test if HIP is working.

Comment 9 Jeremy Newton 2023-05-31 02:25:27 UTC
OK sorry, one more update:
Spec URL: https://mystro256.fedorapeople.org/rocclr.spec
SRPM URL: https://mystro256.fedorapeople.org/rocclr-5.5.1-4.fc39.src.rpm
copr build: https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/build/5989231/

I added a new "hip" package as it turns out hipcc and hipconfig are used at runtime, such as with Blender.
Now doing "sudo dnf install rocm-hip" will allow for running blender without any issue on my system.
The only caveat is that blender checks for "libamdhip64.so" instead of "libamdhip64.so.5" or similar, so I moved the libamdhip64.so symlink to the rocm-hip package to workaround this issue. This introduces a devel-file-in-non-devel-package rpmlint warning.

Excluding that, I got the rpmlint out down to just a few no-manual-page-for-binary and no-documentation warnings, which I think can be ignored.

As well, I used the following to add a requires on rocm-device-libs for hipcc but I did it in a way to silence a rpmlint explicit-lib-dependency warning:

> # hipcc requirements:
> Requires:       %{_libdir}/amdgcn/bitcode

It seems ok to me since hipcc now looks for this directory due to the sed patchwork in %prep, but I can change it back to "rocm-device-libs" if you don't like adding directory requires like this.

@trix, what do you think?

Comment 10 Tom Rix 2023-05-31 15:28:19 UTC
Which blender are you using, can you send me instructions ?
Using fedora's with -DWITH_CYCLES_HIP_BINARIES=ON fails in to build in a non hip area.

The rocm-device-libs, imo needs to be solved in where the bitcode/ is install and/or get clang to look.
I did not take a close look yet at this yet to see which was the better approach.

My notes from the spec file
--- a/rocclr.spec
+++ b/rocclr.spec
@@ -46,6 +46,8 @@ BuildRequires:  pkgconfig(numa)
 BuildRequires:  pkgconfig(ocl-icd)
 BuildRequires:  python3-cppheaderparser
 BuildRequires:  rocm-comgr-devel
+# it would be good if these rocm packages were versioned so
+# we are not mixing 5.5.0 and 5.5.1
 BuildRequires:  rocminfo
 BuildRequires:  rocm-runtime-devel
 BuildRequires:  zlib-devel
@@ -106,6 +108,11 @@ ROCm HIP development package.
 %package -n hip
 Summary:        C++ Runtime API and Kernel Language
 BuildArch:      noarch
+# [!]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
+#     Note: Incorrect Requires : /usr/lib64/amdgcn/bitcode
+#     See: https://docs.fedoraproject.org/en-US/packaging-
+#     guidelines/#_file_and_directory_dependencies
+# could bitcode/ be installed into clang's resource dir ?
 # hipcc requirements:
 Requires:       %{_libdir}/amdgcn/bitcode
 Requires:       rocminfo
@@ -273,6 +280,13 @@ fi
 %{_libdir}/libamdhip64.so.5{,.*}
 %{_libdir}/libhiprtc.so.5{,.*}
 %{_libdir}/libhiprtc-builtins.so.5{,.*}
+#
+# rocm-hip.x86_64: W: devel-file-in-non-devel-package /usr/lib64/libamdhip64.so
+# I do not like this work around, applications need to follow libraries
+# even if that means rebuilding the application.  Since blender should be
+# using this library and not the AMD version.  This likely mean cleaning/enhancing
+# up other applications as ROCm is generally available on fedora.
+
 # Workaround blender issue: blender looks for libamdhip64.so instead of the
 # versioned counterpart, so include the .so symlink in the runtime package
 %{_libdir}/libamdhip64.so

Comment 11 Jeremy Newton 2023-05-31 21:41:05 UTC
> Which blender are you using, can you send me instructions ?

I just used Fedora's Blender package. Go to edit->preferences to open the preferences window, then click on "System" on the left then "HIP" on the top.
If you install rocm-hip, check on a system with supported HW, it will be found in the list. If you then delete the libamdhip64.so symlink, it won't find hip anymore.
I'll fix the package for now and we can fix blender later.

> Using fedora's with -DWITH_CYCLES_HIP_BINARIES=ON fails in to build in a non hip area.

I'm not sure about this, I just used the existing fedora package. I guess maybe you can build blender against hip-devel or rocm-hip-devel to get it to link at compile time?

> Note: Incorrect Requires : /usr/lib64/amdgcn/bitcode

I can use just "Requires: rocm-device-libs" if that works for you, but rpmlint complains about explicit-lib-dependency. I guess I can just ignore it.

> could bitcode/ be installed into clang's resource dir ?

Sure I can do that, but it's unrelated to the dependency. Basically the default is to install to /usr/amdgcn/bitcode, so I patched it to put it in /usr/lib64/amdgcn/bitcode (arbitrary decision). Putting it in /usr/lib64/clang/VERSION/amdgcn/bitcode or similiar is fine by me, but I'll need to land multiple patches in the existing packages to stage it over to the new location.
Is this blocking for this review? Or can I do it later? I'll start the work on moving it now if you think it's valuable; I just don't understand the logic on the clang side and would need to dive into that code a bit to find the best location for the bitcodes.

>>>>>

Also I spoke with upstream a bit today; in ROCm 5.7, they plan to move hipcc and hipconfig into its another package (https://github.com/ROCm-Developer-Tools/HIPCC), so I'll need to submit another review for that to prepare. Basically the hipcc.pl will be replaced with hipcc.bin (c++). I also might have put some files in the wrong place, so I'll follow up with an update in a few hours once I've thought things through a bit.

Let me know if you see any issues. I've made a pull request for a few of my patches to reduce some of the sed logic in %prep:
https://github.com/ROCm-Developer-Tools/HIPCC/pull/83

Comment 12 Tom Stellard 2023-06-01 04:21:40 UTC
(In reply to Jeremy Newton from comment #11)
> > Which blender are you using, can you send me instructions ?
> 
> I just used Fedora's Blender package. Go to edit->preferences to open the
> preferences window, then click on "System" on the left then "HIP" on the top.
> If you install rocm-hip, check on a system with supported HW, it will be
> found in the list. If you then delete the libamdhip64.so symlink, it won't
> find hip anymore.
> I'll fix the package for now and we can fix blender later.
> 
> > Using fedora's with -DWITH_CYCLES_HIP_BINARIES=ON fails in to build in a non hip area.
> 
> I'm not sure about this, I just used the existing fedora package. I guess
> maybe you can build blender against hip-devel or rocm-hip-devel to get it to
> link at compile time?
> 
> > Note: Incorrect Requires : /usr/lib64/amdgcn/bitcode
> 
> I can use just "Requires: rocm-device-libs" if that works for you, but
> rpmlint complains about explicit-lib-dependency. I guess I can just ignore
> it.
> 
> > could bitcode/ be installed into clang's resource dir ?
> 
> Sure I can do that, but it's unrelated to the dependency. Basically the
> default is to install to /usr/amdgcn/bitcode, so I patched it to put it in
> /usr/lib64/amdgcn/bitcode (arbitrary decision). Putting it in
> /usr/lib64/clang/VERSION/amdgcn/bitcode or similiar is fine by me, but I'll
> need to land multiple patches in the existing packages to stage it over to
> the new location.
> Is this blocking for this review? Or can I do it later? I'll start the work
> on moving it now if you think it's valuable; I just don't understand the
> logic on the clang side and would need to dive into that code a bit to find
> the best location for the bitcodes.

Where does clang search for this by default?  I think it would make sense to put the bit code in that directory.  I know this code in clang fairly well and I can help debug it if you give me a simple example.

Comment 13 Philipp K. 2023-06-01 10:01:09 UTC
(In reply to Philipp K. from comment #7)
> Ok, sry for the noise, but square.hipref.cpp with the latest build that
> include the patches from RHBZ#2207599 work now. However, I find it a bit odd
> that this square example pulls in the device libs, even though it is already
> compiled and does not use hiprtc.

Just for future reference, I opened an issue upstream: https://github.com/ROCm-Developer-Tools/ROCclr/issues/44 However, I dont have much hope that this will get fixed. This should also not block in any way the inclusion of these packages in fedora.

However, this probably has the implication that the rocm-hip package requires rocm-device-libs in the spec-file in one way or another?

Also rocm-opencl should probably require rocm-device-libs anyway.

Maybe rocm-comgr should not require rocm-device-libs only at build time, and pull in rocm-device-libs, which would then satisfy the runtime needs of rocm-opencl and rocm-device-libs?

Not sure how realistic this is, but maybe if the aforementioned issue gets resolved, one could in the long-term split off hiprtc as a separate package? At least in theory, I guess, this should allow for binary distribution of dependent packages that only require rocm-hip and do not pull in a full clang/llvm dependency.

Comment 14 Tom Rix 2023-06-01 12:29:24 UTC
with rocm-device-libs rpm installed
My simple test is
touch t.hip
clang t.hip
error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library

To work correctly is necessary to do
clang --rocm-device-lib-path=/usr/lib64/amdgcn/bitcode t.hip

From this print out
clang -print-rocm-search-dirs                                                                                                       
ROCm installation search path: /usr                                                                                                          
ROCm installation search path: /usr                                                                                                          
ROCm installation search path: /usr/lib64/clang/16                                                                                           
ROCm installation search path: /opt/rocm                                                                                                     
ROCm installation search path: /usr/local                                                                                                    
ROCm installation search path: /usr 

Because we could have different clangs installed, the best search path is
/usr/lib64/clang/16

From comments in clang/lib/Driver/ToolChains/AMDGPU.cpp, detectDeviceLibrary()  
// Find device libraries in <LLVM_DIR>/lib/clang/<ver>/lib/amdgcn/bitcode                                                                  
  LibDevicePath = D.ResourceDir;                                                                                                             
  llvm::sys::path::append(LibDevicePath, CLANG_INSTALL_LIBDIR_BASENAME,                                                                      
                          "amdgcn", "bitcode");                                                                                              
  HasDeviceLibrary = CheckDeviceLib(LibDevicePath, true);                                                                                    
  if (HasDeviceLibrary)                                                                                                                      
    return;

simply copying the directory there does not work, so i suspect this will need to
with 2 changes.

1. rocm-device-libs
follow what libomp does and use 
Requires: clang-resource-filesystem%{?isa} = %{version}
to install amdgcn/bitcode into clang's resource dir lib/

2. clang
have a default to look there

Comment 15 Jeremy Newton 2023-06-01 12:42:32 UTC
> Also rocm-opencl should probably require rocm-device-libs anyway.

How come? I don't see any code paths in OpenCL requiring it at all. Hip needs it because of hipcc as far as I know and rocm-device-libs is used generally during compilation. In other words, rocm-device-libs is not a runtime dependency, only build time.

> Where does clang search for this by default?

I was actually trying to figure that out, as I'm a bit unfamiliar with the code. I see this:

https://github.com/llvm/llvm-project/blob/release/16.x/clang/lib/Driver/ToolChains/HIPAMD.cpp#L346
and
https://github.com/llvm/llvm-project/blob/release/16.x/clang/lib/Driver/ToolChains/HIPSPV.cpp#L215

The whole thing looks a bit opaque to me, so feel free to help out :)

> 1. rocm-device-libs
> follow what libomp does and use 
> Requires: clang-resource-filesystem%{?isa} = %{version}
> to install amdgcn/bitcode into clang's resource dir lib/

Sure I'm ok with this if Tom Stellar is willing to help out with the LLVM patching side.
Or I can also reach out to rocm device libs upstream to see if they can guide me towards making a patch that they're ok with.

Comment 16 Philipp K. 2023-06-01 13:14:32 UTC
(In reply to Tom Rix from comment #14)
> From comments in clang/lib/Driver/ToolChains/AMDGPU.cpp,
> detectDeviceLibrary()  
> // Find device libraries in <LLVM_DIR>/lib/clang/<ver>/lib/amdgcn/bitcode   
> 
>   LibDevicePath = D.ResourceDir;                                            
> 
>   llvm::sys::path::append(LibDevicePath, CLANG_INSTALL_LIBDIR_BASENAME,     
> 
>                           "amdgcn", "bitcode");                             
> 
>   HasDeviceLibrary = CheckDeviceLib(LibDevicePath, true);                   
> 
>   if (HasDeviceLibrary)                                                     
> 
>     return;
> 
> simply copying the directory there does not work, so i suspect this will
> need to
> with 2 changes.
For me it worked when copying to /usr/lib64/clang/16/amdgcn/bitcode. Cf. my patch above. Maybe the code comment in the clang/llvm codebase is out of sync? Or maybe the "/lib" is missing somewhere in the clang/llvm codebase?

(In reply to Jeremy Newton from comment #15)
> > Also rocm-opencl should probably require rocm-device-libs anyway.
> 
> How come? I don't see any code paths in OpenCL requiring it at all. Hip
> needs it because of hipcc as far as I know and rocm-device-libs is used
> generally during compilation. In other words, rocm-device-libs is not a
> runtime dependency, only build time.
As far as I can tell, both pull them in as runtime dependencies. For OpenCL it is somehow expected, since this is IIRC the programming model after all, there is no binary precompilation and code is expected to run everywhere, thus requiring compiling/linking; the full package.

For, HIP i consider this in fact a bug (with the issue linked above), since in general I would expect the precompilation to be done, and no .bc files to be of need anymore at runtime.

To test this, run any OpenCL application (e.g. clinfo) or any HIP application with hipMemcpy (e.g. the square example) with AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1. For me in both cases I see a line that reads AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES followed by AMD_COMGR_ACTION_LINK_BC_TO_BC, which seems to link the *.bc files from rocm-device-libs.

The respective code seems in both cases to reside in rocclr: https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/device/devprogram.cpp

Comment 17 Jeremy Newton 2023-06-01 15:36:06 UTC
> For me it worked when copying to /usr/lib64/clang/16/amdgcn/bitcode

You're right, I did this on my Fedora 37 machine:
sudo ln -s /usr/lib64/amdgcn/ /usr/lib64/clang/15/amdgcn

And it worked. I think I should relocate rocm-device-libs to /usr/lib64/clang/MAJOR then :)
Let me get working on patching rocm-device-libs!

Side note, I spoke to upstream, it looks like they've been preparing to move the files here for a while now and plan to move around the ROCM 6.0 timespan, so it seems logic to do it early.

Comment 18 Jeremy Newton 2023-06-02 00:04:55 UTC
Ok updated, should be much better now:
Spec URL: https://mystro256.fedorapeople.org/rocclr.spec
SRPM URL: https://mystro256.fedorapeople.org/rocclr-5.5.1-5.fc39.src.rpm
copr build: https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/build/5995610/

Note that there's a planned koji outage right now, so it'll take me time to update rocm-device-libs with the fixed paths, so I uploaded the fixed package to my copr for testing.

Some responses to comments I missed:

> +# it would be good if these rocm packages were versioned so
> +# we are not mixing 5.5.0 and 5.5.1

I'm not sure which this is referring to. Comgr and device libs is not ROCm versioned because I forked it to be more aligned with LLVM releases. For some background, upstream ROCm builds against unstable LLVM, so I can't really package official ROCm releases with Fedora, rather I've resorted to forking and making my own release branches until ROCm moves it into llvm-project proper, which I believe is on their TODO list.

As for rocminfo and rocm-runtime. I don't generally update it beyond .0 unless there's fixes, e.g. no update from 5.5.0 to 5.5.1 because they rarely update it and they retag the same git hash most of the time. It's a lot of package update curn to update packages that don't changes. When I back-port to EPEL, I always rebuild the latest though, since I only update it when a new RHEL is released.

I added ">= %{rocm_version}" to the BuildRequires/Requires to reduce mixing. I can add a provides for the rocm_version to these packages if you want stricter requirements. E.g. to rocminfo I can add "Provides: rocminfo(rocm) = %{rocm_release}" and then have these packages depend on that.

Current rpmlint output:
> hip.noarch: W: no-manual-page-for-binary hipcc
> hip.noarch: W: no-manual-page-for-binary hipcc.pl
> hip.noarch: W: no-manual-page-for-binary hipconfig
> hip.noarch: W: no-manual-page-for-binary hipconfig.pl
> hip-devel.noarch: W: no-manual-page-for-binary hipcc_cmake_linker_helper
> hip-devel.noarch: W: no-manual-page-for-binary hipdemangleatp
> rocm-clinfo.x86_64: W: no-manual-page-for-binary rocm-clinfo
> rocm-hip-devel.x86_64: W: no-manual-page-for-binary roc-obj
> rocm-hip-devel.x86_64: W: no-manual-page-for-binary roc-obj-extract
> rocm-hip-devel.x86_64: W: no-manual-page-for-binary roc-obj-ls
> hip-devel.noarch: W: no-documentation
> rocm-clinfo.x86_64: W: no-documentation
> rocm-hip-devel.x86_64: W: no-documentation
> rocm-opencl.x86_64: W: no-documentation
> rocm-opencl-devel.x86_64: W: no-documentation

Comment 19 Tom Stellard 2023-06-02 18:43:14 UTC
(In reply to Tom Rix from comment #14)
> with rocm-device-libs rpm installed
> My simple test is
> touch t.hip
> clang t.hip
> error: cannot find ROCm device library; provide its path via '--rocm-path'
> or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm
> device library

It would be great if someone would add this as a CI test to the romc-device-library package.   We are planning on moving the resource directory path from /usr/lib64/clang/<MAJOR>/ to /usr/lib/clang/<MAJOR> when we package clang-17 for Fedora, so having automated testing for these paths would be really helpful.

Comment 20 Tom Rix 2023-06-02 20:40:12 UTC
I can add the test when the rocm-device-library settles from these changes.
I updated to the latest device-library, and the install to the clang resource dir works.
Then the most recent rocclr above.
My building of rocBLAS fails because it does not use the new default that is rocBLAS's problem.
The new hip-doc package addresses the issue fedora-review warnings.

Unless any objections are raised by this Sunday, I will approve.

Comment 21 Philipp K. 2023-06-02 21:08:18 UTC
Tried to install latest copr build and dnf fails with "rocminfo >= 5.5.1 needed".

Comment 22 Tom Rix 2023-06-02 22:02:38 UTC
I assumed this was an oversite and why rocm needs to settle to 5.5.1
I hacked this as part of my testing

git diff
diff --git a/rocminfo.spec b/rocminfo.spec
index 5fb5623..fa9c98c 100644
--- a/rocminfo.spec
+++ b/rocminfo.spec
@@ -1,5 +1,5 @@
 Name:          rocminfo
-Version:       5.5.0
+Version:       5.5.1
 Release:       1%{?dist}
 Summary:       ROCm system info utility

Comment 23 Jeremy Newton 2023-06-02 22:33:40 UTC
> Tried to install latest copr build and dnf fails with "rocminfo >= 5.5.1 needed".

Ah yes sorry, my mistake:
Spec URL: https://mystro256.fedorapeople.org/rocclr.spec
SRPM URL: https://mystro256.fedorapeople.org/rocclr-5.5.1-6.fc39.src.rpm
copr build: https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/build/5999448/

> I assumed this was an oversite and why rocm needs to settle to 5.5.1

Did you seem my comment above about this?

Comment 24 Philipp K. 2023-06-03 20:04:20 UTC
Just another minor issue (however this does not seem to affect the working of hipcc), when I run HIPCC_VERBOSE=2 hipcc for some input file, I'll get:

Use of uninitialized value $DEVICE_LIB_PATH in concatenation (.) or string at /usr/bin//hipcc.pl line 192.

Also tried to build https://github.com/amd/rocm-examples/tree/develop/HIP-Basic and there are certainly some cmake issues still lurking around, but this is probably material for another bug, and quite scattered across different packages.

Looking forward to see this in the official repos :)

Comment 25 Tom Rix 2023-06-04 16:31:33 UTC
This is a key package to enabling ROCm in Fedora.
The major issues have been resolved.
There are multiple other follow on ROCm packages that will flesh out the minor issues.
Thanks for the hard work.
Approved.

Comment 26 Jeremy Newton 2023-06-04 23:56:48 UTC
Thanks Tom!

Comment 27 Fedora Admin user for bugzilla script actions 2023-06-04 23:57:16 UTC
The Pagure repository was created at https://src.fedoraproject.org/rpms/rocclr

Comment 28 Fedora Update System 2023-06-05 04:56:09 UTC
FEDORA-2023-a53dd0f60b has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-a53dd0f60b

Comment 29 Jeremy Newton 2023-06-05 13:39:30 UTC
FYI: the f38 package is pending for push to updates-testing, but f39/rawhide is blocked by: https://bugzilla.redhat.com/show_bug.cgi?id=2212323

Comment 30 Fedora Update System 2023-06-06 02:28:05 UTC
FEDORA-2023-a53dd0f60b has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf install --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-a53dd0f60b \*`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-a53dd0f60b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 31 Tom Rix 2023-06-06 12:14:31 UTC
The requested test has been submitted here
https://src.fedoraproject.org/rpms/rocm-device-libs/pull-request/3

Comment 32 Jeremy Newton 2023-06-06 16:24:08 UTC
> The requested test has been submitted here

Thanks, I'll test it and pull it in.

> but f39/rawhide is blocked by...

This was fixed and this package is now in RAWHIDE/F39, so I'm closing this ticket off.

Comment 33 Fedora Update System 2023-06-11 05:17:56 UTC
FEDORA-2023-a53dd0f60b has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf install --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-a53dd0f60b \*`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-a53dd0f60b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 34 Fedora Update System 2023-06-19 02:09:42 UTC
FEDORA-2023-a53dd0f60b has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.