Bug 2240414 - Review Request: rocblas - ROCm BLAS
Summary: Review Request: rocblas - ROCm BLAS
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: Unspecified
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Jeremy Newton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2242859
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-09-24 00:11 UTC by Tom Rix
Modified: 2023-10-31 13:07 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-10-23 15:13:14 UTC
Type: ---
Embargoed:
alexjnewt: fedora-review+


Attachments (Terms of Use)

Description Tom Rix 2023-09-24 00:11:50 UTC
Spec URL: https://trix.fedorapeople.org/rocblas.spec
SRPM URL: https://trix.fedorapeople.org/rocblas-5.6.0-1.fc40.src.rpm

rocBLAS is the AMD library for Basic Linear Algebra Subprograms                                                                               
(BLAS) on the ROCm platform. It is implemented in the HIP                                                                                     
programming language and optimized for AMD GPUs.

Size of *.so was causing relocation errors, so scaled back the gpus to build.

Reproducible: Always

Comment 1 Tom Rix 2023-10-03 21:38:58 UTC
Spec URL: https://trix.fedorapeople.org/rocblas.spec
SRPM URL: https://trix.fedorapeople.org/rocblas-5.6.0-2.fc40.src.rpm

Reworked to split *.so into gpu family's and to load the appropriate one at install time.
Add an option --with test to create a rocblas-test.rpm with the tests and benchmarks

Testing is ok on my gfx1102
Building is ok with the WIP rocsparse.spec here 
https://github.com/trixirt/rocSPARSE/blob/fedora/rocm-rel-5.6/rocsparse.spec

Comment 2 Jeremy Newton 2023-10-04 13:55:43 UTC
So I'm not totally opposed to the idea of splitting up the SO libs... but I'm a bit concerned with the logic in %post.

My gut says we shouldn't copy, but use symlinks instead, but if you're going to use symlinks, I would suggest "alternatives". It's not 100% a traditional use for alternatives, but it at least allows the user/system maintainer to switch easily if the detect logic doesn't work. It also reduces some possible packaging pitfalls.

I also suggest to have a bunch of rocblas subpackages. E.g rocblas-gfx11, rocblas-gfx10, etc, with a metapackage of "rocblas" that just pulls them all in with the detection logic. This allows easier selection by the user installing the packages.

My gut also says to default to gfx11, but this is completely subjective and I have no good reason to feel this way.

Last point, please update to 5.7, I just pushed the update to hip this morning.

Comment 3 Tom Rix 2023-10-06 13:18:41 UTC
I am wondering if rocm shouldn't be handled like mpi by using modules.
We are always going to switch a set of libs from multiple packages at the same time.
instead of the mpi, we would have rocm gpu family backends.
ex/
setenv        ROCM_LIB       /usr/lib64/rocm/gfx11/lib
prepend-path  LD_LIBRARY_PATH /usr/lib64/rocm/gfx11/lib

and there would be new package like rocm-module to set this up.

Here is an example of what Benson suggested for gloo
https://bugzilla.redhat.com/show_bug.cgi?id=2240302
https://copr-dist-git.fedorainfracloud.org/cgit/fed500/gloo/gloo.git/tree/gloo.spec?id=5f4092aba4f411ac9a10fc05d14aeab8909df3d9

Comment 4 Ben Woodard 2023-10-06 15:11:25 UTC
(In reply to Tom Rix from comment #3)
> I am wondering if rocm shouldn't be handled like mpi by using modules.
> We are always going to switch a set of libs from multiple packages at the
> same time.
> instead of the mpi, we would have rocm gpu family backends.
> ex/
> setenv        ROCM_LIB       /usr/lib64/rocm/gfx11/lib
> prepend-path  LD_LIBRARY_PATH /usr/lib64/rocm/gfx11/lib
> 
> and there would be new package like rocm-module to set this up.
> 
> Here is an example of what Benson suggested for gloo
> https://bugzilla.redhat.com/show_bug.cgi?id=2240302
> https://copr-dist-git.fedorainfracloud.org/cgit/fed500/gloo/gloo.git/tree/
> gloo.spec?id=5f4092aba4f411ac9a10fc05d14aeab8909df3d9

I'm not sure that this would work. As Cordell pointed out many users have multiple different GPUs installed in their system. This won't be as much of a problem in the enterprise space where graphics aren't really a thing but many systems are built with APUs (combined CPU+GPU) packages and then they install a DGPU like a Radeon 7900 XT. How would that work with your alternative's ligic? You would need two to be active at once.

I agree, that you should roll everything forward to 5.7 and in the back of your mind consider how to maintain compatibility when rocm 6 rolls out. It is supposed to be a break API release.

Comment 5 Tom Rix 2023-10-07 00:17:33 UTC
Everyone is not going to be happy.  There is 10 kilos flour going into a single 5 kilo bag.
This is better than the first option where the older gpus were dropped.

Comment 6 Tom Rix 2023-10-07 14:19:59 UTC
Most coverage of gpu will happen in the default, see
https://github.com/trixirt/rocm-rpm-macros/blob/main/modules.rocm/default
I think this is the best we can do, a combo of the last 2 families, if you have a gfx8 or gfx9, then you have to use that specific environment.  There will be no combo support for gfx8 or gfx9.
When gfx12 rolls out the default will be gfx11 and gfx12 combo

Comment 7 Tom Rix 2023-10-09 12:44:04 UTC
The new build infra has been linked as a depends-on package reviews for
module infra that splits out these gpu families
rpm macros a general improvement, not specifically needed, but I am tired of cut-n-pasting boiler plate
An example of use for rocBLAS is
https://github.com/trixirt/rocBLAS/commit/deba70490ff6af3f866553fe703f4e00e6da3067

And then the followup rocSPARSE that uses rocBLAS is
https://github.com/trixirt/rocSPARSE/commit/969e09adf17d5c1993fb409904743bf443cb8d92

Comment 8 Tom Rix 2023-10-16 23:31:57 UTC
Spec URL: https://trix.fedorapeople.org/rocblas.spec
SRPM URL: https://trix.fedorapeople.org/rocblas-5.7.1-1.fc40.src.rpm

Uses the new infra.

Comment 9 Jeremy Newton 2023-10-20 23:06:19 UTC
So it seems pretty sane, except you put the .so files in the regular package, and the .so.* in the devel.

A suggestion could be to add a macro for %files so you don't need to specify all the families.

I'm just building it in copr to check the output:
https://copr.fedorainfracloud.org/coprs/mystro256/playground/build/6552139/

Comment 10 Tom Rix 2023-10-21 19:22:22 UTC
Spec URL: https://trix.fedorapeople.org/rocblas.spec
SRPM URL: https://trix.fedorapeople.org/rocblas-5.7.1-2.fc40.src.rpm

fixed the so location.
Use a glob instead of macro to simplify the gfx locations

Comment 11 Jeremy Newton 2023-10-21 22:29:06 UTC
Ok so after looking into this further, I see some remaining issues:

- please change the license field to "MIT AND BSD-3-Clause" to match SPDX style (capitalise AND)
- I'd much prefer you use the URL https://github.com/ROCmSoftwarePlatform/rocBLAS as the upstream, as it makes it easier to find the clone-able code. I need to fix a few of the other ROCm packages to match.
- There's a bunch of orphaned directories and we should make sure they're owned by a package as per the guidelines. I'd suggest either add them to the rocm-rpm-macros-modules package, or add a filesystem package to own these directories:
     /usr/lib64/rocm/gfx11
     /usr/lib64/rocm/gfx11/lib
     /usr/lib64/rocm/gfx11/lib/cmake
     /usr/lib64/rocm/gfx10
     /usr/lib64/rocm/gfx10/lib
     /usr/lib64/rocm/gfx10/lib/cmake
     /usr/lib64/rocm/gfx9
     /usr/lib64/rocm/gfx9/lib
     /usr/lib64/rocm/gfx9/lib/cmake
     /usr/lib64/rocm/gfx8
     /usr/lib64/rocm/gfx8/lib
     /usr/lib64/rocm/gfx8/lib/cmake
I'd much rather we fix this before approving this package, or else we're run into orphaned directories.

Comment 13 Jeremy Newton 2023-10-23 00:41:53 UTC
LGTM, although you put AMD instead of AND in the changelog.

Approved

Comment 14 Fedora Admin user for bugzilla script actions 2023-10-31 13:07:30 UTC
The Pagure repository was created at https://src.fedoraproject.org/rpms/rocblas


Note You need to log in before you can comment on or make changes to this bug.