Bug 2240302 - Review Request: gloo - A PyTorch communication library
Summary: Review Request: gloo - A PyTorch communication library
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: Unspecified
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Benson Muite
QA Contact: Fedora Extras Quality Assurance
URL: https://github.com/facebookincubator/...
Whiteboard:
Depends On:
Blocks: ML-SIG
TreeView+ depends on / blocked
 
Reported: 2023-09-23 00:33 UTC by Tom Rix
Modified: 2024-03-26 10:11 UTC (History)
3 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-12-16 20:26:26 UTC
Type: ---
Embargoed:
benson_muite: fedora-review+


Attachments (Terms of Use)
The .spec file difference from Copr build 6506272 to 6514816 (4.69 KB, patch)
2023-10-10 21:33 UTC, Fedora Review Service
no flags Details | Diff

Description Tom Rix 2023-09-23 00:33:11 UTC
Spec URL: https://trix.fedorapeople.org/gloo.spec
SRPM URL: https://trix.fedorapeople.org/gloo-0.5.0%5egit20230824.01a0c81-1.fc40.src.rpm

Gloo is a collective communications library. It comes with a number of                                                                        
collective algorithms useful for machine learning applications. These                                                                         
include a barrier, broadcast, and allreduce.                                                                                                  
                                                                                                                                              
Transport of data between participating machines is abstracted so that                                                                        
IP can be used at all times, or InifiniBand (or RoCE) when available.                                                                         
In the latter case, if the InfiniBand transport is used, GPUDirect can                                                                        
be used to accelerate cross machine GPU-to-GPU memory transfers.                                                                              
                                                                                                                                              
Where applicable, algorithms have an implementation that works with                                                                           
system memory buffers, and one that works with NVIDIA GPU memory buffers.                                                                     
In the latter case, it is not necessary to copy memory between host and                                                                       
device; this is taken care of by the algorithm implementations.

This commit shows its intended use in the PyTorch package
https://github.com/trixirt/pytorch-fedora/commit/f879f528afa93e6ab160bd802ba57e4bfa4b385a

Reproducible: Always

Comment 1 Benson Muite 2023-09-24 14:52:48 UTC
Package Review
==============

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated
[ ] = Manual review needed



===== MUST items =====

C/C++:
[x]: Package does not contain kernel modules.
[x]: If your application is a C or C++ application you must list a
     BuildRequires against gcc, gcc-c++ or clang.
[x]: Header files in -devel subpackage, if present.
[x]: ldconfig not called in %post and %postun for Fedora 28 and later.
[x]: Package does not contain any libtool archives (.la)
[x]: Package contains no static executables.
[x]: Rpath absent or only used for internal libs.
[x]: Development (unversioned) .so files in -devel subpackage, if present.

Generic:
[x]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[x]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses
     found: "Unknown or generated", "BSD 3-Clause License", "MIT License".
     252 files have unknown license. Detailed output of licensecheck in
     /home/FedoraPackaging/reviews/gloo/2240302-gloo/licensecheck.txt
[x]: License file installed when any subpackage combination is installed.
[x]: %build honors applicable compiler flags or justifies otherwise.
[x]: Package contains no bundled libraries without FPC exception.
[x]: Changelog in prescribed format.
[x]: Sources contain only permissible code or content.
[-]: Package contains desktop file if it is a GUI application.
[x]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory
     names).
[x]: Package is named according to the Package Naming Guidelines.
[ ]: Package does not generate any conflict.
[ ]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
[ ]: Requires correct, justified where necessary.
[x]: Spec file is legible and written in American English.
[-]: Package contains systemd file(s) if in need.
[x]: Useful -debuginfo package or justification otherwise.
[!]: Package is not known to require an ExcludeArch tag.
[-]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 13843 bytes in 1 files.
[ ]: Package complies to the Packaging Guidelines
[x]: Package successfully compiles and builds into binary rpms on at least
     one supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: No rpmlint messages.
[x]: If (and only if) the source package includes the text of the
     license(s) in its own file, then that file, containing the text of the
     license(s) for the package is included in %license.
[x]: The License field must be a valid SPDX expression.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package does not own files or directories owned by other packages.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Dist tag is present.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package must not depend on deprecated() packages.
[x]: Package use %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as
     provided in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

===== SHOULD items =====

Generic:
[ ]: If the source package does not include license text(s) as a separate
     file from upstream, the packager SHOULD query upstream to include it.
[ ]: Final provides and requires are sane (see attachments).
[ ]: Package functions as described.
[x]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[x]: Patches link to upstream bugs/comments/lists or are otherwise
     justified.
[-]: Sources are verified with gpgverify first in %prep if upstream
     publishes signatures.
     Note: gpgverify is not used.
[!]: Package should compile and build into binary rpms on all supported
     architectures.
[!]: %check is present and all tests pass.
[ ]: Packages should try to preserve timestamps of original installed
     files.
[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: Fully versioned dependency in subpackages if applicable.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

===== EXTRA items =====

Generic:
[x]: Rpmlint is run on debuginfo package(s).
     Note: No rpmlint messages.
[x]: Rpmlint is run on all installed packages.
     Note: No rpmlint messages.
[x]: Large data in /usr/share should live in a noarch subpackage if package
     is arched.
[x]: Spec file according to URL is the same as in SRPM.


Rpmlint
-------
Checking: gloo-0.5.0^git20230824.01a0c81-1.fc38.x86_64.rpm
          gloo-devel-0.5.0^git20230824.01a0c81-1.fc38.x86_64.rpm
          gloo-debuginfo-0.5.0^git20230824.01a0c81-1.fc38.x86_64.rpm
          gloo-debugsource-0.5.0^git20230824.01a0c81-1.fc38.x86_64.rpm
          gloo-0.5.0^git20230824.01a0c81-1.fc38.src.rpm
============================ rpmlint session starts ============================
rpmlint: 2.4.0
configuration:
    /usr/lib/python3.11/site-packages/rpmlint/configdefaults.toml
    /etc/xdg/rpmlint/fedora-legacy-licenses.toml
    /etc/xdg/rpmlint/fedora-spdx-licenses.toml
    /etc/xdg/rpmlint/fedora.toml
    /etc/xdg/rpmlint/scoring.toml
    /etc/xdg/rpmlint/users-groups.toml
    /etc/xdg/rpmlint/warn-on-functions.toml
rpmlintrc: [PosixPath('/tmp/tmplw2t9ko3')]
checks: 31, packages: 5

 5 packages and 0 specfiles checked; 0 errors, 0 warnings, 0 badness; has taken 4.0 s 




Rpmlint (debuginfo)
-------------------
Checking: gloo-debuginfo-0.5.0^git20230824.01a0c81-1.fc38.x86_64.rpm
============================ rpmlint session starts ============================
rpmlint: 2.4.0
configuration:
    /usr/lib/python3.11/site-packages/rpmlint/configdefaults.toml
    /etc/xdg/rpmlint/fedora-legacy-licenses.toml
    /etc/xdg/rpmlint/fedora-spdx-licenses.toml
    /etc/xdg/rpmlint/fedora.toml
    /etc/xdg/rpmlint/scoring.toml
    /etc/xdg/rpmlint/users-groups.toml
    /etc/xdg/rpmlint/warn-on-functions.toml
rpmlintrc: [PosixPath('/tmp/tmp10t1mp8h')]
checks: 31, packages: 1

 1 packages and 0 specfiles checked; 0 errors, 0 warnings, 0 badness; has taken 1.7 s 





Rpmlint (installed packages)
----------------------------
============================ rpmlint session starts ============================
rpmlint: 2.4.0
configuration:
    /usr/lib/python3.11/site-packages/rpmlint/configdefaults.toml
    /etc/xdg/rpmlint/fedora-legacy-licenses.toml
    /etc/xdg/rpmlint/fedora-spdx-licenses.toml
    /etc/xdg/rpmlint/fedora.toml
    /etc/xdg/rpmlint/scoring.toml
    /etc/xdg/rpmlint/users-groups.toml
    /etc/xdg/rpmlint/warn-on-functions.toml
checks: 31, packages: 4

 4 packages and 0 specfiles checked; 0 errors, 0 warnings, 0 badness; has taken 1.9 s 



Source checksums
----------------
https://github.com/facebookincubator/gloo/archive/01a0c815d1a98eb9b38341cf63546f234fbcc43b/gloo-01a0c81.tar.gz :
  CHECKSUM(SHA256) this package     : 60053ab1b30c82b8972e67e2614a824d57b978b2751f782562b690bc9e3570a1
  CHECKSUM(SHA256) upstream package : 60053ab1b30c82b8972e67e2614a824d57b978b2751f782562b690bc9e3570a1


Requires
--------
gloo (rpmlib, GLIBC filtered):
    ld-linux-x86-64.so.2()(64bit)
    libc.so.6()(64bit)
    libgcc_s.so.1()(64bit)
    libgcc_s.so.1(GCC_3.0)(64bit)
    libgcc_s.so.1(GCC_3.3.1)(64bit)
    libstdc++.so.6()(64bit)
    libstdc++.so.6(CXXABI_1.3)(64bit)
    libstdc++.so.6(CXXABI_1.3.11)(64bit)
    libstdc++.so.6(CXXABI_1.3.13)(64bit)
    libstdc++.so.6(CXXABI_1.3.3)(64bit)
    libstdc++.so.6(CXXABI_1.3.5)(64bit)
    rtld(GNU_HASH)

gloo-devel (rpmlib, GLIBC filtered):
    cmake-filesystem(x86-64)
    gloo(x86-64)
    libgloo.so.23.8.24()(64bit)

gloo-debuginfo (rpmlib, GLIBC filtered):

gloo-debugsource (rpmlib, GLIBC filtered):



Provides
--------
gloo:
    gloo
    gloo(x86-64)
    libgloo.so.23.8.24()(64bit)

gloo-devel:
    cmake(Gloo)
    cmake(gloo)
    gloo-devel
    gloo-devel(x86-64)

gloo-debuginfo:
    debuginfo(build-id)
    gloo-debuginfo
    gloo-debuginfo(x86-64)
    libgloo.so.23.8.24-0.5.0^git20230824.01a0c81-1.fc38.x86_64.debug()(64bit)

gloo-debugsource:
    gloo-debugsource
    gloo-debugsource(x86-64)



Generated by fedora-review 0.10.0 (e79b66b) last change: 2023-07-24
Command line :/usr/bin/fedora-review -b 2240302 -m fedora-38-x86_64
Buildroot used: fedora-38-x86_64
Active plugins: Generic, C/C++, Shell-api
Disabled plugins: Java, Haskell, fonts, SugarActivity, Ruby, Perl, Ocaml, R, PHP, Python
Disabled flags: EXARCH, EPEL6, EPEL7, DISTTAG, BATCH


Comments:
a) Can builds with MPI be added? See for example:
https://orion.fedorapeople.org/dbcsr.spec
b) Can tests or a smoke test be added?  Tests probably need additional configuration,
but an example of a smoke test can be found in [1].
c) Can libibverbs transport be enabled? See [1]
d) Can libuv transport be enabled? It may need an update
to libuv[2] to get cmake files. See [3].

1) https://koji.fedoraproject.org/koji/taskinfo?taskID=106636605
2) https://src.fedoraproject.org/rpms/libuv
3) https://src.fedoraproject.org/rpms/libuv/pull-request/5

Comment 2 Benson Muite 2023-09-24 14:53:46 UTC
4) It does not build on i686

Comment 3 Tom Rix 2023-09-24 16:42:36 UTC
Spec URL: https://trix.fedorapeople.org/gloo.spec
SRPM URL: https://trix.fedorapeople.org/gloo-0.5.0%5egit20230824.01a0c81-2.fc40.src.rpm

Did not seem to be a knob to turn on MPI
Tests depend on a really old openssl, they break the build
Added libibverbs
Tried to add libuv, can add once your PR is merged
Removed i686 from build.

Thanks for the review!

Comment 4 Benson Muite 2023-09-25 11:15:22 UTC
There is a knob to turn on MPI, though it is not in the main
CMakeLists.txt file:
https://github.com/facebookincubator/gloo/blob/main/gloo/mpi/CMakeLists.txt#L1

The OpenSSL used is a compatibility shim allowing
use of 1.1 interface but using OpenSSL3.  Please see:
https://koji.fedoraproject.org/koji/taskinfo?taskID=106674591
https://download.copr.fedorainfracloud.org/results/fed500/gloo/fedora-rawhide-x86_64/06440734-gloo/gloo.spec

Probably, it is helpful to initially have 3 builds:
i) No MPI, ibverbs, tcp
ii) OpenMPI, ibverbs, tcp
iii) MPICH, ibverbs, tcp

There are also nccl and rccl backends:
https://github.com/facebookincubator/gloo/tree/main/gloo/nccl
but nccl and rccl are not in Fedora.

1) https://github.com/NVIDIA/nccl
2) https://github.com/ROCmSoftwarePlatform/rccl

Please also add soname to shared the library listing:
https://docs.fedoraproject.org/en-US/packaging-guidelines/#_listing_shared_library_files

Comment 5 Tom Rix 2023-09-27 11:28:06 UTC
What do you mean by 3 builds ?
I am looking closer at nccl & rccl, nccl has a cuda requirement and rccl may have a cuda requirement.  There is no good way at this time to address building with cuda.

Comment 6 Benson Muite 2023-09-29 07:03:22 UTC
For an example build see:
https://copr.fedorainfracloud.org/coprs/fed500/gloo/build/6467673/
Pull requests to fix directory ownership:
https://src.fedoraproject.org/rpms/openmpi/pull-request/14
https://src.fedoraproject.org/rpms/mpich/pull-request/11

For OpenSSL there are suggested changes at:
https://github.com/facebookincubator/gloo/issues/358
Maybe it is better to create a patch? Supporting encrypted communications
seems useful.

rccl does not require Cuda, it does require Hip, but can be added later.

Comment 7 Tom Rix 2023-10-01 20:40:40 UTC
The first step in getting rccl, having hipify
https://bugzilla.redhat.com/show_bug.cgi?id=2241664

Comment 8 Tom Rix 2023-10-08 16:05:12 UTC
Spec URL: https://trix.fedorapeople.org/gloo.spec
SRPM URL: https://trix.fedorapeople.org/gloo-0.5.0%5egit20230824.01a0c81-4.fc40.src.rpm

I incorporated the changes from 
https://copr.fedorainfracloud.org/coprs/fed500/gloo/build/6467673/
And combined the subpackages for mpich and openmpi back into the gloo package

Since rccl will take a while and is orthogonal, let's go with this

Comment 9 Fedora Review Service 2023-10-08 16:19:55 UTC
Copr build:
https://copr.fedorainfracloud.org/coprs/build/6506272
(succeeded)

Review template:
https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2240302-gloo/fedora-rawhide-x86_64/06506272-gloo/fedora-review/review.txt

Please take a look if any issues were found.

---
This comment was created by the fedora-review-service
https://github.com/FrostyX/fedora-review-service

If you want to trigger a new Copr build, add a comment containing new
Spec and SRPM URLs or [fedora-review-service-build] string.

Comment 10 Benson Muite 2023-10-09 05:16:11 UTC
The MPI packages should be separate.  OpenMPI and MPICH cannot be loaded at the same time.
Not everybody may want to use the MPI transport. So it is good to have 3 separate packages,
one without MPI, one with the OpenMPI implementation of MPI and a third with the MPICH
implementation of MPI. See:
https://docs.fedoraproject.org/en-US/neurofedora/mpi/
https://docs.fedoraproject.org/en-US/packaging-guidelines/MPI/#_packaging_of_mpi_software

If you have a patch for OpenSSL, that would be great, but not essential.

Comment 11 Tom Rix 2023-10-10 21:09:24 UTC
Spec URL: https://trix.fedorapeople.org/gloo.spec
SRPM URL: https://trix.fedorapeople.org/gloo-0.5.0%5egit20230824.01a0c81-5.fc40.src.rpm

For splitting them back.
Change the so version to like what was done for xnnpack
Fix a problem in the configury
Sorry, no patch for OpenSSL.

Comment 12 Fedora Review Service 2023-10-10 21:33:06 UTC
Created attachment 1993358 [details]
The .spec file difference from Copr build 6506272 to 6514816

Comment 13 Fedora Review Service 2023-10-10 21:33:09 UTC
Copr build:
https://copr.fedorainfracloud.org/coprs/build/6514816
(succeeded)

Review template:
https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2240302-gloo/fedora-rawhide-x86_64/06514816-gloo/fedora-review/review.txt

Please take a look if any issues were found.

---
This comment was created by the fedora-review-service
https://github.com/FrostyX/fedora-review-service

If you want to trigger a new Copr build, add a comment containing new
Spec and SRPM URLs or [fedora-review-service-build] string.

Comment 14 Benson Muite 2023-10-11 18:20:03 UTC
Do not exclude
%{_libdir}/mpich/include/
and 
%{_libdir}/openmpi/include/

The mpich and openmpi development packages should be usable without the 
main development package. Please fix on import.

Approved.

Comment 15 Fedora Admin user for bugzilla script actions 2023-10-11 18:36:21 UTC
The Pagure repository was created at https://src.fedoraproject.org/rpms/gloo

Comment 16 Zbigniew Jędrzejewski-Szmek 2024-03-26 10:11:08 UTC
Apparently this was missed:
[x]: Package must not depend on deprecated() packages.

$ dnf5 repoquery --provides openssl1.1-devel | grep deprecated
deprecated()

This package will become FTBFS in a few days.


Note You need to log in before you can comment on or make changes to this bug.