Bug 2125789 - Review Request: simdutf - Unicode validation and transcoding at billions of characters per second
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Benson Muite
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-09-10 11:30 UTC by Ali Erdinc Koroglu
Modified: 2022-11-14 07:03 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-11-14 07:03:35 UTC
Type: Bug
Embargoed:
benson_muite: fedora-review+


Description Ali Erdinc Koroglu 2022-09-10 11:30:33 UTC
SPEC Url: https://download.copr.fedorainfracloud.org/results/aekoroglu/fedora/fedora-rawhide-x86_64/04831505-simdutf/simdutf.spec
SRPM Url: https://download.copr.fedorainfracloud.org/results/aekoroglu/fedora/fedora-rawhide-x86_64/04831505-simdutf/simdutf-1.0.1-1.fc38.src.rpm

Description
Unicode (UTF8, UTF16, UTF32) validation and transcoding at billions of 
characters per second using SSE2, AVX2, NEON, AVX-512.

Comment 1 Benson Muite 2022-09-11 06:55:45 UTC
It builds on Copr:
https://copr.fedorainfracloud.org/coprs/fed500/simdutf/build/4832009/
However, tests fail on Haswell-based Celeron processors, which do not have AVX2 instructions. Issue filed at:
https://github.com/simdutf/simdutf/issues/168

Maybe a patch is required?  A similar robustness improvement can be applied to https://github.com/simdjson/simdjson

Comment 2 Benson Muite 2022-09-11 16:18:44 UTC
Checking the header-only library on a Celeron indicates that architecture detection works OK. Simply running ctest will fail on an x86_64 processor that does not support AVX instructions, so I ran fedora-review on a package built without the tests.

Package Review
==============

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated
[ ] = Manual review needed



===== MUST items =====

C/C++:
[x]: Package does not contain kernel modules.
[x]: Package contains no static executables.
[x]: If your application is a C or C++ application you must list a
     BuildRequires against gcc, gcc-c++ or clang.
[x]: Header files in -devel subpackage, if present.
[x]: ldconfig not called in %post and %postun for Fedora 28 and later.
[x]: Package does not contain any libtool archives (.la)
[x]: Rpath absent or only used for internal libs.
[x]: Development (unversioned) .so files in -devel subpackage, if present.

Generic:
[x]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[!]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses
     found: "Unknown or generated", "*No copyright* Apache License 2.0",
     "MIT License", "Unicode strict Apache License 2.0", "Open Software
     License 3.0", "*No copyright* Boost Software License 1.0", "GNU Lesser
     General Public License, Version 2.1", "BSD 3-Clause License", "*No
     copyright* Open Software License 3.0". 262 files have unknown license.
     Detailed output of licensecheck in
     /home/FedoraPackaging/reviews/simdutf/2125789-simdutf/srpm-
     unpacked/review-simdutf/licensecheck.txt
[x]: License file installed when any subpackage combination is installed.
[!]: If the package is under multiple licenses, the licensing breakdown
     must be documented in the spec.
[x]: %build honors applicable compiler flags or justifies otherwise.
[x]: Package contains no bundled libraries without FPC exception.
[x]: Changelog in prescribed format.
[x]: Sources contain only permissible code or content.
[-]: Package contains desktop file if it is a GUI application.
[x]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory
     names).
[x]: Package is named according to the Package Naming Guidelines.
[x]: Package does not generate any conflict.
[x]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
[x]: Requires correct, justified where necessary.
[x]: Spec file is legible and written in American English.
[-]: Package contains systemd file(s) if in need.
[x]: Useful -debuginfo package or justification otherwise.
[?]: Package is not known to require an ExcludeArch tag.
[-]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 20480 bytes in 2 files.
[x]: Package complies to the Packaging Guidelines
[x]: Package successfully compiles and builds into binary rpms on at least
     one supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: There are rpmlint messages (see attachment).
[x]: If (and only if) the source package includes the text of the
     license(s) in its own file, then that file, containing the text of the
     license(s) for the package is included in %license.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package does not own files or directories owned by other packages.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Dist tag is present.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package must not depend on deprecated() packages.
[x]: Package use %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as
     provided in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

===== SHOULD items =====

Generic:
[-]: If the source package does not include license text(s) as a separate
     file from upstream, the packager SHOULD query upstream to include it.
[x]: Final provides and requires are sane (see attachments).
[?]: Fully versioned dependency in subpackages if applicable.
     Note: No Requires: %{name}%{?_isa} = %{version}-%{release} in simdutf-
     devel
[x]: Package functions as described.
[x]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[x]: Patches link to upstream bugs/comments/lists or are otherwise
     justified.
[-]: Sources are verified with gpgverify first in %prep if upstream
     publishes signatures.
     Note: gpgverify is not used.
[x]: Package should compile and build into binary rpms on all supported
     architectures.
[?]: %check is present and all tests pass.
[x]: Packages should try to preserve timestamps of original installed
     files.
[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

===== EXTRA items =====

Generic:
[!]: Spec file according to URL is the same as in SRPM.
     Note: Spec file as given by url is not the same as in SRPM (see
     attached diff).
     See: (this test has no URL)
[x]: Rpmlint is run on debuginfo package(s).
     Note: There are rpmlint messages (see attachment).
[x]: Rpmlint is run on all installed packages.
     Note: There are rpmlint messages (see attachment).
[x]: Large data in /usr/share should live in a noarch subpackage if package
     is arched.


Rpmlint
-------
Cannot parse rpmlint output:


Rpmlint (debuginfo)
-------------------
Cannot parse rpmlint output:



Rpmlint (installed packages)
----------------------------
Cannot parse rpmlint output:


Source checksums
----------------
https://github.com/simdutf/simdutf/archive/v1.0.1/simdutf-1.0.1.tar.gz :
  CHECKSUM(SHA256) this package     : e7832ba58fb95fe00de76dbbb2f17d844a7ad02a6f5e3e9e5ce9520e820049a0
  CHECKSUM(SHA256) upstream package : e7832ba58fb95fe00de76dbbb2f17d844a7ad02a6f5e3e9e5ce9520e820049a0


Requires
--------
simdutf (rpmlib, GLIBC filtered):
    libc.so.6()(64bit)
    libgcc_s.so.1()(64bit)
    libgcc_s.so.1(GCC_3.0)(64bit)
    libstdc++.so.6()(64bit)
    libstdc++.so.6(CXXABI_1.3)(64bit)
    rtld(GNU_HASH)

simdutf-devel (rpmlib, GLIBC filtered):
    cmake-filesystem(x86-64)
    libsimdutf.so.1()(64bit)

simdutf-debuginfo (rpmlib, GLIBC filtered):

simdutf-debugsource (rpmlib, GLIBC filtered):



Provides
--------
simdutf:
    libsimdutf.so.1()(64bit)
    simdutf
    simdutf(x86-64)

simdutf-devel:
    cmake(simdutf)
    simdutf-devel
    simdutf-devel(x86-64)

simdutf-debuginfo:
    debuginfo(build-id)
    libsimdutf.so.1.0.0-1.0.1-1.fc38.x86_64.debug()(64bit)
    simdutf-debuginfo
    simdutf-debuginfo(x86-64)

simdutf-debugsource:
    simdutf-debugsource
    simdutf-debugsource(x86-64)



Diff spec file in url and in SRPM
---------------------------------
--- /home/FedoraPackaging/simdutf/2125789-simdutf/srpm-unpacked/simdutf.spec	2022-09-11 17:42:59.228210856 +0300
+++ /home/FedoraPackaging/simdutf/2125789-simdutf/srpm-unpacked/review-simdutf/srpm-unpacked/simdutf.spec	2022-09-11 17:44:26.000000000 +0300
@@ -1,2 +1,11 @@
+## START: Set by rpmautospec
+## (rpmautospec version 0.3.0)
+%define autorelease(e:s:pb:n) %{?-p:0.}%{lua:
+    release_number = 1;
+    base_release_number = tonumber(rpm.expand("%{?-b*}%{!?-b:1}"));
+    print(release_number + base_release_number - 1);
+}%{?-e:.%{-e*}}%{?-s:.%{-s*}}%{!?-n:%{?dist}}
+## END: Set by rpmautospec
+
 Name:           simdutf
 Version:        1.0.1
@@ -49,3 +58,4 @@
 
 %changelog
-%autochangelog
+* Sun Sep 11 2022 John Doe <packager> 1.0.1-1
+- Uncommitted changes


Generated by fedora-review 0.8.0 (e988316) last change: 2022-04-07
Command line :/usr/bin/fedora-review -n simdutf
Buildroot used: fedora-rawhide-x86_64
Active plugins: Generic, Shell-api, C/C++
Disabled plugins: PHP, Ocaml, Perl, fonts, R, Java, Haskell, SugarActivity, Python
Disabled flags: EPEL6, EPEL7, DISTTAG, BATCH, EXARCH


Comments:
a) licensecheck.txt contains further licenses; please add a breakdown in the spec file:

*No copyright* Apache License 2.0
---------------------------------
simdutf-1.0.1/LICENSE-APACHE
simdutf-1.0.1/README.md

*No copyright* Boost Software License 1.0
-----------------------------------------
simdutf-1.0.1/benchmarks/competition/utf8lut/LICENSE

*No copyright* Open Software License 3.0
----------------------------------------
simdutf-1.0.1/benchmarks/competition/u8u16/src/libu8u16.w

BSD 3-Clause License
--------------------
simdutf-1.0.1/include/simdutf/internal/isadetection.h

GNU Lesser General Public License, Version 2.1
----------------------------------------------
simdutf-1.0.1/benchmarks/competition/utf8sse4/fromutf8-sse.cpp

MIT License
-----------
simdutf-1.0.1/LICENSE-MIT
simdutf-1.0.1/benchmarks/competition/utf8lut/src/core/Dfa.h

Open Software License 3.0
-------------------------
simdutf-1.0.1/benchmarks/competition/u8u16/OSL3.0.txt

Unicode strict Apache License 2.0
---------------------------------
simdutf-1.0.1/benchmarks/competition/llvm/ConvertUTF.cpp
simdutf-1.0.1/benchmarks/competition/llvm/ConvertUTF.h

b) There is an option to build a header-only library, which can be useful for development. Might it be worth packaging this as a subpackage as well?
c) License files are in both the main package and the -devel package. As the -devel package should pull in the main package, it should be OK to have the license files only in the main package.
d) It is also possible to enable address sanitization with libasan. This could be done in a subpackage, if it would be helpful for users.
e) For architecture detection on x86, you may want to modify the testing helper function, since the tests will fail on x86 processors without AVX2 instructions. The tests are not packaged and the header-only library does runtime detection correctly, but if there is an x86 machine on Koji without AVX2, the build will fail on that machine.
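
As a rough illustration of point e), one way to keep a build from failing on such a machine would be to check the CPU flags before running the tests. This is only a sketch under stated assumptions, not what the package or upstream does: has_cpu_flag and run_tests_if_avx2 are hypothetical helper names, and reading /proc/cpuinfo is Linux-specific.

```shell
# Hypothetical helper: succeed if the CPU flag named in $1 appears as a whole
# word on a "flags" line of the cpuinfo file given as $2.
# $2 defaults to /proc/cpuinfo (Linux only).
has_cpu_flag() {
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

# Hypothetical wrapper: run the test suite only when AVX2 is available,
# otherwise print a notice on stderr and carry on.
run_tests_if_avx2() {
    if has_cpu_flag avx2; then
        ctest --output-on-failure
    else
        echo "AVX2 not available on this builder; skipping ctest" >&2
    fi
}
```

In a spec file the body of run_tests_if_avx2 would sit in %check, probably calling the %ctest macro rather than ctest directly. Skipping tests hides real failures on those machines, so fixing the test helper upstream remains the cleaner option.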

Comment 4 Benson Muite 2022-09-14 03:08:41 UTC
Thanks for the information on architectures. ConvertUTF.cpp and ConvertUTF.h may be problematic. As per the discussion at https://github.com/AmokHuginnsson/replxx/issues/12, can they be replaced by https://github.com/nemtrif/utfcpp, which is packaged as https://packages.fedoraproject.org/pkgs/utf8cpp/utf8cpp-devel/?

Comment 5 Benson Muite 2022-09-18 15:46:29 UTC
Filed an issue upstream about the licenses of the included files: https://github.com/simdutf/simdutf/issues/170

Comment 6 Benson Muite 2022-09-25 08:38:00 UTC
Apache-2.0 should be ASL 2.0 https://docs.fedoraproject.org/en-US/legal/allowed-licenses/

Comment 7 Benson Muite 2022-10-29 07:12:16 UTC
It seems ConvertUTF.cpp and ConvertUTF.h can be removed or replaced; see https://github.com/simdutf/simdutf/issues/170

Comment 8 Benson Muite 2022-10-30 07:14:14 UTC
For packaging, ConvertUTF.cpp and ConvertUTF.h are not needed, as they are only used in the benchmarks, which are neither run nor packaged. Thus, the benchmarks folder can be removed, along with lines 23 to 27 of the topmost CMakeLists.txt. With those changes, I think it should be OK.
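
The clean-up described above could look roughly like the following, run on the unpacked sources. strip_benchmarks is a hypothetical helper name used only so the two steps can be tried in isolation; in a spec file the commands would simply be written inline in %prep. The line range 23 to 27 is taken from the comment and is assumed to apply to the 1.0.1 tarball only.

```shell
# Hypothetical helper mirroring the two clean-up steps.
# $1 is the top of the unpacked source tree (for example simdutf-1.0.1).
strip_benchmarks() {
    # Remove the benchmark sources, including the third-party competition code.
    rm -r "$1/benchmarks"
    # Delete lines 23 to 27 of the top-level CMakeLists.txt. Check the range
    # by eye first, since it will move between releases.
    # Note: -i is the GNU sed in-place option.
    sed -i '23,27d' "$1/CMakeLists.txt"
}
```

Deleting by line number is fragile across version bumps; a small patch against CMakeLists.txt would fail loudly when the file changes, which may be preferable for a long-lived package.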

Comment 10 Benson Muite 2022-11-10 06:32:38 UTC
Can you add
rm -r benchmarks
after 
%autosetup -n %{name}-%{version} -p1
in the spec file? This will ensure that troublesome content is not installed in error.

If the benchmarks directory is removed, the only licenses left are Apache 2 and BSD 3 Clause.
All files are under Apache 2, except simdutf-2.0.2/include/simdutf/internal/isadetection.h, which is BSD 3 Clause.
Can this information be added to the spec file? A license breakdown is required.
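
To double-check which license headings remain once benchmarks is gone, the licensecheck.txt listing can be filtered. The sketch below assumes the layout quoted in Comment 2 (a license heading, a line of dashes, then one path per line, with a blank line between groups); licenses_outside_benchmarks is a hypothetical helper name.

```shell
# Hypothetical helper: print each license heading from a licensecheck-style
# listing ($1) that has at least one file outside a benchmarks/ directory.
licenses_outside_benchmarks() {
    awk '
        # A line of dashes underlines a heading: the previous line names the license.
        /^-+$/ { license = prev; next }
        # A blank line ends the current group.
        /^[[:space:]]*$/ { license = ""; prev = ""; next }
        # A path outside benchmarks/ keeps its license; print each heading once.
        license != "" && $0 !~ /\/benchmarks\// && !seen[license]++ { print license }
        { prev = $0 }
    ' "$1"
}
```

Run against the 1.0.1 listing from Comment 2, this would still report "MIT License", because the license text file LICENSE-MIT itself sits outside benchmarks even though, per that listing, no MIT-licensed source file does.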

Comment 12 Benson Muite 2022-11-10 17:23:35 UTC
Approved. 

Two minor points that can be addressed when importing:
1) 
%license LICENSE-APACHE LICENSE-MIT
should just be
%license LICENSE-APACHE

since the MIT-licensed material was removed.

2) after the line

License:        Apache-2.0 AND BSD-3-Clause
add
#All files are under Apache 2, except simdutf-2.0.2/include/simdutf/internal/isadetection.h which is BSD 3 Clause

Comment 13 Ali Erdinc Koroglu 2022-11-10 17:44:50 UTC
You're right, fixing it. Thanks.

Comment 14 Gwyn Ciesla 2022-11-10 19:58:42 UTC
(fedscm-admin):  The Pagure repository was created at https://src.fedoraproject.org/rpms/simdutf

