Bug 1908740 - Review Request: harry - A tool for measuring string similarity
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ben Beasley
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: FE-DEADREVIEW
 
Reported: 2020-12-17 13:29 UTC by Ruediger Landmann
Modified: 2021-01-28 00:45 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-01-28 00:45:21 UTC
Type: Bug
Embargoed:


Attachments
Sample spec file implementing review feedback—still needs patches upstreamed (4.45 KB, text/plain)
2020-12-28 14:14 UTC, Ben Beasley
Patch for compiler warnings, mentioned in review (503 bytes, patch)
2020-12-28 14:16 UTC, Ben Beasley
Patch for test failures on non-x86 arches, mentioned in review (1.42 KB, patch)
2020-12-28 14:17 UTC, Ben Beasley

Description Ruediger Landmann 2020-12-17 13:29:15 UTC
Spec URL: https://download.copr.fedorainfracloud.org/results/rlandmann/Harry/fedora-rawhide-x86_64/01840938-harry/harry.spec
SRPM URL: https://download.copr.fedorainfracloud.org/results/rlandmann/Harry/fedora-rawhide-x86_64/01840938-harry/harry-0.4.2-1.fc34.src.rpm
Description: 
Harry is a small tool for comparing strings. The tool supports several common 
distance and kernel functions for strings as well as some exotic similarity 
measures. The focus of Harry lies on implicit similarity measures, that is, 
comparison functions that do not give rise to an explicit vector space. 
Examples of such similarity measures are the Levenshtein distance, the 
Jaro-Winkler distance or the spectrum kernel.

Comment 2 Ben Beasley 2020-12-28 14:13:04 UTC
Package Review
==============

Since I needed to make sure my suggestions did not have typos or omissions, I
implemented them in an updated spec file, which I will attach with two
additional patches, mentioned below. You may start with my updated spec file
and fill in the missing upstream bug URLs, or you may do things your own way.
If you use my spec file, you should make sure you understand my comments and
what I changed, check that I didn’t miss anything, and decide if you agree
with my choices. As is common in these reviews, some of my findings are
clear-cut applications of the guidelines, while others are open to discussion.
I’m happy to hear your thoughts.

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated

===== Additional findings =====

[!]: The line “rm -rf %{buildroot}home” in %install is not needed, and
     does not even do what you intended: there is no slash between the
     buildroot directory and home, so it expands to something like
     “/builddir/build/BUILDROOT/harry-0.4.2-2.fc34.x86_64home”. Please
     remove it.

     I think this was intended to deal with the examples getting
     installed in the wrong place, with the buildroot getting expanded
     twice. You can fix that by removing the initial $(DESTDIR) in each
     examples/*/Makefile.am in %prep, using sed or a patch; this is a fix
     you should send upstream via a GitHub issue.
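
A minimal sketch of both points above, in plain shell. The buildroot value is the one quoted in the finding; the Makefile.am line is a made-up stand-in, since the real contents of examples/*/Makefile.am are not shown here, so the sed expression would need checking against the actual files.

```shell
# Buildroot path as quoted in the finding above (rpmbuild sets the real one).
buildroot="/builddir/build/BUILDROOT/harry-0.4.2-2.fc34.x86_64"

# "%{buildroot}home": no separator, so this names a sibling path, not a subdirectory.
without_slash="${buildroot}home"
# What "%{buildroot}/home" would have named.
with_slash="${buildroot}/home"
printf '%s\n%s\n' "$without_slash" "$with_slash"

# Stand-in for one examples/*/Makefile.am line (hypothetical contents).
workdir=$(mktemp -d)
printf '%s\n' 'exampledir = $(DESTDIR)$(docdir)/example' > "$workdir/Makefile.am"

# Drop the first literal "$(DESTDIR)" on each line, so the buildroot is
# no longer prepended a second time at install.
sed 's|\$(DESTDIR)||' "$workdir/Makefile.am" > "$workdir/Makefile.am.new"
mv "$workdir/Makefile.am.new" "$workdir/Makefile.am"
fixed_line=$(cat "$workdir/Makefile.am")
printf '%s\n' "$fixed_line"
rm -rf "$workdir"
```

In a spec file the sed call would run in %prep against each examples/*/Makefile.am, with a comment linking the upstream issue.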

[!]: There are a lot of compiler warnings about _BSD_SOURCE and
     _SVID_SOURCE being deprecated. This is not a review blocker, but it
     is so easy to fix that I think you should do it anyway to keep from
     obscuring meaningful compiler warnings. You can easily fix it with
     harry-0.4.2-default-source.patch (to be attached). Please apply it
     by adding “Patch1: harry-0.4.2-default-source.patch”, and changing
     “%autosetup” to “%autosetup -p1”, then file an upstream issue on
     GitHub offering the patch upstream. Then put a comment above the
     Patch1 line linking to the GitHub issue.

[!]: A new change for Fedora 34 is that packages using make need to have
     an explicit BuildRequires for it. This is not in the guidelines yet.
     Please add “BuildRequires: make”.

[!]: The “--prefix %{_bindir}” to %configure is neither correct nor
     needed. The prefix should be something like /usr, and the %configure
     macro already takes care of this. Please remove your --prefix
     argument.

     An additional benefit is that the documentation and the Python
     module are then installed in the right place automatically (at least
     after the previously-mentioned fix for the examples); everything
     after %make_install in the %install section can be removed.
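
To see why a prefix of %{_bindir} misplaces files, here is a sketch that mimics how Autoconf derives the documentation directory from the prefix. It assumes Autoconf's stock defaults (datarootdir is ${prefix}/share, docdir is ${datarootdir}/doc/PACKAGE) and that nothing else overrides them; derive_docdir is a hypothetical helper, not part of any tool.

```shell
# Hypothetical helper: compute docdir the way Autoconf's defaults would.
derive_docdir() {  # usage: derive_docdir PREFIX PACKAGE
  prefix=$1
  datarootdir="${prefix}/share"
  printf '%s\n' "${datarootdir}/doc/$2"
}

wrong_docdir=$(derive_docdir /usr/bin harry)   # prefix set to %{_bindir}
right_docdir=$(derive_docdir /usr harry)       # prefix left to %configure
printf '%s\n%s\n' "$wrong_docdir" "$right_docdir"
```

With the prefix left alone, the derived path matches the /usr/share/doc/harry location that rpmlint reports further down.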

[!]: The pattern for the man page should be “%{_mandir}/man1/harry.1*”
     or, perhaps better, “%{_mandir}/man1/%{name}.1*” instead of
     “%{_mandir}/man1/harry.1.gz”, since the compression type could change.
     See https://docs.fedoraproject.org/en-US/packaging-guidelines/#_manpages.
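
The difference between the two patterns can be tried out with shell globbing, which should behave the same as %files globs for this simple case. The helper name glob_matches and the .zst and uncompressed file names are illustrative assumptions; only harry.1.gz comes from the spec under review.

```shell
# Hypothetical helper: succeed if NAME matches the shell glob PATTERN.
glob_matches() {  # usage: glob_matches PATTERN NAME
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

# Compare the hard-coded pattern with the wildcard one on three file names.
for name in harry.1.gz harry.1.zst harry.1; do
  for pattern in 'harry.1.gz' 'harry.1*'; do
    if glob_matches "$pattern" "$name"; then
      echo "$pattern matches $name"
    else
      echo "$pattern does not match $name"
    fi
  done
done
```

Only the wildcard pattern keeps matching if the man page is compressed differently or not at all.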

[!]: Documentation must be marked as such. In the %files section, change
     “%{_docdir}/%{name}” to “%doc %{_pkgdocdir}”.

[!]: You can use %{name} in URL: as you did in Source0:, and you can use
     %{url} in Source0, like this:

     URL:            https://github.com/rieck/%{name}
     Source0:        %{url}/archive/%{version}.tar.gz

     That’s a lot cleaner.
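
As a rough check of what these tags expand to, the macros can be imitated with shell variables. rpm does the real expansion; this is only an analogy. The values for name and version are taken from the package under review.

```shell
# Shell stand-ins for the spec macros %{name}, %{version} and %{url}.
name="harry"
version="0.4.2"
url="https://github.com/rieck/${name}"
source0="${url}/archive/${version}.tar.gz"
printf 'URL:     %s\nSource0: %s\n' "$url" "$source0"
```

The resulting Source0 value is the same URL that appears under “Source checksums” in the fedora-review output.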

[!]: You should give the tests the opportunity to run in parallel; see
     https://docs.fedoraproject.org/en-US/packaging-guidelines/#_parallel_make.
     Change “make check” to “make check %{?_smp_mflags}” or, even better
     in my opinion, “%make_build check”.

[!]: The BuildRequires on libomp is not correct; libomp is an LLVM-specific
     OpenMP runtime library. (Plus, libomp is not available on s390x, so this
     would impose an unnecessary ExcludeArch requirement.) Remove it; the gcc
     dependency is enough to get OpenMP support. (GCC 4.2 and later support
     OpenMP 2.5.)

[!]: The check_lee test fails on architectures other than i686/x86_64. You can
     test this by doing a koji scratch build. The problem is implicit
     assumptions about whether “char” is “signed char” or “unsigned char” in
     hstring.h/hstring.c. I am attaching a patch,
     harry-0.4.2-hstring-signed-vs-unsigned-char.patch, that fixes this. I am
     not certain that it fixes all bugs of this nature, but the tests do
     pass. You should file a GitHub issue about this and offer the patch
     upstream.

===== MUST items =====

C/C++:
[x]: Package does not contain kernel modules.
[x]: Package contains no static executables.
[x]: If your application is a C or C++ application you must list a
     BuildRequires against gcc, gcc-c++ or clang.
[-]: Header files in -devel subpackage, if present.
[x]: Package does not contain any libtool archives (.la)
[x]: Rpath absent or only used for internal libs.

Generic:
[x]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[!]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses
     found: "Unknown or generated", "GNU General Public License v2.0 or
     later", "GNU General Public License v3.0 or later", "*No copyright*
     Public domain". 56 files have unknown license. Detailed output of
     licensecheck in
     /home/ben/Downloads/harry/1908740-harry/licensecheck.txt

     Since src/md5.h, src/md5.c, src/murmur.h, and src/murmur.c are in the
     public domain, please change the License field from “GPLv3+” to “GPLv3+
     and Public Domain”, and add a comment or additional %license file
     explaining which files are under which license. For example, a comment
     “The entire source code is GPLv3+ except src/md5.* and src/murmur.*,
     which are Public Domain” in the spec file before the License field would
     suffice. See
     https://docs.fedoraproject.org/en-US/packaging-guidelines/LicensingGuidelines/#_multiple_licensing_scenarios.
     You can omit mention of md5.c/md5.h if you remove or overwrite it in
     %prep as mentioned in the bundled library discussion below.
     
     Only harry-0.4.2/git2changes.py and harry-0.4.2/m4/pkg.m4 are GPLv2+
     instead of GPLv3+; since they are not part of the installed RPM
     (including by being compiled into a library or executable), I agree that
     GPLv2+ does not need to appear in the License field.

[x]: License file installed when any subpackage combination is installed.
[x]: %build honors applicable compiler flags or justifies otherwise.
[!]: Package contains no bundled libraries without FPC exception.

     So, this point in the fedora-review template is out of date. The current
     guidelines governing bundling are at
     https://docs.fedoraproject.org/en-US/packaging-guidelines/#bundling, and
     these are the “actual rules” for packaging. (Technically, decisions made
     by FESCo/FPC but not yet written into the guidelines are an even higher
     authority.) See also https://pagure.io/packaging-committee/issue/575. So
     no FPC exceptions will be required for bundled libraries, but we do need
     to follow the guidelines. Here is a page describing why it’s best to
     avoid bundled libraries whenever possible:
     https://fedoraproject.org/wiki/Bundled_Libraries.

     src/md5.h, src/md5.c: Even under the old rules, this was covered by an
     existing exception as a “copylib,” a small snippet intended by its
     upstream for copy-pasting into other projects. See
     https://fedoraproject.org/wiki/Bundled_Libraries_Virtual_Provides, which
     documented this. You have two options
     (https://docs.fedoraproject.org/en-US/packaging-guidelines/#bundling):
       1. Indicate it in the spec file: use “Provides: bundled(md5-plumb)”.
       2. Since you are not passing --enable-md5 to the configure script,
          you may remove the bundled copylib (src/md5.c and src/md5.h) in
          %prep, and omit the virtual Provides. Unfortunately, upstream
          does not support building with these files missing, so you would
          have two options:
          a. In %prep, “rm -f src/md5.h src/md5.c”, then patch src/util.c
             to wrap “#include "md5.h"” in
             “#ifdef ENABLE_MD5HASH”/“#endif”; and patch src/Makefile.am
             to build src/md5.c only conditionally. See
             https://www.gnu.org/software/automake/manual/html_node/Conditional-Sources.html.
             If you get all of this patched, you should definitely send
             the changes upstream in a GitHub issue or PR.
          b. In %prep, overwrite the bundled library with the emptiest
             valid source files, like
             “echo '' > src/md5.h; echo 'typedef int dummy;' > src/md5.c”.
             Technically, this is a patch, and so under
             https://docs.fedoraproject.org/en-US/packaging-guidelines/PatchUpstreamStatus/
             you should add a comment to the effect that this is a
             workaround for ensuring the md5.c copylib is unbundled, but
             that because it is not conditionalized, it is not suitable
             for sending upstream.
     Personally, I favor option 2.b.; it is almost as easy as option 1.,
     but it gets rid of the unnecessary bundled copylib.
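
Option 2.b can be tried out in a scratch directory before it goes into %prep. This sketch creates stand-in md5 files (their contents are placeholders, not the real copylib) and then overwrites them with the command quoted above. The typedef is presumably there because a C source file with no declarations at all is not strictly valid ISO C.

```shell
# Scratch tree with placeholder files standing in for the bundled copylib.
scratch=$(mktemp -d)
mkdir "$scratch/src"
printf '%s\n' '/* placeholder for bundled md5.h */' > "$scratch/src/md5.h"
printf '%s\n' '/* placeholder for bundled md5.c */' > "$scratch/src/md5.c"

# The overwrite from option 2.b, run from the top of the source tree.
(
  cd "$scratch" || exit 1
  echo '' > src/md5.h; echo 'typedef int dummy;' > src/md5.c
)

# Inspect the stubs: md5.c holds one declaration, md5.h only a newline.
stub_source=$(cat "$scratch/src/md5.c")
stub_header_bytes=$(wc -c < "$scratch/src/md5.h" | tr -d ' ')
printf 'md5.c: %s\nmd5.h: %s byte(s)\n' "$stub_source" "$stub_header_bytes"
rm -rf "$scratch"
```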

     src/murmur.h, src/murmur.c: This is a bundled copy of MurmurHash2 from
     https://github.com/aappleby/smhasher, trivially ported to C. Under the
     old rules, I would advise you to apply for an FPC exception on the
     grounds that this is a copylib like the md5.c and sha1.c the FPC approved
     long ago; see
     https://pagure.io/packaging-committee/issues?status=Closed&search_pattern=md5&close_status=
     and
     https://pagure.io/packaging-committee/issues?status=Closed&search_pattern=sha1&close_status=.
     Today, the guidelines don’t even mention copylibs, only “system
     libraries,” so we’re on our own to a certain extent. Let’s document
     it with “Provides: bundled(murmurhash2)” since there is no
     associated version number.

     src/uthash.h: This is a bundled copy of uthash 1.6 as a header-only
     library. See
     https://docs.fedoraproject.org/en-US/packaging-guidelines/#_packaging_header_only_libraries.
     This is available in Fedora as uthash/uthash-devel. The packaging
     guidelines say that this MUST be unbundled if upstream provides a
     mechanism to do so, which it does: the configure script does know
     how to find the system uthash, and will use it in preference to the
     bundled copy. You must therefore make two changes:
       1. In %prep, “rm -f src/uthash.h” to ensure the bundled copy is
          not used in the build no matter what.
       2. Add “BuildRequires: uthash-static”.

[x]: Changelog in prescribed format.
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
     Note: rm -rf %{buildroot} present but not required
[x]: Sources contain only permissible code or content.
[-]: Package contains desktop file if it is a GUI application.
[-]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory
     names).
[x]: Package is named according to the Package Naming Guidelines.
[x]: Package does not generate any conflict.
[x]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
[!]: Requires correct, justified where necessary.

     Remove “Requires: python3-Levenshtein” since it is used only for the
     tests; keep only the BuildRequires on this.

     It appears python3-urllib3 is not used at all, and both “Requires:
     python3-urllib3” and “BuildRequires: python3-urllib3” should be
     removed.

[x]: Spec file is legible and written in American English.
[-]: Package contains systemd file(s) if in need.
[x]: Useful -debuginfo package or justification otherwise.
[x]: Package is not known to require an ExcludeArch tag.
[!]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 583680 bytes in 14 files.

     So, you can perhaps get away with not splitting this out given the
     current contents based on the guidelines; I think you should do it
     anyway, especially since the existing documentation is already much
     larger than the other contents of the package (executable and
     Python module).
     
     Also, I really like the text and PDF versions of the documentation, even
     though their contents are the same as the man page. I think you should
     build these and include them, along with the other documentation, in a
     -doc subpackage. You can build them by adding
     “BuildRequires: perl-podlators” (for text) and
     “BuildRequires: perl-Pod-LaTeX” and “BuildRequires: tex(latex)” (for
     PDF), “%make_build -C doc %{name}.txt %{name}.pdf” in %build, and
     “cp -p doc/%{name}.txt doc/%{name}.pdf %{buildroot}%{_pkgdocdir}” in
     %install. If you don’t want to, though, you don’t have to. The man
     page suffices.

     The Doxygen HTML documentation is not useful for users, since it
     covers only implementation details internal to the command-line
     tool, so you are right not to package it.

     The -doc subpackage should be noarch, does not need to depend on
     the base package, and therefore should have its own copy of the
     license file.

[!]: Package complies to the Packaging Guidelines
     Fix other findings to resolve this.
[x]: Package successfully compiles and builds into binary rpms on at least
     one supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: There are rpmlint messages (see attachment).
[x]: If (and only if) the source package includes the text of the
     license(s) in its own file, then that file, containing the text of the
     license(s) for the package is included in %license.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package does not own files or directories owned by other packages.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Dist tag is present.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package must not depend on deprecated() packages.
[x]: Package use %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as
     provided in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

Python:
[-]: Python eggs must not download any dependencies during the build
     process.
[-]: A package which is used by another package via an egg interface should
     provide egg info.
[!]: Package meets the Packaging Guidelines::Python

     This package provides a bare module, harry.py, not a package. There
     should be no “harry” subdirectory in site-packages. Drop the
     --prefix option from %configure and the “mv” from the %install
     section and the makefile will install it correctly. Then change
     “%{python3_sitelib}/%{name}” in the %files section to
     “%pycached %{python3_sitelib}/%{name}.py”.

     Please add “%py_provides python3-%{name}”; see
     https://docs.fedoraproject.org/en-US/packaging-guidelines/Python/#_the_py_provides_macro

[x]: Package contains BR: python2-devel or python3-devel
[x]: Packages MUST NOT have dependencies (either build-time or runtime) on
     packages named with the unversioned python- prefix unless no properly
     versioned package exists. Dependencies on Python packages instead MUST
     use names beginning with python2- or python3- as appropriate.
[x]: Python packages must not contain %{pythonX_site(lib|arch)}/* in %files
[x]: Binary eggs must be removed in %prep

===== SHOULD items =====

Generic:
[-]: If the source package does not include license text(s) as a separate
     file from upstream, the packager SHOULD query upstream to include it.
[!]: Final provides and requires are sane (see attachments).
     See previous note regarding python3-Levenshtein and python3-urllib3.
[x]: Package functions as described.
[x]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[!]: Patches link to upstream bugs/comments/lists or are otherwise
     justified.

     See
     https://docs.fedoraproject.org/en-US/packaging-guidelines/PatchUpstreamStatus/.
     Please file upstream issues on GitHub and link to them in a spec
     file comment for any patches.

     You should certainly file an upstream issue for
     harry_autotools.patch and link it.

     You should also link https://github.com/rieck/harry/issues/18 for the
     lines after “# convert to Python 3”; altering sources with sed is
     considered a patch, and this is the corresponding upstream issue. You
     could even offer the converted and altered version as a PR if you like,
     to make it easier for upstream to migrate.

[-]: Sources are verified with gpgverify first in %prep if upstream
     publishes signatures.
     Note: gpgverify is not used.
[-]: Description and summary sections in the package spec file contains
     translations for supported Non-English languages, if available.
[x]: Package should compile and build into binary rpms on all supported
     architectures.
[x]: %check is present and all tests pass.
[x]: Packages should try to preserve timestamps of original installed
     files.
[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: Fully versioned dependency in subpackages if applicable.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

===== EXTRA items =====

Generic:
[x]: Rpmlint is run on debuginfo package(s).
     Note: No rpmlint messages.
[x]: Rpmlint is run on all installed packages.
     Note: There are rpmlint messages (see attachment).
[x]: Large data in /usr/share should live in a noarch subpackage if package
     is arched.
[x]: Package should not use obsolete m4 macros
[x]: Spec file according to URL is the same as in SRPM.


Rpmlint
-------
Checking: harry-0.4.2-2.fc34.x86_64.rpm
          harry-debuginfo-0.4.2-2.fc34.x86_64.rpm
          harry-debugsource-0.4.2-2.fc34.x86_64.rpm
          harry-0.4.2-2.fc34.src.rpm
harry.x86_64: W: file-not-utf8 /usr/share/doc/harry/reuters/reuters.zip
4 packages and 0 specfiles checked; 0 errors, 1 warnings.




Rpmlint (debuginfo)
-------------------
Checking: harry-debuginfo-0.4.2-2.fc34.x86_64.rpm
1 packages and 0 specfiles checked; 0 errors, 0 warnings.





Rpmlint (installed packages)
----------------------------
harry.x86_64: W: file-not-utf8 /usr/share/doc/harry/reuters/reuters.zip
3 packages and 0 specfiles checked; 0 errors, 1 warnings.



Source checksums
----------------
https://github.com/rieck/harry/archive/0.4.2.tar.gz :
  CHECKSUM(SHA256) this package     : a55eee754ffaf14edbb4c8b359b797589f066605c5b5cd2667ea3246c4cbb0e2
  CHECKSUM(SHA256) upstream package : a55eee754ffaf14edbb4c8b359b797589f066605c5b5cd2667ea3246c4cbb0e2


Requires
--------
harry (rpmlib, GLIBC filtered):
    libarchive.so.13()(64bit)
    libc.so.6()(64bit)
    libconfig.so.11()(64bit)
    libgcc_s.so.1()(64bit)
    libgcc_s.so.1(GCC_3.3.1)(64bit)
    libgomp.so.1()(64bit)
    libgomp.so.1(GOMP_1.0)(64bit)
    libgomp.so.1(GOMP_4.0)(64bit)
    libgomp.so.1(OMP_1.0)(64bit)
    libm.so.6()(64bit)
    libpthread.so.0()(64bit)
    libz.so.1()(64bit)
    libz.so.1(ZLIB_1.2.0)(64bit)
    python(abi)
    python3
    python3-Levenshtein
    python3-numpy
    python3-urllib3
    rtld(GNU_HASH)

harry-debuginfo (rpmlib, GLIBC filtered):

harry-debugsource (rpmlib, GLIBC filtered):



Provides
--------
harry:
    harry
    harry(x86-64)

harry-debuginfo:
    debuginfo(build-id)
    harry-debuginfo
    harry-debuginfo(x86-64)

harry-debugsource:
    harry-debugsource
    harry-debugsource(x86-64)



Generated by fedora-review 0.7.6 (b083f91) last change: 2020-11-10
Command line :/usr/bin/fedora-review -b 1908740
Buildroot used: fedora-rawhide-x86_64
Active plugins: Python, Generic, Shell-api, C/C++
Disabled plugins: Ocaml, Java, SugarActivity, Haskell, PHP, Perl, fonts, R
Disabled flags: EPEL6, EPEL7, DISTTAG, BATCH, EXARCH

Comment 3 Ben Beasley 2020-12-28 14:14:52 UTC
Created attachment 1742694
Sample spec file implementing review feedback—still needs patches upstreamed

You may use this as a nearly-complete starting point if you like. Please review all changes carefully, along with the corresponding review feedback, and make sure you both understand and agree with them.

Comment 4 Ben Beasley 2020-12-28 14:16:16 UTC
Created attachment 1742695
Patch for compiler warnings, mentioned in review

Define _DEFAULT_SOURCE alongside the deprecated _BSD_SOURCE and _SVID_SOURCE; this is backwards-compatible while avoiding deprecation warnings on modern glibc.

Comment 5 Ben Beasley 2020-12-28 14:17:02 UTC
Created attachment 1742696
Patch for test failures on non-x86 arches, mentioned in review

Fix reliance on implementation-dependent signedness of char in hstring.

Comment 6 Ben Beasley 2020-12-28 14:22:29 UTC
Koji scratch build of original submission, showing test failures on non-x86 architectures (except s390x, which fails due to bogus libomp dependency):

https://koji.fedoraproject.org/koji/taskinfo?taskID=58471386

Koji build of attached sample updated spec file, with harry-0.4.2-hstring-signed-vs-unsigned-char.patch:

https://koji.fedoraproject.org/koji/taskinfo?taskID=58423597

Comment 7 Package Review 2021-01-28 00:45:21 UTC
This is an automatic action taken by review-stats script.

The ticket submitter failed to clear the NEEDINFO flag in a month.
As per https://fedoraproject.org/wiki/Policy_for_stalled_package_reviews
we consider this ticket as DEADREVIEW and proceed to close it.

