Spec URL: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/6214c868ad35a5f96be8faba43ef7cb3bf84212f/rust-tokenizers.spec
SRPM URL: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/6214c868ad35a5f96be8faba43ef7cb3bf84212f/rust-tokenizers-0.21.1-1.fc43.src.rpm
Description: Provides an implementation of today's most used tokenizers, with a focus on performances and versatility.
Fedora Account System Username: xanderlent

This is the rust-tokenizers package, a core piece of the Hugging Face libraries.
Copr build: https://copr.fedorainfracloud.org/coprs/build/8881511 (failed)
Build log: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2358553-rust-tokenizers/fedora-rawhide-x86_64/08881511-rust-tokenizers/builder-live.log.gz

Please make sure the package builds successfully at least for Fedora Rawhide.

- If the build failed for unrelated reasons (e.g. temporary network unavailability), please ignore it.
- If the build failed because of missing BuildRequires, please make sure they are listed in the "Depends On" field

---
This comment was created by the fedora-review-service
https://github.com/FrostyX/fedora-review-service

If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
Failure is expected due to missing deps.
Spec URL: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/70d43789832750dbb0aaad81f154452b93c7b1d0/rust-tokenizers.spec
SRPM URL: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/70d43789832750dbb0aaad81f154452b93c7b1d0/rust-tokenizers-0.21.4-1.fc43.src.rpm
rust2rpm.toml file: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/70d43789832750dbb0aaad81f154452b93c7b1d0/rust2rpm.toml
Spec URL: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/ab322958f48efa31baaeaa71362b123fce1aaeca/rust-tokenizers.spec
SRPM URL: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/ab322958f48efa31baaeaa71362b123fce1aaeca/rust-tokenizers-0.21.4-1.fc43.src.rpm
See also rust2rpm.toml: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/ab322958f48efa31baaeaa71362b123fce1aaeca/rust2rpm.toml
One possible improvement is to drop the unstable_wasm feature; we don't really need to package it.
It looks like "tokenizers" depends on indicatif == 0.17, while the Fedora package has been bumped to 0.18 and no rust-indicatif0.17 package exists yet. In order to build, we'll have to either get the tokenizers project to update their dep and publish a release, or we'll have to get a rust-indicatif0.17 package published.
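To illustrate why a 0.18 package cannot satisfy the dependency, here is a minimal Python sketch of Cargo's default (caret) requirement semantics. It assumes upstream declares a bare `0.17` requirement rather than an exact `=0.17` pin; the helper names `caret_range` and `satisfies` are made up for this example and are not part of any Fedora or Cargo tooling.

```python
def caret_range(req: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (inclusive lower, exclusive upper) bounds for a bare Cargo requirement."""
    given = [int(part) for part in req.split(".")]
    padded = given + [0] * (3 - len(given))
    # Cargo bumps the leftmost non-zero component; if every given
    # component is zero, it bumps the last one that was written.
    index = next((i for i, v in enumerate(given) if v != 0), len(given) - 1)
    upper = padded[:index] + [padded[index] + 1] + [0] * (2 - index)
    return tuple(padded), tuple(upper)


def satisfies(version: tuple[int, int, int], req: str) -> bool:
    """Check whether a concrete version falls inside the caret range of req."""
    lower, upper = caret_range(req)
    return lower <= version < upper


if __name__ == "__main__":
    print(caret_range("0.17"))             # bounds for the assumed requirement
    print(satisfies((0, 18, 0), "0.17"))   # is a 0.18.0 package acceptable?
```

For a 0.x crate the minor version acts as the breaking-change boundary, so a requirement of `0.17` stops short of 0.18.0, which is why either a compat package or an upstream bump is needed.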
... building indicatif == 0.17 would also require an additional rust-console package, so that doesn't seem like a good option. Bumping the indicatif dep is probably the best option: https://github.com/huggingface/tokenizers/pull/1867

For the most part, this looks OK, but there are several prerequisites.

- Fedora has rust-fancy-regex 0.13, but this package needs 0.14.
- Fedora does not yet have rust-hf-hub, which this package will require.
- Fedora has rust-getrandom, but not the wasm_js feature that this package requires.

However, all of those prerequisites are necessitated by optional features of this package. If we build only the default feature set, none of them should be a concern.

I've built a variant of this package that drops optional features, and bumps the indicatif dep version, and that should pass review except for a few duplicate files:

Wrote: /builddir/build/SRPMS/rust-tokenizers-0.21.4-1.fc43.src.rpm
Wrote: /builddir/build/RPMS/rust-tokenizers+indicatif-devel-0.21.4-1.fc43.noarch.rpm
Wrote: /builddir/build/RPMS/rust-tokenizers+esaxx_fast-devel-0.21.4-1.fc43.noarch.rpm
Wrote: /builddir/build/RPMS/rust-tokenizers+default-devel-0.21.4-1.fc43.noarch.rpm
Wrote: /builddir/build/RPMS/rust-tokenizers+onig-devel-0.21.4-1.fc43.noarch.rpm
Wrote: /builddir/build/RPMS/rust-tokenizers+progressbar-devel-0.21.4-1.fc43.noarch.rpm
Wrote: /builddir/build/RPMS/rust-tokenizers-devel-0.21.4-1.fc43.noarch.rpm

RPM build warnings:
    File listed twice: /usr/share/cargo/registry/tokenizers-0.21.4/CHANGELOG.md
    File listed twice: /usr/share/cargo/registry/tokenizers-0.21.4/LICENSE
    File listed twice: /usr/share/cargo/registry/tokenizers-0.21.4/README.md
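The claim that the default feature set avoids the missing prerequisites can be checked by walking the feature graph. The Python sketch below does that with a small hand-written map. The feature relationships are taken from the Requires section of the review output later in this bug (default pulls in esaxx_fast, onig and progressbar; progressbar pulls in indicatif), while the `FEATURES` and `EXTERNAL` tables and the function names are illustrative assumptions, not output of cargo or rust2rpm.

```python
# Feature -> features it enables (as reported for the reduced package).
FEATURES = {
    "default": ["esaxx_fast", "onig", "progressbar"],
    "progressbar": ["indicatif"],
    "esaxx_fast": [],
    "onig": [],
    "indicatif": [],
}

# Feature -> extra external crates it needs beyond the base dependencies.
EXTERNAL = {
    "esaxx_fast": ["esaxx-rs/cpp"],
    "onig": ["onig"],
    "indicatif": ["indicatif"],
}


def feature_closure(root: str, features: dict[str, list[str]]) -> set[str]:
    """Return every feature reachable from root, including root itself."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        feature = stack.pop()
        if feature in seen:
            continue
        seen.add(feature)
        stack.extend(features.get(feature, []))
    return seen


def external_crates(root: str) -> list[str]:
    """List the extra crates needed when building with the given feature."""
    enabled = feature_closure(root, FEATURES)
    return sorted({crate for f in enabled for crate in EXTERNAL.get(f, [])})


if __name__ == "__main__":
    print(sorted(feature_closure("default", FEATURES)))
    print(external_crates("default"))
```

Under these assumptions, neither hf-hub nor fancy-regex shows up in the closure of `default`, which matches the observation that only the optional features need them.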
Please pull changes from: https://github.com/gordonmessmer/rust-tokenizers
Package Review
==============

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated


Issues:
=======
- Package does not contain duplicates in %files.
  Note: warning: File listed twice:
  /usr/share/cargo/registry/tokenizers-0.22.1/CHANGELOG.md
  See: https://docs.fedoraproject.org/en-US/packaging-guidelines/#_duplicate_files

This duplicate is normal for rust2rpm.


===== MUST items =====

Generic:
[x]: Package is licensed with an open-source compatible license and meets other legal requirements as defined in the legal section of Packaging Guidelines.
[x]: License field in the package spec file matches the actual license.
[x]: License file installed when any subpackage combination is installed.
[x]: %build honors applicable compiler flags or justifies otherwise.
[x]: Package contains no bundled libraries without FPC exception.
[x]: Changelog in prescribed format.
[x]: Sources contain only permissible code or content.
[-]: Package contains desktop file if it is a GUI application.
[x]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory names).
[x]: Package is named according to the Package Naming Guidelines.
[x]: Package does not generate any conflict.
[x]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and Provides are present.
[x]: Requires correct, justified where necessary.
[x]: Spec file is legible and written in American English.
[-]: Package contains systemd file(s) if in need.
[x]: Package is not known to require an ExcludeArch tag.
[x]: Package complies to the Packaging Guidelines
[x]: Package successfully compiles and builds into binary rpms on at least one supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: No rpmlint messages.
[x]: If (and only if) the source package includes the text of the license(s) in its own file, then that file, containing the text of the license(s) for the package is included in %license.
[x]: The License field must be a valid SPDX expression.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package does not own files or directories owned by other packages.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the beginning of %install.
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Dist tag is present.
[x]: Permissions on files are set properly.
[x]: Package must not depend on deprecated() packages.
[x]: Package use %makeinstall only when make install DESTDIR=... doesn't work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as provided in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Large documentation must go in a -doc subpackage. Large could be size (~1MB) or number of files.
     Note: Documentation size is 0 bytes in 0 files.
[x]: Packages must not store files under /srv, /opt or /usr/local

===== SHOULD items =====

Generic:
[-]: If the source package does not include license text(s) as a separate file from upstream, the packager SHOULD query upstream to include it.
[x]: Final provides and requires are sane (see attachments).
[x]: Fully versioned dependency in subpackages if applicable.
     Note: No Requires: %{name}%{?_isa} = %{version}-%{release} in rust-tokenizers-devel, rust-tokenizers+default-devel, rust-tokenizers+esaxx_fast-devel, rust-tokenizers+indicatif-devel, rust-tokenizers+onig-devel, rust-tokenizers+progressbar-devel
[x]: Package functions as described.
[x]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[x]: Patches link to upstream bugs/comments/lists or are otherwise justified.
[-]: Sources are verified with gpgverify first in %prep if upstream publishes signatures.
     Note: gpgverify is not used.
[x]: Package should compile and build into binary rpms on all supported architectures.
[x]: %check is present and all tests pass.
[x]: Packages should try to preserve timestamps of original installed files.
[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or $RPM_BUILD_ROOT)
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

===== EXTRA items =====

Generic:
[!]: Spec file according to URL is the same as in SRPM.
     Note: Spec file as given by url is not the same as in SRPM (see attached diff).
     See: (this test has no URL)
[x]: Rpmlint is run on all installed packages.
     Note: No rpmlint messages.

Rpmlint
-------
Checking: rust-tokenizers-devel-0.22.1-2.fc44.noarch.rpm
          rust-tokenizers+default-devel-0.22.1-2.fc44.noarch.rpm
          rust-tokenizers+esaxx_fast-devel-0.22.1-2.fc44.noarch.rpm
          rust-tokenizers+indicatif-devel-0.22.1-2.fc44.noarch.rpm
          rust-tokenizers+onig-devel-0.22.1-2.fc44.noarch.rpm
          rust-tokenizers+progressbar-devel-0.22.1-2.fc44.noarch.rpm
          rust-tokenizers-0.22.1-2.fc44.src.rpm
============================ rpmlint session starts ============================
rpmlint: 2.7.0
configuration:
    /usr/lib/python3.13/site-packages/rpmlint/configdefaults.toml
    /etc/xdg/rpmlint/fedora-spdx-licenses.toml
    /etc/xdg/rpmlint/fedora.toml
    /etc/xdg/rpmlint/scoring.toml
    /etc/xdg/rpmlint/users-groups.toml
    /etc/xdg/rpmlint/warn-on-functions.toml
rpmlintrc: [PosixPath('/tmp/tmpjd189vgb')]
checks: 32, packages: 7

7 packages and 0 specfiles checked; 0 errors, 0 warnings, 37 filtered, 0 badness; has taken 0.2 s


Rpmlint (installed packages)
----------------------------
============================ rpmlint session starts ============================
rpmlint: 2.7.0
configuration:
    /usr/lib/python3.14/site-packages/rpmlint/configdefaults.toml
    /etc/xdg/rpmlint/fedora-spdx-licenses.toml
    /etc/xdg/rpmlint/fedora.toml
    /etc/xdg/rpmlint/scoring.toml
    /etc/xdg/rpmlint/users-groups.toml
    /etc/xdg/rpmlint/warn-on-functions.toml
checks: 32, packages: 6

6 packages and 0 specfiles checked; 0 errors, 0 warnings, 33 filtered, 0 badness; has taken 0.1 s


Source checksums
----------------
https://crates.io/api/v1/crates/tokenizers/0.22.1/download#/tokenizers-0.22.1.crate :
  CHECKSUM(SHA256) this package     : 6475a27088c98ea96d00b39a9ddfb63780d1ad4cceb6f48374349a96ab2b7842
  CHECKSUM(SHA256) upstream package : 6475a27088c98ea96d00b39a9ddfb63780d1ad4cceb6f48374349a96ab2b7842


Requires
--------
rust-tokenizers-devel (rpmlib, GLIBC filtered):
    (crate(ahash/default) >= 0.8.11 with crate(ahash/default) < 0.9.0~)
    (crate(ahash/serde) >= 0.8.11 with crate(ahash/serde) < 0.9.0~)
    (crate(aho-corasick/default) >= 1.1.0 with crate(aho-corasick/default) < 2.0.0~)
    (crate(compact_str/default) >= 0.9.0 with crate(compact_str/default) < 0.10.0~)
    (crate(compact_str/serde) >= 0.9.0 with crate(compact_str/serde) < 0.10.0~)
    (crate(dary_heap/default) >= 0.3.6 with crate(dary_heap/default) < 0.4.0~)
    (crate(dary_heap/serde) >= 0.3.6 with crate(dary_heap/serde) < 0.4.0~)
    (crate(derive_builder/default) >= 0.20.0 with crate(derive_builder/default) < 0.21.0~)
    (crate(esaxx-rs) >= 0.1.10 with crate(esaxx-rs) < 0.2.0~)
    (crate(getrandom/default) >= 0.3.0 with crate(getrandom/default) < 0.4.0~)
    (crate(itertools/default) >= 0.14.0 with crate(itertools/default) < 0.15.0~)
    (crate(log/default) >= 0.4.0 with crate(log/default) < 0.5.0~)
    (crate(macro_rules_attribute/default) >= 0.2.0 with crate(macro_rules_attribute/default) < 0.3.0~)
    (crate(monostate/default) >= 0.1.12 with crate(monostate/default) < 0.2.0~)
    (crate(paste/default) >= 1.0.14 with crate(paste/default) < 2.0.0~)
    (crate(rand/default) >= 0.9.0 with crate(rand/default) < 0.10.0~)
    (crate(rayon-cond/default) >= 0.4.0 with crate(rayon-cond/default) < 0.5.0~)
    (crate(rayon/default) >= 1.10.0 with crate(rayon/default) < 2.0.0~)
    (crate(regex-syntax/default) >= 0.8.0 with crate(regex-syntax/default) < 0.9.0~)
    (crate(regex/default) >= 1.10.0 with crate(regex/default) < 2.0.0~)
    (crate(serde/default) >= 1.0.0 with crate(serde/default) < 2.0.0~)
    (crate(serde/derive) >= 1.0.0 with crate(serde/derive) < 2.0.0~)
    (crate(serde_json/default) >= 1.0.0 with crate(serde_json/default) < 2.0.0~)
    (crate(spm_precompiled/default) >= 0.1.3 with crate(spm_precompiled/default) < 0.2.0~)
    (crate(thiserror/default) >= 2.0.0 with crate(thiserror/default) < 3.0.0~)
    (crate(unicode-normalization-alignments/default) >= 0.1.0 with crate(unicode-normalization-alignments/default) < 0.2.0~)
    (crate(unicode-segmentation/default) >= 1.11.0 with crate(unicode-segmentation/default) < 2.0.0~)
    (crate(unicode_categories/default) >= 0.1.0 with crate(unicode_categories/default) < 0.2.0~)
    cargo

rust-tokenizers+default-devel (rpmlib, GLIBC filtered):
    cargo
    crate(tokenizers)
    crate(tokenizers/esaxx_fast)
    crate(tokenizers/onig)
    crate(tokenizers/progressbar)

rust-tokenizers+esaxx_fast-devel (rpmlib, GLIBC filtered):
    (crate(esaxx-rs/cpp) >= 0.1.10 with crate(esaxx-rs/cpp) < 0.2.0~)
    cargo
    crate(tokenizers)

rust-tokenizers+indicatif-devel (rpmlib, GLIBC filtered):
    (crate(indicatif/default) >= 0.18.0 with crate(indicatif/default) < 0.19.0~)
    cargo
    crate(tokenizers)

rust-tokenizers+onig-devel (rpmlib, GLIBC filtered):
    (crate(onig) >= 6.4.0 with crate(onig) < 7.0.0~)
    cargo
    crate(tokenizers)

rust-tokenizers+progressbar-devel (rpmlib, GLIBC filtered):
    cargo
    crate(tokenizers)
    crate(tokenizers/indicatif)


Provides
--------
rust-tokenizers-devel:
    crate(tokenizers)
    rust-tokenizers-devel

rust-tokenizers+default-devel:
    crate(tokenizers/default)
    rust-tokenizers+default-devel

rust-tokenizers+esaxx_fast-devel:
    crate(tokenizers/esaxx_fast)
    rust-tokenizers+esaxx_fast-devel

rust-tokenizers+indicatif-devel:
    crate(tokenizers/indicatif)
    rust-tokenizers+indicatif-devel

rust-tokenizers+onig-devel:
    crate(tokenizers/onig)
    rust-tokenizers+onig-devel

rust-tokenizers+progressbar-devel:
    crate(tokenizers/progressbar)
    rust-tokenizers+progressbar-devel


Diff spec file in url and in SRPM
---------------------------------
--- /home/gmessmer/git/fedora/rust-tokenizers/rust-tokenizers.spec 2025-09-22 22:12:13.393956589 +0000
+++ /home/gmessmer/git/fedora/rust-tokenizers/review-rust-tokenizers/srpm-unpacked/rust-tokenizers.spec 2025-09-22 00:00:00.000000000 +0000
@@ -1,2 +1,12 @@
+## START: Set by rpmautospec
+## (rpmautospec version 0.8.1)
+## RPMAUTOSPEC: autorelease, autochangelog
+%define autorelease(e:s:pb:n) %{?-p:0.}%{lua:
+    release_number = 2;
+    base_release_number = tonumber(rpm.expand("%{?-b*}%{!?-b:1}"));
+    print(release_number + base_release_number - 1);
+}%{?-e:.%{-e*}}%{?-s:.%{-s*}}%{!?-n:%{?dist}}
+## END: Set by rpmautospec
+
 # Generated by rust2rpm 27
 %bcond check 1
@@ -151,3 +161,12 @@

 %changelog
-%autochangelog
+## START: Generated by rpmautospec
+* Mon Sep 22 2025 Gordon Messmer <gmessmer> - 0.22.1-2
+- Bump indicatif dependency version
+
+* Mon Sep 22 2025 Gordon Messmer <gmessmer> - 0.22.1-1
+- Update version and hide optional dependencies
+
+* Mon Sep 22 2025 Gordon Messmer <gmessmer> - 0.21.4-1
+- Import
+## END: Generated by rpmautospec


Generated by fedora-review 0.10.0 (e79b66b) last change: 2023-07-24
Command line :/usr/bin/fedora-review -n rust-tokenizers
Buildroot used: fedora-rawhide-x86_64
Active plugins: Shell-api, Generic
Disabled plugins: R, Python, PHP, Haskell, Perl, Java, fonts, SugarActivity, C/C++, Ocaml
Disabled flags: EXARCH, EPEL6, EPEL7, DISTTAG, BATCH
Thanks for the review, I've implemented the requested changes (though in the case of fancy-regex, it looks like I could downgrade the dep without issues, so I did that instead).

Spec URL: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/7b3a2a2eae1c9cd35b0311fe6e8d6ce4751347a9/rust-tokenizers.spec
SRPM URL: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/7b3a2a2eae1c9cd35b0311fe6e8d6ce4751347a9/rust-tokenizers-0.22.1-1.fc44.src.rpm

See also:
rust2rpm.toml: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/7b3a2a2eae1c9cd35b0311fe6e8d6ce4751347a9/rust2rpm.toml
and the patch file it uses: https://gist.github.com/xanderlent/fae314c0c50218ec065a6fcbdc7d2b38/raw/7b3a2a2eae1c9cd35b0311fe6e8d6ce4751347a9/tokenizers-fix-metadata.diff
If you'd like, I was playing around with rust-hf-hub, and it might be possible to add that package as well?
OK, bug 2399619 filed for the hf-hub crate, but that doesn't necessarily block this review.
Updates look good to me.
The Pagure repository was created at https://src.fedoraproject.org/rpms/rust-tokenizers
FEDORA-2025-a853c77995 (rust-tokenizers-0.22.1-1.fc44) has been submitted as an update to Fedora 44. https://bodhi.fedoraproject.org/updates/FEDORA-2025-a853c77995
FEDORA-2025-a853c77995 (rust-tokenizers-0.22.1-1.fc44) has been pushed to the Fedora 44 stable repository. If problem still persists, please make note of it in this bug report.