Bug 2358524 - Review Request: rust-spm_precompiled - SentencePiece's DoubleArray and its Normalizer, implemented in Rust
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Fabio Valentini
QA Contact: Fedora Extras Quality Assurance
URL: https://crates.io/crates/spm_precompiled
Whiteboard:
Depends On:
Blocks: 2358553
Reported: 2025-04-09 00:46 UTC by Alexander Lent
Modified: 2025-05-31 17:17 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2025-05-31 17:17:00 UTC
Type: ---
Embargoed:
decathorpe: fedora-review+


Attachments
The .spec file difference from Copr build 8881500 to 9065005 (1.69 KB, patch)
2025-05-20 04:13 UTC, Fedora Review Service

Description Alexander Lent 2025-04-09 00:46:08 UTC
Spec URL: https://gist.githubusercontent.com/xanderlent/bb2e75d150686c1b54880d1258bb5d81/raw/8cc1196ddde0a35e5c509cb003e8e9e8aa479f00/rust-spm_precompiled.spec
SRPM URL: https://gist.github.com/xanderlent/bb2e75d150686c1b54880d1258bb5d81/raw/8cc1196ddde0a35e5c509cb003e8e9e8aa479f00/rust-spm_precompiled-0.1.4-1.fc43.src.rpm
Description: This crate aims to emulate the Dart::DoubleArray struct from https://github.com/google/sentencepiece and its Normalizer. This crate is highly specialized and not intended for general use.
Fedora Account System Username: xanderlent

This is the first of several packages that rust-tokenizers (to be submitted) will depend on. It is a good starting point since it is self-contained. Note that it does not contain precompiled code; rather, it parses a precompiled data structure, stored in a specialized format, that is used for normalization/tokenization.

Comment 1 Alexander Lent 2025-04-09 01:06:14 UTC
See also the rust2rpm.toml file used to generate the specfile: https://gist.githubusercontent.com/xanderlent/bb2e75d150686c1b54880d1258bb5d81/raw/8cc1196ddde0a35e5c509cb003e8e9e8aa479f00/rust2rpm.toml

Comment 2 Fedora Review Service 2025-04-09 15:05:08 UTC
Copr build:
https://copr.fedorainfracloud.org/coprs/build/8881500
(succeeded)

Review template:
https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2358524-rust-spm_precompiled/fedora-rawhide-x86_64/08881500-rust-spm_precompiled/fedora-review/review.txt

Please take a look if any issues were found.


---
This comment was created by the fedora-review-service
https://github.com/FrostyX/fedora-review-service

If you want to trigger a new Copr build, add a comment containing new
Spec and SRPM URLs or [fedora-review-service-build] string.

Comment 3 Fabio Valentini 2025-05-10 15:36:08 UTC
Taking this review - some initial comments:

1. This crate ships base64-encoded binary data that I can't really tell the origin of (test.json). It looks like that is only used as input for tests - at the very least, it should be excluded from built packages.

2. The dependency on base64 v0.13 is very outdated, it would be good to get this updated to v0.22 upstream (and in this package).

3. There are multiple typos in the crate's description and documentation (it should be "its Normalizer", not "it's Normalizer"), would be great to get that fixed upstream too :)

4. The "precompiled" in the name initially threw me off, but it looks like this crate doesn't actually contain any precompiled code; instead, it is used for *handling* precompiled data of some sort?

Comment 4 Alexander Lent 2025-05-20 04:06:06 UTC
Spec URL: https://gist.githubusercontent.com/xanderlent/bb2e75d150686c1b54880d1258bb5d81/raw/f34e107e73eb67db00231e9c6547467de88ba0df/rust-spm_precompiled.spec
SRPM URL: https://gist.github.com/xanderlent/bb2e75d150686c1b54880d1258bb5d81/raw/f34e107e73eb67db00231e9c6547467de88ba0df/rust-spm_precompiled-0.1.4-1.fc43.src.rpm

Thanks for the review! This is definitely an unusual package.

1. I've tried to exclude the tests from the final RPM. Please take a look.

2. The package hasn't been updated in three years :-(, but someone developed a patch this year, which I've integrated:
https://github.com/huggingface/spm_precompiled/pull/4

3. I agree about the grammar. I'll prepare a more comprehensive changeset (for example, fixing README.md or tweaking Cargo.toml so that rust2rpm can summarize it) if you like the direction so far.

4. It seems like the "precompiled" phrase comes from the fact that this Rust implementation parses the "precompiled_charsmap" generated by Google's SentencePiece library: https://github.com/google/sentencepiece/blob/273449044caa593c2fd7eb7550cb3ab2cff93f1a/src/sentencepiece_model.proto#L252
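
To make "parses a precompiled data structure" a bit more concrete, here is a minimal sketch of how such a blob could be split into its parts. It assumes the layout described in SentencePiece's normalizer sources, namely a 4-byte little-endian trie size, followed by the double-array trie, followed by the normalized string data. The names `CharsMap` and `split_charsmap` are hypothetical and are not the crate's actual API; the crate's real parser may differ in details.

```rust
/// Hypothetical container for the two parts of a `precompiled_charsmap` blob.
/// (Illustrative only; not a type from the spm_precompiled crate.)
#[derive(Debug, PartialEq)]
struct CharsMap {
    /// Double-array trie units, decoded as little-endian u32 values.
    trie: Vec<u32>,
    /// Remaining bytes: the normalized replacement strings.
    normalized: Vec<u8>,
}

/// Splits a blob assumed to be laid out as
/// `<trie size: u32 LE><trie bytes><normalized bytes>`.
/// Returns `None` if the blob is too short or the trie size is not a
/// multiple of 4 bytes.
fn split_charsmap(blob: &[u8]) -> Option<CharsMap> {
    // Read the 4-byte header that holds the trie size in bytes.
    let header = blob.get(..4)?;
    let trie_size =
        u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    if trie_size % 4 != 0 {
        return None;
    }

    // The trie occupies the next `trie_size` bytes; fail if they are missing.
    let end = 4usize.checked_add(trie_size)?;
    let trie_bytes = blob.get(4..end)?;
    let trie = trie_bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();

    // Everything after the trie is treated as the normalized string data.
    Some(CharsMap {
        trie,
        normalized: blob[end..].to_vec(),
    })
}
```

With this sketch, a blob made of the header `8`, two trie units, and the bytes `abc` would yield a two-element `trie` and `normalized == b"abc"`, while a truncated blob yields `None`. Nothing here is compiled code in the usual sense, which is consistent with the reading of "precompiled" above.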

Comment 5 Fedora Review Service 2025-05-20 04:13:39 UTC
Created attachment 2090842 [details]
The .spec file difference from Copr build 8881500 to 9065005

Comment 6 Fedora Review Service 2025-05-20 04:13:41 UTC
Copr build:
https://copr.fedorainfracloud.org/coprs/build/9065005
(succeeded)

Review template:
https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2358524-rust-spm_precompiled/fedora-rawhide-x86_64/09065005-rust-spm_precompiled/fedora-review/review.txt

Please take a look if any issues were found.



Comment 7 Fabio Valentini 2025-05-31 14:08:13 UTC
Thank you, this looks good now!

> %exclude %{crate_instdir}/test.json
> %exclude %{crate_instdir}/src/tests.rs

Note that this works *now*, but if a new release of this crate is ever published, excluding the tests.rs file this way *might* break things if it's explicitly referenced in the Cargo.toml file. For now, it's fine though.

===

Package was generated with rust2rpm, simplifying the review.

✅ package contains only permissible content
✅ package builds and installs without errors on rawhide
✅ test suite is run and all unit tests pass
✅ latest version of the crate is packaged
✅ license matches upstream specification and is acceptable for Fedora
✅ license file is included with %license in %files
✅ package complies with Rust Packaging Guidelines

Package APPROVED.

===

Recommended post-import rust-sig tasks:

- set up package on release-monitoring.org:
  project: $crate
  homepage: https://crates.io/crates/$crate
  backend: crates.io
  version scheme: semantic
  version filter (*NOT* pre-release filter): alpha;beta;rc;pre
  distro: Fedora
  Package: rust-$crate

- set bugzilla assignee overrides to @rust-sig (optional)

Comment 8 Fedora Admin user for bugzilla script actions 2025-05-31 16:47:24 UTC
The Pagure repository was created at https://src.fedoraproject.org/rpms/rust-spm_precompiled

Comment 9 Fedora Update System 2025-05-31 17:14:30 UTC
FEDORA-2025-2a176ffa35 (rust-spm_precompiled-0.1.4-1.fc43) has been submitted as an update to Fedora 43.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-2a176ffa35

Comment 10 Fedora Update System 2025-05-31 17:17:00 UTC
FEDORA-2025-2a176ffa35 (rust-spm_precompiled-0.1.4-1.fc43) has been pushed to the Fedora 43 stable repository.
If the problem still persists, please make note of it in this bug report.

