Bug 2028895

Summary: Review Request: rust-tree-sitter - Rust bindings to the Tree-sitter parsing library
Product: [Fedora] Fedora Reporter: Aleksei Bavshin <alebastr89>
Component: Package ReviewAssignee: Fabio Valentini <decathorpe>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: decathorpe, package-review
Target Milestone: ---Flags: decathorpe: fedora-review+
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: rust-tree-sitter-0.20.2-1.fc37 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 05:40:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2028896, 2028897, 2028899, 2028900    

Description Aleksei Bavshin 2021-12-03 17:10:47 UTC
Spec URL: https://alebastr.fedorapeople.org/review/tree-sitter-cli/rust-tree-sitter.spec
SRPM URL: https://download.copr.fedorainfracloud.org/results/alebastr/rust-tree-sitter-cli/fedora-rawhide-x86_64/02999952-rust-tree-sitter/rust-tree-sitter-0.20.1-2.fc36.src.rpm
Copr build: https://copr.fedorainfracloud.org/coprs/alebastr/rust-tree-sitter-cli/build/2999952/
Description:
Rust bindings to the Tree-sitter parsing library.

Fedora Account System Username: alebastr
Note: generated with `rust2rpm -a`; added upstream license file as required by the MIT license

Comment 1 Fabio Valentini 2021-12-05 11:32:41 UTC
This crate seems to contain a bundled copy of tree-sitter C library? You will need to declare that in the .spec file somehow (and make sure the license is also acceptable).

Additionally, the Rust bindings are generated sources (with bindgen), which is not permissible in Fedora.

Generated sources must be regenerated at build time, but there is neither a build.rs script that supports doing that, nor can we be sure that the generated sources haven't been "tampered" with post-generation.

At any rate, using pre-generated sources from bindgen is a bad idea in general, because they encode architecture-specific details from the machine that bindgen was run on (i.e. running bindgen on x86_64 machine often produces bindings that are wrong or broken on 32-bit architectures, like i686 or arm).

You'll need to ask upstream for a way to regenerate those sources at build time (probably a `build.rs` script with Cargo feature flags to turn regenerating bindings on or off, or `/usr/bin/bindgen` invocation). You can look at other -sys crates how they handle this case (to make sure bindgen is run not only during the create build itself, but also when building the crate in dependent packages, regardless of chosen feature set).

Comment 2 Fabio Valentini 2021-12-05 11:45:42 UTC
Oh, additionally, we need to make sure the C code is actually compiled with Fedora CFLAGS and LDFLAGS. Not sure how to pass those through to the cc crate (it might read CFLAGS etc. from the environment correctly; if that's the case, then %set_build_flags in %build should do it, but that won't be applied in builds of dependent packages, so some other mechanism will need to be used.)

Comment 3 Aleksei Bavshin 2021-12-05 11:48:21 UTC
(In reply to Fabio Valentini from comment #1)
> This crate seems to contain a bundled copy of tree-sitter C library? You
> will need to declare that in the .spec file somehow (and make sure the
> license is also acceptable).

The license is not an issue, everything is coming from the same repository under a common MIT license.
Unbundling the library doesn't seem to be possible, as one of the provided features depends on compiling the library with a different set of macros.  I'll add `bundled(libtree-sitter) = %{version}`.
Btw, do you know if rust-cc is applying our recommended CFLAGS? Do I need to add anything extra for that?

> 
> Additionally, the Rust bindings are generated sources (with bindgen), which
> is not permissible in Fedora.
> 
> Generated sources must be regenerated at build time, but there is neither a
> build.rs script that supports doing that, nor can we be sure that the
> generated sources haven't been "tampered" with post-generation.
> 
> At any rate, using pre-generated sources from bindgen is a bad idea in
> general, because they encode architecture-specific details from the machine
> that bindgen was run on (i.e. running bindgen on x86_64 machine often
> produces bindings that are wrong or broken on 32-bit architectures, like
> i686 or arm).
> 
> You'll need to ask upstream for a way to regenerate those sources at build
> time (probably a `build.rs` script with Cargo feature flags to turn
> regenerating bindings on or off, or `/usr/bin/bindgen` invocation). You can
> look at other -sys crates how they handle this case (to make sure bindgen is
> run not only during the create build itself, but also when building the
> crate in dependent packages, regardless of chosen feature set).

Uh... no, those were pre-generated 3 years ago. I'm not sure we can still call it pre-generated after 3 years of manual edits :(
Which also means there's no way to add bindgen to the upstream build process.

Comment 4 Aleksei Bavshin 2021-12-08 06:25:50 UTC
Spec URL: https://alebastr.fedorapeople.org/review/tree-sitter-cli/rust-tree-sitter.spec
SRPM URL: https://download.copr.fedorainfracloud.org/results/alebastr/rust-tree-sitter-cli/fedora-rawhide-x86_64/03012110-rust-tree-sitter/rust-tree-sitter-0.20.1-3.fc36.src.rpm

Copr Build: https://copr.fedorainfracloud.org/coprs/alebastr/rust-tree-sitter-cli/build/3012110/
Exploded view with patches: https://copr-dist-git.fedorainfracloud.org/cgit/alebastr/rust-tree-sitter-cli/rust-tree-sitter.git/tree/?id=459c0decd4555b1153bf1f791a720cf20d8af48a

Changes:
- Added the `generate-bindings` feature and a patch to make it default (upstream PR is sent[1], waiting for a feedback).
  Invoking /usr/bin/bindgen doesn't really work, as it will create the bindings for a build machine arch, and our rust library packages are all noarch.
- Added %%set_build_flags to verify that the C code compiles with recommended flags.
- Added bundled(tree-sitter) to the `devel` subpackage.

I run upstream unit tests on all our supported arches (except of s390x; LLVM keeps crashing in the copr chroot), and everything passed.
***

> Generated sources must be regenerated at build time.

Actually, no. The Guidelines only *suggest* that[1]. We have a stricter policy for Python, but not for Rust.

[1]: https://github.com/tree-sitter/tree-sitter/pull/1524
[2]: https://docs.fedoraproject.org/en-US/packaging-guidelines/what-can-be-packaged/#_pregenerated_code

Comment 5 Fabio Valentini 2021-12-12 11:31:32 UTC
Thanks, the PR for upstream looks good.

The PR for the rust macros to always set CFLAGS etc. has also been merged, I think we can expect a new release with that functionality soon.

Comment 6 Fabio Valentini 2022-02-22 15:54:29 UTC
Can you update the package to the latest version?
With rust2rpm 21, everything should be ready otherwise, I think ...

Comment 7 Aleksei Bavshin 2022-02-24 06:16:34 UTC
Spec URL: https://alebastr.fedorapeople.org/review/tree-sitter-cli/rust-tree-sitter.spec
SRPM URL: https://download.copr.fedorainfracloud.org/results/alebastr/rust-tree-sitter-cli/fedora-rawhide-x86_64/03540803-rust-tree-sitter/rust-tree-sitter-0.20.2-1.fc37.src.rpm

0.20.2 is not the latest version, but 0.20.3+ generates parser code incompatible with tree-sitter-0.20.1 we have in rawhide and fails the test suite. I'll have to coordinate further updates with the C library package.

Comment 8 Fabio Valentini 2022-02-24 14:38:55 UTC
Looks good to me, with one exception:
I just noticed the included tree-sitter C library code bundles parts of icu,
so you'll need to add "Provides: bundled(libicu) = version" to the -devel package, as well.

The ICU license file is already included, but you'll need to fix the license tag to incorporate the license of the bundled ICU sources, which seems to be a version of the "Unicode" license: https://fedoraproject.org/wiki/Licensing/Unicode (so it should probably be "License: MIT and Unicode").

Comment 9 Fabio Valentini 2022-02-24 15:40:18 UTC
Package was generated with rust2rpm, simplifying the review.

- package builds and installs without errors on rawhide
- test suite is run and all unit tests pass (there are zero tests)
- latest (possible) version of the crate is packaged
- license matches upstream specification (MIT + Unicode) and is acceptable for Fedora
- license files are included with %license in %files
- package complies with Rust Packaging Guidelines

Package APPROVED.

===

Recommended post-import rust-sig tasks:

- add @rust-sig with "commit" access as package co-maintainer

- set bugzilla assignee overrides to @rust-sig (optional)

- set up package on release-monitoring.org:
  project: $crate
  homepage: https://crates.io/crates/$crate
  backend: crates.io
  version scheme: semantic
  version filter: alpha;beta;rc;pre
  distro: Fedora
  Package: rust-$crate

- track package in koschei for all built branches

Comment 10 Gwyn Ciesla 2022-02-24 20:45:48 UTC
(fedscm-admin):  The Pagure repository was created at https://src.fedoraproject.org/rpms/rust-tree-sitter