Bug 2358292 - Review Request: rust-icu_locid_transform_data - Data for the icu_locid_transform crate
Summary: Review Request: rust-icu_locid_transform_data - Data for the icu_locid_transf...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Fabio Valentini
QA Contact: Fedora Extras Quality Assurance
URL: https://crates.io/crates/icu_locid_tr...
Whiteboard:
Depends On:
Blocks: 2358507
Reported: 2025-04-08 15:34 UTC by Ben Beasley
Modified: 2025-04-21 16:45 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2025-04-09 15:27:57 UTC
Type: ---
Embargoed:
decathorpe: fedora-review+



Description Ben Beasley 2025-04-08 15:34:27 UTC
Spec URL: https://music.fedorapeople.org/rust-icu_locid_transform.spec
SRPM URL: https://music.fedorapeople.org/rust-icu_locid_transform-1.5.1-1.fc41.src.rpm
Description: Data for the icu_locid_transform crate.
Fedora Account System Username: music

This is part of the ICU4X stack (version 1.5).

https://release-monitoring.org/project/377614/

Comment 1 Fedora Review Service 2025-04-08 15:34:45 UTC
Cannot find any valid SRPM URL for this ticket. Common causes are:

- You didn't specify `SRPM URL: ...` in the ticket description
  or any of your comments
- The URL schema isn't HTTP or HTTPS
- The SRPM package linked in your URL doesn't match the package name specified
  in the ticket summary


---
This comment was created by the fedora-review-service
https://github.com/FrostyX/fedora-review-service

If you want to trigger a new Copr build, add a comment containing new
Spec and SRPM URLs or [fedora-review-service-build] string.

Comment 3 Fedora Review Service 2025-04-08 15:57:51 UTC
Copr build:
https://copr.fedorainfracloud.org/coprs/build/8874478
(succeeded)

Review template:
https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2358292-rust-icu_locid_transform_data/fedora-rawhide-x86_64/08874478-rust-icu_locid_transform_data/fedora-review/review.txt

Please take a look if any issues were found.

Comment 4 Fabio Valentini 2025-04-08 18:15:23 UTC
Hm, the data/* files exclusively contain generated code, not "data" per se.

Not sure *what* they're generated from, but it doesn't look like the code for generating these files is part of the published crate,
which would be a violation of the rules here:
https://docs.fedoraproject.org/en-US/packaging-guidelines/what-can-be-packaged/#pregenerated-code

And I'm a little bit unsure whether we've already encountered issues with endianness in API calls like this one?

> zerovec::ZeroVec::from_bytes_unchecked(b"am\0ar\0as\0balbe\0bg\0bgcbhobn\0brxchrcswcv...")

Comment 5 Ben Beasley 2025-04-08 19:44:44 UTC
(In reply to Fabio Valentini from comment #4)
> Hm, the data/* files exclusively contain generated code, not "data" per se.

True, it *is* generated data (byte buffers), but wrapped up in generated boilerplate code.

> Not sure *what* they're generated from, but it doesn't look like the code
> for generating these files is part of the published crate,
> which would be a violation of the rules here:
> https://docs.fedoraproject.org/en-US/packaging-guidelines/what-can-be-packaged/#pregenerated-code

As far as I can tell, this is

https://github.com/unicode-org/icu4x/blob/58e7b89140dd95dfc778b1bc34d88abefe598208/Makefile.toml#L115

[tasks.ci-job-full-datagen]
description = "Run full data generation on latest CLDR and ICU"
category = "CI"
dependencies = [
    "bakeddata-check",
]

which refers to

https://github.com/unicode-org/icu4x/blob/58e7b89140dd95dfc778b1bc34d88abefe598208/tools/make/data.toml#L119

[tasks.bakeddata-check]
description = "Rebuild baked data and ensure that the working copy is clean"
category = "ICU4X Data"
dependencies = ["bakeddata"]
script_runner = "@duckscript"
script = '''
exit_on_error true

output = exec git status --porcelain=v1
output_length = length ${output.stdout}
if greater_than ${output_length} 0
    msg = array "" ""
    array_push ${msg} "Baked data needs to be updated. Please run `cargo make bakeddata`"
    array_push ${msg} ""
    array_push ${msg} "${output.stdout}"
    msg = array_join ${msg} "\n"
    trigger_error ${msg}
end
'''

which refers to

https://github.com/unicode-org/icu4x/blob/58e7b89140dd95dfc778b1bc34d88abefe598208/tools/make/data.toml#L102

[tasks.bakeddata]
description = "Builds full baked data"
category = "ICU4X Data"
script_runner = "@duckscript"
script = '''
exit_on_error true

if array_is_empty ${@}
    exec --fail-on-error cargo run -p bakeddata-scripts --release
else
    exec --fail-on-error cargo build -p bakeddata-scripts
    for component in ${@}
        exec --fail-on-error target/debug/bakeddata-scripts "${component}"
    end
end
'''

which relies on

https://github.com/unicode-org/icu4x/tree/release/1.5/tools/bakeddata-scripts

which uses

https://github.com/unicode-org/icu4x/tree/release/1.5/provider/datagen

https://crates.io/crates/icu_datagen

The bits of generated code appear to be scattered across

https://github.com/unicode-org/icu4x/blob/release/1.5/provider/datagen/src/baked_exporter.rs

They aren’t immediately recognizable there because most of each generated block of code is templated-in names, but the comments and constructs I spot-checked seemed to be present.

The data is ultimately encoded into buffers with databake, https://src.fedoraproject.org/rpms/rust-databake.

I think it’s reasonable to argue that the generated code is actually all boilerplate from the code generator (icu_datagen+databake), and doesn’t have its own sources specific to this crate. In https://docs.fedoraproject.org/en-US/packaging-guidelines/what-can-be-packaged/#pregenerated-code, this is similar to the bison example: bison consumes a rules file (which must be in the source RPM) and produces C sources, which contain a lot of extra boilerplate code from bison itself, but the version of bison that did the generating doesn’t have to be in the source RPM, or even in Fedora, although it’s better if it can be.

The origin of the data in the buffers is another matter. According to

https://github.com/unicode-org/icu4x/blob/release/1.5/tutorials/data-management.md

“Data generation is done using the icu_datagen crate, which pulls in data from Unicode's Common Locale Data Repository (CLDR) and from ICU4C releases to generate ICU4X data. The crate has a command line interface as well as a Rust API, which can be used in Rust scripts.”

So the ultimate sources for the data are somewhere in

https://cldr.unicode.org/index/downloads

Looking at

https://github.com/unicode-org/icu4x/blob/release/1.5/provider/datagen/src/provider.rs

it appears that there could also be data from an ICU(4C) release, e.g.

https://github.com/unicode-org/icu/releases/download/release-77-1/icuexportdata_release-77-1.zip

I’m not sure how practical it is to follow the breadcrumb trails to associate particular Unicode data files with particular baked data buffers. Given that README.md attributes the CLDR and ICU versions,

  This data was generated with CLDR version 46.0.0-BETA2, ICU version icu4x/2024-05-16/75.x, and
  LSTM segmenter version v0.1.0.

I suppose it should suffice to add the following as additional Sources to cover the requirement to include original sources for the data:

https://github.com/unicode-org/icu/releases/download/icu4x%2F2024-05-16%2F75.x/icuexportdata_icu4x-2024-05-16-75.x.zip
https://github.com/unicode-org/cldr-json/releases/download/46.0.0-BETA2/cldr-46.0.0-BETA2-json-full.zip

This is a bit bulky, but I don’t know what else to do.
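
For reference, both Source URLs above appear to follow a regular pattern derived from the versions in README.md. The sketch below is only an illustration of that pattern: the function names are hypothetical, and the URL layout is an assumption inferred from the URLs quoted in this comment, not from any documented GitHub or Unicode convention.

```rust
/// Build the icuexportdata download URL for an ICU release tag.
/// Assumption: "/" in the tag is percent-encoded as %2F in the path
/// and replaced by "-" in the archive file name.
fn icu_export_data_url(icu_tag: &str) -> String {
    format!(
        "https://github.com/unicode-org/icu/releases/download/{}/icuexportdata_{}.zip",
        icu_tag.replace('/', "%2F"),
        icu_tag.replace('/', "-"),
    )
}

/// Build the cldr-json "full" archive URL for a CLDR version.
/// Assumption: the version string is used unchanged in both places.
fn cldr_json_url(cldr_version: &str) -> String {
    format!(
        "https://github.com/unicode-org/cldr-json/releases/download/{0}/cldr-{0}-json-full.zip",
        cldr_version
    )
}
```

With the versions from README.md, `icu_export_data_url("icu4x/2024-05-16/75.x")` and `cldr_json_url("46.0.0-BETA2")` reproduce the two URLs listed above. The same pattern would have to be re-checked for each of the other "icu_*_data" crates.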

Whatever we decide to do with this crate, we’ll need to repeat the exercise several times. There are 13 "icu_*_data" crates in ICU4X.

> 
> And I'm a little bit unsure whether we've already encountered issues with
> endianness in API calls like this one?
> 
> > zerovec::ZeroVec::from_bytes_unchecked(b"am\0ar\0as\0balbe\0bg\0bgcbhobn\0brxchrcswcv...")

There was https://github.com/unicode-org/icu4x/issues/6292, but that was with rkyv, not zerovec/databake. To be honest, I’m not sure how to check for endianness issues here other than running the tests, which do pass on s390x (https://copr.fedorainfracloud.org/coprs/music/idna1/build/8874479/).
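
As a rough illustration of why the quoted buffer is probably not endianness-sensitive: the visible bytes look like fixed-width, NUL-padded ASCII records, which can be decoded one byte at a time. The sketch below uses only the standard library and does not call zerovec. The three-byte record width is an assumption read off the visible bytes, and the helper names are hypothetical. As far as I understand, zerovec also stores multi-byte integers in a fixed little-endian layout, which is what the second helper mimics.

```rust
/// Width of one record in the buffer. This is an assumption read off the
/// visible bytes (e.g. "am\0", "bal"), not something taken from zerovec.
const RECORD_WIDTH: usize = 3;

/// Visible prefix of the buffer quoted in comment 4. The original is
/// truncated with "...", so nothing past "csw" is reproduced here.
const PREFIX: &[u8] = b"am\0ar\0as\0balbe\0bg\0bgcbhobn\0brxchrcsw";

/// Split a buffer into fixed-width, NUL-padded ASCII records.
/// Every step works on single bytes, so host byte order never matters.
fn decode_records(bytes: &[u8]) -> Vec<String> {
    bytes
        .chunks_exact(RECORD_WIDTH)
        .map(|chunk| {
            chunk
                .iter()
                .take_while(|&&b| b != 0)
                .map(|&b| b as char)
                .collect::<String>()
        })
        .collect()
}

/// For contrast: a multi-byte integer read from a byte buffer is only
/// portable if the byte order is fixed explicitly, as it is here.
fn read_u16_le(bytes: [u8; 2]) -> u16 {
    u16::from_le_bytes(bytes)
}
```

`decode_records(PREFIX)` yields twelve records beginning with "am", "ar", "as", "bal" on any architecture, and `read_u16_le([0x34, 0x12])` is 0x1234 on both little-endian and big-endian hosts. This does not replace running the test suite on s390x; it only suggests why this kind of buffer would not be expected to break there.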

Comment 6 Fabio Valentini 2025-04-09 11:48:25 UTC
OK, after sleeping on it for a night, I think the most sensible and productive thing to do here is to classify the contents of the files in data/macros/* as data, i.e. machine-readable data that is just in a very peculiar format.

I don't think it makes sense to classify it as "code", since the purpose of those files is clearly to attach data to code, not to include executable code. There is only a trivial amount of Rust code in these files, and it is almost exclusively wrapper code that sets up data structures to load / store the embedded data.

It might be a good idea to document this somewhere.

===

Package was generated with rust2rpm, simplifying the review.

✅ package contains only permissible content
✅ package builds and installs without errors on rawhide
✅ test suite is run and all unit tests pass
✅ latest version of the crate is packaged
✅ license matches upstream specification and is acceptable for Fedora
✅ license file is included with %license in %files
✅ package complies with Rust Packaging Guidelines

Package APPROVED.

===

Recommended post-import rust-sig tasks:

- set up package on release-monitoring.org:
  project: $crate
  homepage: https://crates.io/crates/$crate
  backend: crates.io
  version scheme: semantic
  version filter (*NOT* pre-release filter): alpha;beta;rc;pre
  distro: Fedora
  Package: rust-$crate

- set bugzilla assignee overrides to @rust-sig (optional)

Comment 7 Ben Beasley 2025-04-09 14:31:53 UTC
Thank you for the review. I will document the rationale for the handling of data files and add links to this review. I will probably do so in rust2rpm.toml, since that is where people are most likely to actually look.

Comment 8 Fedora Admin user for bugzilla script actions 2025-04-09 14:33:32 UTC
The Pagure repository was created at https://src.fedoraproject.org/rpms/rust-icu_locid_transform_data

Comment 9 Fedora Update System 2025-04-09 15:23:30 UTC
FEDORA-2025-cc9ca7b487 (rust-icu_locid_transform_data-1.5.1-1.fc43) has been submitted as an update to Fedora 43.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-cc9ca7b487

Comment 10 Fedora Update System 2025-04-09 15:27:57 UTC
FEDORA-2025-cc9ca7b487 (rust-icu_locid_transform_data-1.5.1-1.fc43) has been pushed to the Fedora 43 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 11 Fedora Update System 2025-04-09 17:38:28 UTC
FEDORA-2025-04847cb65d (rust-icu_collections-1.5.0-3.fc42, rust-icu_locid-1.5.0-2.fc42, and 10 more) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-04847cb65d

Comment 12 Fedora Update System 2025-04-09 17:38:42 UTC
FEDORA-2025-cd87acc644 (rust-icu_collections-1.5.0-3.fc41, rust-icu_locid-1.5.0-2.fc41, and 10 more) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-cd87acc644

Comment 13 Fedora Update System 2025-04-09 17:39:00 UTC
FEDORA-2025-e923d51676 (rust-icu_collections-1.5.0-3.fc40, rust-icu_locid-1.5.0-2.fc40, and 10 more) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-e923d51676

Comment 14 Fedora Update System 2025-04-09 17:41:00 UTC
FEDORA-EPEL-2025-3fd4fbc045 (rust-atoi-2.0.0-2.el10_1, rust-detone-1.0.0-4.el10_1, and 15 more) has been submitted as an update to Fedora EPEL 10.1.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-3fd4fbc045

Comment 15 Fedora Update System 2025-04-09 17:41:16 UTC
FEDORA-EPEL-2025-c872f3c0bf (rust-atoi-2.0.0-2.el10_0, rust-detone-1.0.0-4.el10_0, and 16 more) has been submitted as an update to Fedora EPEL 10.0.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-c872f3c0bf

Comment 16 Fedora Update System 2025-04-09 17:43:10 UTC
FEDORA-EPEL-2025-ef545b8853 (rust-atoi-2.0.0-2.el9, rust-icu_collections-1.5.0-3.el9, and 11 more) has been submitted as an update to Fedora EPEL 9.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-ef545b8853

Comment 17 Fedora Update System 2025-04-10 00:22:14 UTC
FEDORA-EPEL-2025-ef545b8853 has been pushed to the Fedora EPEL 9 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-ef545b8853

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2025-04-10 02:07:51 UTC
FEDORA-2025-04847cb65d has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf install --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-04847cb65d \*`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-04847cb65d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2025-04-10 03:32:19 UTC
FEDORA-2025-cd87acc644 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf install --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-cd87acc644 \*`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-cd87acc644

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2025-04-10 04:09:24 UTC
FEDORA-2025-e923d51676 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf install --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-e923d51676 \*`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-e923d51676

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 21 Fedora Update System 2025-04-10 04:20:45 UTC
FEDORA-EPEL-2025-3fd4fbc045 has been pushed to the Fedora EPEL 10.1 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-3fd4fbc045

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 22 Fedora Update System 2025-04-10 04:21:02 UTC
FEDORA-EPEL-2025-c872f3c0bf has been pushed to the Fedora EPEL 10.0 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-c872f3c0bf

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 23 Fedora Update System 2025-04-11 00:42:18 UTC
FEDORA-EPEL-2025-ef545b8853 (rust-atoi-2.0.0-2.el9, rust-icu_collections-1.5.0-3.el9, and 11 more) has been pushed to the Fedora EPEL 9 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 24 Fedora Update System 2025-04-13 02:39:48 UTC
FEDORA-2025-04847cb65d has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-04847cb65d`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-04847cb65d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 25 Fedora Update System 2025-04-13 03:13:39 UTC
FEDORA-EPEL-2025-3fd4fbc045 has been pushed to the Fedora EPEL 10.1 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-3fd4fbc045

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 26 Fedora Update System 2025-04-13 03:17:02 UTC
FEDORA-2025-cd87acc644 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-cd87acc644`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-cd87acc644

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 27 Fedora Update System 2025-04-13 03:24:55 UTC
FEDORA-EPEL-2025-c872f3c0bf has been pushed to the Fedora EPEL 10.0 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-c872f3c0bf

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 28 Fedora Update System 2025-04-20 04:21:47 UTC
FEDORA-2025-04847cb65d (python-pydantic-core-2.27.2-5.fc42, rust-adblock-0.9.6-1.fc42, and 28 more) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 29 Fedora Update System 2025-04-21 00:31:18 UTC
FEDORA-EPEL-2025-3fd4fbc045 (python-pydantic-core-2.23.4-2.el10_1, rust-adblock-0.9.6-1.el10_1, and 29 more) has been pushed to the Fedora EPEL 10.1 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 30 Fedora Update System 2025-04-21 00:52:16 UTC
FEDORA-EPEL-2025-c872f3c0bf (rust-atoi-2.0.0-2.el10_0, rust-detone-1.0.0-4.el10_0, and 21 more) has been pushed to the Fedora EPEL 10.0 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 31 Fedora Update System 2025-04-21 01:39:44 UTC
FEDORA-2025-e923d51676 (python-pydantic-core-2.20.1-3.fc40, rust-adblock-0.9.6-1.fc40, and 28 more) has been pushed to the Fedora 40 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 32 Fedora Update System 2025-04-21 16:45:37 UTC
FEDORA-2025-cd87acc644 (python-pydantic-core-2.27.2-5.fc41, rust-adblock-0.9.6-1.fc41, and 28 more) has been pushed to the Fedora 41 stable repository.
If problem still persists, please make note of it in this bug report.

