Upstream plans[0] to add a dependency on PyArrow in the next major release (3.0). With `libarrow`[1] excluding `%{ix86}` and `%{arm}`, we need to follow suit. Since a lot of packages depend on pandas, we should set that in motion now. Even if upstream should decide not to depend on PyArrow, this effort is still worth it as a continuation of the i686 leaf removal effort[2] from a few releases back. Besides, upstream dropped 32-bit support in pandas a while back, so this brings us in line with upstream.

[0] https://github.com/pandas-dev/pandas/issues/54466
[1] https://src.fedoraproject.org/rpms/libarrow
[2] https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval
Here is the list of packages directly depending on pandas:

R-reticulate arbor cantera mlpack openms pyproj python-SALib python-arviz python-astropy python-bioframe python-bluepyopt python-contextualbandits python-cro python-dask python-devicely python-ephyviewer python-fastavro python-formulaic python-geopandas python-geoplot python-hass-data-detective python-hdfs python-hdmf python-hypothesis python-imbalanced-learn python-jsonpickle python-libpysal python-lsp-server python-mapclassify python-mizani python-mne python-mplcursors python-netpyne python-neurom python-neurosynth python-niaaml python-niaarm python-niaclass python-niapy python-nilearn python-opfunu python-pandas-datareader python-pandas-flavor python-param python-partd python-patsy python-pingouin python-plotly python-plotnine python-probeinterface python-pybids python-pynwb python-pyongc python-pypet python-pytest-arraydiff python-pytest-harvest python-pytest-regressions python-pytest-steps python-rapidfuzz python-scikit-uplift python-sciunit python-seaborn python-sklearn-genetic-opt python-sklearn-nature-inspired-algorithms python-sport-activities-features python-statsmodels python-succulent python-tabulate python-xarray python-yfinance root rpy snakemake stats-collect wult

I'll go through them to figure out which are leaf packages not yet excluding 32-bit arches and file bugs/PRs for them, working my way up.
The leaf removal will have to be done from the leaf side of things going up the dep tree, so this could touch quite a few packages. It's maybe worth putting through the change process to ensure everyone knows about this? Also maybe worth explicitly emailing the python-sig ML to ensure they're aware of this?
According to https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval, the removal can be done with a minimum of fuss. I don't think it makes sense to go through the Change process, because that would delay the whole procedure quite a bit. I'd say just write a heads-up to fedora-devel with all the maintainers of packages that will be affected in CC, wait a week, and do the deed. If there are dependent packages that FTBFS, pull requests can be filed, and proven packager privileges can be used to merge them.
Sounds good, just as long as the community and package maintainers are clearly aware of this. I think it'll still require the generation of a recursive dependency tree and its traversal from the leaves up towards the root (pandas). I.e., it won't be enough to only look at direct dependencies of pandas. Right?
Thanks for the input. I will send a mail out to the devel and python lists, so package maintainers are aware. I'll probably do this in batches for packages that are leaves in the dependency tree. And, yes, we will have to go through the dependency tree recursively, working our way in (or up). Currently the list of dependent packages is split almost equally between leaves (34) and non-leaves (36). For the leaf packages, I have yet to check which ones already exclude 32-bit arches. I can post both lists here if people are interested.
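To make the "leaves first, then work our way up" order concrete, here is a minimal sketch in Python. It is not the tooling used for this bug: the function name `leaf_first_batches` is made up, and the small graph is a hand-written subset based on relationships mentioned in this ticket, not real query output. It maps each package to the packages that depend on it and uses the standard library's `graphlib` to emit batches in which every package's dependents have already been handled.

```python
from graphlib import TopologicalSorter


def leaf_first_batches(dependents):
    """Return batches of packages, leaves first.

    `dependents` maps a package to the set of packages that depend on it.
    A package only becomes ready once all of its dependents were processed,
    so the last batch contains the root of the tree.
    """
    sorter = TopologicalSorter(dependents)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hand-written illustrative subset; a real run would use the full tree.
EXAMPLE = {
    "python-pandas": {"python-seaborn", "python-statsmodels", "arbor"},
    "python-seaborn": {"arbor"},
    "python-statsmodels": {"python-spyking-circus"},
}

if __name__ == "__main__":
    for number, batch in enumerate(leaf_first_batches(EXAMPLE), start=1):
        print(number, " ".join(batch))
```

Each batch could then be handled as one round of PRs. Circular dependencies would have to be broken by hand before such an ordering exists at all.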
Sure would be nice if https://pagure.io/koji/issue/3809 could happen. A lot of the dependent packages are noarch.

Quoting from https://github.com/pandas-dev/pandas/issues/54466#issuecomment-1941380678,

> […] future mandatory pyarrow dependency will not imply current pyarrow package but the new pyarrow-core (libarrow only) or pyarrow-base (libarrow and libparquet only), that will be published for the first time in a matter of weeks.

I don’t think this affects the likely need to exclude 32-bit architectures in the future, but it’s useful context.
I think you are correct. The split of pyarrow in smaller sub packages will still require us to drop 32-bit support in pandas since libarrow in Fedora already excludes those. Regarding koji, I will add a comment to the ticket announcing that we will ExcludeArch a great number of noarch packages. Hopefully there will be a solution to the noarch builds on i686 issue soon. If not, should we consider the use of ExclusiveArch? That's probably opening another can of worms and might require us to tackle noarch packages that already make use of ExcludeArch.
(In reply to Sandro from comment #7)
> If not, should we consider the use of ExclusiveArch? That's probably opening another can of worms and might require us to tackle noarch packages that already make use of ExcludeArch.

What would ExclusiveArch do that ExcludeArch wouldn’t? (Other than make life harder for people working on things like RISC-V?)
I thought it would allow building only on the specified arches. Though, my understanding might be wrong. But then I fail to fully understand this example[1]:

```
BuildArch: noarch
# List the arches that the dependent package builds on below
ExclusiveArch: %{ix86} %{arm} x86_64 noarch
```

Going by the comment, I thought this dictates what arches to use for building. Thus, excluding noarch from ExclusiveArch and having it use only 64-bit arches/builders might do the trick.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/#_arch_specific_runtime_and_build_time_dependencies
(In reply to Sandro from comment #9)
> I thought it would allow building only on the specified arches. Though, my understanding might be wrong. But then I fail to fully understand this example[1]:
>
> ```
> BuildArch: noarch
> # List the arches that the dependent package builds on below
> ExclusiveArch: %{ix86} %{arm} x86_64 noarch
> ```
>
> Going by the comment, I thought this dictates what arches to use for building. Thus excluding noarch from ExclusiveArch and have it use only 64-bit arches/builders might do the trick.
>
> [1] https://docs.fedoraproject.org/en-US/packaging-guidelines/#_arch_specific_runtime_and_build_time_dependencies

I think you are correct about what this does, but I also think that if i686, x86_64, ppc64le, aarch64, and s390x are the only possible architectures (that is, ignoring alternative architectures), then

> BuildArch: noarch
> ExclusiveArch: x86_64 ppc64le aarch64 s390x noarch

and

> BuildArch: noarch
> ExcludeArch: %{ix86}

express the same thing, but the latter better represents the *intent* of excluding 32-bit architectures, and doesn’t have to be modified to allow new 64-bit architectures in the future.
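The equivalence, and the point about future architectures, can be illustrated with a small set-based sketch in Python. This only models the meaning of the two spec tags as described above; it says nothing about how Koji actually picks builders. The helper `build_arches` is made up for illustration, and `IX86` is a simplified stand-in for what `%{ix86}` expands to.

```python
# The architectures assumed above to be the only possible ones.
PRIMARY_ARCHES = frozenset({"i686", "x86_64", "ppc64le", "aarch64", "s390x"})

# Simplified stand-in for the %{ix86} macro (assumption: the real macro
# covers more 32-bit x86 variants than listed here).
IX86 = frozenset({"i386", "i486", "i586", "i686"})


def build_arches(available, exclusive=None, exclude=frozenset()):
    """Arches a spec may build on, given ExclusiveArch/ExcludeArch sets."""
    allowed = set(available)
    if exclusive is not None:
        # ExclusiveArch: only the listed arches remain ("noarch" is not a
        # builder arch, so it simply drops out of the intersection).
        allowed &= set(exclusive)
    # ExcludeArch: remove the listed arches from whatever is left.
    return allowed - set(exclude)


EXCLUSIVE_64 = {"x86_64", "ppc64le", "aarch64", "s390x", "noarch"}

if __name__ == "__main__":
    # Today both spellings select the same builders.
    print(sorted(build_arches(PRIMARY_ARCHES, exclusive=EXCLUSIVE_64)))
    print(sorted(build_arches(PRIMARY_ARCHES, exclude=IX86)))

    # With a new 64-bit arch, only the ExcludeArch form picks it up.
    future = PRIMARY_ARCHES | {"riscv64"}
    print(sorted(build_arches(future, exclusive=EXCLUSIVE_64)))
    print(sorted(build_arches(future, exclude=IX86)))
```

With only the five arches listed, both calls yield the same set; once `riscv64` is added, the ExclusiveArch variant silently leaves it out until the spec is edited.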
(In reply to Ben Beasley from comment #10)
> express the same thing, but the latter better represents the *intent* of excluding 32-bit architectures, and doesn’t have to be modified to allow new 64-bit architectures in the future.

I agree. With Koji honoring ExcludeArch for selecting builders all would be fine. I thought with using ExclusiveArch, preferably with a macro that defines 64-bit arches not including noarch, koji might be wrangled into submission. Of course, fixing the way Koji selects builders for noarch packages is the preferred solution.
(In reply to Sandro from comment #11)
> I agree. With Koji honoring ExcludeArch for selecting builders all would be fine. I thought with using ExclusiveArch, preferably with a macro that defines 64-bit arches not including noarch, koji might be wrangled into submission. Of course, fixing the way Koji selects builders for noarch packages is the preferred solution.

So, forgive me if you’re not confused or misled in the way I think you might be, but

> BuildArch: noarch
> ExcludeArch: %{ix86}

does work today in the way you would expect. The problem is that it shouldn’t be needed: noarch packages *should not* block arched packages from dropping i686. A lot of arched packages would be leaf packages, if not for a few dozen noarch Python packages that depend on them. Because noarch packages *could* be assigned to an i686 builder, their dependencies have to maintain i686 support, or they’ll FTBFS randomly. But adding ExcludeArch to all those noarch packages is tedious, so it tends not to happen, and so everything ends up having to keep i686 support.

If noarch packages were just never built on i686 by policy, then none of that would be necessary, and we would only have to worry about arched dependent packages. That’s what https://pagure.io/koji/issue/3809 is about.
I recently had to restore i686 support in python-fastapi and python-opentelemetry-contrib even though they were

> BuildArch: noarch
> ExcludeArch: %{ix86}

for more than a year, because a new noarch package python-sentry-sdk started depending on them, and then some arched packages started depending on python-sentry-sdk, and it was easier to remove the ExcludeArch than to start sending PRs to all the dependent packages. I’m not sure what it would take to prevent that situation.
(In reply to Ben Beasley from comment #12)
> The problem is that it shouldn’t be needed: noarch packages *should not* block arched packages from dropping i686.

Thank you! Honestly, my brain is not firing on all cylinders today. I might be getting sick. We'll see tomorrow. I need to step away from the keyboard for a while and give my brain some rest. I'm not joking.

> So, forgive me if you’re not confused or misled in the way I think you might be, but

You could only half imagine the ways in which I get confused and misled. But you were spot on, I was. I appreciate you helping me to get back on track. I really do.

And now I will call it a day, have an early night and, if my past experience is anything to go by, wake up tomorrow with a fever and what not. 🥵
(In reply to Ben Beasley from comment #6)
> Sure would be nice if https://pagure.io/koji/issue/3809 could happen. A lot of the dependent packages are noarch.

Indeed, it would. It seems there is some movement.

With the premise that we only need to worry about arched packages, the list of packages turns out to be rather small:

mlpack
python-contextualbandits
python-devicely
python-statsmodels
python-reproject

I'll double check my results and start submitting PRs.
Unfortunately, I think there’s more than that, including arched packages with noarch binary packages like python-dask, which build on every architecture, and packages that have indirect dependencies on python-pandas via noarch packages.

I started with “fedrq wrsrc python-pandas.” (Usually wrsrc -s is nicer, but I wanted to make sure to capture packages that only had runtime dependencies on pandas.) The first entry is R-reticulate, https://src.fedoraproject.org/rpms/R-reticulate/blob/rawhide/f/R-reticulate.spec. It has no BuildArch/ExcludeArch/ExclusiveArch, so it’s a normal arched package that builds everywhere, and it’s therefore affected. It has its own tree of dependent packages: both R-rsconnect and R-sessioninfo depend on it.

              /→ R-rsconnect → ???
R-reticulate <
              \→ R-sessioninfo → ???

R-rsconnect and R-sessioninfo are noarch, so they are OK, but what if an arched package depends on pandas indirectly *via* R-rsconnect? R-pkgdown depends on R-rsconnect, but it is also noarch. R-devtools and R-lobstr depend on R-pkgdown: R-devtools is noarch, but R-lobstr is not, so it is potentially impacted. And the arched package R-dplyr depends on it. I’m using asterisks below to mark noarch packages.

                                               /→ *R-devtools* → ???
               /→ *R-rsconnect* → *R-pkgdown* <
              /                                \→ R-lobstr → R-dplyr → ???
R-reticulate <
              \→ R-sessioninfo → ???

Filling this out a little further in the same way:

                                                                                      /→ R-RMariaDB \               /→ *R-BiocFileCache* → *R-biomaRt*
                                                                       /→ *R-DBItest*<→ R-RPostgres  >→ *R-dbplyr* <
                                                /→ <B> *R-devtools* → <               \→ R-odbc ----/               \→ (R-dplyr; see <A>)
                                               /                       \→ R-profvis → *R-ggplot2* → (15 dependent packages) → ???
               /→ *R-rsconnect* → *R-pkgdown* <
              /                                \→ R-lobstr → <A> R-dplyr → (18 dependent packages) → ???
R-reticulate <                    /→ (*R-devtools*; see <B>)
              \                  /→ R-pkgcache
               \→ R-sessioninfo <                  /→ (*R-devtools*; see <B>)
                                 \→ *R-rcmdcheck* <
                                  \→ *R-reprex*    \→ *R-rhub* → (*R-devtools*; see <B>)

Ok, now that I’ve drawn part of a scary graph that seems like it’s going to consume most of the R ecosystem, I’ll point out that Pandas is only a test dependency for R-reticulate, and we should be able to cut off this entire tree at the root by conditionalizing the BuildRequires there on architecture and skipping some or all tests on i686. But the above is still a good example of how each directly-dependent package needs to be considered individually, and how indirectly-dependent packages are important too.
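The walk described above (follow dependents through noarch packages and flag every arched package reached) could be sketched in Python roughly as follows. This is only an illustration: the function `arched_dependents` is made up, and the graph and noarch set are typed in by hand from the first few levels described in this comment rather than queried from the repositories.

```python
from collections import deque


def arched_dependents(root, dependents, noarch):
    """Collect arched packages that directly or indirectly depend on `root`.

    `dependents` maps a package to the packages that depend on it and
    `noarch` is the set of packages known to be noarch. noarch packages
    are walked through, because an arched package may sit behind them.
    """
    seen = set()
    queue = deque(dependents.get(root, ()))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # shared subtrees only need to be visited once
        seen.add(package)
        queue.extend(dependents.get(package, ()))
    return sorted(seen - set(noarch))


# Hand-copied from the description above; deeper levels are omitted.
DEPENDENTS = {
    "R-reticulate": ["R-rsconnect", "R-sessioninfo"],
    "R-rsconnect": ["R-pkgdown"],
    "R-pkgdown": ["R-devtools", "R-lobstr"],
    "R-lobstr": ["R-dplyr"],
}
NOARCH = {"R-rsconnect", "R-sessioninfo", "R-pkgdown", "R-devtools"}

if __name__ == "__main__":
    print(arched_dependents("R-reticulate", DEPENDENTS, NOARCH))
```

On this truncated graph the walk flags R-dplyr and R-lobstr, matching the reasoning above. The `seen` set is what keeps repeated subtrees (the "see <A>" and "see <B>" references in the drawing) from being walked twice.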
For R-reticulate: https://src.fedoraproject.org/rpms/R-reticulate/pull-request/1
(In reply to Ben Beasley from comment #16)
> Unfortunately, I think there’s more than that, including arched packages with noarch binary packages like python-dask, which build on every architecture, and packages that have indirect dependencies on python-pandas via noarch packages.
>
> I started with “fedrq wrsrc python-pandas.” (Usually wrsrc -s is nicer, but I wanted to make sure to capture packages that only had runtime dependencies on pandas.) The first entry is R-reticulate, https://src.fedoraproject.org/rpms/R-reticulate/blob/rawhide/f/R-reticulate.spec. It has no BuildArch/ExcludeArch/ExclusiveArch, so it’s a normal arched package that builds everywhere, and it’s therefore affected. It has its own tree of dependent packages: both R-rsconnect and R-sessioninfo depend on it.

It would have been too easy otherwise. ;)

I didn't consider packages only build requiring pandas. That's why R-reticulate is not on my list. Nor did I consider sub packages, which I should have, since, as you pointed out, a single src package may provide noarch and arched sub packages.

Back to the drawing board it is. Thanks for pointing out the flaws and for submitting the first PR getting things rolling. I'll post an updated list taking above points into consideration.
Created attachment 2017692
Leaf status dependency tree of python-pandas

Now that I've been doing a bit more digging and poking, I hopefully have a more complete picture of the state of affairs.

(In reply to Ben Beasley from comment #16)
> Ok, now that I’ve drawn part of a scary graph that seems like it’s going to consume most of the R ecosystem, I’ll point out that Pandas is only a test dependency for R-reticulate, and we should be able to cut off this entire tree at the root by conditionalizing the BuildRequires there on architecture and skipping some or all tests on i686. But the above is still a good example of how each directly-dependent package needs to be considered individually, and how indirectly-dependent packages are important too.

Regarding the above, I've learned that when only the source package depends on the package being investigated (that is, the dependency is a BuildRequires only), no further examination of its (child) dependents is required.

To make that more concrete: `fedrq wrsrc python-pandas` lists `R-reticulate`, however, `fedrq subpkgs R-reticulate | fedrq pkgs -F requires | grep pandas` will return no results. Thus, python-pandas is indeed only a BR for R-reticulate and we don't need to examine the sub tree rooted at R-reticulate.

With that and some other lessons learned, I now have a complete[1] tree (up to the points where we can stop investigating). It's a rather large tree (see attachment).
Summing up the results, we have:

- 34 leaf packages already excluding i686
- 33 leaf packages not yet excluding i686
  - of those, only two are arched packages (or depending on arched packages)
    - python3-spyking-circus.noarch (depends on python3-statsmodels.x86_64)
    - python3-contextualbandits.x86_64
- 47 packages that BR python-pandas
  - of those, we only need to consider packages that have arched sub packages
    - python-elephant
    - python-astropy
    - pyproj
    - cantera
    - python-mne
    - root
    - arbor
    - rpy
    - R-reticulate
    - pyproj
    - python-fastavro
    - python-rapidfuzz
    - python-dask

[1] While I believe the tree is complete, it does contain duplicates, which I didn't filter. E.g.: arbor BRs python3-pandas, yet arbor also BRs python3-seaborn, which itself has a runtime requirement on python3-pandas. Thus arbor is listed twice.
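The rule for build-only dependents, together with the duplicate filtering that wasn't applied to the tree, can be expressed as a short Python sketch. It does not call fedrq; it assumes the requires of each source package's subpackages have already been collected into a dictionary. The function name `split_by_dependency_kind` and the placeholder requires in the example are made up for illustration.

```python
def split_by_dependency_kind(source_packages, subpackage_requires, target="pandas"):
    """Split source packages into runtime and build-only dependents.

    `source_packages` is the (possibly duplicated) list of source packages
    that depend on the target in some way. `subpackage_requires` maps a
    source package to the runtime requires of its binary subpackages.
    If none of them mentions the target, the dependency is build-only and
    the subtree below that package need not be examined.
    """
    runtime, build_only = [], []
    # dict.fromkeys drops duplicates while keeping the original order.
    for source in dict.fromkeys(source_packages):
        requires = subpackage_requires.get(source, ())
        # Substring match, mirroring the `grep pandas` step above.
        if any(target in requirement for requirement in requires):
            runtime.append(source)
        else:
            build_only.append(source)
    return runtime, build_only


# Placeholder data for illustration only, not real query output.
SOURCES = ["R-reticulate", "python-seaborn", "arbor", "arbor"]
REQUIRES = {
    "R-reticulate": ["hypothetical-r-dependency"],
    "python-seaborn": ["python3-pandas"],
    "arbor": [],
}

if __name__ == "__main__":
    runtime, build_only = split_by_dependency_kind(SOURCES, REQUIRES)
    print("runtime:", runtime)
    print("build-only:", build_only)
```

Only the packages in the runtime list need their own dependents examined. The build-only ones still have to be checked for arched subpackages themselves, as in the list above.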
root: https://src.fedoraproject.org/rpms/root/pull-request/5
Given that noarch packages will no longer be built on i686, I went ahead and submitted PRs where needed. That means I only had to look into arched packages directly or indirectly depending on pandas. Here's the current status:

### Packages with runtime dependency on pandas

- python-contextualbandits: https://src.fedoraproject.org/rpms/python-contextualbandits/pull-request/3

### Packages with buildtime dependency on pandas

- arbor: already excludes i686
- cantera: already excludes i686
- pyproj: https://src.fedoraproject.org/rpms/pyproj/pull-request/5
- python-mne: already excludes i686
- python-astropy: https://src.fedoraproject.org/rpms/python-astropy/pull-request/10
- python-ephyviewer: already excludes i686
- python-fastavro: already excludes i686
- python-google-cloud-monitoring: no arched subpackages
- python-imbalanced-learn: no arched subpackages
- python-jsonpickle: no arched subpackages
- python-lsp-server: no arched subpackages
- python-mplcursors: no arched subpackages
- python-param: no arched subpackages
- python-plotly: no arched subpackages
- python-probeinterface: no arched subpackages
- python-pytest-arraydiff: no arched subpackages
- python-pytest-harvest: no arched subpackages
- python-pytest-steps: no arched subpackages
- python-rapidfuzz: https://src.fedoraproject.org/rpms/python-rapidfuzz/pull-request/6
- python-sklearn-genetic-opt: no arched subpackages
- python-tabulate: no arched subpackages
- root: https://src.fedoraproject.org/rpms/root/pull-request/5
- rpy: https://src.fedoraproject.org/rpms/rpy/pull-request/1
- R-reticulate: https://src.fedoraproject.org/rpms/R-reticulate/pull-request/1
- snakemake: no arched subpackages
- stats-collect: no arched subpackages
  - BRed by wult, but that package is `ExclusiveArch x86_64`

### Non-trivial packages

- python-dask
- python-statsmodels

I hope I've got this right this time ...
Regarding the last category, non-trivial packages, those have circular dependencies and/or multiple packages depending on them. I plan on treating them the same way as pandas. In other words, determine the dependency tree for those packages and take it from there. In fact, I've already run the analysis. The trees are much smaller than the tree for pandas. And the fact that noarch is no longer affected by i686 builds should reduce the work required as well. Packages listed in the pandas dependency tree, but not mentioned here, depend in some way on the two non-trivial packages and will be dealt with when those packages get sorted out.
Two more PRs from the dependency stack of `python-dask` and `python-statsmodels`:

https://src.fedoraproject.org/rpms/python-dask/pull-request/8
https://src.fedoraproject.org/rpms/python-statsmodels/pull-request/7

Any packages depending on those were either noarch BR packages or already excluded %{ix86}.

Also, `python-dask` is kinda special, since it is actually noarch, but there's also a circular dependency involving `python-pandas+test`.
Where do we stand on this? I see that a number of the pull requests have been merged (like the above 2), but I haven't looked at all of them.
I think we are almost there. Of all the PRs provided only one is still pending: https://src.fedoraproject.org/rpms/root/pull-request/5 I'll ping the maintainers in the PR and ask music to rebase the PR. It can no longer be merged by the looks of it. Once that has been merged, we should be able to drop i686 for pandas. Koji no longer building noarch packages on i686 has helped a lot in reducing the number of PRs required.
(In reply to Sandro from comment #24)
> I think we are almost there. Of all the PRs provided only one is still pending:
>
> https://src.fedoraproject.org/rpms/root/pull-request/5

That has been merged today. I would appreciate a second pair of eyes to make sure I haven't missed anything.

To proceed, I'm thinking of:

1. Smoke test in Copr
2. Evaluate results
3. Submit additional PRs if needed (based on 2)
4. Apply the change in rawhide (before F41 is branched)

Does that approach sound reasonable?
I have submitted two more PRs:

1. https://src.fedoraproject.org/rpms/mlpack/pull-request/14
   - `mlpack` is a leaf package with no dependent packages
2. https://src.fedoraproject.org/rpms/python-devicely/pull-request/4
   - `python-devicely` is a noarch package, but for some reason it didn't use `BuildArch: noarch`

Once these two are merged, we should be good to go dropping i686 in rawhide.