Bug 2263999 - Drop i686 support from python-pandas
Summary: Drop i686 support from python-pandas
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: python-pandas
Version: rawhide
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Sandro
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-02-13 08:52 UTC by Sandro
Modified: 2024-05-25 10:58 UTC (History)

Fixed In Version: python-pandas-2.2.1-3.fc41
Clone Of:
Environment:
Last Closed: 2024-05-25 10:58:37 UTC
Type: ---
Embargoed:


Attachments
Leaf status dependency tree of python-pandas (12.24 KB, text/plain)
2024-02-19 18:36 UTC, Sandro

Description Sandro 2024-02-13 08:52:29 UTC
Upstream has plans[0] for adding a dependency on PyArrow in the next major release (3.0). With `libarrow`[1] excluding `%{ix86}` and `%{arm}` we need to follow suit. Since a lot of packages depend on pandas, we should set that in motion now.

Even if upstream should decide not to depend on PyArrow, this effort is still worthwhile as a continuation of the i686 leaf removal effort[2] from a few releases back. Besides, upstream dropped 32-bit support in pandas a while back, so this brings us in line with upstream.

[0] https://github.com/pandas-dev/pandas/issues/54466
[1] https://src.fedoraproject.org/rpms/libarrow
[2] https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval

Reproducible: Always

Comment 1 Sandro 2024-02-13 09:01:36 UTC
Here is the list of packages directly depending on pandas:

R-reticulate
arbor
cantera
mlpack
openms
pyproj
python-SALib
python-arviz
python-astropy
python-bioframe
python-bluepyopt
python-contextualbandits
python-cro
python-dask
python-devicely
python-ephyviewer
python-fastavro
python-formulaic
python-geopandas
python-geoplot
python-hass-data-detective
python-hdfs
python-hdmf
python-hypothesis
python-imbalanced-learn
python-jsonpickle
python-libpysal
python-lsp-server
python-mapclassify
python-mizani
python-mne
python-mplcursors
python-netpyne
python-neurom
python-neurosynth
python-niaaml
python-niaarm
python-niaclass
python-niapy
python-nilearn
python-opfunu
python-pandas-datareader
python-pandas-flavor
python-param
python-partd
python-patsy
python-pingouin
python-plotly
python-plotnine
python-probeinterface
python-pybids
python-pynwb
python-pyongc
python-pypet
python-pytest-arraydiff
python-pytest-harvest
python-pytest-regressions
python-pytest-steps
python-rapidfuzz
python-scikit-uplift
python-sciunit
python-seaborn
python-sklearn-genetic-opt
python-sklearn-nature-inspired-algorithms
python-sport-activities-features
python-statsmodels
python-succulent
python-tabulate
python-xarray
python-yfinance
root
rpy
snakemake
stats-collect
wult

I'll go through them to figure out which are leaf packages not yet excluding 32-bit arches and file bugs/PRs for them, working my way up.

Comment 2 Ankur Sinha (FranciscoD) 2024-02-13 10:11:22 UTC
The leaf removal will have to be done from the leaf side of things going up the dep tree, so this could touch quite a few packages. It's maybe worth putting through the change process to ensure everyone knows about this? Also maybe worth explicitly emailing the python-sig ML to ensure they're aware of this?

Comment 3 Zbigniew Jędrzejewski-Szmek 2024-02-13 10:21:39 UTC
According to https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval,
the removal can be done with a minimum of fuss. I don't think it makes sense
to go through the Change process, because that would delay the whole procedure
quite a bit. I'd say just write a heads-up to fedora-devel with all the maintainers
of packages that will be affected in CC, wait a week, and do the deed.
If there are dependent packages that FTBFS, pull requests can be filed,
and proven packager privileges can be used to merge them.

Comment 4 Ankur Sinha (FranciscoD) 2024-02-13 10:27:45 UTC
Sounds good, just as long as the community and package maintainers are clearly aware of this.

I think it'll still require the generation of a recursive dependency tree and its traversal from the leaves up towards the root (pandas). I.e., it won't be enough to only look at direct dependencies of pandas. Right?

Comment 5 Sandro 2024-02-13 11:31:31 UTC
Thanks for the input. I will send a mail out to the devel and python lists, so package maintainers are aware. I'll probably do this in batches for packages that are leaves in the dependency tree. And, yes, we will have to go through the dependency tree recursively, working our way in (or up).

Currently the list of dependent packages is split almost equally between leaves (34) and non-leaves (36). Of the leaf packages, I have yet to look up which ones may already exclude 32-bit arches. I can post both lists here if people are interested.
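
To illustrate the leaves-first traversal, here is a minimal Python sketch. It is not the tooling actually used for this bug: the package names (`pkg-a`, `pkg-b`, `pkg-c`) and the shape of the `dependents` mapping are placeholders, and the real data would come from a repository query.

```python
from graphlib import TopologicalSorter

# Hypothetical reverse-dependency data: package -> packages that depend on it.
# The names below are placeholders, not the real pandas tree.
dependents = {
    "python-pandas": ["pkg-a", "pkg-b"],
    "pkg-a": ["pkg-c"],
    "pkg-b": [],
    "pkg-c": [],
}


def leaves_first(dependents):
    """Order packages so each one comes after all packages that depend on it."""
    # TopologicalSorter expects node -> predecessors. Treating dependents as
    # predecessors yields the leaves first and the root (pandas) last.
    return list(TopologicalSorter(dependents).static_order())


def split_leaves(dependents):
    """Split packages into leaves (no dependents) and non-leaves."""
    leaves = sorted(p for p, d in dependents.items() if not d)
    non_leaves = sorted(p for p, d in dependents.items() if d)
    return leaves, non_leaves


order = leaves_first(dependents)
leaves, non_leaves = split_leaves(dependents)
print(order)
print(leaves, non_leaves)
```

Note that `TopologicalSorter` raises `CycleError` on circular dependencies, so those would have to be handled separately.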

Comment 6 Ben Beasley 2024-02-13 12:42:07 UTC
Sure would be nice if https://pagure.io/koji/issue/3809 could happen. A lot of the dependent packages are noarch.

Quoting from https://github.com/pandas-dev/pandas/issues/54466#issuecomment-1941380678,

> […] future mandatory pyarrow dependency will not imply current pyarrow package but the new pyarrow-core (libarrow only) or pyarrow-base (libarrow and libparquet only), that will be published for the first time in a matter of weeks.

I don’t think this affects the likely need to exclude 32-bit architectures in the future, but it’s useful context.

Comment 7 Sandro 2024-02-13 13:12:02 UTC
I think you are correct. The split of pyarrow into smaller subpackages will still require us to drop 32-bit support in pandas, since libarrow in Fedora already excludes those arches.

Regarding koji, I will add a comment to the ticket announcing that we will ExcludeArch a great number of noarch packages. Hopefully there will be a solution to the noarch builds on i686 issue soon.

If not, should we consider the use of ExclusiveArch? That's probably opening another can of worms and might require us to tackle noarch packages that already make use of ExcludeArch.

Comment 8 Ben Beasley 2024-02-13 13:16:04 UTC
(In reply to Sandro from comment #7)
> If not, should we consider the use of ExclusiveArch? That's probably opening
> another can of worms and might require us to tackle noarch packages that
> already make use of ExcludeArch.

What would ExclusiveArch do that ExcludeArch wouldn’t? (Other than make life harder for people working on things like RISC-V?)

Comment 9 Sandro 2024-02-13 13:42:56 UTC
I thought it would allow building only on the specified arches. Though, my understanding might be wrong. But then I fail to fully understand this example[1]:

```
BuildArch: noarch
# List the arches that the dependent package builds on below
ExclusiveArch: %{ix86} %{arm} x86_64 noarch
```

Going by the comment, I thought this dictates what arches to use for building. Thus excluding noarch from ExclusiveArch and have it use only 64-bit arches/builders might do the trick.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/#_arch_specific_runtime_and_build_time_dependencies

Comment 10 Ben Beasley 2024-02-13 16:57:23 UTC
(In reply to Sandro from comment #9)
> I thought it would allow building only on the specified arches. Though, my
> understanding might be wrong. But then I fail to fully understand this
> example[1]:
> 
> ```
> BuildArch: noarch
> # List the arches that the dependent package builds on below
> ExclusiveArch: %{ix86} %{arm} x86_64 noarch
> ```
> 
> Going by the comment, I thought this dictates what arches to use for
> building. Thus excluding noarch from ExclusiveArch and have it use only
> 64-bit arches/builders might do the trick.
> 
> [1]
> https://docs.fedoraproject.org/en-US/packaging-guidelines/
> #_arch_specific_runtime_and_build_time_dependencies

I think you are correct about what this does, but I also think that if i686, x86_64, ppc64le, aarch64, and s390x are the only possible architectures (that is, ignoring alternative architectures), then

> BuildArch: noarch
> ExclusiveArch: x86_64 ppc64le aarch64 s390x noarch

and

> BuildArch: noarch
> ExcludeArch: %{ix86}

express the same thing, but the latter better represents the *intent* of excluding 32-bit architectures, and doesn’t have to be modified to allow new 64-bit architectures in the future.
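
As a rough illustration of that point, the following Python sketch models only the set arithmetic behind the two tags. It does not model how rpm or Koji actually treat `noarch`, `%{ix86}` is simplified to just `i686`, and `riscv64` stands in for a hypothetical future architecture.

```python
# Simplifying assumption: %{ix86} is represented by "i686" alone.
IX86 = {"i686"}


def build_arches(available, exclusive=None, exclude=None):
    """Arches a package may build on, given ExclusiveArch/ExcludeArch sets."""
    arches = set(available)
    if exclusive is not None:
        arches &= set(exclusive)  # ExclusiveArch: keep only the listed arches
    if exclude is not None:
        arches -= set(exclude)  # ExcludeArch: drop the listed arches
    return arches


today = {"i686", "x86_64", "ppc64le", "aarch64", "s390x"}
exclusive = {"x86_64", "ppc64le", "aarch64", "s390x", "noarch"}

# With today's architecture set, both spellings select the same builders.
same_today = build_arches(today, exclusive=exclusive) == build_arches(today, exclude=IX86)

# Hypothetical future: a new 64-bit architecture appears.
future = today | {"riscv64"}
with_exclusive = build_arches(future, exclusive=exclusive)
with_exclude = build_arches(future, exclude=IX86)

print(same_today)
print("riscv64" in with_exclusive, "riscv64" in with_exclude)
```

In this model, the `ExclusiveArch` list has to be edited before the new architecture is picked up, while `ExcludeArch: %{ix86}` needs no change.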

Comment 11 Sandro 2024-02-13 17:55:11 UTC
(In reply to Ben Beasley from comment #10)
> express the same thing, but the latter better represents the *intent* of
> excluding 32-bit architectures, and doesn’t have to be modified to allow new
> 64-bit architectures in the future.

I agree. With Koji honoring ExcludeArch for selecting builders all would be fine. I thought with using ExclusiveArch, preferably with a macro that defines 64-bit arches not including noarch, koji might be wrangled into submission. Of course, fixing the way Koji selects builders for noarch packages is the preferred solution.

Comment 12 Ben Beasley 2024-02-13 18:20:59 UTC
(In reply to Sandro from comment #11)
> I agree. With Koji honoring ExcludeArch for selecting builders all would be
> fine. I thought with using ExclusiveArch, preferably with a macro that
> defines 64-bit arches not including noarch, koji might be wrangled into
> submission. Of course, fixing the way Koji selects builders for noarch
> packages is the preferred solution.

So, forgive me if you’re not confused or misled in the way I think you might be, but

> BuildArch: noarch
> ExcludeArch: %{ix86}

does work today in the way you would expect.

The problem is that it shouldn’t be needed: noarch packages *should not* block arched packages from dropping i686. A lot of arched packages would be leaf packages, if not for a few dozen noarch Python packages that depend on them. Because noarch packages *could* be assigned to an i686 builder, their dependencies have to maintain i686 support, or they’ll FTBFS randomly. But adding ExcludeArch to all those noarch packages is tedious, so it tends not to happen, and so everything ends up having to keep i686 support. If noarch packages were just never built on i686 by policy, then none of that would be necessary, and we would only have to worry about arched dependent packages. That’s what https://pagure.io/koji/issue/3809 is about.
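
A minimal Python sketch of that reasoning, under stated assumptions: the `Pkg` class, the `noarch_on_i686` policy flag, and the package names are all hypothetical and only model which dependents could end up on an i686 builder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pkg:
    name: str
    noarch: bool
    excludes_i686: bool


def may_build_on_i686(pkg, noarch_on_i686):
    """True if a build of pkg could land on an i686 builder."""
    if pkg.excludes_i686:
        return False
    if pkg.noarch:
        # Depends on policy: may noarch builds be assigned to i686 builders?
        return noarch_on_i686
    return True


def blockers(dependents, noarch_on_i686):
    """Dependents that keep a package from dropping i686."""
    return [p.name for p in dependents if may_build_on_i686(p, noarch_on_i686)]


# Hypothetical dependents of some arched package.
dependents = [
    Pkg("noarch-a", noarch=True, excludes_i686=False),
    Pkg("noarch-b", noarch=True, excludes_i686=True),
    Pkg("arched-c", noarch=False, excludes_i686=False),
]

old_policy = blockers(dependents, noarch_on_i686=True)
new_policy = blockers(dependents, noarch_on_i686=False)
print(old_policy)
print(new_policy)
```

With `noarch_on_i686=False`, only the arched dependent remains to be dealt with.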

Comment 13 Ben Beasley 2024-02-13 18:24:36 UTC
I recently had to restore i686 support in python-fastapi and python-opentelemetry-contrib even though they were

> BuildArch: noarch
> ExcludeArch: %{ix86}

for more than a year, because a new noarch package python-sentry-sdk started depending on them, and then some arched packages started depending on python-sentry-sdk, and it was easier to remove the ExcludeArch than to start sending PR’s to all the dependent packages. I’m not sure what it would take to prevent that situation.

Comment 14 Sandro 2024-02-13 21:10:29 UTC
(In reply to Ben Beasley from comment #12)
> The problem is that it shouldn’t be needed: noarch packages *should not*
> block arched packages from dropping i686.

Thank you! Honestly, my brain is not firing on all cylinders today. I might be getting sick. We'll see tomorrow.

I need to step away from the keyboard for a while and give my brain some rest. I'm not joking.

> So, forgive me if you’re not confused or misled in the way I think you might be, but

You could only half imagine the ways in which I get confused and misled. But you were spot on, I was. I appreciate you helping me to get back on track. I really do.

And now I will call it a day, have an early night and, if my past experience is anything to go by, wake up tomorrow with a fever and what not. 🥵

Comment 15 Sandro 2024-02-16 11:30:33 UTC
(In reply to Ben Beasley from comment #6)
> Sure would be nice if https://pagure.io/koji/issue/3809 could happen. A lot
> of the dependent packages are noarch.

Indeed, it would. It seems there is some movement.

With the premise that we only need to worry about arched packages, the list of packages turns out to be rather small:

mlpack
python-contextualbandits
python-devicely
python-statsmodels
python-reproject

I'll double check my results and start submitting PRs.

Comment 16 Ben Beasley 2024-02-16 12:55:50 UTC
Unfortunately, I think there’s more than that, including arched packages with noarch binary packages like python-dask, which build on every architecture, and packages that have indirect dependencies on python-pandas via noarch packages.

I started with “fedrq wrsrc python-pandas.” (Usually wrsrc -s is nicer, but I wanted to make sure to capture packages that only had runtime dependencies on pandas.) The first entry is R-reticulate, https://src.fedoraproject.org/rpms/R-reticulate/blob/rawhide/f/R-reticulate.spec. It has no BuildArch/ExcludeArch/ExclusiveArch, so it’s a normal arched package that builds everywhere, and it’s therefore affected. It has its own tree of dependent packages: both R-rsconnect and R-sessioninfo depend on it.

              /→ R-rsconnect → ???
R-reticulate <
              \→ R-sessioninfo → ???

R-rsconnect and R-sessioninfo are noarch, so they are OK, but what if an arched package depends on pandas indirectly *via* R-rsconnect? R-pkgdown depends on R-rsconnect, but it is also noarch. R-devtools and R-lobstr depend on R-pkgdown: R-devtools is noarch, but R-lobstr is not, so it is potentially impacted. And the arched package R-dplyr depends on it. I’m using asterisks below to mark noarch packages.

                                               /→ *R-devtools* → ???
               /→ *R-rsconnect* → *R-pkgdown* <
              /                                \→ R-lobstr → R-dplyr → ???
R-reticulate <                                
              \→ R-sessioninfo → ???

Filling this out a little further in the same way:
                                                                                      /→ R-RMariaDB \               /→ *R-BiocFileCache* → *R-biomaRt*
                                                                       /→ *R-DBItest*<→ R-RPostgres  >→ *R-dbplyr* <
                                                /→ <B> *R-devtools* → <               \→ R-odbc ----/               \→ (R-dplyr; see <A>)
                                               /                       \→ R-profvis → *R-ggplot2* → (15 dependent packages) → ???
               /→ *R-rsconnect* → *R-pkgdown* <
              /                                \→ R-lobstr → <A> R-dplyr → (18 dependent packages) → ???
R-reticulate <                    /→ (*R-devtools*; see <B>)
              \                  /→ R-pkgcache
               \→ R-sessioninfo <                  /→ (*R-devtools*; see <B>)
                                 \→ *R-rcmdcheck* <
                                  \→ *R-reprex*    \→ *R-rhub* → (*R-devtools*; see <B>)

Ok, now that I’ve drawn part of a scary graph that seems like it’s going to consume most of the R ecosystem, I’ll point out that Pandas is only a test dependency for R-reticulate, and we should be able to cut off this entire tree at the root by conditionalizing the BuildRequires there on architecture and skipping some or all tests on i686. But the above is still a good example of how each directly-dependent package needs to be considered individually, and how indirectly-dependent packages are important too.
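
The indirect-dependency walk can be sketched in Python as follows. This is only an illustration: it uses the R-rsconnect branch of the example above with the noarch status as described in this comment, and the `dependents`/`noarch` data structures are assumptions rather than real query output.

```python
from collections import deque

# Subset of the R-reticulate example: package -> packages depending on it.
dependents = {
    "R-reticulate": ["R-rsconnect"],
    "R-rsconnect": ["R-pkgdown"],
    "R-pkgdown": ["R-devtools", "R-lobstr"],
    "R-devtools": [],
    "R-lobstr": ["R-dplyr"],
    "R-dplyr": [],
}
noarch = {"R-rsconnect", "R-pkgdown", "R-devtools"}


def arched_dependents(root, dependents, noarch):
    """Breadth-first walk collecting arched packages reachable from root."""
    seen, found = {root}, []
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in dependents.get(pkg, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep not in noarch:
                found.append(dep)
            # Keep walking through noarch packages too: an arched package
            # may depend on the root only via a chain of noarch packages.
            queue.append(dep)
    return found


affected = arched_dependents("R-reticulate", dependents, noarch)
print(affected)
```

The walk does not stop at noarch packages, which is how `R-lobstr` and `R-dplyr` are found even though they are only reachable through noarch intermediates.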

Comment 17 Ben Beasley 2024-02-16 13:31:20 UTC
For R-reticulate: https://src.fedoraproject.org/rpms/R-reticulate/pull-request/1

Comment 18 Sandro 2024-02-16 14:30:25 UTC
(In reply to Ben Beasley from comment #16)
> Unfortunately, I think there’s more than that, including arched packages
> with noarch binary packages like python-dask, which build on every
> architecture, and packages that have indirect dependencies on python-pandas
> via noarch packages.
> 
> I started with “fedrq wrsrc python-pandas.” (Usually wrsrc -s is nicer, but
> I wanted to make sure to capture packages that only had runtime dependencies
> on pandas.) The first entry is R-reticulate,
> https://src.fedoraproject.org/rpms/R-reticulate/blob/rawhide/f/R-reticulate.
> spec. It has no BuildArch/ExcludeArch/ExclusiveArch, so it’s a normal arched
> package that builds everywhere, and it’s therefore affected. It has its own
> tree of dependent packages: both R-rsconnect and R-sessioninfo depend on it.

It would have been too easy otherwise. ;)

I didn't consider packages only build requiring pandas. That's why R-reticulate is not on my list. Nor did I consider sub packages, which I should have, since, as you pointed out, a single src package may provide noarch and arched sub packages.

Back to the drawing board it is. Thanks for pointing out the flaws and for submitting the first PR getting things rolling. I'll post an updated list taking above points into consideration.

Comment 19 Sandro 2024-02-19 18:36:54 UTC
Created attachment 2017692
Leaf status dependency tree of python-pandas

Now that I've been doing a bit more digging and poking, I hopefully have a more complete picture of the state of affairs.

(In reply to Ben Beasley from comment #16)
> Ok, now that I’ve drawn part of a scary graph that seems like it’s going to
> consume most of the R ecosystem, I’ll point out that Pandas is only a test
> dependency for R-reticulate, and we should be able to cut off this entire
> tree at the root by conditionalizing the BuildRequires there on architecture
> and skipping some or all tests on i686. But the above is still a good
> example of how each directly-dependent package needs to be considered
> individually, and how indirectly-dependent packages are important too.

Regarding the above, I've learned that when only the source package depends on the package being investigated (and none of its binary subpackages do), no further examination of its (child) dependencies is required.

To make that more concrete:

`fedrq wrsrc python-pandas` lists `R-reticulate`, however, `fedrq subpkgs R-reticulate | fedrq pkgs -F requires | grep pandas` will return no results. Thus, python-pandas is indeed only a BR for R-reticulate and we don't need to examine the sub tree rooted at R-reticulate.
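
The same check, expressed as a small Python sketch over already-collected data. The function name `is_build_only` and the dependency sets below are hypothetical and simplified; real requirement lists would come from the queries above.

```python
def is_build_only(target, build_requires, subpkg_requires):
    """True if target is a BuildRequires but no binary subpackage requires it."""
    needed_at_runtime = any(target in reqs for reqs in subpkg_requires.values())
    return target in build_requires and not needed_at_runtime


# Hypothetical, simplified data: pandas appears only as a build requirement.
can_prune = is_build_only(
    "python3-pandas",
    build_requires={"python3-pandas"},
    subpkg_requires={"R-reticulate": {"R-core"}},
)

# Counterexample: a (made-up) binary subpackage requires pandas at runtime,
# so the subtree below it still has to be examined.
runtime_case = is_build_only(
    "python3-pandas",
    build_requires={"python3-pandas"},
    subpkg_requires={"python3-example": {"python3-pandas"}},
)

print(can_prune, runtime_case)
```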

With that and some other lessons learned, I now have a complete[1] tree (to the points where we can stop investigating). It's a rather large tree (see attachment). Summing up the results, we have:

- 34 leaf packages already excluding i686
- 33 leaf packages not yet excluding i686
  - of those, only two are arched packages (or depending on arched packages)
    - python3-spyking-circus.noarch (depends on python3-statsmodels.x86_64)
    - python3-contextualbandits.x86_64
- 47 packages that BR python-pandas
  - of those, we only need to consider packages that have arched sub packages
    - python-elephant
    - python-astropy
    - pyproj
    - cantera
    - python-mne
    - root
    - arbor
    - rpy
    - R-reticulate
    - pyproj
    - python-fastavro
    - python-rapidfuzz
    - python-dask

[1] While I believe the tree is complete, it does contain duplicates, which I didn't filter. E.g.: arbor BRs python3-pandas, yet arbor also BRs python3-seaborn, which itself has a runtime requirement on python3-pandas. Thus arbor is listed twice.
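
For anyone post-processing the list, duplicates can be filtered while keeping the original order. A minimal Python sketch, using the package names from the BR list above as input (the helper name `unique_in_order` is made up):

```python
def unique_in_order(names):
    """Drop repeated names while preserving first-seen order."""
    # dict preserves insertion order, so fromkeys keeps the first occurrence.
    return list(dict.fromkeys(names))


listed = [
    "python-elephant", "python-astropy", "pyproj", "cantera", "python-mne",
    "root", "arbor", "rpy", "R-reticulate", "pyproj", "python-fastavro",
    "python-rapidfuzz", "python-dask",
]
unique = unique_in_order(listed)
print(len(listed), len(unique))
```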

Comment 20 Ben Beasley 2024-02-20 14:10:00 UTC
root: https://src.fedoraproject.org/rpms/root/pull-request/5

Comment 21 Sandro 2024-03-06 17:00:31 UTC
Given that noarch packages will no longer be built on i686, I went ahead and submitted PRs where needed. That means I only had to look into arched packages directly or indirectly depending on pandas. Here's the current status:

### Packages with runtime dependency on pandas

- python-contextualbandits: https://src.fedoraproject.org/rpms/python-contextualbandits/pull-request/3

### Packages with buildtime dependency on pandas

- arbor: already excludes i686
- cantera: already excludes i686
- pyproj: https://src.fedoraproject.org/rpms/pyproj/pull-request/5
- python-mne: already excludes i686
- python-astropy: https://src.fedoraproject.org/rpms/python-astropy/pull-request/10
- python-ephyviewer: already excludes i686
- python-fastavro: already excludes i686
- python-google-cloud-monitoring: no arched subpackages
- python-imbalanced-learn: no arched subpackages
- python-jsonpickle: no arched subpackages
- python-lsp-server: no arched subpackages
- python-mplcursors: no arched subpackages
- python-param: no arched subpackages
- python-plotly: no arched subpackages
- python-probeinterface: no arched subpackages
- python-pytest-arraydiff: no arched subpackages
- python-pytest-harvest: no arched subpackages
- python-pytest-steps: no arched subpackages
- python-rapidfuzz: https://src.fedoraproject.org/rpms/python-rapidfuzz/pull-request/6
- python-sklearn-genetic-opt: no arched subpackages
- python-tabulate: no arched subpackages
- root: https://src.fedoraproject.org/rpms/root/pull-request/5
- rpy: https://src.fedoraproject.org/rpms/rpy/pull-request/1
- R-reticulate: https://src.fedoraproject.org/rpms/R-reticulate/pull-request/1
- snakemake: no arched subpackages
- stats-collect: no arched subpackages
  - BRed by wult, but that package is `ExclusiveArch x86_64`

### Non-trivial packages

- python-dask
- python-statsmodels

I hope I've got this right this time ...

Regarding the last category, non-trivial packages, those have circular dependencies and/or multiple packages depending on them. I plan on treating them the same way as pandas. In other words, determine the dependency tree for those packages and take it from there. In fact, I've already run the analysis. The trees are much smaller than the tree for pandas. And the fact that noarch is no longer affected by i686 builds should reduce the work required as well.

Packages listed in the pandas dependency tree, but not mentioned here, depend in some way on the two non-trivial packages and will be dealt with when those packages get sorted out.

Comment 22 Sandro 2024-03-11 14:14:48 UTC
Two more PRs from the dependency stack of `python-dask` and `python-statsmodels`:

https://src.fedoraproject.org/rpms/python-dask/pull-request/8
https://src.fedoraproject.org/rpms/python-statsmodels/pull-request/7

Any packages depending on those were either noarch BR packages or already excluded %{ix86}. Also, `python-dask` is kinda special, since it is actually noarch, but there's also a circular dependency involving `python-pandas+test`.
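
Circular dependencies like this one can be detected mechanically. The Python sketch below is illustrative only: the edges in `requires` are an assumption made for the example, not the exact dependency relations in the spec files.

```python
from graphlib import CycleError, TopologicalSorter

# Illustrative edges only: package -> packages it requires.
requires = {
    "python-dask": ["python-pandas+test"],
    "python-pandas+test": ["python-dask"],
}


def find_cycle(requires):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    sorter = TopologicalSorter(requires)
    try:
        sorter.prepare()
    except CycleError as err:
        # The detected cycle is the second element of the exception args;
        # its first and last nodes are the same.
        return err.args[1]
    return None


cycle = find_cycle(requires)
print(cycle)
```

A package caught in such a cycle cannot be ordered leaves-first and needs to be looked at by hand.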

Comment 23 Orion Poplawski 2024-04-28 03:46:24 UTC
Where do we stand on this?  I see that a number of the pull requests have been merged (like the above 2), but I haven't looked at all of them.

Comment 24 Sandro 2024-04-28 10:06:05 UTC
I think we are almost there. Of all the PRs provided only one is still pending:

https://src.fedoraproject.org/rpms/root/pull-request/5

I'll ping the maintainers in the PR and ask music to rebase the PR. It can no longer be merged by the looks of it.

Once that has been merged, we should be able to drop i686 for pandas. Koji no longer building noarch packages on i686 has helped a lot in reducing the number of PRs required.

Comment 25 Sandro 2024-04-29 17:03:12 UTC
(In reply to Sandro from comment #24)
> I think we are almost there. Of all the PRs provided only one is still
> pending:
> 
> https://src.fedoraproject.org/rpms/root/pull-request/5

That has been merged today. I would appreciate a second pair of eyes to make sure I haven't missed anything.

To proceed, I'm thinking of:

1. Smoke test in Copr
2. Evaluate results
3. Submit additional PRs if needed (based on 2)
4. Apply the change in rawhide (before F41 is branched)

Does that approach sound reasonable?

Comment 26 Sandro 2024-05-12 11:10:33 UTC
I have submitted two more PRs:

1. https://src.fedoraproject.org/rpms/mlpack/pull-request/14 - `mlpack` is a leaf package with no dependent packages
2. https://src.fedoraproject.org/rpms/python-devicely/pull-request/4 - `python-devicely` is a noarch package, but for some reason it didn't use `BuildArch: noarch`

Once these two are merged, we should be good to go dropping i686 in rawhide.

