Bug 1096912 - pip bundles a bunch of deps
Summary: pip bundles a bunch of deps
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: python-pip
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Charalampos Stratakis
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-05-12 16:28 UTC by Toshio Ernie Kuratomi
Modified: 2017-12-10 05:08 UTC
11 users

Fixed In Version: python-pip-9.0.1-10.fc26 python-pip-9.0.1-13.fc27
Clone Of:
Environment:
Last Closed: 2017-12-04 20:14:50 UTC
Type: Bug
Embargoed:



Description Toshio Ernie Kuratomi 2014-05-12 16:28:52 UTC
Description of problem:

Newer versions of python-pip bundle a bunch of dependencies.  Everything in the _vendor subdirectory.  We need to get rid of those.

Version-Release number of selected component (if applicable):

1.5.4-4


Steps to Reproduce:
1.  rpm -ql python-pip | grep _vendor

Actual results:
See that there's a bunch of bundled libraries in the _vendor directory

Expected results:
No bundled libraries
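The reproduction step can also be done from Python, which is handy when checking a virtualenv rather than an RPM. This is a minimal sketch, assuming the layout `<pip package dir>/_vendor/` containing bundled packages (directories with `__init__.py`) and single-file modules; `list_vendored` and `installed_pip_vendor_dir` are hypothetical helper names.

```python
import importlib.util
import os


def list_vendored(vendor_dir):
    """Return the sorted names of packages/modules bundled in vendor_dir."""
    names = set()
    for entry in os.listdir(vendor_dir):
        # Skip the namespace's own __init__.py, bytecode caches and hidden files.
        if entry in ("__init__.py", "__pycache__") or entry.startswith("."):
            continue
        path = os.path.join(vendor_dir, entry)
        if os.path.isdir(path) and os.path.isfile(os.path.join(path, "__init__.py")):
            names.add(entry)        # a bundled package, e.g. requests/
        elif entry.endswith(".py"):
            names.add(entry[:-3])   # a bundled single module, e.g. six.py
    return sorted(names)


def installed_pip_vendor_dir():
    """Locate pip/_vendor for the running interpreter, or None if absent."""
    spec = importlib.util.find_spec("pip")
    if spec is None or not spec.submodule_search_locations:
        return None
    pip_dir = list(spec.submodule_search_locations)[0]
    vendor_dir = os.path.join(pip_dir, "_vendor")
    return vendor_dir if os.path.isdir(vendor_dir) else None


if __name__ == "__main__":
    vendor_dir = installed_pip_vendor_dir()
    if vendor_dir is None:
        print("pip (or its _vendor directory) not found")
    else:
        for name in list_vendored(vendor_dir):
            print(name)
```

An empty list (or a missing `_vendor` directory) corresponds to the expected "no bundled libraries" result.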

Additional info:

I know that unbundling from pip will cause issues with the python-3.4 bundling issue.  Ways to solve this:

* Adapt rewheel to do recursive rewheeling.  This is probably what we need to do.  We may also have to do something about pip's changing of some import statements to find the libraries in the _vendor directory but that's not a problem for all of the bundled libraries -- we can cross that bridge when we get to it.  Upstream pip is probably amenable to taking patches if we can work out a way to do that and not break their current use cases.
* Get a bundling exception for python-pip.
* Get a bundling exception for python3.
  - The way I understand it, we could unbundle python-pip's libraries but that would break certain python3 functionality.  If leaving ensurepip bundled in python3 does allow unbundling in python-pip, that may be more palatable due to upstream python's larger security team and the more limited code usage in ensurepip.

We may need to apply for a temporary bundling exception for one of the latter options with a commitment to implement recursive rewheel and fixing of imports in future updates.

Comment 1 Bohuslav "Slavek" Kabrda 2014-05-16 07:37:51 UTC
(In reply to Toshio Ernie Kuratomi from comment #0)
> Description of problem:
> 
> Newer versions of python-pip bundle a bunch of dependencies.  Everything in
> the _vendor subdirectory.  We need to get rid of those.
> 
> Version-Release number of selected component (if applicable):
> 
> 1.5.4-4
> 
> 
> Steps to Reproduce:
> 1.  rpm -ql |grep _vendor
> 
> Actual results:
> See that there's a bunch of bundled libraries in the _vendor directory
> 
> Expected results:
> No bundled libraries
> 
> Additional info:
> 
> I know that unbundling from pip will cause issues with the python-3.4
> bundling issue.  Ways to solve this:
> 
> * Adapt rewheel to do recursive rewheeling.  This is probably what we need
> to do.  We may also have to do something about pip's changing of some import
> statements to find the libraries in the _vendor directory but that's not a
> problem for all of the bundled libraries -- we can cross that bridge when we
> get to it.  Upstream pip is probably amenable to taking patches if we can
> work out a way to do that and not break their current use cases.

So, the problem with this was pointed out by Donald Stufft - if we unbundle and then do recursive rewheeling (which we, technically, can), then users will likely be able to uninstall pip's dependencies in virtualenv (e.g. pip uninstall requests), which will make pip useless. Also, if users upgrade requests in virtualenv, it may cause pip not to work. This is beyond our control and this makes me think that we should rather go with bundling and get a bundling exception - what do you think?

> * Get a bundling exception for python-pip.
> * Get a bundling exception for python3.
>   - The way I understand it, we could unbundle python-pip's libraries but
> that would break certain python3 functionality.  If leaving ensurepip
> bundled in python3 does allow unbundling in python-pip, that may be more
> palatable due to the increased security team of upstream python and the more
> limited code usage in ensurepip.
> 
> We may need to apply for a temporary bundling exception for one of the
> latter options with a commitment to implement recursive rewheel and fixing
> of imports in future updates.

IMO a bundling exception would only be needed for python-pip, as python3 only requires python-pip but doesn't actually bundle anything on its own.

Debian seems to be taking an approach that's sort of "in between" bundling and unbundling [1]. IIUC, they'll have python-pip dependencies bundled inside it, but they'll "re-bundle" them there from their system packages; we could use this approach too, but I'm not too fond of it.

I'd appreciate your comments on the above, thanks!

[1] https://lists.debian.org/debian-python/2014/05/msg00025.html

Comment 2 Toshio Ernie Kuratomi 2014-05-20 00:35:08 UTC
Upstream python3 bundles pip.

Upstream pip bundles many things.

Right now, the downstream python3 doesn't bundle pip.  But my impression is that this makes it harder for things like python3's built-in venv implementation to work if we also unbundle the stuff in pip, whereas it would just work if we left it inside of python3.  Correct me on that part if I'm wrong.

That's why I mentioned that it *may* make sense to leave pip bundled in python3 but unbundle the deps in its own package.  But my understanding of the situation could be flawed.

At PyCon, Barry was telling me that he thought Debian might do a complete unbundling and re-assemble the wheels at runtime via rewheel.  But that was preliminary planning so it could have changed.  If they are going with the plan you mention, using system packages at build time, that would be preferable to bundling, as it is essentially static linking.  That's easier to track and maintain than bundling.

But we want to first examine whether it's possible to unbundle and use normal "dynamic linking".  And if that's not possible, then come up with the least-bad alternative.

Comment 3 Bohuslav "Slavek" Kabrda 2014-05-21 09:25:23 UTC
(In reply to Toshio Ernie Kuratomi from comment #2)
> Upstream python3 bundles pip.
> 
> Upstream pip bundles many things.
> 
> Right now, we've got the downstream python3 not bundling pip.  But my
> impression is that that makes it harder for things like python3's built-in
> venv implementation from working if we also unbundle the stuff in pip.

I wouldn't say "harder". The problem is that we wouldn't match upstream behaviour (and users' expectations), since e.g. uninstalling requests in a venv would render pip useless.
 
> Whereas it would just work if we left it inside of python3.  Correct me on
> that part if I'm wrong.

Do you mean unbundle stuff from python-pip, but leave pip wheel bundled in python3? I'd be against that, since that's bundling^2 ;)

> That's why I mentioned that it *may* make sense to leave pip bundled in
> python3 but unbundle the deps in its own package.  But my understanding of
> the situation could be flawed.

As noted above, it's possible (technically), but I'd be strongly against it.

> At PyCon, Barry was telling me that he thought Debian might do a complete
> unbundling and re-assemble the wheels at runtime via rewheel.  But that was
> preliminary planning so it could have changed.  If they are going to the
> plan you mention, using system packages at build-time, that would be
> preferable to bundling as it is essentially static linking.  That's easier
> to track and maintain than bundling.
> 
> But we want to first examine whether it's possible to unbundle and use more
> normal "dynamic linking" as per normal.  And if that's not possible, then
> come up with the least-bad alternative.

I think we should adapt our current approach this way:

- during the python3-pip build, we "rm -rf" all the bundled stuff inside pip and copy the system packages' files into pip/_vendor (adapting their imports etc.)
- therefore python3-pip will still not depend on the bundled packages in runtime, it will "statically link" them
- the rest is the same - if user runs pyvenv, we use rewheel on setuptools and pip
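The "adapting their imports" part of the first step could look roughly like the following. This is a minimal regex-based sketch, not what the Fedora spec file actually does; the `VENDORED` set and the `vendorize_imports` helper are hypothetical, and only two import forms are handled (`from pkg[.sub] import ...` and a bare `import pkg [as alias]`).

```python
import re

# Hypothetical set of top-level packages copied into pip/_vendor at build time.
VENDORED = frozenset({"requests", "six", "html5lib", "colorama", "distlib"})

# "from requests.adapters import HTTPAdapter" (absolute imports only;
# relative imports start with "." and are left untouched)
_FROM_RE = re.compile(r"^([ \t]*)from[ \t]+([A-Za-z_][\w.]*)[ \t]+import\b", re.M)
# "import six" or "import six as _six" (one undotted module per statement)
_IMPORT_RE = re.compile(
    r"^([ \t]*)import[ \t]+([A-Za-z_]\w*)([ \t]+as[ \t]+\w+)?[ \t]*$", re.M
)


def vendorize_imports(source, vendored=VENDORED, prefix="pip._vendor"):
    """Rewrite absolute imports of vendored packages to go through `prefix`."""

    def fix_from(m):
        indent, module = m.group(1), m.group(2)
        if module.split(".", 1)[0] not in vendored:
            return m.group(0)
        return "%sfrom %s.%s import" % (indent, prefix, module)

    def fix_import(m):
        indent, module, alias = m.group(1), m.group(2), m.group(3)
        if module not in vendored:
            return m.group(0)
        # "import six" becomes "from pip._vendor import six" so the name
        # "six" is still bound in the importing module.
        return "%sfrom %s import %s%s" % (indent, prefix, module, alias or "")

    source = _FROM_RE.sub(fix_from, source)
    # Dotted forms such as "import requests.adapters" are NOT rewritten here;
    # a real patch would need to handle them too.
    return _IMPORT_RE.sub(fix_import, source)
```

A regex pass is brittle (it ignores multi-module imports and dotted `import` statements), so a real build step would likely need either a per-package patch or a proper syntax-tree rewrite.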

The advantage of this compared to the Debian approach (if I understand their approach correctly) shows up, IMHO, in this situation:

- there is a CVE for one of the dependencies bundled inside python3-pip, e.g. for python3-requests
- Debian fixes the issue in python3-requests and builds it; since they "link dynamically" (again, IIUC), their users will see the fixed requests and will (correctly) assume that system python3-pip is safe to use, too. This may, however, lead them to conclude that pip in newly created environments is safe to use as well, which is not the case until Debian rebuilds pip to refresh the wheel their python3 uses. So for some transitional period, Debian users may get an unsafe pip in pyvenv while having a safe system pip.
- Fedora, on the other hand, rebuilds python3-requests and says python3-requests is fine, but since python3-pip doesn't depend on requests at runtime, there is no promise that pip got fixed, too. When we fix python3-pip, the fix applies to newly created pyvenvs *and* the system *at the same time*.

(Another problem is that we also have to figure out a way to add downstream versioning to rewheeled wheels, but since there are some problems in python-wheel with that, it'll have to wait for now).

Hope the example makes sense... If I forgot/missed something, please say so.

Comment 4 Miro Hrončok 2014-05-28 17:10:15 UTC
Would it be possible to make python3-pip depend on python3-requests and others and symlink the libraries, currently bundled, to _vendor?

Comment 5 Bohuslav "Slavek" Kabrda 2014-06-12 07:10:32 UTC
So I came up with what seems to be a "proper solution", just to find out that Debian maintainers came up with it first and I just misread what they want to do. So here it is (the Debian way):
- create wheels of all dependencies bundled in pip in buildtime and store them in /usr/share/python-wheels (e.g. requests, six, ... wheels will be there)
- when pyvenv is run, copy this directory to the new virtualenv
- make modifications of pip/_vendor/__init__.py to import *from the wheels* (these are done *both* in system and in virtualenv IIUC, but that's not so important since I have another plan)
- now, the user cannot accidentally uninstall/upgrade/downgrade one of pip's dependencies and everything just works; the only minor downside is that if the user runs "pip install --upgrade pip", the new upstream pip will be installed and the wheels from /usr/share/python-wheels will be left unused in the virtualenv
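The "import *from the wheels*" step works because a pure-Python wheel is a zip archive, so putting the `.whl` file itself on `sys.path` lets Python's built-in zipimport machinery import from it. A minimal sketch of that step, assuming all wheels sit in one directory (`/usr/share/python-wheels` in the description above); `add_wheels_to_path` is a hypothetical helper, not Debian's actual patch.

```python
import glob
import os
import sys


def add_wheels_to_path(wheel_dir, path=None):
    """Prepend every *.whl in wheel_dir to sys.path (or to `path`, if given).

    Only pure-Python wheels work this way; wheels containing C extensions
    cannot be imported directly from a zip archive.
    """
    if path is None:
        path = sys.path
    wheels = sorted(glob.glob(os.path.join(wheel_dir, "*.whl")))
    # Prepend, so the wheels take precedence over same-named packages
    # installed elsewhere; that is what keeps pip working even if the user
    # upgrades or uninstalls, e.g., requests in the environment.
    path[:0] = [w for w in wheels if w not in path]
    return wheels
```

Because the wheels shadow anything installed in the environment, pip keeps using the versions it was built against regardless of what the user does to `requests` and friends.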

My plan:
- package all the dependencies bundled in pip as wheels in RPM, so that rewheel can be used on them (so, no /usr/share/python-wheels)
- when pyvenv is run, rewheel these and install them in pip/_vendor
- make adaptations similar to Debian's that would import these properly

I actually think that it'd be possible to make modifications to pip/_vendor/__init__.py that might be acceptable for upstream, but I'll have to experiment with that a bit first. Also, Barry Warsaw from Debian mentioned that they considered installing the system wheels into pip/_vendor/ in the virtualenv, but Donald Stufft, the upstream pip maintainer, didn't think that's a good idea. I'm going to talk to Donald about this, since I'd like to know his reasoning.

One way or the other, we can do this properly without any form of bundling; I'll just have to talk to FPC about packaging the pip-bundled libraries as wheels in RPM, since that's not an approved/guidelined approach right now.

Comment 6 Donald Stufft 2014-12-12 19:51:34 UTC
For the record, I've made a few changes to the next version of pip (6.0) that will make this ticket easier.

1. I've added an import hook to pip/_vendor/__init__.py that will correctly alias pip._vendor.anything to anything if the vendored items have been removed. This means that pip's code base can (and should) continue to do things like from pip._vendor import requests even if the bundled deps are removed.
2. I've added an environment flag that you can set when installing pip from an sdist (NOT FROM A WHEEL) that will install pip in "unbundled" mode. This isn't supported for end users and exists entirely for downstream. It will not adjust pip's dependency information at all but it will essentially just install pip without the stuff in pip/_vendor.
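The aliasing in point 1 can be illustrated as follows: if the plain top-level module is importable, register it in `sys.modules` under the vendored name as well, so `from pip._vendor import requests` keeps working after `pip/_vendor/requests` has been removed. pip 6.0 does this with an import hook as described above; this sketch swaps in simpler eager `sys.modules` aliasing, and `vendor_alias` plus the `demo_vendor` namespace used to try it out are hypothetical.

```python
import importlib
import sys


def vendor_alias(namespace, modulename):
    """Make `<namespace>.<modulename>` refer to the top-level `modulename`.

    Returns True if the alias was installed, or False if `modulename` is not
    importable (in which case a bundled copy would still be needed).
    The `namespace` package must already be imported; for dotted module
    names, parents must be aliased first.
    """
    try:
        module = importlib.import_module(modulename)
    except ImportError:
        return False
    alias = namespace + "." + modulename
    # "import <namespace>.<modulename>" finds the entry in sys.modules.
    sys.modules[alias] = module
    # "from <namespace> import <modulename>" needs the attribute as well.
    base, _, head = alias.rpartition(".")
    setattr(sys.modules[base], head, module)
    return True
```

For example, after `vendor_alias("pip._vendor", "requests")`, both `import pip._vendor.requests` and `from pip._vendor import requests` resolve to the system `requests`, so pip's own code does not need its imports rewritten.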

I'm not against putting the Wheels in pip/_vendor. To be honest that's where I'd prefer they go. However if that's done then the metadata files need to be modified to mention that those files are part of pip, so that they'll be removed if pip is removed.
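For a `.dist-info` installation, "mention that those files are part of pip" concretely means adding rows for them to the installed distribution's `RECORD` file, which `pip uninstall` consults (an `.egg-info` installation lists its files in `installed-files.txt` instead). Each `RECORD` row is `path,sha256=<urlsafe base64 digest without padding>,size`. A minimal sketch, assuming the wheel files have already been copied into `pip/_vendor/` and that paths are recorded relative to site-packages; `record_row` and `record_rows_csv` are hypothetical helpers.

```python
import base64
import csv
import hashlib
import io
import os


def record_row(path, root):
    """Build one RECORD row for the file at `path`, relative to `root`."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).digest()
    # RECORD uses urlsafe base64 with the trailing "=" padding stripped.
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    rel = os.path.relpath(path, root).replace(os.sep, "/")
    return [rel, "sha256=" + b64, str(len(data))]


def record_rows_csv(paths, root):
    """Render RECORD rows for `paths` as CSV text, ready to append."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for p in sorted(paths):
        writer.writerow(record_row(p, root))
    return out.getvalue()
```

Appending this output to `pip-<version>.dist-info/RECORD` would make the extra wheels part of pip as far as `pip uninstall` is concerned.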

After seeing how it's worked so far in Ubuntu and Debian I have thoughts about the general solution that they've adopted.

- On the upstream side, I still prefer bundling, but if someone is going to unbundle I think it's likely the sanest solution.
- It has a sort of problem in that people will do something like upgrade requests in the system python with pip, and then break pip. We've gotten a few issues in pip's bug tracker from these.

The way Debian's patch works, you don't need to rewrite imports in the bundled software at all: since they are just adding the Wheels to sys.path when invoking pip, the import statements work just fine (and with 6.0 you won't need to rewrite imports inside of pip either).

In Debian, pip only adds the wheels to sys.path inside of a virtual environment. If that could be modified so that pip _always_ imports from Wheels, even at the system level, then ``pip install requests<1.0`` wouldn't break pip. It would of course break _other_ system software that Fedora has packaged that uses requests, but at least with a working pip users could uninstall it with ``pip uninstall requests``.

Overall, I think I'm going to add another feature to pip where it'll also look inside of pip/_vendor for .whl files and add them to sys.path. That would mean all you'd need to do is drop the wheel files in there and adjust the metadata so that it knows those files are part of pip. Everything else (in a pip 6.0+ world) would be handled within pip itself.

Comment 7 Donald Stufft 2015-01-03 10:30:46 UTC
Just an update: pip 6.0 has been released and it has the changes I described above. It has an environment variable you can use when running ``setup.py install`` (not when installing from a Wheel, though, since there's no "hook" to dynamically change what is installed there) that won't install the bundled deps, and it'll look for .whl files in the ``pip/_vendor/`` directory and add them directly to ``sys.path``. In addition, when you try to import anything from ``pip._vendor`` and the bundled items are gone, it'll attempt the same import without the ``pip._vendor`` prefix and act as an "alias". The ``pip._vendor`` alias fallback works either in conjunction with the Wheels or standalone.

Hopefully that all helps make things easier for you all, if there's more we can do let me know.

Comment 8 Jaroslav Reznik 2015-03-03 15:48:22 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 9 Fedora End Of Life 2016-07-19 11:30:41 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 10 Miro Hrončok 2016-07-19 11:40:56 UTC
Still relevant for at least F23.

Comment 11 Nick Coghlan 2016-07-20 05:39:39 UTC
Reset to Rawhide and also reset to the default assignee, since Slavek isn't working specifically on pip any more.

Note that the status quo here is that:

- the python3 package doesn't bundle pip, it has a circular dependency relationship with python3-pip and relies on rewheel to pull that in whenever ensurepip needs it
- python3-pip bundles its dependencies, but upstream are good about updating those dependencies when needed

I think the main point of concern now is whether python3-pip is actually using the system certificate store to verify certificates - I know Donald tried to enable that upstream as the default behaviour a while ago, but ran into problems with reliably autodetecting where the system certificate store actually *was* (since Fedora, Debian and other distros vary in where and how they store it, but don't provide a standard way for applications to find out where that is)

Comment 12 Donald Stufft 2016-07-20 19:47:00 UTC
The SSL problem boiled down to:

* Some platforms provide a file that contains the platform certificates.
* Some platforms provide a directory that contains the platform certificates.
* Some platforms provide both a file and a directory and expect you to use both.
* Some platforms provide nothing.

Given the above, the requests API only allows you to pass one location, either a file or a directory, but not both. Thus we could theoretically cover the first two cases (the most common ones) but not the third. However, when attempting to do that we discovered (from memory):

* Debuntu uses a file, but the SSL APIs that return its location are broken and point to the wrong place, and an empty directory exists as well.
* Fedora (or CentOS? It was one of the yum distros) uses a directory, but the API also returned a file, which was empty.

This left us in a place where if we preferred the file we were broken on both Debuntu and on Fedora (Debuntu because they pointed to the wrong place, Fedora because it pointed to an empty file) and if we preferred the directory we were broken on Debuntu (because they use a file, assuming they fixed the path returned) but we worked on Fedora.

In all cases the APIs to *load* the default certificates (rather than return the default paths) did the right thing, but we can't just blindly do that because:

* requests doesn't offer an API to pass in an already existing SSLContext which we could have called that on.
* We need to detect whether ``SSLContext.set_default_verify_paths()`` actually added any certificates or whether the platform doesn't provide any. If the platform doesn't provide any, then we need to fall back to our bundled certificates. However, OpenSSL doesn't provide a way to detect whether any certificates have actually been loaded (and in the case of the directory, it doesn't actually load them until it tries to verify a certificate using it).
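Both halves of the problem can be reproduced with the standard `ssl` module. The first function is a path-based heuristic in the spirit of the discussion above (it adds emptiness checks, but still cannot express "use both" and still trusts whatever path the platform reports); the second shows why counting loaded certificates is unreliable. This is a sketch only; `pick_verify_location`, `default_store_stats` and the `bundled_cafile` argument are hypothetical names.

```python
import os
import ssl


def pick_verify_location(bundled_cafile, paths=None):
    """Choose ONE CA location for an API that accepts only a file or a dir.

    Prefers a non-empty CA file, then a non-empty CA directory, and
    otherwise falls back to the bundled certificates.
    """
    if paths is None:
        # cafile/capath are None when the reported path does not exist.
        paths = ssl.get_default_verify_paths()
    cafile, capath = paths.cafile, paths.capath
    if cafile and os.path.isfile(cafile) and os.path.getsize(cafile) > 0:
        return cafile
    if capath and os.path.isdir(capath) and os.listdir(capath):
        return capath
    return bundled_cafile


def default_store_stats():
    """Load the platform defaults and report what OpenSSL says it loaded.

    Certificates from a CA *directory* are loaded lazily, at verification
    time, so they are not counted here; a zero "x509_ca" count therefore
    does not prove that the platform has no certificates.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_default_verify_paths()
    return ctx.cert_store_stats()
```

On a directory-based system, `default_store_stats()` can report zero CA certificates even though verification would succeed, which is exactly the detection gap described in the last bullet.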

Comment 13 Jan Kurik 2016-07-26 04:06:32 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 14 Petr Viktorin (pviktori) 2017-10-31 15:27:02 UTC
We need to add Provides: bundled(...) here.
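The virtual provides can be generated instead of maintained by hand. A minimal sketch, assuming the bundled versions are available as a requirements-style list of `name==version` pins (recent upstream pip releases keep such a list in `_vendor/vendor.txt`) and assuming a `bundled(python3dist(<name>)) = <version>` naming with lowercased, normalized names; check the Fedora packaging guidelines for the exact form. `bundled_provides` is a hypothetical helper.

```python
import re

# "name==version", ignoring anything after ";" (markers) or "#" (comments)
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def bundled_provides(vendor_txt):
    """Turn `name==version` pins into RPM `Provides: bundled(...)` lines."""
    lines = []
    for raw in vendor_txt.splitlines():
        m = _PIN.match(raw)
        if not m:
            continue  # skip blank lines, comments and unpinned entries
        name, version = m.groups()
        # Assumed normalization: lowercase, runs of "-", "_", "." become "-".
        name = re.sub(r"[-_.]+", "-", name).lower()
        lines.append("Provides: bundled(python3dist(%s)) = %s" % (name, version))
    return lines
```

The output lines could then be pasted (or generated at build time) into the spec file next to the other `Provides:` tags.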

Comment 15 Fedora End Of Life 2017-11-16 18:57:03 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 16 Charalampos Stratakis 2017-11-20 12:43:39 UTC
pip semi-supports debundling [0].

As a first course of action, I'll add virtual provides for the bundled libraries and then will explore the possibility of debundling the libraries.

[0] https://github.com/pypa/pip/blob/master/src/pip/_vendor/README.rst#debundling

Comment 17 Charalampos Stratakis 2017-11-20 15:45:14 UTC
PR opened at: https://src.fedoraproject.org/rpms/python-pip/pull-request/1

Comment 18 Fedora Update System 2017-11-29 13:25:42 UTC
python-pip-9.0.1-13.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-8f72bb7e29

Comment 19 Fedora Update System 2017-11-29 13:29:53 UTC
python-pip-9.0.1-10.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4278c9f049

Comment 20 Fedora Update System 2017-12-02 19:52:57 UTC
python-pip-9.0.1-13.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-8f72bb7e29

Comment 21 Fedora Update System 2017-12-02 22:38:26 UTC
python-pip-9.0.1-10.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4278c9f049

Comment 22 Fedora Update System 2017-12-04 20:14:50 UTC
python-pip-9.0.1-10.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 23 Fedora Update System 2017-12-10 05:08:01 UTC
python-pip-9.0.1-13.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

