Bug 2327289
| Summary: | `rpm` python module cannot be loaded from a WSGI script any more | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Kamil Dudka <kdudka> |
| Component: | rpm | Assignee: | Karolina Surma <ksurma> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 41 | CC: | encukou, igor.raits, kdudka, ksurma, mdomonko, mhroncok, packaging-team-maint, pmatilai, pviktori, tuju |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| URL: | https://github.com/openscanhub/openscanhub/pull/307#issuecomment-2485638645 | | |
| Whiteboard: | | | |
| Fixed In Version: | rpm-5.99.91-1.fc43 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2025-07-14 14:59:34 UTC | Type: | --- |
This seems to be a duplicate of bug #2018743. I am not sure why OpenScanHub was able to import all Python modules except rpm.

On a second look, I think it is a bug that one cannot import the rpm python module without `WSGIApplicationGroup %{GLOBAL}`. The option is not new on Fedora 41 and its default value has not been changed: https://modwsgi.readthedocs.io/en/latest/configuration-directives/WSGIApplicationGroup.html
How could the rpm python module ever be loaded when the import is the very first statement of the WSGI script (and there is no other application on the server)?
It looks more like a missing/incorrect initialization issue (either in mod_wsgi or in the rpm module itself).
That check is new in 4.20, coming from here: https://github.com/rpm-software-management/rpm/commit/d76026101492d1dd9f001d071626146218636c87

Thanks for the hint! I think the cause is that the rpm module uses a global variable for the flag, which does not work well with Python subinterpreters. I think it should use PyInterpreterState_Get() to access global state specific to a python interpreter. Of course, if you use global variables for other Python objects (as the above commit message suggests), moving only the flag is not a proper solution.

The flag is just a messenger; as the commit message says, it's supposed to protect pointers to Python type objects. I'm not sure it ever was truly safe to reload the rpm module, but AIUI this change https://github.com/rpm-software-management/rpm/commit/2e74eec2444975ace7258b1affccbcaa31af3b85 (required for the Python stable ABI) is what this new guard is for. CC'ing @encukou, whose patch series this was, for comments. I have no clue about the WSGI stuff, but it does sound fishy that this triggers in the first place.

> The flag is just a messenger

Indeed, `rpm` was never safe to use in multiple interpreters; now it raises an error. Setting `WSGIApplicationGroup %{GLOBAL}` is a workaround for `mod_wsgi`. A proper solution is to [isolate module state], but at the time, it would have been too invasive to do with RPM's backwards-compatibility requirements. Perhaps it's time to have another go.

[isolate module state]: https://docs.python.org/3/howto/isolating-extensions.html#isolating-extensions-howto

Karolina (CCd), you might be interested in this as a practical way to learn about Python's C API, and perhaps you might be able to get some work time allocated for rpm. (I can mentor & review.)

PyInterpreterState_Get() is part of the Python stable ABI.

> I have no clue about the WSGI stuff, but it does sound fishy that this triggers in the first place.
The second call to PyInit__rpm() comes from here: https://github.com/python/cpython/blob/60403a5409ff2c3f3b07dd2ca91a7a3e096839c7/Python/import.c#L2102

It already happens on the very first request to httpd if you run the minimal example from comment #0.

> PyInterpreterState_Get() is part of the Python stable ABI.

Yes, but it'll give you an opaque pointer to the interpreter state. It's possible to implement things on top of that, but I wouldn't recommend it. `PyModuleDef` & `PyModule_GetState` are also part of the stable ABI, and provide module-local storage and a way to free memory/resources at shutdown.

Here's a PoC, which I haven't touched in a while: https://github.com/encukou/rpm/commits/python-abi3/

It requires Python 3.10+. I currently don't recall why exactly; AFAIK there was some helpful function that's 3.10+ only.

In any case, it was not a change in mod_wsgi that triggered these troubles. The second call to PyInit__rpm() with the default configuration of mod_wsgi was introduced by a change in the Python interpreter itself: https://github.com/python/cpython/commit/b2cd54a4fb2ecdb7b1d30bda8af3314d3a32031e

> Karolina (CCd), you might be interested in this as a practical way to learn about Python's C API, and perhaps you might be able to get some work time allocated for rpm. (I can mentor & review.)
I am interested and if that's something rpm developers would like to see included in the project, I'll be able to start working on it from the 2nd half of January 2025.
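For readers following the thread, the failure mode discussed above can be illustrated with a toy model in pure Python. This is only a sketch under stated assumptions: the real guard lives in rpm's C code, and the names `_process_initialized` and `init_rpm_module` are hypothetical stand-ins for a C static flag and for `PyInit__rpm()`. Only the error message is taken from the log in this report.

```python
# Toy model of a process-wide "already initialized" guard.
# Hypothetical names; this is NOT rpm's actual implementation.

# Stands in for a C static variable, which is shared by every
# interpreter running inside the same process.
_process_initialized = False


def init_rpm_module(interpreter_name):
    """Model one call to the module init function from one interpreter."""
    global _process_initialized
    if _process_initialized:
        # Message copied from the error_log in this report.
        raise ImportError("cannot load rpm module more than once per process")
    _process_initialized = True
    return {"loaded_in": interpreter_name}


# First init call succeeds.
first = init_rpm_module("main interpreter")

# A second init call in the same process hits the guard, no matter
# which interpreter asks for it.
try:
    init_rpm_module("sub-interpreter")
    second_error = None
except ImportError as exc:
    second_error = str(exc)

print(first)
print(second_error)
```

The point of the model is that the flag has no notion of which interpreter is asking, so any second call to the init function in the same process fails.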
(In reply to Petr Viktorin from comment #9)
> Here's a PoC, which I haven't touched in a while:
> https://github.com/encukou/rpm/commits/python-abi3/
> It requires Python 3.10+. I currently don't recall why exactly; AFAIK there was some helpful function that's 3.10+ only.

From https://github.com/rpm-software-management/rpm/issues/2345#issuecomment-1803753174:

For reference, the Python version where this limitation can be lifted relatively easily is 3.10 (which adds API for type/module association, like PyType_GetModule). Later versions make it easier still.

So I guess that's why.

Python 3.10 is a bit so-and-so at this point. RHEL 9 only has Python 3.9 but then RHEL-latest and what's in it will change next year. We don't have any concrete technical dependency on RHEL-latest, it's just a useful reference point when considering "is X too new?", so I think this is acceptable at this point, it's not like rpm v6 will ever go anywhere near RHEL 9 anyhow.

> I am interested and if that's something rpm developers would like to see included in the project, I'll be able to start working on it from the 2nd half of January 2025.

That'd be great!

This is being worked on. No pull requests to share yet.

rpm 6.0 beta1 should have this
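The module-state approach mentioned in the thread (`PyModuleDef` & `PyModule_GetState`) can be pictured with a similar toy model. This is a rough sketch in pure Python, not the C implementation that went into rpm: `create_module`, `get_state` and the `_state` attribute are hypothetical, with `get_state` playing the role that `PyModule_GetState` plays in C.

```python
import types


def create_module(interpreter_name):
    """Model a module init that keeps its state on the module object itself."""
    module = types.ModuleType("_rpm_model")  # hypothetical module name
    # Stands in for the per-module memory block that C code would
    # obtain through PyModule_GetState().
    module._state = {"interpreter": interpreter_name, "type_objects": {}}
    return module


def get_state(module):
    """Hypothetical analogue of PyModule_GetState(): fetch module-local state."""
    return module._state


# Each (modelled) interpreter gets its own module object and its own state,
# so no process-wide guard is needed.
main_mod = create_module("main interpreter")
sub_mod = create_module("sub-interpreter")

# Registering a type object in one module does not leak into the other.
get_state(main_mod)["type_objects"]["hdr"] = object()

print(sorted(get_state(main_mod)["type_objects"]))
print(sorted(get_state(sub_mod)["type_objects"]))
```

Because nothing is stored in a process-wide variable, initializing the module a second time is harmless in this model.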
    # rpm -q httpd python3-{mod_wsgi,rpm}
    httpd-2.4.62-2.fc41.x86_64
    python3-mod_wsgi-5.0.0-6.fc41.x86_64
    python3-rpm-4.20.0-1.fc41.x86_64

Reproducible: Always

Steps to Reproduce:

    dnf install -y httpd python3-{mod_wsgi,rpm}
    mkdir /etc/httpd/wsgi
    echo import rpm > /etc/httpd/wsgi/rpm-import-test.wsgi
    cat > /etc/httpd/conf.d/rpm-import-test.conf << EOF
    <VirtualHost *:80>
    WSGIDaemonProcess rpm
    WSGIProcessGroup rpm
    WSGIScriptAlias /rpm-import-test /etc/httpd/wsgi/rpm-import-test.wsgi process-group=rpm
    <Directory "/etc/httpd/wsgi">
    Require all granted
    </Directory>
    </VirtualHost>
    EOF
    systemctl restart httpd
    curl http://localhost/rpm-import-test
    tail /var/log/httpd/error_log

Actual Results:

    [Tue Nov 19 08:21:38.566984 2024] [mpm_event:notice] [pid 2068:tid 2068] AH00489: Apache/2.4.62 (Fedora Linux) mod_wsgi/5.0.0 Python/3.13 configured -- resuming normal operations
    [Tue Nov 19 08:21:38.567009 2024] [core:notice] [pid 2068:tid 2068] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
    [Tue Nov 19 08:21:38.715796 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] mod_wsgi (pid=2074): Failed to exec Python script file '/etc/httpd/wsgi/rpm-import-test.wsgi'.
    [Tue Nov 19 08:21:38.715843 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] mod_wsgi (pid=2074): Exception occurred processing WSGI script '/etc/httpd/wsgi/rpm-import-test.wsgi'.
    [Tue Nov 19 08:21:38.723327 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] Traceback (most recent call last):
    [Tue Nov 19 08:21:38.726316 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] File "/etc/httpd/wsgi/rpm-import-test.wsgi", line 1, in <module>
    [Tue Nov 19 08:21:38.726328 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] import rpm
    [Tue Nov 19 08:21:38.726334 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] File "/usr/lib64/python3.13/site-packages/rpm/__init__.py", line 38, in <module>
    [Tue Nov 19 08:21:38.726336 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] from rpm._rpm import *
    [Tue Nov 19 08:21:38.726347 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] ImportError: cannot load rpm module more than once per process

Expected Results:

Successful import of the rpm python module. In the context of my minimal example, it means:

    [Tue Nov 19 08:22:29.150791 2024] [wsgi:error] [pid 1994:tid 2097] [remote ::1:56224] mod_wsgi (pid=1994): Target WSGI script '/etc/httpd/wsgi/rpm-import-test.wsgi' does not contain WSGI application 'application'.

This works fine on Fedora 40 with the following packages:

    httpd-2.4.62-2.fc40.x86_64
    python3-mod_wsgi-5.0.0-1.fc40.x86_64
    python3-rpm-4.19.1.1-1.fc40.x86_64

OpenScanHub is broken on Fedora 41+ because of this bug: https://github.com/openscanhub/openscanhub/pull/307#issuecomment-2485638645
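The reproducer script contains only `import rpm`, so even a successful import ends with the "does not contain WSGI application" message in the log. A slightly larger script, sketched below, would report the import result in the HTTP response instead. It is not part of the original report; it assumes only the standard WSGI calling convention and the default callable name `application` that mod_wsgi looks for, and it catches only `ImportError`.

```python
def application(environ, start_response):
    """Minimal WSGI callable that reports whether `import rpm` succeeds."""
    try:
        import rpm  # noqa: F401  (imported only to test loadability)
        status = "200 OK"
        body = b"rpm import OK\n"
    except ImportError as exc:
        # On an affected setup this is expected to carry the
        # "cannot load rpm module more than once per process" message.
        status = "500 Internal Server Error"
        body = f"rpm import failed: {exc}\n".encode("utf-8")

    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

Saved in place of `/etc/httpd/wsgi/rpm-import-test.wsgi`, this should make the `curl` call from the steps above show the outcome directly, without needing to read `error_log`.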