Bug 2327289 - `rpm` python module cannot be loaded from a WSGI script any more
Summary: `rpm` python module cannot be loaded from a WSGI script any more
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: 41
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Karolina Surma
QA Contact: Fedora Extras Quality Assurance
URL: https://github.com/openscanhub/opensc...
Whiteboard:
Depends On:
Blocks:
Reported: 2024-11-19 13:26 UTC by Kamil Dudka
Modified: 2025-07-14 14:59 UTC (History)

Fixed In Version: rpm-5.99.91-1.fc43
Clone Of:
Environment:
Last Closed: 2025-07-14 14:59:34 UTC
Type: ---
Embargoed:




Links
System: Fedora Package Sources
ID: rpm pull-request 67
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2025-07-08 17:51:11 UTC

Description Kamil Dudka 2024-11-19 13:26:32 UTC
# rpm -q httpd python3-{mod_wsgi,rpm}
httpd-2.4.62-2.fc41.x86_64
python3-mod_wsgi-5.0.0-6.fc41.x86_64
python3-rpm-4.20.0-1.fc41.x86_64

Reproducible: Always

Steps to Reproduce:
dnf install -y httpd python3-{mod_wsgi,rpm}
mkdir /etc/httpd/wsgi
echo import rpm > /etc/httpd/wsgi/rpm-import-test.wsgi
cat > /etc/httpd/conf.d/rpm-import-test.conf << EOF
<VirtualHost *:80>
    WSGIDaemonProcess rpm
    WSGIProcessGroup rpm
    WSGIScriptAlias /rpm-import-test /etc/httpd/wsgi/rpm-import-test.wsgi process-group=rpm
    <Directory "/etc/httpd/wsgi">
        Require all granted
    </Directory>
</VirtualHost>
EOF
systemctl restart httpd
curl http://localhost/rpm-import-test
tail /var/log/httpd/error_log

Actual Results:  
[Tue Nov 19 08:21:38.566984 2024] [mpm_event:notice] [pid 2068:tid 2068] AH00489: Apache/2.4.62 (Fedora Linux) mod_wsgi/5.0.0 Python/3.13 configured -- resuming normal operations
[Tue Nov 19 08:21:38.567009 2024] [core:notice] [pid 2068:tid 2068] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Tue Nov 19 08:21:38.715796 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] mod_wsgi (pid=2074): Failed to exec Python script file '/etc/httpd/wsgi/rpm-import-test.wsgi'.
[Tue Nov 19 08:21:38.715843 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] mod_wsgi (pid=2074): Exception occurred processing WSGI script '/etc/httpd/wsgi/rpm-import-test.wsgi'.
[Tue Nov 19 08:21:38.723327 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] Traceback (most recent call last):
[Tue Nov 19 08:21:38.726316 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896]   File "/etc/httpd/wsgi/rpm-import-test.wsgi", line 1, in <module>
[Tue Nov 19 08:21:38.726328 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896]     import rpm
[Tue Nov 19 08:21:38.726334 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896]   File "/usr/lib64/python3.13/site-packages/rpm/__init__.py", line 38, in <module>
[Tue Nov 19 08:21:38.726336 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896]     from rpm._rpm import *
[Tue Nov 19 08:21:38.726347 2024] [wsgi:error] [pid 2074:tid 2175] [remote ::1:52896] ImportError: cannot load rpm module more than once per process

Expected Results:  
Successful import of the rpm python module.  In the context of my minimal example, it means:

[Tue Nov 19 08:22:29.150791 2024] [wsgi:error] [pid 1994:tid 2097] [remote ::1:56224] mod_wsgi (pid=1994): Target WSGI script '/etc/httpd/wsgi/rpm-import-test.wsgi' does not contain WSGI application 'application'.
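
For comparison, a fuller WSGI script might look like the sketch below. It is a hypothetical extension of rpm-import-test.wsgi, not part of the reproducer: it defines the `application` callable that mod_wsgi looks for and moves the import inside it, so an import failure is reported in the HTTP response rather than only in error_log. The status codes and message texts are assumptions chosen for illustration.

```python
# Hypothetical fuller version of rpm-import-test.wsgi (illustration only).
def application(environ, start_response):
    """Report whether `import rpm` succeeds in the current interpreter."""
    try:
        import rpm  # on affected setups this raises ImportError
    except ImportError as exc:
        status = "500 Internal Server Error"
        text = f"import rpm failed: {exc}\n"
    else:
        status = "200 OK"
        # getattr() guards against builds that do not expose a version string.
        text = f"rpm {getattr(rpm, '__version__', 'unknown')} imported\n"

    body = text.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

With this variant, `curl -i http://localhost/rpm-import-test` would show the import result directly.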

This works fine on Fedora 40 with the following packages:
httpd-2.4.62-2.fc40.x86_64
python3-mod_wsgi-5.0.0-1.fc40.x86_64
python3-rpm-4.19.1.1-1.fc40.x86_64

OpenScanHub is broken on Fedora 41+ because of this bug:
https://github.com/openscanhub/openscanhub/pull/307#issuecomment-2485638645

Comment 1 Kamil Dudka 2024-11-19 14:26:15 UTC
This seems to be a duplicate of bug #2018743.  I am not sure why OpenScanHub was able to import all Python modules except rpm.

Comment 2 Kamil Dudka 2024-11-19 14:45:27 UTC
On a second look, I think it is a bug that one cannot import the rpm python module without `WSGIApplicationGroup %{GLOBAL}`.  The option is not new on Fedora 41 and its default value has not been changed: https://modwsgi.readthedocs.io/en/latest/configuration-directives/WSGIApplicationGroup.html

How could the rpm python module ever be loaded when the import is the very first statement of the WSGI script (and there is no other application on the server)?

It looks more like a missing/incorrect initialization issue (either in mod_wsgi or in the rpm module itself).

Comment 3 Panu Matilainen 2024-11-19 15:11:54 UTC
That check is new in 4.20, coming from here: https://github.com/rpm-software-management/rpm/commit/d76026101492d1dd9f001d071626146218636c87

Comment 4 Kamil Dudka 2024-11-20 07:18:42 UTC
Thanks for the hint!  I think the cause is that the rpm module uses a global variable for the flag, which does not work well with Python subinterpreters.  I think it should use PyInterpreterState_Get() to access global state specific to a Python interpreter.

Comment 5 Kamil Dudka 2024-11-20 07:23:34 UTC
Of course, if you use global variables for other Python objects (as the above commit message suggests), moving only the flag is not a proper solution.

Comment 6 Panu Matilainen 2024-11-20 08:18:53 UTC
The flag is just a messenger; as the commit message says, it's supposed to protect pointers to Python type objects. I'm not sure it ever was truly safe to reload the rpm module, but AIUI this change https://github.com/rpm-software-management/rpm/commit/2e74eec2444975ace7258b1affccbcaa31af3b85 (required for the Python stable ABI) is what this new guard is for. CC'ing @encukou, whose patch series this was, for comments.

I have no clue about the WSGI stuff, but it does sound fishy that this triggers in the first place.

Comment 7 Petr Viktorin 2024-11-20 09:02:38 UTC
> The flag is just a messenger

Indeed, `rpm` was never safe to use in multiple interpreters; now it raises an error. Setting `WSGIApplicationGroup %{GLOBAL}` is a workaround for `mod_wsgi`.
A proper solution is to [isolate the module state], but at the time, it would have been too invasive to do with RPM's backwards-compatibility requirements. Perhaps it's time to have another go.

[isolate the module state]: https://docs.python.org/3/howto/isolating-extensions.html#isolating-extensions-howto
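
As a sketch of the workaround named above, `WSGIApplicationGroup %{GLOBAL}` can be added to the virtual host from comment #0. Everything else is unchanged from the reporter's configuration. This only makes the script run in the main interpreter; it does not make `rpm` safe to load in several interpreters.

```apache
<VirtualHost *:80>
    WSGIDaemonProcess rpm
    WSGIProcessGroup rpm
    # Run the script in the main (first) interpreter instead of a sub-interpreter.
    WSGIApplicationGroup %{GLOBAL}
    WSGIScriptAlias /rpm-import-test /etc/httpd/wsgi/rpm-import-test.wsgi process-group=rpm
    <Directory "/etc/httpd/wsgi">
        Require all granted
    </Directory>
</VirtualHost>
```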

Karolina (CCd), you might be interested in this as a practical way to learn about Python's C API, and perhaps you might be able to get some work time allocated for rpm. (I can mentor & review.)

Comment 8 Kamil Dudka 2024-11-20 09:08:36 UTC
PyInterpreterState_Get() is part of the Python stable ABI.

> I have no clue about the WSGI stuff, but it does sound fishy that this triggers in the first place.

The second call to PyInit__rpm() comes from here: https://github.com/python/cpython/blob/60403a5409ff2c3f3b07dd2ca91a7a3e096839c7/Python/import.c#L2102

It already happens on the very first request to httpd if you run the minimal example from comment #0.

Comment 9 Petr Viktorin 2024-11-20 10:01:53 UTC
> PyInterpreterState_Get() is part of the Python stable ABI.

Yes, but it'll give you an opaque pointer to the interpreter state. It's possible to implement things on top of that, but I wouldn't recommend it.

`PyModuleDef` & `PyModule_GetState` are also part of the stable ABI, and provide module-local storage and a way to free memory/resources at shutdown.
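
To make the difference concrete, here is a toy model in plain Python. It is an analogy only: it does not use the C API and is not rpm's actual code. `FakeInterpreter`, `init_with_global_flag`, and `init_with_module_state` are hypothetical names. The first variant mimics a process-wide flag that rejects a second initialization; the second keeps the state on each module object, which is roughly what module-local storage provides.

```python
# Toy model only: plain Python objects stand in for C-level state.
# Nothing here calls the real rpm bindings or the Python C API.

class FakeInterpreter:
    """Hypothetical stand-in for one Python (sub)interpreter."""

    def __init__(self, name):
        self.name = name
        self.modules = {}  # per-interpreter module table


# Variant 1: one process-wide flag, shared by every interpreter.
_loaded_once = False


def init_with_global_flag(interp):
    global _loaded_once
    if _loaded_once:
        # Same message as in the error_log from the description.
        raise ImportError("cannot load rpm module more than once per process")
    _loaded_once = True
    interp.modules["_rpm"] = {"state": None}
    return interp.modules["_rpm"]


# Variant 2: each module object carries its own state.
def init_with_module_state(interp):
    module = {"state": {"type_objects": object()}}
    interp.modules["_rpm"] = module
    return module


main = FakeInterpreter("main")
sub = FakeInterpreter("sub")

# Two initializations in one process: the second one trips the global flag.
init_with_global_flag(main)
try:
    init_with_global_flag(sub)
    second_import = "ok"
except ImportError as exc:
    second_import = f"ImportError: {exc}"

# With per-module state, both initializations succeed independently.
a = init_with_module_state(main)
b = init_with_module_state(sub)
states_are_separate = a["state"] is not b["state"]
```

In the real extension the per-module state would be a C struct obtained via `PyModule_GetState`, with cleanup hooks declared in the `PyModuleDef`.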

Here's a PoC, which I haven't touched in a while: https://github.com/encukou/rpm/commits/python-abi3/
It requires Python 3.10+. I currently don't recall why exactly; AFAIK there was some helpful function that's 3.10+ only.

Comment 10 Kamil Dudka 2024-11-20 11:17:06 UTC
In any case, it was not a change in mod_wsgi that triggered these troubles.  The second call to PyInit__rpm() with the default configuration of mod_wsgi was introduced by a change in the Python interpreter itself: https://github.com/python/cpython/commit/b2cd54a4fb2ecdb7b1d30bda8af3314d3a32031e

Comment 11 Karolina Surma 2024-12-13 12:36:33 UTC
> Karolina (CCd), you might be interested in this as a practical way to learn about Python's C API, and perhaps you might be able to get some work time allocated for rpm. (I can mentor & review.)

I am interested and if that's something rpm developers would like to see included in the project, I'll be able to start working on it from the 2nd half of January 2025.

Comment 12 Panu Matilainen 2024-12-16 08:37:03 UTC
(In reply to Petr Viktorin from comment #9)
> Here's a PoC, which I haven't touched in a while:
> https://github.com/encukou/rpm/commits/python-abi3/
> It requires Python 3.10+. I currently don't recall why exactly; AFAIK there was some helpful function that's 3.10+ only.

From https://github.com/rpm-software-management/rpm/issues/2345#issuecomment-1803753174:
For reference, the Python version where this limitation can be lifted relatively easily is 3.10 (which adds API for type/module association, like PyType_GetModule). Later versions make it easier still.

So I guess that's why.

Python 3.10 is a bit so-so at this point: RHEL 9 only has Python 3.9, but RHEL-latest and what's in it will change next year. We don't have any concrete technical dependency on RHEL-latest; it's just a useful reference point when considering "is X too new?". So I think this is acceptable at this point; it's not like rpm v6 will ever go anywhere near RHEL 9 anyhow.

> I am interested and if that's something rpm developers would like to see included in the project, I'll be able to start working on it from the 2nd half of January 2025.

That'd be great!

Comment 13 Miro Hrončok 2025-02-26 13:09:50 UTC
This is being worked on. No pull requests to share yet.

Comment 14 Karolina Surma 2025-06-13 09:28:35 UTC
PR: https://github.com/rpm-software-management/rpm/pull/3808

Comment 15 Miro Hrončok 2025-07-08 17:51:11 UTC
rpm 6.0 beta1 should have this

