A freshly deployed ceph-mgr on F39 fails with:

[WRN] MGR_MODULE_DEPENDENCY: 14 mgr modules have failed dependencies
    Module 'balancer' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'crash' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'devicehealth' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'iostat' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'orchestrator' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'pg_autoscaler' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'progress' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'prometheus' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'rbd_support' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'restful' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'stats' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'status' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'telemetry' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'volumes' has failed dependency: PyO3 modules may only be initialized once per interpreter process

journal says:

Dec 23 13:18:42 flamingo ceph-mgr[12361]: 2023-12-23T13:18:42.444+0900 ffffb63fa980 -1 mgr[py] Module not found: 'mgr_module'
Dec 23 13:18:42 flamingo ceph-mgr[12361]: 2023-12-23T13:18:42.444+0900 ffffb63fa980 -1 mgr[py] Traceback (most recent call last):
Dec 23 13:18:42 flamingo ceph-mgr[12361]: File "/usr/share/ceph/mgr/mgr_module.py", line 28, in <module>
Dec 23 13:18:42 flamingo ceph-mgr[12361]: from mgr_util import profile_method
Dec 23 13:18:42 flamingo ceph-mgr[12361]: File "/usr/share/ceph/mgr/mgr_util.py", line 6, in <module>
Dec 23 13:18:42 flamingo ceph-mgr[12361]: import bcrypt
Dec 23 13:18:42 flamingo ceph-mgr[12361]: File "/lib64/python3.12/site-packages/bcrypt/__init__.py", line 33, in <module>
Dec 23 13:18:42 flamingo ceph-mgr[12361]: from . import _bcrypt # noqa: I100
Dec 23 13:18:42 flamingo ceph-mgr[12361]: ^^^^^^^^^^^^^^^^^^^^^
Dec 23 13:18:42 flamingo ceph-mgr[12361]: ImportError: PyO3 modules may only be initialized once per interpreter process
Dec 23 13:18:42 flamingo ceph-mgr[12361]: 2023-12-23T13:18:42.444+0900 ffffb63fa980 -1 mgr[py] Class not found in module 'rook'
Dec 23 13:18:42 flamingo ceph-mgr[12361]: 2023-12-23T13:18:42.444+0900 ffffb63fa980 -1 mgr[py] Error loading module 'rook': (22) Invalid argument
Dec 23 13:18:42 flamingo ceph-mgr[12361]: 2023-12-23T13:18:42.444+0900 ffffb63fa980 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: dashboard, diskprediction_local, k8sevents, alerts, balancer, crash, devicehealth, influx, insights, iostat, localpool, mds_autoscaler, mirroring, nfs, orchestrator, osd_perf_query, osd_support, pg_autoscaler, progress, prometheus, rbd_support, restful, rgw, selftest, snap_schedule, stats, status, telegraf, telemetry, test_orchestrator, volumes, zabbix, rook

This is a self-built mock build of ceph 18.2.1-3.fc39 with LTO disabled (workaround for bug 2241339), but I think that change is unlikely to be causing this issue (without LTO disabled it's even more broken).

Reproducible: Always

Steps to Reproduce:
1. Install ceph-mgr
2. Deploy a fresh ceph-mgr into an existing cluster

Actual Results: All Python ceph-mgr modules fail to start

Expected Results: ceph-mgr modules start successfully.
Relevant: https://github.com/pyca/cryptography/issues/9016
Also: https://github.com/ceph/ceph/pull/54710
I tested a quick monkeypatch of that pull request and I can confirm it fixes the problem for all but the "restful" ceph-mgr module. We probably want to backport that into the ceph package.
Same here with ceph-mgr inside Fedora CoreOS.
Same issue after upgrading F38->F39 (x86_64) with the latest updates in my infra lab VM. :(
This is now more broken somehow, even with the `dashboard` fix all the modules are still broken due to the `bcrypt` dependency (not sure why it used to work) :/
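To illustrate the failure mode behind the error message, here is a toy model in Python. It is not PyO3's actual code: `FakeNativeExtension` and `load_modules` are hypothetical names, and the model simply assumes that the extension keeps one process-wide "already initialized" flag while each mgr module tries to import it from its own subinterpreter.

```python
class FakeNativeExtension:
    """Hypothetical stand-in for a native extension with a once-per-process guard."""

    # Process-wide state: shared by every (simulated) subinterpreter.
    _initialized = False

    @classmethod
    def initialize(cls):
        """Succeed on the first call in the process, raise ImportError afterwards."""
        if cls._initialized:
            raise ImportError(
                "PyO3 modules may only be initialized once per interpreter process"
            )
        cls._initialized = True


def load_modules(names):
    """Simulate each mgr module importing the extension in its own subinterpreter."""
    results = {}
    for name in names:
        try:
            FakeNativeExtension.initialize()
            results[name] = "ok"
        except ImportError as exc:
            results[name] = f"failed: {exc}"
    return results
```

In this model only the first importer succeeds. Every later one fails with the same message seen in the health output, no matter which module it is, which matches the observation that any module pulling in `bcrypt` is affected.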
The quick fix is to downgrade to python3-bcrypt from the f38 branch (3.2.2-5, I made a F39 build as a test). Fedora might want to consider this a bug in that package, since it represents a regression in usage (no longer works in subinterpreters) :/. Scratch build here, if anyone wants to grab RPMs to try: https://koji.fedoraproject.org/koji/taskinfo?taskID=117671599
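A small sketch for checking which side of that downgrade a host is on. It assumes that bcrypt 4.0 and later ship the Rust/PyO3 extension while 3.x does not (consistent with 3.2.2 working here). Both function names are hypothetical helpers, not part of bcrypt or Ceph.

```python
from importlib import metadata


def installed_bcrypt_version():
    """Return the installed bcrypt version string, or None if it is not installed."""
    try:
        return metadata.version("bcrypt")
    except metadata.PackageNotFoundError:
        return None


def bcrypt_uses_rust(version):
    """Assumption: bcrypt >= 4.0 is built on Rust/PyO3, 3.x is not."""
    major = int(version.split(".")[0])
    return major >= 4


if __name__ == "__main__":
    version = installed_bcrypt_version()
    if version is None:
        print("bcrypt is not installed")
    elif bcrypt_uses_rust(version):
        print(f"bcrypt {version}: PyO3-based, may fail in subinterpreters")
    else:
        print(f"bcrypt {version}: pre-Rust release")
```

The upstream version number alone cannot tell whether a given 4.x package build carries a packaging-level fix, so treat the result as a hint rather than a verdict.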
fixed in ceph-18.2.3-1
Nope.

# ceph health detail
HEALTH_WARN 13 mgr modules have failed dependencies; noout flag(s) set
[WRN] MGR_MODULE_DEPENDENCY: 13 mgr modules have failed dependencies
    Module 'balancer' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'crash' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'devicehealth' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'iostat' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'orchestrator' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'pg_autoscaler' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'progress' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'prometheus' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'rbd_support' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'stats' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'status' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'telemetry' has failed dependency: PyO3 modules may only be initialized once per interpreter process
    Module 'volumes' has failed dependency: PyO3 modules may only be initialized once per interpreter process

How was this fixed in ceph-18.2.3-1, exactly? As far as I can tell there is no upstream solution for this problem at all yet.
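For anyone wanting to check their own cluster the same way, here is a minimal sketch that pulls the affected module names out of saved `ceph health detail` output. It assumes one "Module '...' has failed dependency: ..." entry per line; `failed_mgr_modules` is a hypothetical helper, not a Ceph API.

```python
import re

# Matches entries such as:
#   Module 'balancer' has failed dependency: <reason>
_FAILED_DEP = re.compile(r"Module '([^']+)' has failed dependency: (.+)")


def failed_mgr_modules(health_detail):
    """Map each mgr module name to the reason reported for its failed dependency."""
    failures = {}
    for line in health_detail.splitlines():
        match = _FAILED_DEP.search(line)
        if match:
            failures[match.group(1)] = match.group(2).strip()
    return failures


if __name__ == "__main__":
    import sys

    # Usage (assumed): ceph health detail | python3 this_script.py
    for name, reason in failed_mgr_modules(sys.stdin.read()).items():
        print(f"{name}: {reason}")
```

Comparing the resulting names before and after an update makes it easy to see which modules recovered, for example that `restful` is in the original list of 14 but not in the list of 13 above.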
As far as I can tell the only reasonable solution backportable to Fedora today is to:

- Get `rust-pyo3` to backport this patch: https://git.st8l.com/luxolus/pyo3/commit/338c71d0ad10f7ae38b7b44e576d49b91ed20d99 (which adds a feature build) and,
- Get `python3-bcrypt` to switch to a version with that feature enabled (as in https://github.com/pyca/bcrypt/issues/694#issuecomment-2125562077 minus the git stuff).

That would then fix Ceph, and I believe those changes are safe to make (in the specific case of python3-bcrypt).
Since this affects every version of Fedora right now, I've set this to "rawhide".
(In reply to Hector Martin from comment #9)
>
> How was this fixed in ceph-18.2.3-1, exactly? As far as I can tell there is
> no upstream solution for this problem at all yet.

I suggest you use https://tracker.ceph.com/issues/63529 and the other trackers and PRs it references to resolve the issue.

That issue was closed, as fixed, with a backport to 18.2.3, thus I closed this BZ against 18.2.3.

If you want to debate the validity of whatever that fix was, I suggest you do it in the Ceph tracker.
If you click on the link you just posted, you will see that I did, indeed, debate the validity of the "fix" that the bug was closed for, a month ago. I'm guessing you didn't look. Further up in the comments of that upstream bug you will also see that the "fix" is specifically referred to as a "partial fix", which turned out to not really fix anything because it only removed one of two modules with a PyO3 dependency; while it helped in the past, that was only due to luck and incomplete checks in PyO3.

This bug, as I reported here, still exists in all current Fedora versions. Therefore, it is evidently improper to close it as fixed, regardless of what upstream did or didn't do. I'm not here to debate semantics and upstream's usage of their own bug tracker, I'm here to get this bug fixed in Fedora, which it isn't, and hasn't been for over half a year.

At the time you released ceph-18.2.3-1 in Fedora, the "fix" was already ineffective with current versions of pyo3/python3-bcrypt. Therefore, even though in some deployments and configurations the change that upstream made may have helped (for random reasons), in Fedora with the versions of other packages that existed concurrently, it didn't. I know this because I had already backported and monkeypatched my Ceph on F39 hosts at one point with the "fix", and that together with disabling some mgr modules worked for a while, but then it stopped working, and this all happened before you released that version with the "fix". I already mentioned this situation in comments #3 and #6. By the time I made comment #6, the fix in 18.2.3-1 was useless.

It is honestly a bit ridiculous that Ceph has been completely broken in Fedora, and at this time all supported versions, for this long, and nobody has done anything about it. It's bad enough that users have to deal with this breakage (which, to be clear, makes Fedora completely broken for all Ceph cluster deployments, as ceph-mgr is a required component) and apparently neither Ceph upstream nor the Fedora side seem to care enough to actually push towards getting this *major breakage issue in current OS versions* fixed sooner rather than later. On top of that, closing the bug as "fixed" when it isn't feels rather insulting.

Timeline:

2023-12-23 I file the bug (Ceph is completely broken in Fedora at this point)
2023-12-23 I immediately locate the upstream "fix" which **at this time** would in fact lead to a workable situation in Fedora, and suggest a backport.

This bug is ignored by the maintainer for 5 months.

2024-05-14 I notice that, after some updates, the fix no longer works and now we really need bcrypt fixed.
2024-05-14 I again identify bcrypt as the only remaining culprit, and suggest a downgrade as a workaround.

This bug continues to be ignored by the maintainer for 2 more months.

2024-07-02 ceph-18.2.3-1 is released in Fedora with the "fix", fixing nothing, and this bug is closed
2024-07-17 I notice the closed bug, and ask why it was closed, since Ceph is still broken
2024-07-17 I identify a series of steps that would actually get this bug fixed in Fedora.

*Three times* I have already reported on the status quo of this bug and, *on the same day*, proposed a solution *that would actually work* at the time it was posted. You ignored this bug this whole time, only to close it when it wasn't fixed at all.

So, what are we doing now? Is this going to get ignored for months again until maybe upstream pyo3 gets this fixed at some undefined point in the future? Or are you actually going to pay attention to the fixes I'm proposing *that actually work* and get this package, which again, is completely broken right now and has been for half a year, working in Fedora again?
Please don't cast aspersions on people, either of you. That said, this is a legitimate bug and the upstream fix was not sufficient, so it makes sense to reopen and get things fixed.
I am the package maintainer. I do not work on that part of Ceph. If you have a bone to pick with something in Ceph take it up with the person responsible for that part. You can also submit a PR here with your proposed fix. https://src.fedoraproject.org/rpms/ceph If it seems reasonable, I'll probably merge it. Or you can continue to rant.
As the package maintainer, I expect you to take responsibility for distro integration issues, which this is, because the breakage happens with the specific version combinations of packages available in Fedora. I also expect you to take responsibility for the software you package outright **not working**, at all, in this distro. It's your job to file things upstream and figure out a solution for the package being completely broken (or at least take steps towards finding a solution), after users report a bug here. Instead you did absolutely nothing other than spuriously close this bug, so far.

I still don't understand how we got here, after nearly 8 months of the package being completely broken for normal Ceph deployments **with potential fixes identified and ignored** - I understand not knowing what to do and not having time to track down solutions, but I've been literally telling you hints towards potential solutions all along.

Do you actually test/use Ceph in Fedora at all? Or do you just package upstream releases and leave all the work of testing them to end users, and consider it acceptable for Ceph to just not work as long as upstream doesn't have a fix/solution? Seriously, Ceph has been completely broken in Fedora since F39 was released (in several ways, since I also ran into and tracked down bug 2241339). If you don't take responsibility for that, then who does, exactly?

Either way, there is nothing to submit to rpms/ceph at this point, because I already identified that the most reasonable fix right now involves changing pyo3 and python3-bcrypt instead, and filed those bugs for you. I will now spend a couple months dealing with IRL issues and not touching my Ceph servers, and if nothing has happened by the time I come back, I'll file PRs against those packages and get Neal to merge them. But either way, you don't get to close this bug until the problem is actually fixed one way or another, because this affects Ceph specifically and this bug depends on the others now.

I'm now wondering if Fedora was a good choice for my Ceph cluster. I thought it would be, given Red Hat's involvement with Ceph and therefore presumably some ties to Fedora, but finding the whole thing totally broken for two straight Fedora releases is a huge disappointment.
PRs submitted to python-bcrypt to fix this:

https://src.fedoraproject.org/rpms/python-bcrypt/pull-request/11
https://src.fedoraproject.org/rpms/python-bcrypt/pull-request/12
https://src.fedoraproject.org/rpms/python-bcrypt/pull-request/13
FEDORA-2024-897f32b326 (python-bcrypt-4.0.1-9.fc40) has been submitted as an update to Fedora 40. https://bodhi.fedoraproject.org/updates/FEDORA-2024-897f32b326
FEDORA-2024-b541e61b90 (python-bcrypt-4.0.1-9.fc39) has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2024-b541e61b90
FEDORA-2024-b541e61b90 has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-b541e61b90` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-b541e61b90 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-897f32b326 has been pushed to the Fedora 40 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-897f32b326` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-897f32b326 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-b541e61b90 (python-bcrypt-4.0.1-9.fc39) has been pushed to the Fedora 39 stable repository. If the problem still persists, please make note of it in this bug report.
FEDORA-2024-897f32b326 (python-bcrypt-4.0.1-9.fc40) has been pushed to the Fedora 40 stable repository. If the problem still persists, please make note of it in this bug report.
Works for me with ceph inside CoreOS.