Going through Rawhide openQA test failures, I noticed that the GNOME Software update test has failed on every run since Fedora-Rawhide-20251106.n.0. The obvious change in that compose is dnf5:

Package: dnf5-5.3.0.0-2.fc44
Old package: dnf5-5.2.17.0-2.fc44

gnome-software did not change.

The failure is always the same. After our preparatory steps to ensure an update is available (downgrading a single package to a dummy build with a very low version, so that the current official build of it will be offered as an update), we run GNOME Software, go to the Updates page, click Refresh, wait for the refresh to finish, then click the Download button. The Download button is immediately replaced by a "Cancel" button with a small progress bar, about 30% full, at the bottom, and then... it stays like that. The progress bar never moves, the operation never completes, and the "Restart & Update..." button never appears.

Oddly, this only affects compose tests, not update tests. We run the same test on updates and it passes there; I can't find a single instance of this failure in an update test. The obvious difference is how the system under test is produced. On compose tests, one test runs an install from the compose's Workstation live image, goes through gnome-initial-setup, then uploads a disk image of the installed system; this test boots from that disk image and immediately runs the update test procedure. On update tests, the test boots from a disk image created by virt-install (refreshed every two weeks), sets up a side repository containing the packages under test, runs `dnf -y --refresh update`, reboots, *then* runs the update test procedure.

I'll attach the journal from a failed test.
Created attachment 2117685 (journal from an affected run)

Here's the journal from an affected run, but there's not much in it. The test clicked 'Refresh' around 02:20:05, which seems to produce a small flurry of activity ending at 02:20:07. The test then clicked 'Download' at 02:20:24, but there appears to be nothing at all in the journal (or in any other log file; I checked) at that time. I'll see if I can reproduce this manually and find out more that way.
I reproduced this easily, without any side repo stuff. I just ran an install from a slightly older Workstation live image (Fedora-Workstation-Live-Rawhide-20251128.n.0.x86_64.iso), booted the installed system, ran Software, and tried to update. In fact, the first time I ran it, it got stuck on a "Refreshing data" page without me doing anything. After I cancelled that and ran it again, it behaved like openQA: I hit the refresh button, then I hit Download, and it got stuck. Tried again, same thing.

I attached gdb to both the gnome-software and dnf5daemon-server processes and got backtraces of both; I'll attach them. While I was getting the dnf5daemon-server backtrace, gnome-software exited/crashed and this error appeared in the journal:

Dec 05 12:21:39 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com dnf5daemon-server[4217]: Error sending D-Bus reply to org.rpm.dnf.v0.rpm.Rpm:list() call: [System.Error.ENOMSG] Failed to send D-Bus message (No message of desired type)

Not sure if that's significant.
Proposing as an F44 Beta blocker as this seems trivially reproducible, and violates Beta criterion "The installed system must be able appropriately to install, remove, and update software with the default tool for the relevant software type in all release-blocking desktops (e.g. default graphical package manager)."
Created attachment 2117688 (backtrace from dnf5daemon-server when gnome-software is stuck)
Created attachment 2117691 (backtrace from gnome-software when it's stuck)
Oh, hey, I just saw that if I leave it in this state for *ten minutes*, it eventually clears. I get an "Unable to download updates" error, with the details: "Failed to run transaction: offline rpm transaction test failed with code 6." There's nothing in the journal or dnf5.log. The available update list remains the same and the button goes back to being a Download button. If I click it, the same cycle happens.
Why do you think this is caused by DNF5? dnf5-5.3.0.0-2.fc44 has been in the F44 repository since 2025-11-05.

A few days ago I submitted the dnf5-5.3.0.0-3.fc44 update <https://bodhi.fedoraproject.org/updates/FEDORA-2025-c2c0380ce6>, and fedora-ci.koji-build.installability.functional reported failures on it like this:

Transaction failed: Signature verification failed. OpenPGP check for package "libdnf5-plugin-rhsm-5.3.0.0-3.fc44.x86_64" (/var/cache/libdnf5/_dnf_local-71c913707df56d1b/packages/libdnf5-plugin-rhsm-5.3.0.0-3.fc44.x86_64.rpm) from repo "_dnf_local" has failed: The package is not signed.

Your error message "Failed to run transaction: offline rpm transaction test failed with code 6." mentions code 6, and 6 stands for the ERROR_GPG_CHECK constant in the libdnf5::rpm::Transaction::TransactionRunResult enum. I suspect it is the same issue, probably triggered by <https://fedoraproject.org/wiki/Changes/Enforcing_signature_checking_by_default>.
Do you run updates on unsigned packages? Do you install them from a repository or from a file? dnf5-5.3.0.0-3.fc44 fixes disabling of signature verification when the `--no-gpgchecks` option is passed. Can you try your test again with that build? Strangely, the first rpm build implementing the Fedora change is 6.0.0-2, and that has not been tagged into F44 yet.
> Why do you think this caused by DNF5? dnf5-5.3.0.0-2.fc44 is in F44 repository since 2025-11-05.

From the first line of the original description: "I noticed that the GNOME Software update test has failed every run since Fedora-Rawhide-20251106.n.0."

> Do you run updates on unsigned packages? Do you install them from a repository or from a file?

The downgraded package we install to ensure an update is available is unsigned, but the package being upgraded *to* is signed.
I tried both installing the plain .rpm and using the .repo file from [1] (where the only change I made was to enable the repo), and it worked fine with `dnf5-5.3.0.0-7.fc44.x86_64` (+/- [2], but that's minor): no problem updating with gnome-software using dnf5daemon-server under the hood, nor with `dnf update`. I had patched gnome-software with the changes for bug #2392057, even though that bug was not about signed packages. Either it has been fixed in the meantime, or I did something "wrong" so that I did not trigger the problem.

[1] https://fedorapeople.org/groups/qa/openqa-repos/
[2] https://github.com/rpm-software-management/dnf5/issues/2595
I looked into the backtraces. Thread 20 of gnome-software is running the update, downloading packages for an offline update, and is currently waiting for the transaction to finish. Thread 2 calls the `list` method of the `org.rpm.dnf.v0.rpm.Rpm` interface. The other threads look idle or uninteresting.

The daemon backtrace shows it waiting for confirmation of a key import, specifically for the keys `/etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-rawhide-x86_64` and `/etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-45-x86_64`, in two threads. gnome-software is supposed to react to the key import confirmation signal; I'm checking whether it does.
Okay, the prompt is somehow blocked in gnome-software (my user is a sudoer, so I should not be asked at all, but that may be unrelated). When I click the "Cancel" button, the dialog to confirm the key import shows up. Cancelling that dialog does not return gnome-software to its normal state.

I reproduced it by removing the Fedora keys from RPM: `rpmkeys -l`, then `rpmkeys --delete HEXHEXHEX` for each listed key. Your test case used a package signed with either the Rawhide or the F45 key, while the packages I tried here used the F44 key.

I'm not moving this to gnome-software yet; please give me some time to investigate what failed on the gnome-software side and whether anything is needed on the dnf5 side.
gnome-software receives the D-Bus signal asking for key import confirmation during the transaction run, as expected. To figure out whether the key comes from an installed repository, it runs a `list` call on the same RPM (D-Bus) proxy the transaction is running on, under the same session (this is Thread 2 in the gnome-software backtrace I mentioned in comment #11).

With dnf5-5.2.17.0-2.fc44.x86_64 (the build immediately before 5.3.0.0-2 in Koji), the daemon runs the `list` with no problem and returns the package information for the key file; gnome-software then sees that the key comes from the repository and tells dnf5daemon-server that it can be imported, without bothering the (sudo) user.

With the newer dnf5, the D-Bus proxy seems to be locked (or something similar) while the daemon waits for the response to the key import confirmation: it does not allow other operations to run (at least not `list`), so gnome-software's call to list the packages providing the key file starves on the dnf5daemon-server side.

Can you (dnf5) do anything about it? Should I (gnome-software) do anything about it? (For example, open a new session and check the package there instead of in the "busy" session. The downside could be the session limit, since as I recall you have it set very low.)
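The starvation described above can be modelled with plain Python threads. This is a toy sketch, not dnf5daemon or gnome-software code: `ToySession`, `list_key_providers`, `confirm_key` and `run_transaction` are hypothetical names, and it assumes one lock per session that every call takes except the answer to the key prompt. The transaction holds that lock while waiting for the client's answer, and the client's handler first asks the same busy session which package ships the key file, so the query times out and nobody ever answers. The short timeouts only keep the demo from hanging forever.

```python
import threading

KEY_PATH = "/etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-rawhide-x86_64"


class ToySession:
    """Hypothetical model of one dnf5daemon-server session (not real code).

    Assumption: one lock per session, taken by every call except confirm_key.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._answered = threading.Event()
        self._accepted = False

    def list_key_providers(self, key_path, timeout):
        # Read-only query that still needs this session's lock.
        if not self._lock.acquire(timeout=timeout):
            return None  # starved: the lock is held by someone else
        try:
            # Pretend every key under /etc/pki/rpm-gpg/ ships in a package.
            return ["fedora-gpg-keys"] if key_path.startswith("/etc/pki/rpm-gpg/") else []
        finally:
            self._lock.release()

    def confirm_key(self, accepted):
        # Answer to the key-import prompt; modelled as lock-free.
        self._accepted = accepted
        self._answered.set()

    def run_transaction(self, on_key_request, prompt_timeout):
        with self._lock:  # held for the whole transaction
            # "Emit the signal": the client handles it on its own thread.
            handler = threading.Thread(target=on_key_request, args=(self,))
            handler.start()
            answered = self._answered.wait(timeout=prompt_timeout)
        handler.join()
        return answered and self._accepted


def handle_key_request_same_session(busy_session, seen):
    # Old client behaviour: query the session that runs the transaction.
    providers = busy_session.list_key_providers(KEY_PATH, timeout=0.2)
    seen.append(providers)
    if providers:
        busy_session.confirm_key(True)
    # If the query starved, the prompt is never answered.


seen = []
session = ToySession()
ok = session.run_transaction(
    lambda s: handle_key_request_same_session(s, seen), prompt_timeout=0.5)
print(ok, seen)  # False [None]: the list call starved, nobody confirmed
```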
Thanks a lot for looking into this, Milan.
I haven't 100% confirmed this, but I am convinced the deadlock is caused by https://github.com/rpm-software-management/dnf5/pull/2448. I would like to say that the read-only `list` should be possible during a transaction waiting on key import confirmation, but it would be best to discuss this with @mblaha.
> I haven't 100% confirmed this but I am convinced the deadlock is caused by...

Looks like that, I agree. That change makes sense, especially given the thread-unsafety of the libraries mentioned in the pull request. If I read it correctly, it is possible to work around "the problem" by opening a new session and running the list there (the lock seems to be per session). I do not know the consequences or how it works under the hood, but I guess that if some libsolv/libdnf5 files are shared between sessions, it could still cause trouble, for example with concurrent writes to such a file from both sessions? You probably do not want to read those libsolv/libdnf5 files while another thread is writing to them, but I do not know the code.
As this is fairly important, I added a patch to gnome-software's dnf5 plugin to use a temporary session when searching for the source of the RPM key that is to be imported. This fixes the starvation on the dnf5daemon side. I think it is a good change; the locking for thread safety on the dnf5daemon side makes sense. The relevant builds are:

gnome-software-50~beta-3.fc45
gnome-software-50~beta-3.fc44

I'm keeping this open in case you want to do anything on the dnf5daemon side, but feel free to move this to gnome-software and close it.
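The temporary-session approach can be illustrated with the same kind of toy model. Again this is only a sketch under stated assumptions, not the actual gnome-software patch: `ToySession` and its methods are hypothetical, there is one lock per session, and answering the key prompt does not need the lock. The handler asks a fresh, idle session which package provides the key file, so the query does not contend with the lock held by the running transaction, and it can then confirm the import on the original session.

```python
import threading

KEY_PATH = "/etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-rawhide-x86_64"


class ToySession:
    """Hypothetical model of one dnf5daemon-server session (not real code).

    Assumption: one lock per session, taken by every call except confirm_key.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._answered = threading.Event()
        self._accepted = False

    def list_key_providers(self, key_path, timeout):
        # Read-only query that still needs this session's lock.
        if not self._lock.acquire(timeout=timeout):
            return None  # starved
        try:
            # Pretend every key under /etc/pki/rpm-gpg/ ships in a package.
            return ["fedora-gpg-keys"] if key_path.startswith("/etc/pki/rpm-gpg/") else []
        finally:
            self._lock.release()

    def confirm_key(self, accepted):
        # Answer to the key-import prompt; modelled as lock-free.
        self._accepted = accepted
        self._answered.set()

    def run_transaction(self, on_key_request, prompt_timeout):
        with self._lock:  # held for the whole transaction
            handler = threading.Thread(target=on_key_request, args=(self,))
            handler.start()
            answered = self._answered.wait(timeout=prompt_timeout)
        handler.join()
        return answered and self._accepted


def handle_key_request_with_temp_session(busy_session, seen):
    # Query a separate, short-lived session instead of the busy one.
    temp = ToySession()
    providers = temp.list_key_providers(KEY_PATH, timeout=0.2)
    seen.append(providers)
    if providers:
        # The key file ships in a package, so import it without prompting.
        busy_session.confirm_key(True)


seen = []
session = ToySession()
ok = session.run_transaction(
    lambda s: handle_key_request_with_temp_session(s, seen), prompt_timeout=0.5)
print(ok, seen)  # True [['fedora-gpg-keys']]
```

The trade-off, as raised earlier in the thread, is one extra session per key prompt, which counts against the daemon's session limit, and the model says nothing about whether on-disk libsolv/libdnf5 data is safe to share between sessions.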