1995817 – GNOME Software updates fail with "Prepared update not found" error

Bug 1995817 - GNOME Software updates fail with "Prepared update not found" error

Summary: GNOME Software updates fail with "Prepared update not found" error

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	gnome-software
Sub Component:
Version:	35
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Milan Crha
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	AcceptedBlocker openqa
Depends On:
Blocks:	F35BetaBlocker
TreeView+	depends on / blocked

Reported:	2021-08-19 19:57 UTC by Adam Williamson
Modified:	2021-09-03 18:10 UTC (History)
CC List:	8 users (show)
Fixed In Version:	gnome-software-41~beta-3.fc35
Clone Of:
Environment:
Last Closed:	2021-09-03 18:10:07 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
tarball of /var/log from a failed test run (1.59 MB, application/gzip) 2021-08-19 19:57 UTC, Adam Williamson	no flags	Details
tarball of /var/log from a passed test run (2.41 MB, application/octet-stream) 2021-08-31 20:40 UTC, Adam Williamson	no flags	Details
View All

Description Adam Williamson 2021-08-19 19:57:35 UTC

Created attachment 1815800 [details]
tarball of /var/log from a failed test run

There is an openQA test which works by downgrading a package (python3-kickstart) to a dummy low-versioned build to ensure at least one update is available, then runs GNOME Software and does an update.

On the current Fedora 35 compose, this test is failing. The GNOME Software update workflow works - it shows updates as available and then lets you hit "Restart & Update" - but after the test clicks that button, under the 'Restart' dialog, we see a PackageKit error:

Unable to install updates:
GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._pk_2dengine_2derror-_2dquark.Code1: Prepared update not found: /var/lib/PackageKit/prepared-update

https://openqa.fedoraproject.org/tests/951738#step/desktop_update_graphical/26

after the system reboots, python3-kickstart is still at the dummy low version - it was not updated to the current 'real' build, as is expected to happen if the update process worked properly.

I'll attach a tarball of /var/log from the test to this bug.

Proposing as a Beta blocker per Beta criterion "The installed system must be able appropriately to install, remove, and update software with the default tool for the relevant software type in all release-blocking desktops (e.g. default graphical package manager). This includes downloading of packages to be installed/updated." - https://fedoraproject.org/wiki/Fedora_35_Beta_Release_Criteria#Installing.2C_removing_and_updating_software

Comment 1 Adam Williamson 2021-08-24 21:54:31 UTC

+3 in https://pagure.io/fedora-qa/blocker-review/issue/397 , marking accepted.

Comment 2 Adam Williamson 2021-08-26 21:45:58 UTC

Weirdly, this bug has become more intermittent over the last few days. Till Tuesday it failed on just about every run. Since Tuesday it's been much more patchy; you can see the history at https://openqa.fedoraproject.org/tests/958345#next_previous (expand to 50 entries for the best view). It still happens occasionally, but only 2 times in the last 12 tries, and both of those passed on the automatic retry. I've no idea what's going on there.

Comment 3 Milan Crha 2021-08-30 09:59:01 UTC

Could it be sort of timing issue? Does the `/var/lib/PackageKit/prepared-update` exist some time later?

I do not know how to reconstruct `/var/log/messages` from the packed `/var/log/` directory, I'm sorry. Maybe the update prepare failed, but this was not properly communicated to the caller, in this case to the gnome-software (of the gnome-software incorrectly interpreted it).

I think you may get a similar outcome when you run:

   $ pkcon update --only-download

on the affected machine/test, if it's possible to be done.

Comment 4 Adam Williamson 2021-08-30 15:29:28 UTC

"I do not know how to reconstruct `/var/log/messages` from the packed `/var/log/` directory, I'm sorry"

You can do `journalctl -f var/log/journal/<hit tab here>/system.journal` to see the system log.

IIRC I recreated this manually when it was first happening and the directory did exist after the failure, so yes, could be some sort of timing issue.

"I think you may get a similar outcome when you run:"

do you mean trying this over and over might be a way to recreate the failure on demand?

Comment 5 Adam Williamson 2021-08-30 15:29:57 UTC

sorry, that's `journalctl --file`, not `journalctl -f`. -f is something else.

Comment 6 Milan Crha 2021-08-31 11:00:28 UTC

Thanks for the journalctl command. I do not see anything obvious from:

   $ journalctl --file /tmp/var/log/journal/420a13d22bca48079192999f4f4ec245/system.journal | grep -i packagekit

The user journal shows the "Prepared update not found: /var/lib/PackageKit/prepared-update" error only.

What the user journal shows it probably related, namely:

> Aug 19 18:30:20 fedora gnome-software[1800]: not handling error no-security for action refresh: Failed to obtain authentication.

a minute before the error. There are also two "Error marking directory as changed: Permission denied", but, if I'm not mistaken, it's from the Flatpak.

Do you have logs for a successful test, please? I'd like to check whether it contains the same errors.

Comment 7 Milan Crha 2021-08-31 14:24:00 UTC

There's truly something odd here. I've just reproduced something similar, maybe not exactly the same, on Fedora 34.
a) click to install an RPM application (say Gnote)
b) this requires the root password and I enter it, thus the application is installed successfully
c) few seconds later I click to uninstall the application, which fails with
   an error: "you do not have permission to remove software" in the Software

This state stands as long as polkit (or what) remembers the entered credentials (indicated by a "key" icon in the status bar, with a context menu "Drop all elevated privileges"). It doesn't seem to be enough to have these (auto-)forgotten, I'm forced to restart the gnome-software process to be able to uninstall the app, when I'm asked for the credentials again. It (I do not know which 'it') lets me install the application with the old privileges, and also install it again, this time.

Comment 8 Adam Williamson 2021-08-31 19:52:01 UTC

we're not set to upload logs when this test passes, but I'll hack that in on staging, should be able to get them in a few minutes (I'll have to run the test till it passes :>)

Comment 9 Adam Williamson 2021-08-31 20:40:46 UTC

Created attachment 1819534 [details]
tarball of /var/log from a passed test run

Comment 10 Milan Crha 2021-09-01 08:25:36 UTC

Thanks. I see there failed same things in both instances, but there are also differences. While the good run says:

> Aug 31 22:21:53 fedora gnome-software[1754]: not GsPlugin error pk-control-error-quark:0: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name is not activatable

the bad run says:

> Aug 19 15:40:56 fedora gnome-software[1887]: not GsPlugin error pk-control-error-quark:0: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate remote peer.
> Aug 19 15:40:56 fedora gnome-software[1887]: not GsPlugin error pk-control-error-quark:0: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate remote peer.
> Aug 19 15:40:56 fedora gnome-software[1887]: not handling error failed for action refresh: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate remote peer.

I do not see from it which service it tries to connect to. It can be packagekit daemon or polkit or any other.

Nonetheless, looking on the video of the failed test, which I think I checked properly (it's truly quick), what happens in the test is this:
a) you log in to the gnome-shell, which also starts gnome-software in the background
b) you log in in a text terminal and make changes using dnf
c) then you switch back to gnome-shell
d) "run" gnome-software, its graphical UI
e) switch to the "Updates" page, where you are greeted only with the "OS Updates" and the "Restart & Update" button.

The thing is, the changes you made under the hood in the dnf, are not detected by the PackageKit, the less by the gnome-software, thus whatever the "OS Updates" have piled up (it can be clicked, which shows what is piled for the update; I'd be interested in that list of the packages, thus I'm downloading the latest compose to give it a try locally), then it might not contain the manually downgraded package, unless you do it just before the gnome-software asks the PackageKit to find out things for the update.

Even if PackageKit notices the changes made in the dnf, the button on the Updates page in the gnome-software should read "Download", not "Restart & Update". This download prepares the update, thus fills the "prepared-update" file. The download is done when automatic updates are enabled (they are enabled by default) in the gnome-software, but the check happens only roughly a minute after it's started. That's another bottleneck, because when the changes in the test terminal take longer than a minute, then the changes are not prepared and, the most importantly, nor recognized.

If you are wondering, then there are two "get the updates" calls in the gnome-software. One is to populate the Updates page, another one is related to the automatic updates. The first only gets the list, the later can also download the packages, thus prepare the update.

This makes me think that the gnome-software shows an outdated information in the Updates tab. I'm going to test it here locally, once the compose image is downloaded (it's somehow slow today from here).

Comment 11 Milan Crha 2021-09-01 16:01:28 UTC

I'm able to reproduce this. The problem is in the gnome-software, thus I move this bug there. There's no timing issue with respect of the "prepared-update" file, with respect of its creation and/or deletion.

The notes about the dnf commands being run while gnome-software is running still stand.

Comment 12 Adam Williamson 2021-09-01 19:03:59 UTC

Yes, this is more or less what happens. The reason is to ensure an update is available whenever this test runs. If, for instance, it was run on a brand new base disk image for a stable release, or when it runs on a nightly Rawhide compose, it's possible for there to be no 'regular' updates available. So the thing we do in the background is make sure an update is available; we do this by removing python3-kickstart, then installing an empty python3-kickstart-1.0-1 (IIRC) package. This assures us that the system will see the actual current version of python3-kickstart as an 'update' - i.e. that an update will be available from the regular system repositories.

The logic after switching back from the console is, uh...tricky, because it covers both KDE and GNOME and it has to cover different possible behaviours in both, because the same code is used across three releases at a time and both you and KDE insist on changing how your update tools work quite often :D This is what it currently does for GNOME:

1. Check for a welcome screen, hit enter if it sees one
2. Wait till it sees the button to select the 'Updates' view, then click it
3. Wait up to 90 seconds for the screen to be still for 5 seconds (this is in case Software is still refreshing)
4. Wait until it sees either an 'apply' (for GNOME, that's 'Restart & Update'), 'download' ('Download') or 'refresh' (the refresh button at top left) button
5. Check for either the 'download' or 'apply' button. If it sees 'apply', go to next step. If it sees 'download', click it, then wait for 'apply' (only). If it sees neither, and it's on the first iteration of this loop, it clicks the 'last match', which should be a match of the 'refresh' button from step 4, then goes to the next iteration of this loop (so back to the start of this step, except it will not click 'refresh' again, it will expect to see 'download' or 'apply' only)
6. Waits for the screen to be still for a bit (this is to avoid some odd race, I forget what) then expects to see the 'apply' button (again, for GNOME that's 'Restart & Update') and clicks it
7. Looks for a reboot confirmation button and clicks it if it appears

We've used the 'dummy package' approach as long as we've had this test. The exact logic for dealing with the update tool UI after we switch back to the desktop has been through quite a few iterations, it's quite annoying to maintain and it'd be much easier if both KDE and GNOME just stopped fiddling with how the tools work! I can't honestly remember all the explanations for exactly how it's set up, but there *are* explanations for everything, and I could dig them out of git history if necessary (e.g. why we don't just always refresh first). If either of these turn out to cause problems that are otherwise 'invalid' we can tweak them - to use rpm instead of dnf, for e.g., or not use a side repo, or reboot after the fiddling (though that's technically a bit tricky actually) - but that's what it's doing.

The video is faster than real time and can actually skip things, yeah. It depends on exactly how openQA is behaving. The video is a compilation of screenshots, it's not a 'true' video recording.

Comment 13 Milan Crha 2021-09-01 20:04:40 UTC

Okay, if you count with both "Download" && "Restart & Update" then it's all fine. My only relevant point is, and it's like that since the day 0, from my point of view, that you should not fiddle with the repositories from the command line when the gnome-software is running, because the last know state in gnome-software does not reflect the current state. You workaround it with the "Refresh" button eventually.

In any case, I have a fix for this, filled upstream as:
https://gitlab.gnome.org/GNOME/gnome-software/-/merge_requests/953

Comment 14 Fedora Update System 2021-09-01 20:09:37 UTC

FEDORA-2021-9677463ae9 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-9677463ae9

Comment 15 Adam Williamson 2021-09-01 20:12:06 UTC

We don't do anything to the repos while GNOME Software has been launched by the test. If it's running in the background, that's sorta invisible. :D But yes, the repo stuff has been happening since day 1 of openQA testing, so something must indeed have changed for it to be causing problems. Thanks for the fix.

Comment 16 Michael Catanzaro 2021-09-01 20:18:31 UTC

(In reply to Milan Crha from comment #13)
> My only relevant point is, and it's like that since the day 0, from my
> point of view, that you should not fiddle with the repositories from the
> command line when the gnome-software is running, because the last know state
> in gnome-software does not reflect the current state.

Problem is gnome-software is *always* running in the background.

Ideally dnf would have some way to notify gnome-software when a state refresh is required. :/

Comment 17 Milan Crha 2021-09-01 20:25:08 UTC

(In reply to Adam Williamson from comment #15)
> We don't do anything to the repos while GNOME Software has been launched by
> the test. If it's running in the background, that's sorta invisible.

Yeah, the gnome-software is auto-started after login, kind of by design, I think, and it's left running for the whole session life, no matter how many times the GUI window is opened and closed. I do not know how the KDE does that. I think the safest for the test would be to enter the text console before you are logging into the gnome-shell, prepare DNF (the repos) there, and only then login to the GNOME Shell and play with the Software. Your workarounds work too, thus should be fine for now.

> ...so something must indeed have changed for it to be causing problems.

What I fixed is what I caused in the gnome-software upstream. The previous paragraph applies in general, for the reason Michael wrote above - there are no notifications of the changes from the DNF/PackageKit to the listeners. No idea whether libdnf5 will do any better in this regard, neither when it'll be ready for the production.

Comment 18 Adam Williamson 2021-09-01 20:26:28 UTC

Just a few more details on that: the repo the downgraded package comes from is never configured as enabled. The config file says enabled=0. We run a single dnf command with `--enablerepo=<thatrepo>` to install the package.

For KDE, we actually do a 'pkcon refresh force' after installing the downgraded package, because KDE does not reliably pick up the update unless we do that. I don't like that we have to do that, though, it feels like it could potentially paper over actual bugs. I like that for GNOME it's never been necessary.

Comment 19 Milan Crha 2021-09-01 20:44:06 UTC

(In reply to Adam Williamson from comment #18)
> Just a few more details on that: the repo the downgraded package comes from
> is never configured as enabled. The config file says enabled=0. We run a
> single dnf command with `--enablerepo=<thatrepo>` to install the package.

I'm sorry for the confusion. When I said "fiddling with the repos", I did not mean enable/disable repositories. It's related, but not that much. As there are no change notifications, the applications hold the state accurate at certain time. When the user installs/uninstalls/updates/downgrades/enables-repo/disables-repo from the command line (in other words: "fiddling with repos"), it'll invalidate the state in the application, but the application doesn't know about it, thus it shows the old state. It has consequences, sometimes huge, sometimes not.

The change I did for this bug doesn't have anything to do with such invalidated state in the application.

> For KDE, we actually do a 'pkcon refresh force' after installing the downgraded package...

That should not be needed at all, I believe, when you downgrade the package before you log in the 'test' user to the desktop (KDE/GNOME/...).

Comment 20 Adam Williamson 2021-09-01 21:41:09 UTC

For boring reasons of how openQA test logic works and how we schedule this test in different circumstances, it would be somewhat awkward to do that. Not impossible if it becomes necessary, but I'd have to sort of, you know, wiggle things around a bit. :P

Comment 21 Fedora Update System 2021-09-02 15:55:50 UTC

FEDORA-2021-9677463ae9 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-9677463ae9`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-9677463ae9

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 22 Adam Williamson 2021-09-02 21:26:10 UTC

Fix for this is looking good. Since I added it to openQA's list of 'workarounds' (builds from updates-testing that it pulls in to all update tests), the test has not failed once in 15 tries. Thanks.

Comment 23 Fedora Update System 2021-09-03 18:10:07 UTC

FEDORA-2021-9677463ae9 has been pushed to the Fedora 35 stable repository.
If problem still persists, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.