Bug 1638563 - Updates often not found on first click of 'refresh' button (openQA testing)
Summary: Updates often not found on first click of 'refresh' button (openQA testing)
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: gnome-software   
(Show other bugs)
Version: 28
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Richard Hughes
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-10-11 22:29 UTC by Adam Williamson
Modified: 2018-11-03 02:56 UTC (History)
2 users (show)

Fixed In Version: gnome-software-3.28.2-4.fc28 gnome-software-3.28.2-4.fc27
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-20 23:51:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
the diff showing the debug statements I added to gnome-software for testing (2.74 KB, patch)
2018-10-11 22:29 UTC, Adam Williamson
no flags Details | Diff
the verbose output from gnome-software, with my added debug statements (1.11 MB, text/plain)
2018-10-11 22:44 UTC, Adam Williamson
no flags Details
PackageKit log messages from the run (277.67 KB, text/plain)
2018-10-11 22:44 UTC, Adam Williamson
no flags Details

Description Adam Williamson 2018-10-11 22:29:02 UTC
In openQA, there is a test which tries to check GNOME Software can correctly find and install updates (not upgrades), with whatever packages are in the update being tested. Note the test is *not* checking that GNOME Software can install the *update under test itself*, but that *with that update installed*, GNOME Software can still find and install a 'dummy' update. The test does this:

1) Boot a clean disk image of the release under test (created by virt-manager)
2) Log in as a user account created as part of the install
3) Run through the user-mode gnome-initial-setup that appears on this first login
4) From the clean desktop, switch to a VT, log in as root
5) Fiddle with the repo definitions a bit (switch from metalink to mirrorlist)
5) Install Bodhi CLI client, download the packages from the update to a scratch directory, create a local repo for the directory
6) Run 'dnf -y update'
7) Reboot
8) Log into the desktop again
9) Switch to a VT, log in as root
10) With dnf, remove python3-kickstart, if it is installed
11) With dnf, install a dummy python3-kickstart-1.1 package from a special side repo
12) Switch back to the desktop
13) Run gnome-software, go to the Updates pane
14) Click the 'refresh' button

At this point, what the test expects to happen is: gnome-software should notice that an update is available for python3-kickstart (as we know for sure the distro will contain a higher-versioned python3-kickstart package) and show that an update is available for install. At which point the test clicks the 'apply' button and carries on from there.

However, many times when this test runs on F27 and F28, that is not what happens. Instead, after 14), gnome-software thinks for a bit and then shows the "Software is up to date" screen. i.e. it does *not* think an update is available.

If I add a workaround to the test which makes it retry when this happens - spot the 'Software is up to date' screen, and click the 'refresh' button again - then it seems to always work on the *second* try. The update is found and the test proceeds.

I believe at least for some time, openQA was running into https://bugzilla.redhat.com/show_bug.cgi?id=1314991 here. For a while we actually got the "Cancelled by user action" error mentioned in comment #7 there, and the workaround cued off that. But I'm not so sure that's the case any *more*. We aren't seeing that error, and after I poked at the problem a bit and did some debugging, I'm not convinced it's a 'transaction collision' problem that we're seeing now.

I have tweaked the test a bit on the staging instance so it restarts PackageKit in debugging mode (with --verbose) and runs gnome-software with --verbose also, and captures the output from those. I have also been testing with custom gnome-software builds that actually add additional debugging statements: these are all prefaced with 'XXX:' for easy identification. I will attach the patch file so you can see at what points in the code I actually added these statements.

Basically, from the logs, it seems like on the *first* refresh attempt, gnome-software does notice in some sense that an update is available for python3-kickstart. But when the 'generic update' plugin - which does the work of gathering updates for things that don't have appdata into a generic "OS Updates" entry - runs the `gs_plugin_generic_updates_merge_os_update` function which is supposed to tell it whether it should include a given package update into that, it returns FALSE, and that's the problem. The "list of packages to be put into the generic OS Updates app" winds up being empty, which means the generic OS Updates 'app' is not created (this is all in `gs_plugin_refine`), and so we're told no updates are available.

On the *second* refresh attempt, `gs_plugin_generic_updates_merge_os_update` returns TRUE, and python3-kickstart *is* added into the list for the 'OS Updates' entry, which - because the list is not empty - now actually gets created and shows up in the UI.

I have not yet figured out *why* this is happening. From my debug log statements, it seems that it is hitting this conditional on the first attempt:

	/* this is only for grouping system-installed packages */
	if (gs_app_get_bundle_kind (app) != AS_BUNDLE_KIND_PACKAGE ||
	    gs_app_get_scope (app) != AS_APP_SCOPE_SYSTEM) {
		g_debug ("XXX merge_os_update returning FALSE!");
		return FALSE;
	}

(note the added log statement, which we see in the logs), and so it returns FALSE. But on the second attempt, it does not hit that one - it gets past it and hits this one instead:

	if (gs_app_get_kind (app) == AS_APP_KIND_GENERIC) {
		g_debug ("XXX merge_os_update returning TRUE (generic)!");
		return TRUE;
	}

and returns TRUE. So somehow, the 'bundle kind' is not 'package' or the 'scope' is not 'system' on the *first* try, but on the *second* try, both those things are true. I don't know why this might be, yet.

This bug does not appear to be affecting F29 yet, but I'm not *sure* if that means it's actually been fixed, or if there might be some sort of difference in how the images or repos behave or something in F29.

Comment 1 Adam Williamson 2018-10-11 22:29 UTC
Created attachment 1493063 [details]
the diff showing the debug statements I added to gnome-software for testing

Comment 2 Adam Williamson 2018-10-11 22:44 UTC
Created attachment 1493077 [details]
the verbose output from gnome-software, with my added debug statements

Comment 3 Adam Williamson 2018-10-11 22:44 UTC
Created attachment 1493078 [details]
PackageKit log messages from the run

Timings for the run (to try and cross compare with the logs):

1. First boot reaches g-i-s about 21:05:11
2. First boot reaches desktop and switches to VT about 21:06:24
3. First boot finishes installing updates and reboots at 21:13:21.0441
4. Second boot reaches clean desktop about 21:14:55
5. Second boot finishes setting up the dummy update about 21:16:45
6. gnome-software launched about 21:17:51
7. First 'refresh' attempt at 21:18:04
8. 'Software is up to date' screen appears within the 60 seconds before 21:20:08.0377 (cannot be more precise)
9. Second 'refresh' attempt at 21:20:09

Comment 4 Adam Williamson 2018-10-11 22:54:35 UTC
I just did a variant run of the test where I had it just sit at the front page of Software for two minutes before switching to the Updates screen, in case that made it work on the first refresh...but nope. Still failed on the first refresh, worked on the second.

Comment 5 Adam Williamson 2018-10-12 01:48:24 UTC
I think this may be an ordering issue. I just added a few more debug statements, notably in gs-plugin-packagekit-refine.c gs_plugin_refine_app() , which is the most obvious thing that would *set* the scope and kind for an update like this, and that looks like a bingo. In this run, the *first* 'refresh' click was at 01:33:36.0868 and the *second* was at 01:34:02.0142 . Check this log extract:

01:34:00:0168 As  run 0x55dfcfdde450~GsPlugin::generic-updates(gs_plugin_add_updates;gs_plugin_refine)
01:34:00:0168 Gs  XXX in app_list loop for app (null)!
01:34:00:0168 Gs  XXX in merge_os_update for app (null) / python3-kickstart
01:34:00:0168 Gs  XXX merge_os_update returning FALSE!
01:34:00:0168 Gs  XXX kind was: (null)
01:34:00:0168 Gs  XXX scope was: (null)
01:34:00:0168 Gs  XXX in app_list loop for app (null)!
01:34:00:0168 Gs  XXX in merge_os_update for app (null) / python3-ordered-set
01:34:00:0168 Gs  XXX merge_os_update returning FALSE!
01:34:00:0168 Gs  XXX kind was: (null)
01:34:00:0168 Gs  XXX scope was: (null)
01:34:00:0168 Gs  XXX returning TRUE at point 2!
01:34:00:0168 As  run 0x55dfcfdde450~GsPlugin::packagekit-refine(gs_plugin_add_updates;gs_plugin_refine)
01:34:00:0168 As  run 0x55dfcfdde450~packagekit-refine[name->id]
01:34:00:0168 As  run 0x55dfcfdde450~packagekit-refine[id->update-details]
01:34:00:0168 PK  role now get-update-detail
01:34:00:0181 Gs  emitting global waiting
01:34:00:0275 As  run 0x55dfcd1ac230~packagekit-refine[source->license]
01:34:00:0276 As  run 0x55dfcd1ac230~packagekit-refine[update-urgency]
01:34:00:0276 Gs  plugin packagekit-refine took 2.2 seconds to do refine
01:34:00:0276 GsPluginPackageKit XXX In PK gs_plugin_refine_app for app (null) / python3-kickstart
01:34:00:0276 GsPluginPackageKit XXX set scope to SYSTEM
01:34:00:0276 Gs  XXX In gs_app_set_bundle_kind for app (null) / python3-kickstart
01:34:00:0276 GsPluginPackageKit XXX set kind to PACKAGE
01:34:00:0276 GsPluginPackageKit XXX In PK gs_plugin_refine_app for app (null) / python3-ordered-set
01:34:00:0276 GsPluginPackageKit XXX set scope to SYSTEM
01:34:00:0276 Gs  XXX In gs_app_set_bundle_kind for app (null) / python3-ordered-set
01:34:00:0276 GsPluginPackageKit XXX set kind to PACKAGE
01:34:00:0276 Gs  emitting global finished

...

01:34:08:0931 As  run 0x55dfd0c55000~GsPlugin::generic-updates(gs_plugin_add_updates;gs_plugin_refine)
01:34:08:0931 Gs  XXX in app_list loop for app (null)!
01:34:08:0931 Gs  XXX in merge_os_update for app (null) / python3-kickstart
01:34:08:0931 Gs  XXX merge_os_update returning TRUE (generic)!
01:34:08:0931 Gs  XXX in app_list loop for app (null)!
01:34:08:0931 Gs  XXX in merge_os_update for app (null) / python3-ordered-set
01:34:08:0931 Gs  XXX merge_os_update returning TRUE (generic)!
01:34:08:0932 Gs  XXX reached point 3 'create new meta object'!
01:34:08:0932 As  run GsPlugin::packagekit-history(gs_plugin_refine;gs_plugin_refine)
01:34:08:0932 Gs  autogenerating unique-id for org.gnome.Software.OsUpdate
01:34:08:0932 Gs  */*/*/os-update/org.gnome.Software.OsUpdate/* non-transient state now updatable-live
01:34:08:0932 As  run GsPlugin::packagekit-refine-repos(gs_plugin_refine;gs_plugin_refine)
01:34:08:0932 As  run packagekit-refine-repos[repo-filename->id]
01:34:08:0932 Gs  moving python3-kickstart to parent */*/*/os-update/org.gnome.Software.OsUpdate/*
01:34:08:0932 Gs  2nd resolve pass for unadopted wildcards
01:34:08:0932 Gs  moving python3-ordered-set to parent */*/*/os-update/org.gnome.Software.OsUpdate/*


Basically, it looks to me like on the *first* refresh, we find the package update, then run the generic-updates plugin - which does not add the package to 'OS Updates' because its kind and scope are null - *then* run packagekit-refine , which sets the kind and scope to PACKAGE and SYSTEM...but too late for this round, because generic-updates *already ran* and didn't work right. But now the kind and scope have been set, on the *second* refresh, generic-updates does the right thing and adds the package to "OS Updates".

So...basically to fix this, packagekit-refine needs to happen before generic-updates...I think.

Comment 6 Adam Williamson 2018-10-12 02:02:52 UTC
Hah - and gnome-software git master has exactly that added:

gs_plugin_add_rule (plugin, GS_PLUGIN_RULE_RUN_AFTER, "packagekit-refine");

It was part of this commit:

https://gitlab.gnome.org/GNOME/gnome-software/commit/37103e6e5f15d39843d468775da2b53ab145bd2a

so I think it wasn't actually intended to fix this, but I think will have had the *effect* of fixing this. I'm going to try making just that change on top of the F28 package, and see if it makes this problem go away...

Comment 7 Adam Williamson 2018-10-12 02:21:07 UTC
Aha - and there's a plausible explanation for why this broke, too - this commit:

https://gitlab.gnome.org/GNOME/gnome-software/commit/9eab01082b07acd525b8a390ab6b43136cf400a5

which appeared in 3.28.1. That's the commit that *added* this check for 'scope' and 'kind' - it should have also ensured that generic-updates ran after packagekit-refine, but it didn't. I will send an MR upstream to fix this on the 3-28 branch, and do updates for F27 and F28...

Comment 9 Fedora Update System 2018-10-12 03:31:43 UTC
gnome-software-3.28.2-4.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-5a93cc4270

Comment 10 Fedora Update System 2018-10-12 03:31:49 UTC
gnome-software-3.28.2-4.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-d2d222fe5c

Comment 11 Fedora Update System 2018-10-12 20:54:07 UTC
gnome-software-3.28.2-4.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-5a93cc4270

Comment 12 Fedora Update System 2018-10-12 21:28:04 UTC
gnome-software-3.28.2-4.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-d2d222fe5c

Comment 13 Fedora Update System 2018-10-20 23:51:34 UTC
gnome-software-3.28.2-4.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 14 Fedora Update System 2018-11-03 02:56:16 UTC
gnome-software-3.28.2-4.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.