Bug 1314991

Summary: GNOME Software may wrongly claim system is up-to-date because update check transaction got cancelled by a foreground transaction and not restarted
Product: [Fedora] Fedora
Component: gnome-software
Version: 28
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Alan Jenkins <alan.christopher.jenkins>
Assignee: Richard Hughes <rhughes>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: awilliam, klember, rhughes, richard
Keywords: Reopened
Doc Type: Bug Fix
Type: Bug
Last Closed: 2019-05-28 22:52:12 UTC

Attachments:
Patch by switching to pk_task API
Patch v2 by switching to pk_task API

Description Alan Jenkins 2016-03-05 12:49:46 UTC
Description of problem:

GNOME Software checks for and downloads security updates using PK background transactions.[1]  Transactions could be interrupted for a number of reasons.

Background transactions in particular are immediately canceled when there's a foreground transaction.  The original intention was that they be resumed as soon as the foreground transaction finishes.[2]  This can be seen working with `pkcon --background`.  However, GNOME Software doesn't restart the background transaction in the same way.

I only have a few results; so far it looks like I end up with a delay of hours.[3]  But they don't rule out that updates could be delayed by days in some circumstances.

PK transactions happen more often than people might think.  E.g. trying to run a non-existent command in a terminal will cause a PK search, to try and find a package with that command.  It wouldn't be unusual for me to repeatedly mis-spell commands over the course of a day :).

I've been having various different problems with updates lately.[3]  I'd much prefer an immediate restart like `pkcon --background` does.  It makes it much clearer what's going on - it would be much easier to tell if there's yet another problem that's interfering with this update process.

I'm not sure why the difference between GNOME Software and pkcon.  I looked at pkcon and couldn't see any code that explicitly restarted background transactions.

There's a second bug that causes annoyance here.  When GNOME Software transactions are interrupted (except by the Stop button), they seem to be reported as completing successfully.  (Big tick, "software is up to date", also "Last checked" is updated with the current time).

[1] Background transactions seem to be used even when you open the GNOME Software GUI and click the refresh button.  That wouldn't be an obvious problem except for the bug I'm reporting here.

[2] Initial discussion about background transactions: http://markmail.org/message/tocrz6r52dvhip3n

[3] I've had trouble with updates for a while, and re-installed in the past few weeks.  So I've been paranoid & hence noticed this behaviour.  "journalctl | grep cancelled-priority" shows a number of lines.  Ideally you'd want to see what it looks like on systems that have been installed (and using GNOME Software for updates) for a few months.


Version-Release number of selected component (if applicable):
gnome-software-3.18.3-1.fc23.x86_64

How reproducible:
always (just get the timing right):

Steps to Reproduce:
1. Run `pkmon` in a terminal window/tab.
2. Open a second terminal window/tab.
3. Type the command `pkcon get-updates`, but don't run it yet.
4. Launch gnome-software.
5. Go to the Updates tab.
6. Click the Refresh button (top left), and then immediately
7. switch back to the terminal and execute the `pkcon get-updates` command by hitting Enter.

Actual results:
The `pkmon` command first shows the GNOME Software transaction i.e. `get-updates`.  It then terminates with "exit code: cancelled-priority".  The second transaction from `pkcon get-updates` finishes with "exit code: success".

Expected results:
Same as actual results, but then followed by an immediate restart of the GNOME Software transaction.

Comment 1 Alan Jenkins 2016-03-10 14:37:53 UTC
Ah, here's a hilarious scenario I was running into without even using the terminal:

1. gnome-software
2. Updates tab
3. Refresh button (top left), and then quickly
4. activities -> type "mon".  Launch GNOME System Monitor, to watch the download throughput.

Actual result:

GNOME Software immediately cancels the download, while claiming success & an up to date system, despite `pkcon get-updates` showing outstanding security updates.

This is because Activities search includes a search for uninstalled packages, as well as for installed apps.

Typically when logging in, I immediately launch Firefox using the Activities search.  One suspicion is that this was canceling background update downloads.  

As I say, it raises the spectre of scenarios where security updates are delayed indefinitely.

Comment 2 Alan Jenkins 2016-03-13 16:16:02 UTC
> I'm not sure why the difference between GNOME Software and pkcon.

The problem is gnome-software uses the pk_client API (almost exclusively).  pkcon uses the preferred pk_task API, and that's what restarts background transactions.

I'm not planning on migrating everything myself; it looks like a fair amount of code (and opportunities to miss something important).
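
The restart behaviour attributed to pk_task above can be pictured with a small model. This is a plain Python sketch, not the PackageKit API: `ExitCode`, `run_with_restart`, and the `submit` callable are hypothetical names, and only the exit-code labels are taken from the pkmon output quoted in this report. It assumes the client simply resubmits the same transaction whenever it ends with "cancelled-priority".

```python
from enum import Enum
from typing import Callable, List


class ExitCode(Enum):
    # Labels mirror the pkmon output quoted in this report;
    # the enum itself is illustrative, not a PackageKit type.
    SUCCESS = "success"
    CANCELLED_PRIORITY = "cancelled-priority"
    FAILED = "failed"


def run_with_restart(submit: Callable[[], ExitCode],
                     max_attempts: int = 5) -> List[ExitCode]:
    """Resubmit a background transaction while it keeps being pre-empted.

    Returns the exit code of every attempt, in order.
    """
    history: List[ExitCode] = []
    for _ in range(max_attempts):
        code = submit()
        history.append(code)
        if code is not ExitCode.CANCELLED_PRIORITY:
            break  # success or a real failure: stop retrying
    return history


# Simulated daemon: the first attempt is pre-empted by a foreground
# transaction, the second one runs to completion.
outcomes = iter([ExitCode.CANCELLED_PRIORITY, ExitCode.SUCCESS])
history = run_with_restart(lambda: next(outcomes))
print([code.value for code in history])
```

The attempt limit is an assumption of the sketch; it only keeps the model from looping forever if foreground transactions never stop arriving.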

Comment 3 Richard Hughes 2016-03-31 15:38:20 UTC
Err, PackageKit should only invalidate the list of updates if there's an action transaction (e.g. install, update or remove). Searching shouldn't clear the updates list.

Comment 4 Alan Jenkins 2016-03-31 15:40:49 UTC
Searching doesn't clear the update list.  Searching aborts the background transactions.  If the transaction was caused by a user clicking refresh in GNOME software, it then claims success + no updates required.  (If the updates list was originally empty.  I don't know what happens if GNOME Software was already displaying some updates as needing to be installed).
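
The distinction being drawn here (an aborted check is not the same as a successful check that found nothing) can be sketched as follows. This is an illustrative Python model, not GNOME Software code: `RefreshResult` and `summarise_refresh` are hypothetical names, and the exit-code strings are the ones pkmon prints in the reproduction steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RefreshResult:
    message: str
    update_last_checked: bool  # whether "Last checked" should move forward


def summarise_refresh(exit_code: str, updates: List[str]) -> RefreshResult:
    """Map a finished refresh transaction to what the UI should claim."""
    if exit_code == "success":
        if updates:
            message = f"{len(updates)} update(s) available"
        else:
            message = "Software is up to date"
        return RefreshResult(message, update_last_checked=True)
    if exit_code == "cancelled-priority":
        # Pre-empted by a foreground transaction: nothing was learned,
        # so neither claim success nor touch the "Last checked" time.
        return RefreshResult("Update check was interrupted",
                             update_last_checked=False)
    return RefreshResult(f"Update check failed ({exit_code})",
                         update_last_checked=False)


print(summarise_refresh("cancelled-priority", []))
```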

Comment 5 Richard Hughes 2016-04-01 11:43:37 UTC
(In reply to Alan Jenkins from comment #4)
> I don't know what happens if GNOME
> Software was already displaying some updates as needing to be installed).

They reappear the next time the next "hour" recheck happens.

Comment 6 Alan Jenkins 2016-04-01 12:57:25 UTC
(In reply to Richard Hughes from comment #5)
> (In reply to Alan Jenkins from comment #4)
> > I don't know what happens if GNOME
> > Software was already displaying some updates as needing to be installed).
> 
> They reappear the next time the next "hour" recheck happens.

Of course, independent of whether it was already displaying them :).  I was actually trying to avoid implying that this issue makes updates disappear, because I haven't seen whether or not that happens.  I wasn't concerned with the case where some updates have been found already.  In that case I would already have a notification.

That was my original concern, that I wasn't getting a notification when I should have been.  As I say I've had a number of different things break update notifications.  E.g. In stock Debian 7 or 8 they just never happen.  librepo originally failed when I set up a caching proxy.  So I needed to investigate this and not just "wait for it to sort itself out", so to speak.

I've had several updates since, and journalctl shows "cancelled-priority" hasn't happened again.  It must be that I'm not always using Activities search at login time.  So if you want to triage this - for my own use right now, it's not high severity/priority.

Comment 7 Alan Jenkins 2016-08-19 16:04:24 UTC
I've installed Fedora 24 and repeated the manual GUI test case from Comment 1.

I haven't noticed a change to my analysis so far.  Just an update, that one aspect of the manual GUI test case has been mitigated.

Now, if you're running GNOME Software manually, and you happen to pre-empt the background transactions it uses, you get a nice explicit "Cancelled by user action" illustrated by the symbol for "no entry".

Nice little quality bump there :).

---

There is yet something else I don't understand.  I left this F24 install unbooted for at least a week.  Coming back to F24, opening GNOME Software and running a manual refresh didn't provide any updates.  I'm sure there are security updates, because the kernel version was lower than my F23 system (4.6.4 vs 4.6.6).  Using `pkcon` I was able to install the kernel security update.  Aborting `pkcon` before completing the install, and retrying GNOME Software, did not change the result.

Comment 8 Alan Jenkins 2016-09-29 14:36:02 UTC
Created attachment 1205962 [details]
Patch by switching to pk_task API

I've now tested this patch (make sure the refresh calls use pk_task).  It appears to work & confirm my reasoning.

Unfortunately I don't know whether it will have other side effects.  When I look at the PK code to retry the transaction, I see pk_task will additionally automatically accept stuff e.g. "untrusted" packages, since we set the pk_client to be non-interactive.  I'm not sure whether the patch would change behaviour in that respect.  There's a number of other calls using pk_task already.

I posted a query about that point to the gnome-software ML.  https://mail.gnome.org/archives/gnome-software-list/2016-September/msg00001.html

Comment 9 Alan Jenkins 2016-09-29 14:50:27 UTC
Created attachment 1205969 [details]
Patch v2 by switching to pk_task API

Patch fix.

Let's attach the version that uses the correct function name, so it doesn't generate a compiler warning.

I swear it passed my manual testing (and failed with the patch reverted).  Maybe the plugins are allowed to build with unresolved symbols, and then the modified plugin just didn't load.

Comment 10 Fedora End Of Life 2016-11-24 15:56:18 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Alan Jenkins 2016-11-24 17:35:59 UTC
Changed version - I tested this on F24.

Comment 12 Fedora End Of Life 2017-07-25 20:17:40 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '24'.

Comment 13 Alan Jenkins 2017-07-25 20:51:24 UTC
The change specified in the attached patch is still relevant to git HEAD.  The file has moved to plugins/packagekit/gs-plugin-packagekit-refresh.c.  There is still no sign of any code to restart the background transaction when it is interrupted.  This happens, for example, when using the keyboard to quickly start an app by name from the launcher.

Can I please get a review as to whether or not the patch change would be insecure due to automatically accepting "untrusted" packages?

Comment 14 Kalev Lember 2017-07-26 09:25:21 UTC
Looks sane to me. Thanks for the patch! Sorry for the delay, this fell through the cracks.

Pushed to master (with slightly reworded commit message) as https://git.gnome.org/browse/gnome-software/commit/?id=9e9d62d3d1013a3f35385253f95ff8a7d10f97d6

Comment 15 Alan Jenkins 2017-07-26 10:30:56 UTC
Thank you for looking at this again :).  Pretty please, can you make clearer why this is considered secure?

As far as I could tell, the change means our non-interactive PackageKit client will now automatically retry without PK_TRANSACTION_FLAG_ENUM_ONLY_TRUSTED if one of these errors occurred:

		case PK_ERROR_ENUM_GPG_FAILURE:
		case PK_ERROR_ENUM_BAD_GPG_SIGNATURE:
		case PK_ERROR_ENUM_MISSING_GPG_SIGNATURE:
		case PK_ERROR_ENUM_CANNOT_INSTALL_REPO_UNSIGNED:
		case PK_ERROR_ENUM_CANNOT_UPDATE_REPO_UNSIGNED:
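
The concern can be modelled in a few lines. This is a Python sketch of the behaviour as described in this thread, not the PackageKit source: the function `retry_drops_only_trusted` and its parameters are hypothetical, and only the error names are copied from the list above.

```python
# Error names copied from the case list above; everything else is illustrative.
GPG_RELATED_ERRORS = frozenset({
    "PK_ERROR_ENUM_GPG_FAILURE",
    "PK_ERROR_ENUM_BAD_GPG_SIGNATURE",
    "PK_ERROR_ENUM_MISSING_GPG_SIGNATURE",
    "PK_ERROR_ENUM_CANNOT_INSTALL_REPO_UNSIGNED",
    "PK_ERROR_ENUM_CANNOT_UPDATE_REPO_UNSIGNED",
})


def retry_drops_only_trusted(error: str,
                             only_trusted: bool,
                             interactive: bool,
                             user_accepts: bool = False) -> bool:
    """Return True if the transaction would be retried without ONLY_TRUSTED."""
    if not only_trusted or error not in GPG_RELATED_ERRORS:
        return False  # nothing to relax, or an unrelated failure
    if interactive:
        return user_accepts  # a person gets to decide
    # Non-interactive client: nobody to ask, so (as described in this
    # thread) the question is answered "yes" automatically.
    return True


print(retry_drops_only_trusted("PK_ERROR_ENUM_MISSING_GPG_SIGNATURE",
                               only_trusted=True, interactive=False))
```

In this model the non-interactive branch is the one at issue: it relaxes the trust requirement with no human decision involved.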

I can imagine I'm missing something here.  I've only recently understood why it's secure for `dnf -y` and `pkcon -y` to accept updated repo keys without prompting.[1]  Even just an acknowledgement that someone has looked at this security concern, would be really nice to have.

Regards
Alan

[1] https://unix.stackexchange.com/questions/331668/fedora-25-upgrade-doesnt-install-key-for-package-updates-whats-going-on/331669#331669

Comment 16 Alan Jenkins 2017-07-26 11:04:23 UTC
It's written as if there's a single flag to both 1) allow installing specified unsigned packages from the local filesystem, and 2) disable the configured checksums for all network repos.  But that can't be right.  You'd never want 2), even separately.  If you trust an unsigned repo R, what you want is to disable signature checking _in the config for R_.

Comment 17 Kalev Lember 2017-07-26 14:02:07 UTC
Can you join #gnome-software on irc.gnome.org and ask hughsie? I am unsure how ONLY_TRUSTED is supposed to work.

Comment 18 Alan Jenkins 2017-07-26 14:58:24 UTC
Thanks.  Out of caution, could you revert the patch for now please.  I will open a separate bug to write up why I think ONLY_TRUSTED is broken.  When we have some better understanding, we can look at this patch again :).  Hope that makes more sense.

Comment 19 Kalev Lember 2017-07-26 15:46:48 UTC
Done.

Comment 20 Alan Jenkins 2017-07-26 18:49:52 UTC
Thanks!  ONLY_TRUSTED issue now reported as https://bugs.freedesktop.org/show_bug.cgi?id=101935

Comment 21 Adam Williamson 2017-08-17 18:35:41 UTC
If this bug is now about gnome-software sometimes failing with "cancelled by user action", I would be *very* interested in a fix for this. It frequently causes the openQA "update the system with gnome-software" test to fail, which is very annoying. Thanks.

Comment 22 Alan Jenkins 2017-08-17 19:12:16 UTC
Heh, thanks for attempting to read through this, Adam :).  I'm sure it's the same issue.

---

Technically I _think_ it's possible to apply the simple patch here, and rely on pk-offline-update still using the old API which prevents unsigned updates.  But as I wrote in the ONLY_TRUSTED bug[1], this security question remained unanswered for a year.  Hence my preference is to see *some* progress on that, before applying the patch and switching to the new API, which suffers from this error in the underlying PK design.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=101935

Even an analysis which confirmed the above paragraph would be nice :).  IMO it's not that hard to see the concern I raised if you're looking at the code.

I might be able to fix the dnf or apt backends at some point.  I don't want to promise anything for the moment though.

Comment 23 Fedora End Of Life 2018-05-03 08:43:18 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '26'.

Comment 24 Fedora End Of Life 2018-05-29 11:55:00 UTC
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26
is no longer maintained, which means that it will not receive any
further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 25 Adam Williamson 2018-10-10 20:49:35 UTC
So I just read through this whole thing again. And it seems like it never exactly got resolved, and my experience indicates this may well still be a problem.

The openQA test which tests installing updates with GNOME Software often fails. Now that I've come back to this bug, it reminds me that, for some time, this failure was 'noticeable' - as Alan noted in comment #7, around Fedora 24 time, when this problem occurred, we got an actual error in GNOME Software, "Cancelled by user action". When that was the case, I added a sort of workaround to openQA - it would check for that error, and if it saw it, register a soft failure and try again (by clicking the 'refresh' button).

I just checked the openQA records and that match hasn't happened for six months, so it seems like something changed and now we don't see that error message any more. But *the test still fails*, quite a lot, and I rather suspect it's still the same bug - just for some reason it's gone back to failing 'silently' rather than showing an error message.

It doesn't look to me like Alan's patch ever got reapplied after it was reverted. There was some back-and-forth on the bug he filed regarding the ONLY_TRUSTED thing, but it was eventually closed for the gitlab move without an entirely clear resolution.

Re-opening for now. I am currently trying to get a run of the failing openQA test with additional gnome-software and PK debugging info to confirm that what's going on is still basically this bug...

Comment 26 Adam Williamson 2018-10-11 01:28:59 UTC
So, I did some debugging on the openQA case here, and...well, I've got some info, but I'm not totally clear what's going on yet.

I tweaked the openQA tests to get verbose logs. This does actually possibly change the behaviour a bit, but I think it probably still hits the same basic bug.

On staging openQA, I made the test use verbose mode for packagekit by editing the systemd unit to pass `--verbose`, doing `systemctl daemon-reload`, then `systemctl restart packagekit.service`. I also made the test use verbose mode for gnome-software by having it run it from a shell, doing `killall gnome-software` then `gnome-software --verbose > /home/test/gs.txt 2>&1`.

This of course means compared to the 'vanilla' version of the test we have a packagekit restart that didn't happen before, and we're probably killing the "/usr/bin/gnome-software --gapplication-service" that seems to get run as part of a GNOME session and running a fresh gnome-software from a console, rather than launching from the overview. But this kinda can't be avoided, to get the necessary debug info.

I also tweaked the test a bit further to do a workaround for the bug: if it hits the 'refresh' button, then sees the "Software is up to date" screen (it looks for the big check mark), it tries again (hits the refresh button again).

So from doing this and poking at the logs, at first what I thought happened was this:

* The test launches gnome-software
* src/gs-update-monitor.c gs_update_monitor_init sets up a 60 second timer after which it will run check_updates_on_startup_cb
* The test goes to the Updates panel and hits the refresh button, the app goes off looking for available updates
* While it's still in the middle of that, the timer expires, check_updates_on_startup_cb gets called, and calls `restart_updates_check`, and that basically stomps on the manual refresh the test was trying to do
* Somehow, this combination of events results in the UI showing the system as up-to-date
* With the new workaround, the test hits the refresh button again, and *this* time the update check succeeds

You can see a run of the modified test here: https://openqa.stg.fedoraproject.org/tests/375434

The detailed log messages from gnome-software are here:
https://openqa.stg.fedoraproject.org/tests/375434/file/desktop_update_graphical-gs.log

Note the 'soft failure' during desktop_update_graphical - that's the workaround kicking in. If you look at the video or thumbnails you can see that the test hits refresh, then after a while the "Software is up to date" screen appears, so the test clicks refresh again and this time it works. Here are some key timings from this test run:

22:55:57.0453 gnome-software launched
22:56:12.0119 refresh button clicked (first time)
22:58:15.0298 "Software is up to date" screen seen
22:58:16.0490 refresh button clicked (second time)
22:58:43.0834 "apply" button found (i.e. g-s successfully found updates)

And here's a key message from the gnome-software verbose log:

22:56:58:0647 Gs  First hourly updates check

Note this appears right between where the test clicks 'refresh' and where the "Software is up to date" screen appears.
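
The arithmetic behind that observation can be checked directly. This is a small Python sketch using the timestamps quoted above; the sub-second digits are dropped because their format differs between the two sources, and the variable names are only illustrative. It also assumes the openQA clock and the gnome-software log clock agree closely enough to compare.

```python
from datetime import datetime


def at(hms: str) -> datetime:
    """Parse an HH:MM:SS time of day (date part is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


# Whole-second timestamps taken from the test run and log quoted above.
launched = at("22:55:57")
refresh_clicked = at("22:56:12")
hourly_check_logged = at("22:56:58")
up_to_date_seen = at("22:58:15")

# Time from launch to the "First hourly updates check" message,
# to compare against the 60 second startup timer.
delay = (hourly_check_logged - launched).total_seconds()
print(delay)  # 61.0

# The check lands after the manual refresh and before the
# "Software is up to date" screen.
in_window = refresh_clicked < hourly_check_logged < up_to_date_seen
print(in_window)  # True
```

A delay of about 61 seconds is consistent with the 60 second timer set up in gs_update_monitor_init, which is what made the timer look like a plausible culprit at this point.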

HOWEVER...I tried to prove this, and sorta disproved it. I did a build of gnome-software which delays that "First hourly updates check" to happen after an hour instead of a minute. And...the test still fails on the first attempt! Here's that run:

https://openqa.stg.fedoraproject.org/tests/375508
https://openqa.stg.fedoraproject.org/tests/375508/file/desktop_update_graphical-gs.log

The observed behaviour is just the same: the *first* time the test hits the 'refresh' button, it comes up "Software is up to date". The *second* time, it correctly finds the available update and shows it in the generic "OS Updates". And we can see in the log that my workaround worked - there is no "First hourly updates check". So...that hourly check *DOESN'T* seem to be what's breaking stuff here. But I don't know what is!

I'm now doing a gnome-software build with some added debug logging to try and get closer to the bottom of things, but at this point I think this may be a different bug.

Comment 27 Adam Williamson 2018-10-12 01:17:15 UTC
Created https://bugzilla.redhat.com/show_bug.cgi?id=1638563 for further debugging on that case...

Comment 28 Ben Cotton 2019-05-02 20:40:36 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Comment 29 Ben Cotton 2019-05-28 22:52:12 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.