Bug 1314991

| Field | Value |
|---|---|
| Summary | GNOME Software may wrongly claim system is up-to-date because update check transaction got cancelled by a foreground transaction and not restarted |
| Product | [Fedora] Fedora |
| Component | gnome-software |
| Version | 28 |
| Status | CLOSED EOL |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Alan Jenkins <alan.christopher.jenkins> |
| Assignee | Richard Hughes <rhughes> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | awilliam, klember, rhughes, richard |
| Keywords | Reopened |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2019-05-28 22:52:12 UTC |

Description
Alan Jenkins
2016-03-05 12:49:46 UTC
Ah, here's a hilarious scenario I was running into without even using the terminal:

1. gnome-software
2. Updates tab
3. Refresh button (top left), and then quickly
4. activities -> type "mon". Launch GNOME System Monitor, to watch the download throughput.

Actual result: GNOME Software immediately cancels the download, while claiming success & an up to date system. Despite `pkcon get-updates` showing outstanding security updates.

This is because Activities search includes a search for uninstalled packages, as well as for installed apps.

Typically when logging in, I immediately launch Firefox using the Activities search. One suspicion is that this was canceling background update downloads. As I say, it raises the spectre of scenarios where security updates are delayed indefinitely.

> I'm not sure why the difference between GNOME Software and pkcon.
The problem is gnome-software uses the pk_client API (almost exclusively). pkcon uses the preferred pk_task API, and that's what restarts background transactions.
I'm not planning on migrating everything myself; it looks like a fair amount of code (and opportunities to miss something important).
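
To make the pk_client versus pk_task distinction concrete, here is a toy model in Python. It is only a sketch of the behaviour described in this comment, not PackageKit code: `FakeDaemon`, `client_style_get_updates`, `task_style_get_updates` and the `max_restarts` limit are all hypothetical names invented for illustration. The only detail taken from this report is the "cancelled-priority" exit status.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Exit(Enum):
    SUCCESS = auto()
    CANCELLED_PRIORITY = auto()  # pre-empted by a foreground transaction


@dataclass
class FakeDaemon:
    """Toy stand-in for the package daemon; pre-empts the first N runs."""
    preemptions: int
    updates: list = field(default_factory=lambda: ["kernel"])
    calls: int = 0

    def get_updates(self):
        self.calls += 1
        if self.preemptions > 0:
            self.preemptions -= 1
            return Exit.CANCELLED_PRIORITY, []
        return Exit.SUCCESS, list(self.updates)


def client_style_get_updates(daemon):
    """One-shot call: whatever the daemon returns is passed straight back."""
    return daemon.get_updates()


def task_style_get_updates(daemon, max_restarts=3):
    """Restart the transaction for as long as it keeps getting pre-empted."""
    for _ in range(max_restarts + 1):
        status, updates = daemon.get_updates()
        if status is not Exit.CANCELLED_PRIORITY:
            return status, updates
    # Still pre-empted after every restart: report that, not an empty success.
    return status, updates
```

With one pre-emption, the one-shot caller gets back a cancelled status and an empty list, while the restarting caller sees the pending update on its second attempt. The restart limit is an assumption of this sketch; the thread does not say whether pk_task bounds its retries.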
Err, PackageKit should only invalidate the list of updates if there's an action transaction (e.g. install, update or remove). Searching shouldn't clear the updates list.

Searching doesn't clear the update list. Searching aborts the background transactions. If the transaction was caused by a user clicking refresh in GNOME Software, it then claims success + no updates required. (If the updates list was originally empty. I don't know what happens if GNOME Software was already displaying some updates as needing to be installed).

(In reply to Alan Jenkins from comment #4)
> I don't know what happens if GNOME
> Software was already displaying some updates as needing to be installed).

They reappear the next time the next "hour" recheck happens.

(In reply to Richard Hughes from comment #5)
> (In reply to Alan Jenkins from comment #4)
> > I don't know what happens if GNOME
> > Software was already displaying some updates as needing to be installed).
>
> They reappear the next time the next "hour" recheck happens.

Of course, independent of whether it was already displaying them :). I was actually trying to avoid implying that this issue makes updates disappear, because I haven't seen whether or not that happens.

I wasn't concerned with the case where some updates have been found already. In that case I would already have a notification. That was my original concern, that I wasn't getting a notification when I should have been.

As I say I've had a number of different things break update notifications. E.g. In stock Debian 7 or 8 they just never happen. librepo originally failed when I set up a caching proxy. So I needed to investigate this and not just "wait for it to sort itself out", so to speak.

I've had several updates since, and journalctl shows "cancelled-priority" hasn't happened again. It must be that I'm not always using Activities search at login time. So if you want to triage this - for my own use right now, it's not high severity/priority.
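
The failure mode discussed in these comments comes down to treating an aborted check as if it were a completed check that found nothing. A minimal Python sketch of keeping the two apart follows. `UiState` and `ui_state_for_refresh` are hypothetical names for illustration, and this is not how gnome-software is structured.

```python
from enum import Enum


class UiState(Enum):
    UPDATES_AVAILABLE = "updates available"
    UP_TO_DATE = "up to date"
    CHECK_INCOMPLETE = "check did not complete"


def ui_state_for_refresh(completed, updates):
    """Pick what to show after a refresh attempt.

    completed: False if the transaction was cancelled before finishing.
    updates:   package names reported by the transaction (may be empty).
    """
    if not completed:
        # An empty list from an aborted check says nothing about the system.
        return UiState.CHECK_INCOMPLETE
    return UiState.UPDATES_AVAILABLE if updates else UiState.UP_TO_DATE
```

The point of the sketch is only that "up to date" is reachable solely from a check that ran to completion.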
I've installed Fedora 24 and repeated the manual GUI test case from Comment 1. I haven't noticed a change to my analysis so far.

Just an update, that one aspect of the manual GUI test case has been mitigated. Now, if you're running GNOME Software manually, and you happen to pre-empt the background transactions it uses, you get a nice explicit "Cancelled by user action" illustrated by the symbol for "no entry". Nice little quality bump there :).

---

There is yet something else I don't understand. I left this F24 install unbooted for at least a week. Coming back to F24, opening GNOME Software and running a manual refresh didn't provide any updates. I'm sure there are security updates, because the kernel version was lower than my F23 system (4.6.4 vs 4.6.6). Using `pkcon` I was able to install the kernel security update. Aborting `pkcon` before completing the install, and retrying GNOME Software, did not change the result.

Created attachment 1205962 [details]
Patch by switching to pk_task API

I've now tested this patch (make sure the refresh calls use pk_task). It appears to work & confirm my reasoning. Unfortunately I don't know whether it will have other side effects.

When I look at the PK code to retry the transaction, I see pk_task will additionally automatically accept stuff e.g. "untrusted" packages, since we set the pk_client to be non-interactive. I'm not sure whether the patch would change behaviour in that respect. There's a number of other calls using pk_task already.

I posted a query about that point to the gnome-software ML. https://mail.gnome.org/archives/gnome-software-list/2016-September/msg00001.html

Created attachment 1205969 [details]
Patch v2 by switching to pk_task API
Patch fix.
Let's attach the version that uses the correct function name, so it doesn't generate a compiler warning.
I swear it passed my manual testing (and failed with the patch reverted). Maybe the plugins are allowed to build with unresolved symbols, and then the modified plugin just didn't load.
This message is a reminder that Fedora 23 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 23 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Changed version - I tested this on F24.

This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life.
If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Change specified in the attached patch is still relevant to git HEAD. File is moved to plugins/packagekit/gs-plugin-packagekit-refresh.c.

There is still no sign of any code to restart the background transaction when it is interrupted. This happens, for example, when using the keyboard to quickly start an app by name from the launcher.

Can I please get a review as to whether or not the patch change would be insecure due to automatically accepting "untrusted" packages?

Looks sane to me. Thanks for the patch! Sorry for the delay, this fell through the cracks.

Pushed to master (with slightly reworded commit message) as https://git.gnome.org/browse/gnome-software/commit/?id=9e9d62d3d1013a3f35385253f95ff8a7d10f97d6

Thank you for looking at this again :).

Pretty please, can you make clearer why this is considered secure? As far as I could tell, the change means our non-interactive PackageKit client will now automatically retry without PK_TRANSACTION_FLAG_ENUM_ONLY_TRUSTED if one of these errors occurred:

    case PK_ERROR_ENUM_GPG_FAILURE:
    case PK_ERROR_ENUM_BAD_GPG_SIGNATURE:
    case PK_ERROR_ENUM_MISSING_GPG_SIGNATURE:
    case PK_ERROR_ENUM_CANNOT_INSTALL_REPO_UNSIGNED:
    case PK_ERROR_ENUM_CANNOT_UPDATE_REPO_UNSIGNED:

I can imagine I'm missing something here. I've only recently understood why it's secure for `dnf -y` and `pkcon -y` to accept updated repo keys without prompting.[1]

Even just an acknowledgement that someone has looked at this security concern would be really nice to have.
Regards
Alan

[1] https://unix.stackexchange.com/questions/331668/fedora-25-upgrade-doesnt-install-key-for-package-updates-whats-going-on/331669#331669

It's written as if there's a single flag to both 1) allow installing specified unsigned packages from the local filesystem, and 2) disable the configured checksums for all network repos. But that can't be right. You'd never want 2), even separately. If you trust an unsigned repo R, what you want is to disable signature checking _in the config for R_.

Can you join #gnome-software on irc.gnome.org and ask hughsie? I am unsure how ONLY_TRUSTED is supposed to work. Thanks.

Out of caution, could you revert the patch for now please. I will open a separate bug to write up why I think ONLY_TRUSTED is broken. When we have some better understanding, we can look at this patch again :). Hope that makes more sense.

Done. Thanks!

ONLY_TRUSTED issue now reported as https://bugs.freedesktop.org/show_bug.cgi?id=101935

If this bug is now about gnome-software sometimes failing with "cancelled by user action", I would be *very* interested in a fix for this. It frequently causes the openQA "update the system with gnome-software" test to fail, which is very annoying. Thanks.

Heh, thanks for attempting to read through this Adam :). I'm sure it's the same issue.

---

Technically I _think_ it's possible to apply the simple patch here, and rely on pk-offline-update still using the old API which prevents unsigned updates. But as I wrote in the ONLY_TRUSTED bug[1], this security question remained unanswered for a year. Hence my preference is to see *some* progress on that, before applying the patch and switching to the new API, which suffers from this error in the underlying PK design.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=101935

Even an analysis which confirmed the above paragraph would be nice :). IMO it's not that hard to see the concern I raised if you're looking at the code.
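
For readers who do not have the PackageKit source open, the concern raised above can be modelled in a few lines of Python. This is a sketch of the reporter's reading of the retry logic ("as far as I could tell"), not a transcription of PackageKit. `run_with_untrusted_retry`, its parameters and the interactive prompt branch are assumptions made for illustration; only the error names are taken from the comment above.

```python
GPG_RELATED_ERRORS = {
    "PK_ERROR_ENUM_GPG_FAILURE",
    "PK_ERROR_ENUM_BAD_GPG_SIGNATURE",
    "PK_ERROR_ENUM_MISSING_GPG_SIGNATURE",
    "PK_ERROR_ENUM_CANNOT_INSTALL_REPO_UNSIGNED",
    "PK_ERROR_ENUM_CANNOT_UPDATE_REPO_UNSIGNED",
}


def run_with_untrusted_retry(run, interactive, ask_user=None):
    """Toy model of the retry described in the thread.

    run(only_trusted) returns an error name, or None on success.
    Returns (final_error, only_trusted_used_for_final_attempt).
    """
    error = run(True)
    if error not in GPG_RELATED_ERRORS:
        return error, True
    # Assumption: an interactive caller would be asked before retrying.
    if interactive and not (ask_user and ask_user(error)):
        return error, True  # user declined, keep the failure
    # A non-interactive caller reaches this line without anyone being asked.
    return run(False), False
```

In this model a non-interactive caller that hits a missing signature silently ends up running with `only_trusted=False`, which is the behaviour the reporter asks to have reviewed.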
I might be able to fix the dnf or apt backends at some point. I don't want to promise anything for the moment though.

This message is a reminder that Fedora 26 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '26'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 26 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

So I just read through this whole thing again. And it seems like it never exactly got resolved, and my experience indicates this may well still be a problem.
The openQA test which tries to test that installing updates with GNOME Software often fails. Now I came back to this bug, it reminds me that, for some time, this failure was 'noticeable' - as Alan noted in comment #7, around Fedora 24 time, when this problem occurred, we got an actual error in GNOME Software, "Cancelled by user action". When that was the case, I added a sort of workaround to openQA - it would check for that error, and if it saw it, register a soft failure and try again (by clicking the 'refresh' button).

I just checked the openQA records and that match hasn't happened for six months, so it seems like something changed and now we don't see that error message any more. But *the test still fails*, quite a lot, and I rather suspect it's still the same bug - just for some reason it's gone back to failing 'silently' rather than showing an error message.

It doesn't look to me like Alan's patch ever got reapplied after it was reverted. There was some back-and-forth on the bug he filed regarding the ONLY_TRUSTED thing, but it was eventually closed for the gitlab move without an entirely clear resolution.

Re-opening for now. I am currently trying to get a run of the failing openQA test with additional gnome-software and PK debugging info to confirm that what's going on is still basically this bug...

So, I did some debugging on the openQA case here, and...well, I've got some info, but I'm not totally clear what's going on yet.

I tweaked the openQA tests to get verbose logs. This does actually possibly change the behaviour a bit, but I think it probably still hits the same basic bug. On staging openQA, I made the test use verbose mode for packagekit by editing the systemd unit to pass --verbose, doing `systemctl daemon-reload`, then `systemctl restart packagekit.service`. I also made the test use verbose mode for gnome-software by having it run it from a shell, doing `killall gnome-software` then `gnome-software --verbose > /home/test/gs.txt 2>&1`.
This of course means compared to the 'vanilla' version of the test we have a packagekit restart that didn't happen before, and we're probably killing the "/usr/bin/gnome-software --gapplication-service" that seems to get run as part of a GNOME session and running a fresh gnome-software from a console, rather than launching from the overview. But this kinda can't be avoided, to get the necessary debug info.

I also tweaked the test a bit further to do a workaround for the bug: if it hits the 'refresh' button, then sees the "Software is up to date" screen (it looks for the big check mark), it tries again (hits the refresh button again).

So from doing this and poking at the logs, at first what I thought happened was this:

* The test launches gnome-software
* src/gs-update-monitor.c gs_update_monitor_init sets up a 60 second timer after which it will run check_updates_on_startup_cb
* The test goes to the Updates panel and hits the refresh button, the app goes off looking for available updates
* While it's still in the middle of that, the timer expires, check_updates_on_startup_cb gets called, and calls `restart_updates_check`, and that basically stomps on the manual refresh the test was trying to do
* Somehow, this combination of events results in the UI showing the system as up-to-date
* With the new workaround, the test hits the refresh button again, and *this* time the update check succeeds

You can see a run of the modified test here: https://openqa.stg.fedoraproject.org/tests/375434

The detailed log messages from gnome-software are here: https://openqa.stg.fedoraproject.org/tests/375434/file/desktop_update_graphical-gs.log

Note the 'soft failure' during desktop_update_graphical - that's the workaround kicking in. If you look at the video or thumbnails you can see that the test hits refresh, then after a while the "Software is up to date" screen appears, so the test clicks refresh again and this time it works.
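
The retry workaround described above can be sketched as follows. This is an illustrative Python model, not the actual openQA test code. `refresh_with_workaround`, `click_refresh`, the screen labels and `max_attempts` are hypothetical names.

```python
def refresh_with_workaround(click_refresh, max_attempts=2):
    """Click refresh; if the app claims "up to date", click again.

    click_refresh() returns "updates" or "up-to-date" (what the screen shows).
    Returns (final_screen, soft_failures).
    """
    soft_failures = 0
    screen = click_refresh()
    for _ in range(max_attempts - 1):
        if screen != "up-to-date":
            break
        soft_failures += 1  # recorded, but the test carries on
        screen = click_refresh()
    return screen, soft_failures
```

One limitation of this approach is visible in the model: a system that genuinely has no updates also registers a soft failure, so the workaround only makes sense in a test where an update is known to be available.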
Here are some key timings from this test run:

* 22:55:57.0453 gnome-software launched
* 22:56:12.0119 refresh button clicked (first time)
* 22:58:15.0298 "Software is up to date" screen seen
* 22:58:16.0490 refresh button clicked (second time)
* 22:58:43.0834 "apply" button found (i.e. g-s successfully found updates)

And here's a key message from the gnome-software verbose log:

* 22:56:58:0647 Gs First hourly updates check

Note this appears right between where the test clicks 'refresh' and where the "Software is up to date" screen appears.

HOWEVER...I tried to prove this, and sorta disproved it. I did a build of gnome-software which delays that "First hourly updates check" to happen after an hour instead of a minute. And...the test still fails on the first attempt! Here's that run:

https://openqa.stg.fedoraproject.org/tests/375508

https://openqa.stg.fedoraproject.org/tests/375508/file/desktop_update_graphical-gs.log

The observed behaviour is just the same: the *first* time the test hits the 'refresh' button, it comes up "Software is up to date". The *second* time, it correctly finds the available update and shows it in the generic "OS Updates". And we can see in the log that my workaround worked - there is no "First hourly updates check".

So...that hourly check *DOESN'T* seem to be what's breaking stuff here. But I don't know what is! I'm now doing a gnome-software build with some added debug logging to try and get closer to the bottom of things, but at this point I think this may be a different bug.

Created https://bugzilla.redhat.com/show_bug.cgi?id=1638563 for further debugging on that case...

This message is a reminder that Fedora 28 is nearing its end of life. On 2019-May-28 Fedora will stop maintaining and issuing updates for Fedora 28. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 28 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.