| Summary: | Interrupting PackageKit update then retrying it, incorrectly claims "no updates available" | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Alan Jenkins <alan.christopher.jenkins> | ||||
| Component: | PackageKit | Assignee: | Richard Hughes <rhughes> | ||||
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 24 | CC: | jonathan, klember, rdieter, rhughes, smparrish | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-08-08 17:21:18 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
|
Description
Alan Jenkins
2016-09-14 12:43:12 UTC
I notice I should add step 7 completes immediately. I.e. having cancelled step 6 presumably during one of the large repo downloads, step 7 doesn't appear to try downloading any other repo. There are two large repos: `fedora`, i.e. at time of release, and `updates`. Initial log analysis: 12:55:35 Hif Attempting to update fedora 12:55:35 PackageKit-Hif got state download-repository with hint (null) 12:55:35 Hif update state 509160/50816387 12:55:35 Hif update state 1017408/50816387 12:55:35 Hif update state 1531448/50816387 12:55:36 Hif update state 2036800/50816387 12:55:36 Hif update state 2542152/50816387 Last "update state" - network disconnected just after this point. 12:55:39 PackageKit CreateTransaction method called 5 other get-updates transactions get created, but by a different "sender" (?) There are exactly 5 repos in total - coincidence? 12:55:42 PackageKit Cancel method called on /17_bcecccbc 12:55:42 PackageKit same sender, no need to check uid 12:55:42 PackageKit emitting allow-cancel 0 12:55:42 PackageKit opened logind fd 34 12:55:42 PackageKit 6 transactions in list, 6 committed but not finished 12:55:42 Hif releasing lock 1 12:55:42 Hif Skipping refresh of fedora: cannot update repo 'fedora': Yum repo downloading error: Downloading error(s): repodata/8dcc989396ed27fadd252ba9b655019934bc3d9915f186f1f2f27e71eba7b42f-primary.xml.gz - Download failed: Curl error (42): Operation was aborted by an application callback ... Ctrl-C of pkcon feeds through 12:55:42 PackageKit transaction now finished 12:55:42 PackageKit running /18_baaacebb as previous one finished 12:55:42 PackageKit transaction now running 12:55:42 PackageKit 6 transactions in list, 5 committed but not finished next transaction (18) starts 12:55:42 Hif Skipping refresh of fedora: cannot update repo 'fedora': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried probably TCP connect() failure; DNS resolution was cached from earlier 12:55:42 Hif Skipping refresh of rpmfusion-free: cannot update repo 'rpmfusion-free': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for http://mirrors.rpmfusion.org/mirrorlist?repo=free-fedora-24&arch=x86_64 [Could not resolve host: mirrors.rpmfusion.org] Now failing DNS resolution, transaction 18 continues burning through the repos as expected. 12:55:42 PackageKit transaction now finished 12:55:42 PackageKit running /19_bccaebad as previous one finished 12:55:42 PackageKit transaction now running 12:55:42 PackageKit 6 transactions in list, 4 committed but not finished 12:55:42 PackageKit install uid now 1001 12:55:42 PackageKit install cmdline now PackageKit: get-updates 12:55:42 PackageKit setting role for /19_bccaebad to get-updates 12:55:42 PackageKit-Hif using cached sack HySack::release_ver[24]::filelists|remote 12:55:42 PackageKit backend was running for 12 ms 12:55:42 PackageKit emitting finished 'success', 12 12:55:42 PackageKit transaction now finished We see all subsequent transactions (including from step 7) answered from some higher-level cache - this is what causes the problem 12:55:52 PackageKit CreateTransaction method called 12:55:52 PackageKit job count now 23 12:55:52 PackageKit trying to open database '/var/lib/PackageKit/transactions.db' 12:55:52 PackageKit job count is now at 22 12:55:52 PackageKit transaction now new 12:55:52 PackageKit setting sender to :1.63 12:55:52 PackageKit adding transaction 0x55c7b473d640 12:55:52 PackageKit sending object path: '/23_deaceeaa' another 5 transactions created by the same mysterious sender 12:55:59 PackageKit setting sender to :1.382 then this last one must be my step 7 `pkcon get-updates`. Looking at the sender numbers, our 1.63 is plausibly gnome-software. The pattern of requests (5 in a row) is also consistent, having looked at the system logs for packagekitd. Can't be packagekitd sending requests to itself, because the UID is wrong. ---- Aside: the last debug message below is buggy, uses "%i" where it should be "%u": 12:55:35 PackageKit SetHints method called: locale=en_GB.UTF-8, background=false, interactive=true, cache-age=4294967295 12:55:35 PackageKit locale changed to en_GB.UTF-8 12:55:35 PackageKit cache-age changed to -1 So I think the gnome-software transaction is actually the key here. Normally (while online) canceling `pkcon get-updates` doesn't cause a problem. I don't think canceling it after we go offline is really causing a problem. The problem is gnome-software queues up a transaction, which gets released when we're offline (once I cancel my pkcon transaction). Submitting a transaction while we're offline, dumps failed results into the "higher level" cache. "PackageKit-Hif: using cached sack". Restarting packagekitd seems to be enough to clear this cache. If gnome-software canceled transactions when we went offline, it wouldn't be a problem. (I assume it doesn't *create* transactions when offline, instead waiting until we go online). I think it's still wrong if other software (like pkcon) could pollute the cache while we're offline. But it's more complex to work out how to fix it on the packagekitd side. We could avoid caching failures while offline. That's not exactly what this argument implies however. It's asking for a tight coupling with gnome-software - it should use exactly the same criteria. I don't know if gnome-software retries when the network connection is changed (while remaining on). Or if a second network connection is added. Or if there's a separate flag that flips after a captive portal login... The simplest solution is that failures don't need to be cached. The problem is, if the path to your repo is unreliable, every query is going to go out to the network. PK aims to make queries light weight so they can be integrated into the gui shell etc... It's not great for power consumption. And I don't want to add massive timeouts in every call to command-not-found while offline. So I think we need to make sure all the online/offline logic is in packagekitd, not gnome-software, and just have gnome-software subscribe to a signal that tells it the network status has changed & it might want to retry. (And invalidate the cached failures at that point). IMO if NM happens to have a signal that does exactly what clients would want, it would make a better abstraction if PK relayed the information. There's a question whether we want to assume the repos are unreachable when PK thinks we're offline. I mean, if PK only ever did the dumbest thing, looking at online/offline, that would almost be a fix on it's own. Except for some race conditions. And I'd call this bug report a race condition... I don't like this idea. Another question is whether we need to avoid caching failures indefinitely if we don't see changes in network connection status. I believe GNOME Software does something similar at a higher level - which will mitigate this bug report given that `pkcon -c -1` is working fine. So I'm not majorly concerned, it's just a bit of a worry. I'm also curious about when you ask for a refresh when we already have a cache, whether the cache is kept on failure. I still don't understand what the difference is between `pkcon -c -1 refresh` and `pkcon refresh force`. > need to avoid caching failures indefinitely if we don't see changes in network connection status. I believe GNOME Software does something similar at a higher level - which will mitigate this bug report
Wups, I don't think that's right. Whatever retry logic GNOME Software has, I think PK is returning "success" (but not updates). PK "hides" the failure, therefore GNOME Software would have no reason to increase the frequency it checks at.
E.g. running `pkcon refresh force` while offline does not show any error (and `pkmon` doesn't elaborate on this).
This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. I can't seem to reproduce this on Fedora 26. Yay! It looks like it changed to "failures don't need to be cached". As predicted, if the configured repos are unreachable, then every single search e.g. in the Gnome app launcher is sending fresh requests out onto the network. Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed. |