Bug 2016510

Summary: GNOME Software does not always show packages from third party repos if enabled via gnome-initial-setup
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: PackageKitAssignee: Richard Hughes <rhughes>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 35CC: bcotton, gnome-sig, jonathan, kparal, lruzicka, mcatanza, mcrha, otaylor, rdieter, rhughes, robatino, smparrish, vincent, zpytela
Target Milestone: ---Keywords: CommonBugs
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: https://ask.fedoraproject.org/t/we-are-testing-a-new-common-issues-category-and-process/18916 AcceptedBlocker https://fedoraproject.org/wiki/Common_F35_bugs#3rd-party-apps-missing
Fixed In Version: PackageKit-1.2.4-3.fc35 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-28 18:28:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1891955    
Attachments:
Description Flags
journal (steam not found, nvidia found)
none
journal when steam is missing (attempt 2)
none
gnome-software --verbose output none

Description Adam Williamson 2021-10-21 18:47:52 UTC
I'm filing a new bug for this as it's not really the same as https://bugzilla.redhat.com/show_bug.cgi?id=2010740 or https://bugzilla.redhat.com/show_bug.cgi?id=2001837 , where we've been discussing it so far.

With both of those bugs fixed, we still have a problem: even though fedora-third-party now reliably enables the third party repositories, GNOME Software does not always show the packages from them at first. They show up if you force a refresh by refreshing updates on the Updates pane, or running `pkcon refresh`. They would probably also start showing up in a day or two if you did nothing. But if you just run through g-i-s, enable third party repos, then run Software and search for NVIDIA, sometimes it's not there.

kparal was able to reproduce this problem very often (more times than not). I reproduced it on 40-50% of my tries. Some other testers rarely hit it.

The easiest fix for this may be to make fedora-third-party run pkcon refresh after enabling repos; we don't know if that's technically the *correct* fix. This would likely require further SELinux permissions, so CCing Zdenek.

In the Go/No-Go meeting today - discussing this problem under the topic of https://bugzilla.redhat.com/show_bug.cgi?id=2010740 - it was accepted as a blocker:

https://meetbot-raw.fedoraproject.org/fedora-meeting/2021-10-21/f35-final-go_no_go-meeting.2021-10-21-17.00.html

so I'm going to transfer that status here, and re-close 2001837.

Comment 1 Owen Taylor 2021-10-21 19:52:07 UTC
The difficulty here is figuring out if the system is using PackageKit or not....

A system might not be running PackageKit:

 - Because it is an rpm-ostree system
 - Because it is a rpm-based desktop system without GNOME 
 - Because it is not a desktop system at all
 - Because it is a future system where we have switched to dnf5

fedora-third-party is potentially used in all of these, though mostly in the first case. I don't love using the presence of /usr/bin/pkcon to know if PackageKit is in use since that goes against the general "installed is not the same as enabled".

fedora-third-party could check if packagekit is running by:

  gdbus call --system -d org.freedesktop.DBus -o /org/freedesktop/DBus -m org.freedesktop.DBus.NameHasOwner org.freedesktop.PackageKit

That will return (true,) if packagekit is running.

(Activation of PackageKit on a GNOME system is decidedly accidental - every instance of gnome-shell - including the greeter and gnome-initial-setup sessions has an instance of EndSessionDialog, which in its constructor gets a proxy for PackageKit, causing it to be activated. Or so it seems to me, and Ray Strode read the code the same way.)

If fedora-third-party detects that packagekit is in use, it should use 'pkcon repo-enable/repo-disable' - I don't think that 'pkcon refresh force' is a good idea - on a pretty fast network, that takes 34 seconds, doing that is definitely going to increase the probability of a user getting to GNOME Software while packagekitd is still downloading data.

Comment 2 Owen Taylor 2021-10-22 03:23:19 UTC
I discussed this some more with Ray Strode and we looked at the source code for PackageKit and libdnf, and it really looks like this *is supposed* to be handled there - libdnf has a directory monitor on /etc/yum.repos.d, and emits a signal, but doesn't refresh the appstream data.

Furthermore, in testing 'pkgconf repo-enable' has the same problem - the appstream data isn't updated.

The algorithm for refresh is something like:
 * Loop over repositories
 * Call dnf_repo_check()
 * If that returns FALSE, refresh the repository

But a FALSE return means "encountered an error" - which could be "stale cached data" - but a TRUE return does *not* mean "the repository is unchanged" or anything like that.


I'm going to move this to PackageKit - I'll look (not right now) to see if I can figure out if there's some reasonably compact fix or workaround there, but help would be appreciated.

Comment 3 Milan Crha 2021-10-22 05:50:52 UTC
Could anyone give a try to this gnome-software scratch build, please?
https://koji.fedoraproject.org/koji/taskinfo?taskID=77613497

More info about it is here:
https://bugzilla.redhat.com/show_bug.cgi?id=2001837#c106

Comment 4 Owen Taylor 2021-10-22 14:36:47 UTC
As I understand it, that scratch build adds https://gitlab.gnome.org/GNOME/gnome-software/-/merge_requests/1024 - I'm skeptical that it's going to make a difference - I think it's essentially dealing with the same bug, but it looks like it will only fix it when the repository enablement happens *through* the repo dialog, while here it is happening before GNOME Software is even running.

I'm still confused a bit here about exactly what is happening on initial install - my testing has been with disabling/enabling repositories which is a bit more likely to exercise edge cases. I'll try to get to an exact sequence of events leaving us in the bad state.

Comment 5 Owen Taylor 2021-10-22 17:58:30 UTC
OK, I understand the particular bug that is going on inside PackageKit.

 The way that appstream data is donwloaded is that pk_backend_refresh_cache_thread() calls dnf_repo_check() (several times...) and if that fails, it calls dnf_repo_update(), which does the actual download, and then dnf_utils_refresh_repo_appstream() to copy the XML into place.

 But there's another codepath that could do the download  dnf_utils_create_sack_for_filters() => dnf_utils_add_remote() => dnf_sack_add_repos [in libdnf] => dnf_repo_update() 

If that gets called *before* the refresh, then the repo will already be up-to-date and dnf_utils_refresh_repo_appstream() will do nothing. I'm not 100% sure what the code paths are that cause this ordering - is it GNOME Shell checking for updates on reboots? Something GNOME Software is doing in parallel to triggering a refresh? But in any case, it doesn't really matter.

You can put things into the bad state with:

 pkcon disable rpmfusion-nonfree-steam
 Remove all steam files from /var/cache/PackageKit, /var/cache/app-info/xmls
 pkcon enable rpmfusion-nonfree-steam
 pkcon get-updates

At that point, the steam appdata will not be added into /var/cache/app-info/xmls unless a refresh is forced.

Comment 6 Owen Taylor 2021-10-22 18:43:16 UTC
Status: have a patch, working on a scratch build to test it.

Comment 7 Owen Taylor 2021-10-22 19:23:58 UTC
Scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=77672918
PackageKit pull request: https://github.com/PackageKit/PackageKit/pull/514

I've tested the patch to fix artificial reproductions and boot into a newly installed system (updating the system from chroot after install). The newly-installed-system reproduction is, I think, somewhat unreliable and racy, so it's *conceivable* that I just got lucky when I tested twice, but I don't think so. More testing appreciated, especially from those who can reproduce this reliably.

One thing to note is that when the user first logs into the newly created account:

 /var/cache/app-info/xmls

will start out empty and then fill in as packagekitd finishes downloading repo information. I don't think should cause problems even if the user runs Software immediately - but it could confuse testing if you are looking at that directory to see if things are working.

Comment 8 Adam Williamson 2021-10-22 23:47:02 UTC
Thanks Owen! I have openQA building a live image with the fix: https://openqa.stg.fedoraproject.org/tests/1416506 - when that job is finished, you should be able to see the live ISO for download on the Assets tab. I'll try it out when I can.

Comment 9 Adam Williamson 2021-10-23 18:45:35 UTC
I ran 10 tests with the live image from openQA, ten successes. Fix looks good to me. I'll get a PK build done.

Comment 10 Fedora Update System 2021-10-24 01:21:42 UTC
FEDORA-2021-72c037f1fc has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-72c037f1fc

Comment 11 Fedora Update System 2021-10-24 19:08:38 UTC
FEDORA-2021-72c037f1fc has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-72c037f1fc`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-72c037f1fc

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 12 Kamil Páral 2021-10-25 10:42:06 UTC
Created attachment 1836703 [details]
journal (steam not found, nvidia found)

(In reply to Adam Williamson from comment #8)
> Thanks Owen! I have openQA building a live image with the fix:
> https://openqa.stg.fedoraproject.org/tests/1416506 - when that job is
> finished, you should be able to see the live ISO for download on the Assets
> tab. I'll try it out when I can.

I ran 10 round of g-i-s -> enable 3rd party repos -> gnome-software -> search "steam" and "nvidia". In 9 rounds, everything was ok. In 1 round, however, I only could see nvidia, but no steam. `pkcon search` knew about both, but gnome-software didn't see steam. I'm attaching a system journal from that round, if you can find anything in there. It's worth noting that I haven't seen this situation before. Either both were available, or both were unavailable. So seeing just one and not the other is new. Either the patch isn't perfect and there's still a path which allows this to happen, or this might also be a different bug which got uncovered (and hopefully is rarer than the original problem).

Comment 13 Lukas Ruzicka 2021-10-25 10:56:22 UTC
I have run 5 rounds of this reproducer and always saw both packages available, nvidia and steam. Twice, I had to restart the virtual machine, because Software took ages to download the metadata. pkmon did not show anything and looked like being in a stand-by mode.

Comment 14 Milan Crha 2021-10-25 12:39:32 UTC
> I only could see nvidia, but no steam.

Is the corresponding appstram file for the steam repository present in /var/cache/app-info/xmls/?

> Twice, I had to restart the virtual machine, because Software took ages to download the metadata.

The next time try to get a backtrace of the gnome-software, to see whether it does anything. It can be blocked on some locks.

Comment 15 Owen Taylor 2021-10-25 14:13:34 UTC
Lukas Ruzicka write:
> Twice, I had to restart the virtual machine, because Software took ages to download the metadata. pkmon did not show anything and looked like being in a stand-by mode.

Did you see a progress bar with a message saying that it was downloading metadata from flathub? I saw a hang here once when testing - I investigated some, but I couldn't get further than seeing that ostree was supposed to be pulling the appstream but was making no progress. I couldn't determine if it was a server-side problem on flathub or a bug in ostree.

Kamil Páral wrote:
> In 1 round, however, I only could see nvidia, but no steam. `pkcon search` knew about both, but gnome-software didn't see steam. I'm attaching a system journal from that round, if you can find anything in there

Hmm, that's sort of disheartening. I don't see any error messages in the log that would obviously explain it. The log shows two simultaneous transactions that likely were "racing" to download the repository data:
 
Oct 25 12:24:32 localhost-live PackageKit[1091]: refresh-cache transaction /1_cabeaadc from uid 1000 finished with cancelled-priority after 9230ms
Oct 25 12:24:51 localhost-live PackageKit[1091]: resolve transaction /2_ccaeecec from uid 1000 finished with success after 18302ms

But it's not obviously clear to me how that would result in the appstream data not being written. As Milan said, it would be really useful to know whether the steam appstream data ended up in /var/cache/app-info/xmls or not. There's also a lot of complexity in the GNOME Software appstream code / xmlb watches and it's not hard to imagine that if multiple notifications came in at the same time things could get confused. The previous problem could clearly be tracked to PackageKit because the files were missing from /var/cache/app-info/xmls and GNOME Software has nothing to do with that.

Also, it would be useful to know if restarting GNOME Software makes things better.

Comment 16 Kamil Páral 2021-10-25 14:31:18 UTC
(In reply to Milan Crha from comment #14)
> Is the corresponding appstram file for the steam repository present in
> /var/cache/app-info/xmls/?

Unfortunately I failed to make a VM snapshot, so I don't have that environment anymore :-/ I'll try to reproduce it again.

Comment 17 Adam Williamson 2021-10-25 16:33:35 UTC
On Lukas' note "Twice, I had to restart the virtual machine, because Software took ages to download the metadata" - I think I saw this once in my trials, but it did *eventually* get there if I waited long enough (several minutes at least).

Comment 18 Kamil Páral 2021-10-25 17:13:45 UTC
After a couple of attempts, I happened to trigger "nvidia present, steam missing" issue again.

$ ls -l /var/cache/app-info/xmls/
total 36
-rw-r--r--. 1 root root    92 Oct 25 18:54 phracek-PyCharm.xml
-rw-r--r--. 1 root root 21326 Oct 25 18:54 rpmfusion-nonfree-nvidia-driver.xml
-rw-r--r--. 1 root root  5080 Oct 25 18:54 rpmfusion-nonfree-steam.xml

$ stat /var/cache/app-info/xmls/rpmfusion-nonfree-steam.xml 
  File: /var/cache/app-info/xmls/rpmfusion-nonfree-steam.xml
  Size: 5080      	Blocks: 16         IO Block: 4096   regular file
Device: 21h/33d	Inode: 155855      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fwupd_cache_t:s0
Access: 2021-10-25 18:54:23.613663123 +0200
Modify: 2021-10-25 18:54:23.613663123 +0200
Change: 2021-10-25 18:54:23.619663123 +0200
 Birth: 2021-10-25 18:54:23.613663123 +0200

$ stat /var/cache/app-info/xmls/rpmfusion-nonfree-nvidia-driver.xml 
  File: /var/cache/app-info/xmls/rpmfusion-nonfree-nvidia-driver.xml
  Size: 21326     	Blocks: 48         IO Block: 4096   regular file
Device: 21h/33d	Inode: 155803      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fwupd_cache_t:s0
Access: 2021-10-25 18:54:23.750663123 +0200
Modify: 2021-10-25 18:54:20.664663123 +0200
Change: 2021-10-25 18:54:20.666663123 +0200
 Birth: 2021-10-25 18:54:20.664663123 +0200

It seems that steam.xml was created 3 seconds after nvidia.xml?

If I use "gnome-software --quit" to terminate gnome-software, and then start it again, then it *does* show Steam. So a reboot/relog would be sufficient to make the missing app visible (which is Steam in my case, but could probably easily happen to be Nvidia instead). Force-checking for updates inside gnome-software also fixes the visibility.

This time I made a VM snapshot, so I can provide additional information if needed.

Comment 19 Kamil Páral 2021-10-25 17:14:49 UTC
Created attachment 1836940 [details]
journal when steam is missing (attempt 2)

Comment 20 Kamil Páral 2021-10-25 17:19:04 UTC
Created attachment 1836942 [details]
gnome-software --verbose output

I captured this gnome-software output, after seeing nvidia present and steam not present, then quitting gnome-software using `gnome-software --quit`, and then running `gnome-software --verbose` (that's this file). Perhaps it can uncover why steam.xml wasn't considered during the initial run.

I understand it would be more helpful to provide `gnome-software --verbose` output from its initial run after the very first login, but for that I'd have to modify the installed system without booting into it first (and then be lucky enough to again trigger this bug). I can try to do that, if people can't figure out what's wrong from the data already provided.

Comment 21 Owen Taylor 2021-10-25 21:52:42 UTC
I think this is very likely the GNOME Software problem that is fixed in the scratch build referenced in https://bugzilla.redhat.com/show_bug.cgi?id=2016510#c3 - the symptom of that (as I understand it) that after the xmlb cache of the appstream data is rebuilt once, further file triggers don't work correctly. So probably it was rebuilt once before the steam appstream xml appeared, and wasn't rebuilt when the steam appstream xml appeared. Does that sound right, Milan?

There's something else that I worry about with the xmlb cache revalidation - when we are rebuilding the cache, the sequence is:

 * Create a new XbSilo from the current content
 * Add watches

So if a change happens between those two steps, it will get lost, isn't that right? But that should be rarer (though not *extremely* rare).

Comment 22 Adam Williamson 2021-10-25 22:10:30 UTC
I think we can fix less-common cases as post-release updates, if this mostly works most of the time it's probably good enough...does anyone think the current state is unreleasable?

Comment 23 Michael Catanzaro 2021-10-25 23:59:23 UTC
(In reply to Adam Williamson from comment #22)
> I think we can fix less-common cases as post-release updates, if this mostly
> works most of the time it's probably good enough...does anyone think the
> current state is unreleasable?

I agree with Adam.

Comment 24 Owen Taylor 2021-10-26 03:23:40 UTC
> I think we can fix less-common cases as post-release updates, if this mostly works most of the time it's probably good enough...does anyone think the current state is unreleasable?

No, I think this should be OK to release with - (with fingers crossed) it shouldn't affect too many people and unlike what we fixed in PackageKit, it does clear with a reboot.

Comment 25 Milan Crha 2021-10-26 05:48:28 UTC
(In reply to Owen Taylor from comment #21)
> So probably it was rebuilt once before the steam appstream xml appeared, and wasn't rebuilt
> when the steam appstream xml appeared. Does that sound right, Milan?

That's my guess too, yes.

> There's something else that I worry about with the xmlb cache revalidation -
> when we are rebuilding the cache, the sequence is:
> 
>  * Create a new XbSilo from the current content
>  * Add watches
> 
> So if a change happens between those two steps, it will get lost, isn't that
> right? But that should be rarer (though not *extremely* rare).

This is an interesting idea. I opened a bug for it upstream:
https://gitlab.gnome.org/GNOME/gnome-software/-/issues/1517

Comment 26 Kamil Páral 2021-10-26 06:46:23 UTC
(In reply to Owen Taylor from comment #21)
> I think this is very likely the GNOME Software problem that is fixed in the
> scratch build referenced in
> https://bugzilla.redhat.com/show_bug.cgi?id=2016510#c3 - the symptom of that
> (as I understand it) that after the xmlb cache of the appstream data is
> rebuilt once, further file triggers don't work correctly. So probably it was
> rebuilt once before the steam appstream xml appeared, and wasn't rebuilt
> when the steam appstream xml appeared. Does that sound right, Milan?

I'm adding a CommonBugs entry, so that we can mention this remaining issue. I'm also happy to test the updated gnome-software, if we decide to squeeze it through the freeze, or we slip again.

Comment 27 Lukas Ruzicka 2021-10-26 06:59:01 UTC
(In reply to Owen Taylor from comment #15)

> Did you see a progress bar with a message saying that it was downloading
> metadata from flathub? I saw a hang here once when testing - I investigated
> some, but I couldn't get further than seeing that ostree was supposed to be
> pulling the appstream but was making no progress. I couldn't determine if it
> was a server-side problem on flathub or a bug in ostree.

This was exactly the point where it got stalled.

> 
> Also, it would be useful to know if restarting GNOME Software makes things
> better.

It did not. Restarting the Gnome-Software opened it at the very same position with the Flathub metadata being downloaded.
Restarting the whole system worked and after it rebooted the metadata seemed to have been downloaded.

Comment 28 Milan Crha 2021-10-26 08:49:02 UTC
(In reply to Kamil Páral from comment #26)
> I'm also happy to test the updated gnome-software, if we decide to squeeze
> it through the freeze, or we slip again.

For what it's worth, there's an official 41.1 release by the end of this week. The scratch build was meant as a test whether things improved. If not, additional fixes could be done before the 41.1 release.

Comment 29 Kamil Páral 2021-10-26 11:46:35 UTC
I did 10 new passes today with RC1.2 and in all cases, repos worked as expected (no app missing from gnome-software). So unfortunately I'm probably unable to say whether the latest gnome-software snapshot improves the race condition from comment 12, because even the base RC iso works perfectly for me today :-/

Comment 30 Owen Taylor 2021-10-26 13:28:45 UTC
(In reply to Lukas Ruzicka from comment #27)
> (In reply to Owen Taylor from comment #15)
> 
> > Did you see a progress bar with a message saying that it was downloading
> > metadata from flathub? I saw a hang here once when testing - I investigated
> > some, but I couldn't get further than seeing that ostree was supposed to be
> > pulling the appstream but was making no progress. I couldn't determine if it
> > was a server-side problem on flathub or a bug in ostree.
> 
> This was exactly the point where it got stalled.
> 
> > 
> > Also, it would be useful to know if restarting GNOME Software makes things
> > better.
> 
> It did not. Restarting the Gnome-Software opened it at the very same
> position with the Flathub metadata being downloaded.
> Restarting the whole system worked and after it rebooted the metadata seemed
> to have been downloaded.

Could you have closed and opened the window but left GNOME Software running in the background? Unless you use gnome-software --quit or kill the process, it's going to keep running.

Following links in GNOME gitlab, I came across https://gitlab.gnome.org/GNOME/gnome-software/-/issues/1511 - which in combination of my memory of the backtrace I think explains what is going on:

 - ostree and GNOME Software plugin threads are both using GTask
 - There's a hard limit of 10 GTask threads
 - 9 plugin tasks were stuck in a GNOME Software side lock waiting for the 10'th thread
 - The 10 plugin task was in ostree waiting for a task pool thread to free up, so the download could continue

The master issue for fixing that is https://gitlab.gnome.org/GNOME/gnome-software/-/issues/1472

Comment 31 Lukas Ruzicka 2021-10-26 14:19:01 UTC
> Could you have closed and opened the window but left GNOME Software running
> in the background? Unless you use gnome-software --quit or kill the process,
> it's going to keep running.

Yeah, I keep forgetting about this all the time. So, in fact I DID NOT restart the application, only closed and opened the window.

Comment 32 Lukas Ruzicka 2021-10-26 15:21:31 UTC
(In reply to Kamil Páral from comment #29)
> I did 10 new passes today with RC1.2 and in all cases, repos worked as
> expected (no app missing from gnome-software). So unfortunately I'm probably
> unable to say whether the latest gnome-software snapshot improves the race
> condition from comment 12, because even the base RC iso works perfectly for
> me today :-/


Also ran 10 attempts. 

About 4 times, I had to restart the application using `gnome-software --quit` and opening it again. 
Steam and Nvidia packages were available 10 times out of 10, but:

During one attempt (the 8th one), when I started to look for the Steam package, the application reported "No packages found." I retyped the word `steam` again and hit enter and yet, I got the same result. Then, I typed `nvidia` and all nvidia packages were shows as available. Then I typed `steam` again and this time, steam package was available, too. I did not do anything else.

So after all, maybe there is some race condition which might get resolved later, when the application has been used for some time.

Comment 33 Fedora Update System 2021-10-28 18:28:07 UTC
FEDORA-2021-72c037f1fc has been pushed to the Fedora 35 stable repository.
If problem still persists, please make note of it in this bug report.