Created attachment 1816995
/var/log tarball from a failed test run
In Fedora-35-20210823.n.0, gnome-initial-setup-41~beta-1.fc35 landed, bringing back the "Software" page that stopped working several releases ago - see https://gitlab.gnome.org/GNOME/gnome-initial-setup/-/merge_requests/121 upstream. However, if we do a clean install and boot the installed system, when we reach that page and click the Next button, g-i-s just seems to hang with the button in 'activated' state. After waiting several minutes, it's still in that state. No tty is accessible so the system is unusable at this point.
Note that to get even that far in a local VM, I have to boot with `enforcing=0` (SELinux seems to be denying some things, which otherwise prevents g-i-s from reaching that point). openQA seems to reach the hang without needing to go to permissive mode; I'm not sure why (it may be because in openQA we create a root account in the installed system after installing but before rebooting).
I can't immediately see what the cause of this is. I'll attach a tarball of /var/log from an affected openQA test.
Proposing as a Beta blocker per Basic criterion "A system installed with a release-blocking desktop must boot to a log in screen where it is possible to log in to a working desktop using a user account created during installation or a 'first boot' utility" - we don't get to a working desktop.
It's easy to reproduce - get https://kojipkgs.fedoraproject.org/compose/branched/Fedora-35-20210823.n.0/compose/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-35-20210823.n.0.iso , run an install, then just boot the installed system and follow through g-i-s till it hangs. If it doesn't even boot to the first screen of g-i-s, boot with `enforcing=0` (I will file separate bugs on the SELinux denials).
Hm, gnome-initial-setup assumes that fedora-third-party finishes instantaneously. I can try reworking it to use a spinner until the command finishes, which would avoid a UI hang, but of course that won't fix the openQA test.
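To illustrate the "assumes it finishes instantaneously" problem, here is a minimal Python sketch of running a helper with a deadline so the caller can tell a hung helper apart from one that returned. This is not g-i-s code (g-i-s is not written in Python) and it is not the spinner rework itself; `run_with_deadline` and the stand-in commands are hypothetical names chosen for the example.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it did not finish in time."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        # wait() raises TimeoutExpired instead of blocking forever.
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The helper is stuck (for example, waiting on a prompt nobody sees).
        proc.kill()
        proc.wait()  # reap the killed child
        return None


if __name__ == "__main__":
    # Stand-in for a helper that blocks; NOT the real fedora-third-party call.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    result = run_with_deadline(stuck, 0.5)
    print("helper timed out" if result is None else f"helper exited with {result}")
```

A caller that gets `None` back could show an error or a retry option instead of leaving the Next button stuck in its activated state.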
Would it be possible to run 'sudo fedora-third-party' in a tty to see why it's getting stuck?
I couldn't get into a tty when I first tested. I'll try it again tomorrow...
I can confirm this is also happening for me. I cannot log in on the VM tty, because the liveuser user no longer seems to work and no new user has been created, so there is no way to log in.
You can set a root password and/or create a user from the installed system root after install but before rebooting (as we do in openQA), I will try that today. There's also the systemd debug shell, but for some reason that didn't work when I tried it yesterday; I'll try that again today as well.
(In reply to Adam Williamson from comment #4)
> You can set a root password
That's probably best. I'm going to try soon, probably later today since this is pretty urgent, and see what I find.
> and/or create a user from the installed system
> root after install but before rebooting (as we do in openQA),
Huh, doesn't that prevent gnome-initial-setup from running at all? It should.
> I will try
> that today. There's also the systemd debug shell, but for some reason that
> didn't work when I tried it yesterday; I'll try that again today as well.
I can't use that because it's going to be qwerty only and it's just too hard to guess which key corresponds to which letter.
> Huh, doesn't that prevent gnome-initial-setup from running at all? It should.
Oh, yeah, I guess creating a user would. Wasn't thinking. :D
> I can't use that because it's going to be qwerty only and it's just too hard to guess which key corresponds to which letter.
I'm not actually sure it necessarily is. We need to be able to set a correct keymap much earlier to decrypt encrypted partitions, so theoretically the rescue shell could use the correct layout. I don't think I've ever tested to see if it *does*, though.
OK, so now I have the rescue shell working. ps aux shows `pkexec --user root /usr/bin/fedora-third-party disable` running, and `/usr/lib/polkit-1/polkit-agent-helper-1 root`. Perhaps there's an issue with policykit expecting user interaction or something?
Attaching strace to the fedora-third-party process shows it sitting at:
restart_syscall(<... resuming interrupted read ...>
and stracing the polkit-agent-helper-1 process shows:
so it definitely looks like they're just sort of sitting around waiting for...something...that isn't happening. Journal doesn't show anything interesting from polkit.
Running either 'fedora-third-party disable' or 'pkexec --user root /usr/bin/fedora-third-party disable' from the root console directly returns immediately.
Oh, the problem definitely *is* that we're waiting for authentication. I killed the fedora-third-party process from the debug shell. That caused tty1 (where g-i-s was running) to go to the "Oh no! Something went wrong" screen. I then alt-f4'ed the "Oh no" screen, and found an authentication prompt hiding "behind" it:
==== AUTHENTICATING FOR org.fedoraproject.thirdparty.run ====
Authentication is required to configure software repositories
Authenticating as: root
so that definitely appears to be the issue.
Looks like adding `org.fedoraproject.thirdparty` to the things allowed in the g-i-s policykit policy fixes this; I tested by editing it live. kalev has run a scratch build with the same change, and I'll run that through openQA to confirm the fix there.
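For anyone following along, the effect of that change can be pictured as a prefix allow-list for actions requested by the initial-setup session. The sketch below is a Python model of that idea only, not the actual polkit rules file (real polkit rules are JavaScript). The user name `gnome-initial-setup`, the function `decide`, and the placeholder prefix `org.example.placeholder.` are assumptions made for illustration; only `org.fedoraproject.thirdparty` and the action id `org.fedoraproject.thirdparty.run` come from this bug.

```python
# Hypothetical model of a prefix-based allow-list; not the real rules file.
PREFIXES_BEFORE = ("org.example.placeholder.",)  # placeholder entry, assumed
PREFIXES_AFTER = PREFIXES_BEFORE + ("org.fedoraproject.thirdparty.",)


def decide(action_id, user, prefixes):
    """Return "yes" to allow without a prompt, or None to defer.

    Deferring means the default policy applies, which in this bug led to
    an authentication prompt that was hidden behind the g-i-s window.
    """
    if user != "gnome-initial-setup":  # assumed session user name
        return None
    if any(action_id.startswith(prefix) for prefix in prefixes):
        return "yes"
    return None


if __name__ == "__main__":
    action = "org.fedoraproject.thirdparty.run"
    print("before:", decide(action, "gnome-initial-setup", PREFIXES_BEFORE))
    print("after: ", decide(action, "gnome-initial-setup", PREFIXES_AFTER))
```

With the extra prefix the model returns "yes" for the third-party action, which corresponds to pkexec no longer needing to prompt.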
Should hopefully be fixed in gnome-initial-setup-41~beta-1.fc35.1
FEDORA-2021-41d8b36cd2 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-41d8b36cd2
+3 in https://pagure.io/fedora-qa/blocker-review/issue/400 , marking accepted.
FEDORA-2021-41d8b36cd2 has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make note of it in this bug report.