Bug 2034336 - On minimal installs, login fails immediately after username entered
Summary: On minimal installs, login fails immediately after username entered
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: authselect
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Pavel Březina
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Duplicates: 2033952
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs F36BetaBlocker 2019052
 
Reported: 2021-12-20 17:47 UTC by Adam Williamson
Modified: 2022-01-12 16:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-12 06:23:23 UTC
Type: Bug
Embargoed:


Attachments
anaconda logs (gathered by Frantisek) (234.09 KB, application/octet-stream)
2021-12-20 17:52 UTC, Adam Williamson
anaconda logs (Fedora Server, compose from December 17th) (276.44 KB, application/x-xz)
2021-12-20 19:05 UTC, František Zatloukal

Description Adam Williamson 2021-12-20 17:47:23 UTC
Since Fedora-Rawhide-20211215.n.0, all openQA tests that do a minimal install have been failing at login after the install. When a valid username is entered, the message "Login incorrect" appears immediately and the username prompt reappears.

This does not happen on installs with the Server package set, and it does not happen on a minimal install of F35 upgraded to Rawhide. It only happens on fresh minimal installs.

Miroslav Vadkerti has reported the same on IRC, and Ondrej Mosnacek reported what's likely the same problem when deploying a Cloud image on devel@:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/5RGSKLLKWYKJI7D2KLLJDVV6FXK2G342/

I suspect what's happening is anaconda's attempt to do initial authentication configuration on the installed system is now failing somehow. So, CCing anaconda folks. Also CCing zbyszek because he was in on initial discussions about this, when we didn't notice authselect had been updated and systemd was our main suspect.

I'm on vacation currently, so Frantisek has been taking point on this, but I figured that now that multiple people have tripped over it, we should at least have a bug report.

This is a clear violation of Basic criterion "A system installed without a graphical package set must boot to a state where it is possible to log in through at least one of the default virtual consoles" for the release-blocking minimal install package set, so nominating as a Beta blocker.

Comment 1 Adam Williamson 2021-12-20 17:52:08 UTC
Created attachment 1847093 [details]
anaconda logs (gathered by Frantisek)

Here are the anaconda logs from an affected install, gathered by Frantisek. I found this line from `syslog` kinda suspicious:

20:24:46,846 WARNING org.fedoraproject.Anaconda.Modules.Security:DEBUG:anaconda.modules.security.installation:Authselect is not configured. Skipping.


it'd be interesting to compare that to the logs from a working (e.g. Server) install, I think.

Comment 2 František Zatloukal 2021-12-20 19:05:20 UTC
Created attachment 1847096 [details]
anaconda logs (Fedora Server, compose from December 17th)

Comment 3 František Zatloukal 2021-12-20 19:07:43 UTC
(In reply to Adam Williamson from comment #1)
> Created attachment 1847093 [details]
> anaconda logs (gathered by Frantisek)
> 
> Here are the anaconda logs from an affected install, gathered by Frantisek.
> I found this line from `syslog` kinda suspicious:
> 
> 20:24:46,846 WARNING
> org.fedoraproject.Anaconda.Modules.Security:DEBUG:anaconda.modules.security.
> installation:Authselect is not configured. Skipping.
> 
> 
> it'd be interesting to compare that to the logs from a working (e.g. Server)
> install, I think.

Hmm, the problem has to be elsewhere. I just tried Fedora Server (compose from the 17th of December), and it has the same line in its logs:

18:50:24,838 WARNING org.fedoraproject.Anaconda.Modules.Security:DEBUG:anaconda.modules.security.installation:Authselect is not configured. Skipping.

However, it doesn't suffer from this issue.

I've added complete logs from Server installation as another attachment.

Comment 4 François Rigault 2021-12-20 19:42:07 UTC
*** Bug 2033952 has been marked as a duplicate of this bug. ***

Comment 5 Adam Williamson 2021-12-20 19:56:26 UTC
In the duplicate, François says the problem may be that the authselect package is simply not installed in the deployed system. Frantisek, can you check that?

Comment 6 François Rigault 2021-12-20 20:20:39 UTC
the authselect package is installed but I had to run "authselect select minimal --force" myself (through cloud-init) to make this work. Thanks for your help following this up.

Comment 7 Miroslav Vadkerti 2021-12-20 20:52:43 UTC
Thanks for the investigation. This is blocking update of the image in Fedora CI, we will for now use a non-affected image from 12th December: https://status.testing-farm.io/issues/2021-12-20-rawhide-outage/

Comment 8 František Zatloukal 2021-12-20 21:30:45 UTC
(In reply to François Rigault from comment #6)
> the authselect package is installed but I had to run "authselect select
> minimal --force" myself (through cloud-init) to make this work. Thanks for
> your help following this up.

Yeah, authselect is present even on borked installations.

Since the package rebase to 1.3.0 ( https://src.fedoraproject.org/rpms/authselect/c/f16f0317f49c99c849413353fb71228b9e1242b1?branch=rawhide ), authselect is doing:

%{_bindir}/authselect select %{default_profile} --force

under conditions that differ between a new installation and an upgrade.

I'll try to think about this more tomorrow, but if I had to guess, this would be the place where I'd bet the problem is.

Comment 9 František Zatloukal 2021-12-21 00:12:07 UTC
Okay, I've created a copr repo with an updated authselect scriptlet: https://copr.fedorainfracloud.org/coprs/frantisekz/authselect-test/

The change to authselect.spec is a naive one, made just to verify my hypothesis:

@@ -327,6 +327,8 @@ if [ -f %{forcefile} ]; then
     %__rm -f %{forcefile}
 fi

+%{_bindir}/authselect select %{default_profile} --force &> /dev/null
+
 # Apply any changes to profiles (validates configuration first internally)
 %{_bindir}/authselect apply-changes &> /dev/null


Adding that copr during a netinst installation process fixes the issue.

So, it seems the problem is:
https://src.fedoraproject.org/rpms/authselect/blob/rawhide/f/authselect.spec#_281-284

and/or

https://src.fedoraproject.org/rpms/authselect/blob/rawhide/f/authselect.spec#_325-328

I'll try to figure out some proper solution.

Comment 10 Adam Williamson 2021-12-21 00:23:31 UTC
Aha, well, then I rather suspect this shows us the problem:

21:22:21,107 INF dnf.rpm: /var/tmp/rpm-tmp.NEXkYx: line 2: /usr/bin/rm: No such file or directory
/var/tmp/rpm-tmp.NEXkYx: line 4: touch: command not found

I bet that's the `rm` and `touch` commands that the authselect-libs %pre script uses to try and create the "forcefile" failing:

# Check if this is a new installation.
%__rm -f %{forcefile}
if [ $1 -eq 1 ] ; then
    touch %{forcefile}
fi

I guess it's installed before coreutils. I think this may be the kind of situation where using lua would be safer?
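
To illustrate the failure mode, here is a minimal sketch, not the actual scriptlet. It assumes a stand-in script that takes the marker path as $1 and the install count as $2 (RPM passes 1 to %pre on a fresh install; the two-argument convention is invented for the demo), and it simulates "coreutils not unpacked yet" by pointing PATH at an empty directory:

```shell
#!/bin/sh
# Stand-in for the %pre body quoted above. The marker path ($1) and the
# install count ($2) are a hypothetical calling convention for this demo.
scriptlet='
rm -f "$1"
if [ "$2" -eq 1 ]; then
    touch "$1"
fi
'

workdir=$(mktemp -d)
mkdir "$workdir/emptybin"
shell=$(command -v sh)

# run_pre PATH_VALUE MARKER: run the stand-in as a fresh install ($2 = 1)
# with the given PATH, then report whether the marker file exists.
run_pre() {
    # Failures are expected when PATH has no rm/touch, so ignore the status.
    PATH="$1" "$shell" -c "$scriptlet" pre "$2" 1 2>/dev/null || true
    if [ -e "$2" ]; then echo created; else echo missing; fi
}

# Empty PATH directory: rm and touch cannot be found, as in the log above.
marker_without=$(run_pre "$workdir/emptybin" "$workdir/marker-a")
# Normal PATH: coreutils is available and the marker is written.
marker_with=$(run_pre "$PATH" "$workdir/marker-b")

rm -rf "$workdir"
echo "without coreutils: marker $marker_without"
echo "with coreutils:    marker $marker_with"
```

If the marker is never created, any later scriptlet that tests for it would presumably behave as though this were not a new installation, which would fit the symptom that no profile gets selected on fresh installs.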

Comment 11 Adam Williamson 2021-12-21 07:08:06 UTC
Frantisek sent a PR to just add `Requires(pre): coreutils`:

https://src.fedoraproject.org/rpms/authselect/pull-request/13

Comment 12 Richard W.M. Jones 2021-12-28 11:44:17 UTC
(Adding to virt-sysprep tracker, assuming this is not actually a bug
in virt-sysprep)

Comment 13 Adam Williamson 2021-12-28 17:09:01 UTC
Note, I actually merged František's alternative PR that rewrote the scriptlet to Lua:

https://src.fedoraproject.org/rpms/authselect/pull-request/14

but Rawhide composes are now failing on another pair of bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=2034715
https://bugzilla.redhat.com/show_bug.cgi?id=2035812

so until that's sorted out, we can't tell if the fix worked for sure.

Comment 14 Pavel Březina 2022-01-10 10:58:00 UTC
Hi, I was on PTO. Thank you for looking into it.

Comment 15 Adam Williamson 2022-01-12 06:23:23 UTC
Confirmed fixed in Fedora-Rawhide-20220111.n.1.

Comment 16 Pavel Březina 2022-01-12 11:32:57 UTC
Just a question: why is it better to use Lua instead of Requires(pre): coreutils? coreutils is required anyway by the later scriptlets as well.

Comment 17 František Zatloukal 2022-01-12 12:23:01 UTC
I guess Adam will chime in later on too.

From my point of view, the other scriptlets are not much of an issue, because they are executed at the end of the transaction. The problem is with "pre" scriptlets, especially in a package like authselect, which is present on base installations and required by the base chroot.

Adding Requires(pre) solved the issue just fine. However, future changes to the dependency chain could result in coreutils depending (directly or indirectly) on authselect, and then it would bite us again. The main idea is that a smaller dependency chain is better, because it is more resistant to breaking when something changes later.

Comment 18 Adam Williamson 2022-01-12 16:30:20 UTC
Right, the thing that worries me is the dependency loop problem. We already actually have several dependency loops in core packages, and they have caused difficult-to-debug breakages in the past.

The scenario that would be a problem would be if coreutils somehow wound up depending on authselect (not just directly, but via any of its own dependencies). If that happened, libdnf would have an unsolvable problem: authselect wants it to install coreutils before authselect itself, but the deps of coreutils include authselect, which would mean it should install authselect before coreutils. There is no correct solution to that problem (or others of a similar nature).

When it runs into loops of that nature, dnf *essentially* just guesses an order. It's not a pure guess - if nothing in the calculation changes, dnf will always resolve it the same way - but if some part of the calculation changed somehow, dnf might suddenly flip and decide to resolve it the other way. If one order happens to work out OK but the other doesn't, that can mean that some not-obviously-related change suddenly causes a loop to be resolved differently from the way it was before, and suddenly something that was working before is broken.
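
The ordering problem can be sketched with tsort from coreutils, which sorts "A must come before B" pairs. This is only an illustration of install ordering as a topological sort, not how libdnf works internally, and "some-lib" is an invented package standing in for a hypothetical future dependency of coreutils:

```shell
#!/bin/sh
# Each input line "A B" tells tsort that A must be installed before B.

# No loop: authselect-libs needs coreutils first, and a valid order exists.
acyclic_order=$(printf '%s\n' 'coreutils authselect-libs' | tsort)

# Hypothetical loop: "some-lib" is an invented package that coreutils
# needs and that itself needs authselect-libs.
if printf '%s\n' \
    'coreutils authselect-libs' \
    'authselect-libs some-lib' \
    'some-lib coreutils' | tsort >/dev/null 2>&1
then
    loop_result=ordered
else
    loop_result=loop
fi

# Unquoted on purpose: joins tsort's one-name-per-line output with spaces.
echo "acyclic order: $(echo $acyclic_order)"
echo "with the invented dependency: $loop_result"
```

GNU tsort reports the loop and exits non-zero, whereas a package manager has to pick some order and carry on. That is why a scriptlet that needs nothing outside RPM itself sidesteps the question.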

As I said, we've run into this problem more than once, and it's never fun to work out. So now I'm kinda sensitive to situations where we could conceivably run into it again, and try to avoid them!

As Frantisek said, other scriptlets are less of a problem due to *when* they'd run. One is a %preun script (which runs on package removal), so it doesn't really come into this 'initial deployment ordering' situation at all. The other is a %posttrans script, which of course runs after *all* packages have been installed, so again, ordering isn't a big deal.

