This is another consequence of https://bugzilla.redhat.com/show_bug.cgi?id=1644919 , but I'm filing it separately as we could potentially resolve it without actually making the operation faster. As discussed in that bug, an expensive SELinux operation is commonly run on first login to a system as a domain user. The amount of time it takes seems to vary quite a lot across different configurations, but in the openQA aarch64 tests, it seems to be taking *90* seconds: (Thu Dec 20 05:49:32 2018) [sssd[be[domain.local]]] [selinux_child_create_buffer] (0x4000): buffer size: 43 (Thu Dec 20 05:49:32 2018) [sssd[be[domain.local]]] [child_handler_setup] (0x2000): Setting up signal handler up for pid [23892] (Thu Dec 20 05:49:32 2018) [sssd[be[domain.local]]] [child_handler_setup] (0x2000): Signal handler set up for pid [23892] (Thu Dec 20 05:49:32 2018) [sssd[be[domain.local]]] [sdap_process_result] (0x2000): Trace: sh[0xaaaad0944d60], connected[1], ops[(nil)], ldap[0xaaaad095d4e0] (Thu Dec 20 05:49:32 2018) [sssd[be[domain.local]]] [sdap_process_result] (0x2000): Trace: end of ldap_result list (Thu Dec 20 05:49:32 2018) [sssd[be[domain.local]]] [write_pipe_handler] (0x0400): All data has been sent! (Thu Dec 20 05:51:02 2018) [sssd[be[domain.local]]] [child_sig_handler] (0x1000): Waiting for child [23892]. (Thu Dec 20 05:51:02 2018) [sssd[be[domain.local]]] [child_sig_handler] (0x0100): child [23892] finished successfully. and because it takes *so* long, we actually hit a timeout during login. This doesn't seem to be the systemd timeout - it's more that getty just suddenly exits, and systemd sees that and restarts it: Dec 20 02:50:30 client003.domain.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Dec 20 02:50:30 client003.domain.local systemd[1]: getty: Service has no hold-off time (RestartSec=0), scheduling restart. Dec 20 02:50:30 client003.domain.local systemd[1]: getty: Scheduled restart job, restart counter is at 1. Dec 20 02:50:30 client003.domain.local systemd[1]: Stopped Getty on tty2. Looking into this, it seems that getty calls 'login' after reading the username - per https://unix.stackexchange.com/questions/144303/change-tty-login-timeout-archlinux - and login has a default 60 second timeout. From 'man login': CONFIG FILE ITEMS login reads the /etc/login.defs(5) configuration file. Note that the configuration file could be distributed with another package (e.g. shadow-utils). The following configuration items are relevant for login(1): ... LOGIN_TIMEOUT (number) Max time in seconds for login. The default value is 60. I'm filing this as a bug against sssd, and not against util-linux, because 60 really does seem like a *reasonable* default, login certainly shouldn't take longer than that. So I don't think we should change the default distro-wide. However, we could potentially make it 120 or something for systems enrolled in a domain; or we could (possibly? I'm not sure if this is technically possible) tweak the implementation in sssd such that login knows login has 'succeeded' before the expensive SELinux operation happens. For openQA, I guess I'll tweak the test to set a higher LOGIN_TIMEOUT value in /etc/login.defs after enrolling the system and before attempting a login...
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to 31.
For the record, this is still a problem. It's still taking about 90 seconds on aarch64.
I do have the LOGIN_TIMEOUT workaround in the openQA tests, btw, but they seem to have started failing again with an error "Authentication service cannot retrieve authentication info", which I think is different from before. I'll have to figure out where that error is coming from and see if that thing has a configurable timeout.
Welp, that's a PAM error - PAM_AUTHINFO_UNAVAIL - and it's probably coming from the SSSD PAM client, I guess...but we already bumped the sbus timeout to 120 seconds, and I don't know what other timeout we might be hitting here :/ I'll attach a current /var/log tarball from an affected aarch64 test.
Created attachment 1614953 [details] /var/log tarball from a recent aarch64 failure
Hi Adam, to which value did you set LOGIN_TIMEOUT? According to /var/log/secure Sep 11 18:20:14 localhost login[24669]: pam_sss(login:auth): authentication success; logname=LOGIN uid=0 euid=0 tty=tty2 ruser= rhost= user=test1 Sep 11 18:21:42 localhost login[24669]: Authentication service cannot retrieve authentication info The error happens 1min and 28s after the successful authentication. SSSD's selinux_child needed a bit longer: Sep 11 18:20:14 localhost login[24669]: pam_sss(login:auth): authentication success; logname=LOGIN uid=0 euid=0 tty=tty2 ruser= rhost= user=test1 Sep 11 18:21:42 localhost login[24669]: Authentication service cannot retrieve authentication info So my guess is that LOGIN_TIMEOUT is somewhat around 90s? bye, Sumit
No, it's 120. And IIRC, that is not the error we get when we hit that timeout (which is what was happening the first time I encountered this bug), we instead get 'login failed' or something like it.
This package has changed maintainer in the Fedora. Reassigning to the new maintainer of this component.
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.