In Fedora-25-20160921.n.0, you cannot log in to a system enrolled into a FreeIPA domain via kickstart as a user account configured in the domain (and permitted to log into the system). This is an openQA test failure:

https://openqa.fedoraproject.org/tests/35324

We have three tests that enrol into (the same) FreeIPA domain in various ways: one enrols via a kickstart, one enrols by running 'realm join' from the installed system, and one enrols from Cockpit from the installed system. They all run through the same post-enrolment verification steps. We run the same tests on two openQA instances (prod and staging). On both instances, the two post-install enrolment tests passed and the kickstart enrolment test failed, so this seems reproducible; it's probably not some kind of weird one-off.

The same test passed for the three previous testable composes: 20160910.n.0, 20160911.n.0 and 20160912.n.0. (We've had some compose issues lately and haven't had a testable F25 compose between 0912 and today.) The same test passes on today's Rawhide compose; it's only failing on F25.

The tests create a user account called 'test1' on the server and grant it access to do everything, everywhere:

ipa hbacrule-add testrule --servicecat=all --hostcat=all
ipa hbacrule-add-user testrule --users=test1

The kickstart command for enrolling the client system in the domain looks like this:

realm join --one-time-password=monkeys ipa001.domain.local

You can find logs from the test here:

https://openqa.fedoraproject.org/tests/35324/file/freeipa_client_postinstall-var_log.tar.gz

I do notice something in the anaconda program.log:

10:40:58,837 INFO program: Unable to find 'admin' user with 'getent passwd admin'!
10:40:58,837 INFO program: Unable to reliably detect configuration. Check NSS setup manually.

I don't know for sure whether that's unexpected or new, though (I don't know whether the same log message appears when the test passes).
Note that an earlier step in the post-enrolment checks runs nearly the same command (`getent passwd admin`) on the installed system, and that *does* work:

https://openqa.fedoraproject.org/tests/35324#step/freeipa_client_postinstall/7

So somehow either it works in the post-install environment but not during install, or it works with upper-case DOMAIN.LOCAL but not lower-case domain.local, I guess.
Proposing as a Beta blocker per Alpha criterion "It must be possible to join the system to a FreeIPA or Active Directory domain at install time and post-install, and the system must respect the identity, authentication and access control configuration provided by the domain." This is the 'at install time' case, since kickstart is really the only way to enrol at install time.
Adam, the sssd logs are empty. Can we see the same test with debug_level=10 added to all sections of the sssd.conf file? Can we also see the nsswitch.conf file from the system?
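One way to apply the debug_level=10 change to every section of sssd.conf when preparing a test image could look like the sketch below. It is a hypothetical helper (the name set_debug_level is made up for illustration), not an SSSD tool. It assumes the usual INI-style sssd.conf layout with each [section] header on its own line, and it edits the text line by line so existing comments and ordering survive.

```python
import re

# A section header such as "[sssd]" or "[domain/domain.local]" on its own line.
SECTION_RE = re.compile(r"^\s*\[[^\]]+\]\s*$")
# Any existing debug_level setting, whatever its current value.
DEBUG_RE = re.compile(r"^\s*debug_level\s*=")


def set_debug_level(conf_text: str, level: int = 10) -> str:
    """Return conf_text with 'debug_level = <level>' set in every [section].

    Hypothetical helper: existing debug_level lines are dropped and a fresh
    one is inserted directly after each section header.
    """
    out = []
    for line in conf_text.splitlines():
        if DEBUG_RE.match(line):
            # Skip the old value; the new one is added after the header below.
            continue
        out.append(line)
        if SECTION_RE.match(line):
            out.append(f"debug_level = {level}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "[sssd]\n"
        "services = nss, pam\n"
        "domains = domain.local\n"
        "\n"
        "[domain/domain.local]\n"
        "id_provider = ipa\n"
        "debug_level = 2\n"
    )
    print(set_debug_level(sample))
```

On the client, one might read /etc/sssd/sssd.conf, pass its contents through this function, write the result back in place (sssd is strict about that file's ownership and permissions, so rewriting the existing file is safer than creating a new one), and run 'systemctl restart sssd' before repeating the login test.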
var/log/messages says:

Sep 21 10:43:06 client001 audit: USER_AUTH pid=1331 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:local_login_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=? acct="test1" exe="/usr/bin/login" hostname=? addr=? terminal=tty3 res=failed'
Sep 21 10:43:06 client001 audit: USER_LOGIN pid=1331 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:local_login_t:s0-s0:c0.c1023 msg='op=login id=1778200001 exe="/usr/bin/login" hostname=? addr=? terminal=tty3 res=failed'
Sep 21 10:43:06 client001 [sssd[krb5_child[1348]]]: Preauthentication failed

and var/log/secure contains:

Sep 21 10:43:06 client001 login: pam_sss(login:auth): authentication failure; logname=LOGIN uid=0 euid=0 tty=tty3 ruser= rhost= user=test1
Sep 21 10:43:06 client001 login: pam_sss(login:auth): received for user test1: 17 (Failure setting user credentials)
Sep 21 10:43:06 client001 login: FAILED LOGIN 1 FROM tty3 FOR test1, Authentication failure

+1 for log files with debug_level=10.
hmm, this passed again on the last two days. I'll keep an eye on it for the next few days and if it doesn't happen again I guess we can chalk it up to weird magic pixies or something...
(In reply to Adam Williamson from comment #4)
> hmm, this passed again on the last two days. I'll keep an eye on it for the
> next few days and if it doesn't happen again I guess we can chalk it up to
> weird magic pixies or something...

Then please remove beta blocker for this BZ.
It's only a proposed blocker at this point. But sure.
Well, this did fail one more time since 0921.n.0, in one test run of 0924.n.0 on staging: https://openqa.stg.fedoraproject.org/tests/45008 but on the other hand it passed on another test run of the same image (I've re-run tests several times on staging lately for unrelated reasons): https://openqa.stg.fedoraproject.org/tests/45530 so this seems like some kind of very intermittent issue that can't be reproduced reliably...debugging it may be hard. But it definitely *is* happening, and I definitely *haven't* seen it happen ever with the post-install enrol tests...
(In reply to Adam Williamson from comment #7)
> so this seems like some kind of very intermittent issue that can't be
> reproduced reliably...debugging it may be hard. But it definitely *is*
> happening, and I definitely *haven't* seen it happen ever with the
> post-install enrol tests...

And we *definitely* need to see the verbose sssd log files requested in comment 2.
it's a bit tricky to get them, since I can't reproduce the bug on demand. I'll have to modify things so I can run the test over and over. So it'll take a bit.
Bump: any progress with a reproducer after 3 weeks?
Haven't had time, there were too many more urgent Beta issues to deal with. The bug has happened again once more on F25: https://openqa.fedoraproject.org/tests/39078 and on Rawhide, on staging: https://openqa.stg.fedoraproject.org/tests/46707 There was also another odd F25 failure a few days later: https://openqa.fedoraproject.org/tests/39852 where it failed even earlier in the verification steps, when it ran 'realm list' - no realm was shown. But I've got a big backlog of stuff to work on ATM so I don't know when I'm gonna make it to this...
Thank you very much for the update. The data requested in comment 2 is still required for analysis.
The requested data has not been provided in almost two months. Feel free to reopen with the requested data attached.
well, openQA tests Rawhide and Branched. there is no Branched ATM, and Rawhide can never reach this point, because FreeIPA is broken long before this; it is impossible to deploy the server.
(In reply to Adam Williamson from comment #14)
> well, openQA tests Rawhide and Branched. there is no Branched ATM, and
> Rawhide can never reach this point, because FreeIPA is broken long before
> this; it is impossible to deploy the server.

Maybe it would be simpler to prepare a job for F25/F24 rather than waiting for a fixed FreeIPA. And if the main problem is BZ1387425, please try to get the priority of that ticket raised, with an explanation, because based on the upstream ticket the FreeIPA team does not plan to fix it soon. Or maybe they can at least provide a workaround.
Well, no, it's not simpler. I'll spare you the long explanation, but it really isn't. =) Basically the function of openQA at present is release validation testing. There is no 'release validation testing' for f24 or f25 because they're already released. There are no nightly candidates to test. We could in theory have openQA do useful testing of stable releases, but conceptualizing that and setting it up is a whole big job of work, it's not a 'just twiddle this one setting here' kind of thing. The current immediate cause of server deployment failures on Rawhide is https://bugzilla.redhat.com/show_bug.cgi?id=1403352 , but of course there may well be other problems once that one's fixed.
I don't have anything that looks like a case of this in the last 11 months, FWIW...so it probably went away at some point.