In the Fedora-Rawhide-20190604.n.0 compose, the openQA 'upgrade a FreeIPA server and client from previous stable release' tests started failing. What these linked tests do is this:

1. Server test deploys a Fedora 30 FreeIPA server (with DNS server enabled)
2. Client test (another Fedora 30 system) enrols to the domain
3. Server test upgrades to Rawhide, checks ipa.service is running, signals the client to upgrade
4. Client test upgrades, checks client functionality works

Throughout, the client uses the server as its DNS server.

What seems to be happening is that, at step 4, the client upgrade fails, because the client suddenly cannot resolve kojipkgs.fedoraproject.org (which it needs to do to run the upgrade, the way things are set up in openQA).

When I dug into this a little deeper, it seems that the problem is that, after the server upgrade, when the BIND running on the server forwards the resolution of kojipkgs.fedoraproject.org, the answer it gets back is the *external public* IP for that domain, not the *internal infra private* IP for it. This implies that it is not using the correct forwarder. When the test initially deploys the server, it configures the infra internal DNS servers as forwarders, and this is clearly working or the test would fail at step 2. But it seems that after the upgrade, the server stops forwarding requests to the infra internal DNS servers and instead is forwarding them somewhere else - probably to 8.8.8.8, which IIRC is the default. This means we get the public IP for kojipkgs, which is not accessible from inside infra, and so the test fails.

Looking into the BIND logs on the server, it seems to have trouble connecting to LDAP after the upgrade. We see errors like this:

26-Jun-2019 19:47:23.581 LDAP configuration synchronization failed: socket is not connected
26-Jun-2019 19:47:23.581 ldap_syncrepl will reconnect in 60 seconds
...
26-Jun-2019 19:48:23.581 trying to establish LDAP connection to ldapi://%2fvar%2frun%2fslapd-DOMAIN-LOCAL.socket
26-Jun-2019 19:48:23.581 Using default keytab file name: FILE:/etc/named.keytab
26-Jun-2019 19:48:23.581 Credentials are not present in cache (Matching credential not found)
26-Jun-2019 19:48:23.581 Attempting to acquire new Kerberos credentials
26-Jun-2019 19:48:23.582 Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Cannot contact any KDC for realm 'DOMAIN.LOCAL')
26-Jun-2019 19:48:23.582 ldap_syncrepl will reconnect in 60 seconds
...
26-Jun-2019 19:49:16.111 LDAP configuration synchronization failed: socket is not connected
26-Jun-2019 19:49:16.111 ldap_syncrepl will reconnect in 60 seconds
...
26-Jun-2019 19:50:17.739 trying to establish LDAP connection to ldapi://%2fvar%2frun%2fslapd-DOMAIN-LOCAL.socket
26-Jun-2019 19:50:17.739 Using default keytab file name: FILE:/etc/named.keytab
26-Jun-2019 19:50:17.769 Credentials are not present in cache (Matching credential not found)
26-Jun-2019 19:50:17.769 Attempting to acquire new Kerberos credentials
26-Jun-2019 19:50:17.962 Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
26-Jun-2019 19:50:17.963 ldap_syncrepl will reconnect in 60 seconds
26-Jun-2019 19:51:17.964 trying to establish LDAP connection to ldapi://%2fvar%2frun%2fslapd-DOMAIN-LOCAL.socket
26-Jun-2019 19:51:17.964 Using default keytab file name: FILE:/etc/named.keytab
26-Jun-2019 19:51:17.966 Credentials are not present in cache (Matching credential not found)
26-Jun-2019 19:51:17.966 Attempting to acquire new Kerberos credentials
26-Jun-2019 19:51:17.976 Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
26-Jun-2019 19:51:17.976 ldap_syncrepl will reconnect in 60 seconds

krb5kdc.log shows this:

Jun 26 19:50:17 ipa001.domain.local krb5kdc[1027](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: DNS/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:50:17 ipa001.domain.local krb5kdc[1027](info): closing down fd 11
Jun 26 19:50:21 ipa001.domain.local krb5kdc[1029](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: ipa-dnskeysyncd/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:50:21 ipa001.domain.local krb5kdc[1029](info): closing down fd 11
Jun 26 19:50:21 ipa001.domain.local krb5kdc[1029](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: ipa-dnskeysyncd/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:50:21 ipa001.domain.local krb5kdc[1029](info): closing down fd 11
Jun 26 19:51:17 ipa001.domain.local krb5kdc[1027](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: DNS/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:51:17 ipa001.domain.local krb5kdc[1027](info): closing down fd 11
Jun 26 19:51:23 ipa001.domain.local krb5kdc[1029](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: ipa-dnskeysyncd/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:51:23 ipa001.domain.local krb5kdc[1029](info): closing down fd 11
Jun 26 19:51:23 ipa001.domain.local krb5kdc[1029](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: ipa-dnskeysyncd/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:51:23 ipa001.domain.local krb5kdc[1029](info): closing down fd 11

I'm pinning this on krb5 as it's about the only relevant thing that changed in the affected compose. FreeIPA didn't, 389-ds didn't and bind didn't. And there are those Kerberos error messages.

Proposing as a Beta blocker as this violates "It must be possible to successfully complete a direct upgrade from a fully updated installation of each of the last two stable Fedora Server releases with the system configured as a FreeIPA domain controller or postgresql server as specified in the relevant criteria. The upgraded system must meet all relevant release criteria, including criteria relating to functionality of the server software" - https://fedoraproject.org/wiki/Fedora_30_Beta_Release_Criteria#Server_upgrade_requirements
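For reference, a quick way to sanity-check the forwarder behaviour on the upgraded server could be something like the following. This is only a sketch, not what the test itself runs; kojipkgs.fedoraproject.org is the hostname from above, and the infra-internal forwarder address the test configures isn't shown here:

# show the global forwarders FreeIPA has stored (what bind-dyndb-ldap should be using)
ipa dnsconfig-show
# ask the server's own BIND directly and see whether the answer is the public
# address or the infra-internal one
dig +short kojipkgs.fedoraproject.org @localhost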
Created attachment 1584995 [details]
/var/log tarball from failed test (server)

This is a tarball of /var/log on the server after the failures. Note the upgrade starts around 23:40 UTC - which is 19:40 system local time - and completes around 8 minutes later. journalctl shows timestamps in your local timezone by default; use --utc to see UTC timestamps.
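For example, something like this (a sketch; the time window is the one mentioned above, spelled out explicitly in UTC so it matches the --utc display, and the -D path assumes you extracted the tarball to ./var/log with the journal included):

journalctl -D ./var/log/journal --utc --since "2019-06-26 23:40 UTC" --until "2019-06-26 23:50 UTC"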
Created attachment 1584997 [details]
named.run log after the failures

Note this uses system local timestamps (so upgrade is at 19:40, ends at 19:48).
rcrit and rharwood asked for the output of `klist -kt -e /etc/krb5.keytab`. Ask and ye shall receive!

Prior to the upgrade, it is:

Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 06/27/2019 11:43:03 host/ipa001.domain.local (aes256-cts-hmac-sha1-96)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (aes128-cts-hmac-sha1-96)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (DEPRECATED:des3-cbc-sha1)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (DEPRECATED:arcfour-hmac)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (camellia128-cts-cmac)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (camellia256-cts-cmac)

after the upgrade, it is:

Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 06/27/2019 11:43:03 host/ipa001.domain.local (aes256-cts-hmac-sha1-96)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (aes128-cts-hmac-sha1-96)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (DEPRECATED:des3-hmac-sha1)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (DEPRECATED:arcfour-hmac)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (camellia128-cts-cmac)
   2 06/27/2019 11:43:03 host/ipa001.domain.local (camellia256-cts-cmac)

they're *nearly* identical, but there's one change: "(DEPRECATED:des3-cbc-sha1)" changes to "(DEPRECATED:des3-hmac-sha1)".
So the KDC is receiving initial requests that look correctly formed (no removed enctypes or anything), to which it's replying asking for preauth. But the client seems unwilling to follow up on that? I'd kind of like to see KRB5_TRACE output... I guess of whatever bind does for initial credential acquisition.

SSSD appears to be having the same problem (/var/log/messages):

Jun 26 19:47:16 ipa001 [sssd[ldap_child[15584]]][15584]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Cannot contact any KDC for realm 'DOMAIN.LOCAL'. Unable to create GSSAPI-encrypted LDAP connection.

It might be easier to check SSSD - crank debug_level to 9. Maybe also just a copy of the keytab itself, if that's possible (no worries if that's hard).

I guess it's also possible that selinux is to blame here. It looks like that has updated as well? I'm not sure how to check this, but I didn't see anything in the logs, so maybe not.
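If it helps, one way to get that KRB5_TRACE output from named's credential acquisition would be a systemd drop-in. This is just a sketch: the unit name is the named-pkcs11 service seen in the logs above, and the trace path is an arbitrary choice - anywhere the service can write to (and SELinux allows) should do:

mkdir -p /etc/systemd/system/named-pkcs11.service.d
cat > /etc/systemd/system/named-pkcs11.service.d/krb5-trace.conf <<'EOF'
[Service]
# libkrb5 writes a trace of credential acquisition to the file named here
Environment=KRB5_TRACE=/var/named/data/krb5_trace.log
EOF
systemctl daemon-reload
systemctl restart named-pkcs11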
Tomasz Torcz just found something:

<zdzichu> adamw: hey, about https://bugzilla.redhat.com/show_bug.cgi?id=1717229; could you fpaste output of 'grep netscape.security /etc/pki/pki-tomcat/ca/CS.cfg'?
<zdzichu> adamw: I had a problem with freeipa recently, class names weren't updates to correct ones
<adamw> zdzichu: https://paste.fedoraproject.org/paste/mIZl-Ktm0XK7oSyWQS7qng
<zdzichu> yep, seems like the same. It should have org.mozilla.jss in front, like in https://paste.fedoraproject.org/paste/QY9ECyRbHOIWKbHw2ysoKQ
<adamw> oo

so I changed all those to add org.mozilla.jss on the front and rebooted. Then it seemed to blow up on a certificate validity issue. So I tried rewinding time and restarting certmonger...which fails because it can't talk to a KDC.

At this point I would like to set the whole thing on fire and go punt it in a freaking lake. Can I just give you root permissions on the thing and let you go to town with it?
I think the error messages at 19:47:16, and even earlier at 19:46, are due to 389-ds not yet being fully started, because ipa-server-upgrade was starting/restarting all services.

Jun 26 19:46:06 ipa001 systemd[1]: Started 389 Directory Server DOMAIN-LOCAL..
Jun 26 19:46:07 ipa001 named-pkcs11[14818]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Cannot contact any KDC for realm 'DOMAIN.LOCAL')

In fact, there are multiple 389-ds restarts:

$ grep 'starting up' dirsrv/slapd-DOMAIN-LOCAL/errors
[26/Jun/2019:19:31:30.538443342 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:31:35.320162467 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:31:39.163542700 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:31:45.288498343 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:34:12.104069800 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:34:18.268245611 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:34:24.390636122 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:34:50.934026949 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:35:12.907075341 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:35:24.231159036 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:45:39.015522571 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:45:50.133540561 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:45:53.801909332 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:46:04.957738093 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:47:20.251421928 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:49:12.239974795 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up

If we check the upgrade logs, there is definitely a correlation:

$ grep dirsrv@ ipaupgrade.log
2019-06-26T23:45:38Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2019-06-26T23:45:38Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv']
2019-06-26T23:45:40Z DEBUG Start of dirsrv complete
2019-06-26T23:45:47Z DEBUG args=['/bin/systemctl', 'restart', 'dirsrv']
2019-06-26T23:45:51Z DEBUG Restart of dirsrv complete
2019-06-26T23:45:51Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv']
2019-06-26T23:45:53Z DEBUG Stop of dirsrv complete
2019-06-26T23:45:53Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2019-06-26T23:45:53Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv']
2019-06-26T23:45:54Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2019-06-26T23:45:54Z DEBUG Start of dirsrv complete
2019-06-26T23:46:03Z DEBUG args=['/sbin/restorecon', '/etc/systemd/system/dirsrv.d/ipa-env.conf']
2019-06-26T23:46:03Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv']
2019-06-26T23:46:04Z DEBUG Stop of dirsrv complete
2019-06-26T23:46:04Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv']
2019-06-26T23:46:06Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2019-06-26T23:46:06Z DEBUG Start of dirsrv complete
2019-06-26T23:47:16Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv']
2019-06-26T23:47:18Z DEBUG Stop of dirsrv complete

So we can ignore anything before 19:47:19-04:00, because that was simply part of the upgrade sequence, which completed at that time:

2019-06-26T23:47:19Z INFO The ipa-server-upgrade command was successful

Since krb5kdc uses LDAPI (as well as bind-dyndb-ldap in named), we should be checking when LDAPI access becomes available:

$ grep 'Listening on /var' messages
Jun 26 19:31:36 ipa001 ns-slapd[6331]: [26/Jun/2019:19:31:36.284311814 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:31:40 ipa001 ns-slapd[6380]: [26/Jun/2019:19:31:40.065880083 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:31:46 ipa001 ns-slapd[6488]: [26/Jun/2019:19:31:46.190858516 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:34:13 ipa001 ns-slapd[8869]: [26/Jun/2019:19:34:13.037814733 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:34:19 ipa001 ns-slapd[8964]: [26/Jun/2019:19:34:19.164672460 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:34:25 ipa001 ns-slapd[9042]: [26/Jun/2019:19:34:25.308606786 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:34:52 ipa001 ns-slapd[9956]: [26/Jun/2019:19:34:52.024964480 -0400] - INFO - slapd_daemon - slapd started. Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:35:14 ipa001 ns-slapd[10024]: [26/Jun/2019:19:35:14.048293669 -0400] - INFO - slapd_daemon - slapd started. Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:35:25 ipa001 ns-slapd[10078]: [26/Jun/2019:19:35:25.461827754 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:45:40 ipa001 ns-slapd[14073]: [26/Jun/2019:19:45:40.042291711 -0400] - INFO - slapd_daemon - slapd started. Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:45:51 ipa001 ns-slapd[14108]: [26/Jun/2019:19:45:51.100285602 -0400] - INFO - slapd_daemon - slapd started. Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:45:54 ipa001 ns-slapd[14145]: [26/Jun/2019:19:45:54.737239713 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:46:06 ipa001 ns-slapd[14634]: [26/Jun/2019:19:46:06.052633915 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:47:21 ipa001 ns-slapd[15621]: [26/Jun/2019:19:47:21.187925693 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:49:13 ipa001 ns-slapd[964]: [26/Jun/2019:19:49:13.995964793 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests

I.e., after 19:49:13-04:00 we should be able to talk to the LDAPI socket, unless 389-ds crashed.
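As a side note, a direct way to confirm that the LDAPI socket is answering at a given point would be something like this (a sketch; the URL-encoded socket path is the one from the named logs above, and it assumes running as root so the usual IPA LDAPI/EXTERNAL autobind mapping applies):

# query the rootDSE over the LDAPI socket; if this returns, LDAPI is reachable
ldapsearch -H ldapi://%2fvar%2frun%2fslapd-DOMAIN-LOCAL.socket -Y EXTERNAL -s base -b "" namingContexts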
Jun 26 19:49:15 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
Jun 26 19:49:15 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
Jun 26 19:50:17 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
Jun 26 19:51:17 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
Jun 26 19:52:17 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)

According to krb5kdc.log, the only thing that happened at the same time was the receipt of the AS_REQ requests mentioned above. And the 389-ds access log confirms that, at least until the last of these requests, the KDC did contact 389-ds over the LDAPI socket:

[26/Jun/2019:19:50:43.830134200 -0400] conn=6 op=20 SRCH base="cn=Default Host Password Policy,cn=computers,cn=accounts,dc=domain,dc=local" scope=0 filter="(objectClass=*)" attrs="krbMaxPwdLife krbMinPwdLife krbPwdMinDiffChars krbPwdMinLength krbPwdHistoryLength krbPwdMaxFailure krbPwdFailureCountInterval krbPwdLockoutDuration"
[26/Jun/2019:19:50:43.830258319 -0400] conn=6 op=20 RESULT err=0 tag=101 nentries=1 etime=0.0000264641
[26/Jun/2019:19:51:17.971345902 -0400] conn=6 op=22 SRCH base="dc=domain,dc=local" scope=2 filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincipal)(objectClass=ipakrbprincipal))(|(ipaKrbPrincipalAlias=DNS/ipa001.domain.local)(krbPrincipalName:caseIgnoreIA5Match:=DNS/ipa001.domain.local@DOMAIN.LOCAL)))" attrs="krbPrincipalName krbCanonicalName krbUPEnabled krbPrincipalKey krbTicketPolicyReference krbPrincipalExpiration krbPasswordExpiration krbPwdPolicyReference krbPrincipalType krbPwdHistory krbLastPwdChange krbPrincipalAliases krbLastSuccessfulAuth krbLastFailedAuth krbLoginFailedCount krbPrincipalAuthInd krbExtraData krbLastAdminUnlock krbObjectReferences krbTicketFlags krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory ipaKrbAuthzData ipaUserAuthType ipatokenRadiusConfigLink objectClass"
[26/Jun/2019:19:51:17.972786043 -0400] conn=6 op=22 RESULT err=0 tag=101 nentries=1 etime=0.0001862323
[26/Jun/2019:19:51:17.972989777 -0400] conn=6 op=23 SRCH base="cn=DOMAIN.LOCAL,cn=kerberos,dc=domain,dc=local" scope=0 filter="(objectClass=krbticketpolicyaux)" attrs="krbMaxTicketLife krbMaxRenewableAge krbTicketFlags"
[26/Jun/2019:19:51:17.973194096 -0400] conn=6 op=23 RESULT err=0 tag=101 nentries=1 etime=0.0000359671
[26/Jun/2019:19:51:17.973390908 -0400] conn=6 op=24 SRCH base="dc=domain,dc=local" scope=2 filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincipal)(objectClass=ipakrbprincipal))(|(ipaKrbPrincipalAlias=krbtgt/DOMAIN.LOCAL)(krbPrincipalName:caseIgnoreIA5Match:=krbtgt/DOMAIN.LOCAL@DOMAIN.LOCAL)))" attrs="krbPrincipalName krbCanonicalName krbUPEnabled krbPrincipalKey krbTicketPolicyReference krbPrincipalExpiration krbPasswordExpiration krbPwdPolicyReference krbPrincipalType krbPwdHistory krbLastPwdChange krbPrincipalAliases krbLastSuccessfulAuth krbLastFailedAuth krbLoginFailedCount krbPrincipalAuthInd krbExtraData krbLastAdminUnlock krbObjectReferences krbTicketFlags krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory ipaKrbAuthzData ipaUserAuthType ipatokenRadiusConfigLink objectClass"
[26/Jun/2019:19:51:17.973965382 -0400] conn=6 op=24 RESULT err=0 tag=101 nentries=1 etime=0.0000739196
[26/Jun/2019:19:51:17.974161567 -0400] conn=6 op=25 SRCH base="cn=Default Service Password Policy,cn=services,cn=accounts,dc=domain,dc=local" scope=0 filter="(objectClass=*)" attrs="krbMaxPwdLife krbMinPwdLife krbPwdMinDiffChars krbPwdMinLength krbPwdHistoryLength krbPwdMaxFailure krbPwdFailureCountInterval krbPwdLockoutDuration"
[26/Jun/2019:19:51:17.974420173 -0400] conn=6 op=25 RESULT err=0 tag=101 nentries=1 etime=0.0000416655

There are a few more requests in the 389-ds access log after that, from another krb5kdc process (conn=7), about the ipa-dnskeysyncd/ipa001.domain.local principal, so it seems both krb5kdc processes were able to connect to the LDAP server. This means they either got incorrect data back or interpreted it wrong?
Sorry, disregard #c5 - that's meant to be for 1717229.
Robbie: "It might be easier to check SSSD - crank debug_level to 9." - that should already be set to 9 for all openQA tests; it's in the test scripts. Did you check the logs? We run 'sss_debuglevel 9' right after the client enrols: https://pagure.io/fedora-qa/os-autoinst-distri-fedora/blob/master/f/tests/realmd_join_sssd.pm#_82
I did. There's nothing in krb5_child.log - I would have expected to see krb5 trace output in there. Indeed, most of these logs are much shorter than debug level 9 would suggest - even sssd.log is empty.
Alexander suggested on IRC that this is probably because the way SSSD's debug level is being set isn't persistent across restarts. Moreover, I think he's right that this failure occurs while the KDC is down for upgrading, and isn't actually a problem. I've attempted to reproduce this setup locally, and am unable to. Specifically, a default IPA server installation on fc30 (with DNS) upgrades without a hitch to rawhide, and I observe no problems from named. What I can't figure out is how you got a keytab with those enctypes in it to begin with - even setting crypto-policies to LEGACY doesn't do it for me. Is this on a machine you'd be willing/able to give me access to, or can you give me reproduction instructions (since you've reproduced it above already)?
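For the record, a way to make the debug level stick across the restarts the upgrade does would be to set it in sssd.conf itself rather than (only) via sss_debuglevel. A sketch, with section names guessed from the domain.local realm the test uses:

# add debug_level = 9 to the relevant sections of /etc/sssd/sssd.conf, e.g.:
#
#   [domain/domain.local]
#   debug_level = 9
#
#   [nss]
#   debug_level = 9
#
#   [pam]
#   debug_level = 9
#
# then restart SSSD so the level comes from the config and survives later restarts:
systemctl restart sssd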
The errors I posted happen constantly, over and over, until the test ends (when it finishes uploading the logs). It's not a transient error.

However...this actually seems to have been resolved somehow in the last couple of composes. It failed from 20190604.n.0 through 20190703.n.0; it started working again in 20190704.n.1 and worked again in 20190707.n.0 yesterday. That pattern is exactly the same on staging and production, which means it's unlikely to be some kind of test blip, but someone really did something that fixed this in 0704.n.1.

I guess let's close this for now; if it starts happening again we can re-open...
freeipa-4.8.0-1.fc31 showed up in that compose, so that's the obvious candidate for what fixed this. There was also an NSS update; not sure if that's relevant. The previous freeipa was 4.7.90.pre1-6.fc31.
freeipa removed their 3des configuration, which I imagine is why.

I've finally managed to reproduce this outside of freeipa:

[root@kdc ~]# rpm -qv krb5-workstation
krb5-workstation-1.17-34.fc31.x86_64
[root@kdc ~]# klist -ek
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 host/kdc.example.com (aes256-cts-hmac-sha1-96)
   2 host/kdc.example.com (aes128-cts-hmac-sha1-96)
   2 host/kdc.example.com (DEPRECATED:des3-hmac-sha1)
   2 host/kdc.example.com (DEPRECATED:arcfour-hmac)
   2 host/kdc.example.com (camellia256-cts-cmac)
   2 host/kdc.example.com (camellia128-cts-cmac)
[root@kdc ~]# kinit -k
kinit: Bad encryption type while getting initial credentials
[root@kdc ~]# KRB5_TRACE=/dev/stderr kinit -k
[776] 1562788938.944469: Getting initial credentials for host/kdc.example.com
[776] 1562788938.944470: Looked up etypes in keytab: aes256-cts, aes128-cts, rc4-hmac, camellia256-cts, camellia128-cts
[776] 1562788938.944472: Sending unauthenticated request
[776] 1562788938.944473: Sending request (215 bytes) to EXAMPLE.COM
[776] 1562788938.944474: Resolving hostname kdc.example.com
[776] 1562788938.944475: Sending initial UDP request to dgram 192.168.121.184:88
[776] 1562788938.944476: Received answer (793 bytes) from dgram 192.168.121.184:88
[776] 1562788938.944477: Sending DNS URI query for _kerberos.EXAMPLE.COM.
[776] 1562788939.42448: No URI records found
[776] 1562788939.42449: Sending DNS SRV query for _kerberos-master._udp.EXAMPLE.COM.
[776] 1562788939.42450: Sending DNS SRV query for _kerberos-master._tcp.EXAMPLE.COM.
[776] 1562788939.42451: No SRV records found
[776] 1562788939.42452: Response was not from master KDC
[776] 1562788939.42453: Processing preauth types: PA-ETYPE-INFO2 (19)
[776] 1562788939.42454: Selected etype info: etype aes256-cts, salt "EXAMPLE.COMhostkdc.example.com", params ""
[776] 1562788939.42455: Produced preauth for next request: (empty)
[776] 1562788939.42456: Getting AS key, salt "EXAMPLE.COMhostkdc.example.com", params ""
[776] 1562788939.42457: Retrieving host/kdc.example.com from FILE:/etc/krb5.keytab (vno 0, enctype aes256-cts) with result: -1765328196/Bad encryption type
[776] 1562788939.42458: Getting initial credentials for host/kdc.example.com
[776] 1562788939.42459: Looked up etypes in keytab: aes256-cts, aes128-cts, rc4-hmac, camellia256-cts, camellia128-cts
[776] 1562788939.42461: Sending unauthenticated request
[776] 1562788939.42462: Sending request (215 bytes) to EXAMPLE.COM (master)
[776] 1562788939.42463: Sending DNS URI query for _kerberos.EXAMPLE.COM.
[776] 1562788939.42464: No URI records found
[776] 1562788939.42465: Sending DNS SRV query for _kerberos-master._udp.EXAMPLE.COM.
[776] 1562788939.42466: Sending DNS SRV query for _kerberos-master._tcp.EXAMPLE.COM.
[776] 1562788939.42467: No SRV records found
kinit: Bad encryption type while getting initial credentials
[root@kdc ~]#

while in krb5kdc.log:

Jul 10 20:02:18 kdc.example.com krb5kdc[770](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 192.168.121.184: ISSUE: authtime 1562788938, etypes {rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha1-96(18), ses=aes256-cts-hmac-sha1-96(18)}, host/kdc.example.com for krbtgt/EXAMPLE.COM
Cool. Aha - now I think about it again...there's another possibility for why the tests stopped failing: I bet a newer FreeIPA showed up in *Fedora 30*, so the *start point* for the upgrade process is different. And...yeah, that looks to be exactly what happened: https://bodhi.fedoraproject.org/updates/FEDORA-2019-76a8f0144a Still...this started happening in June, when both F30 and Rawhide should have had 4.7.90-pre1 as the current FreeIPA version. So...I guess somehow upgrading from F30+4.7.90-pre1 to Rawhide+4.7.90-pre1 triggers the bug, but upgrading from F30+4.8.0 to Rawhide+4.8.0 doesn't?
Yes, it looks like freeIPA backported the 3DES non-configuration change to fc30 as well. I guess the freeIPA openQA test is the only person/thing testing migration, because this bug turns out to be actually important for me to have fixed before the fc31 release. (Fix posted upstream; will backport once merged.)
So, based on Comment #14: is this currently not an issue when going from F30 4.8.0 to F31 4.8.0, but still a problem if you've been upgrading from 4.7.x since earlier in F30? If the latter, I'd call that a +1 blocker. If it's the former, we formally only support upgrades from the latest stable packages in the previous release, so I'd be -1 blocker, +1 FE.
I am honestly not sure exactly when you might run into it at this point. Robbie, what do you see as still outstanding here?
I was just gonna fix it and call it a day... the build in "fixed in" (krb5-1.17-35.fc31) resolves the issue using the patch I wrote upstream.
Do we also need to fix it in F30?
Discussed during the 2019-07-15 blocker review meeting: [1]

The decision to delay the classification of this as a blocker bug was made as there is definitely a significant bug here, but as things stand we aren't clear if it breaks the criteria for F31 due to the changes in F30. We will investigate this further and decide blocker status once we have more information.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-07-15/f31-blocker-review.2019-07-15-16.05.txt
No, because 3DES removal isn't part of fc30.