Bug 1046949
| Summary: | Client enrollment fails with ipa-client-install; client enrollment works fine by hand. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | corpitsysadmins |
| Component: | ipa | Assignee: | Martin Kosek <mkosek> |
| Status: | CLOSED NOTABUG | QA Contact: | Namita Soman <nsoman> |
| Severity: | medium | Priority: | unspecified |
| Version: | 6.5 | CC: | abokovoy, corpitsysadmins, mkosek, pviktori, rcritten, ssorce |
| Target Milestone: | rc | Target Release: | 6.5 |
| Hardware: | x86_64 | OS: | Linux |
| Doc Type: | Bug Fix | Type: | Bug |
| Last Closed: | 2014-01-15 09:38:32 UTC | | |
Description
corpitsysadmins
2013-12-27 16:52:25 UTC
Thanks for the report (though please attach the log to the Bugzilla rather than pasting it directly; it is much easier to read that way). This part indeed seems interesting:

~~~~~~~~
Connecting: 10.10.2.46:0
Connection to https://freeipa-one.corp.modmed.com/ipa/xml failed with KerbTransport instance has no attribute '_conn'
trying https://freeipa-two.corp.modmed.com/ipa/xml
NSSConnection init freeipa-two.corp.modmed.com
Connection to https://freeipa-two.corp.modmed.com/ipa/xml failed with [Errno -8053] (SEC_ERROR_BUSY) NSS could not shutdown. Objects are still in use.
Cannot connect to the server due to generic error: cannot connect to Gettext('any of the configured servers', domain='ipa', localedir=None): https://freeipa-one.corp.modmed.com/ipa/xml, https://freeipa-two.corp.modmed.com/ipa/xml
Installation failed. Rolling back changes.
~~~~~~~~

It looks like there was some cached certificate object left in NSS, causing ipa-client-install to fail. What is the version of the nss package?

I also wonder, do you know why joining to freeipa-one.corp.modmed.com is failing in the first place? Maybe an expired or a custom untrusted SSL certificate on this IPA server? Does this issue occur on all clients (faulty freeipa-one)?

Hi Martin, thanks for the reply and the tips. I will attach the logs going forward.

NSS is nss.x86_64 3.15.3-3.el6_5

Not sure why the join is failing in the first place. It works via ipa-client-install for some hosts, but not others (all running the same OS version, w/ the same firewall configs, etc). I will take a look at the SSL certs in the IPA server and report back. I don't think we have a problem with the IPA server though. Joining by "hand" always works in Linux, and we haven't had any issues joining Macs via a script we wrote for this.

Moving to RHEL product, as you use packages based on this one, rather than on Fedora.
As for the bug itself, in the end I managed to reproduce it by damaging S4U2proxy on the server, thus making ipa-client-install fail over in the same place as in your case (though with a different error). There is an NSS resource leak somewhere, preventing it from failing over to the other server. I did not find where the leak is, though; this code is quite complicated.

I am now wondering why it failed to connect to the first server (freeipa-one) in the first place. Are there any related logs in /var/log/krb5kdc.log or /var/log/httpd/error_log on freeipa-one?

Does the install succeed if you install directly against freeipa-two? i.e.

    ipa-client-install --server freeipa-two.corp.modmed.com --domain corp.modmed.com

Created attachment 847844 [details]
See name
Thanks for the update Martin. We are using self-signed certs... not sure if this makes a difference; but if they did, all client enrollments would fail and that is not the case.

Since we opened the bug we've had to reinstall the OS on some of the client machines that were "broken", but I am attaching the freeipa-one logs you requested from 12/27/2013.

The good thing about rebuilding those systems is that we can now reproduce the problem. While connecting to freeipa-one:

1. ipa-client-install (will succeed, no problems at all)
2. ipa-client-install --uninstall
3. Reboot
4. ipa-client-install (will fail in the same place as before, see attached logs)

Tried ipa-client-install --server freeipa-two.corp.modmed.com --domain corp.modmed.com too (see attached logs). No problem here, even after uninstall / install.

I wonder what's wrong w/ my primary server. The only diff between these two servers that comes to mind is that freeipa-one has a public IP; freeipa-two does not. I'll report back if I find the root cause.

Sorry for the amount of logs btw. I grepped/named them so they are easier to read. Thank you again.

Created attachment 847846 [details]
Logs
Created attachment 847871 [details]
Logs
Thanks for the well-prepared logs, I wish all our logs were prepared that well. I am a bit confused by the following section:

(In reply to corpitsysadmins from comment #6)
> ...
> The good thing about rebuilding those systems is that we can now reproduce
> the problem. While connecting to freeipa-one
>
> 1. ipa-client-install (will succeed, no problems at all)
> 2. ipa-client-install --uninstall
> 3. Reboot
> 4. ipa-client-install (will fail in the same place as before, see attached
> logs)

Does it mean that after reinstallation of a system, the first join to freeipa-one succeeded and the second install (after uninstall&reboot) to freeipa-one failed?

I still have not discovered the root cause of the failure. Unfortunately, the message

    Connection to https://freeipa-one.corp.modmed.com/ipa/xml failed with KerbTransport instance has no attribute '_conn'
    trying https://freeipa-two.corp.modmed.com/ipa/xml

hides the real exception.

As you mentioned DNS, let's do a little exercise to rule out any potential DNS issue, as that is the root cause of Kerberos system issues in 90% of cases.

On freeipa-one, run:

    # cat /etc/hosts
    # hostname
    # host `hostname`
    # host $IP_ADDRESS_OF_THIS_HOSTNAME
    # host ovirt-two.corp.modmed.com
    # host $IP_ADDRESS_OF_OVIRT_TWO

On ovirt-two:

    # cat /etc/hosts
    # hostname
    # host `hostname`
    # host $IP_ADDRESS_OF_THIS_HOSTNAME
    # host freeipa-one.corp.modmed.com
    # host $IP_ADDRESS_OF_FREEIPA_ONE

I also noticed one more issue related to the DNS principal in the Kerberos log:

    Jan 09 16:14:22 freeipa-two.corp.modmed.com krb5kdc[1337](info): TGS_REQ (4 etypes {18 17 16 23}) 10.10.2.61: UNKNOWN_SERVER: authtime 0, host/ovirt-two.corp.modmed.com.COM for DNS/dns-server-primary.corp.modmed.com.COM, Server not found in Kerberos database

Not sure what it is.

Thank you Martin! Yes, it means that after reinstallation of a system, the first join to freeipa-one succeeds, and the second install (after uninstall&reboot) to freeipa-one fails.
I verified this again today with two other CentOS clients. Here is the info requested:

    [fredy.sanchez@freeipa-one ~]$ sudo cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.10.2.46 freeipa-one.corp.modmed.com freeipa-one
    107.21.44.86 m2inf-kerberos-01.corp.modmed.com m2inf-kerberos-01
    [fredy.sanchez@freeipa-one ~]$ hostname
    freeipa-one.corp.modmed.com
    [fredy.sanchez@freeipa-one ~]$ host `hostname`
    freeipa-one.corp.modmed.com has address 199.189.197.89
    [fredy.sanchez@freeipa-one ~]$ host freeipa-one.corp.modmed.com
    freeipa-one.corp.modmed.com has address 199.189.197.89
    [fredy.sanchez@freeipa-one ~]$ host ovirt-two.corp.modmed.com
    ovirt-two.corp.modmed.com has address 10.10.2.61
    [fredy.sanchez@freeipa-one ~]$ host 10.10.2.61
    61.2.10.10.in-addr.arpa domain name pointer ovirt-two.corp.modmed.com.

199.189.197.89 is the public IP of the server, 10.10.2.46 is the internal one.

    [fredy.sanchez@ovirt-two ~]$ sudo cat /etc/hosts:
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    [fredy.sanchez@ovirt-two ~]$ hostname
    ovirt-two.corp.modmed.com
    [fredy.sanchez@ovirt-two ~]$ host `hostname`
    ovirt-two.corp.modmed.com has address 10.10.2.61
    [fredy.sanchez@ovirt-two ~]$ host 10.10.2.61
    61.2.10.10.in-addr.arpa domain name pointer ovirt-two.corp.modmed.com.
    [fredy.sanchez@ovirt-two ~]$ host freeipa-one.corp.modmed.com
    freeipa-one.corp.modmed.com has address 199.189.197.89
    [fredy.sanchez@ovirt-two ~]$ host 199.189.197.89
    89.197.189.199.in-addr.arpa domain name pointer freeipa-one.corp.modmed.com.

To avoid any sort of "flakiness" I removed the 10.10.2.46 entry from /etc/hosts on freeipa-one, but this effectively broke freeipa, since this is the NIC IP assigned to the server (the public one comes to it via firewall NAT).
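The output above shows /etc/hosts on freeipa-one pinning the server name to 10.10.2.46 while DNS returns 199.189.197.89. A minimal sketch of how that kind of disagreement could be flagged automatically is given below. The function names `parse_hosts` and `hosts_dns_mismatch` are hypothetical helpers written for this illustration and are not part of FreeIPA; the sample data is copied from the output above. The DNS answers are passed in explicitly (for example, copied from `host` output), because the system resolver library typically consults /etc/hosts first and would not show the difference.

```python
def parse_hosts(text):
    """Map each name or alias in /etc/hosts-style text to its set of addresses."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), set()).add(address)
    return table


def hosts_dns_mismatch(hosts_text, name, dns_addresses):
    """Return (hosts_only, dns_only) address sets for name.

    Two empty sets mean the hosts file and DNS agree, or that the name
    is not pinned in the hosts file at all (nothing to compare).
    """
    local = parse_hosts(hosts_text).get(name.lower(), set())
    dns = set(dns_addresses)
    if not local:
        return set(), set()
    return local - dns, dns - local


# /etc/hosts content of freeipa-one, copied from the output above.
FREEIPA_ONE_HOSTS = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.2.46 freeipa-one.corp.modmed.com freeipa-one
107.21.44.86 m2inf-kerberos-01.corp.modmed.com m2inf-kerberos-01
"""

if __name__ == "__main__":
    hosts_only, dns_only = hosts_dns_mismatch(
        FREEIPA_ONE_HOSTS, "freeipa-one.corp.modmed.com", ["199.189.197.89"]
    )
    print("only in /etc/hosts:", sorted(hosts_only))
    print("only in DNS:", sorted(dns_only))
```

A non-empty result does not prove a misconfiguration by itself (split internal/external addressing can be intentional), but it marks the spot worth checking first in a Kerberos setup.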
After fixing this, I tried to re-enroll our two DNS servers, but they both failed in the same way as ovirt-two. So I removed the public IP from the equation by forcing the clients to resolve freeipa-one to 10.10.2.46 and voila! The problem is gone. So there is probably a problem with the firewall that's doing the NAT translation, and I will get that fixed. I think this is the root cause. What's weird is that enrollment would work once and then fail after that, but this is probably also related to the firewall's config. Complicating the problem, all the right ports are open, and the Web GUI of freeipa-one resolves to the public IP and works fine.

Thanks a lot for your help Martin, and apologies for taking so much of your time. As far as I am concerned, you can go ahead and close this ticket.

Might have spoken too soon. The internal IP does work, but I am not so sure anymore that the firewall is to blame. Adding info showing the right firewall ports are open:

    [fredy.sanchez@ovirt-two ~]$ sudo nc -zv 199.189.197.89 80
    Connection to 199.189.197.89 80 port [tcp/http] succeeded!
    [fredy.sanchez@ovirt-two ~]$ sudo nc -zv 199.189.197.89 443
    Connection to 199.189.197.89 443 port [tcp/https] succeeded!
    [fredy.sanchez@ovirt-two ~]$ sudo nc -zv 199.189.197.89 389
    Connection to 199.189.197.89 389 port [tcp/ldap] succeeded!
    [fredy.sanchez@ovirt-two ~]$ sudo nc -zv 199.189.197.89 636
    Connection to 199.189.197.89 636 port [tcp/ldaps] succeeded!
    [fredy.sanchez@ovirt-two ~]$ sudo nc -zv 199.189.197.89 88
    Connection to 199.189.197.89 88 port [tcp/kerberos] succeeded!
    [fredy.sanchez@ovirt-two ~]$ sudo nc -zv 199.189.197.89 464
    Connection to 199.189.197.89 464 port [tcp/kpasswd] succeeded!
    [fredy.sanchez@ovirt-two ~]$ sudo nc -zvu 199.189.197.89 88
    Connection to 199.189.197.89 88 port [udp/kerberos] succeeded!
    [fredy.sanchez@ovirt-two ~]$ sudo nc -zvu 199.189.197.89 464
    Connection to 199.189.197.89 464 port [udp/kpasswd] succeeded!
    [fredy.sanchez@ovirt-two ~]$ sudo nc -zvu 199.189.197.89 123
    Connection to 199.189.197.89 123 port [udp/ntp] succeeded!

From another machine:

    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zv 199.189.197.89 80
    Connection to 199.189.197.89 80 port [tcp/http] succeeded!
    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zv 199.189.197.89 443
    Connection to 199.189.197.89 443 port [tcp/https] succeeded!
    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zv 199.189.197.89 389
    Connection to 199.189.197.89 389 port [tcp/ldap] succeeded!
    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zv 199.189.197.89 636
    Connection to 199.189.197.89 636 port [tcp/ldaps] succeeded!
    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zv 199.189.197.89 88
    Connection to 199.189.197.89 88 port [tcp/kerberos] succeeded!
    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zvu 199.189.197.89 88
    Connection to 199.189.197.89 88 port [udp/kerberos] succeeded!
    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zv 199.189.197.89 464
    Connection to 199.189.197.89 464 port [tcp/kpasswd] succeeded!
    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zvu 199.189.197.89 464
    Connection to 199.189.197.89 464 port [udp/kpasswd] succeeded!
    [fredy.sanchez@dns-server-secondary ~]$ sudo nc -zvu 199.189.197.89 123
    Connection to 199.189.197.89 123 port [udp/ntp] succeeded!

Good investigation! It seems this issue is indeed related to DNS, though it may not be the firewall, as you suggested in Comment 11. As far as we can see, it seems to be caused by freeipa-one mixing the internal and external IP for the internal client. It is hard to tell what the exact root cause was in your environment; if you find it, please share. But if you are satisfied with the current fixed state, I can close this bug, as indicated.

I see no further reported issues, closing as NOTABUG.

Thank you Martin, just wanted to add that yesterday we spent a good deal of time trying to get our freeipa servers working w/ an external CA.
We went thru many problems, but came across https://www.redhat.com/archives/freeipa-users/2013-May/msg00192.html, from which we ran

    # cd /etc/pki/nssdb/
    # ln -s /usr/lib64/nss/libnssckbi.so .

to fix a couple of error messages we saw in the logs in the form of

    ...KerbTransport instance has no attribute '_conn'
    ipa: ERROR: ...((SEC_ERROR_UNTRUSTED_ISSUER) Peer's certificate issuer has been marked as not trusted by the user.)

Since then we haven't had the original problem, w/ the private or public IP. So you were probably right when you said "It looks like there was some cached certificate object left in NSS, causing ipa-client-install to fail." This doesn't explain why we were able to enroll at times. But checking for cached certs is a great troubleshooting step.

Do keep the ticket closed, I agree that this is not a bug. Thank you again!
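The TCP reachability checks done with `nc -zv` earlier in this thread can be approximated with a short script when the same ports have to be tested from several clients. This is only a sketch: `tcp_port_open` and `probe` are hypothetical helper names, and the port list is simply the set of TCP ports that were probed in the thread. UDP is left out on purpose, since a "succeeded" result from a connectionless probe generally only means that no error came back. As this bug shows, open ports alone did not rule out the enrollment failure.

```python
import socket

# TCP ports that were probed with nc in this thread.
IPA_TCP_PORTS = {
    80: "http",
    443: "https",
    389: "ldap",
    636: "ldaps",
    88: "kerberos",
    464: "kpasswd",
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name resolution failed
        return False


def probe(host, ports=IPA_TCP_PORTS):
    """Return a {port: reachable} mapping for the given host."""
    return {port: tcp_port_open(host, port) for port in ports}


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for port, reachable in sorted(probe(target).items()):
        state = "succeeded" if reachable else "failed"
        print(f"{target} {port} [tcp/{IPA_TCP_PORTS[port]}] {state}")
```

Running the script against both the internal and the public address of a server from the same client gives a quick side-by-side view of what the NAT path lets through.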