Bug 696193
Summary: Client install fails on ipa-join when master is down, and replica is running.
Product: Red Hat Enterprise Linux 6
Reporter: Namita Soman <nsoman>
Component: ipa
Assignee: Rob Crittenden <rcritten>
Status: CLOSED ERRATA
QA Contact: Chandrasekar Kannan <ckannan>
Severity: medium
Priority: high
Version: 6.1
CC: benl, dpal, jgalipea, mkosek, shaines, syeghiay
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ipa-2.1.0-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: If one of the IPA servers is down then client enrollment may fail.
Consequence: Client enrollment is unpredictable if one of the IPA servers is down.
Fix: Do not configure sssd on an IPA server to do failover. A running server may be configured to use services on another that is down.
Result: sssd is predictable on an IPA server. When the IPA services are running then sssd is available.
Story Points: ---
Last Closed: 2011-12-06 18:21:34 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 713473

Description
Namita Soman
2011-04-13 14:10:55 UTC
On a fresh client yes, sssd has no part in enrollment. It indirectly might if the client is already configured to use sssd. In IRC you said ipa-client-install was run with no options so it is using DNS discovery. Since it got a 500 error it talked to something, the ipaclient-install.log may have details on that. Look in /var/log/httpd/errors on the replica to see what was logged there. A 500 error should have generated a traceback or other error.

ipa-client-install log has the ipa-join error pasted in the initial description. /var/log/httpd/error_log has the below, logged when the client install failed:

[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] mod_wsgi (pid=14275): Exception occurred processing WSGI script '/usr/share/ipa/wsgi.py'.
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] Traceback (most recent call last):
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/share/ipa/wsgi.py", line 48, in application
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     return api.Backend.session(environ, start_response)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipaserver/rpcserver.py", line 141, in __call__
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     self.create_context(ccache=environ.get('KRB5CCNAME'))
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipalib/backend.py", line 110, in create_context
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     self.Backend.ldap2.connect(ccache=ccache)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipalib/backend.py", line 62, in connect
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     conn = self.create_connection(*args, **kw)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipalib/encoder.py", line 188, in new_f
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     return f(*new_args, **kwargs)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipaserver/plugins/ldap2.py", line 336, in create_connection
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     _handle_errors(e, **{})
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipaserver/plugins/ldap2.py", line 117, in _handle_errors
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     raise errors.DatabaseError(desc=desc, info=info)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] DatabaseError: Local error: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'TESTRELM')

Created attachment 491799 [details]
/var/log/httpd/error_log from replica
Created attachment 491800 [details]
ipaclient-install.log
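
The mod_wsgi traceback quoted from error_log is easier to read once the per-line Apache prefix is removed. The following is a minimal sketch of that, not part of IPA or Apache tooling. The helper names `strip_prefix` and `last_exception` are hypothetical, and the prefix pattern is only an assumption based on the three bracketed fields visible in the excerpt.

```python
import re

# Assumed prefix shape, taken from the excerpt in this report:
# "[timestamp] [level] [client address] " followed by the message.
PREFIX = re.compile(r"^\[[^\]]+\] \[\w+\] \[client [^\]]+\] ")


def strip_prefix(line):
    """Return the log line without its Apache prefix (unchanged if none matches)."""
    return PREFIX.sub("", line, count=1)


def last_exception(lines):
    """Return the last non-empty message; in a Python traceback this is the exception line."""
    messages = [strip_prefix(line).rstrip() for line in lines]
    messages = [m for m in messages if m.strip()]
    return messages[-1] if messages else None


# Three lines copied from the error_log excerpt in this report.
sample = [
    "[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] Traceback (most recent call last):",
    "[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     raise errors.DatabaseError(desc=desc, info=info)",
    "[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] DatabaseError: Local error: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'TESTRELM')",
]

print(last_exception(sample))
```

Run against the attached error_log, this would surface the DatabaseError line, which is where the "Cannot contact any KDC" cause is reported.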
I wonder if the uninstall actually removed the record on master and if it did whether the change has been propagated to the replica. I suspect that replica still has the host entry from the first installation at the moment of the second registration.

To get the info for comment 6 above,
- On Slave, can't kinit, so can't run ipa host-find or host-show
- So, on Master, ran ipactl start
- On Master, ran ipa host-show, and this client is listed, with its Keytab false.
- With master running, doing a kinit on Slave was successful, but cannot run ipa host-show. Got error:
  ipa: ERROR: Kerberos error: Kerberos error: ('Unspecified GSS failure. Minor code may provide more information', 851968)/('Cannot contact any KDC for requested realm', -1765328228)/
- ldapsearch on slave lists this client, and has values similar to what is shown on master for ipa host-show, except Keytab. That value is not listed, and not sure how to get that.

In a plain master - slave config, with no client installs done, verified that I can bring master down, then do a kinit on the slave, do an ipa host-find on slave. That looks good :) But incomplete client install then brought replica to a bad state. I haven't restarted anything on replica-in-bad-state yet...

The enrollment failed because it didn't forward a TGT (it was authenticated, but didn't delegate the credentials). This isn't a problem of the servers not knowing about the client, though the fact that Keytab is false means the client isn't enrolled. krb5.conf on an IPA server points only to itself so I don't see how a kinit on the replica was possible. It does explain why ipa host-show failed, it couldn't get a ticket for the remote HTTP service. Doing ipa -v host-show will tell us what server(s) it is trying to contact. If you enroll a client pointing to a specific server then if that server goes down your client will not work.
If you enroll a client using DNS srv records and a server goes down you may still need to remove the downed server from the srv records. Both sssd and the ipa tool do failover via the srv records but they probably do it in very different ways. In general though a partly-configured client isn't really supported, we can't predict how it will perform.

So the bug here is the below?
"The enrollment failed because it didn't forward a TGT (it was authenticated, but didn't delegate the credentials)."
I'll keep my note short to avoid confusion.... when I started my client install, master was down, and stayed down, replica was up, and stayed up. But client install failed. Think your comments above do not relate to this scenario.

I'm a bit fuzzy on the reproduction steps. Did you have bind configured with both master and replica configured as SRV records?

I have been unable to verify this. My set up consists of:
- Original master with DNS on panther
- Replica install with DNS on slinky
Confirmed that both have SRV records for the domain.

On panther run ipactl to completely shut down IPA.

On client lion configure /etc/resolv.conf with panther as the nameserver:

    # ipa-client-install
    (wait 15 seconds or so)
    DNS discovery failed to determine your DNS domain
    Please provide the domain name of your IPA server (ex: example.com):

Ok, that is expected. Add slinky to /etc/resolv.conf:

    # ipa-client-install
    root : ERROR LDAP Error: Can't contact LDAP server:
    Failed to verify that slinky.greyoak.com is an IPA Server.
    This may mean that the remote server is not up or is not reachable due to network or firewall settings.

This is expected too as slinky is still a SRV record for the domain. I can keep trying and eventually I'll get slinky as the server to use:

    # ipa-client-install
    Discovery was successful!
    Hostname: lion.greyoak.com
    Realm: GREYOAK.COM
    DNS Domain: greyoak.com
    IPA Server: slinky.greyoak.com
    BaseDN: dc=greyoak,dc=com
    Continue to configure the system with these values? [no]: y
    Enrollment principal: admin
    Password for admin:
    Enrolled in IPA realm GREYOAK.COM
    Created /etc/ipa/default.conf
    Configured /etc/sssd/sssd.conf
    Configured /etc/krb5.conf for IPA realm GREYOAK.COM
    Warning: Hostname (lion.greyoak.com) not found in DNS
    DNS server record set to: lion.greyoak.com -> 192.168.166.32
    SSSD enabled
    Kerberos 5 enabled
    NTP enabled
    Client configuration complete.
    [root@lion rcrit]# id admin
    uid=1457600000(admin) gid=1457600000(admins) groups=1457600000(admins)

Seems to be working fine. To make things easier I could have removed the panther SRV records from DNS. Note that there may still be sporadic failures because sssd and Kerberos are both configured to use DNS discovery and panther is still down, but my basic tests work.

I cannot reproduce this, can you provide a more detailed case?

Will set this up and will update. Maybe with the new replica install, this is not an issue.

Steps followed:
1> Install master with DNS (dell-p690-01.testrelm)
2> Install slave with DNS (apollo.testrelm)
3> Install client (hp-xw4200-01.testrelm) specifying to install with --server pointing to master (dell-p690-01)

Next:
1> On master (dell-p690-01) ipactl stop
2> On slave (apollo) kinit admin
   this fails:
   kinit: Cannot contact any KDC for realm 'TESTRELM' while getting initial credentials

So now....
1> On slave (apollo) did
   # cat /var/lib/sss/pubconf/kdcinfo.TESTRELM
   this had the master's IP
2> Edited /var/lib/sss/pubconf/kdcinfo.TESTRELM to have slave's IP
3> Can kinit

Finally.....
1> Install client. It picked the slave (apollo) to install against. Services on master are still down
2> Installed successfully.

The issue: kdcinfo.TESTRELM on slave was incorrect

Missed two steps..... In the section above "Steps followed:"
4> On client, kinit admin was successful.
5> Uninstalled client

blocking sssd bug https://bugzilla.redhat.com/show_bug.cgi?id=696193

(In reply to comment #22)
> blocking sssd bug https://bugzilla.redhat.com/show_bug.cgi?id=696193

oops https://bugzilla.redhat.com/show_bug.cgi?id=713473

After some discussion we decided to configure IPA servers to not use SRV records and only talk to the local install.

Fixed upstream:
master: https://fedorahosted.org/freeipa/changeset/d0af8b28d7552b301d5d2c1af93ed1604dc5df8f
ipa-2-0: https://fedorahosted.org/freeipa/changeset/99181f694d2d78aa01e402a95a9423741456d2af

testing this

Verified using steps below:
1. With master and replica started, installed client.
2. Uninstalled client.
3. Stopped master.
4. Installed client.
5. kinit'd on replica and client using newly added user.
6. Started master, and kinit'd with new user there as well, and saw client listed correctly with its keytab true, when running ipa host-find.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: If one of the IPA servers is down then client enrollment may fail.
Consequence: Client enrollment is unpredictable if one of the IPA servers is down.
Fix: Do not configure sssd on an IPA server to do failover. A running server may be configured to use services on another that is down.
Result: sssd is predictable on an IPA server. When the IPA services are running then sssd is available.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2011-1533.html
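
The decision recorded in this report (IPA servers stop using SRV records and talk only to the local install, while ordinary clients keep failing over across SRV targets) can be illustrated with a small sketch. This is not the FreeIPA or sssd implementation. The names `Srv`, `order_srv`, `candidate_servers` and `first_reachable` are hypothetical. The host name panther.greyoak.com is inferred from the short name used in the thread, port 389 is assumed, and the ordering is a simplification of RFC 2782 that omits weighted random selection.

```python
from collections import namedtuple

# One DNS SRV record; field names follow RFC 2782.
Srv = namedtuple("Srv", "priority weight port target")


def order_srv(records):
    """Simplified ordering: lowest priority first, then highest weight.

    RFC 2782 asks for weighted random selection within a priority;
    that is left out here to keep the sketch deterministic.
    """
    return sorted(records, key=lambda r: (r.priority, -r.weight))


def candidate_servers(local_host, is_ipa_server, srv_records):
    """Return the hosts to try, in order."""
    if is_ipa_server:
        # Behaviour after the fix described in this report: a server
        # only talks to its own local install and ignores SRV records.
        return [local_host]
    return [r.target for r in order_srv(srv_records)]


def first_reachable(candidates, probe):
    """Return the first candidate for which probe(host) is true, else None."""
    for host in candidates:
        if probe(host):
            return host
    return None


# Hypothetical data modelled on the thread: panther is down, slinky is up.
records = [
    Srv(0, 100, 389, "panther.greyoak.com"),
    Srv(0, 100, 389, "slinky.greyoak.com"),
]
up = {"slinky.greyoak.com"}

client_choice = first_reachable(
    candidate_servers("lion.greyoak.com", False, records), lambda h: h in up
)
server_choice = candidate_servers("slinky.greyoak.com", True, records)
print(client_choice, server_choice)
```

In this model the client skips the downed master and settles on the replica, while the replica never considers the master at all, which matches the behaviour the verification steps checked.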