Bug 696193 - Client install fails on ipa-join when master is down, and replica is running.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ipa
Version: 6.1
Hardware/OS: Unspecified / Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Rob Crittenden
QA Contact: Chandrasekar Kannan
Keywords: Reopened
Depends On: 713473
Blocks:
Reported: 2011-04-13 10:10 EDT by Namita Soman
Modified: 2015-01-04 18:47 EST
CC List: 6 users

See Also:
Fixed In Version: ipa-2.1.0-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: If one of the IPA servers is down then client enrollment may fail. Consequence: Client enrollment is unpredictable if one of the IPA servers is down. Fix: Do not configure sssd on an IPA server to do failover. A running server may be configured to use services on another that is down. Result: sssd is predictable on an IPA server. When the IPA services are running then sssd is available.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-12-06 13:21:34 EST


Attachments
/var/log/httpd/error_log from replica (9.19 KB, application/octet-stream)
2011-04-13 11:11 EDT, Namita Soman
ipaclient-install.log (4.04 KB, application/octet-stream)
2011-04-13 11:12 EDT, Namita Soman
Description Namita Soman 2011-04-13 10:10:55 EDT
Description of problem:
With the master down and the replica running, ipa-client-install fails with the error:
Joining realm failed because of failing XML-RPC request.
  This error may be caused by incompatible server/client major versions.


Version-Release number of selected component (if applicable):
ipa-client-2.0.0-20.el6.x86_64

How reproducible:
I have seen this twice.

Steps to Reproduce:
1. With Master and Replica up, installed client successfully
2. Then uninstalled client
3. Ran ipactl stop on Master
4. Reinstalled client, and got error in install log - 
2011-04-13 09:48:01,332 DEBUG args=/usr/sbin/ipa-join -s rhel61-server4.testrelm
2011-04-13 09:48:01,333 DEBUG stdout=
2011-04-13 09:48:01,333 DEBUG stderr=HTTP response code is 500, not 200
5. On replica, tried kinit admin, and got error - 
kinit: Cannot contact any KDC for realm 'TESTRELM' while getting initial credentials
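
To pull the failing command out of a longer ipaclient-install.log, the three DEBUG lines in step 4 can be grouped mechanically. The following is a minimal Python sketch, assuming the log keeps the `timestamp LEVEL args=/stdout=/stderr=` layout shown above. The helper names `summarize_commands` and `http_status` are hypothetical and not part of IPA.

```python
import re

# Matches lines such as:
# 2011-04-13 09:48:01,333 DEBUG stderr=HTTP response code is 500, not 200
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<key>args|stdout|stderr)=(?P<value>.*)$"
)


def summarize_commands(lines):
    """Group consecutive args/stdout/stderr log lines into one dict per command."""
    commands = []
    current = None
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # not an args/stdout/stderr line
        if match.group("key") == "args":
            # A new args= line starts a new command record.
            current = {"args": match.group("value"), "stdout": "", "stderr": ""}
            commands.append(current)
        elif current is not None:
            current[match.group("key")] = match.group("value")
    return commands


def http_status(stderr):
    """Return the HTTP status mentioned in an ipa-join style stderr, or None."""
    match = re.search(r"HTTP response code is (\d{3})", stderr)
    return int(match.group(1)) if match else None


sample = [
    "2011-04-13 09:48:01,332 DEBUG args=/usr/sbin/ipa-join -s rhel61-server4.testrelm",
    "2011-04-13 09:48:01,333 DEBUG stdout=",
    "2011-04-13 09:48:01,333 DEBUG stderr=HTTP response code is 500, not 200",
]

for command in summarize_commands(sample):
    print(command["args"], "->", http_status(command["stderr"]))
```

A 500 status here only says that some server answered and failed internally, so the next place to look is the web server log on whichever host ipa-join was pointed at.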
  
Actual results:
client install failed

Expected results:
client install to succeed


Additional info:
Yesterday, when I saw this, I restarted sssd, and that seemed to get me around the issue. I spoke to sgallagh, and this is not related to sssd but to something in the sequence of steps taken when doing ipa-join.
Comment 2 Rob Crittenden 2011-04-13 10:40:48 EDT
On a fresh client yes, sssd has no part in enrollment. It indirectly might if the client is already configured to use sssd.

In IRC you said ipa-client-install was run with no options, so it is using DNS discovery. Since it got a 500 error it talked to something; the ipaclient-install.log may have details on that.

Look in /var/log/httpd/error_log on the replica to see what was logged there. A 500 error should have generated a traceback or other error.
Comment 3 Namita Soman 2011-04-13 11:07:04 EDT
The ipa-client-install log has the ipa-join error pasted in the initial description.

/var/log/httpd/error_log has the following from when the client install failed:

[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] mod_wsgi (pid=14275): Exception occurred processing WSGI script '/usr/share/ipa/wsgi.py'.
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] Traceback (most recent call last):
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/share/ipa/wsgi.py", line 48, in application
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     return api.Backend.session(environ, start_response)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipaserver/rpcserver.py", line 141, in __call__
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     self.create_context(ccache=environ.get('KRB5CCNAME'))
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipalib/backend.py", line 110, in create_context
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     self.Backend.ldap2.connect(ccache=ccache)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipalib/backend.py", line 62, in connect
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     conn = self.create_connection(*args, **kw)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipalib/encoder.py", line 188, in new_f
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     return f(*new_args, **kwargs)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipaserver/plugins/ldap2.py", line 336, in create_connection
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     _handle_errors(e, **{})
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipaserver/plugins/ldap2.py", line 117, in _handle_errors
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     raise errors.DatabaseError(desc=desc, info=info)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] DatabaseError: Local error: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Cannot contact any KDC for realm 'TESTRELM')
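
The per-line Apache prefix makes the traceback above hard to scan. As a rough aid, the Python sketch below strips the `[timestamp] [level] [client ip]` prefix and reports the final exception line. It assumes the Apache 2.2-style prefix seen in this excerpt; `strip_prefix` and `final_exception` are hypothetical helper names, not part of IPA or mod_wsgi.

```python
import re

# Apache 2.2-style prefix: [timestamp] [level] [client ip] message
PREFIX = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[^\]]+)\] \[client (?P<client>[^\]]+)\] ?(?P<msg>.*)$"
)


def strip_prefix(lines):
    """Return only the message part of each prefixed error_log line."""
    messages = []
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if match:
            messages.append(match.group("msg"))
    return messages


def final_exception(messages):
    """Return (exception name, detail) from the last unindented traceback line."""
    for msg in reversed(messages):
        if msg and not msg.startswith((" ", "Traceback", "mod_wsgi")):
            name, _, detail = msg.partition(": ")
            return name, detail
    return None


SAMPLE = """\
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] Traceback (most recent call last):
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]   File "/usr/lib/python2.6/site-packages/ipaserver/plugins/ldap2.py", line 117, in _handle_errors
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84]     raise errors.DatabaseError(desc=desc, info=info)
[Wed Apr 13 10:43:08 2011] [error] [client 10.16.18.84] DatabaseError: Local error: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Cannot contact any KDC for realm 'TESTRELM')
"""

messages = strip_prefix(SAMPLE.splitlines())
name, detail = final_exception(messages)
print(name)
print(detail)
```

Read this way, the 500 comes from the replica's own LDAP connection failing at the GSSAPI step because it cannot reach a KDC, which matches the kinit failure in step 5 of the description.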
Comment 4 Namita Soman 2011-04-13 11:11:29 EDT
Created attachment 491799 [details]
/var/log/httpd/error_log from replica
Comment 5 Namita Soman 2011-04-13 11:12:02 EDT
Created attachment 491800 [details]
ipaclient-install.log
Comment 6 Dmitri Pal 2011-04-13 20:22:37 EDT
I wonder if the uninstall actually removed the record on master and if it did whether the change has been propagated to the replica. I suspect that replica still has the host entry from the first installation at the moment of the second registration.
Comment 7 Namita Soman 2011-04-13 22:39:20 EDT
To get the info for comment 6 above,
- On Slave, can't kinit, so can't run ipa host-find or host-show

- So, on Master, ran ipactl start

- On Master, ran ipa host-show, and this client is listed, with its Keytab false.

- With master running, doing a kinit on Slave was successful, but cannot run ipa host-show. Got error -
ipa: ERROR: Kerberos error: Kerberos error: ('Unspecified GSS failure.  Minor code may provide more information', 851968)/('Cannot contact any KDC for requested realm', -1765328228)/

- ldapsearch on slave lists this client, and has values similar to what is shown on master for ipa host-show, except Keytab. That value is not listed, and I am not sure how to get it.
Comment 8 Namita Soman 2011-04-14 08:00:17 EDT
In a plain master - slave config, with no client installs done, verified that I can bring master down, then do a kinit on the slave and an ipa host-find on the slave. That looks good :) 

But the incomplete client install then brought the replica to a bad state. I haven't restarted anything on replica-in-bad-state yet...
Comment 9 Rob Crittenden 2011-04-14 10:56:36 EDT
The enrollment failed because it didn't forward a TGT (it was authenticated, but didn't delegate the credentials).

This isn't a problem of the servers not knowing about the client, though the fact that Keytab is false means the client isn't enrolled.

krb5.conf on an IPA server points only to itself so I don't see how a kinit on the replica was possible. It does explain why ipa host-show failed: it couldn't get a ticket for the remote HTTP service. Doing ipa -v host-show will tell us what server(s) it is trying to contact.

If you enroll a client pointing to a specific server then if that server goes down your client will not work. If you enroll a client using DNS srv records and a server goes down you may still need to remove the downed server from the srv records. Both sssd and the ipa tool do failover via the srv records but they probably do it in very different ways. In general though a partly-configured client isn't really supported, we can't predict how it will perform.
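
To illustrate the general idea of failover across SRV records mentioned above, here is a minimal Python sketch. It is not how sssd or the ipa tool actually implement failover; the host names, the `SrvRecord` type and the `probe` callback are all hypothetical. Ordering is simplified to priority first, then higher weight first, whereas RFC 2782 calls for a weighted random choice within a priority.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    """One DNS SRV answer (hypothetical container, fields as in RFC 2782)."""
    priority: int
    weight: int
    port: int
    target: str


def order_candidates(records: Iterable[SrvRecord]) -> List[SrvRecord]:
    """Lowest priority value first; within a priority, higher weight first."""
    return sorted(records, key=lambda rec: (rec.priority, -rec.weight))


def first_reachable(
    records: Iterable[SrvRecord],
    probe: Callable[[SrvRecord], bool],
) -> Optional[SrvRecord]:
    """Try each candidate in order and return the first one the probe accepts."""
    for record in order_candidates(records):
        if probe(record):
            return record
    return None


# Hypothetical records: both servers are still published, but one is down.
records = [
    SrvRecord(priority=0, weight=100, port=88, target="master.example.test"),
    SrvRecord(priority=0, weight=100, port=88, target="replica.example.test"),
]
down = {"master.example.test"}

chosen = first_reachable(records, lambda rec: rec.target not in down)
print(chosen.target if chosen else "no server reachable")
```

The sketch also shows why leaving a downed server in the SRV records is costly: every lookup that happens to try it first pays for a failed probe before moving on.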
Comment 10 Namita Soman 2011-04-14 11:21:26 EDT
So the bug here is the below?
"The enrollment failed because it didn't forward a TGT (it was authenticated,
but didn't delegate the credentials)."


I'll keep my note short to avoid confusion....
When I started my client install, the master was down and stayed down; the replica was up and stayed up. But the client install failed.

I think your comments above do not relate to this scenario.
Comment 13 Dmitri Pal 2011-04-21 09:43:11 EDT
https://fedorahosted.org/freeipa/ticket/1187
Comment 14 Rob Crittenden 2011-04-27 15:56:45 EDT
I'm a bit fuzzy on the reproduction steps. Did you have BIND configured with SRV records for both the master and the replica?
Comment 16 Rob Crittenden 2011-06-09 17:06:38 EDT
I have been unable to verify this. My setup consists of:

Original master with DNS on panther
Replica install with DNS on slinky

Confirmed that both have SRV records for the domain.

On panther, run ipactl stop to completely shut down IPA.

On client lion, configure /etc/resolv.conf with panther as the only nameserver:

# ipa-client-install (wait 15 seconds or so)
DNS discovery failed to determine your DNS domain
Please provide the domain name of your IPA server (ex: example.com):

Ok, that is expected. Add slinky to /etc/resolv.conf:

# ipa-client-install 
root        : ERROR    LDAP Error: Can't contact LDAP server: 
Failed to verify that slinky.greyoak.com is an IPA Server.
This may mean that the remote server is not up or is not reachable
due to network or firewall settings.

This is expected too as slinky is still a SRV record for the domain. I can keep trying and eventually I'll get slinky as the server to use:

# ipa-client-install 
Discovery was successful!
Hostname: lion.greyoak.com
Realm: GREYOAK.COM
DNS Domain: greyoak.com
IPA Server: slinky.greyoak.com
BaseDN: dc=greyoak,dc=com


Continue to configure the system with these values? [no]: y
Enrollment principal: admin
Password for admin@GREYOAK.COM: 

Enrolled in IPA realm GREYOAK.COM
Created /etc/ipa/default.conf
Configured /etc/sssd/sssd.conf
Configured /etc/krb5.conf for IPA realm GREYOAK.COM
Warning: Hostname (lion.greyoak.com) not found in DNS
DNS server record set to: lion.greyoak.com -> 192.168.166.32
SSSD enabled
Kerberos 5 enabled
NTP enabled
Client configuration complete.
[root@lion rcrit]# id admin
uid=1457600000(admin) gid=1457600000(admins) groups=1457600000(admins)

Seems to be working fine.

To make things easier I could have removed the panther SRV records from DNS.

Note that there may still be sporadic failures because sssd and Kerberos are both configured to use DNS discovery and panther is still down, but my basic tests work.
Comment 18 Rob Crittenden 2011-06-14 10:20:04 EDT
I cannot reproduce this. Can you provide a more detailed case?
Comment 19 Namita Soman 2011-06-14 12:28:13 EDT
Will set this up and will update. Maybe with the new replica install, this is not an issue.
Comment 20 Namita Soman 2011-06-15 09:36:07 EDT
Steps followed:
1> Install master with DNS (dell-p690-01.testrelm)
2> install slave with DNS (apollo.testrelm)
3> install client (hp-xw4200-01.testrelm) specifying to install with --server pointing to master (dell-p690-01)

Next:
1> On master (dell-p690-01) ipactl stop
2> On slave (apollo) kinit admin
this fails: kinit: Cannot contact any KDC for realm 'TESTRELM' while getting initial credentials


So now....
1> on slave (apollo) did #cat /var/lib/sss/pubconf/kdcinfo.TESTRELM
this had the master's IP
2> Edited /var/lib/sss/pubconf/kdcinfo.TESTRELM to have slave's IP
3> Can kinit

Finally.....
1> Install client. It picked the slave (apollo) to install against. Services on master are still down
2> Installed successfully.

The issue:
kdcinfo.TESTRELM on slave was incorrect
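
The manual check in the steps above (cat the kdcinfo file, compare against the server's own IP) can be sketched in Python. This assumes the first non-empty line of the kdcinfo file is the KDC address, optionally followed by `:port`; the addresses below come from a documentation range and the function names are hypothetical, not part of sssd.

```python
def kdcinfo_address(text):
    """Return the address on the first non-empty line, without a :port suffix."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Only treat a single colon as an IPv4 "address:port" separator,
        # so that IPv6 addresses are left untouched.
        if line.count(":") == 1:
            line = line.split(":", 1)[0]
        return line
    return None


def points_to_local_kdc(text, local_addresses):
    """True if the kdcinfo content names one of this host's own addresses."""
    address = kdcinfo_address(text)
    return address is not None and address in set(local_addresses)


# Hypothetical values: the replica owns 192.0.2.20, the stopped master 192.0.2.10.
replica_addresses = ["192.0.2.20"]
stale_content = "192.0.2.10\n"
fixed_content = "192.0.2.20:88\n"

print(points_to_local_kdc(stale_content, replica_addresses))
print(points_to_local_kdc(fixed_content, replica_addresses))
```

On a real server the content would come from reading /var/lib/sss/pubconf/kdcinfo.TESTRELM; the sketch keeps it as a string so it can be tried without touching the system.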
Comment 21 Namita Soman 2011-06-15 09:38:04 EDT
Missed two steps.....
In the section above "Steps followed:"
4> On client, kinit admin was successful.
5> Uninstalled client
Comment 22 Jenny Galipeau 2011-06-15 10:32:56 EDT
blocking sssd bug https://bugzilla.redhat.com/show_bug.cgi?id=696193
Comment 23 Jenny Galipeau 2011-06-15 10:33:41 EDT
(In reply to comment #22)
> blocking sssd bug https://bugzilla.redhat.com/show_bug.cgi?id=696193

oops https://bugzilla.redhat.com/show_bug.cgi?id=713473
Comment 24 Rob Crittenden 2011-06-17 11:19:42 EDT
After some discussion we decided to configure IPA servers to not use SRV records and only talk to the local install.
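
One way to check a server for the discovery behaviour described above is to look for the `_srv_` marker in the `ipa_server` option of sssd.conf, which sssd-ipa uses to request DNS service discovery. The Python sketch below does that with the standard `configparser` module. The configuration text is hypothetical and this is not the actual change shipped for this bug; `domains_using_discovery` is an illustrative name.

```python
import configparser


def domains_using_discovery(conf_text):
    """Map each [domain/...] section that lists _srv_ to its ipa_server entries."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    flagged = {}
    for section in parser.sections():
        if not section.startswith("domain/"):
            continue
        raw = parser.get(section, "ipa_server", fallback="")
        servers = [item.strip() for item in raw.split(",") if item.strip()]
        if "_srv_" in servers:
            flagged[section] = servers
    return flagged


# Hypothetical sssd.conf on a replica before the change: discovery is enabled.
BEFORE = """\
[sssd]
domains = testrelm

[domain/testrelm]
id_provider = ipa
ipa_server = _srv_, dell-p690-01.testrelm
"""

# Hypothetical sssd.conf after the change: the replica names only itself.
AFTER = """\
[sssd]
domains = testrelm

[domain/testrelm]
id_provider = ipa
ipa_server = apollo.testrelm
"""

print(domains_using_discovery(BEFORE))
print(domains_using_discovery(AFTER))
```

The sketch only flags an explicit `_srv_` entry; a domain section with no `ipa_server` option at all would need to be reviewed separately.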
Comment 27 Namita Soman 2011-09-22 13:40:31 EDT
testing this
Comment 28 Namita Soman 2011-09-22 14:35:27 EDT
Verified using steps below:
1. with master and replica started, installed client.
2. uninstalled client
3. stopped master
4. installed client
5. kinit'd on replica and client using newly added user
6. started master, kinit'd with the new user there as well, and saw the client listed correctly with its keytab true when running ipa host-find
Comment 29 Rob Crittenden 2011-10-31 13:45:57 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: If one of the IPA servers is down then client enrollment may fail.
Consequence: client enrollment is unpredictable if one of the IPA servers is down.
Fix: Do not configure sssd on an IPA server to do failover. A running server may be configured to use services on another that is down.
Result: sssd is predictable on an IPA server. When the IPA services are running then sssd is available.
Comment 30 errata-xmlrpc 2011-12-06 13:21:34 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1533.html
