Red Hat Bugzilla – Bug 985940
authentication does not failover when ldap server is unreachable
Last modified: 2016-02-10 14:34:28 EST
Description of problem:
We have a number of machines in our IdM environment. Unfortunately, the first/highest ranked server listed in our IdM environment is unreachable from one of our RHEV-M hosts.
In previous versions of RHEV-M, the first login after an ovirt-engine restart would take a while to complete because it had to time out in its attempt to hit the first server. Subsequent logins would then go directly to the second LDAP server listed.
The behavior now is that it does not fail over. It just denies logins.
Version-Release number of selected component (if applicable):
[firstname.lastname@example.org ovirt-engine]$ rpm -q rhevm
Steps to Reproduce:
1. Add an IdM environment that has many masters
2. Make the highest ranked IdM server in the domain unreachable from RHEV-M
3. Attempt a login
Actual results:
The login never succeeds.
Expected results:
Possibly the first login takes longer, but subsequent logins work quickly.
If it matters, this is our DNS config. idm1.phx is unreachable from my RHEV-M host.
;; ANSWER SECTION:
_kerberos._tcp.salab.redhat.com. 300 IN SRV 4 100 88 idm2.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 0 100 88 idm1.phx.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 1 100 88 idm1.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 3 100 88 idm2.phx.salab.redhat.com.
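For readers less familiar with SRV records: under RFC 2782 the target with the lowest priority number is tried first, which is why idm1.phx (priority 0) counts as the highest ranked server here. The following is a minimal Python sketch that parses the answer lines above and lists the targets in that order. The helper names `parse_srv` and `ordered_targets` are made up for illustration, and it is an assumption that the engine orders its LDAP servers the same way as these Kerberos records.

```python
# SRV answer lines copied from the DNS output in this report.
ANSWER = """\
_kerberos._tcp.salab.redhat.com. 300 IN SRV 4 100 88 idm2.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 0 100 88 idm1.phx.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 1 100 88 idm1.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 3 100 88 idm2.phx.salab.redhat.com.
"""


def parse_srv(line):
    """Split one SRV answer line into (priority, weight, port, target)."""
    # Fields: name, TTL, class, type, priority, weight, port, target
    fields = line.split()
    priority, weight, port = (int(f) for f in fields[4:7])
    return priority, weight, port, fields[7].rstrip(".")


def ordered_targets(answer):
    """Return SRV targets sorted by ascending priority (lowest number first)."""
    records = [parse_srv(line) for line in answer.splitlines() if line.strip()]
    # Every priority is distinct in this data set, so the weight field
    # (used only to choose among equal priorities) plays no role here.
    records.sort(key=lambda record: record[0])
    return [record[3] for record in records]


for target in ordered_targets(ANSWER):
    print(target)
# prints:
# idm1.phx.salab.redhat.com
# idm1.rdu.salab.redhat.com
# idm2.phx.salab.redhat.com
# idm2.rdu.salab.redhat.com
```

With this ordering, a client that fails over correctly would be expected to move on to idm1.rdu once idm1.phx times out.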
I get the following error in rhev-m logs.
2013-07-18 10:22:32,413 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (ajp-/127.0.0.1:8702-38) Failed to query rootDSE for LDAP server LDAP://idm1.phx.salab.redhat.com:389 due to connection timeout
2013-07-18 10:22:32,415 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-38) Failed ldap search server LDAP://idm1.phx.salab.redhat.com:389 using user mdavis@SALAB.REDHAT.COM due to connection timeout. We should try the next server
2013-07-18 10:22:32,415 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (ajp-/127.0.0.1:8702-38) Failed to run command LdapAuthenticateUserCommand. Domain is salab.redhat.com. User is mdavis.
2013-07-18 10:22:32,416 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-38) USER_FAILED_TO_AUTHENTICATE : mdavis
2013-07-18 10:22:32,416 WARN [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-38) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE
It even says it should try the next server, but never does.
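To make the expected behavior concrete, here is a rough Python sketch of the failover loop I would expect: on a connection-level error, move on to the next server instead of failing the login. This is not the engine's code (the engine is Java); `authenticate_with_failover`, `AllServersFailed` and the `try_server` callback are hypothetical names used only for illustration.

```python
class AllServersFailed(Exception):
    """Raised when every candidate server failed at the connection level."""

    def __init__(self, errors):
        super().__init__("all servers failed: %s" % ", ".join(errors))
        self.errors = errors  # maps server name -> the exception it raised


def authenticate_with_failover(servers, try_server):
    """Try each server in order and return (server, result) from the first
    one that answers. Only connection-level errors (OSError, which includes
    timeouts and refused connections) trigger a move to the next server."""
    errors = {}
    for server in servers:
        try:
            return server, try_server(server)
        except OSError as exc:
            errors[server] = exc  # remember the failure, then try the next one
    raise AllServersFailed(errors)


# Simulated scenario from this report: idm1.phx times out, the rest answer.
def fake_try_server(server):
    if server.startswith("idm1.phx"):
        raise TimeoutError("connection timeout")
    return "authenticated"


servers = [
    "idm1.phx.salab.redhat.com",
    "idm1.rdu.salab.redhat.com",
    "idm2.phx.salab.redhat.com",
    "idm2.rdu.salab.redhat.com",
]
print(authenticate_with_failover(servers, fake_try_server))
# prints: ('idm1.rdu.salab.redhat.com', 'authenticated')
```

Only connection-level errors cause a retry in this sketch; anything else the callback raises, such as a rejected password, is passed straight through to the caller rather than retried on every server.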
A workaround is to use the -ldapServers parameter in rhevm-manage-domains.
# rhevm-manage-domains -action=add -domain=SALAB.REDHAT.COM -provider=IPA -user=admin -interactive -ldapServers=$SERVER1,$SERVER2
This is working as a suitable workaround.
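If it helps anyone else applying the workaround, below is a small Python sketch for deciding which servers to pass to -ldapServers by checking which ones accept a TCP connection on the LDAP port (389, as seen in the engine log). The function names `tcp_probe` and `ldap_servers_value` are hypothetical, the timeout is an arbitrary choice, and a successful TCP connect only shows the port is reachable, not that the LDAP service is healthy.

```python
import socket


def tcp_probe(host, port, timeout):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


def ldap_servers_value(hosts, port=389, timeout=3.0, probe=tcp_probe):
    """Build the comma-separated value for -ldapServers from reachable hosts,
    keeping the order in which the hosts were given."""
    return ",".join(host for host in hosts if probe(host, port, timeout))


# Offline demonstration with a stand-in probe that treats idm1.phx as down.
def fake_probe(host, port, timeout):
    return not host.startswith("idm1.phx")


print(ldap_servers_value(
    ["idm1.phx.salab.redhat.com", "idm1.rdu.salab.redhat.com"],
    probe=fake_probe,
))
# prints: idm1.rdu.salab.redhat.com
```

Run from the RHEV-M host with the default probe, the returned string could be used in place of $SERVER1,$SERVER2 in the command above.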
May we get full logs?
I think I know what is causing this, but I would like to be sure.
Just so I'm understood: engine.log (plus rotations like engine.log.1, etc., if they exist) and server.log
Attached an external tracker to oVirt gerrit with a patch that might solve the issue.
Actually, looking again at the bug description, the patch DOES solve this; we saw a similar issue with bugs:
Moving to closed-duplicate.
*** This bug has been marked as a duplicate of bug 973566 ***