Description of problem:
We have a number of machines in our IdM environment. Unfortunately, the first (highest-ranked) server listed in our IdM environment is unreachable from one of our RHEV-M hosts. In previous versions of RHEV-M, the first login after an ovirt-engine restart would take a while to complete because it timed out trying to reach the first server, but subsequent logins would go directly to the second LDAP server listed. The current version does not fail over. It just denies logins.

Version-Release number of selected component (if applicable):
[root ovirt-engine]$ rpm -q rhevm
rhevm-3.2.1-0.39.el6ev.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Add an IdM environment that has many masters
2. Make the highest-ranked IdM server in the domain unreachable from RHEV-M
3. Attempt a login

Actual results:
The login never succeeds.

Expected results:
Possibly the first login takes longer, but subsequent logins work quickly.

Additional info:
If it matters, this is our DNS config. And idm1.phx is unreachable by my rhevm host.

;; ANSWER SECTION:
_kerberos._tcp.salab.redhat.com. 300 IN SRV 4 100 88 idm2.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 0 100 88 idm1.phx.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 1 100 88 idm1.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV 3 100 88 idm2.phx.salab.redhat.com.

I get the following error in rhev-m logs.

2013-07-18 10:22:32,413 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (ajp-/127.0.0.1:8702-38) Failed to query rootDSE for LDAP server LDAP://idm1.phx.salab.redhat.com:389 due to connection timeout
2013-07-18 10:22:32,415 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-38) Failed ldap search server LDAP://idm1.phx.salab.redhat.com:389 using user mdavis.COM due to connection timeout. We should try the next server
2013-07-18 10:22:32,415 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (ajp-/127.0.0.1:8702-38) Failed to run command LdapAuthenticateUserCommand. Domain is salab.redhat.com. User is mdavis.
2013-07-18 10:22:32,416 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-38) USER_FAILED_TO_AUTHENTICATE : mdavis
2013-07-18 10:22:32,416 WARN [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-38) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE

It even says it should try the next server, but never does.
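To make the expected behavior concrete, here is a minimal Python sketch of SRV-ordered failover. It is not the engine's code: the names `SrvRecord`, `order_by_priority`, `login_with_failover`, and `fake_connect` are hypothetical, and the connection attempt is simulated rather than a real LDAP bind. It relies only on the SRV rule that a lower priority value is contacted first, which is why idm1.phx (priority 0) is tried before the others in the ANSWER SECTION above.

```python
from typing import Callable, List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    """One SRV answer: priority, weight, port, target (hypothetical helper type)."""
    priority: int
    weight: int
    port: int
    target: str


# Records copied from the ANSWER SECTION in this report.
RECORDS = [
    SrvRecord(4, 100, 88, "idm2.rdu.salab.redhat.com."),
    SrvRecord(0, 100, 88, "idm1.phx.salab.redhat.com."),
    SrvRecord(1, 100, 88, "idm1.rdu.salab.redhat.com."),
    SrvRecord(3, 100, 88, "idm2.phx.salab.redhat.com."),
]


def order_by_priority(records: List[SrvRecord]) -> List[SrvRecord]:
    # Lower priority value is contacted first. Weight-based selection among
    # equal priorities is ignored here; every priority in RECORDS is distinct.
    return sorted(records, key=lambda r: r.priority)


def login_with_failover(
    records: List[SrvRecord], try_server: Callable[[str], None]
) -> Optional[str]:
    """Try each server in priority order; return the first host that answers."""
    for rec in order_by_priority(records):
        host = rec.target.rstrip(".")
        try:
            try_server(host)
            return host
        except TimeoutError:
            # Assumption: a timeout means "move on to the next server".
            continue
    return None


def fake_connect(host: str) -> None:
    # Simulates this report's environment: idm1.phx is unreachable.
    if host == "idm1.phx.salab.redhat.com":
        raise TimeoutError(host)


if __name__ == "__main__":
    print(login_with_failover(RECORDS, fake_connect))
```

With idm1.phx timing out, the sketch falls through to idm1.rdu (priority 1), which is the behavior the report expects from the engine.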
A workaround is to use the -ldapServers parameter in rhevm-manage-domains.

# rhevm-manage-domains -action=add -domain=SALAB.REDHAT.COM -provider=IPA -user=admin -interactive -ldapServers=$SERVER1,$SERVER2

This is working as a suitable workaround.
May we get full logs? I think I know what is causing this, but I would like to be sure.
Just to be clear: engine.log (plus rotations such as engine.log.1, if they exist) and server.log.
Attached an external tracker to oVirt gerrit with a patch that might solve the issue.
Actually, looking again at the bug description, the patch DOES solve this. We saw a similar issue in bugs BZ973566 and BZ974148. Moving to closed-duplicate.

*** This bug has been marked as a duplicate of bug 973566 ***