Bug 985940

Summary: authentication does not failover when ldap server is unreachable
Product: Red Hat Enterprise Virtualization Manager
Reporter: Matthew Davis <mdavis>
Component: ovirt-engine
Assignee: Yair Zaslavsky <yzaslavs>
Status: CLOSED DUPLICATE
QA Contact:
Severity: high
Docs Contact:
Priority: urgent
Version: 3.2.0
CC: acathrow, bazulay, iheim, lpeer, mdavis, oourfali, Rhev-m-bugs, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.2.2
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-21 12:20:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Matthew Davis 2013-07-18 14:38:18 UTC
Description of problem:

We have a number of machines in our IdM environment. Unfortunately, the first (highest-ranked) server listed in our IdM environment is unreachable from one of our RHEV-M hosts.

In previous versions of RHEV-M, the first login after an ovirt-engine restart took a while to complete because the attempt to reach the first server had to time out. Subsequent logins then went directly to the second LDAP server listed.

The current version does not fail over. It just denies logins.


Version-Release number of selected component (if applicable):
[root ovirt-engine]$ rpm -q rhevm
rhevm-3.2.1-0.39.el6ev.noarch


How reproducible:
Every time

Steps to Reproduce:
1. Add an IdM environment that has many masters
2. Make the highest-ranked IdM server in the domain unreachable from RHEV-M
3. Attempt a login

Actual results:
The login never succeeds.

Expected results:
Possibly the first login takes longer, but subsequent logins work quickly.

Additional info:

Comment 1 Matthew Davis 2013-07-18 14:40:04 UTC
In case it matters, this is our DNS configuration. Note that idm1.phx is unreachable from my RHEV-M host.

;; ANSWER SECTION:
_kerberos._tcp.salab.redhat.com. 300 IN SRV     4 100 88 idm2.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV     0 100 88 idm1.phx.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV     1 100 88 idm1.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV     3 100 88 idm2.phx.salab.redhat.com.
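
For reference, RFC 2782 has clients contact the SRV target with the lowest priority number first, so the records above put the unreachable idm1.phx at the head of the list. The following Python sketch shows only that ordering. The names SrvRecord, parse_srv and by_priority are illustrative and are not taken from ovirt-engine, and weight-based selection is ignored because all four weights are equal.

```python
# Illustrative only: orders the SRV answers quoted above by priority.
# SrvRecord, parse_srv and by_priority are hypothetical names.
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


ANSWER_SECTION = """\
_kerberos._tcp.salab.redhat.com. 300 IN SRV     4 100 88 idm2.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV     0 100 88 idm1.phx.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV     1 100 88 idm1.rdu.salab.redhat.com.
_kerberos._tcp.salab.redhat.com. 300 IN SRV     3 100 88 idm2.phx.salab.redhat.com.
"""


def parse_srv(line: str) -> SrvRecord:
    # Fields: name, TTL, class, type, priority, weight, port, target
    fields = line.split()
    priority, weight, port, target = fields[4:8]
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def by_priority(text: str) -> List[SrvRecord]:
    # Lower priority value means "try this target earlier".
    records = [parse_srv(line) for line in text.splitlines() if line.strip()]
    return sorted(records, key=lambda record: record.priority)


ORDERED = by_priority(ANSWER_SECTION)
for record in ORDERED:
    print(record.priority, record.target)
```

With these records the order is idm1.phx (0), idm1.rdu (1), idm2.phx (3), idm2.rdu (4), so idm1.rdu would be the natural next server after idm1.phx times out.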


I get the following errors in the RHEV-M logs.

2013-07-18 10:22:32,413 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (ajp-/127.0.0.1:8702-38) Failed to query rootDSE for LDAP server LDAP://idm1.phx.salab.redhat.com:389 due to connection timeout
2013-07-18 10:22:32,415 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-38) Failed ldap search server LDAP://idm1.phx.salab.redhat.com:389 using user mdavis.COM due to connection timeout. We should try the next server
2013-07-18 10:22:32,415 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (ajp-/127.0.0.1:8702-38) Failed to run command LdapAuthenticateUserCommand. Domain is salab.redhat.com. User is mdavis.
2013-07-18 10:22:32,416 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-38) USER_FAILED_TO_AUTHENTICATE : mdavis
2013-07-18 10:22:32,416 WARN  [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-38) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE



It even says it should try the next server, but never does.
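
The behavior I expected can be sketched as a loop that moves on to the next server after a timeout and remembers the server that last worked. This is a hypothetical Python illustration, not the engine's actual Java code: authenticate_with_failover, the connect callable and the cache dictionary are all assumed names.

```python
# Hypothetical sketch of the failover the log message promises.
# Nothing here is taken from ovirt-engine; all names are assumptions.
from typing import Callable, Dict, List


class AllServersFailed(Exception):
    """Raised when no server in the list could be reached."""


def authenticate_with_failover(
    servers: List[str],
    connect: Callable[[str], None],
    cache: Dict[str, str],
    domain: str,
) -> str:
    """Try each server in turn and return the one that answered.

    connect(server) is assumed to raise TimeoutError when the server
    is unreachable and to return normally on success. Any other
    exception (for example bad credentials) is not retried.
    """
    # Start with the server that worked last time, if there is one.
    preferred = cache.get(domain)
    ordered = ([preferred] if preferred in servers else []) + [
        server for server in servers if server != preferred
    ]
    for server in ordered:
        try:
            connect(server)
        except TimeoutError:
            continue  # this is the "try the next server" step
        cache[domain] = server
        return server
    raise AllServersFailed(domain)
```

With a cache like this, only the first login after a restart pays for the timeout against the unreachable server. Later logins go straight to the server that answered.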

Comment 2 Matthew Davis 2013-07-18 19:08:34 UTC
A workaround is to use the -ldapServers parameter in rhevm-manage-domains.

# rhevm-manage-domains -action=add -domain=SALAB.REDHAT.COM -provider=IPA -user=admin -interactive -ldapServers=$SERVER1,$SERVER2

This is working as a suitable workaround.
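
When picking values for $SERVER1 and $SERVER2, it may help to check which candidates accept TCP connections on the LDAP port from the RHEV-M host. The Python sketch below assumes port 389, as seen in the log above. tcp_probe and ldap_servers_value are hypothetical helper names, and the script only builds the comma-separated value; it does not run rhevm-manage-domains.

```python
# Hypothetical helper: keep only the servers that answer on the LDAP port.
import socket
from typing import Callable, Iterable


def tcp_probe(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution errors.
        return False


def ldap_servers_value(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
) -> str:
    """Build a comma-separated list of reachable hosts, in input order."""
    return ",".join(host for host in hosts if probe(host))


if __name__ == "__main__":
    candidates = [
        "idm1.phx.salab.redhat.com",
        "idm1.rdu.salab.redhat.com",
        "idm2.phx.salab.redhat.com",
        "idm2.rdu.salab.redhat.com",
    ]
    print(ldap_servers_value(candidates))
```

A successful TCP connection only shows the port is reachable; it does not prove the server will accept the bind.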

Comment 3 Yair Zaslavsky 2013-07-21 12:15:17 UTC
May we get full logs?
I think I know what is causing this, but I would like to be sure.

Comment 4 Yair Zaslavsky 2013-07-21 12:16:03 UTC
To be clear, I mean engine.log (plus rotations such as engine.log.1, if they exist) and server.log.

Comment 5 Yair Zaslavsky 2013-07-21 12:17:51 UTC
Attached an external tracker to oVirt gerrit with a patch that might solve the issue.

Comment 6 Yair Zaslavsky 2013-07-21 12:20:45 UTC
Actually, looking again at the bug description, the patch DOES solve this. We saw a similar issue in these bugs:

BZ973566
BZ974148

Moving to closed-duplicate.

*** This bug has been marked as a duplicate of bug 973566 ***