Bug 973566

Summary: [rhevm-manage-domains] RHEVM doesn't try the next LDAP server when child domain controller not active.
Product: Red Hat Enterprise Virtualization Manager
Reporter: vvyazmin <vvyazmin>
Component: ovirt-engine
Assignee: Martin Perina <mperina>
Status: CLOSED CURRENTRELEASE
QA Contact: Ondra Machacek <omachace>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: acathrow, adevolder, byount, iheim, jkt, jraju, lpeer, mdavis, parsonsa, pstehlik, Rhev-m-bugs, yeylon, yzaslavs
Target Milestone: ---
Keywords: Reopened
Target Release: 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version: is7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-11 13:27:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                  Flags
## Logs rhevm                none
oVirt 3.3 test engine.log    none

Description vvyazmin@redhat.com 2013-06-12 08:43:01 UTC
Created attachment 760024 [details]
## Logs rhevm

Description of problem: RHEVM doesn't try the next LDAP server when the child domain controller is not active.

Version-Release number of selected component (if applicable):
RHEVM 3.2 - SF17.1 environment: 

RHEVM: rhevm-3.2.0-11.28.el6ev.noarch 
VDSM: vdsm-4.10.2-21.0.el6ev.x86_64 
LIBVIRT: libvirt-0.10.2-18.el6_4.5.x86_64 
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.3.x86_64 
SANLOCK: sanlock-2.6-2.el6.x86_64

How reproducible:
100%

Related to BZ675749

Steps to Reproduce:
1. Create a Windows Domain Controller (Master Domain Controller) - qa1-tlv.qa.lab.tlv.redhat.com
2. Add an additional (Child) Domain Controller (Slave Domain Controller) - qa2-tlv.qa.lab.tlv.redhat.com
3. Register RHEVM to the domain with the “rhevm-manage-domains” tool:
   rhevm-manage-domains -action=add -domain=qa.lab.tlv.redhat.com -user=kokomen -interactive -addPermissions -provider=ActiveDirectory
4. Verify that everything works and that you can log in with an LDAP user.
5. Power off the Child Domain Controller.
  
Actual results:
Login with an LDAP user fails, because RHEVM keeps sending requests to the Child Domain Controller and doesn't try the next LDAP server.

Expected results:
Login with an LDAP user succeeds.

Impact on user:
LDAP users cannot log in.

Workaround:
In the /etc/hosts file, map the hostname of the Child Domain Controller to the IP address of the Master Domain Controller.

Additional info:

/var/log/ovirt-engine/engine.log

2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (QuartzScheduler_Worker-9) [1046cc64] Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (QuartzScheduler_Worker-9) [1046cc64] Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (QuartzScheduler_Worker-9) [1046cc64] Failed to run command LdapSearchUserByQueryCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.TLV.REDHAT.COM.}

/var/log/vdsm/vdsm.log

Comment 1 Yair Zaslavsky 2013-06-25 15:41:08 UTC
Worth mentioning that the initial order of LDAP servers depends on the priorities of the SRV records (dig SRV _ldap._tcp.<DNS_DOMAIN>).
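
To illustrate how SRV priorities translate into a server order, here is a minimal Java sketch. It is not the engine's code: the class SrvOrder, the SrvRecord type, and the priority and weight values in main are hypothetical. It sorts by ascending priority (lower values are tried first, per RFC 2782) and, as a simplification, breaks ties by descending weight instead of the weighted random selection the RFC describes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SrvOrder {
    /** One SRV answer; field names follow RFC 2782. Hypothetical helper type. */
    static final class SrvRecord {
        final int priority;
        final int weight;
        final int port;
        final String target;

        SrvRecord(int priority, int weight, int port, String target) {
            this.priority = priority;
            this.weight = weight;
            this.port = port;
            this.target = target;
        }
    }

    /**
     * Returns LDAP URLs ordered by ascending priority.
     * Ties are broken by descending weight, which is a simplification
     * of the weighted random choice described in RFC 2782.
     */
    static List<String> orderServers(List<SrvRecord> records) {
        List<SrvRecord> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparingInt((SrvRecord r) -> r.priority)
                .thenComparing(Comparator.comparingInt((SrvRecord r) -> r.weight).reversed()));
        List<String> urls = new ArrayList<>();
        for (SrvRecord r : sorted) {
            urls.add("LDAP://" + r.target + ":" + r.port);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Assumed values: the child controller has the lower (preferred) priority.
        List<SrvRecord> records = List.of(
                new SrvRecord(10, 100, 389, "qa1-tlv.qa.lab.tlv.redhat.com"),
                new SrvRecord(0, 100, 389, "qa2-tlv.qa.lab.tlv.redhat.com"));
        System.out.println(orderServers(records));
    }
}
```

With records like these, the powered-off child controller would be contacted first, which is the situation the bug depends on.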

Comment 2 Martin Perina 2013-07-11 13:17:31 UTC
Created attachment 772217 [details]
oVirt 3.3 test engine.log

Comment 3 Martin Perina 2013-07-11 13:27:42 UTC
I've not been able to reproduce this bug on the current oVirt 3.3 codebase. Here are the test steps:
 
1) Add domain qa.lab.tlv.redhat.com

$ ./bin/engine-manage-domains -action=add -domain=qa.lab.tlv.redhat.com -user=vdcadmin -interactive -addPermissions -provider=ActiveDirectory
Enter password:

Successfully added domain qa.lab.tlv.redhat.com. oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully

$ ./bin/engine-manage-domains -action=list
Domain: qa.lab.tlv.redhat.com
        User name: vdcadmin.TLV.REDHAT.COM
Manage Domains completed successfully

$ ./bin/engine-manage-domains -action=validate
Cannot connect to LDAP server qa2-tlv.qa.lab.tlv.redhat.com:389. Trying next LDAP server in list (if exists)
Domain qa.lab.tlv.redhat.com is valid.
The configured user for domain qa.lab.tlv.redhat.com is vdcadmin.TLV.REDHAT.COM
Manage Domains completed successfully


2) Start the engine and log in as vdcadmin.tlv.redhat.com => user logged in successfully
   (I've attached engine.log, and I've also added logging to show which LDAP servers are configured for the domain)


If I repeat those steps using RHEVM 3.2 SF18, there's an error in engine.log:

2013-07-11 15:02:25,193 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (QuartzScheduler_Worker-1) Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-07-11 15:02:25,200 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (QuartzScheduler_Worker-1) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-11 15:02:25,200 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (QuartzScheduler_Worker-1) Failed to run command LdapSearchUserByQueryCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.TLV.REDHAT.COM.}
2013-07-11 15:02:25,201 WARN  [org.ovirt.engine.core.bll.DbUserCacheManager] (QuartzScheduler_Worker-1) User vdcadmin.TLV.REDHAT.COM not found in directory sevrer, its status switched to InActive
2013-07-11 15:02:55,225 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (ajp-/127.0.0.1:8702-4) Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-07-11 15:02:55,226 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-4) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-11 15:02:55,227 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (ajp-/127.0.0.1:8702-4) Failed to run command LdapAuthenticateUserCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.}
2013-07-11 15:02:55,228 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-4) USER_FAILED_TO_AUTHENTICATE : vdcadmin
2013-07-11 15:02:55,229 WARN  [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-4) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE


and user vdcadmin.tlv.redhat.com cannot log in.
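
For readers unfamiliar with the root DSE probe behind the "Failed to query rootDSE" lines, the following is a minimal JNDI sketch of such a probe with a connect timeout. It is an illustration only, not the engine's GetRootDSE implementation: the class and method names are hypothetical, and the anonymous bind and the defaultNamingContext attribute are assumptions that fit a typical Active Directory setup.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class RootDseProbe {
    /** Builds the JNDI environment for one LDAP server, including a connect timeout. */
    static Hashtable<String, String> environment(String ldapUrl, int connectTimeoutMillis) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);
        // Without this property a powered-off server can block for the OS default timeout.
        env.put("com.sun.jndi.ldap.connect.timeout", String.valueOf(connectTimeoutMillis));
        return env;
    }

    /** Reads one attribute of the root DSE (empty DN); throws NamingException on failure. */
    static Attributes queryRootDse(String ldapUrl, int connectTimeoutMillis) throws NamingException {
        DirContext ctx = new InitialDirContext(environment(ldapUrl, connectTimeoutMillis));
        try {
            return ctx.getAttributes("", new String[] {"defaultNamingContext"});
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java RootDseProbe ldap://<host>:389");
            return;
        }
        try {
            System.out.println(queryRootDse(args[0], 5000));
        } catch (NamingException e) {
            // A powered-off server surfaces here, typically as a CommunicationException.
            System.out.println("Root DSE query failed: " + e);
        }
    }
}
```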

Comment 4 Martin Perina 2013-07-15 07:51:07 UTC
I've found out that there is still an error even in oVirt 3.3. When I modify the list of the domain's LDAP servers (so that the powered-off servers are returned first), the user cannot log in and the following errors appear:

2013-07-15 09:45:51,180 INFO  [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (http--0.0.0.0-8080-1) Ldap server list: LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389, LDAP://qa1.qa.lab.tlv.redhat.com:389
2013-07-15 09:46:21,753 ERROR [org.ovirt.engine.core.bll.adbroker.LdapSearchExceptionHandler] (http--0.0.0.0-8080-1) Error in communicating with LDAP server qa2-tlv.qa.lab.tlv.redhat.com:389; nested exception is javax.naming.CommunicationException: qa2-tlv.qa.lab.tlv.redhat.com:389 [Root exception is java.net.SocketTimeoutException: connect timed out]
2013-07-15 09:46:21,756 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (http--0.0.0.0-8080-1) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-15 09:46:21,757 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (http--0.0.0.0-8080-1) Failed to run command LdapAuthenticateUserCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.
2013-07-15 09:46:21,758 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (http--0.0.0.0-8080-1) USER_FAILED_TO_AUTHENTICATE : vdcadmin
2013-07-15 09:46:21,758 WARN  [org.ovirt.engine.core.bll.LoginAdminUserCommand] (http--0.0.0.0-8080-1) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE

I will continue to investigate this.

Comment 5 Martin Perina 2013-07-15 12:20:33 UTC
The bug has already been partially resolved upstream. The only remaining error was in the root DSE query code block: when the first LDAP server in the list was not available, an uncaught RuntimeException prevented querying the next LDAP server and made the login fail immediately.
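
The failure mode described here can be sketched in Java as follows. This is not the actual engine patch: the class LdapFailover, the method queryFirstAvailable, and the stand-in query in main are hypothetical. The point it shows is that the try/catch must sit inside the loop and cover unchecked exceptions, so that a failure on one server moves on to the next one instead of aborting the login.

```java
import java.util.List;
import java.util.function.Function;

public class LdapFailover {
    /**
     * Tries each server in order and returns the first successful result.
     * Catching RuntimeException inside the loop means an unchecked failure
     * on one server does not abort the iteration.
     */
    static <T> T queryFirstAvailable(List<String> servers, Function<String, T> query) {
        RuntimeException last = null;
        for (String server : servers) {
            try {
                return query.apply(server);
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the next server
            }
        }
        throw new IllegalStateException("No LDAP server could be reached", last);
    }

    public static void main(String[] args) {
        List<String> servers = List.of(
                "LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389",
                "LDAP://qa1-tlv.qa.lab.tlv.redhat.com:389");
        // Stand-in for the root DSE query: the first server is treated as powered off.
        String answered = queryFirstAvailable(servers, url -> {
            if (url.contains("qa2-tlv")) {
                throw new RuntimeException("connect timed out");
            }
            return url;
        });
        System.out.println("Root DSE answered by " + answered);
    }
}
```

If the exception escaped the loop instead, the caller would see a failure after the first server only, which matches the behaviour in the logs above.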

Comment 6 Yair Zaslavsky 2013-07-21 12:20:46 UTC
*** Bug 985940 has been marked as a duplicate of this bug. ***

Comment 10 Yair Zaslavsky 2013-12-07 12:30:07 UTC
*** Bug 1032143 has been marked as a duplicate of this bug. ***

Comment 11 Itamar Heim 2014-01-21 22:23:08 UTC
Closing - RHEV 3.3 Released

Comment 12 Itamar Heim 2014-01-21 22:24:14 UTC
Closing - RHEV 3.3 Released

Comment 13 Itamar Heim 2014-01-21 22:27:54 UTC
Closing - RHEV 3.3 Released