Bug 973566 - [rhevm-manage-domains] RHEVM doesn't try the next LDAP server when child domain controller not active.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Martin Perina
QA Contact: Ondra Machacek
URL:
Whiteboard: infra
Duplicates: 985940 1032143
Depends On:
Blocks:
Reported: 2013-06-12 08:43 UTC by vvyazmin@redhat.com
Modified: 2018-12-05 16:05 UTC (History)

Fixed In Version: is7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-11 13:27:42 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
## Logs rhevm (410.64 KB, application/x-xz)
2013-06-12 08:43 UTC, vvyazmin@redhat.com
oVirt 3.3 test engine.log (7.08 KB, application/x-compressed-tar)
2013-07-11 13:17 UTC, Martin Perina


Links
System                             ID      Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  70533   0        None      None    None     2018-12-03 20:43:58 UTC
Red Hat Knowledge Base (Solution)  803513  0        None      None    None     Never
oVirt gerrit                       16859   0        None      None    None     Never

Description vvyazmin@redhat.com 2013-06-12 08:43:01 UTC
Created attachment 760024 [details]
## Logs rhevm

Description of problem: RHEVM doesn't try the next LDAP server when the child domain controller is not active.

Version-Release number of selected component (if applicable):
RHEVM 3.2 - SF17.1 environment: 

RHEVM: rhevm-3.2.0-11.28.el6ev.noarch 
VDSM: vdsm-4.10.2-21.0.el6ev.x86_64 
LIBVIRT: libvirt-0.10.2-18.el6_4.5.x86_64 
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.3.x86_64 
SANLOCK: sanlock-2.6-2.el6.x86_64

How reproducible:
100%

Related to BZ675749

Steps to Reproduce:
1. Create a Windows Domain Controller (Master Domain Controller) - qa1-tlv.qa.lab.tlv.redhat.com
2. Add an additional (Child) Domain Controller (Slave Domain Controller) - qa2-tlv.qa.lab.tlv.redhat.com
3. Register RHEVM with the domain using the “rhevm-manage-domains” tool:
   rhevm-manage-domains -action=add -domain=qa.lab.tlv.redhat.com -user=kokomen -interactive -addPermissions -provider=ActiveDirectory
4. Verify that everything works and that you can log in as an LDAP user.
5. Power off the Child Domain Controller.
  
Actual results:
Login as an LDAP user fails because RHEVM keeps sending requests to the Child Domain Controller and doesn't try the next LDAP server.

Expected results:
Login as an LDAP user succeeds.

Impact on user:
LDAP users cannot log in.

Workaround:
In /etc/hosts, map the Child Domain Controller hostname to the IP address of the Master Domain Controller.

Additional info:

/var/log/ovirt-engine/engine.log

2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (QuartzScheduler_Worker-9) [1046cc64] Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (QuartzScheduler_Worker-9) [1046cc64] Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (QuartzScheduler_Worker-9) [1046cc64] Failed to run command LdapSearchUserByQueryCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.TLV.REDHAT.COM.}

/var/log/vdsm/vdsm.log

Comment 1 Yair Zaslavsky 2013-06-25 15:41:08 UTC
Worth mentioning that the initial order of LDAP servers depends on the priorities of the SRV records (dig SRV _ldap._tcp.<DNS_DOMAIN>).
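
For illustration, the ordering rule can be sketched in Python as below. This is a minimal sketch, not the engine's code: the `SrvRecord` type, the `parse_srv_line` and `order_by_priority` helpers, and the sample priority and weight values are all hypothetical. RFC 2782 selects among records of equal priority at random, weighted by the weight field; that step is simplified here to a descending sort by weight.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    """One SRV answer: priority, weight, port, target (hypothetical type)."""
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line: str) -> SrvRecord:
    """Parse the last four fields of a `dig SRV` answer line."""
    priority, weight, port, target = line.split()[-4:]
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def order_by_priority(records):
    """Lower priority value is tried first; ties are broken by higher weight.

    Simplification: RFC 2782 picks among equal-priority records at random,
    weighted by `weight`; a plain sort keeps this sketch deterministic.
    """
    return sorted(records, key=lambda r: (r.priority, -r.weight))


# Sample answers; the priority and weight values are made up for illustration.
answers = [
    "10 100 389 qa1-tlv.qa.lab.tlv.redhat.com.",
    "0 100 389 qa2-tlv.qa.lab.tlv.redhat.com.",
]
servers = [f"LDAP://{r.target}:{r.port}"
           for r in order_by_priority(map(parse_srv_line, answers))]
print(servers)
```

With these made-up values the powered-off child controller (qa2-tlv) sorts first, which is the situation in which the missing failover becomes visible.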

Comment 2 Martin Perina 2013-07-11 13:17:31 UTC
Created attachment 772217 [details]
oVirt 3.3 test engine.log

Comment 3 Martin Perina 2013-07-11 13:27:42 UTC
I have not been able to reproduce this bug on the current oVirt 3.3 codebase. Here are the test steps:
 
1) Add domain qa.lab.tlv.redhat.com

$ ./bin/engine-manage-domains -action=add -domain=qa.lab.tlv.redhat.com -user=vdcadmin -interactive -addPermissions -provider=ActiveDirectory
Enter password:

Successfully added domain qa.lab.tlv.redhat.com. oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully

$ ./bin/engine-manage-domains -action=list
Domain: qa.lab.tlv.redhat.com
        User name: vdcadmin.TLV.REDHAT.COM
Manage Domains completed successfully

$ ./bin/engine-manage-domains -action=validate
Cannot connect to LDAP server qa2-tlv.qa.lab.tlv.redhat.com:389. Trying next LDAP server in list (if exists)
Domain qa.lab.tlv.redhat.com is valid.
The configured user for domain qa.lab.tlv.redhat.com is vdcadmin.TLV.REDHAT.COM
Manage Domains completed successfully


2) Start engine and log in as vdcadmin.tlv.redhat.com => user logged in successfully
   (I've added engine.log as attachment and I've also added logging to see what LDAP servers are configured for domain)


If I repeat those steps using RHEVM 3.2 SF18, there's an error in engine.log:

2013-07-11 15:02:25,193 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (QuartzScheduler_Worker-1) Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-07-11 15:02:25,200 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (QuartzScheduler_Worker-1) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-11 15:02:25,200 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (QuartzScheduler_Worker-1) Failed to run command LdapSearchUserByQueryCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.TLV.REDHAT.COM.}
2013-07-11 15:02:25,201 WARN  [org.ovirt.engine.core.bll.DbUserCacheManager] (QuartzScheduler_Worker-1) User vdcadmin.TLV.REDHAT.COM not found in directory sevrer, its status switched to InActive
2013-07-11 15:02:55,225 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (ajp-/127.0.0.1:8702-4) Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-07-11 15:02:55,226 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-4) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-11 15:02:55,227 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (ajp-/127.0.0.1:8702-4) Failed to run command LdapAuthenticateUserCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.}
2013-07-11 15:02:55,228 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-4) USER_FAILED_TO_AUTHENTICATE : vdcadmin
2013-07-11 15:02:55,229 WARN  [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-4) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE


and user vdcadmin.tlv.redhat.com cannot log in.

Comment 4 Martin Perina 2013-07-15 07:51:07 UTC
I've found that an error still occurs even in oVirt 3.3. When I modify the list of the domain's LDAP servers so that the powered-off server is returned first, the user cannot log in and the following errors appear:

2013-07-15 09:45:51,180 INFO  [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (http--0.0.0.0-8080-1) Ldap server list: LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389, LDAP://qa1.qa.lab.tlv.redhat.com:389
2013-07-15 09:46:21,753 ERROR [org.ovirt.engine.core.bll.adbroker.LdapSearchExceptionHandler] (http--0.0.0.0-8080-1) Error in communicating with LDAP server qa2-tlv.qa.lab.tlv.redhat.com:389; nested exception is javax.naming.CommunicationException: qa2-tlv.qa.lab.tlv.redhat.com:389 [Root exception is java.net.SocketTimeoutException: connect timed out]
2013-07-15 09:46:21,756 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (http--0.0.0.0-8080-1) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-15 09:46:21,757 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (http--0.0.0.0-8080-1) Failed to run command LdapAuthenticateUserCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.
2013-07-15 09:46:21,758 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (http--0.0.0.0-8080-1) USER_FAILED_TO_AUTHENTICATE : vdcadmin
2013-07-15 09:46:21,758 WARN  [org.ovirt.engine.core.bll.LoginAdminUserCommand] (http--0.0.0.0-8080-1) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE

I will continue to investigate this.

Comment 5 Martin Perina 2013-07-15 12:20:33 UTC
The bug has already been partially resolved upstream. The only remaining error was in the root DSE query code block: when the first LDAP server in the list was not available, an uncaught RuntimeException prevented the engine from querying the next LDAP server, so the login failed immediately.
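
The failure mode described in this comment can be sketched as follows. This is a simplified Python illustration, not the ovirt-engine Java code: `query_root_dse`, `search_without_failover`, `search_with_failover`, and the `reachable` set are hypothetical stand-ins, and Python's `RuntimeError` plays the role of the uncaught RuntimeException.

```python
def query_root_dse(server, reachable):
    """Hypothetical stand-in for the root DSE query; fails for servers that are down."""
    if server not in reachable:
        raise RuntimeError(f"connect timed out: {server}")
    return {"server": server}


def search_without_failover(servers, reachable):
    """Buggy shape: the first exception escapes the loop, so later servers are never tried."""
    for server in servers:
        return query_root_dse(server, reachable)
    raise RuntimeError("no LDAP servers configured")


def search_with_failover(servers, reachable):
    """Fixed shape: a failure on one server is recorded and the next one is tried."""
    errors = []
    for server in servers:
        try:
            return query_root_dse(server, reachable)
        except RuntimeError as exc:
            errors.append(str(exc))
    raise RuntimeError("all LDAP servers failed: " + "; ".join(errors))


# Server order taken from the "Ldap server list" log line; qa2-tlv is powered off.
servers = [
    "LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389",
    "LDAP://qa1.qa.lab.tlv.redhat.com:389",
]
reachable = {"LDAP://qa1.qa.lab.tlv.redhat.com:389"}

result = search_with_failover(servers, reachable)
print(result["server"])
```

Catching the exception per server, rather than around the whole loop, is what lets the second server be reached; the error is raised to the caller only after every server in the list has failed.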

Comment 6 Yair Zaslavsky 2013-07-21 12:20:46 UTC
*** Bug 985940 has been marked as a duplicate of this bug. ***

Comment 10 Yair Zaslavsky 2013-12-07 12:30:07 UTC
*** Bug 1032143 has been marked as a duplicate of this bug. ***

Comment 11 Itamar Heim 2014-01-21 22:23:08 UTC
Closing - RHEV 3.3 Released

Comment 12 Itamar Heim 2014-01-21 22:24:14 UTC
Closing - RHEV 3.3 Released

Comment 13 Itamar Heim 2014-01-21 22:27:54 UTC
Closing - RHEV 3.3 Released

