Created attachment 760024 [details]
## Logs rhevm

Description of problem:
RHEVM doesn't try the next LDAP server when the child domain controller is not active.

Version-Release number of selected component (if applicable):
RHEVM 3.2 - SF17.1 environment:

RHEVM: rhevm-3.2.0-11.28.el6ev.noarch
VDSM: vdsm-4.10.2-21.0.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-18.el6_4.5.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.3.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64

How reproducible:
100%

Related to BZ675749

Steps to Reproduce:
1. Create a Windows Domain Controller (Master Domain Controller) - qa1-tlv.qa.lab.tlv.redhat.com
2. Add an additional (Child) Domain Controller (Slave Domain Controller) - qa2-tlv.qa.lab.tlv.redhat.com
3. Register RHEVM to the domain with the “rhevm-manage-domains” tool:
   rhevm-manage-domains -action=add -domain=qa.lab.tlv.redhat.com -user=kokomen -interactive -addPermissions -provider=ActiveDirectory
4. Verify that everything works and that you can log in with an LDAP user.
5. Power off the Child Domain Controller.

Actual results:
Login with an LDAP user fails, because RHEVM keeps sending requests to the Child Domain Controller and doesn't try the next LDAP server.

Expected results:
Login with an LDAP user succeeds.

Impact on user:
Login with an LDAP user fails.

Workaround:
In the /etc/hosts file, map the Child Domain Controller hostname to the IP of the Master Domain Controller.

Additional info:

/var/log/ovirt-engine/engine.log

2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (QuartzScheduler_Worker-9) [1046cc64] Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (QuartzScheduler_Worker-9) [1046cc64] Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-06-05 15:10:20,332 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (QuartzScheduler_Worker-9) [1046cc64] Failed to run command LdapSearchUserByQueryCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.TLV.REDHAT.COM.}

/var/log/vdsm/vdsm.log
It is worth mentioning that the initial order of the LDAP servers depends on the priorities of the SRV records (dig SRV _ldap._tcp.<DNS_DOMAIN>).
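To illustrate how SRV priorities can determine which LDAP server is tried first, here is a minimal sketch. It assumes the "priority weight port target" line format printed by `dig +short SRV`, and it simplifies the selection rule to "lowest priority value first, then higher weight first" (the full SRV rule uses weighted random selection within a priority). The record values in `SAMPLE` are hypothetical and were not taken from the real DNS zone; the function names are made up for this example and are not the engine's code.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_lines(text: str) -> List[SrvRecord]:
    """Parse 'priority weight port target' lines (the `dig +short SRV` format)."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        priority, weight, port, target = parts
        # Drop the trailing dot of the fully qualified target name.
        records.append(SrvRecord(int(priority), int(weight), int(port), target.rstrip(".")))
    return records


def order_servers(records: List[SrvRecord]) -> List[str]:
    """Lowest priority value first; within a priority, higher weight first (simplified)."""
    ordered = sorted(records, key=lambda r: (r.priority, -r.weight))
    return [f"{r.target}:{r.port}" for r in ordered]


# Hypothetical records: the priorities and weights are invented for illustration.
SAMPLE = """\
10 100 389 qa1-tlv.qa.lab.tlv.redhat.com.
0 100 389 qa2-tlv.qa.lab.tlv.redhat.com.
"""

if __name__ == "__main__":
    print(order_servers(parse_srv_lines(SAMPLE)))
```

With records like these, the child controller would be first in the list, so a client that never moves past the first entry would fail as soon as that controller is powered off.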
Created attachment 772217 [details] oVirt 3.3 test engine.log
I've not been able to reproduce this bug on the current oVirt 3.3 codebase. Here are the test steps:

1) Add domain qa.lab.tlv.redhat.com

$ ./bin/engine-manage-domains -action=add -domain=qa.lab.tlv.redhat.com -user=vdcadmin -interactive -addPermissions -provider=ActiveDirectory
Enter password:
Successfully added domain qa.lab.tlv.redhat.com. oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully

$ ./bin/engine-manage-domains -action=list
Domain: qa.lab.tlv.redhat.com
User name: vdcadmin.TLV.REDHAT.COM
Manage Domains completed successfully

$ ./bin/engine-manage-domains -action=validate
Cannot connect to LDAP server qa2-tlv.qa.lab.tlv.redhat.com:389. Trying next LDAP server in list (if exists)
Domain qa.lab.tlv.redhat.com is valid.
The configured user for domain qa.lab.tlv.redhat.com is vdcadmin.TLV.REDHAT.COM
Manage Domains completed successfully

2) Start engine and log in as vdcadmin.tlv.redhat.com
=> user logged in successfully (I've added engine.log as an attachment, and I've also added logging to see which LDAP servers are configured for the domain)

If I repeat those steps using RHEVM 3.2 SF18, there's an error in engine.log:

2013-07-11 15:02:25,193 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (QuartzScheduler_Worker-1) Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-07-11 15:02:25,200 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (QuartzScheduler_Worker-1) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-11 15:02:25,200 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (QuartzScheduler_Worker-1) Failed to run command LdapSearchUserByQueryCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.TLV.REDHAT.COM.}
2013-07-11 15:02:25,201 WARN [org.ovirt.engine.core.bll.DbUserCacheManager] (QuartzScheduler_Worker-1) User vdcadmin.TLV.REDHAT.COM not found in directory sevrer, its status switched to InActive
2013-07-11 15:02:55,225 ERROR [org.ovirt.engine.core.bll.adbroker.GetRootDSE] (ajp-/127.0.0.1:8702-4) Failed to query rootDSE for LDAP server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 due to connection timeout
2013-07-11 15:02:55,226 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-4) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-11 15:02:55,227 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (ajp-/127.0.0.1:8702-4) Failed to run command LdapAuthenticateUserCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.}
2013-07-11 15:02:55,228 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-4) USER_FAILED_TO_AUTHENTICATE : vdcadmin
2013-07-11 15:02:55,229 WARN [org.ovirt.engine.core.bll.LoginAdminUserCommand] (ajp-/127.0.0.1:8702-4) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE

and user vdcadmin.tlv.redhat.com cannot log in.
I've found that there is still an error even in oVirt 3.3. When I modify the list of the domain's LDAP servers (so that the turned-off servers are returned first), the user cannot log in and the following errors appear:

2013-07-15 09:45:51,180 INFO [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (http--0.0.0.0-8080-1) Ldap server list: LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389, LDAP://qa1.qa.lab.tlv.redhat.com:389
2013-07-15 09:46:21,753 ERROR [org.ovirt.engine.core.bll.adbroker.LdapSearchExceptionHandler] (http--0.0.0.0-8080-1) Error in communicating with LDAP server qa2-tlv.qa.lab.tlv.redhat.com:389; nested exception is javax.naming.CommunicationException: qa2-tlv.qa.lab.tlv.redhat.com:389 [Root exception is java.net.SocketTimeoutException: connect timed out]
2013-07-15 09:46:21,756 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (http--0.0.0.0-8080-1) Failed ldap search server LDAP://qa2-tlv.qa.lab.tlv.redhat.com:389 using user vdcadmin.TLV.REDHAT.COM due to connection timeout. We should try the next server
2013-07-15 09:46:21,757 ERROR [org.ovirt.engine.core.bll.adbroker.LdapBrokerCommandBase] (http--0.0.0.0-8080-1) Failed to run command LdapAuthenticateUserCommand. Domain is qa.lab.tlv.redhat.com. User is vdcadmin.
2013-07-15 09:46:21,758 ERROR [org.ovirt.engine.core.bll.LoginAdminUserCommand] (http--0.0.0.0-8080-1) USER_FAILED_TO_AUTHENTICATE : vdcadmin
2013-07-15 09:46:21,758 WARN [org.ovirt.engine.core.bll.LoginAdminUserCommand] (http--0.0.0.0-8080-1) CanDoAction of action LoginAdminUser failed. Reasons:USER_FAILED_TO_AUTHENTICATE

I will continue to investigate this.
The bug has already been partially resolved upstream. The only remaining error was in the root DSE query code block: when the first LDAP server in the list was not available, an uncaught RuntimeException prevented the next LDAP server from being queried and made the login fail immediately.
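The failover behaviour described here can be sketched as follows. This is only an illustration of the general pattern (catch the per-server failure, continue with the next server, and fail only once the list is exhausted), not the engine's actual Java code. `query_with_failover`, `AllServersFailed` and `fake_root_dse_query` are hypothetical names introduced for the example, and the fake query merely simulates a powered-off qa2-tlv controller.

```python
from typing import Callable, Iterable, List, Tuple


class AllServersFailed(Exception):
    """Raised only after every server in the list has been tried."""


def query_with_failover(
    servers: Iterable[str], query: Callable[[str], dict]
) -> Tuple[str, dict]:
    """Run `query` against each server in order and return the first success."""
    errors: List[str] = []
    for server in servers:
        try:
            return server, query(server)
        except Exception as exc:
            # Broad on purpose: any per-server failure must not abort the loop,
            # otherwise the remaining servers are never tried.
            errors.append(f"{server}: {exc}")
    raise AllServersFailed("; ".join(errors))


def fake_root_dse_query(server: str) -> dict:
    """Hypothetical stand-in for a root DSE query; qa2-tlv simulates the dead controller."""
    if server.startswith("qa2-tlv"):
        raise TimeoutError("connect timed out")
    return {"server": server}


if __name__ == "__main__":
    servers = [
        "qa2-tlv.qa.lab.tlv.redhat.com:389",
        "qa1-tlv.qa.lab.tlv.redhat.com:389",
    ]
    print(query_with_failover(servers, fake_root_dse_query))
```

The key point is that the exception is handled inside the loop. If it escapes the loop, as the uncaught RuntimeException did, the first unreachable server ends the whole attempt.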
*** Bug 985940 has been marked as a duplicate of this bug. ***
*** Bug 1032143 has been marked as a duplicate of this bug. ***
Closing - RHEV 3.3 Released