Description of problem:
After migrating the legacy kerberos/ldap directory integration of AD in the engine via https://github.com/machacekondra/ovirt-engine-kerbldap-migration/blob/ovirt-engine-kerbldap-migration-1.0.3/README.md, the integration has stopped working. Searching this AD inside the Admin Portal does not work, and login ends in 'General Command failure'.

Version-Release number of selected component (if applicable):
rhevm-3.6.8.1-0.1.el6.noarch

How reproducible:
???

Steps to Reproduce:
1. Have legacy kerberos/ldap directory integration of AD in engine
2. Migrate via ovirt-engine-kerbldap-migration-tool
3. Try to use AD in RHEVM: log in, search for users

Actual results:
Does not work; general command failure in login screen.

Expected results:
Should work.

Additional info:
Ondro, could you please take a look?
As a temporary workaround you can use only a specific replica; see "Is it possible to use specific Active Directory site?" in the FAQ: http://www.ovirt.org/develop/release-management/features/infra/aaa_faq/
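For context, Active Directory publishes site-scoped SRV records under names of the form `_ldap._tcp.<site>._sites.<domain>`, which is what restricting lookups to one site relies on. A minimal Python sketch of building that name follows; the function name and the `HERE` / `example.com` values are hypothetical placeholders, and this is not the aaa-ldap configuration itself (see the FAQ for that).

```python
from typing import Optional


def ad_srv_name(domain: str, site: Optional[str] = None) -> str:
    """Build the DNS name to query for LDAP SRV records.

    With a site, the lookup is limited to domain controllers that
    Active Directory registered for that site; without one, every
    domain controller of the domain may be returned.
    """
    if site:
        return f"_ldap._tcp.{site}._sites.{domain}"
    return f"_ldap._tcp.{domain}"


# Hypothetical values, for illustration only.
print(ad_srv_name("example.com"))
print(ad_srv_name("example.com", site="HERE"))
```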
Moving to 4.0.4 and reducing severity: this is primarily an environment issue, since if the AD sites are correctly replicated there is no problem getting data from AD.
Actually, this is one of the limitations of the aaa-ldap implementation. It is impossible to use the RoundRobinDNSServerSet together with the DNSSRVRecordServerSet as a nested level to enable dynamic resolution of the 2nd level. The immediate result is that multi-homed servers are not considered in the proper order. For example, if the highest-priority server has 4 addresses, only one of them is considered before falling back to the 2nd priority, instead of randomly selecting each address until all are exhausted and only then attempting the 2nd priority. So what I can suggest here is to use the site or to increase the priority of stable servers.
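The ordering difference described in this comment can be illustrated with a small Python sketch. It is only a model of the two behaviours, not the aaa-ldap or LDAP SDK code: the host names, addresses, and function names are hypothetical, and SRV weights are ignored for simplicity.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical SRV data: (priority, host); a lower value is tried first.
SRV: List[Tuple[int, str]] = [(0, "dc1.example.com"), (10, "dc2.example.com")]

# Hypothetical address records: dc1 is multi-homed with 4 addresses.
ADDRS: Dict[str, List[str]] = {
    "dc1.example.com": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "dc2.example.com": ["10.0.1.1"],
}


def described_order(srv, addrs, rng: random.Random) -> List[str]:
    """One address per host, then fall back to the next priority."""
    return [rng.choice(addrs[host]) for _, host in sorted(srv)]


def preferred_order(srv, addrs, rng: random.Random) -> List[str]:
    """Every address of a host, in random order, before falling back."""
    order: List[str] = []
    for _, host in sorted(srv):
        shuffled = list(addrs[host])
        rng.shuffle(shuffled)
        order.extend(shuffled)
    return order


rng = random.Random()
print("described:", described_order(SRV, ADDRS, rng))
print("preferred:", preferred_order(SRV, ADDRS, rng))
```

In the first function three of the four addresses of the highest-priority host are never attempted, which is the limitation described above.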
Anything we need to do here?
In my opinion, no. Currently, when there are two SRV records with the same priority/weight, we choose one randomly; if that server is not properly replicated, we fail. This is by design. Martin, do you think we should test whether the servers are properly configured and try another one if possible? I think we should fail and tell the user it's wrong.
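The two options weighed in this comment can be sketched in Python as follows. This is an illustration under stated assumptions, not the extension's code: `bind` is a hypothetical callable that returns True when authentication against a server succeeds, and the server names are placeholders.

```python
import random
from typing import Callable, List, Optional


def pick_fail_fast(servers: List[str], bind: Callable[[str], bool],
                   rng: random.Random) -> Optional[str]:
    """Choose one equal-priority server at random; fail if its bind fails."""
    server = rng.choice(servers)
    return server if bind(server) else None


def pick_with_retry(servers: List[str], bind: Callable[[str], bool],
                    rng: random.Random) -> Optional[str]:
    """Try equal-priority servers in random order until one bind succeeds."""
    for server in rng.sample(servers, len(servers)):
        if bind(server):
            return server
    return None


# Hypothetical scenario: the bind account is disabled on dc1 only.
servers = ["dc1.example.com", "dc2.example.com"]


def bind(server: str) -> bool:
    return server == "dc2.example.com"


rng = random.Random()
print("fail fast: ", pick_fail_fast(servers, bind, rng))
print("with retry:", pick_with_retry(servers, bind, rng))
```

The retry variant only helps when the stale data happens to break the bind itself; it cannot tell which replica holds the correct data.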
The RoundRobinDNSServerSet and DNSSRVRecordServerSet server selection methods are great for finding a server we can connect to. But the situation described in this bug is different: we have two working servers which are replicated, but replication is currently out of sync. In this particular case we discovered the replication error because the user account we used to authenticate to the LDAP server was closed/disabled on the server we selected, but open/enabled on the 2nd server. But there is no simple way to find out which server contains the correct data and which one is out of sync. So I'm closing this as WONTFIX, because it is the LDAP administrator's task to set up and manage proper server replication.