Bug 1368616

Summary: AD stopped working after migration from legacy kerberos/ldap directory integration
Product: [oVirt] ovirt-engine Reporter: Jiri Belka <jbelka>
Component: AAA Assignee: Ondra Machacek <omachace>
Status: CLOSED WONTFIX QA Contact: Aleksei Slaikovskii <aslaikov>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.7 CC: bugs, lsvaty, mgoldboi, mperina, omachace, oourfali, ylavi
Target Milestone: --- Flags: sbonazzo: ovirt-4.0.z-
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-09 14:16:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Jiri Belka 2016-08-20 03:11:02 UTC
Description of problem:

After migrating the legacy kerberos/ldap directory integration of AD in the engine via https://github.com/machacekondra/ovirt-engine-kerbldap-migration/blob/ovirt-engine-kerbldap-migration-1.0.3/README.md, AD has stopped working.

Searching this AD from the Admin Portal does not work, and login ends in 'General Command failure'.

Version-Release number of selected component (if applicable):
rhevm-3.6.8.1-0.1.el6.noarch

How reproducible:
???

Steps to Reproduce:
1. have legacy kerberos/ldap directory integration of AD in engine
2. migrate via ovirt-engine-kerbldap-migration-tool
3. try to use AD in RHEVM, login, search for users

Actual results:
does not work, general command failure in login screen

Expected results:
should work

Additional info:

Comment 2 Martin Perina 2016-08-22 12:55:12 UTC
Ondro, could you please take a look?

Comment 4 Ondra Machacek 2016-08-22 14:09:19 UTC
As a temporary workaround, you can use only a specific replica:

 Is it possible to use specific Active Directory site?
 http://www.ovirt.org/develop/release-management/features/infra/aaa_faq/
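For illustration only: Active Directory publishes site-specific SRV records under a `_sites` label, so restricting lookups to one site amounts to querying a different DNS name. The helper below is a hypothetical sketch of how that name is composed; `site_srv_name`, the site `HQ` and the domain `example.com` are assumptions, not part of aaa-ldap, and the actual configuration keys are described in the FAQ linked above.

```python
def site_srv_name(domain: str, site: str, service: str = "ldap", protocol: str = "tcp") -> str:
    """Compose the DNS name of an AD site-specific SRV record.

    Hypothetical helper for illustration; it only builds the name and
    performs no DNS lookup.
    """
    return f"_{service}._{protocol}.{site}._sites.{domain}"


# Domain-wide lookup name versus the lookup restricted to the assumed site "HQ".
domain_wide = "_ldap._tcp.example.com"
site_only = site_srv_name("example.com", "HQ")
```

Querying the site-specific name returns only the domain controllers registered for that site, which is why it can be used to avoid a replica that is out of sync.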

Comment 5 Martin Perina 2016-08-22 14:33:06 UTC
Moving to 4.0.4 and reducing severity, because this is primarily an environment issue: if AD sites are correctly replicated, there is no problem getting data from AD.

Comment 6 Ondra Machacek 2016-10-04 08:19:13 UTC
Actually, this is one of the limitations of the aaa-ldap implementation.

It is impossible to use the RoundRobinDNSServerSet together with the
DNSSRVRecordServerSet as a nested level to enable dynamic resolution of the
2nd level.

The immediate result is that multi-homed servers are not considered in the
proper order. For example, if the highest priority server has 4 addresses,
only one of them is considered before falling back to the 2nd priority,
instead of randomly selecting each until all are exhausted, and only then
attempting to access the 2nd priority.

So what I can suggest here is to use a site or to increase the priority of
stable servers.
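The difference in ordering described in this comment can be sketched as follows. This is a simplified model, not the aaa-ldap or UnboundID code: the function names, the dictionary layout and the example addresses are all assumptions made for illustration.

```python
import random

# Assumed example data: dc1 is multi-homed with four addresses, dc2 has one.
EXAMPLE_SERVERS = [
    {"host": "dc1", "priority": 0,
     "addresses": ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]},
    {"host": "dc2", "priority": 10, "addresses": ["192.0.2.10"]},
]


def attempt_order_flat(servers, rng):
    """Model of the limitation: one address per server, then fall back."""
    order = []
    for server in sorted(servers, key=lambda s: s["priority"]):
        order.append(rng.choice(server["addresses"]))
    return order


def attempt_order_nested(servers, rng):
    """Model of the desired behaviour: exhaust every address of a server,
    in random order, before moving on to the next priority."""
    order = []
    for server in sorted(servers, key=lambda s: s["priority"]):
        addresses = list(server["addresses"])
        rng.shuffle(addresses)
        order.extend(addresses)
    return order


rng = random.Random(0)
flat = attempt_order_flat(EXAMPLE_SERVERS, rng)      # 2 attempts in total
nested = attempt_order_nested(EXAMPLE_SERVERS, rng)  # 5 attempts in total
```

In the flat model, three of the four dc1 addresses are never tried before the client falls back to dc2, which is the behaviour the comment describes.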

Comment 7 Oved Ourfali 2016-11-09 09:20:59 UTC
Anything we need to do here?

Comment 8 Ondra Machacek 2016-11-09 11:25:00 UTC
In my opinion - no.
Currently, when there are two SRV records with the same priority/weight, we
choose one randomly; if that server is not properly replicated, we fail. This is
by design.

Martin, do you think we should test for properly configured servers and try
another one if possible? I think we should fail and tell the user it's wrong.
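A minimal sketch of the selection described in this comment, assuming an RFC 2782-style pick (lowest priority value first, then a weighted random choice). It is not the actual aaa-ldap code; `SrvRecord`, `pick_srv_target` and the host names are hypothetical.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int  # lower value is preferred
    weight: int    # relative share among records of equal priority
    target: str


def pick_srv_target(records, rng):
    """Pick one target from the most preferred priority group.

    There is no health or replication check here: whichever server is
    drawn is used, and a failure on it is reported to the caller.
    """
    lowest = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == lowest]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # random.choices rejects all-zero weights, so pick uniformly.
        return rng.choice(candidates).target
    return rng.choices(candidates, weights=weights, k=1)[0].target


# Assumed example: two equally preferred domain controllers and one backup.
RECORDS = [
    SrvRecord(0, 100, "dc1.example.com"),
    SrvRecord(0, 100, "dc2.example.com"),
    SrvRecord(10, 100, "backup.example.com"),
]
```

With equal priority and weight, either of the two domain controllers can be drawn on any given lookup, so a login can succeed or fail depending on which replica was selected.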

Comment 9 Martin Perina 2016-11-09 14:16:26 UTC
The RoundRobinDNSServerSet and DNSSRVRecordServerSet server selection methods are great for finding a server that we can connect to. But the situation described in this bug is different: we have two working servers that are replicated, but replication is currently out of sync. In this particular case we discovered the replication error because the user account we used to authenticate to the LDAP server was closed/disabled on the server we selected, but open/enabled on the 2nd server. There is no simple way to find out which server contains the correct data and which one is out of sync.

So I'm closing this as WONTFIX, because it is the LDAP administrator's task to set up and manage proper server replication.