Bug 1398670

Summary: Check IdM Topology for broken record caused by replication conflict before upgrading it
Product: Red Hat Enterprise Linux 7
Reporter: German Parente <gparente>
Component: ipa
Assignee: IPA Maintainers <ipa-maint>
Status: CLOSED ERRATA
QA Contact: Sudhir Menon <sumenon>
Severity: urgent
Priority: urgent
Docs Contact:
Version: 7.3
CC: ekeck, ipa-qe, jcholast, ksiddiqu, lkrispen, mbabinsk, mkosek, nsoman, pvoborni, rcritten, tbordaz
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: ipa-4.4.0-14.el7.3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned To: 1404338 (view as bug list)
Environment:
Last Closed: 2017-08-01 09:42:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1395848, 1404338

Description German Parente 2016-11-25 14:19:01 UTC
Description of problem:

We can see the following issue at a customer's deployment:

ipa topologysuffix-verify domain
========================================================
Replication topology of suffix "domain" contains errors.
========================================================
-------------------------------------------------------------
Recommended maximum number of agreements per replica exceeded
-------------------------------------------------------------
  Maximum number of agreements per replica: 4
  Server "server1" has 5 agreements with servers:
    server2
    server3
    server4
    server5
    server4


So we see server4 twice!

When inspecting cn=topology tree, we can see the following entries:

dn: cn=server1-to-server4,cn=domain,cn=topology,cn=ipa,cn=etc,dc=unix,dc=local
 objectClass: iparepltoposegment
 objectClass: top
 cn: server1-to-server4
 ipaReplTopoSegmentLeftNode: server1
 ipaReplTopoSegmentRightNode: server4
 ipaReplTopoSegmentDirection: left-right
 ipaReplTopoSegmentStatus: autogen


But also:

dn: cn=server4-to-server1,cn=domain,cn=topology,cn=ipa,cn=etc,dc=unix,dc=local
 objectClass: iparepltoposegment
 objectClass: top
 cn: server4-to-server1
 ipaReplTopoSegmentLeftNode: server4
 ipaReplTopoSegmentRightNode: server1
 ipaReplTopoSegmentDirection: both
 ipaReplTopoSegmentStatus: autogen

Even if this could be valid as a setting, I think IPA should not allow having a "both" direction and a "left-right" or "right-left" direction between the same nodes simultaneously. What would be the point of having redundant replication segments?

Note that the DNs are different because in one case it is node1 -> node2 with direction "both" and in the other node2 -> node1 with direction "left-right".
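
For illustration only (this is not part of the original report), a quick way to spot such semi-duplicate segments is to list all segment entries under cn=topology and group them by node pair. A rough python-ldap sketch, assuming the suffix dc=unix,dc=local from the DNs above and placeholder bind credentials:

import ldap

conn = ldap.initialize("ldap://localhost")
conn.simple_bind_s("cn=Directory Manager", "password")  # placeholder credentials

base = "cn=domain,cn=topology,cn=ipa,cn=etc,dc=unix,dc=local"
pairs = {}
for dn, attrs in conn.search_s(base, ldap.SCOPE_ONELEVEL,
                               "(objectclass=iparepltoposegment)",
                               ["ipaReplTopoSegmentLeftNode",
                                "ipaReplTopoSegmentRightNode"]):
    left = attrs["ipaReplTopoSegmentLeftNode"][0].decode()
    right = attrs["ipaReplTopoSegmentRightNode"][0].decode()
    # group segments by unordered node pair, so server1/server4 and
    # server4/server1 end up in the same bucket
    pairs.setdefault(frozenset((left, right)), []).append(dn)

for nodes, dns in pairs.items():
    if len(dns) > 1:
        print("semi-duplicate segments for %s:" % "/".join(sorted(nodes)))
        for dn in dns:
            print("  " + dn)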






Comment 1 Petr Vobornik 2016-11-25 14:34:15 UTC
This indeed looks like a wrong state. The segments should be merged by the topology plugin. CCing Ludwig.

Comment 3 Ludwig 2016-11-29 11:33:33 UTC
The segments seem to have been merged; we have a "both" segment. But the "left-right" one should have been deleted.

Can you provide the segment with the full entry state (nscpentrywsi)?
Do you know the sequence of actions leading to this state?

Comment 4 German Parente 2016-11-30 16:17:54 UTC
Hi Ludwig,

At this moment, the issue has been solved on the customer side.

I cannot answer your question, but since this customer had conflicts as well, we manipulated segments with the plugin disabled. I wonder whether the customer created segments before re-enabling the plugin.

In any case, is there any reason to allow creating left-right segments in IPA? Shouldn't segments be only "both" until we can allow read-only replicas?

Regards.

Comment 5 Ludwig 2016-11-30 16:26:19 UTC
The existence of both left-right and "both" segments is a side effect of the "conflict problem", and I have seen them in my efforts to reproduce, so checking for these semi-duplicate segments is another part of the necessary cleanup.

For the other question: no, we cannot do this. When the topology plugin is enabled, e.g. after raising the domain level, it starts on each server independently and creates the segments from the existing agreements; at that time it does not know whether there is an agreement in the other direction. In an existing topology, an admin might have created one-directional agreements.
So segments created from agreements always start one-directional, and when they "meet" the other direction, one is transformed to "both" and the other is removed.
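
As an illustrative sketch (not the topology plugin's actual code), the merge rule described above, where one-directional segments that "meet" their reverse are collapsed into a single "both" segment, could look roughly like this; the dict-based segment format is an assumption:

def merge_segments(segments):
    """segments: list of dicts with 'left', 'right' and 'direction' keys."""
    merged = []
    seen = {}  # unordered node pair -> index into merged
    for seg in segments:
        key = frozenset((seg["left"], seg["right"]))
        if key in seen:
            # the reverse direction already exists: widen it to "both"
            # and drop the current one-directional segment
            merged[seen[key]]["direction"] = "both"
            continue
        seen[key] = len(merged)
        merged.append(dict(seg))
    return merged

# Two one-directional segments between server1 and server4 collapse into one:
print(merge_segments([
    {"left": "server1", "right": "server4", "direction": "left-right"},
    {"left": "server4", "right": "server1", "direction": "right-left"},
]))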

Comment 6 German Parente 2016-11-30 16:39:32 UTC
Thanks for your answer Ludwig.

I didn't know we could create a one-directional segment between two masters.

What would happen if I have:

A ===> B

between two masters and I do updates on B only?

In traditional replication, this would leave the replicas out of sync.

Comment 7 Ludwig 2016-11-30 16:52:54 UTC
The IPA CLI does not allow creating one-directional segments, but if we start creating segments by bootstrapping a topology from existing replication agreements, we start with one-directional segments. Your example does not work, but what prevented creating a topology like
A <==> B <==> C <==> D <==> A  and then adding A ==> C and B ==> D?
We do not support creating this via segments, but if it exists, then B ==> D will become a left-right-only segment.

Comment 8 German Parente 2016-11-30 17:17:54 UTC
"and add A ==> C and B ==> D" by which means? If the CLI does not support it, we can only do this manually.


If there is a B ==> D segment in A <==> B <==> C <==> D <==> A

and we later decide to disconnect, for instance, A <=> D and B <=> C, the graph is still connected, but updates made on D will never arrive at A and B.

I would say this bug was provoked by a bootstrapping process that did not manage to read all the segments and turn them from "left-right" into "both" (either because not all the nodes were reachable at that moment or because of the conflicts issue).

If that's the case, then let's close this one.

Perhaps we could add a "merge" mechanism not only at bootstrap but also at some other point (for instance, a sort of ipa topologysuffix-verify --repair), since the code is already present.
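
Continuing the sketch above, a hypothetical "--repair" step (nothing here is the actual IPA implementation) could take the segments found for one node pair, keep the "both" segment if one exists (or otherwise keep the first one), and mark the redundant one-directional copies for deletion:

def plan_repair(pair_segments):
    """pair_segments: list of (dn, direction) tuples for a single node pair."""
    # prefer an existing "both" segment; otherwise keep the first segment
    # (it would have to be promoted to "both" separately)
    keep = next((dn for dn, direction in pair_segments if direction == "both"),
                pair_segments[0][0])
    delete = [dn for dn, direction in pair_segments if dn != keep]
    return keep, delete

keep, delete = plan_repair([
    ("cn=server1-to-server4,cn=domain,cn=topology,cn=ipa,cn=etc,dc=unix,dc=local", "left-right"),
    ("cn=server4-to-server1,cn=domain,cn=topology,cn=ipa,cn=etc,dc=unix,dc=local", "both"),
])
print("keep:  ", keep)
print("delete:", delete)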

Comment 9 Petr Vobornik 2016-12-02 15:29:22 UTC
I would not close the bug. The situation can be reported by the topologysuffix-verify command, or also fixed with e.g. --repair, as suggested by German.

I see the resolution as:

1. Can we prevent the situation? If not then #2. If yes, then fix where we can prevent it.

2. Can topology plugin fix it automatically? If yes then fix. If not then #3.

3. Do the stuff in ipa topologysuffix-verify outlined above.

Ludwig, is #1 or #2 possible? Seems to me that comment 5 says yes.

Comment 10 Petr Vobornik 2016-12-02 17:36:56 UTC
Upstream ticket:
https://fedorahosted.org/freeipa/ticket/6534

Comment 11 Ludwig 2016-12-05 08:21:41 UTC
(In reply to Petr Vobornik from comment #9)
> I would not close the bug. The situation can be reported by
> topologysuffix-verify command. Or also fixed with e.g. --repair, as
> suggested by German.
> 
> I see the resolution as:
> 
> 1. Can we prevent the situation? If not then #2. If yes, then fix where we
> can prevent it.
> 
> 2. Can topology plugin fix it automatically? If yes then fix. If not then #3.
> 
> 3. Do the stuff in ipa topologysuffix-verify outlined above.
> 
> Ludwig, is #1 or #2 possible? Seems to me that comment 5 says yes.

I am working on #1

#2 would be hard or impossible, as the entries may be turned into conflict entries after the generation of segments.

#3 could be an option to report and resolve conflicts; we can evaluate what is needed when we have more results from testing with #1.

Comment 12 Petr Vobornik 2016-12-05 16:53:34 UTC
And #1 is on the 389-ds side, right? So we should change the component, right?

Comment 13 Ludwig 2016-12-05 17:04:39 UTC
(In reply to Petr Vobornik from comment #12)
> And #1 is on 389-ds side, right? So we should change component, right?

I would then close it as a duplicate of bug 1395848, which is the bug used to fix the conflicts.

Comment 14 Ludwig 2016-12-06 08:59:43 UTC
I was thinking about it again, and there is something that could and should be done in IPA.
The fix in DS will prevent the creation of visible conflicts and so prevent follow-up errors, but in a deployment where the conflicts already exist, raising the domain level can cause problems with segments.

So I suggest adding a check to the "ipa domainlevel-set" command that checks whether there are conflicts below cn=topology and rejects raising the domain level.
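
A minimal sketch of the kind of pre-flight check proposed here (not the actual implementation): search below cn=topology for "old style" replication conflict entries, which carry the nsds5ReplConflict attribute, and refuse to raise the domain level if any are found. The suffix and bind credentials are placeholders:

import ldap

conn = ldap.initialize("ldap://localhost")
conn.simple_bind_s("cn=Directory Manager", "password")  # placeholder credentials

base = "cn=topology,cn=ipa,cn=etc,dc=unix,dc=local"
conflicts = conn.search_s(base, ldap.SCOPE_SUBTREE,
                          "(&(objectclass=ldapSubEntry)(nsds5ReplConflict=*))",
                          ["nsds5ReplConflict"])
if conflicts:
    dns = "\n".join(dn for dn, _ in conflicts)
    raise RuntimeError("replication conflicts found under %s, refusing to "
                       "raise the domain level:\n%s" % (base, dns))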

Comment 15 Martin Kosek 2016-12-06 11:44:25 UTC
Let's add Bug 1395848 as blocked by the IdM change. It would indeed be good to add this check to the command to make sure the deployment does not end up in a broken state after the domain level upgrade.

Warning: when implementing this change, we will need to make sure we use the right filter for detecting collisions, given that their structure is being changed in Bug 1395848.

Comment 16 Ludwig 2016-12-06 11:49:25 UTC
No, bug 1395848 is not blocked; it is only that the patch for 1395848 will not address the existing conflicts, so they are complementary.

And the check has to use the filter that finds the old conflict entries; the conflict entries created after applying the fix for 1395848 should not be in the way.
But it would be OK to search for both old AND new conflicts.

Comment 17 Ludwig 2016-12-06 11:51:41 UTC
The summary should say "before raising the domain level", not "before upgrading".

Comment 18 Petr Vobornik 2016-12-07 16:46:43 UTC
I don't understand how checking for replication conflicts in segment entries before raising the domain level to 1 would help.

AFAIK, on domain level 0 no segments exist, so there should not be any replication conflicts. Conflicts happen after the domain level is raised; then the topology plugin is "activated" and starts to create segments, right? What am I missing? Or did you mean replication conflicts in suffix entries?

Comment 19 Ludwig 2016-12-07 16:58:04 UTC
The problem is the existing conflict entries for cn=domain or cn=ca.

If there are conflict entries, it can happen that when the domain level is raised some segments will be put under the "conflict" entries and some under the "real" entries; we can avoid this by checking before raising the domain level.

If a replica is upgraded to 4.3+, then cn=realm is deleted and cn=domain is added, and cn=ca is added as well. If the upgrade is run in parallel on several replicas, it can happen that this creates conflict entries.

A fix in DS can prevent/hide these, but if a deployment has already upgraded to 7.3 and has not yet raised the domain level, the conflicts could already exist and the patch in DS would not change this; a check for conflicts before raising the domain level could prevent making it worse.

Comment 30 Sudhir Menon 2017-05-23 12:15:43 UTC
Marking the bug as VERIFIED as per the steps mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1404338#c8.
Upgraded from 7.2.z to 7.4 while verifying this bug.

ipa-server-4.5.0-13.el7.x86_64
sssd-1.15.2-31.el7.x86_64
python2-cryptography-1.7.2-1.el7.x86_64
selinux-policy-3.13.1-151.el7.noarch
krb5-server-1.15.1-8.el7.x86_64
pki-server-10.4.1-4.el7.noarch
389-ds-base-1.3.6.1-14.el7.x86_64


[root@ibm-x3650m4-01-vm-04 slapd-TESTRELM-TEST]# echo *** | kinit admin
    Password for admin@TESTRELM.TEST:
    [root@ibm-x3650m4-01-vm-04 slapd-TESTRELM-TEST]# klist -l
    Principal name                 Cache name
    --------------                 ----------
    admin@TESTRELM.TEST            KEYRING:persistent:0:0

[root@ibm-x3650m4-01-vm-04 slapd-TESTRELM-TEST]# ipa domainlevel-set 1
    -----------------------
    Current domain level: 1
    -----------------------
     
[root@replica-01-vm-15 slapd-TESTRELM-TEST]# ipa domainlevel-set 1
    ipa: ERROR: no modifications to be performed

[root@replica-01-vm-15 slapd-TESTRELM-TEST]# ipa domainlevel-get
    -----------------------
    Current domain level: 1
    -----------------------

Comment 32 errata-xmlrpc 2017-08-01 09:42:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2304