Bug 1382784

Summary: Replication cannot handle MODRDN conflict properly.
Product: Red Hat Enterprise Linux 6 Reporter: German Parente <gparente>
Component: 389-ds-base Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED WONTFIX QA Contact: Viktor Ashirov <vashirov>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.8 CC: lkrispen, msauton, nkinder, rmeggins, tbordaz, tlavigne
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-10-26 15:54:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1461138    
Attachments:
Description Flags
lib389 reproducible test case none

Description German Parente 2016-10-07 17:54:39 UTC
Description of problem:

We have seen this issue at a customer site. Two different connections perform the same MODRDN on two different nodes at the same time.

The operation succeeds, but the replicated operation from one node to the other fails to be applied and aborts the replication session.

Please, see the logs:


Node 01:

[07/Oct/2016:06:33:01 -0300] conn=26518 op=4799 MODRDN dn="cn=23175505199,ou=Nuevos,dc=abierto,dc=anses,dc=gov,dc=ar" newrdn="cn=23175505199" newsuperior="ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar"
[07/Oct/2016:06:33:01 -0300] conn=26518 op=4799 RESULT err=0 tag=109 nentries=0 etime=0 csn=57f76dd10002000a0000

Node 02:

[07/Oct/2016:06:33:02 -0300] conn=2680 op=4810 MODRDN dn="cn=23175505199,ou=Nuevos,dc=abierto,dc=anses,dc=gov,dc=ar" newrdn="cn=23175505199" newsuperior="ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar"
[07/Oct/2016:06:33:02 -0300] conn=2680 op=4810 RESULT err=0 tag=109 nentries=0 etime=0 csn=57f76dd2000100140000
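
For reference, a minimal sketch of how the two CSNs in the RESULT lines above can be decoded. It assumes the usual 389-ds CSN string layout (8 hex digits of timestamp, then 4 hex digits each for sequence number, replica ID and sub-sequence number). The names `CSN` and `parse_csn` are illustrative only and not part of any library.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    """Decoded change sequence number (layout is an assumption, see above)."""
    timestamp: int    # seconds since the epoch, as stored in the CSN
    seqnum: int       # sequence number within that second
    replica_id: int   # ID of the replica that originated the change
    subseqnum: int    # sub-sequence number


def parse_csn(csn: str) -> CSN:
    """Split a 20-character hex CSN string into its four fields."""
    if len(csn) != 20:
        raise ValueError("expected a 20-character CSN, got %r" % csn)
    return CSN(
        timestamp=int(csn[0:8], 16),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


node01 = parse_csn("57f76dd10002000a0000")  # CSN from the Node 01 RESULT line
node02 = parse_csn("57f76dd2000100140000")  # CSN from the Node 02 RESULT line

print(node01.replica_id, node02.replica_id)   # 10 20
print(node02.timestamp - node01.timestamp)    # 1
```

Under that layout, the two MODRDNs originate from replica IDs 10 and 20, and their CSN timestamps are one second apart, so the Node 01 change is the older one.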

In node 01 we see this message repeatedly:


[07/Oct/2016:06:33:03 -0300] NSMMReplicationPlugin - agmt="cn=MMR-ansesrhds02" (ansesrhds02:636): Consumer failed to replay change (uniqueid 9f680901-8c7011e6-99f9afaa-6e3226cf, CSN 57f76dd10002000a0000): Server is unwilling to perform (53). Will retry later.


And in 02 repeatedly:

[07/Oct/2016:06:33:03 -0300] NSMMReplicationPlugin - process_postop: Failed to apply update (57f76dd10002000a0000) error (53).  Aborting replication session(conn=3 op=15371)

This is because the entry is not found. Has the original entry been turned into a tombstone?

Also in 01 logs:


[07/Oct/2016:07:37:43 -0300] conn=23631 op=18437 csn=57f77cfa000100140000 - Failed to convert cn=27006139499 to RDN
[07/Oct/2016:07:37:43 -0300] NSMMReplicationPlugin - process_postop: Failed to apply update (57f77cfa000100140000) error (1).  Aborting replication session(conn=23631 op=18437)

This happens because the customer, by mistake, is applying the same operations to both nodes all the time.
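
To confirm from the access logs that the same MODRDN was sent to both nodes, something like the following rough sketch could be used. The regular expression is written only against the MODRDN line format shown in this report (with a `newsuperior` value present); `find_duplicate_modrdns` and the node labels are hypothetical names for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Matches access-log MODRDN lines in the format quoted in this report.
MODRDN_RE = re.compile(
    r'conn=(?P<conn>\d+) op=(?P<op>\d+) MODRDN '
    r'dn="(?P<dn>[^"]*)" newrdn="(?P<newrdn>[^"]*)" '
    r'newsuperior="(?P<newsuperior>[^"]*)"'
)

Key = Tuple[str, str, str]


def find_duplicate_modrdns(logs: Dict[str, List[str]]) -> Dict[Key, Set[str]]:
    """Return MODRDN operations (dn, newrdn, newsuperior) seen on more than one node."""
    seen: Dict[Key, Set[str]] = defaultdict(set)
    for node, lines in logs.items():
        for line in lines:
            match = MODRDN_RE.search(line)
            if match:
                # DNs are compared case-insensitively here, as a simplification.
                key = (
                    match.group("dn").lower(),
                    match.group("newrdn").lower(),
                    match.group("newsuperior").lower(),
                )
                seen[key].add(node)
    return {key: nodes for key, nodes in seen.items() if len(nodes) > 1}


# The two MODRDN lines from the access logs quoted above.
logs = {
    "node01": [
        '[07/Oct/2016:06:33:01 -0300] conn=26518 op=4799 MODRDN '
        'dn="cn=23175505199,ou=Nuevos,dc=abierto,dc=anses,dc=gov,dc=ar" '
        'newrdn="cn=23175505199" '
        'newsuperior="ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar"',
    ],
    "node02": [
        '[07/Oct/2016:06:33:02 -0300] conn=2680 op=4810 MODRDN '
        'dn="cn=23175505199,ou=Nuevos,dc=abierto,dc=anses,dc=gov,dc=ar" '
        'newrdn="cn=23175505199" '
        'newsuperior="ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar"',
    ],
}

duplicates = find_duplicate_modrdns(logs)
for (dn, newrdn, newsuperior), nodes in duplicates.items():
    print(dn, "->", newsuperior, "on", sorted(nodes))
```

This only compares the operation arguments, not the timing, so it flags every MODRDN repeated across nodes regardless of how far apart the two requests were.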


Version-Release number of selected component (if applicable):

389-ds-base-1.2.11.15-74.el6.x86_64


How reproducible:

I have not tried to reproduce it yet. I will do so on Monday.

Comment 13 thierry bordaz 2016-10-13 15:27:25 UTC
Created attachment 1210179
lib389 reproducible test case

Comment 14 thierry bordaz 2016-10-13 15:28:57 UTC
With the attachment https://bugzilla.redhat.com/attachment.cgi?id=1210179, the result is err=68 (entry already exists), which looks correct to me and does not break the replication session:
[07/Oct/2016:19:29:09.235324915 +0200] conn=5 op=5 MODRDN dn="cn=new_account0,cn=subtree1,dc=example,dc=com" newrdn="cn=new_account0" newsuperior="cn=subtree2,dc=example,dc=com"
[07/Oct/2016:19:29:09.235546262 +0200] conn=5 op=5 RESULT err=68 tag=109 nentries=0 etime=0 csn=57f7db62000000020000

Comment 21 Ludwig 2016-11-04 07:55:29 UTC
I think the problem is that the fix for bug 918687 is missing in 1.2.11.