This bug is created as a clone of upstream ticket: https://fedorahosted.org/389/ticket/47788

Although the consumer returns a failure, and this failure should stop (break) replication (ignore_error_and_keep_going() does not ignore "unwilling to perform"), the operations following the failing one are replicated and they update the RUV. In the end, the failing operation is effectively skipped.

Here is the test case scenario (the failing CSN is marked "(error)"; the CSN following it is 53592baa000100020000):

M1:
MOD entry_18: csn=53592ba7000000010000
DEL entry_1:  csn=53592ba7000100010000
MOD entry_18: csn=53592ba8000000010000 <-- M2
MOD entry_19: csn=53592ba9000000020000
MOD entry_1:  csn=53592baa000000020000 (error)
MOD entry_19: csn=53592baa000100020000
MOD entry_19: csn=53592bb6000000010000
MOD entry_19: csn=53592bbb000000010000
MOD entry_19: csn=53592bc1000000010000
...
MOD entry_19: csn=53592beb000000010000 <-- M2
MOD entry_19: csn=53592bf2000000020000
MOD entry_19: csn=53592bf8000000020000
...

M2:
MOD entry_19: csn=53592ba9000000020000
MOD entry_1:  csn=53592baa000000020000
MOD entry_19: csn=53592baa000100020000
M1 -->
MOD entry_18: csn=53592ba7000000010000
DEL entry_1:  csn=53592ba7000100010000
MOD entry_18: csn=53592ba8000000010000
MOD entry_19: csn=53592bb6000000010000
MOD entry_19: csn=53592bbb000000010000
MOD entry_19: csn=53592bc1000000010000
...
...
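To make the scenario above easier to follow, here is a small sketch that decodes the CSNs listed there. It assumes the usual 389-ds CSN string layout (8 hex digits of Unix time, 4 of sequence number, 4 of replica id, 4 of sub-sequence number); the helper name is illustrative, not part of the server code.

```python
def parse_csn(csn: str) -> dict:
    """Decode a 20-hex-digit CSN string into its assumed fields."""
    assert len(csn) == 20
    return {
        "time": int(csn[0:8], 16),    # seconds since the epoch
        "seq": int(csn[8:12], 16),    # sequence number within that second
        "rid": int(csn[12:16], 16),   # originating replica id (1 = M1, 2 = M2)
        "subseq": int(csn[16:20], 16),
    }

failing = parse_csn("53592baa000000020000")    # the MOD that hits err=53
following = parse_csn("53592baa000100020000")  # the very next change from M2
print(failing)    # rid 2: originated on M2
print(following)  # same second, seq 1: immediately follows the failing CSN
```

Decoding shows both CSNs originate on replica 2 in the same second, with only the sequence number differing, which is why committing the second one rolls the RUV past the first.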
On M1, we can see that the MOD on entry_1 failed, but replication of M2 updates is not stopped:

[24/Apr/2014:17:20:13 +0200] conn=5 op=5 MOD dn="cn=new_account19,cn=staged user,dc=example,dc=com"
[24/Apr/2014:17:20:14 +0200] conn=5 op=5 RESULT err=0 tag=103 nentries=0 etime=1 csn=53592ba9000000020000
[24/Apr/2014:17:20:14 +0200] conn=5 op=6 MOD dn="cn=new_account1,cn=staged user,dc=example,dc=com"
[24/Apr/2014:17:20:14 +0200] conn=5 op=6 RESULT err=53 tag=103 nentries=0 etime=0 csn=53592baa000000020000
[24/Apr/2014:17:20:14 +0200] conn=5 op=7 MOD dn="cn=new_account19,cn=staged user,dc=example,dc=com"
[24/Apr/2014:17:20:16 +0200] conn=5 op=7 RESULT err=0 tag=103 nentries=0 etime=2 csn=53592baa000100020000
[24/Apr/2014:17:20:19 +0200] conn=3 op=40 SRCH base="dc=example,dc=com" scope=2 filter="(cn=new_account19)" attrs=ALL
[24/Apr/2014:17:20:19 +0200] conn=3 op=40 RESULT err=0 tag=101 nentries=1 etime=0
[24/Apr/2014:17:20:20 +0200] conn=5 op=8 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[24/Apr/2014:17:20:20 +0200] conn=5 op=8 RESULT err=0 tag=120 nentries=0 etime=0
[24/Apr/2014:17:20:21 +0200] conn=5 op=9 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[24/Apr/2014:17:20:21 +0200] conn=3 op=42 MOD dn="cn=new_account19,cn=staged user,dc=example,dc=com"
[24/Apr/2014:17:20:21 +0200] conn=5 op=9 RESULT err=0 tag=120 nentries=0 etime=1
[24/Apr/2014:17:20:22 +0200] conn=3 op=42 RESULT err=0 tag=103 nentries=0 etime=1 csn=53592bb6000000010000
...
[24/Apr/2014:17:21:25 +0200] conn=5 op=44 MOD dn="cn=new_account19,cn=staged user,dc=example,dc=com"
[24/Apr/2014:17:21:25 +0200] conn=5 op=44 RESULT err=0 tag=103 nentries=0 etime=0 csn=53592bf2000000020000
...
[24/Apr/2014:17:21:30 +0200] conn=5 op=48 MOD dn="cn=new_account19,cn=staged user,dc=example,dc=com"
[24/Apr/2014:17:21:30 +0200] conn=5 op=48 RESULT err=0 tag=103 nentries=0 etime=0 csn=53592bf8000000020000

On M1, we can see that the failing MOD is added to the pending list, but when the next update (53592baa000100020000) is committed, the failing update is no longer in the pending list:

[24/Apr/2014:17:20:14 +0200] NSMMReplicationPlugin - 53592baa000000020000, not committed
[24/Apr/2014:17:20:14 +0200] NSMMReplicationPlugin - ruv_add_csn_inprogress: successfully inserted csn 53592baa000000020000 into pending list
[24/Apr/2014:17:20:14 +0200] ldbm_back_modify - Attempt to modify a tombstone entry nsuniqueid=d131a60c-cbc311e3-847ba7c2-9c316dbb,cn=new_account1,cn=staged user,dc=example,dc=com
[24/Apr/2014:17:20:14 +0200] NSMMReplicationPlugin - conn=5 op=6 csn=53592baa000000020000 process postop: canceling operation csn
[24/Apr/2014:17:20:15 +0200] NSMMReplicationPlugin - csnplInsert: CSN Pending list content:
[24/Apr/2014:17:20:15 +0200] NSMMReplicationPlugin - 53592baa000100020000, not committed
[24/Apr/2014:17:20:15 +0200] NSMMReplicationPlugin - ruv_add_csn_inprogress: successfully inserted csn 53592baa000100020000 into pending list
[24/Apr/2014:17:20:16 +0200] NSMMReplicationPlugin - replica_update_ruv: csn 53592baa000100020000
[24/Apr/2014:17:20:16 +0200] NSMMReplicationPlugin - csnplCommit: CSN Pending list content:
[24/Apr/2014:17:20:16 +0200] NSMMReplicationPlugin - 53592baa000100020000, not committed
[24/Apr/2014:17:20:16 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 53592baa000100020000
[24/Apr/2014:17:20:16 +0200] NSMMReplicationPlugin - ruv_update_ruv: rolled up to csn 53592baa000100020000

On M2, we can see that the second replication session finds 53592baa000100020000 in the RUV:

[24/Apr/2014:17:20:12 +0200] Consumer RUV:
[24/Apr/2014:17:20:12 +0200] {replicageneration} 53590540000000010000
[24/Apr/2014:17:20:12 +0200] {replica 1 ldap://localhost.localdomain:44389} 53590546000000010000 53592ba8000000010000 00000000
[24/Apr/2014:17:20:12 +0200] {replica 2 ldap://localhost.localdomain:45389}
...
[24/Apr/2014:17:20:13 +0200] agmt="cn=meTo_localhost.localdomain:44389" (localhost:44389) - load=1 rec=2 csn=53592baa000000020000
[24/Apr/2014:17:20:13 +0200] NSMMReplicationPlugin - agmt="cn=meTo_localhost.localdomain:44389" (localhost:44389): replay_update: Sending modify operation (dn="cn=new_account1,cn=staged user,dc=example,dc=com" csn=53592baa000000020000)
[24/Apr/2014:17:20:13 +0200] NSMMReplicationPlugin - agmt="cn=meTo_localhost.localdomain:44389" (localhost:44389): replay_update: Consumer successfully sent operation with csn 53592baa000000020000
[24/Apr/2014:17:20:13 +0200] agmt="cn=meTo_localhost.localdomain:44389" (localhost:44389) - load=1 rec=3 csn=53592baa000100020000
[24/Apr/2014:17:20:13 +0200] NSMMReplicationPlugin - agmt="cn=meTo_localhost.localdomain:44389" (localhost:44389): replay_update: Sending modify operation (dn="cn=new_account19,cn=staged user,dc=example,dc=com" csn=53592baa000100020000)
[24/Apr/2014:17:20:13 +0200] NSMMReplicationPlugin - agmt="cn=meTo_localhost.localdomain:44389" (localhost:44389): replay_update: Consumer successfully sent operation with csn 53592baa000100020000
[24/Apr/2014:17:20:14 +0200] NSMMReplicationPlugin - agmt="cn=meTo_localhost.localdomain:44389" (localhost:44389): Consumer failed to replay change (uniqueid d131a60c-cbc311e3-847ba7c2-9c316dbb, CSN 53592baa000000020000): Server is unwilling to perform (53). Will retry later.
...
[24/Apr/2014:17:20:21 +0200] Consumer RUV:
[24/Apr/2014:17:20:21 +0200] {replicageneration} 53590540000000010000
[24/Apr/2014:17:20:21 +0200] {replica 1 ldap://localhost.localdomain:44389} 53590546000000010000 53592ba8000000010000 00000000
[24/Apr/2014:17:20:21 +0200] {replica 2 ldap://localhost.localdomain:45389} 53592ba9000000020000 53592baa000100020000 00000000

So replication did not stop on 53592baa000000020000, and that update is skipped even though the supplier intended to retry it. This is likely related to the failing update being removed from the pending list: with it gone, the next committed update (53592baa000100020000) can go into the RUV.
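The suspected mechanism can be illustrated with a minimal model of the CSN pending list (an illustrative sketch, not the real C implementation in ruv.c/csnpl.c): the RUV maxcsn may only roll forward to the newest committed CSN that is not preceded by an older uncommitted one, so canceling the failing CSN instead of leaving it uncommitted is exactly what lets the next CSN through.

```python
def roll_up(pending):
    """pending: ordered list of (csn, committed) pairs, oldest first.
    Return the CSN the RUV may advance to, or None if it cannot move."""
    maxcsn = None
    for csn, committed in pending:
        if not committed:
            break  # an uncommitted CSN blocks every later CSN from the RUV
        maxcsn = csn
    return maxcsn

# Expected behavior: the failing CSN stays in the list uncommitted,
# blocking the RUV, so the supplier retries it later.
blocked = roll_up([("53592baa000000020000", False),
                   ("53592baa000100020000", True)])
print(blocked)   # None -> RUV does not advance

# Observed behavior: "process postop: canceling operation csn" removed the
# failing CSN, so the following CSN commits and the RUV skips the failure.
skipped = roll_up([("53592baa000100020000", True)])
print(skipped)   # 53592baa000100020000 -> failure silently skipped
```

This matches the logs above: once 53592baa000000020000 is canceled out of the pending list, ruv_update_ruv rolls up to 53592baa000100020000 and the retry never happens.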
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune@redhat.com with any questions
Bug verified. Works fine for me: no errors occurred while adding and deleting entries in an MMR setup.

Step 1) Set up MMR.

Step 2) Add 1 million users to the instance using ldclt:
ldclt -D "cn=Directory Manager" -w test1234 -f cn=XXXXXX -b "ou=people,dc=example,dc=com" -e add,person,incr,noloop,commoncounter -r1 -R1000000 -I 68 -I 32 -H ldap://localhost:389

Step 3) Delete the same 1 million users:
ldclt -D "cn=Directory Manager" -w test1234 -f cn=XXXXXX -b "ou=people,dc=example,dc=com" -e delete,person,incr,noloop,commoncounter -r1 -R1000000 -I 68 -I 32 -H ldap://localhost:389

Step 4) Check the access logs:
[root@dhcp210-197 slapd-test1]# pwd
/var/log/dirsrv/slapd-test1
[root@dhcp210-197 slapd-test1]# grep -c err=53 access*
access:0
access.20160727-183248:0
access.20160728-064940:0
access.20160728-192225:0
access.20160801-090001:0
access.20160801-222119:0
access.rotationinfo:0
[root@dhcp210-197 slapd-test]# pwd
/var/log/dirsrv/slapd-test
[root@dhcp210-197 slapd-test]# grep -c err=53 access*
access:0
access.20160727-183209:0
access.20160728-063057:0
access.20160728-190940:0
access.20160801-093749:0
access.20160801-222320:0
access.rotationinfo:0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2594.html
*** Bug 1339693 has been marked as a duplicate of this bug. ***