Bug 1309962

Summary: keep alive entries can break replication
Product: Red Hat Enterprise Linux 6 Reporter: Jan Kurik <jkurik>
Component: 389-ds-baseAssignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.0CC: amsharma, ekeck, gparente, mkolaja, msauton, nhosoi, nkinder, rmeggins, tbordaz
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 389-ds-base-1.2.11.15-71.el6_7 Doc Type: Bug Fix
Doc Text:
Previously, a keep alive entry was being created at too many opportunities during replication, potentially causing a race condition when adding the entry to the replica changelog and resulting in operations being dropped from the replication. With this update, unnecessary keep alive entry creation has been eliminated, and missing replication no longer occurs.
Story Points: ---
Clone Of: 1307152 Environment:
Last Closed: 2016-03-22 14:47:21 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1307152    
Bug Blocks:    
Attachments:
Description Flags
M2_logs
none
M1 logs none

Description Jan Kurik 2016-02-19 06:04:29 UTC
This bug has been copied from bug #1307152 and has been proposed
to be backported to 6.7 z-stream (EUS).

Comment 4 Amita Sharma 2016-03-04 13:33:12 UTC
Hi tbordaz,

I Assume the fix verification for this bug is ::
=================================
1. Create Master-consumer replication with fractional replication.
2. Afetr total init, check on cosumer side, the keep alive entry should not be created
3. Create a dummy entry on master and modify it with filtered updates in fraction replication, till it reaches 100.
4. Now check on consumer, keep alive entry should be there.

Here is what I have done ::
=======================
[root@ipaqa64vml yum.repos.d]# rpm -qa | grep 389
389-ds-base-libs-1.2.11.15-71.el6_7.x86_64
389-ds-base-1.2.11.15-71.el6_7.x86_64


Create Master. and on Master ::

enable replication logs
ldapmodify -x -h localhost -p 389 -D "cn=directory manager" -w Secret123 << EOF
> dn: cn=config
> changetype: modify
> replace: nsslapd-errorlog-level
> nsslapd-errorlog-level: 8192
> EOF
modifying entry "cn=config"

created replication manager entry
ldapmodify -x -a -h localhost -p 389 -D "cn=Directory Manager" -w Secret123 << EOF
dn: cn=replication manager,cn=config 
objectClass: inetorgperson
objectClass: person
objectClass: top
cn: replication manager
sn: RM
userPassword: Secret123
passwordExpirationTime: 20380119031407Z
nsIdleTimeout: 0

Enable Changelog
ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -v -a << EOF
dn: cn=changelog5,cn=config
changetype: add
objectclass: top
objectclass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /var/lib/dirsrv/slapd-master/changelogdb
nsslapd-changelogmaxage: 10d
EOF

Configure Replica
ldapmodify -D "cn=directory manager" -w Secret123 -p 389 -x -h localhost -v -a << EOF
dn: cn=replica,cn=dc\=example\,dc\=com,cn=mapping tree,cn=config
changetype: add
objectclass: top
objectclass: nsds5replica
objectclass: extensibleObject
cn: replica
nsds5replicaroot: dc=example,dc=com
nsds5replicaid: 1
nsds5replicatype: 3
nsds5flags: 1
nsds5ReplicaPurgeDelay: 604800
nsds5ReplicaBindDN: cn=replication manager,cn=config

On Supplier
=========
Add Replication manager
ldapmodify -x -a -h localhost -p 1389 -D "cn=Directory Manager" -w Secret123 << EOF
dn: cn=replication manager,cn=config 
objectClass: inetorgperson
objectClass: person
objectClass: top
cn: replication manager
sn: RM
userPassword: Secret123
passwordExpirationTime: 20380119031407Z
nsIdleTimeout: 0

Create Replica
ldapmodify -D "cn=directory manager" -w Secret123 -p 1389 -x -h localhost -v -a << EOF
dn: cn=replica,cn=dc\=example\,dc\=com,cn=mapping tree,cn=config
changetype: add
objectclass: top
objectclass: nsds5replica
objectclass: extensibleObject
cn: replica
nsds5replicaroot: dc=example,dc=com
nsds5replicaid: 2
nsds5replicatype: 2
nsds5ReplicaBindDN: cn=replication manager,cn=config
nsds5flags: 0


On consumer
=========
Add replication agreement
ldapmodify -v -h localhost -p 389 -D "cn=directory manager" -w Secret123 << EOF
dn: cn=ExampleAgreement,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
changetype: add
objectclass: top
objectclass: nsds5replicationagreement
cn: ExampleAgreement
nsds5replicahost: localhost
nsds5replicaport: 1389
nsds5ReplicaBindDN: cn=replication manager,cn=config
nsds5replicabindmethod: SIMPLE
nsds5replicaroot: dc=example,dc=com
description: agreement between Master and Slave
nsds5replicaupdateschedule: 0000-0500 1
nsds5replicatedattributelist: (objectclass=*) $ EXCLUDE mail
nsds5replicacredentials: Secret123
nsds5BeginReplicaRefresh: start

And as soon as I add the replication agreement, It adds the keep alive entry in the consumer server.

error Logs ::
[04/Mar/2016:07:24:58 -0500] NSMMReplicationPlugin - agmt="cn=ExampleAgreement" (localhost:1389): Replica was successfully acquired.
[04/Mar/2016:07:24:58 -0500] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=ExampleAgreement" (localhost:1389)".
[04/Mar/2016:07:24:58 -0500] NSMMReplicationPlugin - Need to create replication keep alive entry <cn=repl keep alive 1,dc=example,dc=com>
[04/Mar/2016:07:24:58 -0500] NSMMReplicationPlugin - add dn: cn=repl keep alive 1,dc=example,dc=com
objectclass: top
objectclass: ldapsubentry
objectclass: extensibleObject
cn: repl keep alive 1
[04/Mar/2016:07:24:58 -0500] NSMMReplicationPlugin - ruv_add_csn_inprogress: successfully inserted csn 56d97e9a000100010000 into pending list

And ldapsearch on consumer ::
ldapsearch -x -D "cn=Directory Manager" -h localhost -p 389 -w Secret123 -b "cn=repl keep alive 1,dc=example,dc=com" -s base
# extended LDIF
#
# LDAPv3
# base <cn=repl keep alive 1,dc=example,dc=com> with scope baseObject
# filter: (objectclass=*)
# requesting: ALL
#

# repl keep alive 1, example.com
dn: cn=repl keep alive 1,dc=example,dc=com
objectClass: top
objectClass: ldapsubentry
objectClass: extensibleObject
cn: repl keep alive 1

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
[root@ipaqa64vml yum.repos.d]# 


Please advice, what is the expected behavior and the verification steps.
Thanks,
Ami

Comment 5 thierry bordaz 2016-03-04 15:07:24 UTC
Hello Ami,

The described 'fix verification' (https://bugzilla.redhat.com/show_bug.cgi?id=1309962#c4) steps 1 and 3 are IMHO incorrect.

Each instance in the topology (master or consumer) may have created a 'keep alive' entry. The 'cn=repl keep alive 1' is created by the instance with replicaId:1 (here the master). The consumer (replicaId:2) may create a 'cn=repl keep alive 2' entry if it needs to.

The bug was that the consumer (replicaId:2) created the entry 'cn=repl keep alive 2' but the entry was cleared from its changelog and so not replicated. It basically existed only on the consumer.

Step 1 : In IPA all replica are actually masters, I mean can be updated. In the deployement make sure that both master and consumer are nsdsreplicatype: 3 (masters). 

Step 2 : 'cn=repl keep alive 2' should not exist. The notion of master and consumer are said only for the role during initialization, the consumer is also a master (can handle updates) but has been initialized by the 'master'

Step 3: The dummy entry can be created on master or consumer. But the filtered updates must be applied on the consumer. So after you created more than 100 filtered updates on the consumer, the replica will create 'cn=repl keep alive 2'.

Step 4: ok

Hope it will help, do not hesitate to ping me.

thanks
theirry

Comment 6 Amita Sharma 2016-03-07 07:26:42 UTC
Hi Theirry ,

Thanks for the reply, here are the steps I have followed now::
M1 - 389
M2 - 1389

Both are masters but only one line is missing in M2 agreement i.e. :: "nsds5BeginReplicaRefresh: start" so that only M1 initializes M2 and not vise verse.

On M1 the agreement looks like ::
 ================================
ldapmodify -v -h localhost -p 389 -D "cn=directory manager" -w Secret123 << EOF
dn: cn=ExampleAgreement,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
changetype: add
objectclass: top
objectclass: nsds5replicationagreement
cn: ExampleAgreement
nsds5replicahost: localhost
nsds5replicaport: 389
nsds5ReplicaBindDN: cn=replication manager,cn=config
nsds5replicabindmethod: SIMPLE
nsds5replicaroot: dc=example,dc=com
description: agreement between Master and Slave
nsds5replicatedattributelist: (objectclass=*) $ EXCLUDE mail
nsds5replicacredentials: Secret123
nsds5BeginReplicaRefresh: start

At this point, M1 has created keep alive entry in M2, But M2 has not created any entry in M1 ::
=====================================================
M1 ::
[07/Mar/2016:01:57:58 -0500] NSMMReplicationPlugin - agmt="cn=ExampleAgreement" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56dd2670000100010000)

M2::
[07/Mar/2016:01:57:58 -0500] @A - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.

[root@ipaqa64vml ~]# grep -li "cn=repl keep alive 1" /var/log/dirsrv/slapd-*/errors
/var/log/dirsrv/slapd-M1/errors
/var/log/dirsrv/slapd-M2/errors
[root@ipaqa64vml ~]# grep -li "cn=repl keep alive 2" /var/log/dirsrv/slapd-*/errors
[root@ipaqa64vml ~]# 

I have executed this script on M2::
================================
Entry added on M2::
$ldapadd -x -h localhost -p 389 -D "cn=Directory Manager" -w Secret123  << EOF
dn: uid=ami,dc=example,dc=com
cn: ams
sn: ams
givenname: ams
objectclass: top
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
uid: ami
mail: ams
userpassword: Secret123
EOF

[root@ipaqa64vml export]# cat modify.sh 
CPT=0
while [ $CPT -ne 100 ]
do
        # do this update 'mail' that is not replicated
        # so after more than 100 update, it will create the consumer keep alive entry
        #
        /usr/bin/ldapmodify -x -h localhost -p 1389 -D "cn=Directory Manager" -w Secret123 << EOF
dn: uid=ami,dc=example,dc=com
changetype: modify
replace: mail
mail: aamii
EOF
	CPT=`expr $CPT + 1`
done


Which does 100 times modifications on mail attribute which is skipped under fractional replication.
Still I don't see keep alive entry on M1.

PFA for the error logs on both the servers.
Please let me know in case any step in missing or need modification.

Thanks,
Ami

Comment 7 Amita Sharma 2016-03-07 07:36:27 UTC
Created attachment 1133644 [details]
M2_logs

Comment 8 Amita Sharma 2016-03-07 07:36:55 UTC
Created attachment 1133645 [details]
M1 logs

Comment 12 Amita Sharma 2016-03-07 17:16:04 UTC
Thanks Thierry for help with reproducing.

Two agreements on M1 and M2, where M2 is initialized by M1, but M1 is not by M2.

1. dn: cn=M1_to_M2,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=confi
 g
objectClass: top
objectClass: nsds5replicationagreement
cn: ExampleAgreement
cn: M1_to_M2
nsDS5ReplicaHost: localhost
nsDS5ReplicaPort: 389
nsDS5ReplicaBindDN: cn=replication manager,cn=config
nsDS5ReplicaBindMethod: SIMPLE
nsDS5ReplicaRoot: dc=example,dc=com
description: agreement between Master and Slave
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE mail
nsds5ReplicaStripAttrs: modifiersname modifytimestamp internalmodifiersname in
 ternalmodifytimestamp

2. dn: cn=M2_to_M1,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=confi
 g
objectClass: top
objectClass: nsds5replicationagreement
cn: ExampleAgreement
cn: M1_to_M2
nsDS5ReplicaHost: localhost
nsDS5ReplicaPort: 1389
nsDS5ReplicaBindDN: cn=replication manager,cn=config
nsDS5ReplicaBindMethod: SIMPLE
nsDS5ReplicaRoot: dc=example,dc=com
description: agreement between Master and Slave
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE mail
nsds5ReplicaStripAttrs: modifiersname modifytimestamp internalmodifiersname in
 ternalmodifytimestamp

3. On M2 Keep alive is created at the time of initialization, but it was not present on M1.

4. After doing 100+ updates on mail attribute on M2, Keep Alive entry got created ::
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:11:42:02 -0500] NSMMReplicationPlugin - Need to create replication keep alive entry <cn=repl keep alive 1,dc=example,dc=com>
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:11:42:02 -0500] NSMMReplicationPlugin - add dn: cn=repl keep alive 1,dc=example,dc=com
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:11:42:07 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:11:47:11 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:11:50:09 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:11:53:43 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:11:58:46 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:12:03:51 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:12:08:56 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:12:09:11 -0500] NSMMReplicationPlugin - Purged state information from entry cn=repl keep alive 2,dc=example,dc=com up to CSN 56d47b15000100020000
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:12:09:13 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M1/errors:[07/Mar/2016:12:09:19 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:1389): replay_update: Sending add operation (dn="cn=repl keep alive 1,dc=example,dc=com" csn=56ddaf5a000100010000)
/var/log/dirsrv/slapd-M2/errors: - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
/var/log/dirsrv/slapd-M2/errors: - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:11:50:09 -0500] �� - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:11:53:43 -0500] �� - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:11:58:46 -0500] �� - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:03:51 -0500] �� - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:08:37 -0500] NSMMReplicationPlugin - Need to create replication keep alive entry <cn=repl keep alive 2,dc=example,dc=com>
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:08:37 -0500] NSMMReplicationPlugin - add dn: cn=repl keep alive 2,dc=example,dc=com
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:08:38 -0500] NSMMReplicationPlugin - Purged state information from entry cn=repl keep alive 2,dc=example,dc=com up to CSN 56d47b16000000020000
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:08:58 -0500] ��@ - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:09:10 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:389): replay_update: Sending add operation (dn="cn=repl keep alive 2,dc=example,dc=com" csn=56ddb595000100020000)
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:09:11 -0500] NSMMReplicationPlugin - agmt="cn=M1_to_M2" (localhost:389): replay_update: Sending modify operation (dn="cn=repl keep alive 2,dc=example,dc=com" csn=56ddb596000100020000)
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:09:14 -0500] ��@ - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
/var/log/dirsrv/slapd-M2/errors:[07/Mar/2016:12:09:21 -0500] ��@ - urp_add (cn=repl keep alive 1,dc=example,dc=com): an entry with this uniqueid already exists.
[root@ipaqa64vml export]# 

The number 100 may not be accurate here, but may be entry creation is taking some time.

But as it is not breaking the replication and original reported issues is solved, I am marking this bug as VERIFIED.

Comment 14 errata-xmlrpc 2016-03-22 14:47:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0468.html