Bug 1309963
| Summary: | keep alive entries can break replication | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jan Kurik <jkurik> |
| Component: | 389-ds-base | Assignee: | Noriko Hosoi <nhosoi> |
| Status: | CLOSED ERRATA | QA Contact: | Viktor Ashirov <vashirov> |
| Severity: | urgent | Priority: | urgent |
| Version: | 7.3 | CC: | ekeck, mkolaja, msauton, nhosoi, nkinder, rmeggins, sramling, tbordaz |
| Target Milestone: | rc | Keywords: | ZStream |
| Hardware: | All | OS: | Linux |
| Fixed In Version: | 389-ds-base-1.3.4.0-27.el7 | Doc Type: | Bug Fix |
| Clone Of: | 1307151 | Bug Depends On: | 1307151 |
| Last Closed: | 2016-03-31 22:04:39 UTC | | |
| Doc Text: | Previously, a keep alive entry was being created at too many opportunities during replication, potentially causing a race condition when adding the entry to the replica changelog and resulting in operations being dropped from the replication. With this update, unnecessary keep alive entry creation has been eliminated, and missing replication no longer occurs. | | |
Description
Jan Kurik
2016-02-19 06:05:08 UTC
Hi Noriko, to verify this bug I understand we need to create replication with a "cn=repl keep alive" entry under the suffix. However, it is not clear enough what the setup should be, so I request you to provide the steps to reproduce this.

Hi Thierry, you put a reproducer in this comment, which requires IPA: https://fedorahosted.org/389/ticket/48445#comment:1 Could there be easy steps that duplicate this scenario? It seems a total update triggered this bug. Is it good enough to just repeat the total updates and check that no data loss occurs, or is it safer to run the IPA install and check?

Hi Noriko,
The bug silently cleared (from the changelog) the ADD of the replica keep alive entry.
This ADD occurred (on the consumer side) at the end of the total init.
It was not detected until the replica needed to update the keep alive entry; those updates hit err=32 when they were replicated.
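For reference, a rough sketch of what such a keep alive entry looks like, using the replica ID and suffix that appear in the verification steps below; the exact attribute set is an assumption and can vary by 389-ds-base version:

```
# Hypothetical example of a replica keep alive entry (attribute set assumed,
# not taken verbatim from this bug's environment).
dn: cn=repl keep alive 2211,dc=passsync,dc=com
objectClass: top
objectClass: ldapsubentry
objectClass: extensibleObject
cn: repl keep alive 2211
# The server periodically updates a timestamp-style attribute on this entry;
# it is that MOD (and the initial ADD) that must replicate without err=32.
keepalivetimestamp: 20160301120000Z
```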
So the reproducer is:
- Create a Master and a Replica.
- Configure the Master with fractional replication so that it skips sending updates to 'description' (see the sketch after this list).
- Do a total init of the Replica
  (with the bug fix, this does not create the keep alive entry).
- On the Master, loop doing more than 100 updates of the 'description' attribute (skipped by the replication agreement).
- Check that the keep alive entry (ADD) is created and sent to the Replica.
- Check that the update of the keep alive entry is replicated.
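For the fractional-replication and total-init steps above, a minimal sketch of the corresponding commands, assuming a replication agreement named cn=meToReplica under the mapping tree entry for the suffix (the agreement name, host, port, and password here are illustrative, not taken from this bug):

```
# Exclude 'description' from the replication agreement (fractional replication).
ldapmodify -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 << 'EOF'
dn: cn=meToReplica,cn=replica,cn="dc=passsync,dc=com",cn=mapping tree,cn=config
changetype: modify
replace: nsDS5ReplicatedAttributeList
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE description
EOF

# Trigger a total init of the consumer over the same agreement.
ldapmodify -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 << 'EOF'
dn: cn=meToReplica,cn=replica,cn="dc=passsync,dc=com",cn=mapping tree,cn=config
changetype: modify
replace: nsDS5BeginReplicaRefresh
nsDS5BeginReplicaRefresh: start
EOF
```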
1. Created a 6-master replication setup.
2. Created a few users and groups on M1.
3. Total init completed from M1. No fractional replication configured.
4. Configured fractional replication for the telephoneNumber attribute.
5. Ran a total init to check whether telephoneNumber is removed from the masters and consumer other than M1.
6. It's successful: the telephoneNumber attribute is removed on all masters but M1.
7. Ran about 100 modify operations for the telephoneNumber attribute on M1:

```
[root@vm-idm-004 ~]# no=1 ; while [ $no -lt 101 ]; do ldapmodify -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 << EOF > /dev/null
dn: uid=users1189users7,ou=People,dc=passsync,dc=com
replace: telephoneNumber
telephoneNumber: 999999$no
EOF
sleep 0.3; no=`expr $no + 1`; done
```

8. Checked whether telephoneNumber attribute values exist on the other masters:

```
[root@vm-idm-004 ~]# for PORT in `echo "1189 1289 1389 1489 2189 2289"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "uid=users1189users7,ou=People,dc=passsync,dc=com" |grep -i tele ; RC=$? ; if [ $RC -eq 0 ]; then echo "telePhoneNumber fractional attribute only on PORT-$PORT"; fi ; done
telephoneNumber: 999999100
telePhoneNumber fractional attribute only on PORT-1189
```

9. Checked the "cn=repl keep alive" entry on all servers:

```
[root@vm-idm-004 ~]# grep -li "cn=repl keep" /var/log/dirsrv/slapd-*/errors
/var/log/dirsrv/slapd-M1/errors
/var/log/dirsrv/slapd-M4/errors
/var/log/dirsrv/slapd-M5/errors
```

I will redo the testing with the fractional replication set up from the start and update the bug with my findings.

This time on a fresh setup:

1. Created a 6-master replication setup with fractional replication for the description and telephoneNumber attributes.
2. Created a few users and groups on M1.
3. Total init completed from M1.
4. Checked whether "cn=repl keep alive $replica_id,dc=passsync,dc=com" is created.
5. "cn=repl keep alive 2211,dc=passsync,dc=com" is created on all masters:

```
for PORT in `echo "1189 1289 1389 1489 2189 2289 3189 3289"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "cn=repl keep alive 2211,dc=passsync,dc=com" -s base |grep -i "dn: cn=repl keep alive 2211" ; RC=$? ; if [ $RC -eq 0 ]; then echo "cn=repl keep alive entry created on PORT-$PORT"; fi ; done
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-1189
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-1289
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-1389
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-1489
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-2189
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-2289
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-3189
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-3289
```

Note: I haven't yet run ldapmodify for the fractional attributes on M1. As per your comment, the entry "cn=repl keep alive 2211,dc=passsync,dc=com" should be created only after the ldapmodify operations. I would appreciate it if you could give us more instructions to proceed with the bug verification. Thanks.

Hi Sankar, the keep alive entry is created in two cases:
* If a supplier does a total init of a consumer, it first checks that its own keep alive entry exists.
* If any instance needs to update its keep alive entry (for example, when it has skipped more than 100 updates), it first creates the keep alive entry if it does not already exist.

In your case, the keep alive entry of M1 is created (although no updates were done on M1) because M1 did a total update of the other replicas.

thanks, Thierry

Thanks Thierry for the answers :). So, as per your comment #10, the feature seems to be working. Anyway, I modified fractional replication attributes more than 100 times and checked the "cn=repl keep alive" entry and the replica states. Everything seems to be working fine. Hence, marking the bug as Verified.

```
[root@vm-idm-004 ~]# rpm -qa |grep -i 389-ds
389-ds-base-debuginfo-1.3.4.0-27.el7_2.x86_64
389-ds-base-libs-1.3.4.0-27.el7_2.x86_64
389-ds-base-devel-1.3.4.0-27.el7_2.x86_64
389-ds-base-1.3.4.0-27.el7_2.x86_64
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0550.html