1259949 – Fractional replication evaluates several times the same CSN

Bug 1259949 - Fractional replication evaluates several times the same CSN

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	389-ds-base
Sub Component:
Version:	7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Noriko Hosoi
QA Contact:	Viktor Ashirov
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-09-03 21:53 UTC by Noriko Hosoi
Modified:	2020-09-13 21:32 UTC
CC List:	6 users
Fixed In Version:	389-ds-base-1.3.4.0-18.el7
Doc Type:	Bug Fix
Doc Text:	When multiple replica update vector (RUV) updates were skipped in fractional replication, RUV was not updated at the end of the session, and the next session restarted evaluating the same skipped updates. This update prevents the unnecessary replays.
Clone Of:
Environment:
Last Closed:	2015-11-19 11:44:13 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	389ds 389-ds-base issues 1597	None	closed	Fractional replication evaluates several times the same CSN	2021-01-22 08:41:33 UTC
Github	389ds 389-ds-base issues 1615	None	closed	free target entry when add operation fails	2021-01-22 08:41:33 UTC
Red Hat Product Errata	RHBA-2015:2351	normal	SHIPPED_LIVE	389-ds-base bug fix and enhancement update	2015-11-19 10:28:44 UTC

Description Noriko Hosoi 2015-09-03 21:53:15 UTC

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/48266

In an MMR topology with fractional replication, the replication agreement (RA) evaluates each update to determine whether it should be skipped or sent.

During a replication session, if every CSN from the starting CSN onward is skipped, the next session starts at the same point and re-evaluates the same set of already skipped CSNs.
This does not prevent fractional replication from working, because the next update that is actually sent advances the consumer RUV (the starting point).
But this behaviour should be improved.

However, it can cause problems if the set of skipped CSNs is large. In that case the supplier takes a long time to evaluate all of them, and during that time it prevents other suppliers from acquiring the consumer replica and sending their own updates.
The increasing delay of the other suppliers' backoff timers gives them less and less chance to send their updates.

In some reported cases, replication was delayed by more than 1h.
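
The re-evaluation described above can be illustrated with a small simulation. This is only a sketch, not the server's code: CSNs are reduced to plain integers, `run_session` and the `advance_on_skip` flag are hypothetical names, and the flag merely stands in for "the starting point also moves past skipped updates", without claiming that this is how the actual fix works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    csn: int         # simplified CSN: a plain increasing integer (assumption)
    attribute: str   # attribute touched by the update


def run_session(changelog, start_csn, filtered_attrs, advance_on_skip):
    """Simulate one session; return (next starting CSN, updates evaluated)."""
    evaluated = 0
    last_sent = start_csn
    last_evaluated = start_csn
    for update in changelog:
        if update.csn <= start_csn:
            continue  # already covered by the consumer's starting point
        evaluated += 1
        last_evaluated = update.csn
        if update.attribute not in filtered_attrs:
            last_sent = update.csn  # only a sent update moves the consumer
    next_start = last_evaluated if advance_on_skip else last_sent
    return next_start, evaluated


# Five updates, all touching a filtered attribute, so all are skipped.
changelog = [Update(csn, "filteredAttr") for csn in range(1, 6)]
filtered = {"filteredAttr"}

# Reported behaviour: the second session evaluates the same five CSNs again.
start, first = run_session(changelog, 0, filtered, advance_on_skip=False)
_, second = run_session(changelog, start, filtered, advance_on_skip=False)

# Hypothetical improved behaviour: the second session has nothing to evaluate.
start, first_fixed = run_session(changelog, 0, filtered, advance_on_skip=True)
_, second_fixed = run_session(changelog, start, filtered, advance_on_skip=True)

print(first, second, first_fixed, second_fixed)
```

With a large changelog, the cost of each repeated session grows with the number of skipped CSNs, which is the time during which the consumer stays acquired.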

Comment 2 thierry bordaz 2015-09-04 16:33:48 UTC

 Workaround:

Under extreme conditions (if all updates are skipped and the list of them is large), replication can appear to be broken. A workaround is to apply periodic dummy updates on all servers.
These must be updates that are actually replicated (that is, they must not modify filtered attributes). Periodicity could be 1/h or less, depending on the update rate.
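
A dummy update of this kind could be generated as an LDIF modify record and applied on each server by a scheduler. The sketch below only builds the LDIF text. The entry `cn=keepalive,dc=example,dc=com`, the choice of `description` and the function name `dummy_update_ldif` are assumptions for illustration; the attribute must be one that is not filtered by the fractional replication configuration.

```python
from datetime import datetime, timezone


def dummy_update_ldif(entry_dn, attribute, now=None):
    """Build an LDIF modify record that rewrites one replicated attribute."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%SZ")
    # The changing timestamp makes every run a distinct update.
    return (
        f"dn: {entry_dn}\n"
        "changetype: modify\n"
        f"replace: {attribute}\n"
        f"{attribute}: keepalive {stamp}\n"
    )


# Hypothetical entry; it must already exist on the server.
print(dummy_update_ldif("cn=keepalive,dc=example,dc=com", "description"))
```

The output can be passed to an LDAP client such as ldapmodify on every server at the chosen interval.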

Comment 8 Amita Sharma 2015-09-22 11:46:32 UTC

Executed https://fedorahosted.org/389/attachment/ticket/48266/0001-Ticket-48266-test-case.2.patch

//etc/sysconfig/dirsrv-*
INFO:lib389:List from /root/.dirsrv
INFO:lib389:list instance {'RUN_DIR': '/var/run/dirsrv', 'SERVER_ID': 'master_2', 'hostname': 'localhost.localdomain', 'ldap-port': 42389, 'ldap-secureport': None, 'DS_ROOT': '', 'deployed-dir': '/', 'INST_DIR': '/usr/lib64/dirsrv/slapd-master_2', 'SERVER_DIR': '/usr/lib64/dirsrv', 'server-id': 'master_2', 'SERVERBIN_DIR': '/usr/sbin', 'root-dn': 'cn=Directory Manager', 'user-id': 'dirsrv', 'CONFIG_DIR': '/etc/dirsrv/slapd-master_2', 'PRODUCT_NAME': 'slapd', 'suffix': 'dc=example,dc=com'}

DEBUG:lib389:running: /usr/sbin/remove-ds.pl -i slapd-master_2 
Instance slapd-master_2 removed.


========================================== 3 passed in 191.87 seconds ==========================================
wrote pytestdebug information to /export/ds/dirsrvtests/tickets/pytestdebug.log

Hence VERIFIED.

Comment 9 Viktor Ashirov 2015-09-22 15:52:17 UTC

The bug fix introduced a new regression, so marking this bug as ASSIGNED per https://bugzilla.redhat.com/show_bug.cgi?id=1243970#c10 and #c11.

Comment 11 Sankar Ramalingam 2015-09-23 10:04:46 UTC

Tested with the latest build, 389-ds-base-1.3.4.0-18, and found no crash in the mmraccept tests. Hence, marking the bug as Verified.

The test execution for mmraccept shows 2/67 FAIL, which is the same as the previous build, 1.3.4.0-15.


Subject: SUCCESS: Acceptance 389-ds-base-1.3.4.0-18.el7.x86_64 - 98% passed
TET Tag: none
Report : http://vm-idm-005.lab.eng.pnq.redhat.com/qa/archive/beaker/RHEL-7.2-20150917.0/x86_64/389-ds-base-1.3.4.0-18.el7.x86_64/Linux/20150923-090059.html
DS version: 389-ds-base-1.3.4.0-18.el7.x86_64
DStet revision: 6395


############## Result  for  backend test :   mmrepl mmraccept run
    mmrepl mmraccept run elapse time : 00:28:07
    mmrepl mmraccept run Tests FAIL      : 2% (2/67)
    mmrepl mmraccept run Tests PASS      : 97% (65/67)

Comment 12 errata-xmlrpc 2015-11-19 11:44:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2351.html
