Bug 1259949 - Fractional replication evaluates several times the same CSN
Summary: Fractional replication evaluates several times the same CSN
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Noriko Hosoi
QA Contact: Viktor Ashirov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-03 21:53 UTC by Noriko Hosoi
Modified: 2020-09-13 21:32 UTC (History)

Fixed In Version: 389-ds-base-1.3.4.0-18.el7
Doc Type: Bug Fix
Doc Text:
When multiple replica update vector (RUV) updates were skipped in fractional replication, RUV was not updated at the end of the session, and the next session restarted evaluating the same skipped updates. This update prevents the unnecessary replays.
Clone Of:
Environment:
Last Closed: 2015-11-19 11:44:13 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Github 389ds 389-ds-base issues 1597 0 None closed Fractional replication evaluates several times the same CSN 2021-01-22 08:41:33 UTC
Github 389ds 389-ds-base issues 1615 0 None closed free target entry when add operation fails 2021-01-22 08:41:33 UTC
Red Hat Product Errata RHBA-2015:2351 0 normal SHIPPED_LIVE 389-ds-base bug fix and enhancement update 2015-11-19 10:28:44 UTC

Description Noriko Hosoi 2015-09-03 21:53:15 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/48266

In an MMR (multi-master replication) topology with fractional replication, the replication agreement (RA) evaluates each update to determine whether it should be skipped or sent.

During a replication session, if every CSN from the starting CSN onward is skipped, the next session starts at the same point and evaluates the same set of already-skipped CSNs again.
This does not prevent fractional replication from working, because the next update that is actually sent advances the consumer RUV (the starting point).
But this behaviour should be improved.
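
To make the repeated evaluation concrete, here is a minimal toy model in Python. It is not the 389-ds-base code: the `Update` class, the `run_session` function, integer CSNs, and the `advance_on_skip` flag are all hypothetical names invented for illustration. The flag only approximates the idea of remembering skipped CSNs at the end of a session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    csn: int          # stand-in for a real CSN; only the ordering matters
    attrs: frozenset  # attribute names modified by this update


def run_session(changelog, consumer_csn, excluded, advance_on_skip):
    """Simulate one session; return (evaluated, sent, new_consumer_csn)."""
    start = consumer_csn
    evaluated = sent = 0
    last_evaluated = start
    for upd in changelog:  # assumed to be sorted by CSN
        if upd.csn <= start:
            continue  # the consumer already covers this update
        evaluated += 1
        last_evaluated = upd.csn
        if upd.attrs <= excluded:
            continue  # skipped: every modified attribute is filtered
        sent += 1
        consumer_csn = upd.csn  # a sent update advances the consumer
    if advance_on_skip:
        # Hypothetical improvement: also remember the skipped CSNs.
        consumer_csn = max(consumer_csn, last_evaluated)
    return evaluated, sent, consumer_csn


# 1000 updates that touch only a filtered attribute.
EXCLUDED = frozenset({"telephoneNumber"})
CHANGELOG = [Update(csn, EXCLUDED) for csn in range(1, 1001)]
```

In this model, `run_session(CHANGELOG, 0, EXCLUDED, False)` evaluates all 1000 updates and leaves the starting point at 0, so every later session repeats the same work. With `advance_on_skip=True` the starting point moves to 1000 and the next session has nothing left to evaluate.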

However, it can cause problems if the set of skipped CSNs is large. In that case the supplier takes a long time to evaluate all of them, and during that time it prevents other suppliers from acquiring the consumer replica and sending their own updates.
The increasing delay of the other suppliers' backoff timers gives them less and less chance to send their updates.
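
The effect of a growing backoff delay can be sketched as follows. This assumes a simple doubling backoff with a cap. The function name and the 3-second minimum and 300-second maximum are illustrative assumptions, not values taken from this report.

```python
def backoff_delays(attempts, minimum=3, maximum=300):
    """Return the wait (in seconds) before each retry, doubling up to a cap.

    The minimum and maximum are assumed values used only for illustration.
    """
    delays = []
    delay = minimum
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, maximum)
    return delays


# Eight consecutive failed attempts to acquire the consumer replica.
print(backoff_delays(8))  # [3, 6, 12, 24, 48, 96, 192, 300]
```

Under these assumptions, a supplier that fails eight times in a row has waited 681 seconds in total and then retries only once every 300 seconds, so it can easily miss a short window in which the consumer is free.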

In some reported cases, replication was delayed by more than 1 hour.

Comment 2 thierry bordaz 2015-09-04 16:33:48 UTC
 Workaround:

Under extreme conditions (if all updates are skipped and the list of them is large), replication can appear to be broken. A workaround is to apply periodic dummy updates on all servers.
These updates must be ones that are actually replicated (that is, they must not modify only filtered attributes). The periodicity could be once per hour or less, depending on the update rate.
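
One possible way to generate such a dummy update is sketched below. The helper name `keepalive_ldif`, the entry DN, and the choice of the `description` attribute are assumptions made for illustration. The attribute must be one that is not excluded by the fractional replication configuration, and the entry must already exist on each server.

```python
from datetime import datetime, timezone


def keepalive_ldif(dn, attr="description", now=None):
    """Build an LDIF modify record that replaces `attr` with a timestamp.

    `dn` and `attr` are hypothetical; pick an existing entry and an
    attribute that is actually replicated (not filtered).
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%SZ")
    return (
        f"dn: {dn}\n"
        "changetype: modify\n"
        f"replace: {attr}\n"
        f"{attr}: keepalive {stamp}\n"
    )


# Hypothetical entry under the dc=example,dc=com suffix.
print(keepalive_ldif("cn=repl keepalive,dc=example,dc=com"))
```

The resulting record could be applied to each server with a tool such as ldapmodify, run from a scheduler at the chosen period.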

Comment 8 Amita Sharma 2015-09-22 11:46:32 UTC
Executed https://fedorahosted.org/389/attachment/ticket/48266/0001-Ticket-48266-test-case.2.patch

//etc/sysconfig/dirsrv-*
INFO:lib389:List from /root/.dirsrv
INFO:lib389:list instance {'RUN_DIR': '/var/run/dirsrv', 'SERVER_ID': 'master_2', 'hostname': 'localhost.localdomain', 'ldap-port': 42389, 'ldap-secureport': None, 'DS_ROOT': '', 'deployed-dir': '/', 'INST_DIR': '/usr/lib64/dirsrv/slapd-master_2', 'SERVER_DIR': '/usr/lib64/dirsrv', 'server-id': 'master_2', 'SERVERBIN_DIR': '/usr/sbin', 'root-dn': 'cn=Directory Manager', 'user-id': 'dirsrv', 'CONFIG_DIR': '/etc/dirsrv/slapd-master_2', 'PRODUCT_NAME': 'slapd', 'suffix': 'dc=example,dc=com'}

DEBUG:lib389:running: /usr/sbin/remove-ds.pl -i slapd-master_2 
Instance slapd-master_2 removed.


========================================== 3 passed in 191.87 seconds ==========================================
wrote pytestdebug information to /export/ds/dirsrvtests/tickets/pytestdebug.log

Hence VERIFIED.

Comment 9 Viktor Ashirov 2015-09-22 15:52:17 UTC
The bug fix introduced a new regression, so marking this bug as ASSIGNED per https://bugzilla.redhat.com/show_bug.cgi?id=1243970#c10 and #c11.

Comment 11 Sankar Ramalingam 2015-09-23 10:04:46 UTC
Tested with the latest build, 389-ds-base-1.3.4.0-18, and found no crash in the mmraccept tests. Hence, marking the bug as Verified.

The test execution for mmraccept shows 2/67 FAIL, which is the same as the previous build, 1.3.4.0-15.


Subject: SUCCESS: Acceptance 389-ds-base-1.3.4.0-18.el7.x86_64 - 98% passed
TET Tag: none
Report : http://vm-idm-005.lab.eng.pnq.redhat.com/qa/archive/beaker/RHEL-7.2-20150917.0/x86_64/389-ds-base-1.3.4.0-18.el7.x86_64/Linux/20150923-090059.html
DS version: 389-ds-base-1.3.4.0-18.el7.x86_64
DStet revision: 6395


############## Result  for  backend test :   mmrepl mmraccept run
    mmrepl mmraccept run elapse time : 00:28:07
    mmrepl mmraccept run Tests FAIL      : 2% (2/67)
    mmrepl mmraccept run Tests PASS      : 97% (65/67)

Comment 12 errata-xmlrpc 2015-11-19 11:44:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2351.html

