Bug 1259949

Summary: Fractional replication evaluates several times the same CSN
Product: Red Hat Enterprise Linux 7 Reporter: Noriko Hosoi <nhosoi>
Component: 389-ds-base Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 CC: amsharma, nkinder, rmeggins, rvdwees, sramling, tbordaz
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.4.0-18.el7 Doc Type: Bug Fix
Doc Text:
When multiple replica update vector (RUV) updates were skipped in fractional replication, RUV was not updated at the end of the session, and the next session restarted evaluating the same skipped updates. This update prevents the unnecessary replays.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-19 11:44:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Noriko Hosoi 2015-09-03 21:53:15 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/48266

In an MMR topology with fractional replication, the replication agreement (RA) evaluates each update to determine whether it should be skipped or sent.

During a replication session, if all the CSNs from the starting CSN onward are skipped, the next session starts at the same starting point and evaluates the same set of already skipped CSNs.
This does not prevent fractional replication from working, because the next update that is sent will advance the consumer RUV (the starting point).
But this behaviour should be improved.
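The re-evaluation described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the server code: CSNs are reduced to plain integers, `Update`, `run_session` and the `advance_on_skip` flag are hypothetical names, and the flag merely stands in for "move the starting point at the end of the session even when everything was skipped".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    csn: int          # stand-in for a real CSN; only the ordering matters here
    attrs: frozenset  # attributes modified by this update


def run_session(changelog, start_csn, filtered, advance_on_skip):
    """Evaluate every update newer than start_csn (changelog sorted by csn).

    Returns (next_start_csn, number_of_updates_evaluated).
    """
    evaluated = 0
    last_sent = start_csn
    last_seen = start_csn
    for upd in changelog:
        if upd.csn <= start_csn:
            continue
        evaluated += 1
        last_seen = upd.csn
        # An update touching only filtered attributes is skipped.
        if not upd.attrs <= filtered:
            last_sent = upd.csn  # a sent update advances the starting point
    # Hypothetical switch: without it, trailing skipped updates leave the
    # starting point where the last *sent* update put it.
    return (last_seen if advance_on_skip else last_sent), evaluated


filtered = frozenset({"memberOf"})
changelog = [Update(csn, filtered) for csn in range(1, 1001)]

# Starting point never moves, so each session re-evaluates all 1000 updates.
start, first = run_session(changelog, 0, filtered, advance_on_skip=False)
start, second = run_session(changelog, start, filtered, advance_on_skip=False)
print(first, second)
```

With `advance_on_skip=True` the second session in this model starts after the last skipped update and has nothing left to evaluate.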

However, it can create a problem if the set of skipped CSNs is large. In that case the supplier takes time to evaluate all of them, and during that time it prevents other suppliers from acquiring the consumer replica and sending their own updates.
The increasing backoff timer delay on the other suppliers gives them less and less chance to send their updates.

In some reported cases, replication was delayed by more than 1 hour.
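To see why a growing backoff delay hurts the waiting suppliers, here is a rough sketch. It assumes a simple doubling backoff capped at a maximum; the function name and the 3 s / 300 s bounds are illustrative assumptions, not values taken from this report, and the model has a single busy window rather than a supplier re-acquiring the consumer repeatedly.

```python
def first_successful_attempt(busy_until, backoff_min=3, backoff_max=300):
    """Return (time, attempts) of the first retry that finds the consumer free.

    busy_until: time in seconds until which another supplier holds the consumer.
    The delay between retries doubles after each failed attempt, up to
    backoff_max (an assumed policy, for illustration only).
    """
    now = 0
    delay = backoff_min
    attempts = 0
    while True:
        now += delay
        attempts += 1
        if now >= busy_until:
            return now, attempts
        delay = min(delay * 2, backoff_max)


# Consumer held for 100 s: retries land at 3, 9, 21, 45, 93 and 189 s.
print(first_successful_attempt(100))  # (189, 6)
```

In this model the waiting supplier gets in 89 seconds after the consumer was released, and the longer the first supplier holds the consumer, the wider the gap between retries becomes.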

Comment 2 thierry bordaz 2015-09-04 16:33:48 UTC
 Workaround:

Under extreme conditions (if all updates are skipped and the list of them is large), replication can appear to be broken. A workaround is to apply periodic dummy updates on all servers.
Those updates need to be ones that get replicated (that is, they must not update only filtered attributes). The period could be one hour or less, depending on the update rate.
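One possible way to produce such a dummy update is to generate a small LDIF modify record and feed it to ldapmodify on each server, for example from cron. The sketch below only builds the LDIF text. The entry `cn=keepalive,dc=example,dc=com`, the helper name `dummy_update_ldif`, and the choice of `description` as an attribute that is not filtered are all assumptions to adapt to the actual deployment; the entry must already exist.

```python
from datetime import datetime, timezone


def dummy_update_ldif(dn, attr="description", now=None):
    """Build an LDIF modify record that replaces `attr` with a timestamp.

    `attr` must be an attribute that the fractional agreement does NOT
    filter out, otherwise the update would be skipped like the others.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%SZ")
    return (
        f"dn: {dn}\n"
        "changetype: modify\n"
        f"replace: {attr}\n"
        f"{attr}: keepalive {stamp}\n"
        "-\n"
    )


# Hypothetical entry; pipe the output to ldapmodify against each server.
print(dummy_update_ldif("cn=keepalive,dc=example,dc=com"))
```

Because the value changes on every run, each invocation yields a real modification that is sent to the consumers.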

Comment 8 Amita Sharma 2015-09-22 11:46:32 UTC
Executed https://fedorahosted.org/389/attachment/ticket/48266/0001-Ticket-48266-test-case.2.patch

//etc/sysconfig/dirsrv-*
INFO:lib389:List from /root/.dirsrv
INFO:lib389:list instance {'RUN_DIR': '/var/run/dirsrv', 'SERVER_ID': 'master_2', 'hostname': 'localhost.localdomain', 'ldap-port': 42389, 'ldap-secureport': None, 'DS_ROOT': '', 'deployed-dir': '/', 'INST_DIR': '/usr/lib64/dirsrv/slapd-master_2', 'SERVER_DIR': '/usr/lib64/dirsrv', 'server-id': 'master_2', 'SERVERBIN_DIR': '/usr/sbin', 'root-dn': 'cn=Directory Manager', 'user-id': 'dirsrv', 'CONFIG_DIR': '/etc/dirsrv/slapd-master_2', 'PRODUCT_NAME': 'slapd', 'suffix': 'dc=example,dc=com'}

DEBUG:lib389:running: /usr/sbin/remove-ds.pl -i slapd-master_2 
Instance slapd-master_2 removed.


========================================== 3 passed in 191.87 seconds ==========================================
wrote pytestdebug information to /export/ds/dirsrvtests/tickets/pytestdebug.log

Hence VERIFIED.

Comment 9 Viktor Ashirov 2015-09-22 15:52:17 UTC
The bug fix introduced a new regression, so marking this bug as ASSIGNED per https://bugzilla.redhat.com/show_bug.cgi?id=1243970#c10 and #c11.

Comment 11 Sankar Ramalingam 2015-09-23 10:04:46 UTC
Tested with the latest build, 389-ds-base-1.3.4.0-18, and found no crash for the mmraccept tests. Hence, marking the bug as Verified.

The test execution for mmraccept shows 2/67 FAIL, which is the same as with the previous build, 1.3.4.0-15.


Subject: SUCCESS: Acceptance 389-ds-base-1.3.4.0-18.el7.x86_64 - 98% passed
TET Tag: none
Report : http://vm-idm-005.lab.eng.pnq.redhat.com/qa/archive/beaker/RHEL-7.2-20150917.0/x86_64/389-ds-base-1.3.4.0-18.el7.x86_64/Linux/20150923-090059.html
DS version: 389-ds-base-1.3.4.0-18.el7.x86_64
DStet revision: 6395


############## Result  for  backend test :   mmrepl mmraccept run
    mmrepl mmraccept run elapse time : 00:28:07
    mmrepl mmraccept run Tests FAIL      : 2% (2/67)
    mmrepl mmraccept run Tests PASS      : 97% (65/67)

Comment 12 errata-xmlrpc 2015-11-19 11:44:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2351.html