Bug 947486

Summary: [RFE] Introducing a user visible configuration variable for controlling replication retry time
Product: Red Hat Enterprise Linux 7 Reporter: Nathan Kinder <nkinder>
Component: 389-ds-base Assignee: Rich Megginson <rmeggins>
Status: CLOSED CURRENTRELEASE QA Contact: IDM QE LIST <seceng-idm-qe-list>
Severity: unspecified Docs Contact:
Priority: medium    
Version: 7.0 CC: jgalipea, mkubik, mreynolds, nhosoi, sramling
Target Milestone: rc Keywords: FutureFeature
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.1.2-1.el7 Doc Type: Enhancement
Doc Text:
Cause: The default timeout of 5 minutes is too long for some environments.
Consequence: The replication backoff timer goes to sleep for too long, possibly delaying replication.
Change: Two new config attributes were created so the minimum and maximum values used by the backoff timer can be customized.
Result: Replication can recover faster after a backoff state has been entered.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-13 11:06:55 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nathan Kinder 2013-04-02 14:47:31 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/525

At present, if one of the masters goes down in a multi-master replication scenario, the other master keeps trying to replicate. After the first failure it retries after 3 seconds; on each consecutive failure it doubles the retry time, up to a maximum of 300 seconds, and from then on it retries once every 300 seconds.

It would be good if the maximum and minimum retry intervals were exposed as configuration variables, like the replication timeout.
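
The retry behaviour described above can be sketched as a small model. This is only an illustration of the description (start at the minimum, double on each failure, cap at the maximum), not the server's actual code; the function names are hypothetical, and the values the server logs (see the comments below) do not match this idealised sequence exactly.

```python
from itertools import islice
from typing import Iterator, List


def backoff_intervals(minimum: int = 3, maximum: int = 300) -> Iterator[int]:
    """Yield retry intervals in seconds: start at `minimum`,
    double after every failure, and never exceed `maximum`."""
    interval = minimum
    while True:
        yield min(interval, maximum)
        interval = min(interval * 2, maximum)


def first_intervals(count: int, minimum: int = 3, maximum: int = 300) -> List[int]:
    """Return the first `count` retry intervals as a list."""
    return list(islice(backoff_intervals(minimum, maximum), count))


if __name__ == "__main__":
    # Defaults from the description: 3 seconds minimum, 300 seconds maximum.
    print(first_intervals(10))
```

With the defaults, the model reaches the 300-second ceiling on the eighth retry, which is why a lower maximum shortens recovery time once the peer comes back.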

Comment 1 mreynolds 2013-04-03 15:03:18 UTC
How to test new feature:

[1]  Set up replication with a single master and a consumer
[2]  "replication" error logging must be enabled
[3]  Set the following attributes on the "master" replica:

  cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config

  nsds5ReplicaBackoffMin: <value in seconds>  default is 3 seconds
  nsds5ReplicaBackoffMax: <value in seconds>  default is 300 seconds

[4]  Stop the consumer
[5]  Make an update on the master
[6]  Monitor the master's error log for messages like:

Replication session backing off for 49 seconds

In my testing I set nsds5ReplicaBackoffMax to 50, but because of the way the backoff timer works it might not quite reach 50 every time.  As you can see, mine hit 49 seconds, but it never goes above 50.
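
For step [3], one possible way to prepare the change is to generate an LDIF modify record for the replica entry. The sketch below only builds the LDIF text; the helper name is hypothetical, and applying the result with a tool such as ldapmodify is an assumption, since the comment does not say how the attributes were set.

```python
def build_backoff_ldif(replica_dn: str, backoff_min: int, backoff_max: int) -> str:
    """Build an LDIF modify record that replaces both backoff attributes."""
    lines = [
        f"dn: {replica_dn}",
        "changetype: modify",
        "replace: nsds5ReplicaBackoffMin",
        f"nsds5ReplicaBackoffMin: {backoff_min}",
        "-",
        "replace: nsds5ReplicaBackoffMax",
        f"nsds5ReplicaBackoffMax: {backoff_max}",
    ]
    return "\n".join(lines) + "\n"


# Replica entry DN as given in step [3]; a raw string keeps the backslashes intact.
REPLICA_DN = r"cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config"

if __name__ == "__main__":
    # Values used in the test above: default minimum, maximum lowered to 50.
    print(build_backoff_ldif(REPLICA_DN, 3, 50))
```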

Comment 2 Rich Megginson 2013-10-01 23:26:31 UTC
moving all ON_QA bugs to MODIFIED in order to add them to the errata (can't add bugs in the ON_QA state to an errata).  When the errata is created, the bugs should be automatically moved back to ON_QA.

Comment 4 Milan Kubík 2014-02-13 16:31:54 UTC
The feature seems to be working with default values.

[13/Feb/2014:17:07:22 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 5 seconds
[13/Feb/2014:17:07:29 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 11 seconds
[13/Feb/2014:17:07:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 24 seconds
[13/Feb/2014:17:08:04 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 47 seconds
[13/Feb/2014:17:08:52 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 95 seconds
[13/Feb/2014:17:10:28 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 191 seconds
[13/Feb/2014:17:13:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 300 seconds
[13/Feb/2014:17:18:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 299 seconds
[13/Feb/2014:17:23:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 299 seconds
[13/Feb/2014:17:28:41 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 299 seconds
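
A check like the one done by eye above could be scripted. The following is a minimal sketch that pulls the backoff values out of error-log text and reports any that exceed a given maximum; the function names are hypothetical and the sample is a subset of the log lines quoted in this comment.

```python
import re
from typing import List

# Matches the message format shown in the error log excerpts.
BACKOFF_RE = re.compile(r"Replication session backing off for (\d+) seconds")


def backoff_seconds(log_text: str) -> List[int]:
    """Return every logged backoff value, in order of appearance."""
    return [int(match.group(1)) for match in BACKOFF_RE.finditer(log_text)]


def values_over_max(log_text: str, maximum: int) -> List[int]:
    """Return the logged backoff values that are greater than `maximum`."""
    return [seconds for seconds in backoff_seconds(log_text) if seconds > maximum]


SAMPLE = """\
[13/Feb/2014:17:07:22 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 5 seconds
[13/Feb/2014:17:07:29 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 11 seconds
[13/Feb/2014:17:13:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 300 seconds
[13/Feb/2014:17:18:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 299 seconds
"""

if __name__ == "__main__":
    print(backoff_seconds(SAMPLE))
    print(values_over_max(SAMPLE, 300))
```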

Comment 5 Sankar Ramalingam 2014-02-18 15:02:50 UTC
(In reply to Milan Kubík from comment #4)
> The feature seems to be working with default values.
> 
> [13/Feb/2014:17:07:22 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 5 seconds
> [13/Feb/2014:17:07:29 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 11 seconds
> [13/Feb/2014:17:07:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 24 seconds
> [13/Feb/2014:17:08:04 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 47 seconds
> [13/Feb/2014:17:08:52 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 95 seconds
> [13/Feb/2014:17:10:28 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 191 seconds
> [13/Feb/2014:17:13:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 300 seconds
So this means the retry happens after 5 minutes. As per the Doc Text, it is expected to wait for a maximum of 2 minutes, not 5.
> [13/Feb/2014:17:18:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 299 seconds
> [13/Feb/2014:17:23:40 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 299 seconds
> [13/Feb/2014:17:28:41 +0100] NSMMReplicationPlugin - agmt="cn=RA1"
> (dstet-rhel7:30004): Replication session backing off for 299 seconds

Comment 6 Milan Kubík 2014-02-19 13:12:24 UTC
Manually tried these combinations :

> nsds5ReplicaBackoffMin: 0
> nsds5ReplicaBackoffMax: 15
NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 5 seconds
NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 12 seconds
NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 14 seconds
NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 14 seconds

> nsds5ReplicaBackoffMin: 15
> nsds5ReplicaBackoffMax: 15
NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 14 seconds
NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 14 seconds
NSMMReplicationPlugin - agmt="cn=RA1" (dstet-rhel7:30004): Replication session backing off for 14 seconds

> nsds5ReplicaBackoffMin: 15
> nsds5ReplicaBackoffMax: 10
default values used

> nsds5ReplicaBackoffMin: 1
> nsds5ReplicaBackoffMax: 0
default values used

These work as expected. Marking this bugzilla as verified. Any bugs will be filed as separate bugzillas.

Verified with 389-ds-base-1.3.1.6-18.el7.
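
The fallback seen in the last two combinations can be modelled roughly as follows. This is a sketch under the assumption that the only rejected case is a minimum greater than the maximum, which covers both (15, 10) and (1, 0); the server's actual validation rule is not stated in this bug, and the function name is hypothetical.

```python
from typing import Tuple

# Defaults documented in comment 1.
DEFAULT_BACKOFF_MIN = 3
DEFAULT_BACKOFF_MAX = 300


def effective_backoff(backoff_min: int, backoff_max: int) -> Tuple[int, int]:
    """Return the (min, max) pair this model would use.

    Assumption: a minimum larger than the maximum is treated as invalid
    and both values fall back to the defaults.
    """
    if backoff_min > backoff_max:
        return DEFAULT_BACKOFF_MIN, DEFAULT_BACKOFF_MAX
    return backoff_min, backoff_max


if __name__ == "__main__":
    for pair in [(0, 15), (15, 15), (15, 10), (1, 0)]:
        print(pair, "->", effective_backoff(*pair))
```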

Comment 7 Ludek Smid 2014-06-13 11:06:55 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.