Bug 1171358 - [RFE] - Make ReplicaWaitForAsyncResults configurable
Summary: [RFE] - Make ReplicaWaitForAsyncResults configurable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Noriko Hosoi
QA Contact: Viktor Ashirov
URL:
Whiteboard:
Depends On:
Blocks: 1258997
Reported: 2014-12-06 00:08 UTC by Noriko Hosoi
Modified: 2020-09-13 21:16 UTC
CC List: 4 users

Fixed In Version: 389-ds-base-1.3.4.0-1.el7
Doc Type: Enhancement
Doc Text:
To improve replication throughput, the nsDS5ReplicaWaitForAsyncResults attribute has been added to the nsDS5ReplicationAgreement class. The attribute defines how long a supplier waits for the response from a consumer. Its value is specified in milliseconds; the default is 1 second.
Clone Of:
Environment:
Last Closed: 2015-11-19 11:42:16 UTC
Target Upstream Version:


Links
- Github 389ds 389-ds-base issue 1288 (last updated 2020-09-13 21:16:25 UTC)
- Red Hat Product Errata RHBA-2015:2351, "389-ds-base bug fix and enhancement update" (priority: normal, status: SHIPPED_LIVE, last updated 2015-11-19 10:28:44 UTC)

Description Noriko Hosoi 2014-12-06 00:08:33 UTC
This bug was created as a clone of the upstream ticket:
https://fedorahosted.org/389/ticket/47957

Currently, the supplier sleeps for 1 second if it finds that the response from the consumer is not ready.
{{{
plugins/replication/repl5_inc_protocol.c
@@ -485,7 +486,7 @@ repl5_inc_waitfor_async_results(result_data *rd)
        }
        PR_Unlock(rd->lock);
        /* If not then sleep a bit */
        DS_Sleep(PR_SecondsToInterval(1));
}}}
The attached patch makes it configurable.
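The idea of replacing the hard-coded `DS_Sleep(PR_SecondsToInterval(1))` with a configurable pause can be sketched as a self-contained polling loop. This is not the 389-ds-base patch: `result_state`, `sleep_ms`, `ack_one_message`, and `wait_for_async_results` are hypothetical names, POSIX `nanosleep` stands in for NSPR's `DS_Sleep`, and the consumer is simulated by a callback that acknowledges one message per check (in the server, the counters are updated asynchronously as consumer responses arrive, and the loop only holds `rd->lock` while checking them).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical, simplified stand-in for the supplier's result_data:
   only the two message-ID counters that the wait loop compares. */
typedef struct {
    int last_message_id_sent;
    int last_message_id_received;
} result_state;

/* Sleep for wait_ms milliseconds (POSIX nanosleep, not NSPR's DS_Sleep). */
static void sleep_ms(long wait_ms)
{
    struct timespec ts;
    ts.tv_sec = wait_ms / 1000;
    ts.tv_nsec = (wait_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Simulated consumer for this sketch: acknowledges one message per call. */
static void ack_one_message(result_state *rs)
{
    if (rs->last_message_id_received < rs->last_message_id_sent)
        rs->last_message_id_received++;
}

/* Poll until every sent message has been acknowledged, pausing wait_ms
   (instead of a fixed 1 second) between checks. Returns the number of
   pauses taken, or -1 if max_polls checks pass without catching up. */
static int wait_for_async_results(result_state *rs, long wait_ms, int max_polls,
                                  void (*poll_consumer)(result_state *))
{
    assert(rs != NULL && poll_consumer != NULL && wait_ms > 0);
    for (int pauses = 0; pauses < max_polls; pauses++) {
        poll_consumer(rs);
        if (rs->last_message_id_received >= rs->last_message_id_sent)
            return pauses;     /* consumer has caught up */
        sleep_ms(wait_ms);     /* configurable wait, in milliseconds */
    }
    return -1;
}
```

With `wait_ms` set to 100, three outstanding messages acknowledged one per check take two pauses, roughly 200 ms, where the old fixed 1-second sleep would have taken roughly 2 seconds.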

Comment 2 Viktor Ashirov 2015-08-28 09:11:27 UTC
Hi Noriko,

is there a design doc for this new parameter?
Also shouldn't this bug be an RFE? 

Thanks!

Comment 3 Noriko Hosoi 2015-09-01 17:23:38 UTC
(In reply to Viktor Ashirov from comment #2)
> is there a design doc for this new parameter?
> Also shouldn't this bug be an RFE? 

Thanks, Viktor.  You are right.  This is an RFE.

The change is quite small...  The time the supplier waits for the consumer to be ready used to be hard-coded to 1 second, which could be too long in advanced network environments.  To improve replication throughput, a new config parameter, nsDS5ReplicaWaitForAsyncResults (in milliseconds), is introduced.  The default value is 1 second.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Description: Introducing a config attr nsDS5ReplicaWaitForAsyncResults
to the agreement entry.
  dn: cn=<AGREEMENT>,cn=replica,cn="<SUFFIX>",cn=mapping tree,cn=config
  nsDS5ReplicaWaitForAsyncResults: <integer in milliseconds>

Prior to this patch, the supplier sleeps for 1 second if it finds that the
response from the consumer is not ready.  1 second could be too long if
higher replication throughput is required.

This patch makes the waiting time configurable and changes the default
to 100 milliseconds.  If the attribute nsDS5ReplicaWaitForAsyncResults
does not exist or its value is 0, the default value is used.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
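The fallback rule in the patch description (attribute absent or 0 means "use the default") can be sketched as a small parsing helper. `resolve_waitfor_async_ms` and `WAITFOR_ASYNC_DEFAULT_MS` are hypothetical names; the 100 ms default is taken from the patch description, while rejecting non-integer, negative, or out-of-range values is an assumption made for this sketch (the test suite's `test_not_int_value` suggests non-integers are refused, but the exact server behavior is not described here).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Default wait stated in the patch description: 100 milliseconds. */
#define WAITFOR_ASYNC_DEFAULT_MS 100L

/* Hypothetical helper: map the raw nsDS5ReplicaWaitForAsyncResults value
   (NULL when the attribute is absent) to an effective wait in milliseconds.
   Absent or 0 -> default, per the patch description.
   Returns -1 for values this sketch treats as invalid (non-integer,
   negative, or larger than INT_MAX); that rule is an assumption. */
static long resolve_waitfor_async_ms(const char *raw)
{
    char *end = NULL;
    long v;

    if (raw == NULL)
        return WAITFOR_ASYNC_DEFAULT_MS;      /* attribute does not exist */

    errno = 0;
    v = strtol(raw, &end, 10);
    if (end == raw || *end != '\0' || errno == ERANGE || v < 0 || v > INT_MAX)
        return -1;                            /* not a usable integer */

    return v == 0 ? WAITFOR_ASYNC_DEFAULT_MS : v;   /* 0 means "use default" */
}
```

Returning a sentinel instead of silently substituting the default for garbage input keeps a typo such as `10ms` from quietly turning into 100 ms.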

Also, filed a doc bug: bz 1258997.

Comment 4 Simon Pichugin 2015-09-14 12:19:54 UTC
Hi Noriko,

I am working on the test suite for this feature.
Can you please clarify some things for me?

My steps to verify that the feature works:
- Set up a 3-replica MMR topology
- Set nsslapd-errorlog-level: 8192 (Replication debugging) for each instance
- Set nsDS5ReplicaWaitForAsyncResults: 5000 for each agreement
- Check that the nsDS5ReplicaWaitForAsyncResults value was set successfully
- Add many entries to the first replica in a short time (I used reproducer.sh from the upstream ticket: https://fedorahosted.org/389/attachment/ticket/47957/reproducer.tar.gz)
- Grep /var/log/dirsrv/slapd-instance/errors for repl5_inc_waitfor_async_results
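The "Set nsDS5ReplicaWaitForAsyncResults: 5000 for each agreement" step amounts to an LDIF modify record against the agreement DN template given in comment 3. The sketch below only builds that text; `build_waitfor_ldif` and the sample values `to_master2` and `dc=example,dc=com` are hypothetical, and applying the record (for example with ldapmodify) is left to whatever LDAP client the test harness uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write an LDIF modify record that sets
   nsDS5ReplicaWaitForAsyncResults on one agreement entry, using the
   cn=<AGREEMENT>,cn=replica,cn="<SUFFIX>",cn=mapping tree,cn=config layout.
   Returns snprintf()'s result (length needed), or -1 for unsafe input. */
static int build_waitfor_ldif(char *buf, size_t len, const char *agreement,
                              const char *suffix, long wait_ms)
{
    assert(buf != NULL && agreement != NULL && suffix != NULL);

    /* Refuse values containing line breaks: they would inject LDIF lines. */
    if (strpbrk(agreement, "\r\n") != NULL || strpbrk(suffix, "\r\n") != NULL)
        return -1;

    return snprintf(buf, len,
                    "dn: cn=%s,cn=replica,cn=\"%s\",cn=mapping tree,cn=config\n"
                    "changetype: modify\n"
                    "replace: nsDS5ReplicaWaitForAsyncResults\n"
                    "nsDS5ReplicaWaitForAsyncResults: %ld\n"
                    "-\n",
                    agreement, suffix, wait_ms);
}
```

Because `snprintf` returns the length it needed, the caller can detect truncation by comparing the return value with the buffer size before sending the record anywhere.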

My result:
cat /var/log/dirsrv/slapd-master1/errors | grep inc_waitfor_async
[14/Sep/2015:11:50:50 +051800] - repl5_inc_waitfor_async_results: 0 6
[14/Sep/2015:11:50:51 +051800] - repl5_inc_waitfor_async_results: 0 6
[14/Sep/2015:11:50:55 +051800] - repl5_inc_waitfor_async_results: 6 6
[14/Sep/2015:11:50:56 +051800] - repl5_inc_waitfor_async_results: 6 6
[14/Sep/2015:11:51:23 +051800] - repl5_inc_waitfor_async_results: 10 13
[14/Sep/2015:11:51:23 +051800] - repl5_inc_waitfor_async_results: 12 13
[14/Sep/2015:11:51:28 +051800] - repl5_inc_waitfor_async_results: 13 13
[14/Sep/2015:11:51:28 +051800] - repl5_inc_waitfor_async_results: 13 13
[14/Sep/2015:11:51:33 +051800] - repl5_inc_waitfor_async_results: 29 32
...

So, as you can see, the interval between lines where last_message_id_received != last_message_id_sent is not always five seconds.

What could be the issue?
Is my test plan not good enough, or have I found a bug?
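The check described in this comment can be made mechanical: parse each `repl5_inc_waitfor_async_results: <received> <sent>` line, keep only those where the two counters differ, and compute the seconds between consecutive kept lines. `waitfor_entry`, `parse_waitfor_line`, and `pending_gaps` are hypothetical; the sketch assumes, as the comment does, that the first number is last_message_id_received and the second last_message_id_sent, and it ignores day rollover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One parsed "repl5_inc_waitfor_async_results: <received> <sent>" line. */
typedef struct {
    int secs;      /* seconds since midnight, from the log timestamp */
    int received;  /* last_message_id_received */
    int sent;      /* last_message_id_sent */
} waitfor_entry;

/* Parse e.g. "[14/Sep/2015:11:50:50 +051800] - repl5_inc_waitfor_async_results: 0 6". */
static bool parse_waitfor_line(const char *line, waitfor_entry *e)
{
    int h, m, s;
    assert(line != NULL && e != NULL);
    if (sscanf(line,
               "[%*[^:]:%d:%d:%d %*[^]]] - repl5_inc_waitfor_async_results: %d %d",
               &h, &m, &s, &e->received, &e->sent) != 5)
        return false;
    e->secs = h * 3600 + m * 60 + s;
    return true;
}

/* Store in gaps[] the seconds between consecutive "still waiting" lines
   (received != sent). Returns the number of gaps written. */
static int pending_gaps(const char *const *lines, int n, int *gaps, int max_gaps)
{
    int count = 0, prev = -1;
    for (int i = 0; i < n; i++) {
        waitfor_entry e;
        if (!parse_waitfor_line(lines[i], &e) || e.received == e.sent)
            continue;
        if (prev >= 0 && count < max_gaps)
            gaps[count++] = e.secs - prev;
        prev = e.secs;
    }
    return count;
}
```

Fed the nine log lines quoted above, it yields gaps of 1, 32, 0, and 10 seconds, matching the observation that the interval is not a steady five seconds.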

Comment 5 Noriko Hosoi 2015-09-14 16:27:48 UTC
Hi Simon,

Would it be possible to change the topology to 2-way MMR?

I assume 3-way MMR is configured like this loop?
    Master1 <--> Master2 <--> Master3 <--> Master1
If so, on Master1 both the to-Master2 and to-Master3 agreements write "repl5_inc_waitfor_async_results: ## ##" lines, and unfortunately the two cannot be distinguished in the log...  (Please also relax the expectation to 4 through 6 seconds due to the one-second granularity of the timestamps...)
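The suggested tolerance can be expressed as a check on gaps measured from the log: with one-second timestamps, a configured wait of `wait_ms` is accepted if each gap lies within one second of `wait_ms / 1000`, which gives the 4-to-6-second window for a 5000 ms setting. `gap_matches_wait` and `all_gaps_match` are hypothetical helpers, and the sketch assumes the log contains a single agreement (the 2-way MMR setup), since interleaved lines from two agreements would still break the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: does one observed gap (whole seconds, taken from
   one-second-resolution log timestamps) match the configured wait,
   allowing +/- 1 second of slack? */
static bool gap_matches_wait(int gap_secs, long wait_ms)
{
    long expected;
    assert(wait_ms >= 1000);            /* sketch only handles second-scale waits */
    expected = wait_ms / 1000;          /* e.g. 5000 ms -> 5 s */
    return gap_secs >= expected - 1 && gap_secs <= expected + 1;
}

/* True only if every gap in the array is within tolerance. */
static bool all_gaps_match(const int *gaps, int n, long wait_ms)
{
    for (int i = 0; i < n; i++)
        if (!gap_matches_wait(gaps[i], wait_ms))
            return false;
    return true;
}
```

For a 5000 ms setting, gaps of 4, 5, or 6 seconds pass, while the 1, 32, 0, and 10 second gaps from the 3-way log fail.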

Comment 6 Simon Pichugin 2015-09-15 14:08:38 UTC
Hi Noriko,

That makes perfect sense. Thank you!

Now everything has been tested and works properly.

Comment 7 Noriko Hosoi 2015-09-15 17:19:28 UTC
Thank you for the good news, Simon!!
(I'm soooo relieved I did not bring in a regression... ;)

Comment 8 Simon Pichugin 2015-09-17 08:59:16 UTC
Build tested:
389-ds-base-1.3.4.0-15.el7.x86_64

Test suite for upstream:
https://fedorahosted.org/389/attachment/ticket/47957/0001-Ticket-47957-Add-replication-test-suite-for-a-wait-a.patch

Test results:
================================================= test session starts =================================================
platform linux2 -- Python 2.7.5 -- py-1.4.27 -- pytest-2.7.0 -- /usr/bin/python
rootdir: /tmp/ds/dirsrvtests/suites/replication, inifile: 
collected 10 items 

dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_not_int_value PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_multi_value PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_value_check[waitfor_async_attr0] PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_value_check[waitfor_async_attr1] PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_value_check[waitfor_async_attr2] PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_value_check[waitfor_async_attr3] PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_behavior_with_value[waitfor_async_attr0] PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_behavior_with_value[waitfor_async_attr1] PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_behavior_with_value[waitfor_async_attr2] PASSED
dirsrvtests/suites/replication/wait_for_async_feature_test.py::test_behavior_with_value[waitfor_async_attr3] PASSED

============================================= 10 passed in 198.21 seconds =============================================

VERIFIED

Comment 9 errata-xmlrpc 2015-11-19 11:42:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2351.html

