Bug 1419051 - replication: unable to receive response till nsds5replicaTimeout
Summary: replication: unable to receive response till nsds5replicaTimeout
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: 389-ds-base
Version: 6.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Noriko Hosoi
QA Contact: Viktor Ashirov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-02-03 14:32 UTC by German Parente
Modified: 2020-12-14 08:08 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-15 20:47:00 UTC
Target Upstream Version:



Description German Parente 2017-02-03 14:32:57 UTC
Description of problem:

Very often, both in IPA deployments and in standalone RHDS, we see these errors in the logs:

[03/Feb/2017:10:36:16.125219254 +0100] NSMMReplicationPlugin - agmt="cn=agmt" (host:389): Unable to receive the response for a startReplication extended operation to consumer (Timed out). Will retry later.
[03/Feb/2017:10:36:27.091234210 +0100] NSMMReplicationPlugin - agmt="cn=agmt2" (host2:389): Unable to receive the response for a startReplication extended operation to consumer (Timed out). Will retry later.

In RHDS, the default timeout is 10 minutes (#define DEFAULT_TIMEOUT 600), so when the server is stopped while a replication thread is still waiting for this response, shutdown can take up to 10 minutes.
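
As a possible mitigation sketch only (not a fix confirmed in this bug): assuming the wait is bounded by the agreement's nsds5replicaTimeout attribute, as the bug title suggests, a shorter per-agreement value could be set with an LDIF modify. The helper name and the agreement DN below are hypothetical and must be adapted to the real deployment.

```python
def build_timeout_ldif(agreement_dn: str, timeout_seconds: int) -> str:
    """Return an LDIF modify record that replaces nsds5replicaTimeout.

    Hypothetical helper for illustration; it only builds text and does
    not contact any directory server.
    """
    if timeout_seconds <= 0:
        # This sketch only accepts positive values (seconds).
        raise ValueError("timeout_seconds must be positive")
    return (
        f"dn: {agreement_dn}\n"
        "changetype: modify\n"
        "replace: nsds5replicaTimeout\n"
        f"nsds5replicaTimeout: {timeout_seconds}\n"
    )


# Assumed example DN; the real agreement entry name and suffix will differ.
EXAMPLE_AGREEMENT_DN = (
    'cn=agmt,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config'
)

# Print a record that lowers the timeout from the 600 s default to 120 s.
print(build_timeout_ldif(EXAMPLE_AGREEMENT_DN, 120))
```

The printed record could then be fed to ldapmodify bound as Directory Manager. Whether a lower value is safe depends on how slowly the consumers legitimately respond, so the 120 used here is only a placeholder.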

For this situation, I have a pstack from a customer running RHEL 6:

Thread 2 (Thread 0x7eff4b5fe700 (LWP 31219)):
#0  0x00007eff6b658334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007eff6b65360e in _L_lock_995 () from /lib64/libpthread.so.0
#2  0x00007eff6b653576 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007eff6bca8669 in PR_Lock () from /lib64/libnspr4.so
#4  0x00007eff63c1f914 in conn_read_result_ex () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#5  0x00007eff63c282ea in release_replica () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#6  0x00007eff63c224a3 in repl5_inc_run () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#7  0x00007eff63c27a15 in prot_thread_main () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#8  0x00007eff6bcaec13 in ?? () from /lib64/libnspr4.so
#9  0x00007eff6b651aa1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007eff6b39e93d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7eff6dd197c0 (LWP 9193)):
#0  0x00007eff6b65568c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff6bca914e in PR_WaitCondVar () from /lib64/libnspr4.so
#2  0x00007eff6bcae671 in PR_Cleanup () from /lib64/libnspr4.so
#3  0x000000000041f232 in main ()

The equivalent bug for RHEL 7 is bug 1419050.

